yingjie@memoir

2026-04-06

Competitions

uftrace LoongClaw

uftrace -d trace.data -P . ./target/uftrace/loongclaw

# DURATION     TID      FUNCTION
   0.568 us [ 251903] | __gmon_start__();
            [ 251903] | aws_lc_0_39_1_OPENSSL_cpuid_setup() {
   1.132 us [ 251903] |   OPENSSL_cpuid();
   0.810 us [ 251903] |   OPENSSL_cpuid();
   0.495 us [ 251903] |   OPENSSL_cpuid();
   0.164 us [ 251903] |   OPENSSL_xgetbv();
   0.122 us [ 251903] |   os_supports_avx512();
   2.390 us [ 251903] |   getenv();
  19.691 us [ 251903] | } /* aws_lc_0_39_1_OPENSSL_cpuid_setup */
            [ 251903] | main() {
            [ 251903] |   std::rt::lang_start() {
            [ 251903] |     std::rt::lang_start_internal() {
   4.648 us [ 251903] |       std::sys::pal::unix::stack_overflow::imp::make_handler();
            [ 251903] |       std::sys::pal::unix::stack_overflow::thread_info::set_current_info() {
   0.699 us [ 251903] |         _RNvCsdBezzDwma51_7___rustc35___rust_no_alloc_shim_is_unstable_v2();
            [ 251903] |         _RNvCsdBezzDwma51_7___rustc12___rust_alloc() {
   1.147 us [ 251903] |           _RNvCsdBezzDwma51_7___rustc11___rdl_alloc();
   1.768 us [ 251903] |         } /* _RNvCsdBezzDwma51_7___rustc12___rust_alloc */
   0.082 us [ 251903] |         _RNvCsdBezzDwma51_7___rustc35___rust_no_alloc_shim_is_unstable_v2();
            [ 251903] |         _RNvCsdBezzDwma51_7___rustc12___rust_alloc() {
   0.124 us [ 251903] |           _RNvCsdBezzDwma51_7___rustc11___rdl_alloc();
   0.257 us [ 251903] |         } /* _RNvCsdBezzDwma51_7___rustc12___rust_alloc */
   4.812 us [ 251903] |       } /* std::sys::pal::unix::stack_overflow::thread_info::set_current_info */
            [ 251903] |       std::rt::lang_start::_{{closure}}() {
            [ 251903] |         std::sys::backtrace::__rust_begin_short_backtrace() {
            [ 251903] |           core::ops::function::FnOnce::call_once() {
            [ 251903] |             loongclaw::main() {
            [ 251903] |               tokio::runtime::builder::Builder::new_multi_thread() {
            [ 251903] |                 tokio::runtime::builder::Builder::new() {
            [ 251903] |                   alloc::sync::Arc<T>::new() {
            [ 251903] |                     alloc::alloc::exchange_malloc() {
            [ 251903] |                       alloc::alloc::Global::alloc_impl() {
   0.102 us [ 251903] |                         _RNvCsdBezzDwma51_7___rustc35___rust_no_alloc_shim_is_unstable_v2();
            [ 251903] |                         _RNvCsdBezzDwma51_7___rustc12___rust_alloc() {
   0.122 us [ 251903] |                           _RNvCsdBezzDwma51_7___rustc11___rdl_alloc();
   0.246 us [ 251903] |                         } /* _RNvCsdBezzDwma51_7___rustc12___rust_alloc */
   0.814 us [ 251903] |                       } /* alloc::alloc::Global::alloc_impl */
   1.293 us [ 251903] |                     } /* alloc::alloc::exchange_malloc */
   2.334 us [ 251903] |                   } /* alloc::sync::Arc<T>::new */
            [ 251903] |                   tokio::loom::std::rand::seed() {
            [ 251903] |                     std::hash::random::RandomState::new() {
            [ 251903] |                       std::thread::local::LocalKey<T>::with() {

uftrace -d trace.data.match -P "*loongclaw*" ./target/uftrace/loongclaw chat

This loads significantly faster than the previous command and reaches the loongclaw program's own code sooner.

So far, the trace shows which functions are called while LoongClaw runs. The next step is to filter it down to only the functions that belong to this project.
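Until a record-time filter does this properly, one stopgap is to post-process the text that uftrace prints. The sketch below is not a uftrace feature: it is a plain grep wrapper, and the function name `keep_project_lines` is made up for illustration. It assumes the output has the same column layout as the listing above and keeps only lines whose function column starts with the project prefix, so the surrounding call nesting from other crates is lost.

```shell
# Keep only trace lines whose function name starts with the project prefix.
# Reads uftrace text output on stdin; the prefix defaults to "loongclaw::".
# The prefix is pasted into a regular expression, so avoid regex metacharacters.
keep_project_lines() {
    prefix="${1:-loongclaw::}"
    # The function column follows "|" plus indentation; closing lines
    # have the form "} /* name */" and are kept as well.
    grep -E '\|[[:space:]]+(\} /\* )?'"$prefix"
}
```

For example, `uftrace replay -d trace.data | keep_project_lines` would print only the `loongclaw::` entries of an existing recording. Names that do not begin with the prefix, such as trait implementations written as `<loongclaw::... as ...>`, are dropped by this simple pattern.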

bash
 uftrace record -P . -F 'loongclaw::*' --srcline --depth 3 --no-sched --no-event ./target/uftrace/loongclaw chat

--hide can suppress specific library calls in the output, but I think there should be a way to show only the project's own functions.

bash
 # Only trace functions starting with loongclaw::, depth 1 for top-level only
 uftrace record -P . -F 'loongclaw::.*' --depth 1 --srcline --no-sched --no-event ./target/uftrace/loongclaw chat
LOONGCLAW  v0.1.0-alpha.3 · dev · 6c09d14
interactive chat

I'm still running into problems with uftrace: either the trace is full of third-party library functions, or the project's own functions are missing and the output is dominated by core-related functions instead.
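To see which of these two cases a given recording falls into, it can help to count trace lines per top-level path segment. The following is a rough sketch using awk, not anything built into uftrace; the function name `tally_by_crate` is hypothetical, and it assumes the replay text has the same "DURATION TID | FUNCTION" layout as the listing earlier in this post.

```shell
# Count call lines per top-level path segment (crate or module).
# Reads uftrace text output on stdin; prints "count name", largest first.
tally_by_crate() {
    awk -F'|' '
        NF >= 2 {
            name = $2
            sub(/^[ \t]+/, "", name)            # drop call-depth indentation
            if (substr(name, 1, 1) == "}") next # skip closing "} /* ... */" lines
            if (name !~ /::/) next              # skip C symbols and mangled names
            sub(/::.*/, "", name)               # keep only the first path segment
            count[name]++
        }
        END { for (n in count) print count[n], n }
    ' | sort -k1,1nr -k2,2
}
```

Running `uftrace replay -d trace.data | tally_by_crate` on a recording then shows at a glance whether `loongclaw` appears at all and how it compares with `tokio`, `core`, `alloc`, and the rest.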

Determining Parameters

One approach I came up with: ask several different models which parameters are needed to accomplish this, note down what they suggest, and then check each parameter against the manual myself.

  • --srcline: Show the source line for each function; must be passed to both record and replay to be visible
  • --no-libcall: Do not record calls into user-space shared libraries
  • -F/--filter: Include functions matching the given pattern
  • -N: Exclude functions matching the given pattern
  • --match: Pattern type, either regex or glob (default is regex)
  • --depth: Maximum call depth to record
  • --no-event: Do not record generic events (scheduling, page faults, system calls, signal delivery, process exit, etc.)
  • --no-sched: Do not record scheduling events (when the program gains or loses the CPU)
  • -L/--location: Filter by location, given as a path
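As a starting point for checking these against the manual, the flags above could be combined roughly as follows. This is only a sketch: the helper `build_record_cmd` is hypothetical, the `tokio::*` exclusion is an illustrative guess rather than a tested setting, and it assumes the installed uftrace version supports every flag listed. The helper only prints the command so it can be reviewed before running. As far as the manual describes it, -F keeps the matched functions together with the functions they call, which may be why third-party callees still show up unless -N or --depth trims them.

```shell
# Print a `uftrace record` command assembled from the flags noted above.
# Arguments: the traced binary followed by its own arguments.
# Arguments are joined with spaces, so this is for review, not for paths with spaces.
build_record_cmd() {
    opts="-P ."                         # dynamic patching, as in the earlier commands
    opts="$opts --match glob"           # read the patterns below as globs
    opts="$opts -F 'loongclaw::*'"      # include project functions
    opts="$opts -N 'tokio::*'"          # example exclusion (assumption, adjust as needed)
    opts="$opts --depth 3"              # limit recorded call depth
    opts="$opts --srcline"              # record source line information
    opts="$opts --no-libcall"           # skip shared-library calls
    opts="$opts --no-sched --no-event"  # skip scheduling and generic events
    printf 'uftrace record %s %s\n' "$opts" "$*"
}

build_record_cmd ./target/uftrace/loongclaw chat
```

Because --srcline has to be given at replay time as well, the matching replay step would be `uftrace replay --srcline`.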