8

Logging C Functions

 2 years ago
source link: https://justine.lol/ftrace/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Logging C Function Calls

May 19th, 2022 @ justine's web page

Logging C Functions

One of my favorite features of the Cosmopolitan Libc runtime is its --ftrace flag that logs C function calls. It's the simplest system for debugging programs I've ever used and it surprises me that I found no evidence of someone having invented it before.

Here's one of its most important use cases. Have you ever had you debugger stupified by a NULL function pointer before? Debuggers aren't very good at handling this situation. Because once the control flow jumps to the NULL page, you'd need to find a way to rewind execution history to figure out how it got there. But if you have function call logging, you can just look backwards in the log, see what was happening before, and get a pretty good idea.

For example, here's what happens if I run a Hello World! program:

$ o//examples/hello.com --ftrace
#     tid   nanoseconds   stack symbol
FUN 32679        91'942     176   &__cxa_atexit
FUN 32679        92'211      80 &atexit
FUN 32679        92'302     144   &__cxa_atexit
FUN 32679        93'386      48 &main
FUN 32679        93'469      96   &puts
FUN 32679        93'575     112     &stdio_unlock
FUN 32679        93'673     224       &fwrite_unlocked
FUN 32679        93'945     112     &stdio_unlock
FUN 32679        94'040     160       &fputc_unlocked
FUN 32679        94'133     256         &fwrite_unlocked
FUN 32679        94'244     352           &write
hello world
FUN 32679        94'605      64 &exit
FUN 32679        94'771     160     &FreeFds
FUN 32679        94'912     144     &fflush_unlocked
FUN 32679        95'006     160       &stdio_unlock
FUN 32679        95'106     208         &fflush_unlocked
FUN 32679        95'214     240           &__fflush_impl

In the above log, you'll see:

  1. the process pid (or tid if using threads)
  2. the number of nanoseconds elapsed since the program started (you'll notice it takes 91µs to load symbol table from the .com.dbg file, sort it so it can be binary searched, and then rewrite the binary in memory)
  3. the current stack usage in bytes
  4. the symbol name with depth (by backcounting rbp frames)

The nanosecond timestamps are particularly useful for pinpointing latency issues in programs. One of the biggest blindspots of open source tools has always been the lack of focus on latency. For example, the wonderful perf record command on Linux won't give you this kind of insight:

FUN 10808        91'065      80 &atexit
FUN 10808        91'242     144   &__cxa_atexit
FUN 10808        92'542      48 &main
FUN 10808        92'638      64   &creat
FUN 10808        92'771     240     &openat
FUN 10808        92'876     320       &sys_openat
FUN 10808       109'805     144   &write
FUN 10808       183'356      80   &fsync
FUN 10808   194'399'931      96   &close
FUN 10808   194'410'030     112     &__releasefd_unlocked
FUN 10808   194'410'962      64 &exit
FUN 10808   194'411'708     112   &__cxa_finalize
FUN 10808   194'412'713     160     &FreeFds

One of the things we've been focusing on lately has been making the Python unit tests go faster. Since we use Python unit tests to test Cosmopolitan Libc. One of the biggest culprits has always been fsync() at least for those of us who don't use tmpfs. It would frequently cause individual tests to lag for as long as thirty seconds. Every other tool we used for measuring the performance of these tests, all led us astray suggesting that we focus on the core Python evaluator loop. But all that was really needed was adding a way to pacify pointless fsync() calls.

The way these logs work, is they apply to functions at the object code level. For example, if a function gets inlined, then it won't show up in the function call trace. Another thing the log does is fold repeated lines. For example, if you call a pure function in a loop (pure in the sense that it doesn't call any additional subroutines) then you'll only see one line for it in the log with the cumulative timestamp. For example, if you were to say:

dontinline void Pause(void) {
  asm("rep nop");
}

dontinline void DoStuff(void) {
  for (int i = 0; i < 10000; ++i) {
    Pause();
  }
}

int main() {
  DoStuff();
  DoStuff();
  DoStuff();
  DoStuff();
}

Then you would get:

FUN 11650       103'932      80 &atexit
FUN 11650       104'082     144   &__cxa_atexit
FUN 11650       104'402      48 &main
FUN 11650       104'493      72   &DoStuff
FUN 11650       104'571      88     &Pause
FUN 11650     1'084'906      72   &DoStuff
FUN 11650     1'085'073      88     &Pause
FUN 11650     2'054'060      72   &DoStuff
FUN 11650     2'054'174      88     &Pause
FUN 11650     3'154'500      72   &DoStuff
FUN 11650     3'154'599      88     &Pause
FUN 11650     4'172'376      64 &exit
FUN 11650     4'172'511     112   &__cxa_finalize
FUN 11650     4'172'651     160     &FreeFds

In terms of how the flag is interpreted, sometimes it's useful to be able to have --ftrace be in the arguments list, possibly because you're passing it along to a subprocess you're executing. In that case, if you add two additional dashes:

./foo.com ----ftrace

Then "--ftrace" will appear in the argv array passed to main. Additional slashes can be supplied too, for additional layers of nesting.

Getting Started

Run the following commands on Linux:

curl https://justine.lol/cosmopolitan/cosmopolitan.zip >cosmopolitan.zip
unzip cosmopolitan.zip
echo 'int main() { printf("hello world\n"); }' >hello.c
gcc -g -Os -static -nostdlib -nostdinc -no-pie -fno-pie  \
    -fno-omit-frame-pointer -pg -mnop-mcount             \
    -o hello.com.dbg hello.c -fuse-ld=bfd -Wl,-T,ape.lds \
    -include cosmopolitan.h crt.o ape.o cosmopolitan.a
objcopy -S -O binary hello.com.dbg hello.com
./hello.com --ftrace

How It Works

GCC has several flags for inserting profiling hooks at the beginning of functions. Namely -pg, -mfentry, and -mnop-mcount. The last one is important. Because the default behavior is to insert a function call to the mcount function.

Pause:	push	%rbp
	mov	%rsp,%rbp
1:	call	mcount
	rep nop
	pop	%rbp
	ret
	.endfn	Pause,globl

Cosmopolitan Libc's mcount function does nothing, and the cost of calling an empty function can be as high as 14 cycles of overhead per function call. However if you can get GCC to insert NOPs instead:

Pause:	push	%rbp
	mov	%rsp,%rbp
1:	nopl	0x00(%rax,%rax,1)
	rep nop
	pop	%rbp
	ret
	.endfn	Pause,globl

Then the overhead is negligible on the order of picoseconds. So it's worth considering just always having the -pg nops in your program. Because Cosmopolitan Libc will self-modify your executable at runtime to turn those NOPs into function calls once you've enabled it. In fact, there's even a runtime function that lets you insert an arbitrary hook into your program's functions:

privileged noinstrument void MyTracer(void) {
  kprintf("hello\n");
}
int main() {
  __hook(MyTracer, GetSymbolTable());
}

Since there's multiple profiling hooks that GCC uses depending on the various combinations of flags, here's the hooks that __hook will understand and be able to rewrite:

0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
67 0f 1f 04 00          nopl (%eax,%eax,1)
66 0F 1F 44 00 00       nopw 0x00(%rax,%rax,1)
67 e8 xx xx xx xx       addr32 call mcount

The reason why we need the privileged keyword above is because some operating systems don't permit read+write+exec memory. So when we rewrite the binary to change the NOPs into CALL, much of the program's code needs to be disabled. Using the privileged keyword will cause your function to be relocated to a special part of the executable that never gets rewritten.

Custom Code Morphing

Let's say you wanted to write your own function that rewrites your binary at runtime. Intel's disassembler Xed comes included in the Cosmopolitan Libc amalgamation distro. So here's how you would, for instance, iterate over all the instructions in your program:

privileged void RewriteBinary(void) {
  int n, err;
  unsigned char *p;
  struct XedDecodedInst xedd;
  __morph_begin();
  for (p = _ereal; p < __privileged_start; p += n) {
    xed_decoded_inst_zero_set_mode(&xedd, XED_MACHINE_MODE_LONG_64);
    err = xed_instruction_length_decode(&xedd, p, __privileged_start - p);
    if (err == XED_ERROR_NONE) {
      n = xedd.length;
    } else {
      n = 1;
    }
  }
  __morph_end();
}

Measuring Language Complexity

The nice thing about Cosmopolitan's function tracing is that it's shell scriptable and fast. Having this debugging information easily plugged into the shell, means we can do things like study how many function calls programming language interpreters need to print Hello World.

$ o//tool/plinko/plinko.com <<<'(QUOTE HELLO-WORLD)' --ftrace |& wc -l
819
$ o//third_party/sqlite3/sqlite3.com :memory: "select 'hello world'" --ftrace |& wc -l
6253
$ o//third_party/lua/lua.com -e 'print("hello")' --ftrace |& wc -l
9741
$ o//third_party/quickjs/qjs.com -e 'console.log("hello world")' --ftrace |& wc -l
42272
$ o//third_party/python/python.com -c 'print("hello world")' --ftrace |& wc -l
931354

Here's the function logs if you're interested in reading them.

-rw-r--r-- 1 501 jart  43K May 19 19:29 plinko.txt
-rw-r--r-- 1 501 jart 397K May 19 19:28 sqlite.txt
-rw-r--r-- 1 501 jart 751K May 19 19:27 lua.txt
-rw-r--r-- 1 501 jart 2.7M May 19 19:27 qjs.txt
-rw-r--r-- 1 501 jart 150M May 19 19:28 python.txt

System Call Logging

While we're on the topic of tracing, it's also worth mentioning that Cosmopolitan Libc bakes a similar --strace flag into binaries. For example, if we were to trace a program that calls fsync():

int main() {
  creat("o/foo", 0644);
  write(3, _base, _end - _base);
  fsync(3);
  close(3);
}

It might show us the following:

SYS     0              5'590 bell system five system call support 97 magnums loaded on gnu/systemd
SYS   384              8'664 mmap(0x700000000000, 131'072, PROT_READ|PROT_WRITE, MAP_STACK|MAP_ANONYMOUS, -1, 0) → 0x700000000000
SYS   384             12'702 getenv("TERM") → "dumb"
SYS   384          1'926'421 openat(AT_FDCWD, "o/foo", O_WRONLY|O_CREAT|O_TRUNC, 0644) → 3
SYS   384          1'973'434 write(3, u"⌂ELF☻☺☺○        ☻ > ☺   ♂!@      ♂      "..., 131'072) → 131'072
SYS   384         29'834'148 fysnc(3) → 0
SYS   384         29'839'878 close(3) → 0
SYS   384         29'840'030 exit(0)
SYS   384         29'840'324 __cxa_finalize(&407eef, 0)
SYS   384         29'840'445 FreeFds()
SYS   384         29'840'729 _Exit(0)

It's been an indispensable tool for building Cosmopolitan Libc, since GDB and LLDB aren't very reliable when developing across platforms. For instance, I haven't been able to figure out how to get a debugger or system call tracer to work on Mac OS X for the better part of ten years. So much of the XNU support needed to be written a priori until this tool came along, which has greatly helped in reducing bugs!

Troubleshooting

In the above example, you may have noticed that a number &407eef was printed in place of a symbol. This is one of the tradeoffs that needed to be made by Cosmopolitan Libc's kprintf() function, which is too mission critical to be able to call into all the non-privileged code needed to open() + mmap() + etc. the concomitant ELF .com.dbg binary and load the symbol table. The simple solution for this is to have your main function call GetSymbolTable():

int main() {
  GetSymbolTable();
  creat("foo", 0644);
  write(3, _base, _end - _base);
  fsync(3);
  close(3);
}

Now the symbols should show up:

SYS   401         25'825'348 __cxa_finalize(&FreeFds, 0)

Funding

[United States of Lemuria - two dollar bill - all debts public and primate]

Funding for this technology was crowdsourced from Justine Tunney's GitHub sponsors and Patreon subscribers. Your support is what makes projects like Cosmopolitan Libc possible. Thank you.

See Also



Written by Justine Tunney

[email protected]


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK