Over the past few months, the Avatar code has been suffering a few crashes that leave no recognisable or usable stack for GDB to read. It’s also had a few hangs, with strace showing futex_wait (in an application that doesn’t use threads), and GDB on the core (after killing the process) showing __kernel_vsyscall. Unfortunately, I’m not really a programmer, so my efforts to track down the cause have probably been a bit haphazard.
The most annoying part so far is that yesterday we hit the hang four times, so I ran strace against the binary and piped the output across the ’net to my PC, where I’ve got a rolling 40,000-line buffer. 24 hours later, at a constant 2 Mbit/s, we still haven’t hung.
I call Heisenbug.
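
For anyone wanting to try the same trick, here is a rough sketch of a rolling line buffer in Python. It is not the tool I actually used; the function name `tail_buffer` and the demo data are made up for illustration, and the only real point is that a `deque` with `maxlen` quietly drops the oldest line each time a new one arrives.

```python
from collections import deque


def tail_buffer(lines, max_lines=40_000):
    """Consume an iterable of lines and keep only the newest max_lines."""
    # A deque with maxlen discards from the left once it is full,
    # so memory use stays bounded however long the trace runs.
    buf = deque(maxlen=max_lines)
    for line in lines:
        buf.append(line)
    return buf


# Hypothetical demo: 100,000 fake trace lines, of which only the
# final 40,000 survive in the buffer.
fake_trace = (f"syscall {i}\n" for i in range(100_000))
kept = tail_buffer(fake_trace)
print(len(kept), kept[0].strip(), kept[-1].strip())
```

In real use you would presumably pass `sys.stdin` (or a socket file object) instead of the fake generator, and write the buffer out once the process finally hangs.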

Turns out that installing the C library debugging symbols is a useful approach. Nailed the problem area down to the regex library for some bizarre reason. Pulled that code out of the execution path and we’ve been stable for a week. Rather annoying.
Comment by cricalix — June 8, 2008 @ 22:18