Over the past few months, the Avatar code has had a few crashes that leave no recognisable or usable stack for GDB to read. It’s also had a few hangs, with strace showing futex_wait (in an application that doesn’t use threads), and GDB on the core (after killing the process) showing __kernel_vsyscall. Unfortunately, I’m not really a programmer, so my efforts to track down the cause have probably been a bit haphazard.
The most annoying part so far is that yesterday we hit the hang 4 times, so I ran strace against the binary and channelled the output across the ‘net to my PC, where I’ve got a rolling 40,000-line buffer. 24 hours later, at a constant 2 Mbit/s, we still haven’t hung.
I call Heisenbug.
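For anyone curious about the rolling buffer on the receiving end, here is a minimal sketch of the idea in Python. It is not the tool I actually used; the function name `tail_buffer` is made up for illustration, and it assumes the trace arrives as ordinary text lines.

```python
from collections import deque


def tail_buffer(lines, max_lines=40_000):
    """Consume an iterable of lines, keeping only the most recent max_lines."""
    # A deque with maxlen silently drops the oldest entry on each append
    # once it is full, which is exactly a rolling buffer.
    buf = deque(maxlen=max_lines)
    for line in lines:
        buf.append(line)
    return buf


# Small in-memory demonstration: ten lines in, only the last three survive.
demo = tail_buffer((f"line {i}\n" for i in range(10)), max_lines=3)
print("".join(demo), end="")
```

Passing `sys.stdin` as `lines` would let the same function sit at the end of a pipe, so that when the traced process finally hangs and the stream stops, whatever is left in the buffer is the last stretch of system calls before the hang.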