BACKGROUND:
I need some advice on debugging a Ruby program made up of multiple
threads.
The program has the following threads:
– Main thread that starts everything and hangs around sleeping and checking
status.
– Polling thread that opens a UNIXSocket to another application
and communicates with a device via ioctl calls.
– Two DRb servers which are used by a web application to get status
information related to the system.
The system runs as a daemon on Linux. It has been stable under Ruby 1.6.7 and
seemed to be running well under 1.8.1 until recently. The primary addition
was using YAML to write out event records for the system. YAML is used
in the polling thread.
An extension is used to write syslog messages to the kernel message file.
This includes writing of backtrace information in case of exceptions.
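To make the backtrace logging concrete, here is a minimal sketch of the kind of wrapper involved. It is not the actual code: the helper name start_logged_thread is hypothetical, a Logger writing to a StringIO stands in for the syslog extension, and it is written against current Ruby rather than tested on 1.8.1. It rescues Exception rather than StandardError so that errors outside the StandardError hierarchy are also reported before the thread exits.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: runs the given block in a new thread and logs any
# exception, including its backtrace, before the thread exits.
def start_logged_thread(name, log)
  Thread.new do
    begin
      yield
    rescue Exception => e   # Exception, not just StandardError
      log.error("#{name} died: #{e.class}: #{e.message}")
      (e.backtrace || []).each { |line| log.error("  #{line}") }
    end
  end
end

# Stand-in for the syslog extension: a Logger writing to an in-memory buffer.
buffer = StringIO.new
log = Logger.new(buffer)

# Simulated polling thread that fails partway through its work.
poller = start_logged_thread("poller", log) do
  raise IOError, "simulated failure in polling loop"
end
poller.join

puts buffer.string
```

A wrapper like this only reports exceptions raised as Ruby exceptions inside the thread. It would not catch a thread that blocks forever, or one that is killed with Thread#kill.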
PROBLEM:
After running for some number of hours under moderate use, the polling
thread goes away. No backtrace. No error written to the kernel message file.
Running 'strace -f -p' against the process seems to indicate that the main
thread is still running. Access through the web interface shows that the two
DRb threads are still running.
QUESTION:
How do I force a backtrace from the dying thread? We have added trace
statements, but they don’t tell us much. We have seen one instance where the
YAML files were truncated (not completely written), but this is not always
the case. How can I tell whether the problem is happening in YAML?
Any help would be appreciated.