Halting Ruby's threading

Is there anything a Ruby program can do to halt Ruby’s thread switching in
its tracks without calling a blocking system call? I’ve got some
long-running multi-threaded processes (web spiders, 50-60 threads at a
time) that freeze after several hours of successful execution. When they
freeze they use 99% of the processor time, and gdb shows them dipping in
and out of the stack (though I can’t get it to display function names, so
I have no more detail than that). I know the thread switching has stopped
because at least one thread should write a log message every ten minutes,
and that hasn’t happened for hours.
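The ten-minute log message works as a heartbeat. A minimal sketch of that idea, assuming a hypothetical `start_heartbeat` helper and a `StringIO` standing in for the real log file: if heartbeat lines stop appearing while the process is still burning CPU, the scheduler is no longer switching to that thread.

```ruby
require "stringio"

# Hypothetical helper (not from the original post): write a timestamped
# heartbeat line every `interval` seconds from its own thread.
def start_heartbeat(io, interval)
  Thread.new do
    loop do
      io.puts("#{Time.now.strftime('%H:%M:%S.%L')} heartbeat")
      sleep interval
    end
  end
end

log = StringIO.new
# 0.05 s stands in for the ten-minute interval so the demo finishes quickly.
heartbeat = start_heartbeat(log, 0.05)
sleep 0.3
heartbeat.kill
heartbeat.join

# Count only complete heartbeat lines.
beats = log.string.lines.count { |line| line.end_with?("heartbeat\n") }
puts "heartbeats logged: #{beats}"
```

The log has to be watched from outside the process: if thread switching has stopped, a watchdog thread inside the same process would be frozen too.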

I’m now running the program directly under gdb with a freshly built Ruby
1.6.7, and will check in the morning to see where it has got stuck. But in
case anyone has any ideas about how this can happen, I’m all ears.



Matthew

There are three ways I can think of off the top of my head:

  1. Set Thread.critical to true
  2. Use Thread.exclusive {} (which is the same as #1, but is
    exception-safe)
  3. Capture SIGALRM or SIGVTALRM (depending on which is used for
    threading on your system).

There may be a fourth or fifth way as well. I’m not an expert on
threading.
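Thread.critical was removed in Ruby 1.9, and Thread.exclusive was later deprecated and is not available in current releases, so a sketch that runs on a modern interpreter can't call either. The closest current pattern to the exception-safe block in item 2 is `Mutex#synchronize`, with one important difference: a Mutex only keeps out threads that lock the same mutex; it does not stop thread switching in general. `LOCK` and `guarded_append` are illustrative names.

```ruby
# Illustrative names: LOCK and guarded_append are not from the thread.
LOCK = Mutex.new
shared = []

def guarded_append(list, value)
  LOCK.synchronize do
    list << value
    # Raising inside the block still releases LOCK on the way out.
    raise ArgumentError, "negative value" if value < 0
  end
end

threads = [1, 2, -3, 4].map do |v|
  Thread.new do
    begin
      guarded_append(shared, v)
    rescue ArgumentError
      # The exception escaped the synchronize block; the lock is free again.
    end
  end
end
threads.each(&:join)

puts LOCK.locked?        # false: released despite the exception
puts shared.sort.inspect # -3 was appended before the raise
```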

Paul


On Mon, Jul 15, 2002 at 09:22:19AM +0900, Matthew Bloch wrote:

> Is there any Ruby program that has the capability to halt Ruby’s thread
> switching in its tracks without calling a blocking system call? I’ve got

Paul Brannan wrote:


> There are three ways I can think of off the top of my head:
>
>   1. Set Thread.critical to true
>   2. Use Thread.exclusive {} (which is the same as #1, but is
>     exception-safe)
>   3. Capture SIGALRM or SIGVTALRM (depending on which is used for
>     threading on your system).
>
> There may be a fourth or fifth way as well. I’m not an expert on
> threading.

Hmm, well, I was obliquely implying a bug, since I’m not doing any of the
things you listed. During a debug run I found that the SSL_read and
SSL_write calls in openssl weren’t wrapped in TRAP_BEG / TRAP_END macros,
and when I came down this morning a SIGPIPE had stopped my process. I’ve
fixed this (at least I think it was a bug; descendants of IO really should
protect your script against signals and raise exceptions instead, right?)
and have left it running again.
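On whether IO should raise exceptions rather than leave the script exposed to SIGPIPE: current Ruby interpreters handle SIGPIPE themselves, so the signal does not terminate the script and the failed write raises Errno::EPIPE instead. A minimal sketch with a plain `IO.pipe`; it reflects modern Ruby behaviour, not necessarily 1.6.7, and does not involve the openssl extension at all.

```ruby
reader, writer = IO.pipe
reader.close # nobody is left to read the other end

outcome =
  begin
    writer.syswrite("hello") # unbuffered, so the error surfaces right here
    :written
  rescue Errno::EPIPE
    :epipe # the broken pipe arrives as an exception, not a fatal signal
  ensure
    writer.close
  end

puts outcome
```

Using `syswrite` keeps the example deterministic: with buffered `write`, the error could instead surface later on a flush or close.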

I also noticed that when I interrupted my process, more often than not it
was stuck in a (blocking?) call to getaddrinfo. That call is made every
time a page fetch starts, and it halted all the other threads for a few
seconds at a time. It was in open_inet() in ext/socket/socket.c, a function
that I see has gone from the latest CVS. Obviously this is a pretty large
bottleneck too; does anyone know whether it’s fixed in later versions?
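On current MRI, the socket extension releases the interpreter's global lock while getaddrinfo blocks, so a slow lookup in one thread should no longer freeze the others. A minimal sketch of doing the lookup in its own thread next to a ticking thread; it resolves `localhost` so it runs without network access, which means it shows the structure rather than timing a genuinely slow lookup.

```ruby
require "socket"

ticks = 0
# A stand-in for the other spider threads: it should keep running
# while the lookup is in progress.
ticker = Thread.new do
  20.times do
    ticks += 1
    sleep 0.01
  end
end

# Resolve the host in its own thread; Thread#value joins and returns
# the block's result (or re-raises its exception).
lookup = Thread.new do
  Addrinfo.getaddrinfo("localhost", 80, nil, :STREAM).map(&:ip_address)
end

addresses = lookup.value
ticker.join

puts "resolved localhost to #{addresses.uniq.inspect}"
puts "ticker ran #{ticks} times"
```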



Matthew