ThreadWait problem

Rubyists,

I have a bug - probably in my code or understanding, but
possibly in the ThreadWait library. I'd appreciate any
help in understanding and/or fixing it.

My application retrieves and parses data from a tree-structured
datastore that is accessible via XML over HTTP.

I retrieve objects which contain 0 or more leaf objects
and 0 or more container objects. Container objects are
queued for subsequent retrieval and processing. To improve
performance, I decided to use threads to process multiple
objects simultaneously.

Originally (non threaded), container objects were appended
to a list, and the main loop shifted targets off as long
as there were any in the list. With the threaded version,
I changed that to use a queue.
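
As an illustration only (not the actual application code), the non-threaded loop described above can be sketched as follows. The in-memory Hash layout with :leaves and :containers and the name collect_leaves are assumptions made for the example; the real program fetches each container over HTTP instead.

```ruby
# Hypothetical stand-in for the datastore: each container is a Hash with
# :leaves (Array of values) and :containers (Array of nested container Hashes).
def collect_leaves(top)
  pending = [top]   # containers still waiting to be processed
  leaves  = []
  until pending.empty?
    cur = pending.shift               # take the oldest queued container
    leaves.concat(cur[:leaves])       # leaf objects are handled immediately
    pending.concat(cur[:containers])  # nested containers are queued for later
  end
  leaves
end

sample = {
  leaves: [1],
  containers: [
    { leaves: [2, 3], containers: [] },
    { leaves: [], containers: [{ leaves: [4], containers: [] }] }
  ]
}
p collect_leaves(sample)
```

In the threaded version the Array becomes a Queue, so that worker threads can safely enqueue the containers they discover.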

The program works fine for small to medium size trees,
but fails for large ones, with several threads
in the wait group having a status of nil.

Here's my original attempt at the thread dispatcher.

---
  tgroup = ThreadsWait.new
  $tgts = Queue.new

  $tgts.enq(top_tgt)

  # have more nodes to process
  while $tgts.length > 0
    cur_tgt = $tgts.deq
    t = Thread.new do
      proc_tgt(cur_tgt, param, file_mode)
    end
    tgroup.join_nowait(t)

    # running threads can add targets to Q
    # only allow t_max threads to run
    while ((tgroup.threads.length > 0) && ($tgts.length == 0) ||
           (tgroup.threads.length >= t_max))
      tgroup.next_wait
    end
  end
---

For large trees, the code never exited, and looking I found
tgroup to contain several threads with a status of nil.

I first added the line
  abort_on_exception = true
thinking that some of the threads were dying unexpectedly
and being ignored. Still no joy.

In desperation I've added the following
near the beginning of the file, after requiring thwait:
----
  class ThreadsWait
    attr_accessor :threads
  end
----

Immediately prior to the 2nd while
---
    tgroup.threads.delete_if {|t| t.status == nil }
---

This seems to work, but is *UGLY*. I wish I could give
a short example, but the original code only fails on
data sets > ~10K items - which take > 30 minutes to run.

If it makes any difference, I've had t_max set to 10 and
20 for these runs.

----
$ ruby -v
ruby 1.8.2 (2004-11-06) [i686-linux]
----

Thanks in advance for any help/thoughts/ideas
Vance

Hi,

At Thu, 25 Nov 2004 04:09:24 +0900,
Vance A Heron wrote in [ruby-talk:121320]:

> For large trees, the code never exited, and looking I found
> tgroup to contain several threads with a status of nil.
>
> I first added the line
>   abort_on_exception = true
> thinking that some of the threads were dying unexpectedly,
> and being ignored. Still no joy.

Yes, Thread#status returns nil if the thread died by an
exception. Does the following patch help you?
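
For reference, a minimal sketch of the two final values Thread#status can take. The helper name status_after is made up for the example, and the report_on_exception guard only silences the stderr report on newer Rubies; neither is part of the original program.

```ruby
# Runs the block in a new thread, waits for it to finish, and returns the
# thread's final Thread#status.
def status_after(&body)
  t = Thread.new do
    # Suppress the "terminated with exception" report where that setting exists.
    cur = Thread.current
    cur.report_on_exception = false if cur.respond_to?(:report_on_exception=)
    body.call
  end
  begin
    t.join
  rescue StandardError
    # join re-raises the exception that killed the thread; ignore it here
  end
  t.status
end

p status_after { :finished }     # thread returned normally
p status_after { raise "boom" }  # thread was killed by an exception
```

A thread that returns normally reports false, while one killed by an exception reports nil, so a nil status in the wait group points at an exception somewhere.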

> In desperation I've added the following
> near the beginning of the file after including thwait
> ----
>   class ThreadsWait
>     attr_accessor :threads
>   end
> ----
>
> Immediately prior to the 2nd while
> ---
>     tgroup.threads.delete_if {|t| t.status == nil }
> ---

Can you inspect @wait_queue of tgroup and Thread.list at that
time? I.E., !tgroup.threads.empty? and !tgroup.threads.all?

Index: lib/thwait.rb
===================================================================
RCS file: /cvs/ruby/src/ruby/lib/thwait.rb,v
retrieving revision 1.8
diff -U2 -p -d -r1.8 thwait.rb
--- lib/thwait.rb 18 Apr 2004 23:19:46 -0000 1.8
+++ lib/thwait.rb 24 Nov 2004 23:23:48 -0000
@@ -118,6 +118,9 @@ class ThreadsWait
     for th in threads
       Thread.start(th) do |t|
-        t.join
-        @wait_queue.push t
+        begin
+          t.join
+        ensure
+          @wait_queue.push t
+        end
       end
     end
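
To show why the ensure matters: Thread#join re-raises the exception that killed the joined thread, so without ensure the push is skipped and the finished thread is never reported to the queue. The following is only a sketch using plain Thread and Queue; the names watch and reported? are invented for the example and are not part of thwait.rb.

```ruby
# Silence the per-thread exception report where the setting exists.
def quiet_current_thread
  cur = Thread.current
  cur.report_on_exception = false if cur.respond_to?(:report_on_exception=)
end

# Starts a watcher thread that pushes +target+ onto +done+ once it finishes.
def watch(target, done, use_ensure)
  Thread.new do
    quiet_current_thread
    if use_ensure
      begin
        target.join         # re-raises if target died by an exception
      ensure
        done.push(target)   # runs even when join raises
      end
    else
      target.join           # an exception here skips the push below
      done.push(target)
    end
  end
end

# Returns true if a thread that dies by an exception still gets reported.
def reported?(use_ensure)
  done   = Queue.new
  target = Thread.new { quiet_current_thread; raise "boom" }
  watcher = watch(target, done, use_ensure)
  begin
    watcher.join
  rescue StandardError
    # the watcher itself dies with the re-raised exception
  end
  !done.empty?
end

p reported?(false)
p reported?(true)
```

Without the ensure nothing is ever pushed, so a caller blocked on the queue for that thread would wait forever.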

--
Nobu Nakada

Thank you Nobu. The patch fixes the problem!
I hope it or something like it makes it into the
next "official" version of ruby.

Once again, excellent response from the
Ruby community.

In answer to your questions, I had the statement
  tgroup.threads.each {|t| print "#{t} #{t.status}\n" }
which printed out several thread handles with no status.

The next_wait does a threads.empty? call, which
returned false (i.e. the group wasn't empty), which
was confirmed by seeing the list of thread handles
in the print stmt. I did not check the wait_queue.

I had set the "abort_on_exception" variable to true
to catch any threads that died unexpectedly - the abort
never fired, so I think all the threads completed
normally.

Vance

Hi,

In message "Re: ThreadWait problem" on Thu, 25 Nov 2004 08:26:21 +0900, nobu.nokada@softhome.net writes:

> Yes, Thread#status returns nil if the thread died by an
> exception. Does the following patch help you?

Can you commit the patch?

              matz.