Justin Johnson wrote:
@pool << Thread.current
The above line should be within the synchronize (above), or it might
spawn more than @max_size threads (I think).
Replace with
  @pool_mutex.synchronize do
    while @pool.size >= @max_size
      print "Pool is full; waiting to run #{args.join(',')}...\n" if $DEBUG
      # Sleep until some other thread calls @pool_cv.signal.
      @pool_cv.wait(@pool_mutex)
    end
    @pool << Thread.current
  end
I tried your suggestion but still get the error, and only when using
backticks or any other method that reads from stdout or stderr.
  begin
    yield(*args)
  rescue => e
    exception(self, e, *args)
  ensure
    @pool_mutex.synchronize do
In reality you don't need to synchronize here, since checking the @pool
size and adding to the pool are synchronized elsewhere. Taking this
synchronize out may introduce new problems, but I'm not sure what they
would be.
Because you're removing (and only once, and from a hash), I believe it's
thread safe to remove at any time (though, in retrospect, it might cause
a teeny bit of redundancy, but no problems)--i.e. the synchronize here
could be taken out.
I'm not sure what you mean. This makes sense to me, as we do not want
more than one thread removing itself from the list at a time.
My question is if the wait within the shutdown function will 'suck'
signals away from processes, decreasing the number that can go at a
time, if all are at the 'wait' phase. Perhaps rethinking this code
would be nice
My hypothesis is that the shutdown function somehow or other messes up
the way the pool works. Maybe/maybe not.
Another problem with shutdown is that it seems like shutdown could be
called early enough to 'disallow' certain threads from entering the pool
(those that have not yet started to wait on the condition variable). So
that is another problem with it.
I'm not sure what you mean by "suck signals away from processes". The
shutdown method is just waiting until all threads have ended so we don't
end our program before the threads are done.
It still seems to me that stdout and stderr have something to do with my
problem, since it always occurs when using backticks or even
win32/popen3 methods that read from stdout and stderr, but never with
system. Anyone else have any ideas on this?
I was able to recreate the bug using system OR backticks, as a note. I
think that the difference is only in timing, not in functionality (one
takes longer, so aggravates the problem more, perhaps?)
In reality, though, I think this might (might) be a real bug in Ruby, or
maybe I misunderstand the functionality of signal.
My guess is still that shutdown is messing it up somehow
http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/818d88a5eae23820
discusses some type of IO binding problem (but not a concurrency one).
I do also notice some 'weird lines' like
  Job 184 stopped.♪◙File Not Found
Which might be indicative of the problem you described.
What I dislike about this bug is that sometimes it shows itself and
sometimes it doesn't, so it's hard to know if you've actually overcome
it or not!
Another thought is that it might just be Ruby's threads 'misassigning'
variables or whatnot.
Another idea would be to have every thread that adds itself to the pool
'signal' immediately after, to allow some other waiting thread to
'check' if the pool has decreased. That would be a kind of bandaid hack
but heck maybe it would work
In my opinion it needs a different condition variable like "pool empty"
which is signaled by each thread on exit, for the shutdown method to
wait on.
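
To make the "pool empty" idea concrete, here is a rough sketch of a pool
with two condition variables: one for "there is space now" and one for
"everything is done". SketchPool, @space_cv, @empty_cv and @pending are
names I made up, not part of the original ThreadPool class, and the
@pending counter is my own addition so that shutdown also waits for
threads that have not entered the pool yet. Treat it as an untested idea
for the real code, not a drop-in fix.

```ruby
require 'thread'

# Hypothetical sketch; the class and variable names are illustrative only.
class SketchPool
  def initialize(max_size)
    @max_size = max_size
    @pool     = []
    @mutex    = Mutex.new
    @space_cv = ConditionVariable.new # signaled when a slot frees up
    @empty_cv = ConditionVariable.new # signaled when no work is left
    @pending  = 0                     # dispatched but not yet finished
  end

  def dispatch(*args, &block)
    # Count the job before its thread starts, so shutdown cannot miss it.
    @mutex.synchronize { @pending += 1 }
    Thread.new do
      @mutex.synchronize do
        # Re-check the size after every wakeup.
        @space_cv.wait(@mutex) while @pool.size >= @max_size
        @pool << Thread.current
      end
      begin
        block.call(*args)
      ensure
        @mutex.synchronize do
          @pool.delete(Thread.current)
          @pending -= 1
          @space_cv.signal                   # wake one waiting worker
          @empty_cv.signal if @pending.zero? # wake shutdown only when done
        end
      end
    end
  end

  def shutdown
    @mutex.synchronize do
      @empty_cv.wait(@mutex) until @pending.zero?
    end
  end
end
```

Since shutdown waits on its own condition variable, a signal meant for a
waiting worker can no longer be consumed by shutdown instead.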
Another concern is that it seems to me that there may be a possibility
that shutdown will end too early, in extreme cases where two threads
remove themselves simultaneously from the pool. Not our problem, but
hey, it might be.
From the interleaved output it appears that Ruby does indeed have some
output problems. Not sure. Another opinion is that maybe it's Ruby's
deadlock detection running amok--you'll notice in the original post two
threads on line 28--these two are at a synchronize point. You'd think
that this could NOT deadlock, since those two should be able to always
continue, unless there's a loop of some sort within a synchronize.
You might try rewriting dispatch to be something like
  def dispatch(*args)
    Thread.new do
      # Wait for space in the pool.
      @pool_mutex.synchronize do
        if @pool.size > @max_size
          print "WHAT GREATER?"
        end
        if @pool.size == @max_size
          print "Pool is full; waiting to run #{args.join(',')}...\n"
          # Sleep until some other thread calls @pool_cv.signal.
          # Receiving a signal should ALWAYS mean there's space now
          # available.
          @pool_cv.wait(@pool_mutex)
        end
        # Add this thread whether or not it had to wait.
        @pool << Thread.current
        if @pool.size > @max_size
          print "huh? it has GOTTA be less than or equal here!"
        end
      end
      # (rest of the thread body, i.e. the begin/yield/ensure part, as before)
    end
  end
Then there are no loops so you ensure that, if there is a deadlock, and
some threads are waiting on a synchronize, then it's the deadlock
protection's fault [it thinks the mutex is deadlocked, but we don't
think it is]. This is very very possibly the problem.
As per the post
"Ditto. AFAIK all external IO is blocking in Ruby on Windows..."
If this is the case you might try using a synchronize around the system
command, itself, to avoid concurrency on the IO. Maybe all running
processes are 'blocked' on IO [though I've seen interleaved results on
the screen, so they don't seem to block on write] so deadlock thinks
they're frozen? (except---you have processes stuck on synchronize, and
the IO command runs from not within a synchronize block, so you wouldn't
think blocking IO would be a synchronize problem). Maybe it is. Sigh
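
A rough sketch of what I mean by putting a synchronize around the
command itself. COMMAND_MUTEX and run_serialized are names I made up,
and "echo" just stands in for whatever command you are really running.
Note that this serializes the commands, so it only makes sense as an
experiment to see whether the error goes away.

```ruby
# Hypothetical names: COMMAND_MUTEX and run_serialized are made up for
# this sketch.
COMMAND_MUTEX = Mutex.new

# Run a shell command with backticks while holding a global mutex, so
# only one thread at a time is reading a child process's output.
def run_serialized(command)
  COMMAND_MUTEX.synchronize { `#{command}` }
end

# Usage: four threads, but the commands themselves run one at a time.
threads = (1..4).map { |i| Thread.new { run_serialized("echo job #{i}") } }
outputs = threads.map { |t| t.value.strip }
```

If the error disappears with this in place, that would point at
concurrent IO rather than at the pool logic.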
I have written programs with hundreds of threads that output to the
screen and (sure maybe they block when the write happens, perhaps,
but...) never had a deadlock issue from it (well, then again I didn't
try to synchronize it, either).
If it is a Ruby IO problem, then maybe two of the processes are 'getting
stuck' in IO and never ending--you could put in some debug info to test
out that hypothesis. Might not be the real problem, though, seeing the
two threads on line 28.
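
For the debug info, something along these lines might do. run_with_trace
is a made-up helper name, and it assumes the job is a backtick command.
A job that prints "starting" but never "finished" would be one that is
stuck in IO.

```ruby
# Hypothetical helper: run_with_trace is an illustrative name.
# Logs a line before and after the command so stuck jobs stand out.
def run_with_trace(label, command, log = $stderr)
  started = Time.now
  # One write call per line, to reduce interleaving between threads.
  log.write("[#{label}] starting: #{command}\n")
  output = `#{command}`
  log.write("[#{label}] finished in #{(Time.now - started).round(3)}s\n")
  output
end
```

You would call it from inside the block you pass to dispatch, e.g.
run_with_trace("Job 184", some_command), in place of the bare backticks.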
It seems (in my opinion--haven't checked it out too closely yet) that
the problem is only caused when the shutdown method is 'in the mix'--you
might try rewriting it to create an array of all threads trying to enter
the pool [i.e. array of size 200] and then joining on each element of
the array
  while !allThreadArray.empty?
    nextThread = allThreadArray.shift
    nextThread.join
  end
that type of thing.
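
Spelled out a bit more, as a self-contained sketch: run_and_join_all is
a made-up name, the sleep stands in for the real work, and in the real
code the array would be filled with the threads that dispatch creates.

```ruby
require 'thread'

# Hypothetical sketch: start job_count threads, remember every one of
# them, then join each in turn instead of waiting on a condition variable.
def run_and_join_all(job_count)
  finished = Queue.new
  all_threads = Array.new(job_count) do |i|
    Thread.new(i) do |n|
      sleep 0.01 # stand-in for the real work
      finished << n
    end
  end

  until all_threads.empty?
    next_thread = all_threads.shift
    next_thread.join
  end

  finished.size # number of jobs that actually completed
end
```

Because join blocks until that particular thread has ended, this cannot
return early and it never touches @pool_cv.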
For my runs I used windows. Haven't tried it in Linux at all.
In short I don't know. Lots of hypotheses
Good luck!
-Roger
--
Posted via http://www.ruby-forum.com/.