Hi everyone,
I'd recently run into some trouble trying to use timeout
with ConditionVariable#wait. (Then I tried some custom
solutions, which turned out to be unsafe themselves and
led me down that "unexpected deadlock" wild goose chase.)
I'd like to propose making ConditionVariable#wait itself
exception safe, so that using it with timeout should be
as simple as:
begin
timeout(timeout_secs) { condition_var.wait(mutex) }
rescue Timeout::Error
end
This is the original #wait from thread.rb (1.8.2preview2)
def wait(mutex)
begin
mutex.exclusive_unlock do
@waiters.push(Thread.current)
Thread.stop
end
ensure
mutex.lock
end
end
It has two problems that I know of. The most serious
is that I believe an exception can occur between the
"begin" and the "mutex.exclusive_unlock" line, such
that, when the "ensure" tries to reacquire the lock,
the mutex had not been unlocked yet, and the thread
hangs trying to get a lock on a mutex it already owns.
(I can get it to hang at the mutex.lock, and I presume
that's the reason.)
The second is that Thread.current is not removed from
@waiters when an exception occurs. This became a
problem in conjunction with timeout, because the
@waiters array was filling up with hundreds of
Thread.current references over time.
My attempt to address these two issues looks like:
def wait(mutex)
unlocked = false
begin
mutex.exclusive_unlock do
unlocked = true
@waiters.push(Thread.current)
Thread.stop
end
rescue # Timeout::Error
@waiters.delete Thread.current # Q: is Array#delete an atomic operation?
raise
ensure
mutex.lock if unlocked
end
end
Does this seem reasonable? Have I made any obvious errors?
Is this something that would best be proposed as an RCR?
I was wondering whether Array#delete is an atomic operation?
I would not want competing threads to both be trying to
delete from the same array simultaneously, if that's not safe.
(Do I need to put Thread.exclusive around it?)
Also - Have I done that rescue/raise properly, to re-raise
the exception? I'm asking because in testing with this
modified #wait, I have to hit ^C many times to interrupt
my process (both win32 and linux). It's as though the
Interrupt exception is being gobbled, which is not what I
would want.
... Hmm... I see that Timeout::Error's ancestors
include Interrupt, SignalException ... I am intentionally
catching Timeout::Error when i call #wait, like:
begin
timeout(timeout_secs) { condition_var.wait(mutex) }
rescue Timeout::Error
end
...but this shouldn't catch Interrupt exceptions
should it?
Anyway any comments/criticism/suggestions/feedback would be
most welcome . . . Is RCR the correct process for a proposal
like this?
Thanks,
Regards,
Bill