Thread#raise, Thread#kill, and timeout.rb are unsafe

I wrote up an article on Thread#raise, Thread#kill, and timeout.rb that I hope can start making the rounds. Long story short, neither Thread#raise nor Thread#kill is safe to use, and as a result all libraries that call them are also unsafe (including timeout.rb, net/*, and many other libraries in the wider world).

Have a look, add comments, discuss. Hopefully we can find a way to fix this, because it's going to hold Ruby back until we do.

- Charlie

Hi,

I wrote up an article on Thread#raise, Thread#kill, and timeout.rb that
I hope can start making the rounds. Long story short, neither
Thread#raise nor Thread#kill is safe to use, and as a result all
libraries that call them are also unsafe (including timeout.rb, net/*,
and many other libraries in the wider world).

Unlike Java, Thread#raise etc. should be treated similar to
KeyboardInterrupt in Ruby. No realtime exception posting is expected.
If you handle KeyboardInterrupt safely, do same thing for Thread#raise
etc., e.g. just turning on flags to reserve exception, then raise
exception at the safe place, as MRI does. Nothing more is required.

I admit that timeout.rb is inefficient, so that it should not be used
when performance or completeness matters. I agree with removing use
of timeout from net/* libraries. It's just for casual use. Maybe
it's good to add a note in CAPITAL LETTERS in the document of
timeout.rb.

              matz.

···

In message "Re: Thread#raise, Thread#kill, and timeout.rb are unsafe" on Mon, 25 Feb 2008 16:18:01 +0900, Charles Oliver Nutter <charles.nutter@sun.com> writes:

There is a typo: your second example misses the "main." before the raise. :slight_smile:

I have to think a bit more about this, but one remark: IMHO the
locking should be outside the begin end block. Reason is, that if
locking fails for a reason you would not want / need to unlock.

At the moment I'm pondering implications of these options:

main = Thread.current
timer = Thread.new { sleep 5; main.raise }
lock(some_resource)
begin
  do_some_work
ensure
  unlock_some_resource
  timer.kill # moved downwards
end

main = Thread.current
timer = Thread.new { sleep 5; main.raise }
begin
  lock(some_resource)
  begin
    do_some_work
  ensure
    unlock_some_resource
  end
ensure
  timer.kill # separate ensure
end

The last one has the added benefit that you can implement it in a
method with a block because timer code is not interleaved with work
code.

But both have the disadvantage that you risk getting a timeout
exception during unlock - which would be unfortunate. OTOH, killing
the timer before the lock release can violate the timing constraint if
the unlocking takes longer - not very likely.

Nasty issues; we had some similar issues with changed timeout handling
of a TX manager in JBoss. Basically the old TX manager was able to
bust an otherwise perfect transaction by timing out in the 2PC
process. :-))

Kind regards

robert

···

2008/2/25, Charles Oliver Nutter <charles.nutter@sun.com>:

I wrote up an article on Thread#raise, Thread#kill, and timeout.rb that
I hope can start making the rounds. Long story short, neither
Thread#raise nor Thread#kill is safe to use, and as a result all
libraries that call them are also unsafe (including timeout.rb, net/*,
and many other libraries in the wider world).

Have a look, add comments, discuss. Hopefully we can find a way to fix
this, because it's going to hold Ruby back until we do.

Headius: Ruby's Thread#raise, Thread#kill, timeout.rb, and net/protocol.rb libraries are broken

--
use.inject do |as, often| as.you_can - without end

Charles Oliver Nutter wrote:

I wrote up an article on Thread#raise, Thread#kill, and timeout.rb that
I hope can start making the rounds. Long story short, neither
Thread#raise nor Thread#kill is safe to use, and as a result all
libraries that call them are also unsafe (including timeout.rb, net/*,
and many other libraries in the wider world).

Have a look, add comments, discuss. Hopefully we can find a way to fix
this, because it's going to hold Ruby back until we do.

Headius: Ruby's Thread#raise, Thread#kill, timeout.rb, and net/protocol.rb libraries are broken

- Charlie

It was the problems with timeout that make me wish we had an
EnsureCritical (an Ensure that gets Thread.critical=true set to it
before entering it), so that timeouts can end properly.
Just my $.02

···

--
Posted via http://www.ruby-forum.com/\.

I'm not convinced KeyboardInterrupt can be handled safely, either, in a
script. What happens if a KeyboardInterrupt exception is raised inside
an ensure block? It could easily result in resource leaks.

I think the right way to handle signals and timeouts is either in an
event loop or a dedicated thread. Asynchronous exceptions are
convenient but error-prone.

Paul

···

On Fri, Mar 14, 2008 at 01:02:52AM +0900, Yukihiro Matsumoto wrote:

Unlike Java, Thread#raise etc. should be treated similar to
KeyboardInterrupt in Ruby. No realtime exception posting is expected.
If you handle KeyboardInterrupt safely, do same thing for Thread#raise
etc., e.g. just turning on flags to reserve exception, then raise
exception at the safe place, as MRI does. Nothing more is required.

Anything using Thread.critical has its own problems, particularly
when native threads are involved. It can easily result in deadlocks
if the running thread inadvertently tries to use something which another
stopped thread is using (imagine a thread stopped halfway through
releasing a lock that the ensure block needs to acquire, for example
-- something which would have worked fine with normal use of locks, but
becomes a problem once threads can "stop the world").

-mental

···

On Sat, 15 Mar 2008 01:00:49 +0900, Roger Pack <rogerpack2005@gmail.com> wrote:

It was the problems with timeout that make me wish we had an
EnsureCritical (an Ensure that gets Thread.critical=true set to it
before entering it), so that timeouts can end properly.

Hi,

···

In message "Re: Thread#raise, Thread#kill, and timeout.rb are unsafe" on Fri, 14 Mar 2008 06:00:54 +0900, Paul Brannan <pbrannan@atdesk.com> writes:

I'm not convinced KeyboardInterrupt can be handled safely, either, in a
script. What happens if a KeyboardInterrupt exception is raised inside
an ensure block? It could easily result in resource leaks.

You are right, but keyboard interrupt has been there so long in the
history, and we hardly see such leak problems, so I assume it's safer
than you might think. At least, we cannot not removed the interrupts
from the language.

              matz.

MenTaLguY wrote:

···

On Sat, 15 Mar 2008 01:00:49 +0900, Roger Pack <rogerpack2005@gmail.com> wrote:

It was the problems with timeout that make me wish we had an
EnsureCritical (an Ensure that gets Thread.critical=true set to it
before entering it), so that timeouts can end properly.

Anything using Thread.critical has its own problems, particularly
when native threads are involved. It can easily result in deadlocks
if the running thread inadvertently tries to use something which another
stopped thread is using (imagine a thread stopped halfway through
releasing a lock that the ensure block needs to acquire, for example
-- something which would have worked fine with normal use of locks, but
becomes a problem once threads can "stop the world").

And to clarify, this problem isn't restricted to native threads...even a green-threaded model could deadlock if one thread goes critical while other threads are holding unrelated locks.

- Charlie

For simple scripts, KeyboardInterrupt is probably the right behavior.

For applications that use an event loop, we can use trap('INT') to
disable KeyboardInterrupt.

I wonder if there should be an equivalent trap mechanism for other
asynchronous exceptions?

Paul

···

On Fri, Mar 14, 2008 at 07:43:28AM +0900, Yukihiro Matsumoto wrote:

Hi,

In message "Re: Thread#raise, Thread#kill, and timeout.rb are unsafe" > on Fri, 14 Mar 2008 06:00:54 +0900, Paul Brannan <pbrannan@atdesk.com> writes:

>I'm not convinced KeyboardInterrupt can be handled safely, either, in a
>script. What happens if a KeyboardInterrupt exception is raised inside
>an ensure block? It could easily result in resource leaks.

You are right, but keyboard interrupt has been there so long in the
history, and we hardly see such leak problems, so I assume it's safer
than you might think. At least, we cannot not removed the interrupts
from the language.

Hi,

···

In message "Re: Thread#raise, Thread#kill, and timeout.rb are unsafe" on Fri, 14 Mar 2008 22:32:33 +0900, Paul Brannan <pbrannan@atdesk.com> writes:

I wonder if there should be an equivalent trap mechanism for other
asynchronous exceptions?

Probably. Does anyone have any idea?

              matz.

In article <E1JaB7P-0000Zc-QP@x61.netlab.jp>,
  Yukihiro Matsumoto <matz@ruby-lang.org> writes:

Probably. Does anyone have any idea?

The problem is, where is safe point.

You said "just turning on flags to reserve exception, then
raise exception at the safe place, as MRI does."

Your "safe" is for interpreter. Ruby shouldn't SEGV, etc.

But Charles's "safe" is for application. acquired lock
should be released later, etc.

Your "safe" is not enough for application.

We need a mechanism to delay asynchronous exceptions until a
safe point.

trap is bad way to do it. trap is similar to Unix signal
handler. Unix signal handler makes it difficult to avoid
race conditions around blocking operations such as I/O.
With trap, applications need to re-implement the race
condition handling in Ruby level. It is very difficult if
it is possible.

It is preferable to have a way to declare safe points
directly.

The safe points vary according to applications. Even in a
application, different kinds of exceptions may have
different safe points.

For example, there are applications KeyboardInterrupt is
safe as you said. I think a filter, such as grep, is a kind
of the applications. It means that any points in the
applications are safe points.

Another example: movemail is a program to move mbox in a
mail spool. It needs to lock a mbox. Assume we implement
movemail in Ruby and the lock scheme is file lock. The lock
file is removed in ensure clause. But asynchronous
exception in the ensure block may cause to fail removing the
lock file. So the ensure block is not safe points.

Yet another example: Apache can be stopped with apachectl
"stop" and "graceful-stop". "stop" aborts open connections
but "graceful-stop" doesn't. Assume we implement a
threading http server with keep-alive and similar stopping
feature in Ruby. "stop" can be implemented by killing
threads for each connections by (dangerous) Thread.kill. It
means that any points in the threads are safe points.
"graceful-stop" need to wait the threads until they waits
requests. Since keep-alive is assumed, the threads
doesn't terminate after the first request on a connection.
This means the safe points are the request waiting points of
the threads.

I think the safe points declaration mechanism should able to
define safe points for each exception and easy to handle
blocking operations without race conditions.

···

--
Tanaka Akira

The simplest way to "tame" asynchronous exceptions would be to enqueue such
exceptions until the recieving thread makes a blocking call, or else it calls
a special method like Thread.check_interrupts.

The main difficulty with the idea is defining clearly what a "blocking call"
is. There probably also needs to be some mechanism (which does not "stop
the world" like Thread.critical) to mask asynchronous exceptions even during
blocking calls, for particularly critical code.

-mental

···

On Fri, 14 Mar 2008 23:42:37 +0900, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:

Hi,

In message "Re: Thread#raise, Thread#kill, and timeout.rb are unsafe" > on Fri, 14 Mar 2008 22:32:33 +0900, Paul Brannan <pbrannan@atdesk.com> > writes:

>I wonder if there should be an equivalent trap mechanism for other
>asynchronous exceptions?

Probably. Does anyone have any idea?

Thread.current.raises do |exception|
   queue.push exception
end

....

even_loop do
   stuff
   if exception = queue.pop
     ...
   end
end

seems like it would address a ton of issued without overly complicating ruby's internals - just provide an 'on raise' handler.

cheers.

a @ http://codeforpeople.com/

···

On Mar 14, 2008, at 8:42 AM, Yukihiro Matsumoto wrote:

Probably. Does anyone have any idea?

--
share your knowledge. it's a way to achieve immortality.
h.h. the 14th dalai lama

Hi,

But Charles's "safe" is for application. acquired lock
should be released later, etc.

I know.

Your "safe" is not enough for application.

We need a mechanism to delay asynchronous exceptions until a
safe point.

Hmm.

I think the safe points declaration mechanism should able to
define safe points for each exception and easy to handle
blocking operations without race conditions.

What should safe point declaration mechanism look like?

              matz.

···

In message "Re: Thread#raise, Thread#kill, and timeout.rb are unsafe" on Sat, 15 Mar 2008 02:48:33 +0900, Tanaka Akira <akr@fsij.org> writes:

ara howard wrote:

Probably. Does anyone have any idea?

Thread.current.raises do |exception|
  queue.push exception
end

....

even_loop do
  stuff
  if exception = queue.pop
    ...
  end
end

seems like it would address a ton of issued without overly complicating ruby's internals - just provide an 'on raise' handler.

Is it possible to integrate this with IO#select, so a thread can handle an exception sent to it while waiting on io?

···

On Mar 14, 2008, at 8:42 AM, Yukihiro Matsumoto wrote:

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

How do you plan on protecting against non-determinism introduced
by the "on raise" handler? For example, if queue were a
user-implemented data structure, and the handler fired in the middle
of queue.pop? If queue.pop is not protected by a lock, there is
potential to corrupt the queue object. If it is protected by a lock,
there is potential for deadlock.

(This is basically the same issue as asynchronous signal safety.)

I think the only way around this is for the receiving thread to
explicitly choose moments when it wants to receive any pending
asynchronous exceptions. Can you think of another way?

-mental

···

On Wed, 19 Mar 2008 00:20:54 +0900, ara howard <ara.t.howard@gmail.com> wrote:

Probably. Does anyone have any idea?

Thread.current.raises do |exception|
   queue.push exception
end

....

even_loop do
   stuff
   if exception = queue.pop
     ...
   end
end

seems like it would address a ton of issued without overly
complicating ruby's internals - just provide an 'on raise' handler.

Thread.current.raises do |exception|
   queue.push exception
end

....

event_loop do
   long_running_stuff
   if exception = queue.pop
     ...
   end
end

seems like it would address a ton of issued without overly
complicating ruby's internals - just provide an 'on raise' handler.

This looks nice!

I wish that a handler like this could be "unhandled," as well, as you
have the flexibility to handle exceptions only in parts of the code.
Say we call your first block of code as queue_future_exceptions

Then you could have code like

Thread.current.queue_future_exceptions
  # code in here is uninterruptible
Thread.current.no_longer_queue_future_exceptions # so I don't have to
worry about them EVERYWHERE in my code--only at the end of certain
blocks.
Thread.current.raises do |exception| ; end

So basically it would allow for uninterruptible blocks. That's what I
wish we had.

In response to Mental's questions of concurrency problems for raises
injected whilst you are in the middle of handling raises, some options
that might help:

1) protect the queue with mutexes--add them to the end of the queue
atomically.
2) dare I say it, require the exception handling to be either within or
without of an "uninterruptible block" --that is, allow the exception
handling to be interrupted (itself), or prevent it from being
interrupted, always, to provide for sanity.

The first option (require handling outside of an uninterruptible block)
would allow for us to "not have to worry" about concurrency of adding to
the exception queue, since we no longer add to the queue but just raise
on the thread. It would allow it to be interrupted, which might be
annoying, however, and isn't our purpose to "avoid" those pesky
asynchronous interrupts?

The second option (requiring handling within an uninterruptible block)
avoids handling's being interrupted, however, it could "miss" an
interrupt or two, should they be raised between when we handle them and
when an "uninterruptible block" ends.

So I guess the best way to deal with it would be to surround queue
addition with mutexes, and handle them both right before an
uninterruptible block ends, and again right after it ends.

Ok so not such a great idea.
Thoughts?

···

--
Posted via http://www.ruby-forum.com/\.

What should safe point declaration mechanism look like?

              matz.

Tongue in cheek I would say something like

uninterruptible do

end

or

ensureUninterruptible

end

or

uninterruptible do
  # some stuff
handle_exceptions_that_have_been_thrown
end
# where handle_exceptions is where they can all get handled.
Kind of scary still, though.

:slight_smile:

Anyway my latest thought on the whole timeout difficulty is that there
are a few difficulties with it that might be overcomeable:

Unfortunately, as a previous poster noted, using raise seems itself
almost always inherently dangerous, so these would probably all be
band-aids to a bigger problem.

1) if you have two nested timeouts then "you don't know where the
timeout came from"
timeout 5 do
timeout 5 do
begin
sleep
rescue Exception
  # which one is it? What did I just rescue?
end
end
end

You could overcome this by just ensuring that it throws distinct classes
per timeout (i.e. if different timeouts).
timeout 5 do # will by default throw class Timeout::ErrorDepth1 <
Timeout::Error
timeout 5 do # will by default throw class Timeout::ErrorDepth2 <
Timeout::Error

I suppose this would help people decipher what is happening when
timeouts are thrown, though not be helpful otherwise.

1.5) It creates a new thread once per call to 'timeout' which is less
than efficient.

While I typically don't care about efficiency when using timeout, it
might be possible to just have a "single timer thread" that handles all
the raise'ings on the others, to save on thread creation
(theoretically).

So something like this

(sorry if the syntax is wrong--this is just pseudo-code for now)

$timer_thread = nil # just so I remember the name--you wouldn't have to
use a global, of course

class TimerThread < Thread
def handle_any_requests
   # if any thread requests a timeout, add it to some list
   if some_thread_is_waiting
     sleep_until_should_wake_next_thread
     if waiting_thread_is_still_running_its_timeout_block
        waiting_thread.raise Timeout::Error
     end
   else
    sleep -1
  end
end

def TimerThread.add_me_to_the_queue_to_interrupt_me_in seconds, &block
    $timer_thread.add_it_to_queue Thread.current, seconds
    $timer_thread.wakeup # might need to synchronize this call so that
we don't get confusion
    block.call # if we get interrupted between this line and the next we
are DEAD. Exactly the same problem that we had before--an exception
could be raised "far in the future" to interrupt it, when it has already
left and it shouldn't be raised.
    $timer_thread.take_me_off_the_queue_I_am_done Thread.current # might
need to synchronize this call, too
  end

end

# begin program. Note that you could just fire this up the first time
the script calls Timeout::timeout
$timer_thread = Thread.new {
  loop do
     handle_any_requests
  end
}

Now when you want timeouts you do something like
TimerThread.add_me_to_the_queue_to_interrupt_me_in 5 do # seconds

  # do some stuff for a long time

end

That would help with efficiency, since you wouldn't have to spawn a new
thread for each call (though it would, sigh, leave one running always in
the background). It would alleviate some concurrency problems, and
still leave at least one in there. Just throwing it out there.

2) nested timeouts can interrupt each other and leave and 'errant
thread' running which raises a

timeout 5 do
timeout 50 do
begin
sleep
end
end

This causes two threads to be generated, each with a "time out bomb"
ticking, waiting to happen. The outer loops' thread could (as Charles
Nutter noted) interrupt the inner thread during its ensure block, so it
ends up not killing its ticking "time out bomb" thread. Which means
that 45 seconds later it will go off and randomly raise on the parent
thread.

To avoid this I "think" you could wrap the things with mutexes.
However, this also doesn't work because the mutex "synchronize" stuff
itself uses an ensure block, which could be interrupted by the same
confusion (and does, in my experience).

<<Sigh>>

Never mind. I give up. The only safe way to generate asynchronous
calls seems to be to use EventMachine or rev (which have asynchronous
timers) and have your own "exception handling" blocks (i.e. blocks where
it is designated to handle those exceptions).

Thanks.

-R

···

--
Posted via http://www.ruby-forum.com/\.

In article <E1JaM9H-0000jj-AB@x61.netlab.jp>,
  Yukihiro Matsumoto <matz@ruby-lang.org> writes:

What should safe point declaration mechanism look like?

Basically, asynchronous events should be queued.

* make a queue for each thread.
* Thread#raise(exc) enqueue exc to the queue.
* Thread.check_interrupt(klass) checks an exception which is
  a kind of klass in the queue of the current thread. If it
  exists, the exception is raised.
* other queue examining mechanism, such as sigpending, etc,
  may be useful.

Thread.check_interrupt is a safe point.

However safe points is not only Thread.check_interrupt.
Blocking operations, such as I/O, may or may not be safe
points. It is because Thread.check_interrupt with blocking
operations causes race conditions. So application must
choose that make a blocking operation interruption safe or
uninterruptible.

* Thread.blocking_interruptible(klass) { ... }
* Thread.blocking_uninterruptible(klass) { ... }

Blocking operations in Thread.blocking_interruptible are
safe points.

If a thread has no resources to release, it is safe to kill
asynchronously. Other than that, safe points should be
declared explicitly. When a resource is acquired,
application should declare it.

* Thread.delay_interrupt(klass) { ... }

It is safe points that out of outermost
Thread.delay_interrupt.

Another idea is about ensure clause. Since an ensure
clause is used to release a resource, it may delay
asynchronous events as in Thread.delay_interrupt.

···

--
Tanaka Akira

perhaps i wasn't being clear. that's exactly what i was showing - a handler that gets called (critical section of course) which simply enques the exception. then the thread itself must check for these at the 'right' moment.

a @ http://codeforpeople.com/

···

On Mar 18, 2008, at 11:52 AM, MenTaLguY wrote:

I think the only way around this is for the receiving thread to
explicitly choose moments when it wants to receive any pending
asynchronous exceptions. Can you think of another way?

--
it is not enough to be compassionate. you must act.
h.h. the 14th dalai lama