Thread#raise, Thread#kill, and timeout.rb are unsafe

In response to Mental's questions about concurrency problems for raises
injected while you are in the middle of handling raises, here are some
options that might help:

1) protect the queue with mutexes--add them to the end of the queue
atomically.

This would deadlock if the handler happened to fire while the queue's
lock was held. Generally, locking doesn't mix with anything that can
interrupt or suspend the execution of a thread at an arbitrary time.
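This hazard can be sketched in a few lines of Ruby (illustrative only; Ruby's Mutex is not re-entrant, so a handler that re-enters the locked enqueue path on the same thread cannot acquire the lock -- Ruby raises ThreadError rather than silently hanging, but either way the handler cannot proceed):

```ruby
queue = []
lock  = Mutex.new

# The enqueue path from option 1: protect the queue with a mutex.
enqueue = lambda { |exc| lock.synchronize { queue << exc } }

# Simulate an asynchronous handler firing while the lock is held:
# the nested acquisition on the same thread fails.
reentry_error = nil
lock.synchronize do
  begin
    enqueue.call(RuntimeError.new("injected mid-critical-section"))
  rescue ThreadError => e
    reentry_error = e   # recursive locking is rejected
  end
end

enqueue.call(RuntimeError.new("lock not held"))  # the normal case works
```

The normal-case enqueue succeeds only because no lock is held at that moment; the whole point of the deadlock objection is that an asynchronous handler gets no such guarantee.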

2) dare I say it, require the exception handling to be either within or
outside of an "uninterruptible block" -- that is, either always allow the
exception handling itself to be interrupted, or always prevent it from
being interrupted, to provide for sanity.

That's a good point -- besides conflicts between the handler and whatever
the recipient thread is running at the time, you do also need to be
concerned with conflicts between multiple handler instances (if that
is allowed).

So I guess the best way to deal with it would be to surround queue
addition with mutexes, and handle them both right before an
uninterruptible block ends, and again right after it ends.

Ok so not such a great idea.

In terms of an API for signal queuing, I think Tanaka Akira's idea
is elegant and reliable.

-mental

···

On Wed, 19 Mar 2008 03:16:05 +0900, Roger Pack <rogerpack2005@gmail.com> wrote:

In a language with side-effects, I've become convinced that the only safe
way to handle asynchronous exceptions is for code to be uninterruptible by
default, and only explicitly allow interrupts at specific points (not broad
regions of code).

This gets back to the fundamental principle that, in order to write a
correct multithreaded program, any interaction between threads needs to be
by explicit mutual agreement at specific places. One thread needs to
explicitly initiate the interaction (in this case, by calling Thread#raise),
and the other thread needs to explicitly accept it (e.g. by calling a method
which is known to permit the delivery of any pending interrupts when it is
called). Otherwise, the resulting nondeterminism is impossible to control.
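The "explicit agreement" pattern can be sketched in Ruby (a hypothetical illustration, not a proposed API: the class and method names here are invented):

```ruby
# One thread explicitly initiates (#interrupt); the target thread
# explicitly accepts (#checkpoint) at moments of its own choosing.
class InterruptibleWorker
  def initialize
    @pending = Queue.new   # thread-safe queue of pending exceptions
  end

  # Explicit initiation side: called from other threads.
  def interrupt(exc)
    @pending << exc
  end

  # Explicit acceptance side: called by the worker thread itself at
  # points where being interrupted is known to be safe.
  def checkpoint
    raise @pending.pop unless @pending.empty?
  end
end

worker = InterruptibleWorker.new
worker.interrupt(ArgumentError.new("please stop"))
# ...the exception only fires when the worker calls checkpoint itself.
```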

-mental

···

On Tue, 18 Mar 2008 00:06:47 +0900, Roger Pack <rogerpack2005@gmail.com> wrote:

Tongue in cheek I would say something like

uninterruptible do

end

or

ensureUninterruptible

end

or

uninterruptible do
  # some stuff
  handle_exceptions_that_have_been_thrown
end
# where handle_exceptions is where they can all get handled.
Kind of scary still, though.

Perl went kind-of along these lines around version 5.8, if I recall correctly. They added "safe signals" which were signals that only interrupted at times when it was safe for them to do so.

This did have a few weird side-effects though. For example, running a complex regex would shut off signal handling for the duration. That meant you could no longer timeout a regex.

Perhaps they've fixed these issues now, but I remember the change was not as smooth as they had hoped it would be.

James Edward Gray II

···

On Mar 18, 2008, at 12:04 AM, Tanaka Akira wrote:

In article <E1JaM9H-0000jj-AB@x61.netlab.jp>,
Yukihiro Matsumoto <matz@ruby-lang.org> writes:

What should safe point declaration mechanism look like?

Basically, asynchronous events should be queued.

* make a queue for each thread.
* Thread#raise(exc) enqueues exc to the queue.
* Thread.check_interrupt(klass) checks for an exception which is
  a kind of klass in the queue of the current thread. If one
  exists, the exception is raised.
* other queue-examining mechanisms, such as sigpending, etc.,
  may be useful.

This sounds very good! I hadn't considered anything like
blocking_interruptible, but it seems useful.

-mental

···

On Tue, 18 Mar 2008 14:04:40 +0900, Tanaka Akira <akr@fsij.org> wrote:

Basically, asynchronous events should be queued.

* make a queue for each thread.
* Thread#raise(exc) enqueues exc to the queue.
* Thread.check_interrupt(klass) checks for an exception which is
  a kind of klass in the queue of the current thread. If one
  exists, the exception is raised.
* other queue-examining mechanisms, such as sigpending, etc.,
  may be useful.

Thread.check_interrupt is a safe point.

However, Thread.check_interrupt is not the only kind of safe
point. Blocking operations, such as I/O, may or may not be safe
points, because Thread.check_interrupt combined with blocking
operations causes race conditions. So the application must
choose whether to make a blocking operation interruptible or
uninterruptible.

* Thread.blocking_interruptible(klass) { ... }
* Thread.blocking_uninterruptible(klass) { ... }

Blocking operations in Thread.blocking_interruptible are
safe points.

Okay. I'd thought you were suggesting an API rather than illustrating
an idea.

-mental

···

On Wed, 19 Mar 2008 04:13:19 +0900, ara howard <ara.t.howard@gmail.com> wrote:

On Mar 18, 2008, at 11:52 AM, MenTaLguY wrote:

I think the only way around this is for the receiving thread to
explicitly choose moments when it wants to receive any pending
asynchronous exceptions. Can you think of another way?

perhaps i wasn't being clear. that's exactly what i was showing - a
handler that gets called (critical section of course) which simply
enques the exception. then the thread itself must check for these at
the 'right' moment.

It is because Thread.check_interrupt with blocking operations causes
race conditions.

Raising an exception during a blocking write operation makes it
impossible to know how much data was written.

Similarly, raising an exception during a blocking read operation makes
it impossible to access the data that was read.

The same is true for any operation for which partial success is
possible.
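The write case can be seen concretely with nonblocking I/O, where the partial count is at least returned instead of lost (a sketch; the pipe buffer's capacity is platform-dependent, but it is always far smaller than the payload here):

```ruby
# A blocking write interrupted by an exception loses the byte count.
# write_nonblock makes the partial result explicit: it returns how many
# bytes actually fit (here, the kernel pipe buffer's capacity).
r, w = IO.pipe
data = "x" * 1_000_000          # far more than any default pipe buffer
begin
  written = w.write_nonblock(data)
rescue IO::WaitWritable
  written = 0                   # pipe buffer was already full
end
r.close; w.close
```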

So the application must choose whether to make a blocking operation
interruptible or uninterruptible.

* Thread.blocking_interruptible(klass) { ... }
* Thread.blocking_uninterruptible(klass) { ... }

Interesting. Which of these two is the default?

Another idea concerns the ensure clause. Since an ensure clause is used
to release a resource, it may delay asynchronous events, as in
Thread.delay_interrupt.

Reminds me of an old C++ interview question we used to joke about:

"While digging through source code, you discover an application which
executes entirely in the body of the destructor of a global object.
Give one or more reasons why the author may have chosen to implement the
application this way."

Paul

···

On Tue, Mar 18, 2008 at 02:04:40PM +0900, Tanaka Akira wrote:

I still like Ara's idea :)

However--is something like this already possible if we redefine the
Thread#raise method? For example, define it to be something like this:

class Thread
  attr_accessor :currently_non_interruptible_section

  def queued_interrupts
    @queued_interrupts ||= []
  end

  def raise_mutex
    @raise_mutex ||= Mutex.new
  end

  def raise_safe(*args)
    # sorry, couldn't think of a better way than a mutex to prohibit
    # weirdness from occurring. So that's where a problem would occur--
    # in the ensure block of the synchronize call. Ahh well. If you
    # re-aliased raise to be this method, maybe avoided.
    raise_mutex.synchronize {
      unless @currently_non_interruptible_section
        self.raise(*args)          # the real Thread#raise, to the target
      else
        queued_interrupts << args  # defer until interruptible again
      end
    }
  end

  def uninterruptible_block
    # already within one? Maybe should keep a count of how deep you are.
    return yield if @currently_non_interruptible_section
    @currently_non_interruptible_section = true
    begin
      yield # might want to handle queued exceptions at the end of this block
    ensure
      raise_mutex.synchronize { @currently_non_interruptible_section = false }
    end
    # after this point the thread is subject to interruptions again
  end
end

Then you could have a thread do something like:

uninterruptible_block do
  # some stuff
  # handle the queued interrupts if desired. Note that you could loop
  # and 'check if you've been interrupted yet' once per loop or what
  # not, if desired
end

# handle queued interrupts, if desired

timeout would possibly become:

def timeout(sec, exception = Error)
  return yield if sec == nil or sec.zero?
  raise ThreadError, "timeout within critical session" if Thread.critical
  # This check is necessary since we do use the 'real' raise this time
  # to interrupt our block's yielding, so if you ever nested these, the
  # raise of an uppermost one could interrupt a lower one in its ensure.
  # Same problem would occur. We can only have one at a time. This is a
  # hack that assumes we 'only' use uninterruptible_block for calls to
  # timeout.
  raise ThreadError, "called nested functions that want to use " \
    "uninterruptible and raise on it" if
      Thread.current.currently_non_interruptible_section

  death_mutex = Mutex.new

  uninterruptible_block do
    begin
      x = Thread.current
      y = Thread.start {
        sleep sec
        death_mutex.synchronize {
          # this is not raise_safe, but the real raise. Still unsafe as
          # per some previous comments...but is it safer than it used to
          # be now?
          x.raise exception, "execution expired" if x.alive?
        }
      }
      yield sec # would this code be hampered somehow?
    ensure
      death_mutex.synchronize {
        y.kill if y and y.alive?
      }
    end
  end
  # or something like that:
  for interrupt in Thread.current.queued_interrupts do raise(*interrupt) end
end

now you could "replace" the use of raise with raise_safe. Would this
work?
This would prohibit 'other threads' from interrupting a thread that is
within a timeout block (obviously bad) until it timed out, but would it
be thread safe?

Thoughts?
Thank you for reading :)

Overall I'd say I'm in agreement with the OP's call to deprecate
Thread#kill, Thread#raise, Thread#critical= as they can too easily hurt,
though something like this might help. This suggestion also seems far
less complete than Tanaka's somehow.

Still wishing for an ensure block that was somehow 'guaranteed' to run
all the way through, and delay interruptions or ignore them till it
ended.
Thanks.

-R

Tanaka Akira wrote:

···

In article <E1JaM9H-0000jj-AB@x61.netlab.jp>,
  Yukihiro Matsumoto <matz@ruby-lang.org> writes:

What should safe point declaration mechanism look like?

Basically, asynchronous events should be queued.

* make a queue for each thread.
* Thread#raise(exc) enqueues exc to the queue.
* Thread.check_interrupt(klass) checks for an exception which is
  a kind of klass in the queue of the current thread. If one
  exists, the exception is raised.
* other queue-examining mechanisms, such as sigpending, etc.,
  may be useful.

Thread.check_interrupt is a safe point.

However, Thread.check_interrupt is not the only kind of safe
point. Blocking operations, such as I/O, may or may not be safe
points, because Thread.check_interrupt combined with blocking
operations causes race conditions. So the application must
choose whether to make a blocking operation interruptible or
uninterruptible.

* Thread.blocking_interruptible(klass) { ... }
* Thread.blocking_uninterruptible(klass) { ... }

Blocking operations in Thread.blocking_interruptible are
safe points.

If a thread has no resources to release, it is safe to kill
asynchronously. Other than that, safe points should be
declared explicitly. When a resource is acquired,
the application should declare it.

* Thread.delay_interrupt(klass) { ... }

Safe points are those outside of the outermost
Thread.delay_interrupt.

Another idea concerns the ensure
clause. Since an ensure clause is used to release a
resource, it may delay asynchronous events, as in
Thread.delay_interrupt.
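As a rough Ruby sketch of the queueing idea (the class and method names below are illustrative stand-ins, not the proposed core API):

```ruby
# Per-thread mailbox: an enqueueing "raise" plus check_interrupt as the
# only point where pending exceptions actually fire.
class SafePointMailbox
  def initialize
    @queue = []
    @lock  = Mutex.new
  end

  # Thread#raise analogue: enqueue instead of interrupting immediately.
  def async_raise(exc)
    @lock.synchronize { @queue << exc }
  end

  # Thread.check_interrupt(klass) analogue: a declared safe point.
  # Raises the first pending exception that is a kind of klass,
  # leaving any other pending exceptions queued.
  def check_interrupt(klass = Exception)
    hit = @lock.synchronize do
      i = @queue.index { |e| e.is_a?(klass) }
      i && @queue.delete_at(i)
    end
    raise hit if hit
  end
end

box = SafePointMailbox.new
box.async_raise(IOError.new("disk on fire"))
box.check_interrupt(ArgumentError)   # nothing of that kind pending: no-op
```

The klass filter matters: a thread can declare itself a safe point for, say, Timeout-style errors without also accepting unrelated interrupts there.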

--
Posted via http://www.ruby-forum.com/.

Tanaka Akira wrote:

Basically, asynchronous events should be queued.

* make a queue for each thread.
* Thread#raise(exc) enqueues exc to the queue.
* Thread.check_interrupt(klass) checks for an exception which is
  a kind of klass in the queue of the current thread. If one
  exists, the exception is raised.
* other queue-examining mechanisms, such as sigpending, etc.,
  may be useful.

Thread.check_interrupt is a safe point.

Agreed. This is essentially how these unsafe operations are implemented in JRuby currently. The "raise queue" is one-element long, and appropriate locks are acquired to modify it from outside the target thread. All threads periodically check to see if they've been "raised" or "killed" or whether they should sleep because another thread has "gone critical". The checkpoints are currently at the same points during execution that Ruby 1.8 checks whether it should run the thread scheduler.

However, Thread.check_interrupt is not the only kind of safe
point. Blocking operations, such as I/O, may or may not be safe
points, because Thread.check_interrupt combined with blocking
operations causes race conditions. So the application must
choose whether to make a blocking operation interruptible or
uninterruptible.

* Thread.blocking_interruptible(klass) { ... }
* Thread.blocking_uninterruptible(klass) { ... }

Also good. Awful names though :)

Blocking operations in Thread.blocking_interruptible are
safe points.

If a thread has no resources to release, it is safe to kill
asynchronously. Other than that, safe points should be
declared explicitly. When a resource is acquired,
the application should declare it.

* Thread.delay_interrupt(klass) { ... }

Safe points are those outside of the outermost
Thread.delay_interrupt.

Another idea concerns the ensure
clause. Since an ensure clause is used to release a
resource, it may delay asynchronous events, as in
Thread.delay_interrupt.

As mental has said, if threads are uninterruptible by default, this would make ensures safe. I think that's the only reasonable option.
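For what it's worth, this is essentially the design Ruby eventually shipped: Thread.handle_interrupt (added in Ruby 2.0) lets a block defer asynchronous exceptions, so cleanup runs to completion before a pending Thread#raise is delivered:

```ruby
log     = []
entered = Queue.new

th = Thread.new do
  begin
    Thread.handle_interrupt(RuntimeError => :never) do
      entered << true   # signal that we are inside the protected region
      sleep 0.3         # an async raise arriving here is deferred
      log << :cleanup_finished
    end                 # the pending interrupt is delivered on block exit
  rescue RuntimeError
    log << :interrupt_after_block
  end
end

entered.pop                        # wait until th is inside the block
th.raise(RuntimeError, "async")    # queued, not delivered immediately
th.join
```

The protected region finishes cleanly first; only then does the deferred exception fire, which is exactly the "ensure runs all the way through" behavior wished for elsewhere in this thread.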

- Charlie

ara howard wrote:

I think the only way around this is for the receiving thread to
explicitly choose moments when it wants to receive any pending
asynchronous exceptions. Can you think of another way?

perhaps i wasn't being clear. that's exactly what i was showing - a handler that gets called (critical section of course) which simply enques the exception. then the thread itself must check for these at the 'right' moment.

Presumably by "critical section" you mean "locking on as narrow a lock as possible so as not to stop the whole freaking world because one thread wants to send an event to another". critical= is the devil.

- Charlie

···

On Mar 18, 2008, at 11:52 AM, MenTaLguY wrote:

no doubt you've read it, but for posterity i'll post it here:

   http://209.85.207.104/search?q=cache:2T61vSNQ4_EJ:home.pacbell.net/ouster/threads.pdf+"why+threads+are+a+bad+idea"&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a

a @ http://codeforpeople.com/

···

On Mar 17, 2008, at 10:30 AM, MenTaLguY wrote:

In a language with side-effects, I've become convinced that the only safe
way to handle asynchronous exceptions is for code to be uninterruptible by
default, and only explicitly allow interrupts at specific points (not broad
regions of code).

--
share your knowledge. it's a way to achieve immortality.
h.h. the 14th dalai lama

In article <41DCFB5E-7EF8-4A1E-81B7-95A6A7691F04@grayproductions.net>,
  James Gray <james@grayproductions.net> writes:

Perl went kind-of along these lines around version 5.8, if I recall
correctly. They added "safe signals" which were signals that only
interrupted at times when it was safe for them to do so.

It's from 5.7.3.

This did have a few weird side-effects though. For example, running a
complex regex would shut off signal handling for the duration. That
meant you could no longer timeout a regex.

Ruby already defers signal handling, the same as Perl.

···

--
Tanaka Akira

That's a good point. To some extent it is an API design and
implementation problem; an API which can partially "succeed" in that
fashion is arguably flawed.

There are really two alternatives:

1. fix the operation so that it succeeds or fails atomically

2. otherwise, make the operation always non-interruptible

(Well, there is also a third solution, used by Java's NIO for blocking
reads and writes: on interrupt, destroy the object in question so that
it doesn't matter anymore. But I don't think it's a particularly good
solution.)

-mental

···

On Thu, 20 Mar 2008 02:29:09 +0900, Paul Brannan <pbrannan@atdesk.com> wrote:

On Tue, Mar 18, 2008 at 02:04:40PM +0900, Tanaka Akira wrote:

It is because Thread.check_interrupt with blocking operations causes
race conditions.

Raising an exception during a blocking write operation makes it
impossible to know how much data was written.

Similarly, raising an exception during a blocking read operation makes
it impossible to access the data that was read.

The same is true for any operation for which partial success is
possible.

yes - lock only long enough to push the exception onto a stack/queue where it can be checked at an opportune moment - as opposed to async exceptions just bubbling in whenever.

a @ http://drawohara.com/

···

On Mar 22, 2008, at 12:04 AM, Charles Oliver Nutter wrote:

Presumably by "critical section" you mean "locking on as narrow a lock as possible so as not to stop the whole freaking world because one thread wants to send an event to another". critical= is the devil

--
sleep is the best meditation.
h.h. the 14th dalai lama

In article <3bc15bbea3f2a64de52c462c8e32d6c3@localhost>,
  MenTaLguY <mental@rydia.net> writes:

This sounds very good! I hadn't considered anything like
blocking_interruptible, but it seems useful.

The idea comes from the cancellation points of pthread.

···

--
Tanaka Akira

In article <20080319172919.GL26883@atdesk.com>,
  Paul Brannan <pbrannan@atdesk.com> writes:

Raising an exception during a blocking write operation makes it
impossible to know how much data was written.

Similarly, raising an exception during a blocking read operation makes
it impossible to access the data that was read.

It is a very difficult problem.

I don't try to solve the problem perfectly.

I think data loss may be acceptable if an asynchronous
event is for termination.

If the purpose is termination, data loss may happen
anyway. For example, data from TCP is lost if a process
exits before the data arrives.

If data loss is acceptable but termination delay is not
acceptable, Thread.blocking_interruptible(klass) { I/O } can
be used.

If data loss is not acceptable but termination delay is
acceptable, Thread.blocking_uninterruptible(klass) { I/O }
can be used.

If neither data loss nor termination delay is acceptable, it
is difficult. One idea is
Thread.blocking_interruptible(klass) { IO.select } and
nonblocking I/O. But nonblocking I/O inherently produces
partial results, so there should be a read/write buffer. If
the termination procedure ignores the read buffer, data loss
occurs. If the termination procedure flushes the write
buffer, it may block. So data loss or termination delay
again. It is a difficult problem.

I hope either data loss or termination delay is acceptable.

If an asynchronous event is used for non-termination, data
loss is not acceptable in general. Assume a procedure is
called for the event. If some delay is acceptable,
Thread.blocking_uninterruptible(klass) { I/O } can be used.
If the delay is also not acceptable, it is a difficult
problem. However, if the event is caused by Thread#raise,
I'm not sure why the procedure is not called directly
instead of Thread#raise. If the event is caused by a signal,
I have no good idea how to do it.

So I think asynchronous events should be used only for
termination. This is why I think exceptions on blocking
operations are tolerable.

···

--
Tanaka Akira

In article <47E4A106.7060405@sun.com>,
  Charles Oliver Nutter <charles.nutter@sun.com> writes:

As mental has said, if threads are uninterruptible by default, this
would make ensures safe. I think that's the only reasonable option.

If threads are uninterruptible by default for all kinds of
asynchronous events, SIGINT and other signals are not
effective by default. That is not reasonable.

So the default should depend on the kind of asynchronous
event. I think it is reasonable that signals are
interruptible by default but others are uninterruptible by
default.

···

--
Tanaka Akira

Here is a related talk/paper that offers a slightly different
perspective than Ousterhout: <http://swtch.com/~rsc/talks/threads07/>

Gary Wright

···

On Mar 18, 2008, at 1:19 AM, ara howard wrote:

On Mar 17, 2008, at 10:30 AM, MenTaLguY wrote:

In a language with side-effects, I've become convinced that the only safe
way to handle asynchronous exceptions is for code to be uninterruptible by
default, and only explicitly allow interrupts at specific points (not broad
regions of code).

no doubt you've read it, but for posterity i'll post it here:

  http://209.85.207.104/search?q=cache:2T61vSNQ4_EJ:home.pacbell.net/ouster/threads.pdf+"why+threads+are+a+bad+idea"&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a

Hi,

···

In message "Re: Thread#raise, Thread#kill, and timeout.rb are unsafe" on Tue, 18 Mar 2008 23:02:24 +0900, Tanaka Akira <akr@fsij.org> writes:

This did have a few weird side-effects though. For example, running a
complex regex would shut off signal handling for the duration. That
meant you could no longer timeout a regex.

Ruby already defers signal handling, the same as Perl.

1.8 regexp does check interrupts during match, 1.9 doesn't check (yet).

              matz.

Whatever it's supposed to be, the link does not work for me.

Thanks

Michal

···

On 18/03/2008, ara howard <ara.t.howard@gmail.com> wrote:

On Mar 17, 2008, at 10:30 AM, MenTaLguY wrote:

> In a language with side-effects, I've become convinced that the only
> safe
> way to handle asynchronous exceptions is for code to be
> uninterruptible by
> default, and only explicitly allow interrupts at specific points
> (not broad
> regions of code).

no doubt you've read it, but for posterity i'll post it here:

   http://209.85.207.104/search?q=cache:2T61vSNQ4_EJ:home.pacbell.net/ouster/threads.pdf+"why+threads+are+a+bad+idea"&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a

For I/O operations, it is typically desirable for the operation to be
interruptible, but to get the result of the partial operation after it
is interrupted. At least, that's how it works at the C level with
respect to signals.
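In Ruby, that C-level pattern (wait for readiness, then consume whatever is available) can be approximated with IO.select plus read_nonblock, so an interrupt arriving between iterations never discards data that has already been read (a sketch, not a complete I/O layer):

```ruby
r, w = IO.pipe
w.write("hello")   # 5 bytes available to read
w.close

buf = +""
# Each loop iteration is a natural safe point: an asynchronous exception
# delivered *between* iterations leaves buf holding everything read so far.
loop do
  break unless IO.select([r], nil, nil, 0)   # poll: anything readable?
  begin
    buf << r.read_nonblock(4)                # drain in small chunks
  rescue IO::WaitReadable
    break                                    # nothing more right now
  rescue EOFError
    break                                    # writer closed, fully drained
  end
end
r.close
```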

Paul

···

On Thu, Mar 20, 2008 at 03:50:13AM +0900, MenTaLguY wrote:

There are really two alternatives:

1. fix the operation so that it succeeds or fails atomically

2. otherwise, make the operation always non-interruptable