Nested threading? implications to timeout()

All,

I'm certainly not a ruby expert by any means. I'm trying to come up to
speed as fast as possible on ruby and rails. My last few months have
been reading, coding, reading, more reading, more coding, etc. :slight_smile: So I
was doing some coding to try to come up to speed on ruby. I really
think it's one of the most phenominal languages I've ever found. I
never got into java, too verbose. That said I'd like to move on to the
question.

At this point, I'm trying to track down some sort of memory leak. In my
vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn't actually kill any threads that were started inside that thread.
To clarify, consider the following:

irb(main):001:0> x=nil ; y = Thread.start { x = Thread.start { sleep 0.5
until false } ; sleep 0.5 until false }
=> #<Thread:0xb7abe20c run>
irb(main):002:0> x.status
=> "sleep"
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>
irb(main):007:0> x.status
=> "sleep"
irb(main):008:0>

Clearly by killing, what I would call the "parent thread", did not kill
the child thread. And this would be okay, I think, except that the way
the timeout class works is by starting a thread and killing it if it
doesn't complete in the right amount of time. So if I were to wrap
anything in "timeout() {}" it's possible that something else calls
timeout inside that and then the thread is still alive, until some later
time. Is this normal? Can this cause memory leak? Am I confused? :slight_smile:

Thanks much for your help and feedback,

Geff

···

--
Posted via http://www.ruby-forum.com/.

All,

I'm certainly not a ruby expert by any means. I'm trying to come up to
speed as fast as possible on ruby and rails. My last few months have
been reading, coding, reading, more reading, more coding, etc. :slight_smile: So I
was doing some coding to try to come up to speed on ruby. I really
think it's one of the most phenominal languages I've ever found. I
never got into java, too verbose. That said I'd like to move on to the
question.

At this point, I'm trying to track down some sort of memory leak. In my
vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn't actually kill any threads that were started inside that thread.
To clarify, consider the following:

irb(main):001:0> x=nil ; y = Thread.start { x = Thread.start { sleep 0.5
until false } ; sleep 0.5 until false }
=> #<Thread:0xb7abe20c run>
irb(main):002:0> x.status
=> "sleep"
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>

[There are lines missing here...]

irb(main):007:0> x.status
=> "sleep"
irb(main):008:0>

Clearly by killing, what I would call the "parent thread", did not kill
the child thread. And this would be okay, I think, except that the way
the timeout class works is by starting a thread and killing it if it
doesn't complete in the right amount of time. So if I were to wrap
anything in "timeout() {}" it's possible that something else calls
timeout inside that and then the thread is still alive, until some later
time. Is this normal?

Reading the timeout code I see that it creates a watcher thread to
provide the timeout. The block is executed on the current thread and
interruption is provided by raising an exception (from the watcher
tread) when the timeout expires. At the end of the method there is an
ensure to kill the watcher thread in any case so it would not
encounter the case you describe.

Can this cause memory leak?

Due do what I described in the previous paragraph nested timeouts
won't cause any kind of leak.

Am I confused? :slight_smile:

Yes, I believe you are confused.

By the way, reading the code of the methods you have doubts about can
be a good way to clarify something.

···

On 9/15/06, Geff Geff <boing@boing.com> wrote:

Thanks much for your help and feedback,

Geff

--
Posted via http://www.ruby-forum.com/\.

--
I'm trying to launch the internet; so I open a terminal and go
"percent sign 'Internet'" at the prompt and it doesn't work. What
gives??!! -- random troll

Threads don't have any kind of parent-child relationship, as processes do.
None of the threading models that have seen wide use have a feature like
this. What was your basis for thinking that they do?

···

On 9/15/06, Geff Geff <boing@boing.com> wrote:

vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn't actually kill any threads that were started inside that thread.
To clarify, consider the following:

Arnaud Bergeron wrote:

···

On 9/15/06, Geff Geff <boing@boing.com> wrote:

At this point, I'm trying to track down some sort of memory leak. In my
=> "sleep"
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>

[There are lines missing here...]

sorry that was just me repeating the status method multiple times. :slight_smile:

--
Posted via http://www.ruby-forum.com/\.

Francis Cianfrocca wrote:

vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn't actually kill any threads that were started inside that thread.
To clarify, consider the following:

Threads don't have any kind of parent-child relationship, as processes
do.
None of the threading models that have seen wide use have a feature like
this. What was your basis for thinking that they do?

What was I thinking was the following: lets consider the case where you
try to timeout a HTTP connect with the net/http stuff. There are
already "timeouts" in the HTTP class. If my timeout times out before
the library's timeout, I end up killing the http thread that is watching
the tcpconnect thread. The tcpconnect thread lives on until it's done.
Maybe not such a big deal in thise case but the situation certainly
could have been worse. It could be kind of messy if you're using a
library in general because you don't know if it's going ot call timeout
of it's own and how that might impact resource allocation/deallocation.
As far as the programmer is concerned, he / she timed out the http class
and it should be (all) gone. But it's not. It's still hanging around.
Just like my nested sleep was.

I don't have any basis for thinking there is a parent / child
relationship. Just sorta counter intuitive in the case of a nested
timeout. Because what programmer probably wants is a timeout on the
thing that they are trying to timeout. The *WHOLE* thing, not part of
it, without verifying the libraries don't call timeout themselves.

Thoughts?

Seprately I'm having some issue with some of my activerecord objects
sticking around in a long running deamon. Not all, but some. It's odd.
I haven't found the problem yet, and I really don't know what's going
on. It's probably not timeout related because those would free up
eventually.

Thanks so much for the comments thus far.

Geff

···

On 9/15/06, Geff Geff <boing@boing.com> wrote:

--
Posted via http://www.ruby-forum.com/\.

Nested timeouts have a different problem:

http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout

···

On Sep 16, 2006, at 12:40 AM, Arnaud Bergeron wrote:

On 9/15/06, Geff Geff <boing@boing.com> wrote:

Clearly by killing, what I would call the "parent thread", did not kill
the child thread. And this would be okay, I think, except that the way
the timeout class works is by starting a thread and killing it if it
doesn't complete in the right amount of time. So if I were to wrap
anything in "timeout() {}" it's possible that something else calls
timeout inside that and then the thread is still alive, until some later
time. Is this normal?

Reading the timeout code I see that it creates a watcher thread to
provide the timeout. The block is executed on the current thread and
interruption is provided by raising an exception (from the watcher
tread) when the timeout expires. At the end of the method there is an
ensure to kill the watcher thread in any case so it would not
encounter the case you describe.

Can this cause memory leak?

Due do what I described in the previous paragraph nested timeouts
won't cause any kind of leak.

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

I haven't read through the net/http code as it relates to this issue,
but I did read timeout.rb and as Arnaud says, there should be no
problem. If I may assume that your thread calls timeout on a net/http
call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you'll kill the thread that is doing the TCP
connect, because it's the same thread that called net/http. The
"internal" timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there's no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.

···

On 9/16/06, Geff Geff <boing@boing.com> wrote:
> What was I thinking was the following: lets consider the case where you

try to timeout a HTTP connect with the net/http stuff. There are
already "timeouts" in the HTTP class. If my timeout times out before
the library's timeout, I end up killing the http thread that is watching
the tcpconnect thread. The tcpconnect thread lives on until it's done.

Francis Cianfrocca wrote:

···

On 9/16/06, Geff Geff <boing@boing.com> wrote:
> What was I thinking was the following: lets consider the case where
you

try to timeout a HTTP connect with the net/http stuff. There are
already "timeouts" in the HTTP class. If my timeout times out before
the library's timeout, I end up killing the http thread that is watching
the tcpconnect thread. The tcpconnect thread lives on until it's done.

I haven't read through the net/http code as it relates to this issue,
but I did read timeout.rb and as Arnaud says, there should be no
problem. If I may assume that your thread calls timeout on a net/http
call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you'll kill the thread that is doing the TCP
connect, because it's the same thread that called net/http. The
"internal" timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there's no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.

I think if I timeout a timeout that my timeout is killing the "waiter"
thread. And since the waiter thread started the TCPconnect attempt
thread that tcpconnect stays running until it ends, if it ends. Kinda
like my sleep thread example. Is that not accurate?

--
Posted via http://www.ruby-forum.com/\.

The timer thread won't even remain because of the ensure at the end of
the timeout function that kills it ...

···

On 9/16/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:

On 9/16/06, Geff Geff <boing@boing.com> wrote:
> What was I thinking was the following: lets consider the case where you
> try to timeout a HTTP connect with the net/http stuff. There are
> already "timeouts" in the HTTP class. If my timeout times out before
> the library's timeout, I end up killing the http thread that is watching
> the tcpconnect thread. The tcpconnect thread lives on until it's done.

I haven't read through the net/http code as it relates to this issue,
but I did read timeout.rb and as Arnaud says, there should be no
problem. If I may assume that your thread calls timeout on a net/http
call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you'll kill the thread that is doing the TCP
connect, because it's the same thread that called net/http. The
"internal" timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there's no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.

--
I'm trying to launch the internet; so I open a terminal and go
"percent sign 'Internet'" at the prompt and it doesn't work. What
gives??!! -- random troll

I don't think so. Try the following:

···

On 9/16/06, Geff Geff <boing@boing.com> wrote:
> I think if I timeout a timeout that my timeout is killing the "waiter"

thread. And since the waiter thread started the TCPconnect attempt
thread that tcpconnect stays running until it ends, if it ends. Kinda
like my sleep thread example. Is that not accurate?

---------------------------------------
require 'timeout'
begin
  timeout(2) {
    timeout(4) {
      sleep 6
      p "I finished"
    }
  }
rescue Timeout::Error
  p "timeout fired"
end

sleep 10
------------------------------------------
You'll see the timeout message after two seconds, and the program will
end ten seconds later without saying "I finished." You should be able
to ignore the fact that timeout is implemented with threads in the
first place. Or were you asking a different question?

FYI: Eric Hodel has written a post about nested timeouts:
http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout

···

On 9/17/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:

On 9/16/06, Geff Geff <boing@boing.com> wrote:
> I think if I timeout a timeout that my timeout is killing the "waiter"
> thread. And since the waiter thread started the TCPconnect attempt
> thread that tcpconnect stays running until it ends, if it ends. Kinda
> like my sleep thread example. Is that not accurate?
>

I don't think so. Try the following:

---------------------------------------
require 'timeout'
begin
  timeout(2) {
    timeout(4) {
      sleep 6
      p "I finished"
    }
  }
rescue Timeout::Error
  p "timeout fired"
end

sleep 10
------------------------------------------
You'll see the timeout message after two seconds, and the program will
end ten seconds later without saying "I finished." You should be able
to ignore the fact that timeout is implemented with threads in the
first place. Or were you asking a different question?

Francis Cianfrocca wrote:

require 'timeout'
begin
  timeout(2) {
    timeout(4) {
      sleep 6
      p "I finished"
    }
  }
rescue Timeout::Error
  p "timeout fired"
end

sleep 10

Ur very right. Thanks very much. Sorry I had a bad case of HUA. I
read the timeout library code backwards, so to speak. As if the child
thread (so to speak) was the actual code block, instead of just the
sleep and the raise.

Sorry for posting something I shouldn't have wasted your time with.
Thanks for the patience and the help.

Geff

···

--
Posted via http://www.ruby-forum.com/\.

Jan Svitok wrote:

FYI: Eric Hodel has written a post about nested timeouts:
http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout

Wow! Thanks very much.

···

--
Posted via http://www.ruby-forum.com/\.

Jan Svitok wrote:
...

FYI: Eric Hodel has written a post about nested timeouts:
http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout

The second part of the article points out a potential danger of the interaction between timeouts and rescue/ensure. But the proposed solution may not be good advice.

The danger is that a timeout exception may fire during an ensure clause, preventing the ensure clause (and any necessary cleanup code within it) from finishing, which could cause resource leaks or other problems.

The solution proposed in the article is to wrap code within the ensure clause in a begin..end block to handle timeout exceptions, like this:

   require 'timeout'

   Timeout.timeout 2 do
     begin
       puts "Allocating the thingy..."
       sleep 1
       raise RuntimeError, 'Oh no! Something went wrong!'
     ensure
       # Since we might time out, hold onto the timeout we caught
       # so we can re-raise it when we're done cleaning up.
       timeout = nil
       begin # we really need to clean up
         puts "Cleaning up after the thingy..."
         sleep 2
         puts "Cleaned up after the thingy!"
       rescue Timeout::Error => e
         puts "Timed out! Trying again!"
         timeout = e # save that timeout then retry
         retry
       end
       # Raise the timeout so we time out all the way to the top.
       raise timeout unless timeout.nil?
     end
   end

However, that solution only reduces the chance of the timeout interfering with the ensure clause. Suppose the timeout fires while the main thread is executing the line "timeout = nil". (It's impossible in this example, but you can get it to happen by putting a "sleep 5" just after this line.) Then the inner begin..end clause doesn't catch the timeout, and the cleanup doesn't happen. In general, unless you are very sure about the timings of your code, you cannot guarantee that the timeout won't fire at the wrong time. So it's a race condition.

Another problem is that the "retry" may cause the cleanup to happen twice, if the timeout fires just when cleanup is finishing. That may lead to other problems (such as closing a file twice and generating another exception).

A solution in this case (but not in general) is to move the ensure block *outside* of the timeout block:

   require 'timeout'

   begin
     Timeout.timeout 2 do
       puts "Allocating the thingy..."
       sleep 1
       raise RuntimeError, 'Oh no! Something went wrong!'
     end

   rescue Timeout::Error => e
     puts "timed out"

   ensure
     puts "Cleaning up after the thingy..."
     sleep 2
     puts "Cleaned up after the thingy!"
   end

The effect of this is to kill the timeout thread before the ensure block starts (since the timeout block has been exited). Also, the ensure code will be called exactly once (assuming there are no other interrupts).

But this refactoring technique cannot be used if the ensure clause occurs in some nested method call. This has been noted before:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/113417

For example:

   require 'timeout'

   def do_some_work
     puts "do_some_work is starting"
     sleep 1
     raise 'exception in do_some_work'
   ensure
     puts "cleaning up resources in do_some_work"
     sleep 2
     puts "cleaned up resources in do_some_work"
   end

   Timeout.timeout 2 do
     do_some_work
   end

This outputs:

do_some_work is starting
cleaning up resources in do_some_work
/usr/local/lib/ruby/1.8/timeout.rb:54:in `do_some_work': execution expired (Timeout::Error)

The cleanup never finishes.

We could modify do_some_work to handle timeouts, as Eric did in his article, but then there would still be the race condition that would (depending on timing) allow the timeout to kill the ensure clause anyway. And there's still the retry problem (causing multiple executions of cleanup code). And it's not easy to modify every library method in this way.

I wish I had an easy answer to this problem, but I don't. Are timeout and ensure inherently incompatible?

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

I don't have an answer, but what if you make your ensure section into a critical section?

ensure
   Thread.critical = true
   # do stuff
   Thread.critical = false
end

Kirk Haines

···

On Mon, 18 Sep 2006, Joel VanderWerf wrote:

I wish I had an easy answer to this problem, but I don't. Are timeout and ensure inherently incompatible?

From reading rb_eval, it might be possible to fix the race condition by exploiting implementation details, but I don't have the opportunity to examine it in depth.

rb_eval looks something like this:

rb_eval(VALUE self, NODE *node) {
   switch (nd_type(node)) {
   case NODE_ENSURE:
     rb_eval(self, node->begin_body);
     rb_eval(self, node->ensure_body);
     break;
   case NODE_RESCUE:
     /* ... */
   }
   CHECK_INTS; /* check for a thread to switch to or a signal to process */
}

So threads may be switched (and Timeout::Error raised) only when is finished evaluating a node.

So if the first item of an ensure is begin (which creates a NODE_RESCUE) we might be "safe":

$ parse_tree_show -f
begin
   perform some work
ensure
   begin
     clean up after ourselves
   rescue Timeout::Error
     retry
   end
end
(eval):2: warning: parenthesize argument(s) for future version
(eval):5: warning: parenthesize argument(s) for future version
[[:ensure,
   # this is the begin body.
   [:fcall, :perform, [:array, [:fcall, :some, [:array, [:vcall, :work]]]]],
   # this is the ensure body
   [:rescue,
    # this is the inner begin body, we will always reach here after ensure.
    [:fcall,
     :clean,
     [:array,
      [:fcall,
       :up,
       [:array, [:fcall, :after, [:array, [:vcall, :ourselves]]]]]]],
    # this is the rescue statement
    [:resbody, [:array, [:colon2, [:const, :Timeout], :Error]],
     # this is the rescue body
     [:retry]]]]]

I am on vacation starting tomorrow, so I'll probably look up this thread when I return.

···

On Sep 17, 2006, at 12:36 PM, Joel VanderWerf wrote:

Jan Svitok wrote:
...

FYI: Eric Hodel has written a post about nested timeouts:
http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout

The second part of the article points out a potential danger of the interaction between timeouts and rescue/ensure. But the proposed solution may not be good advice.

The danger is that a timeout exception may fire during an ensure clause, preventing the ensure clause (and any necessary cleanup code within it) from finishing, which could cause resource leaks or other problems.

The solution proposed in the article is to wrap code within the ensure clause in a begin..end block to handle timeout exceptions, like this:

  require 'timeout'

  Timeout.timeout 2 do
    begin
      puts "Allocating the thingy..."
      sleep 1
      raise RuntimeError, 'Oh no! Something went wrong!'
    ensure
      # Since we might time out, hold onto the timeout we caught
      # so we can re-raise it when we're done cleaning up.
      timeout = nil
      begin # we really need to clean up
        puts "Cleaning up after the thingy..."
        sleep 2
        puts "Cleaned up after the thingy!"
      rescue Timeout::Error => e
        puts "Timed out! Trying again!"
        timeout = e # save that timeout then retry
        retry
      end
      # Raise the timeout so we time out all the way to the top.
      raise timeout unless timeout.nil?
    end
  end

However, that solution only reduces the chance of the timeout interfering with the ensure clause. Suppose the timeout fires while the main thread is executing the line "timeout = nil". (It's impossible in this example, but you can get it to happen by putting a "sleep 5" just after this line.) Then the inner begin..end clause doesn't catch the timeout, and the cleanup doesn't happen. In general, unless you are very sure about the timings of your code, you cannot guarantee that the timeout won't fire at the wrong time. So it's a race condition.

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

There is still the race problem if the exception gets raised just
before the Thread.critical.

I'm not sure if it is really possible in this case. It depends if the
implentation allows thread switching just after entering an ensure but
before executing any code. You will have to ask gurus on this one.

···

On 9/17/06, khaines@enigo.com <khaines@enigo.com> wrote:

On Mon, 18 Sep 2006, Joel VanderWerf wrote:

> I wish I had an easy answer to this problem, but I don't. Are timeout and
> ensure inherently incompatible?

I don't have an answer, but what if you make your ensure section into a
critical section?

ensure
   Thread.critical = true
   # do stuff
   Thread.critical = false
end

Kirk Haines

--
I'm trying to launch the internet; so I open a terminal and go
"percent sign 'Internet'" at the prompt and it doesn't work. What
gives??!! -- random troll

khaines@enigo.com wrote:

···

On Mon, 18 Sep 2006, Joel VanderWerf wrote:

I wish I had an easy answer to this problem, but I don't. Are timeout and ensure inherently incompatible?

I don't have an answer, but what if you make your ensure section into a critical section?

ensure
  Thread.critical = true
  # do stuff
  Thread.critical = false
end

Even if this worked reliably, using a critical section would be udesirable if the ensure clause invokes IO (e.g. waiting for some network operation to finish). That would freeze the whole process.

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Careful, you may want:

ensure
   orig_crit = Thread.critical
   Thread.critical = true
   # do stuff
   Thread.critical = orig_crit
end

If you may already be inside a critical section.

(or require 'thread' use Thread.exclusive)

···

On Sep 17, 2006, at 1:08 PM, khaines@enigo.com wrote:

On Mon, 18 Sep 2006, Joel VanderWerf wrote:

I wish I had an easy answer to this problem, but I don't. Are timeout and ensure inherently incompatible?

I don't have an answer, but what if you make your ensure section into a critical section?

ensure
  Thread.critical = true
  # do stuff
  Thread.critical = false
end

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

Arnaud Bergeron wrote:

> I wish I had an easy answer to this problem, but I don't. Are timeout and
> ensure inherently incompatible?

I don't have an answer, but what if you make your ensure section into a
critical section?

ensure
   Thread.critical = true
   # do stuff
   Thread.critical = false
end

Kirk Haines

There is still the race problem if the exception gets raised just
before the Thread.critical.

Confirmed by experiment with ruby 1.8.4:

$ cat ensure-timeout.rb
require 'timeout'

Thread.abort_on_exception = true

work_thread = Thread.new do
   loop do
     begin
       Thread.pass
     ensure
       Thread.critical = true
       begin
       rescue Timeout::Error
         puts "caught a Timeout::Error"
       end
       Thread.critical = false
       #retry
     end
   end
end

loop do
   work_thread.raise Timeout::Error
end

$ ruby ensure-timeout.rb
ensure-timeout.rb:22: Timeout::Error (Timeout::Error)
         from ensure-timeout.rb:6
         from ensure-timeout.rb:5

···

On 9/17/06, khaines@enigo.com <khaines@enigo.com> wrote:

On Mon, 18 Sep 2006, Joel VanderWerf wrote:

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407