Frustrated: System call timeouts

Hello all,

I am having some (un)fun with timing out database calls.

Basically I have some database calls that go out to a remote database
server on the other side of the planet, (using Rails' active record).

This all works fine, but occasionally the link gets interrupted and
you get a stale session, and the whole thing just locks up waiting for
the call to complete (which it never does).

This then hangs the rake task that is doing a periodic update through
the system cron, and it can stay jammed until you go in and reset it,
which is quite annoying.

Trying timeout.rb didn't help, as it does not handle system calls
(except I believe for ones that Ruby makes itself, like file I/O).
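
For reference, the timeout.rb pattern in question looks roughly like the sketch below. The helper name `with_timeout` is hypothetical, and `sleep` stands in for the database call. A pure-Ruby wait like this is interrupted as expected. On Ruby 1.8's green threads, a call that blocks inside a C extension (such as a database driver) is not, because the thread that timeout.rb uses to raise the error never gets a chance to run.

```ruby
require 'timeout'

# Hypothetical helper: run the block, giving up after `seconds`.
# Returns the block's value, or :timed_out if the limit was hit.
def with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

with_timeout(1)   { :row }     # completes in time
with_timeout(0.1) { sleep 2 }  # interrupted, because sleep is a Ruby-level wait
```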

Trying system-timer (http://ph7spot.com/articles/system_timer) from
Philippe Hanrigou also didn't work - same hang, waiting for a return
call from the DB driver.

The DB adapter is Oracle instant client then OCI, then Oracle Active
Record Adapter, within ActiveRecord called from a rake task (that
includes the environment), so I am basically calling from within a
full rails stack on top of Ruby 1.8.6p36

When the rake task starts, it checks to see if another copy is running
through a lock file and exits if so, so there is only ever one copy of
the rake task running - so it is not some race condition here.
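
The post does not say how the lock file is implemented. One common way to get this "only one copy" behaviour is a non-blocking `flock`, sketched here under that assumption. The helper name `with_exclusive_lock` and the lock path are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: yields only if no other process holds the lock on `path`.
# Returns the block's value, or :already_running when the lock is taken.
def with_exclusive_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |file|
    # LOCK_NB makes flock return false immediately instead of waiting
    return :already_running unless file.flock(File::LOCK_EX | File::LOCK_NB)
    yield
  end
end

# Assumed lock location, for illustration only
lock_path = File.join(Dir.tmpdir, 'update_all.lock')
with_exclusive_lock(lock_path) { :updated }
```

The kernel drops an flock when the process exits, so a crashed task does not leave a stale lock behind. A task that hangs still holds the lock, though, which is exactly the situation described here.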

The timeouts happen while I am finding an individual row of a table
[Model.find(id)]. That is usually a fast operation, but in the context
where I am using it, it is the slowest part of my process, and so
seems to be where the network has the most chance to crap out. So it
is probably not that bit of the code itself that is at fault.

Has anyone found a reliable way to time out this sort of call? Does
anyone have any idea why the system timer would _not_ be timing out
this sort of call?

The hard thing is that I am not 100% sure where it is failing. I think
(from looking at tcpdump and copious logging) that it is stalling in
that find method, but I am not 100% sure of this.

Any pointers from others that must have tackled this problem on where
to go from here? I see my options are:

1) Figure out a solution to this problem (preferred)
2) Abandon it and monitor for a zombie by tailing a log file or the
like for inactivity and then kill appropriately (sounds like a real
hack).
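
Option 2 could be sketched along these lines, assuming the task writes to a log file regularly and its pid is known (for example, read from the lock file). All names, paths and limits here are hypothetical.

```ruby
# Hypothetical watchdog helpers; log path, pid source and idle limit are assumptions.

# True if the log file has not been written to for more than max_idle seconds.
def stale?(log_path, max_idle, now = Time.now)
  now - File.mtime(log_path) > max_idle
end

# Sends TERM to pid when the log looks stale. Returns true if a signal was sent.
def kill_if_stale(log_path, pid, max_idle)
  return false unless stale?(log_path, max_idle)
  Process.kill('TERM', pid)
  true
rescue Errno::ESRCH
  false # the process had already gone away
end
```

A separate cron entry could call `kill_if_stale` every few minutes. The idle limit has to be longer than the longest legitimate pause in the task's logging.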

Mikel

try this

cfp:~/src/ruby > cat timing.rb
Timing.out(2) do
    p 'works'
end

Timing.out(1) do
    begin
      sleep 2
    rescue Timed.out
      p 'times out'
    end
end

Timing.out(1) do
    sleep 2
    p 'blows up'
end

BEGIN {

    module Timing
      class Error < ::StandardError; end

      def Timing.out *seconds, &block
        if seconds.empty?
          return Error
        else
          seconds = Float seconds.first
        end

        pid = Process.pid
        signaler = IO.popen "ruby -e'sleep #{ seconds };
        Process.kill(:TERM.to_s, #{ pid }) rescue nil'"
        thread = Thread.current
        handler = Signal.trap('TERM'){ thread.raise Error, seconds.to_s }
        begin
          block.call
        ensure
          Process.kill 'TERM', signaler.pid rescue nil
          Signal.trap('TERM', handler)
        end
      end

      ::Timed = Timing
    end

}

cfp:~/src/ruby > ruby timing.rb
"works"
"times out"
timing.rb:34:in `out': 1.0 (Timing::Error)
  from timing.rb:14:in `call'
  from timing.rb:14:in `sleep'
  from timing.rb:14
  from timing.rb:36:in `call'
  from timing.rb:36:in `out'
  from timing.rb:13

a @ http://codeforpeople.com/

···

On Sep 6, 2008, at 3:43 AM, Mikel Lindsaar wrote:

Hello all,

I am having some (un)fun with timing out database calls.

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Ara, thank you _so_ much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

There was a bit of a delay (putting out some fires here over the past
two days) but I got to your code last night and this morning, and it
basically works... except it doesn't kill off the signaler processes
fully.

This is because two processes get made: first the shell, which then
creates the ruby -e "sleep..." blah process.

The 'hack' I used to solve this is to replace the ensure block with:

      ensure
        Process.kill 'TERM', signaler.pid rescue nil
        Process.kill('TERM', signaler.pid+1) rescue nil
        Signal.trap('TERM', handler)
      end

But this obviously is insane as it assumes that no other processes get
started on the computer between sh starting up and it firing off the
ruby process.

the ps output looks like this:

$ ps -ef | grep ruby
rails 2153 2152 69 17:04 /usr/sbin/ruby1.8 /usr/bin/rake update:all
rails 2237 2153 69 17:04 sh -c ruby -e'sleep 40.0;?
Process.kill(:TERM.to_s, 2153) rescue nil'
rails 2238 2237 69 17:04 ruby -e'sleep 40.0;?
Process.kill(:TERM.to_s, 2153) rescue nil'

Any ideas on how to reliably find the PID of the ruby process that the
sh process created by IO.popen creates?
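
One way to sidestep the intermediate shell, on Ruby 1.9 or later rather than the 1.8.6 used here: `IO.popen` also accepts the command as an array, which runs it directly without `sh -c`, so the pid of the returned IO is the pid of the ruby child. This is a sketch under that assumption, and `spawn_signaler` is a hypothetical helper name.

```ruby
require 'rbconfig'

# Hypothetical helper: start the signaler with the array form of IO.popen
# (Ruby 1.9+). No `sh -c` is involved, so the returned IO's pid is the
# ruby child itself.
def spawn_signaler(seconds, target_pid)
  script = "sleep #{Float(seconds)}; " \
           "Process.kill('TERM', #{Integer(target_pid)}) rescue nil"
  IO.popen([RbConfig.ruby, '-e', script])
end

signaler = spawn_signaler(40, Process.pid)
# ... the guarded block would run here ...
Process.kill('TERM', signaler.pid) rescue nil
signaler.close
```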

Mikel

···

On Sun, Sep 7, 2008 at 12:42 AM, ara.t.howard <ara.t.howard@gmail.com> wrote:

--

Rails, RSpec and Life blog....

I agree, that was very clever :) Bookmarked in case I ever need this.

martin

···

On Tue, Sep 9, 2008 at 12:00 AM, Mikel Lindsaar <raasdnil@gmail.com> wrote:

Ara, thank you _so_ much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

Since you are using popen anyway you can just have your ruby process
print its PID when it starts, and read it in your terminator.

HTH

Michal
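
A minimal sketch of that suggestion, keeping the string form of `IO.popen` so that a shell may still sit in between. The helper name `start_signaler` is hypothetical. The child announces its own pid on stdout and the parent reads that line from the pipe.

```ruby
require 'rbconfig'

# Hypothetical helper following the suggestion above: the signaler prints its
# own pid before sleeping, and the parent reads that line from the pipe.
# Returns [io, child_pid].
def start_signaler(seconds, target_pid)
  script = "STDOUT.sync = true; puts Process.pid; sleep #{Float(seconds)}; " \
           "Process.kill('TERM', #{Integer(target_pid)}) rescue nil"
  # String form on purpose: a shell may sit between us and the ruby child
  io = IO.popen(%Q{"#{RbConfig.ruby}" -e "#{script}"})
  [io, Integer(io.gets.strip)]
end

signaler, child_pid = start_signaler(40, Process.pid)
# ... the guarded block would run here ...
Process.kill('TERM', child_pid) rescue nil
signaler.close
```

Reading the pid also means the parent does not continue until the child has actually started, and it removes the need to guess at `signaler.pid + 1`.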

···

On 09/09/2008, Mikel Lindsaar <raasdnil@gmail.com> wrote:

Any ideas on how to reliably find the PID of the ruby process that the
sh process created by IO.popen creates?

i keep meaning to turn this into a library but have not. any other advice - besides the pid issue - that you encountered trying to make it live?

cheers.

a @ http://codeforpeople.com/

···

On Sep 9, 2008, at 1:07 AM, Mikel Lindsaar wrote:

Ara, thank you _so_ much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

Mikel Lindsaar wrote:

Ara, thank you _so_ much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

There's also a timeout replacement lib [though I haven't tried it].
http://ph7spot.com/articles/system_timer
--
Posted via http://www.ruby-forum.com/.

correct. this is basically how systemu does it, which you could use similarly to this

   require 'thread'
   require 'systemu'

   q = Queue.new

   # the block is handed the child's pid
   systemu command do |pid|
     q.push pid
   end

   pid = q.pop

this bizarre syntax will capture the pid but *also* wait for the process to start. all it's doing is reading from a pipe so your solution seems fine.

cheers.

a @ http://codeforpeople.com/

···

On Sep 9, 2008, at 6:10 AM, Michal Suchanek wrote:

Since you are using popen anyway you can just have your ruby process
print its PID when it starts, and read it in your terminator.

HTH

No, the pid issue is the only thing... it sometimes misses.

A library hey?

gem install terminator

Terminate.timeout(40) do
  ... my block
end

:)

Mikel

···

On Wed, Sep 10, 2008 at 12:45 AM, ara.t.howard <ara.t.howard@gmail.com> wrote:

i keep meaning to turn this into a library but have not. any other advice -
besides the pid issue - that you encountered trying to make it live?

Thanks for that, I had already tried it. This doesn't _always_ catch
timed out processes in my experience.

···

On Thu, Sep 11, 2008 at 4:55 AM, Roger Pack <rogerpack2005@gmail.com> wrote:

There's also a timeout replacement lib [though I haven't tried it].
http://ph7spot.com/articles/system_timer

oh that's good! i can give you commit rights to codeforpeople and we could release. such a great name! ;)

a @ http://codeforpeople.com/

···

On Sep 9, 2008, at 8:48 PM, Mikel Lindsaar wrote:

No, the pid issue is the only thing... it sometimes misses.

--
http://lindsaar.net/
Rails, RSpec and Life blog....
