Package idea: attempt

Hi all,

I'm tired of this idiom:

max = 3
begin
    Timeout.timeout(val){   # 'val' is the timeout in seconds
       # some op that could fail or time out on occasion
    }
rescue Exception
    max -= 1
    if max > 0
       sleep interval        # 'interval' is the pause between tries
       retry
    end
    raise
end

Mark Fowler wrote a Perl module called "attempt" (http://search.cpan.org/~markf/Attempt-1.01/lib/Attempt.pm) that I think is pretty handy, and I'd like something similar for Ruby. I figure the API should look like this:

# 1st arg is the number of tries, 2nd is the interval (in seconds) between them
attempt(3, 300){
    FTP.open(host, user, passwd){ ... }
}

Here's my possibly naive implementation:

require 'timeout'

module Kernel
    # Try the block up to 'tries' times, sleeping 'interval' seconds
    # between attempts. If 'timeout' is given, each attempt is wrapped
    # in Timeout.timeout.
    def attempt(tries = 3, interval = 60, timeout = nil)
       begin
          if timeout
             Timeout.timeout(timeout){ yield }
          else
             yield
          end
       rescue Exception # a bare rescue would miss Timeout::Error on 1.8
          tries -= 1
          if tries > 0
             sleep interval
             retry
          end
          raise
       end
    end
end

What do you think? Useful? Are there any gotchas I need to consider, such as nested begin/end blocks or catch/throw? Anything else? Should I add a way to emit debug info? Finer-grained error handling?

Ideas welcome.

Thanks,

Dan

We had a bug in a system that did something like this, so it failed
literally 99 times out of 100.

Since the retry was fast, we only noticed the bug when I went hunting
another bug, inserted logging statements everywhere, and found the
retry/fail loop producing a massive stream of "BLAH failed, retrying"
messages.

Fixed that bug and suddenly the system was a lot faster and more stable...

Moral of the Story:

   Unlogged/unreported retries mask bugs; always log/report the number of
   retries.

John Carter                     Phone : (64)(3) 358 6639
Tait Electronics                Fax   : (64)(3) 359 4632
PO Box 1645 Christchurch        Email : john.carter@tait.co.nz
New Zealand

Carter's Clarification of Murphy's Law.

"Things only ever go right so that they may go more spectacularly wrong later."

From this principle, all of life and physics may be deduced.

···

On Fri, 9 Jun 2006, Daniel Berger wrote:

The following is incredibly nitpicky, I admit, but I figure I may as
well mention it. The line

Timeout.timeout(timeout){ yield }

Is it just me, or is that a lot of "timeout"? It hurts a stranger's
understanding of the code. Why not rename the passed-in timeout
variable to user_timeout, or anything else that isn't just 'timeout'?
- kate = masukom

···

On 6/8/06, Daniel Berger <djberg96@gmail.com> wrote:


John Carter wrote:

Moral of the Story:

  Unlogged/unreported retries mask bugs; always log/report the number of
  retries.

Yes, that is a potential issue. It occurred to me that errors that would normally be ignored could, and probably should, be emitted as warnings. That way, if there's an obvious problem with your code, you'll see it right away, assuming you're running from the command line (or have some other way of monitoring stderr).
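
Something like this, with the warn call being the only change from the implementation above:

require 'timeout'

module Kernel
    def attempt(tries = 3, interval = 60, timeout = nil)
       begin
          timeout ? Timeout.timeout(timeout){ yield } : yield
       rescue Exception => err
          tries -= 1
          if tries > 0
             # report every retry so failures can't hide
             warn "attempt: #{err.class}: #{err.message} (#{tries} tries left)"
             sleep interval
             retry
          end
          raise
       end
    end
end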

Regards,

Dan

···

On Fri, 9 Jun 2006, Daniel Berger wrote:

John Carter wrote:

Moral of the Story:

   Unlogged/unreported retries mask bugs; always log/report the number of
   retries.

Also, beware of retries at multiple levels of a protocol stack. I've
heard stories of a system that retried the lowest level of a protocol 3
times with a 30-second timeout (90 seconds total). The next layer up
added its own 3 tries (now we're at 4 1/2 minutes before a timeout
failure). The next several layers also did retries, with the end result
taking *hours* to time out.
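
To see how fast this compounds: with 3 tries per layer, each layer
multiplies the worst-case delay by 3. A quick illustration (not from the
original post):

# worst-case delay when every layer retries 3 times over a 30-second base
base = 30 # seconds
(1..5).each do |layer|
  printf "layer %d: %.1f minutes\n", layer, base * 3**layer / 60.0
end
# layer 1: 1.5 ... layer 5: 121.5 minutes -- hours, as described above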

Moral of story: Don't add retries indiscriminately.

-- Jim Weirich


kate rhodes wrote:

The following is incredibly nitpicky, I admit, but I figure I may as
well mention it. The line

Timeout.timeout(timeout){ yield }

Is it just me, or is that a lot of "timeout"? It hurts a stranger's
understanding of the code. Why not rename the passed-in timeout
variable to user_timeout, or anything else that isn't just 'timeout'?

- kate = masukom

Heh, I suppose it might be. I could change that.

I remember, back in the 1.6.x days, when "timeout" was a top-level method and I had a variable called "timeout" in my code. That took a while to track down. :)
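
Something like this, say (just renaming the timeout parameter, everything else as before):

require 'timeout'

module Kernel
    def attempt(tries = 3, interval = 60, max_time = nil)
       begin
          max_time ? Timeout.timeout(max_time){ yield } : yield
       rescue Exception
          tries -= 1
          if tries > 0
             sleep interval
             retry
          end
          raise
       end
    end
end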

Regards,

Dan

Jim Weirich wrote:

Also, beware of retries at multiple levels of a protocol stack. I've heard stories of a system that retried the lowest level of a protocol 3 times with a 30-second timeout (90 seconds total). The next layer up added its own 3 tries (now we're at 4 1/2 minutes before a timeout failure). The next several layers also did retries, with the end result taking *hours* to time out.

Moral of story: Don't add retries indiscriminately.

-- Jim Weirich

Yep, definitely something to watch out for. What can I say? Use with caution. :)

- Dan

for what it's worth, i have my own version of attempt in a few near-real-time
systems where the overriding principle is: keep going at all costs. in these
systems the 'fail big and fail early' principle doesn't work unless one enjoys
working on sundays - so i've got lots of stuff like attempt. it all logs to
stderr and/or log files, however, so it doesn't go unnoticed.

on another note, i've found that an incremental sleep increase with reset is
almost always what you want. retrying on the same interval seems to clog up
systems as you get into certain timing rhythms. in rq i use this a lot:

http://codeforpeople.com/lib/ruby/rq/rq-2.3.3/lib/rq-2.3.3/sleepcycle.rb

it's a cycle that looks like a sawtooth wave - on each retry we sleep longer
than before, essentially becoming more and more 'patient' before getting
really 'impatient' again.

i've found this matches the real world pretty well, since timing out a bunch
in a short period normally means you should wait longer.
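
a minimal sketch of the idea - the class and numbers below are illustrative,
not the actual sleepcycle.rb api:

# a sawtooth sleep cycle: each call to next returns a longer interval,
# wrapping back to the minimum once the ceiling is passed
class SawtoothCycle
  def initialize(min = 1, max = 30, step = 5)
    @min, @max, @step = min, max, step
    @interval = min
  end

  def next
    current = @interval
    @interval += @step
    @interval = @min if @interval > @max   # reset - 'impatient' again
    current
  end
end

cycle = SawtoothCycle.new
# sleep cycle.next before each retry => 1, 6, 11, 16, 21, 26, 1, 6, ...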

cheers.

-a

···

On Fri, 9 Jun 2006, Daniel Berger wrote:

Yep, definitely something to watch out for. What can I say? Use with
caution. :)

--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dalai lama

Hm, interesting. Maybe a more advanced version would use a full-fledged class with lots of options. Something like this:

attempt = Attempt.new{ |a|
    a.tries = 3          # Try 3 times
    a.interval = 30      # 30 seconds between tries, but...
    a.max = 90           # ...give up after 90 seconds total, in case of nested retries
    a.increment = 10     # Add 10 seconds to the interval with each try
    a.log = log          # Where 'log' is an IO handle
    a.warnings = $stderr # Send caught errors to an IO handle as warnings
}

attempt{
    # some op
}

Attempt#max would, in theory, be used to prevent Jim Weirich's nightmare scenario, where you have a bunch of nested retries, all doing their own sleep + retry thing.

So, using the above example, if I did something like this:

attempt{
    begin
       # some op
    rescue
       sleep 500
       retry
    end
}

It would error out at 90 seconds no matter what (the value we set to 'max'). I'm not sure if that's possible, however, or even how you would implement it. Thoughts?

- Dan

···

ara.t.howard@noaa.gov wrote:


something like:

   # sketch only - synchronize(:SH)/(:EX) stand in for shared/exclusive
   # locking around the @done flag

   def done
     synchronize(:SH){ @done }
   end

   def done=(d)
     synchronize(:EX){ @done = d }
   end

   # watchdog thread: raise MaxError in the calling thread once 'max'
   # seconds elapse, unless the attempt already finished
   def ensure_max!
     @max ||= Thread.new(max, Thread.current) do |m, c|
       sleep m
       c.raise MaxError unless done
     end
   end

   def attempt
     ...
   ensure
     @max.kill
   end

or something like that ;)

-a
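
for the curious, the same watchdog idea as a self-contained class using just
Mutex and Thread from the stdlib - class and method names here are
illustrative, not real code from any of the above:

require 'thread'

class Attempt
  class MaxError < StandardError; end

  def initialize(max)
    @max  = max        # hard ceiling, in seconds
    @lock = Mutex.new
    @done = false
  end

  def done?
    @lock.synchronize { @done }
  end

  def done=(flag)
    @lock.synchronize { @done = flag }
  end

  # run the block; if it is still going after @max seconds, a watchdog
  # thread raises MaxError in the calling thread
  def attempt
    caller_thread = Thread.current
    watchdog = Thread.new do
      sleep @max
      caller_thread.raise MaxError unless done?
    end
    result = yield
    self.done = true
    result
  ensure
    watchdog.kill if watchdog
  end
end

# Attempt.new(90).attempt { some_slow_op } # raises MaxError after 90 seconds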

···

On Sat, 10 Jun 2006, Daniel Berger wrote:

It would error out at 90 seconds no matter what (the value we set to 'max'). I'm not sure if that's possible, however, or even how you would implement it. Thoughts?


Hm... this has potential. I might be asking you for some help in the future.

Thanks,

Dan

···

ara.t.howard@noaa.gov wrote:


sure thing dan. just ping me offline.

cheers.

-a

···

On Sat, 10 Jun 2006, Daniel Berger wrote:

Hm... this has potential. I might be asking you for some help in the future.
