Ruby from command line timing out?

Jason_N.Perkins · 9 January 2005 01:08

I'm running a script from the command line that's going to take a couple of hours to complete. Between 15 and 20 minutes into its run, the script dies with an "execution expired" error (Timeout::Error). Is there an environment variable that I should be looking at modifying? The error message in its entirety is:

/usr/local/lib/ruby/1.8/timeout.rb:42:in `new': execution expired (Timeout::Error)
         from ./spider.rb:6334:in `join'
         from ./spider.rb:6334
         from ./spider.rb:6334:in `each'
         from ./spider.rb:6334

--
Jason N Perkins
<http://sneer.org/>

Francis_Hwang1 · 9 January 2005 01:14

Is it safe to guess, based on the name of the script, that it spiders web pages? If that's the case, Timeout::Errors are going to happen quite frequently, whenever a particular web page loads too slowly.
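
To illustrate where that error comes from: Timeout.timeout from timeout.rb raises Timeout::Error, with the message "execution expired", when its block runs past the limit. The sketch below is hypothetical: slow_fetch just sleeps to stand in for a slow page load, and the limits are arbitrary.

```ruby
require 'timeout'

# Hypothetical stand-in for a slow page load: it just sleeps.
def slow_fetch(seconds)
  sleep seconds
  "page body"
end

# Give each page at most `limit` seconds. A slow page raises Timeout::Error,
# which is rescued here so a spider could move on to the next URL.
def fetch_with_limit(seconds, limit)
  Timeout.timeout(limit) { slow_fetch(seconds) }
rescue Timeout::Error => e
  "skipped (#{e.message})"
end

puts fetch_with_limit(0.01, 1)   # fast page finishes normally
puts fetch_with_limit(2, 0.1)    # slow page is cut off
```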

Francis Hwang

Jason_N.Perkins · 9 January 2005 01:19

I'm already catching those errors with a 'rescue', no problem. This seems to be specific to the script itself.

Bill_Atkins · 9 January 2005 01:21

Can you post the code?

Bill

--
$stdout.sync = true
"Just another Ruby hacker.".each_byte do |b|
('a'..'z').step do|c|print c+"\b";sleep 0.007 end;print b.chr
end; print "\n"

Jason_N.Perkins · 9 January 2005 01:29

Sure. The blogs variable is an array of blog URLs; I intend to eventually store these URLs in MySQL, but for now an array works. I emptied that array so that the sites in it aren't getting hit by too many people trying to help out. The threading is derived from a sample in "Programming Ruby." I'd love any additional feedback beyond dealing with the timeout issue.

#! /usr/local/bin/ruby -w

require 'open-uri'
require 'thread'

blogs = [ ]

buffer = Queue.new

# load the blogs into the queue
blogs.each do |blog|
  buffer.enq( blog )
end

consumers = (1..150).map do |i|
  Thread.new("consumer #{i}") do |name|
    loop do
      blog = buffer.deq
      # compare the dequeued item (not the queue) against the sentinel
      break if blog == :END_OF_WORK
      open( blog ) do |page|
        begin
          metas = page.read.scan( /<meta([^>]*)>/m ).uniq
          metas.each do |current_meta|
            # scan returns one array per match; take the single capture
            current_meta = current_meta.first

            if current_meta =~ /\s+name\s*=\s*[\"']([^\"']+)[\"']/
              meta_name = $1
              current_meta =~ /\s+content\s*=\s*[\"']([^\"']+)[\"']/
              content = $1

              case meta_name
              when "geo.position", "ICBM"
                print "#{blog} \t #{content} \n"
              end
            end
          end
        rescue Exception
          puts "#{blog}: #{$!}"
        end
      end
    end
  end
end

begin
  consumers.size.times{ buffer.enq(:END_OF_WORK) }
  consumers.each{|th| th.join}
rescue Exception
  print $!
end
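
One thing worth noticing in the code above: the rescue sits inside the open block, so an error raised by open itself (for example, a connection that times out) is not caught there; it ends the thread, and join later re-raises it. Below is a minimal sketch of a consumer loop where the rescue wraps the fetch itself. FETCH and run_consumers are hypothetical names; FETCH simulates open-uri offline by raising Timeout::Error for any URL containing "slow".

```ruby
require 'thread'
require 'timeout'

# Hypothetical fetcher standing in for open(url) { |f| f.read };
# URLs containing "slow" simulate a connection that times out.
FETCH = lambda do |url|
  raise Timeout::Error, "execution expired" if url.include?("slow")
  '<meta name="ICBM" content="1.0, 2.0">'
end

def run_consumers(urls, workers, fetch)
  queue   = Queue.new
  results = Queue.new
  urls.each { |u| queue << u }
  workers.times { queue << :END_OF_WORK }   # one sentinel per worker

  threads = Array.new(workers) do
    Thread.new do
      loop do
        url = queue.pop
        break if url == :END_OF_WORK        # compare the item, not the queue
        begin
          # The rescue wraps the fetch itself, so errors raised while
          # connecting are recorded here instead of killing the thread.
          results << [url, fetch.call(url)]
        rescue Timeout::Error, StandardError => e
          results << [url, "error: #{e.message}"]
        end
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

out = run_consumers(%w[http://a.example/ http://slow.example/], 3, FETCH)
out.sort.each { |url, body| puts "#{url}\t#{body}" }
```

Timeout::Error is listed explicitly so it is caught regardless of where it sits in a given Ruby version's exception hierarchy.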

Francis_Hwang1 · 9 January 2005 15:33

Jason,

Is the line 6334 that shows up in the traceback this line:

consumers.each{|th| th.join}

And one tip, which may not have anything to do with this problem but might make your code easier to understand and/or debug: Since threading is so bloody difficult, I try to make it affect as little of the program as possible. In a case like your code, for example, I would've let the threaded part simply handle the loading of the web pages, but let the parsing happen afterward when all the threads have been joined again. This is how FeedBlender (http://feedblender.rubyforge.org/) does it, so that way if there's a bug I can figure out if it's because of the threading or not.
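
A minimal sketch of that split, assuming nothing about how FeedBlender is actually written: threads only download, and the meta-tag parsing runs in a single thread after every join. PAGES, fetch_all and geo_tags are hypothetical; PAGES is an in-memory stand-in for the web so the sketch runs offline, where a real spider would call open-uri inside each thread.

```ruby
require 'thread'

# Hypothetical in-memory pages standing in for open-uri.
PAGES = {
  "http://a.example/" => '<meta name="ICBM" content="50.1, -1.2">',
  "http://b.example/" => '<meta name="geo.position" content="40.7;-74.0">',
  "http://c.example/" => '<meta name="author" content="someone">'
}

# Phase 1 (threaded): only download, collecting raw bodies.
def fetch_all(urls)
  bodies = {}
  lock = Mutex.new
  urls.map do |url|
    Thread.new do
      body = PAGES.fetch(url)
      lock.synchronize { bodies[url] = body }
    end
  end.each(&:join)
  bodies
end

# Phase 2 (single-threaded): parse once every thread has been joined.
def geo_tags(bodies)
  found = {}
  bodies.each do |url, html|
    html.scan(/<meta([^>]*)>/m).flatten.each do |attrs|
      name    = attrs[/name\s*=\s*["']([^"']+)["']/, 1]
      content = attrs[/content\s*=\s*["']([^"']+)["']/, 1]
      found[url] = content if %w[geo.position ICBM].include?(name)
    end
  end
  found
end

geo_tags(fetch_all(PAGES.keys)).sort.each { |url, pos| puts "#{url}\t#{pos}" }
```

If the parsing misbehaves, it can be tested on its own by calling geo_tags with a hand-built hash, with no threads involved.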

Francis Hwang

Carlos · 9 January 2005 16:49

["Jason N.Perkins" <jperkins@sneer.org>, 2005-01-09 02.29 CET]

begin
  consumers.size.times{ buffer.enq(:END_OF_WORK) }
  consumers.each{|th| th.join}
rescue Exception
  print $!
end

I think that when the thread being "joined" raises a timeout error, the
program will finish and the other threads won't be joined. Maybe you should
put the begin...rescue around the join (inside the each).

Hope this helps. Good luck.
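
A minimal sketch of Carlos's suggestion, with a begin...rescue around each join so that one failed thread doesn't stop the rest from being joined. The three workers are hypothetical, and the second one raises Timeout::Error to imitate a spider thread whose connection timed out. Thread#value is used because it joins the thread, returns the block's result, and re-raises any exception that ended the thread, just as join does.

```ruby
require 'timeout'

# Newer Rubies print a warning when a thread dies; silence it for the demo.
Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)

# Three hypothetical workers; worker 2 dies the way a timed-out spider would.
threads = (1..3).map do |i|
  Thread.new(i) do |n|
    raise Timeout::Error, "execution expired" if n == 2
    "worker #{n} done"
  end
end

# Rescue around each join: a failed thread no longer aborts the loop,
# so the remaining threads are still joined and their results collected.
results = threads.map do |th|
  begin
    th.value
  rescue Timeout::Error => e
    "worker failed: #{e.message}"
  end
end

puts results
```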

Jason_N.Perkins · 9 January 2005 17:42

> Jason,
>
> Is the line 6334 that shows up in the traceback this line:
>
> consumers.each{|th| th.join}

Yeah, that's the line that's timing out, which is why I was wondering whether there's a global timeout value for the script that I can either raise or turn off completely.

> And one tip, which may not have anything to do with this problem but might make your code easier to understand and/or debug: Since threading is so bloody difficult, I try to make it affect as little of the program as possible. In a case like your code, for example, I would've let the threaded part simply handle the loading of the web pages, but let the parsing happen afterward when all the threads have been joined again. This is how FeedBlender (http://feedblender.rubyforge.org/) does it, so that way if there's a bug I can figure out if it's because of the threading or not.

OK, I'll give that a try. Thanks, Francis!

Eric_Hodel1 · 10 January 2005 18:26

Timeout::Error comes from timeout.rb.

Your Timeout::Error probably comes out of Net::HTTP; open-uri doesn't require timeout and has no timeout blocks.

Try Thread.abort_on_exception = true at the top of your script, and remove the begin/end block inside the thread.
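
A minimal sketch of what that setting changes: with Thread.abort_on_exception = true, an exception that kills any thread is re-raised in the main thread right away instead of waiting until join. The worker thread here is hypothetical and simply raises Timeout::Error; the flag is reset afterward so the demo does not affect other code.

```ruby
require 'timeout'

# Newer Rubies also print a warning when a thread dies; silence it for the demo.
Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)
Thread.abort_on_exception = true

begin
  Thread.new { raise Timeout::Error, "execution expired" }
  sleep 1   # the main thread is busy elsewhere; the error arrives here
  outcome = "no error seen"
rescue Timeout::Error => e
  outcome = "main thread saw: #{e.message}"
ensure
  Thread.abort_on_exception = false
end

puts outcome
```

Without the flag, the same error would only surface when the main thread reached join, which matches the traceback pointing at the join line.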

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04