I'm running a script from the command line that's going to take a couple of hours to complete. Between 15 and 20 minutes into its run, the script throws an execution expired (Timeout::Error). Is there an environment variable that I should be looking at modifying? The error message in its entirety is:
/usr/local/lib/ruby/1.8/timeout.rb:42:in `new': execution expired (Timeout::Error)
from ./spider.rb:6334:in `join'
from ./spider.rb:6334
from ./spider.rb:6334:in `each'
from ./spider.rb:6334
Is it safe to guess, based on the name of the script, that it spiders web pages? If that's the case, Timeout::Errors are going to happen quite frequently, whenever a particular web page loads too slowly.
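If so, the usual pattern is to rescue the timeout around each fetch so a slow site is simply skipped. A minimal sketch (read_page is an illustrative helper, not from the script):

    require 'open-uri'
    require 'timeout'

    # Fetch one page; a timeout just means "skip this site".
    def read_page(url)
      open(url) { |f| f.read }
    rescue Timeout::Error
      warn "#{url}: timed out, skipping"
      nil
    end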
···
On Jan 8, 2005, at 8:08 PM, Jason N. Perkins wrote:
I'm catching those errors with no problem with a 'rescue'. This seems to be specific to the script itself.
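Judging from the code posted below, one likely reason the rescue doesn't help: it sits inside the open block, so a Timeout::Error raised by open() itself still kills the thread. A sketch of the difference:

    # rescue inside open: a timeout in open() itself is NOT caught
    open(blog) do |page|
      begin
        # ... parse ...
      rescue Exception
      end
    end

    # rescue around open: the timeout is caught, the thread survives
    begin
      open(blog) do |page|
        # ... parse ...
      end
    rescue Timeout::Error
      p "#{blog}: #{$!}"
    end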
···
On Jan 8, 2005, at 7:14 PM, Francis Hwang wrote:
On Sun, 9 Jan 2005 10:19:39 +0900, Jason N. Perkins <jperkins@sneer.org> wrote:
On Jan 8, 2005, at 7:14 PM, Francis Hwang wrote:
> Is it safe to guess, based on the name of the script, that it spiders
> web pages? If that's the case, Timeout::Errors are going to happen
> quite frequently as a particular web page loads too slowly.
I'm catching those errors with no problem with a 'rescue'. This seems
to be specific to the script itself.
Sure. The blogs variable is an array of blog URLs; I intend to eventually store these URLs in MySQL, but for now an array works. I emptied that array so the sites in it aren't hit by too many people trying to help out. The threading is derived from a sample in "Programming Ruby." I'd love any additional feedback outside of the timeout issue.
#! /usr/local/bin/ruby -w
require 'open-uri'
require 'thread'

# the URLs to spider; emptied before posting so nobody hammers the sites
blogs = []
buffer = Queue.new

# load the blogs into the queue
blogs.each do |blog|
  buffer.enq(blog)
end

consumers = (1..150).map do |i|
  Thread.new("consumer #{i}") do |name|
    begin
      blog = buffer.deq
      unless blog == :END_OF_WORK
        # NOTE: open() itself can raise Timeout::Error here, *outside*
        # the rescue below; the thread then dies, and Thread#join will
        # re-raise the error in the main program.
        open(blog) do |page|
          begin
            metas = page.read.scan(/<meta([^>]*)>/m).uniq
            metas.each do |current_meta|
              current_meta = current_meta.to_s
              if current_meta =~ /\s+name\s*=\s*["']([^"']+)["']/
                meta_name = $1
                current_meta =~ /\s+content\s*=\s*["']([^"']+)["']/
                meta_content = $1
                case meta_name
                when "geo.position", "ICBM"
                  print "#{blog} \t #{meta_content} \n"
                end
              end
            end
          rescue Exception
            p "#{blog}: #{$!}"
          end
        end
      end
    end until blog == :END_OF_WORK
  end
end

begin
  consumers.size.times { buffer.enq(:END_OF_WORK) }
  consumers.each { |th| th.join }
rescue Exception
  print $!
end
Is line 6334 in the traceback this line:
consumers.each { |th| th.join }
And one tip, which may not have anything to do with this problem but might make your code easier to understand and debug: since threading is so bloody difficult, I try to make it affect as little of the program as possible. In a case like your code, for example, I would've let the threaded part simply handle the loading of the web pages, and let the parsing happen afterward, once all the threads have been joined again. This is how FeedBlender (http://feedblender.rubyforge.org/) does it; that way, if there's a bug, I can figure out whether or not it's caused by the threading.
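A sketch of that shape, simplified to one thread per URL and using the same Queue the script already uses; the threads only fetch, and parsing runs single-threaded after the join (parse_metas is a hypothetical stand-in for the meta-tag scanning above):

    require 'open-uri'
    require 'thread'

    pages = Queue.new
    threads = blogs.map do |blog|
      Thread.new(blog) do |url|
        begin
          pages.enq([url, open(url) { |f| f.read }])
        rescue Exception
          # a slow or dead site is simply skipped
        end
      end
    end
    threads.each { |th| th.join }

    # all threading is over; parse single-threaded
    until pages.empty?
      url, html = pages.deq
      parse_metas(url, html)
    end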
···
On Jan 8, 2005, at 8:29 PM, Jason N. Perkins wrote:
On Jan 8, 2005, at 7:21 PM, Bill Atkins wrote:
Can you post the code?
begin
  consumers.size.times { buffer.enq(:END_OF_WORK) }
  consumers.each { |th| th.join }
rescue Exception
  print $!
end
I think that when the thread being joined raises a timeout error, the program will finish and the other threads won't be joined. Maybe you should put the begin...rescue around the join (inside the each).
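A minimal sketch of that suggestion, with the rescue moved inside the each so one dead thread doesn't abort the joins for the rest (the error report is illustrative):

    consumers.each do |th|
      begin
        th.join
      rescue Exception => e
        # a thread that died (e.g. from Timeout::Error) is reported,
        # but the remaining threads still get joined
        print "#{e.class}: #{e.message}\n"
      end
    end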
Is line 6334 in the traceback this line:
consumers.each { |th| th.join }
Yeah, that's the line that's timing out, which is why I was wondering whether there's a global timeout value for the script that I can either raise or turn off completely.
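For what it's worth, there is no environment variable for this. The Timeout::Error is raised by the HTTP layer underneath open-uri (Net::HTTP's read timeout, 60 seconds by default), it kills the worker thread it happens in, and Thread#join then re-raises it in the main program, which is why the traceback points at line 6334. As far as I can tell, open-uri in 1.8 doesn't expose that timeout as an option, so raising it means dropping down to Net::HTTP. A sketch (the 180-second figure is an illustrative choice):

    require 'net/http'
    require 'uri'

    # Hypothetical fetch that raises the per-request limit by setting
    # Net::HTTP's read_timeout directly instead of going through open-uri.
    def fetch(url)
      uri = URI.parse(url)
      http = Net::HTTP.new(uri.host, uri.port)
      http.read_timeout = 180
      path = uri.path.empty? ? '/' : uri.path
      http.start { |h| h.request(Net::HTTP::Get.new(path)).body }
    end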