"Keegan Dunn" <theweeg@gmail.com> wrote in message news:65e6c89204121310527b234a7b@mail.gmail.com...
I'm trying to write a threaded program that will run through a list of
web sites and download/process a set number of them at a
time (maintaining a pool of threads that can process page
downloads/processing). I have something simple working, but I am
unsure how to approach the "pool" of threads idea. Is that even the
way to go about processing multiple pages simultaneously? Is there a
better way?
That's probably the most efficient way. You need these ingredients:
- a thread safe queue
- a pool of processors
- a main thread that does the distribution of work
You also likely want a class or method that deals with the details of fetching the data and analysing / storing it, to keep the thread body blocks small.
# untested but you'll get the picture
require 'thread'

THREADS = 10
TERM = Object.new

queue = Queue.new
threads = []

THREADS.times do
  threads << Thread.new(queue) do |q|
    until (TERM == (url = q.deq))
      begin
        # get data from url
      rescue
        # in case of timeout try again by putting it back
        q.enq url
      end
    end
  end
end

# now read urls and distribute work
while (line = gets)
  line.chomp!
  queue.enq line
end

# write terminators
THREADS.times { queue.enq TERM }

# ... and wait for threads to terminate properly
threads.each {|t| t.join}

# exiting
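If you want to try the pattern without hitting the network, here is a minimal, self-contained variant of the same pool: the "fetch and process" step is faked by upcasing the URL string, and a second Queue collects results (all names here are made up for the demo).

```ruby
require 'thread'

# Tiny demo of the pool pattern above, no network needed:
# workers pull items from a thread safe queue until they see
# the terminator, and push "processed" items onto a result queue.
THREADS = 3
TERM = Object.new

queue   = Queue.new
results = Queue.new   # Queue is thread safe, so workers can share it

threads = (1..THREADS).map do
  Thread.new(queue) do |q|
    until (TERM == (url = q.deq))
      results.enq(url.upcase)   # stand-in for fetch + analyse + store
    end
  end
end

# distribute work, then terminators, then wait
%w[http://a.example http://b.example].each { |u| queue.enq u }
THREADS.times { queue.enq TERM }
threads.each { |t| t.join }

out = []
out << results.deq until results.empty?
p out.sort
```

Swapping the upcase line for a real HTTP fetch gives you the program sketched above.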
Also, how can I deal with a "socket read timeout" error? I have the
http get call wrapped in a begin...rescue...end block, but it doesn't
seem to be catching it. Here is the code in question:
def getHTTP(site)
  siteHost = site.gsub(/http:\/\//,'').gsub(/\/.*/,'')
  begin
    masterSite = Net::HTTP.new(siteHost, 80)
    siteURL = "/" + site.gsub(/http:\/\//,'').gsub(siteHost,'')
    resp, data = masterSite.get2(siteURL, nil)
    return data
  rescue
    return "-999"
  end
end
You'll likely need to catch another exception: a bare "rescue" only catches StandardError and its subclasses, and the timeout error raised inside Net::HTTP doesn't have to be one of those. Try "rescue Exception => e", print e's class, and then rescue that class explicitly.
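You can see the diagnostic in miniature without any network at all, using the timeout library directly (this is a sketch of the debugging step, not of your fetch code):

```ruby
require 'timeout'

# Force a timeout, catch everything, and look at the class that
# actually arrives -- that is the class to rescue explicitly in
# the real fetch code.
begin
  Timeout.timeout(0.01) { sleep 1 }
rescue Exception => e
  p e.class   # inspect the real exception class
end
```

Once you know the class, replace the broad "rescue Exception" with a rescue for exactly that class, so you don't accidentally swallow things like SignalException.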
Sorry about the two-for-one question
You get one answer for free. 
Kind regards
robert