Odd result when attempting to use Mechanize in parallel with Threads

I wrote a simple tool to iterate a network to try and find web servers
running on specific ports. We have a lot of devices & software with a
web UI, and I thought that this would be a handy way to find them,
and even tell what they are.

I thought this would be a handy coding project too, and a good way to
cut my teeth on Ruby threads, and build up some usage with
Mechanize.

BTW I am running this on *Windows XP*.

However my code is quite obviously executing this serially. Is there
something obviously wrong with my code below? (results after
code snippet). I am aware this could make my machine choke from
thread overkill, but I wanted to get it working in parallel first.
Perhaps Mechanize instances have some shared elements?

···

============================
require 'mechanize'

threads = Array.new
puts "sweep of 153.200.72.* segment http ports"
(1..254).each do |ran|
  threads << Thread.new(ran) { |r|
    agent = WWW::Mechanize.new
    agent.user_agent_alias = 'Windows Mozilla'
    ports = [80,8080]
    ports.each do |p|
      begin
        page = agent.get("http://153.200.72."+r.to_s+":"+p.to_s)
        puts "153.200.72."+r.to_s+":"+p.to_s+" - "+page.title
      rescue
        puts "153.200.72."+r.to_s+":"+p.to_s+" - NOTHING"
      end
    end
  }
  threads.each { |aThread| aThread.join }
end

153.200.72.10:80 - NOTHING
153.200.72.10:8080 - NOTHING
153.200.72.11:80 - NOTHING
153.200.72.11:8080 - NOTHING
153.200.72.12:80 - NOTHING
153.200.72.12:8080 - NOTHING
153.200.72.13:80 - NOTHING
153.200.72.13:8080 - NOTHING
153.200.72.14:80 - NOTHING
153.200.72.14:8080 - NOTHING
153.200.72.15:80 - NOTHING
153.200.72.15:8080 - NOTHING
153.200.72.16:80 - NOTHING
153.200.72.16:8080 - NOTHING
153.200.72.17:80 - NOTHING
153.200.72.17:8080 - NOTHING

require 'mechanize'

threads = Array.new

puts "sweep of 153.200.72.* segment http ports"

(1..254).each do |ran|
   threads << Thread.new(ran) { |r|
     agent = WWW::Mechanize.new
     agent.user_agent_alias = 'Windows Mozilla'
     ports = [80,8080]
     ports.each do |p|
       begin
         page = agent.get("http://153.200.72."+r.to_s+":"+p.to_s)
         puts "153.200.72."+r.to_s+":"+p.to_s+" - "+page.title
       rescue
         puts "153.200.72."+r.to_s+":"+p.to_s+" - NOTHING"
       end
     end
   }
end

threads.each { |aThread| aThread.join } # THIS MUST BE OUTSIDE THE LOOP!

fyi. starting a thread, and then immediately joining it is the same as not
using a thread at all!
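
To make the difference concrete, here is a minimal sketch that times both placements of the join. It does not touch the network: fake_probe and timed are hypothetical helpers invented for this illustration, and the sleep stands in for a slow HTTP request.

```ruby
# Hypothetical stand-in for a slow network probe: each call just sleeps.
def fake_probe(host)
  sleep 0.2
  host
end

# Hypothetical helper: returns the wall-clock seconds the block took.
def timed
  start = Time.now
  yield
  Time.now - start
end

# Join inside the loop: each thread is awaited before the next one starts.
serial = timed do
  threads = []
  (1..5).each do |n|
    threads << Thread.new(n) { |host| fake_probe(host) }
    threads.each { |t| t.join }
  end
end

# Join after the loop: all threads are started first, then awaited together.
parallel = timed do
  threads = []
  (1..5).each do |n|
    threads << Thread.new(n) { |host| fake_probe(host) }
  end
  threads.each { |t| t.join }
end

puts format("join inside loop:  %.2fs", serial)
puts format("join outside loop: %.2fs", parallel)
```

With the join inside the loop, the five sleeps should add up; with the join after the loop, they should overlap, so the second figure should come out close to the time of a single probe.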

another fyi - threads and io (even socket io) are a deadly combination on
windows. run this on linux/mac if possible.

regards.

-a

···

On Thu, 7 Dec 2006, Richard Conroy wrote:


--
if you want others to be happy, practice compassion.
if you want to be happy, practice compassion. -- the dalai lama

Hi, Richard,

Actually, in Ruby it is the ".new" method that makes threads run in parallel rather than serially, and I think it can meet your requirement. Please see the program <multithreads_ProbingHttp.rb> that I post at the end of this mail, together with its output.
First, please note the following points: 1) The method ".new" means "Creates and runs a new thread to execute the instructions given in block". 2) The method ".join" means "The calling thread will suspend execution and run the called thread. Does not return until the called thread exits or until limit seconds have passed".
".new" does not only create a thread; it both creates and runs it. So ".new" alone lets the child threads run in parallel. ".join" has to wait for the called thread to exit, which gives you the illusion that the threads are running serially, but in fact ".join" just wraps up the threads. It does not make sense to say that ".join" makes threads run in parallel or serially. We can say that ".join" waits, one thread after another, for the exit of threads that might already be running in parallel. :-)
//////////////////////Programs multithreads_ProbingHttp.rb/////////////////////////////////////////////////////////////////////////////////////////////////////////
require 'mechanize'
threads = Array.new
ports = [80,8080];
puts "sweep of 192.168.1.* segment http ports.\nwaiting for results:"
(40..51).each do |ip|
    add="http://192.168.1."+ip.to_s;
    ports.each do |p|
         addr=add+":"+p.to_s;
         threads << Thread.new(addr){|addr|
             agent=WWW::Mechanize.new;
             agent.user_agent_alias = "Windows Mozilla";
             begin
                 page = agent.get(addr);
                 puts addr+" - "+page.title;
             rescue
                 puts addr+" - NOTHING";
             end
         }
    end
end
sleep 10;
# If the main thread exits before the newly created threads, we might not see their output. So we let the main thread wait for
# some seconds (say, 10s), so that the newly created threads can finish first.
# I don't use the ".join" method here.
puts "finished."
//////////////////////////Running results, which show that the child threads were running in parallel rather than serially://////////
D:\BasicPjt>ruby multithreads_ProbingHttp.rb
sweep of 192.168.1.* segment http ports.
waiting for results:
http://192.168.1.51:80 - shiwei apache homepage.
http://192.168.1.44:80 - under construction
http://192.168.1.48:8080 - ScrumWorks
http://192.168.1.48:80 - Test Page for Apache Installation
http://192.168.1.43:80 - NOTHING
http://192.168.1.41:80 - NOTHING
http://192.168.1.47:80 - NOTHING
http://192.168.1.40:80 - NOTHING
http://192.168.1.41:8080 - NOTHING
http://192.168.1.40:8080 - NOTHING
http://192.168.1.49:80 - NOTHING
http://192.168.1.47:8080 - NOTHING
http://192.168.1.44:8080 - NOTHING
http://192.168.1.51:8080 - NOTHING
http://192.168.1.50:8080 - NOTHING
http://192.168.1.50:80 - NOTHING
http://192.168.1.49:8080 - NOTHING
http://192.168.1.43:8080 - NOTHING
finished.
D:\BasicPjt>
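
The fixed "sleep 10" above either waits too long or not long enough. As a hedged alternative, here is a sketch of the same sweep that collects each thread's result with Thread#value, which joins the thread and returns the last value of its block. To keep it runnable without a network, fake_title is a hypothetical stand-in for agent.get(addr).title, and the address range is shortened.

```ruby
# Hypothetical stand-in for agent.get(addr).title: it sleeps briefly and
# raises for port 8080 to imitate a host that does not answer.
def fake_title(addr)
  sleep 0.1
  raise "no answer" if addr.include?(":8080")
  "page at #{addr}"
end

ports = [80, 8080]
threads = []

(40..42).each do |ip|
  ports.each do |port|
    addr = "http://192.168.1.#{ip}:#{port}"
    # Thread.new starts the probe right away; nothing waits for it yet.
    threads << Thread.new(addr) do |a|
      begin
        "#{a} - #{fake_title(a)}"
      rescue StandardError
        "#{a} - NOTHING"
      end
    end
  end
end

# Thread#value waits for each thread in turn, so the main thread blocks
# only until the slowest probe has finished.
results = threads.map { |t| t.value }
puts results
puts "finished."
```

Printing from the main thread after collecting the values also keeps the output in creation order, instead of the order in which the threads happen to finish.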

Shiwei,
The views expressed are my own and not necessarily those of Oracle and its affiliates.

Richard Conroy wrote:

···


> threads.each { |aThread| aThread.join } # THIS MUST BE OUTSIDE THE LOOP!

<homer>*d'Oh</homer>

> fyi. starting a thread, and then immediately joining it is the same as not
> using a thread at all!

Ah yes, cutting & pasting a line too high ....

> another fyi - threads and io (even socket io) are a deadly combination on
> windows. run this on linux/mac if possible.

Has to be windows, but this isn't mission critical code - just a
development tool that may eventually post the results to a wiki or
something. I can break this up a bit so it doesn't kill my laptop later.

> regards.

Thanks. I knew it had to be a WTF.

···

On 12/7/06, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:

This is what I noticed. When I join up 5 threads at a time, the output
jumps up in batches of 5. This does slow down the algorithm, especially
if there are a lot of positive results - most of these threads are
waiting for the http connection to time out.

But I run this thing at night anyway.

As an aside, I have had difficulty getting more than ~ 5 joined threads
to work at all in windows.
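
For reference, joining 5 threads at a time could be sketched roughly as follows. This is not the code from the post: probe_stub is a hypothetical stand-in for the Mechanize request, and the range is shortened so the example runs quickly.

```ruby
# Hypothetical stand-in for the real probe; it only sleeps briefly.
def probe_stub(host)
  sleep 0.05
  "153.200.72.#{host} - NOTHING"
end

batch_size = 5
results = []

(1..12).each_slice(batch_size) do |batch|
  # Start at most batch_size threads, then wait for all of them before
  # the next batch is started.
  threads = batch.map { |host| Thread.new(host) { |h| probe_stub(h) } }
  results.concat(threads.map { |t| t.value })
end

puts results
```

Each batch takes as long as its slowest thread, which would explain output that arrives in groups of 5 and stalls while one connection is waiting to time out.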

···

On 12/12/06, Shiwei Zhang <shiwei.zhang@oracle.com> wrote:
