Why doesn't this code work as designed?

Hi,

I found this web crawler code online; it uses mechanize:

require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get('http://example.com/')

stack = page.links
counter = 0

out = File.open("out.txt", "w")
while l = stack.pop
  begin
    # only follow links on the same host as the start page
    next unless l.uri.host == agent.history.first.uri.host
    unless agent.visited? l.href
      counter += 1
      out.puts l.href
      stack.push(*(agent.click(l).links))
    end
  rescue
    #puts "Error encountered"
  end
end
out.close

puts "Total unique links: " + counter.to_s

So I gave it a try, and although it seemed to work, the stack size grew
very quickly, and on examining the output I found many duplicates (for
example, one output file had over 50k URLs, but after removing
duplicates only a bit over 9k remained). So I modified the code to use a
Hash to avoid duplicates (even though this design means I am storing
multiple copies of all the URLs), but the same thing happened. I was
wondering if anyone could figure out what I am doing wrong. Here is the
modified code:

require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get('http://example.com/')

stack = page.links
hash = Hash.new
counter = 0

out = File.open("out.txt", "w")
while l = stack.pop
  begin
    next unless l.uri.host == agent.history.first.uri.host
    unless agent.visited? l.href
      counter += 1
      out.puts "url:1 " + l.href
      agent.click(l).links.each do |link|
        if hash[link].nil?
          hash.store(link, link)
          stack.push(link)
        end
      end
      #stack.push(*(agent.click(l).links))
    end
  rescue
    #puts "Error encountered"
  end
end
out.close

puts "Total unique links: " + counter.to_s
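To illustrate the kind of deduplication I was aiming for, here is a
minimal standalone sketch (plain Ruby, no Mechanize; `LinkLike` is a
made-up stand-in for a link object) showing the difference between
keying a Hash on link objects versus on their URL strings:

```ruby
# LinkLike is a hypothetical stand-in for a link object; the name and
# class are made up for this sketch.
class LinkLike
  attr_reader :href
  def initialize(href)
    @href = href
  end
end

# Two distinct objects wrapping the same URL.
a = LinkLike.new("http://example.com/about")
b = LinkLike.new("http://example.com/about")

# Keyed on the objects themselves: the default Hash key equality is
# object identity, so both entries are stored and the duplicate slips
# through.
by_object = {}
[a, b].each { |link| by_object[link] ||= true }
puts by_object.size   # => 2

# Keyed on the URL string: String keys compare by content, so the
# second link is recognized as a duplicate.
by_href = {}
[a, b].each { |link| by_href[link.href] ||= true }
puts by_href.size     # => 1
```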

Note: I am aware that crawling sites at random is not acceptable, and
this script is not intended for that; I am only crawling personal sites.


--
Posted via http://www.ruby-forum.com/.