Trying to download files using WWW::Mechanize

Hi all,

Ruby 1.8.4
www-mechanize 0.4.5

I've got a web page. On that page are a series of links to .csv files.
I need a way to download a particular csv file. This file can either be
loaded into memory or onto the local filesystem - either way is fine.

I've gotten this far:

require 'mechanize'
include WWW

mech = Mechanize.new
page = mech.get(url)

page.links.each{ |link|
   p link
}

With that, I can see the links to the .csv files, which look like this
on inspection:

#<WWW::link:0x33945a0 @node=<a href='foo_May_29_2006.csv'> ... </>,
@text="foo_May_29_..>", @href="foo_May_29_2006.csv">
#<WWW::link:0x3393898 @node=<a href='foo_May_30_2006.csv'> ... </>,
@text="foo_May_30_..>", @href="foo_May_30_2006.csv">

How do I grab a particular file and load it into memory or onto the
local filesystem? I tried using the 'text' method (based on the
examples file) but that didn't seem to work for me.

Thanks,

Dan

Try something like this:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get(ARGV[0])

bodies = []
page.links.each { |link|
  puts "Clicking '#{link.text}'"
  bodies << agent.click(link).body
}

p bodies

Or even shorter:

agent = WWW::Mechanize.new

bodies = []
agent.get(ARGV[0]).links.each { |link|
  bodies << agent.click(link).body
}

p bodies

--Aaron

Berger, Daniel wrote:

> ...

With that, I can see the links to the .csv files, which look like this
on inspection:

#<WWW::link:0x33945a0 @node=<a href='foo_May_29_2006.csv'> ... </>,
@text="foo_May_29_..>", @href="foo_May_29_2006.csv">
#<WWW::link:0x3393898 @node=<a href='foo_May_30_2006.csv'> ... </>,
@text="foo_May_30_..>", @href="foo_May_30_2006.csv">

How do I grab a particular file and load it into memory or onto the
local filesystem? I tried using the 'text' method (based on the
examples file) but that didn't seem to work for me.

I'm thinking you need to grab the href value, glom it onto the base URL, and use that with, say, open-uri, to fetch it.

page.links.each{ |link|
    if link.href =~ /\.csv$/
      full_url = url + link.href
      # Go read that URL ...
    end
}
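
A rough sketch of the "glom it onto the base URL" step, in case it helps. Plain string concatenation only works when url ends in a slash, so this uses URI.join from the standard library, which resolves a relative href against the page URL. The helper name csv_urls and the example.com address are made up for illustration; they are not part of Mechanize.

```ruby
require 'uri'

# Hypothetical helper: keep only the hrefs ending in .csv and resolve each
# one against the URL of the page it was found on.
def csv_urls(page_url, hrefs)
  hrefs.select { |href| href =~ /\.csv\z/ }
       .map { |href| URI.join(page_url, href).to_s }
end

# Assumed sample data, modelled on the links shown earlier in the thread.
hrefs = ['foo_May_29_2006.csv', 'foo_May_30_2006.csv', 'about.html']
puts csv_urls('http://example.com/reports/index.html', hrefs)
```

Each resulting URL could then be handed to open-uri (or to agent.get) to read the file.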

--
James Britt

"In Ruby, no one cares who your parents were, all they care
  about is if you know what you are talking about."
   - Logan Capaldo

Hey Dan.

How do I grab a particular file and load it into memory or onto the
local filesystem? I tried using the 'text' method (based on the
examples file) but that didn't seem to work for me.

I misread the question the first time, so I'll try again! The text
method matches links by the text they display. For example, a link that
looks like this:

<a href="http://google.com">Hello World!</a>

Would be found like this:

  page.links.text('Hello World!').first

Mechanize returns an array because there could be multiple links that
have that text. You can also use a regular expression like this:

  page.links.text(/Hello World!/).first

Or, say you need to find all links whose 'href' ends in '.csv', you
could do this:

  page.links.href(/\.csv$/).each { |link|
    puts agent.click(link).body
  }
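
If you'd rather have the file on disk than printed, something along these lines might work. It's only a sketch: save_body is a hypothetical helper of my own (not a Mechanize method), and it assumes the href's last path segment is a sensible file name.

```ruby
require 'fileutils'
require 'uri'

# Hypothetical helper: write a downloaded body into dir, naming the file
# after the last path segment of the link's href. Returns the path written.
def save_body(href, body, dir = '.')
  FileUtils.mkdir_p(dir)
  path = File.join(dir, File.basename(URI.parse(href).path))
  # Binary mode so the bytes land on disk unchanged on every platform.
  File.open(path, 'wb') { |f| f.write(body) }
  path
end
```

Inside the loop above you would call it as save_body(link.href, agent.click(link).body, 'downloads'). To grab one particular file, tighten the regular expression, e.g. /foo_May_29_2006\.csv$/.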

Hope this helps!

--Aaron
