Trying to download files using WWW::Mechanize

Berger_Daniel1 · 30 May 2006 18:16

Hi all,

Ruby 1.8.4
www-mechanize 0.4.5

I've got a web page. On that page are a series of links to .csv files.
I need a way to download a particular csv file. This file can either be
loaded into memory or onto the local filesystem - either way is fine.

I've gotten this far:

require 'mechanize'
include WWW

mech = Mechanize.new
page = mech.get(url)

page.links.each{ |link|
  p link
}

With that, I can see the links to the .csv files, which look like this
on inspection:

#<WWW:0x33945a0 @node=<a href='foo_May_29_2006.csv'> ... </>,
@text="foo_May_29_..>", @href="foo_May_29_2006.csv">
#<WWW:0x3393898 @node=<a href='foo_May_30_2006.csv'> ... </>,
@text="foo_May_30_..>", @href="foo_May_30_2006.csv">

How do I grab a particular file and load it into memory or onto the
local filesystem? I tried using the 'text' method (based on the
examples file) but that didn't seem to work for me.

Thanks,

Dan


Aaron_Patterson2 · 30 May 2006 18:39

Try something like this:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get(ARGV[0])

bodies = []
page.links.each { |link|
  puts "Clicking '#{link.text}'"
  bodies << agent.click(link).body
}

p bodies

Or even shorter:

agent = WWW::Mechanize.new

bodies = []
agent.get(ARGV[0]).links.each { |link|
  bodies << agent.click(link).body
}

p bodies

--Aaron


James_Britt4 · 30 May 2006 18:47

Berger, Daniel wrote:

> ...

> With that, I can see the links to the .csv files, which look like this
> on inspection:
>
> #<WWW:0x33945a0 @node=<a href='foo_May_29_2006.csv'> ... </>,
> @text="foo_May_29_..>", @href="foo_May_29_2006.csv">
> #<WWW:0x3393898 @node=<a href='foo_May_30_2006.csv'> ... </>,
> @text="foo_May_30_..>", @href="foo_May_30_2006.csv">
>
> How do I grab a particular file and load it into memory or onto the
> local filesystem? I tried using the 'text' method (based on the
> examples file) but that didn't seem to work for me.

I'm thinking you need to grab the href value, glom it onto the base URL, and use that with, say, open-uri, to fetch it.

page.links.each{ |link|
    if link.href =~ /\.csv$/
      full_url = url + link.href
      # Go read that URL ...
    end
}
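
One caveat with plain concatenation: it only works when url ends at the directory that holds the files. A minimal sketch of resolving the href against the page URL with the standard uri library instead. The example.com address and the resolve_link helper name are made up for illustration; the href is the one from the inspect output quoted above.

```ruby
require 'uri'

# Hypothetical helper: resolve a link's href against the URL of the page
# it was found on. Handles relative and absolute hrefs alike.
def resolve_link(page_url, href)
  URI.join(page_url, href).to_s
end

# Assumed page URL, for illustration only.
page_url = 'http://example.com/reports/index.html'

full_url = resolve_link(page_url, 'foo_May_29_2006.csv')
puts full_url

# Fetching it with open-uri would then look roughly like this
# (not run here, since it needs a real server):
#   require 'open-uri'
#   data = URI.open(full_url).read   # on older Rubies: open(full_url).read
```

With the URL resolved this way, the same code should work whether the page links to the files relatively or with full URLs.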


--
James Britt

"In Ruby, no one cares who your parents were, all they care
about is if you know what you are talking about."
- Logan Capaldo

Aaron_Patterson2 · 30 May 2006 19:00

Hey Dan.

> How do I grab a particular file and load it into memory or onto the
> local filesystem? I tried using the 'text' method (based on the
> examples file) but that didn't seem to work for me.

I misread the question the first time, so I'll try again! The text
method matches links by their displayed text. For example, a link that
looks like this:

<a href="http://google.com">Hello World!</a>

Would be found like this:

page.links.text('Hello World!').first

Mechanize returns an array because there could be multiple links that
have that text. You can also use a regular expression like this:

page.links.text(/Hello World!/).first

Or, say you need to find all files whose 'href' ends in '.csv', you
could do this:

  page.links.href(/\.csv$/).each { |link|
    puts agent.click(link).body
  }
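
The loop above only prints each body. For the "onto the local filesystem" half of the question, a minimal sketch of writing a body to disk follows. The save_body helper is hypothetical, and the sample string stands in for agent.click(link).body so the snippet runs without a network connection. It assumes the href has no query string, since the file name is taken from its last path segment.

```ruby
require 'tmpdir'

# Hypothetical helper: write a downloaded body into dir, naming the file
# after the last path segment of the link's href. Returns the path written.
def save_body(dir, href, body)
  path = File.join(dir, File.basename(href))
  # Binary mode so the bytes are written unchanged on every platform.
  File.open(path, 'wb') { |f| f.write(body) }
  path
end

# Stand-in for agent.click(link).body (made-up sample data).
body = "date,value\n2006-05-29,42\n"

# A temporary directory keeps the demo from touching the working directory.
Dir.mktmpdir do |dir|
  path = save_body(dir, 'foo_May_29_2006.csv', body)
  puts "#{File.basename(path)}: #{File.size(path)} bytes"
end
```

Inside the loop above, the call would presumably be something like save_body('.', link.href, agent.click(link).body).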

Hope this helps!

--Aaron
