Reading Data from a Website

anon1m0us · 18 January 2007 20:25

Hi;
No clue how to do this. I need my program to go to a website, read data and
process it. I don't know where to even begin! How do I go to a website in
Ruby? How do I start reading the data?

Andy_Lester · 18 January 2007 20:32

Look at WWW::Mechanize.

--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance

Peter_Szinek3 · 18 January 2007 21:34

You could check out my older (but still fine I guess) article on this:

http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails

It could use some polishing, and Hpricot should be added to it (I'm working on that, actually), but even as it is, it could provide some help.

btw. I am about to release (in 2-4 days or so) a powerful web extraction language written in Ruby. It is based on Mechanize and Hpricot and it really does a lot of the heavy lifting (although I may be a little biased, for obvious reasons; well, you will see for yourself next week).

Peter

__
http://www.rubyrailways.com

Martin_Boese · 19 January 2007 13:01

It's very easy, just do:

require 'net/http'
website = Net::HTTP.get 'www.yahoo.com', '/'

Now the variable website holds the source code of the yahoo.com start page. To see it:

puts website

The Net::HTTP documentation has more examples:

http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/index.html
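One caveat with the two-liner above: Net::HTTP.get hands back whatever body the server sends, so if the site answers with a redirect (many sites now send plain http:// requests to https://), you get a short "moved" page instead of the content. A minimal sketch of a more defensive fetch, assuming a hypothetical helper named fetch and an arbitrary limit of 5 redirects, built on Net::HTTP.get_response:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch the body of +url+, following up to +limit+
# redirects. A minimal sketch with no retries, timeouts or cookie handling.
def fetch(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  uri = URI(url)
  # get_response takes host, port and (for https://) SSL from the URI
  response = Net::HTTP.get_response(uri)

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URI
    fetch(uri.merge(response['location']).to_s, limit - 1)
  else
    # raises a Net::HTTP exception describing the status code
    response.error!
  end
end
```

Usage would then be puts fetch('http://www.yahoo.com/'). Raising on non-2xx statuses keeps an error page from being silently treated as real data.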

Martin

alex_f_il · 21 January 2007 01:40

You can try SWExplorerAutomation SWEA (http://webiussoft.com)

anon1m0us · 18 January 2007 20:45

Is that a website? Where do I find that stuff?
In addition, I need to view the source of the website, since the
information is contained in tables on the page.
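Since the data sits in HTML tables, here is a deliberately naive, standard-library-only sketch of pulling the cell text out of the page source once it has been fetched. The table_rows name is hypothetical, the regular expressions assume simple, non-nested tables, and entities such as &amp; are left undecoded; for real pages an HTML parser (Hpricot, as suggested in this thread, or Nokogiri, which has since largely replaced it) is the more robust choice.

```ruby
# Hypothetical helper: returns an array of rows, each an array of cell strings.
# Naive regex approach: assumes simple, non-nested tables.
def table_rows(html)
  html.scan(%r{<tr[^>]*>(.*?)</tr>}mi).map do |(row)|
    row.scan(%r{<t[dh][^>]*>(.*?)</t[dh]>}mi).map do |(cell)|
      # drop inner tags such as <b> or <a>, then trim surrounding whitespace
      cell.gsub(/<[^>]+>/, '').strip
    end
  end
end

html = <<~HTML
  <table>
    <tr><th>Name</th><th>Price</th></tr>
    <tr><td> Apple </td><td><b>1.50</b></td></tr>
  </table>
HTML

# prints "Name | Price" and then "Apple | 1.50"
table_rows(html).each { |row| puts row.join(' | ') }
```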

Andy Lester wrote:

Look at WWW::Mechanize.

Guest15 · 18 January 2007 20:52

On 1/18/07, Andy Lester <andy@petdance.com> wrote:

> Look at WWW::Mechanize.

Or Hpricot ...

--
thanks,
-pate
-------------------------

Gavin_Baker · 19 January 2007 14:56

On 18 Jan 2007, at 21:34, Peter Szinek wrote:

> <snip>
>
> btw. I am about to release (in 2-4 days or so) a powerful web extraction language written in Ruby. It is based on Mechanize and Hpricot and it really does a lot of the heavy lifting (although I may be a little biased, for obvious reasons; well, you will see for yourself next week).

After finding your article on screen scraping *very* useful, I'm really looking forward to this!

Gav

Sam_Smoot · 19 January 2007 15:35

Martin Boese wrote:

> It's very easy, just do:
>
> require 'net/http'
> website = Net::HTTP.get 'www.yahoo.com', '/'
>
> Now the variable website holds the source code of the yahoo.com start page. To see it:
>
> puts website
>
> The Net::HTTP documentation has more examples:
>
> http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/index.html
>
> Martin

Great example, but because I'm lazy, I prefer open-uri:

require 'open-uri'
# URI.open replaces the bare Kernel#open form, which stopped fetching URLs in Ruby 3.0
puts URI.open('http://www.yahoo.com').read

Probably better to get familiar with Net::HTTP, but when that gets
old...
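As a small extension of the open-uri one-liner (an illustrative sketch, not from the thread): with a block, URI.open yields an IO that also carries response metadata through OpenURI::Meta, such as status and content_type, and a non-2xx response raises OpenURI::HTTPError. The fetch_with_meta name and the hash it returns are made up for this example.

```ruby
require 'open-uri'

# Hypothetical helper: fetch +url+ and return the body together with a
# couple of response details exposed by OpenURI::Meta.
def fetch_with_meta(url)
  URI.open(url) do |io|
    # io.status is a two-element array such as ["200", "OK"]
    { status: io.status.first, content_type: io.content_type, body: io.read }
  end
rescue OpenURI::HTTPError => e
  # non-2xx responses raise; e.io still carries the status line
  { status: e.io.status.first, content_type: nil, body: nil }
end
```

Checking result[:status] before parsing result[:body] avoids scraping an error page by mistake.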

Andy_Lester · 18 January 2007 20:54

> > Look at WWW::Mechanize.
>
> Or Hpricot ...

WWW::Mechanize is a wrapper around Hpricot, just as the Perl WWW::Mechanize is a wrapper around LWP. It handles lots of the drudgery.

Peter_Szinek3 · 19 January 2007 15:07

Gavin Baker wrote:

> After finding your article on screen scraping *very* useful, I'm really looking forward to this!

I am happy to hear this... Web scraping can be very, very tedious (even with a superb tool like scRUBYt! :-)), so I will need a lot of users to try it on a lot of pages, to help find and report problems and end up with a really stable system. On the pages I am testing it works perfectly (and it already has a decent feature set); however, so far nearly every time I tried a previously unknown page, there were some problems...

However, as you will see, it will be worth the time to report problems, because in complex scenarios the solution will be much faster and more robust than hand-coded scrapers...

Back to coding

Cheers,
Peter