Rubyful Soup v0.8

Leonard_Richardson1 · 19 August 2005 19:41

I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.

A beta release is available at:
http://www.crummy.com/software/RubyfulSoup/

Please let me know what you think of this library.

Leonard

Josh_Charles · 19 August 2005 19:48

How good is this for scraping Google search results (web)? I have an
application I'm working on that needs this functionality. I've
already tried writing a C# library to do this, but it worked
terribly. The Google web service API is not going to work because I
can't be limited by the number of searches I can do in a day
(there could be thousands of users doing searches).

On 8/19/05, Leonard Richardson <leonardr@segfault.org> wrote:

I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.

A beta release is available at:
http://www.crummy.com/software/RubyfulSoup/

Please let me know what you think of this library.

Leonard

Ryan_Leavengood2 · 19 August 2005 21:39

Leonard Richardson said:

I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.

A beta release is available at:
http://www.crummy.com/software/RubyfulSoup/

Please let me know what you think of this library.

I don't have time to play with it today, but I'm very glad to see this, as
I've heard very good things about Beautiful Soup, and I don't think anyone
else would be more qualified to port this than yourself.

Welcome to the Ruby community and I hope we see more good work from you!

Ryan

TLOlczyk · 20 August 2005 11:46

Hmm. Being new to these packages, I wouldn't understand
exactly what the differences are, but wouldn't you be better
off building it on top of htmltools?

Or at least using their SGML parser (which I understand is a port
of the Python SGML parser).

The reply-to email address is olczyk2002@yahoo.com.
This is an address I ignore.
To reply via email, remove 2002 and change yahoo to
interaccess,

On 19 Aug 2005 12:38:50 -0700, "Leonard Richardson" <leonardr@segfault.org> wrote:

I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.

A beta release is available at:
http://www.crummy.com/software/RubyfulSoup/

Please let me know what you think of this library.

Leonard

Thaddeus L. Olczyk, PhD

There is a difference between
*thinking* you know something,
and *knowing* you know something.

James_Edward_Gray_II · 19 August 2005 20:55

I would expect Google to have fairly good pages, but that's a random guess, not me speaking from experience.

However, be careful to examine the legal issues here. I seriously doubt this is allowed.

James Edward Gray II

On Aug 19, 2005, at 2:48 PM, Josh Charles wrote:

How good is this for scraping Google search results (web)?
