Rubyful Soup v0.8

(Leonard Richardson) #1

I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.

A beta release is available at:
http://www.crummy.com/software/RubyfulSoup/

Please let me know what you think of this library.

Leonard

(Josh Charles) #2

How good is this for scraping Google search results (web)? I have an
application I'm working on that needs this functionality. I've
already tried writing a C# library to do this, but it worked
terribly. The Google web service API is not going to work because I
can't be limited by the number of searches I can do in a day
(there could be thousands of users doing searches).

···

On 8/19/05, Leonard Richardson <leonardr@segfault.org> wrote:

I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.

A beta release is available at:
http://www.crummy.com/software/RubyfulSoup/

Please let me know what you think of this library.

Leonard

(Ryan Leavengood) #3

I don't have time to play with it today, but I'm very glad to see this, as
I've heard very good things about Beautiful Soup, and I don't think anyone
else would be more qualified to port this than yourself.

Welcome to the Ruby community and I hope we see more good work from you!

Ryan

(TLOlczyk) #4

Hmm. Being new to these packages, I wouldn't understand
exactly what the differences are, but wouldn't you be better
off building it on top of htmltools?

Or at least using their SGML parser (which I understand is a port
of the Python SGML parser).

The reply-to email address is olczyk2002@yahoo.com.
This is an address I ignore.
To reply via email, remove 2002 and change yahoo to
interaccess.

**
Thaddeus L. Olczyk, PhD

There is a difference between
*thinking* you know something,
and *knowing* you know something.

(James Edward Gray II) #5

I would expect Google to have fairly well-formed pages, but that's a random guess, not me speaking from experience.

However, be careful to examine the legal issues here. I seriously doubt this is allowed.

James Edward Gray II

···

On Aug 19, 2005, at 2:48 PM, Josh Charles wrote:

How good is this for scraping Google search results (web)?