I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.
How good is this for scraping Google web search results? I have an
application I'm working on that needs this functionality. I tried
writing a C# library to do this, but it worked terribly. The Google
web service API is not going to work because I can't be limited in
the number of searches I can do in a day (there could be thousands
of users doing searches).
On 8/19/05, Leonard Richardson <leonardr@segfault.org> wrote:
I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.
Please let me know what you think of this library.
I don't have time to play with it today, but I'm very glad to see this, as
I've heard very good things about Beautiful Soup, and I don't think anyone
else would be more qualified to port this than yourself.
Welcome to the Ruby community and I hope we see more good work from you!
Hmm. Being new to these packages, I wouldn't understand
exactly what the differences are, but wouldn't you be better
off building it on top of htmltools?
Or at least using their SGML parser (which I understand is a port
of the Python SGML parser).
The reply-to email address is olczyk2002@yahoo.com.
This is an address I ignore.
To reply via email, remove 2002 and change yahoo to
interaccess,
On 19 Aug 2005 12:38:50 -0700, "Leonard Richardson" <leonardr@segfault.org> wrote:
I've created a Ruby port of Beautiful Soup, my Python module for HTML
screen-scraping. The goal is to make it trivial to get the data you
need out of complex and/or poorly-formed *ML.