[mini-ANN] Web scraping article, episode 1

Peter_Szinek3 · 5 February 2007 23:37

Hi all,

Once upon a time I wrote a silly little article on web scraping with Ruby:

http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails

The article somehow got very popular, so I have decided to continue it. Since a lot of people kept asking me for the next installment that I promised at the end of the first part, I would like to announce it here as well:

http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails-episode1/

The article is quite different from my original plans (and hence, I guess, from the expectations), because something happened in between - well, read the article and you will see what.

Cheers,
Peter

__
http://www.rubyrailways.com

Parker_Thompson · 8 February 2007 07:59

Try Dapper (http://www.dappit.com/), it may turn your screen scraping
problem into an XML parsing problem.

pt.

Peter_Szinek3 · 8 February 2007 08:11

Parker Thompson wrote:

Try Dapper (http://www.dappit.com/), it may turn your screen scraping
problem into an XML parsing problem.

I have been playing around with Dapper, and while I liked the idea and also the GUI (scRUBYt! currently does not have any kind of GUI, so Dapper is a clear winner here), I found several problems.

First of all, I could not reliably scrape everything I wanted the way I wanted it (i.e. 100% accuracy, every record found, etc.) on nearly any page I tried - which is not the case with scRUBYt! Of course there are bugs, problems and needed enhancements in scRUBYt! too, but I have total control over these (as does anybody who is able to hack with Ruby on an intermediate level). Besides this, I have the extractor - I know what's happening all the time. And if this is still not enough, I can sprinkle the whole thing with pure Ruby code.

Then, I don't really like the model where your extractor lives on the server - what if you would like to scrape confidential data, or you are logging in to sites with passwords, or to your bank account, or ...

The idea is really neat and I am sure Dapper has a lot of use cases, but it's quite a different product, with a different philosophy and target audience compared to scRUBYt! In short, you should use the right tool for the right job - and for the things I would like to scrape, scRUBYt! is much better suited. I am sure Dapper is great for other kinds of things, so if you are into those, it's a great tool!

Cheers,
Peter

__
http://www.rubyrailways.com
http://scrubyt.org

Adam_Akhtar · 19 August 2008 08:05

many thanks for both of your replies

--
Posted via http://www.ruby-forum.com/.

Adam_Akhtar · 19 August 2008 08:06

wooops ignore my last comment, it was to the completely wrong thread

