[ANN] Crawler 0.0.1

Hi all,

I recently searched in vain for a web-crawling library in Ruby. If
there is one out there already, I'd love to know about it.

In any case, I have written my own, and I hope it proves of some use to
somebody besides me.

You can find Crawler at
http://implementality.com/projects/crawler/

This is the first time I have announced a piece of software (however
small) to a public forum. I am sure that Crawler has much room for
improvement. (Starting, perhaps, with its name? Any suggestions?) So
please feel free to provide feedback, criticism, patches, etc.

For the impatient, here is how Crawler works:

Instantiate Crawler with a callback routine:

crawler = Crawler.new do |url, page_data|
  do_something_with_url(url)
  do_something_with_page_data(page_data)
end

Crawl to depth 3, invoking the above callback for each page:

crawler.crawl('http://www.rubycentral.org/', 3)
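
For the curious, the depth-limited crawl described above can be sketched roughly as follows. This is a hypothetical illustration, not Crawler's actual implementation: the ToyCrawler class, the fetcher callable, and the regexp-based link extraction are all stand-ins (a real crawler would fetch pages with net/http and use a proper HTML parser).

```ruby
require 'set'
require 'uri'

# A toy depth-limited crawler in the same spirit as Crawler's API.
# `fetcher` is any object responding to call(url) that returns the page
# body (or nil); each visited page is handed to the callback block.
class ToyCrawler
  def initialize(&callback)
    @callback = callback
  end

  def crawl(url, depth, fetcher, seen = Set.new)
    return if depth < 0 || seen.include?(url)
    seen << url
    page_data = fetcher.call(url) or return
    @callback.call(url, page_data)
    # Naive href extraction; a real crawler needs a real HTML parser.
    page_data.scan(/href="([^"]+)"/).flatten.each do |link|
      crawl(URI.join(url, link).to_s, depth - 1, fetcher, seen)
    end
  end
end

# Exercise it against an in-memory "web" so no network is needed.
pages = {
  'http://example.test/'  => '<a href="/a">a</a> <a href="/b">b</a>',
  'http://example.test/a' => '<a href="/b">b</a>',
  'http://example.test/b' => 'leaf page',
}
visited = []
crawler = ToyCrawler.new { |url, _page_data| visited << url }
crawler.crawl('http://example.test/', 2, ->(u) { pages[u] })
# visited now holds each page exactly once, despite /b being linked twice
```

Passing the fetcher in as a callable keeps the crawl logic testable against an in-memory set of pages, as above.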

happy crawling,

-brian

(ruby-)miner
(ruby-)digger
etc

:wink:


Hi Brian,
There is a module called 'webfetcher' listed in the RAA at
http://www.ruby-lang.org/en/raa.html

robert_linder_2000@yahoo.com

···

-----Original Message-----
From: Brian Denny [mailto:brian@implementality.org]
Sent: Tuesday, October 22, 2002 11:19 PM
To: ruby-talk ML
Subject: [ANN] Crawler 0.0.1


Hi Brian,
There is a module called 'webfetcher' listed in the RAA at
http://www.ruby-lang.org/en/raa.html

Thanks for the tip; that looks like a good package, and I will keep it in
mind for future projects.

My Crawler program is a lot less full-featured, and probably less robust too
(considering I just wrote it). Nevertheless, I hope that it may be useful
for those with simple web-crawling needs.

-brian