[ANN] WWW::Mechanize 0.6.0 (Rufus)

Hi,

I would like to announce that my Mechpricot pie is done baking and is
ready to eat. The main feature of this release is that Mechanize uses
Hpricot as its internal HTML parser and that you can now treat a page
object returned from mechanize as an Hpricot object. This makes screen
scraping using mechanize much easier.
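The idea of a page object that you can treat as its parser document can be pictured with a small delegation sketch. This is illustrative only (FakeDoc and Page here are made-up stand-ins, not Mechanize's real source): the page forwards methods it doesn't implement to the underlying parsed document, so calls like search work directly on the page.

```ruby
# Illustrative sketch only -- not Mechanize's actual code. It shows the
# general pattern behind "a page object you can treat as an Hpricot object":
# the page delegates unknown method calls to its underlying parser document.
class FakeDoc
  # Stand-in for an Hpricot document with a trivial search method.
  def initialize(links)
    @links = links
  end

  def search(selector)
    selector == 'a' ? @links : []
  end
end

class Page
  def initialize(doc)
    @doc = doc
  end

  # Forward anything the page itself doesn't implement to the parsed
  # document, so page.search(...) behaves like doc.search(...).
  def method_missing(name, *args, &block)
    @doc.respond_to?(name) ? @doc.send(name, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    @doc.respond_to?(name) || super
  end
end

page = Page.new(FakeDoc.new(%w[home about contact]))
page.search('a')   # => ["home", "about", "contact"]
```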

You can download it through gems:
  gem install mechanize -y

or get it here:
  http://rubyforge.org/projects/mechanize/

Check out the release notes and changelog for more cool stuff.

--Aaron

Aaron Patterson wrote:

> The main feature of this release is that Mechanize uses Hpricot as its
> internal HTML parser and that you can now treat a page object returned
> from mechanize as an Hpricot object. [...]

Currently, I use mechanize to grab nodes based on a watch list. These are REXML Element nodes, and code that works with them expects the REXML API.

Has this changed?


--
James Britt

"I can see them saying something like 'OMG Three Wizards Awesome'"
   - billinboston, on reddit.com

I'm noticing some issues with the changed behavior of
WWW::Mechanize::Page#links.text

I used to just be able to grab a link using
page.links.text(/pattern/).first and it would work even if the <a> had
children. It doesn't seem to work anymore. I'm working on pinning
the issue down, but you likely have more insight. Is there a new way
to do this that's more hpricot friendly?
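A toy model of the behavior described above (the names here are hypothetical, not Mechanize internals): matching a link's text should use the text of the whole <a> subtree, not just its direct text node, so a link like `<a><b>Download</b> now</a>` still matches /Download/.

```ruby
# Minimal sketch of link-text matching when an <a> has child elements.
# Node models an element with its own text plus child nodes.
Node = Struct.new(:text, :children) do
  # Concatenate this node's own text with all descendant text, depth-first.
  def full_text
    text.to_s + children.map(&:full_text).join
  end
end

def find_link(links, pattern)
  links.find { |l| l.full_text =~ pattern }
end

plain  = Node.new('plain link', [])
nested = Node.new('', [Node.new('Download', []), Node.new(' now', [])])

find_link([plain, nested], /Download/)  # matches the nested link
```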

Hpricot integration seems like a fine idea though, glad to see you
making use of it. Thanks for all the hard work.

On 9/6/06, Aaron Patterson <aaron_patterson@speakeasy.net> wrote:

> I would like to announce that my Mechpricot pie is done baking and is
> ready to eat. [...]

Yes. You will get back Hpricot nodes in 0.6.0. I plan on having a pluggable
parser in 0.6.1 that will return REXML nodes for you. Hpricot seems to
support some methods similar to REXML, so depending on how complicated your
logic is, you may be able to use Hpricot just fine. Otherwise, don't
upgrade until 0.6.1.
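A pluggable parser could be sketched roughly like this. Everything here is a guess at the shape of the feature (the class and accessor names are hypothetical, and 0.6.1 did not exist yet when this was written): the agent holds a parser class and uses whichever one is plugged in to build each page's document.

```ruby
# Hypothetical sketch of a pluggable-parser hook, with dummy parsers
# standing in for Hpricot and REXML.
class DummyHpricotParser
  def self.parse(html)
    "hpricot-doc for: #{html}"
  end
end

class DummyREXMLParser
  def self.parse(html)
    "rexml-doc for: #{html}"
  end
end

class Agent
  attr_accessor :pluggable_parser

  def initialize(parser = DummyHpricotParser)
    @pluggable_parser = parser
  end

  # Build a document for fetched HTML using whatever parser is plugged in.
  def fetch(html)
    @pluggable_parser.parse(html)
  end
end

agent = Agent.new
agent.fetch('<p>hi</p>')                 # uses the Hpricot stand-in
agent.pluggable_parser = DummyREXMLParser
agent.fetch('<p>hi</p>')                 # now builds a REXML-style doc
```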

--Aaron


On Thu, Sep 07, 2006 at 06:30:10AM +0900, James Britt wrote:


Currently, I use mechanize to grab nodes based on a watch list. These
are REXML Element nodes, and code that works with them expects the REXML
API.

Has this changed?

On Fri, Sep 08, 2006 at 05:20:38AM +0900, Mat Schaffer wrote:

> I'm noticing some issues with the changed behavior of
> WWW::Mechanize::Page#links.text
>
> I used to just be able to grab a link using
> page.links.text(/pattern/).first and it would work even if the <a> had
> children. It doesn't seem to work anymore. I'm working on pinning
> the issue down, but you likely have more insight. Is there a new way
> to do this that's more hpricot friendly?

This may be a bug in hpricot. That functionality should have remained
the same. The only difference is the parser being used. Could you possibly
send sample code or sample html to one of the mechanize mailing lists:

http://rubyforge.org/mail/?group_id=1453

I don't want to clutter ruby-talk with mechanize support stuff. :)

> Hpricot integration seems like a fine idea though, glad to see you
> making use of it. Thanks for all the hard work.

No problem. Hopefully I can help you out!

--Aaron

Aaron Patterson wrote:


Yes. You will get back Hpricot nodes in 0.6.0. I plan on having a pluggable
parser in 0.6.1 that will return REXML nodes for you. Hpricot seems to
support some methods similar to REXML, so depending on how complicated your
logic is, you may be able to use Hpricot just fine. Otherwise, don't
upgrade until 0.6.1.

Ah, thanks. My code takes these nodes and uses them to instantiate assorted domain objects, using REXML's XPath and element methods to populate instance variables. That might be simple enough to replace with Hpath, but I'll wait to upgrade until I'm sure.
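The kind of REXML-based extraction described above can be shown as a small runnable sketch. The XML and the Item class are made up for illustration; only the REXML calls (XPath.first, XPath.match, Element#text) are the real stdlib API. Each watched node becomes a domain object whose instance variables are filled in via XPath.

```ruby
require 'rexml/document'

# Hypothetical domain object built from a watched node.
class Item
  attr_reader :title, :price

  def initialize(element)
    # REXML::XPath.first returns the first node matching the expression,
    # evaluated relative to the given element.
    @title = REXML::XPath.first(element, 'title').text
    @price = REXML::XPath.first(element, 'price').text.to_f
  end
end

xml = <<~XML
  <watchlist>
    <item><title>Ruby book</title><price>29.95</price></item>
    <item><title>Hpricot mug</title><price>9.50</price></item>
  </watchlist>
XML

doc   = REXML::Document.new(xml)
items = REXML::XPath.match(doc, '//item').map { |el| Item.new(el) }
items.first.title   # => "Ruby book"
```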

James