How can one get the Hpricot DOM document from Mechanize?

Just_Another_Victim1 · 13 September 2008 19:03

I was wondering if there were some way of getting the Hpricot DOM (for
lack of a better term) from a Mechanize page. For example:

agent = WWW::Mechanize.new
page = agent.get('http://www.website.com')

# I am currently doing this
doc = Hpricot(page.body)

# I would like to do this
doc = page.get_hpricot_dom

    The idea is that since Mechanize apparently uses Hpricot and it's surely
using it to parse the HTML begotten from the agent.get method, it would be
nice if I didn't have to repeat that work.
    Is there a way to get this Hpricot document? ...or am I just totally
wrong about how Mechanize uses Hpricot?
    Thank you...

Lex_Williams · 13 September 2008 20:07

Perhaps it's only me, but could you please explain in more detail what
it is you want to accomplish? Maybe with an example?

--
Posted via http://www.ruby-forum.com/.

Matthias_Reitinger2 · 13 September 2008 20:16

Just Another Victim wrote:

# I would like to do this
doc = page.get_hpricot_dom

Try page.parser or page.root (they're equivalent).

Regards,
Matthias

Aaron_Patterson1 · 18 September 2008 04:27

You can get at the Hpricot document by using the "parser" accessor on
WWW::Mechanize::Page. Page also responds to "search", "/", and "at",
which just delegate to the Hpricot document.

So you can just do:

(agent.get('http://tenderlovemaking.com')/'tr').each do |tr|
...
end
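
For anyone wondering what "delegate" means here, below is a rough sketch of
the pattern in plain Ruby. It is not Mechanize's actual source: FakeDoc and
FakePage are hypothetical stand-ins (neither Hpricot nor Mechanize is
required), and the tag-name matching is a toy substitute for real selectors.
It only illustrates a page that parses once, keeps the result in "parser",
and forwards "search", "/", and "at" to it:

```ruby
require 'forwardable'

# Hypothetical stand-in for a parsed document; this is NOT Hpricot's API.
class FakeDoc
  def initialize(tags)
    @tags = tags
  end

  # Toy "selector": return every element equal to the given tag name.
  def search(selector)
    @tags.select { |tag| tag == selector }
  end

  # Operator form of search, so that doc / 'tr' works.
  def /(selector)
    search(selector)
  end

  # Return the first match, or nil if there is none.
  def at(selector)
    search(selector).first
  end
end

# Hypothetical page wrapper: builds the document once and forwards queries.
class FakePage
  extend Forwardable

  attr_reader :parser

  # Forward these three calls to the object stored in @parser.
  def_delegators :@parser, :search, :/, :at

  def initialize(tags)
    @parser = FakeDoc.new(tags)
  end
end

page = FakePage.new(%w[tr td tr])
(page / 'tr').each { |tr| puts tr }   # same result as page.parser / 'tr'
```

With a real page, the equivalent of page.parser here is the accessor
mentioned above, so there should be no need to call Hpricot(page.body) again.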

--
Aaron Patterson
http://tenderlovemaking.com/
