I was wondering if there were some way of getting the Hpricot DOM (for
lack of a better term) from a Mechanize page. For example:
agent = WWW::Mechanize.new
page = agent.get('http://www.website.com')
# I am currently doing this
doc = Hpricot(page.body)
# I would like to do this
doc = page.get_hpricot_dom
The idea is that since Mechanize apparently uses Hpricot and it's surely
using it to parse the HTML begotten from the agent.get method, it would be
nice if I didn't have to repeat that work.
Is there a way to get this Hpricot document? ...or am I just totally
wrong about how Mechanize uses Hpricot?
Thank you...
Perhaps it's only me, but would you please detail what it is you want
to accomplish? Maybe with an example?
--
Posted via http://www.ruby-forum.com/.
Just Another Victim wrote:
# I would like to do this
doc = page.get_hpricot_dom
Try page.parser or page.root (they're equivalent).
Regards,
Matthias
You can get at the Hpricot document by using the "parser" accessor on
WWW::Mechanize::Page. Page also responds to "search", "/", and "at",
which just delegate to the Hpricot document.
So you can just do:
(agent.get('http://tenderlovemaking.com')/'tr').each do |tr|
...
end
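To make the delegation idea concrete, here is a minimal sketch of the pattern in plain Ruby. It is not Mechanize's actual source: ToyDocument and ToyPage are hypothetical stand-ins for the Hpricot document and the Page class, and the "parsing" is faked with an array of tag/text pairs. The point is only that the page parses once, exposes the result through "parser" (with "root" as an alias), and forwards "search", "/", and "at" to it.

```ruby
require 'forwardable'

# Hypothetical stand-in for a parsed document (plays the role of the Hpricot doc).
class ToyDocument
  def initialize(elements)
    @elements = elements # array of [tag_name, text] pairs
  end

  # Return the text of every element with the given tag name.
  def search(tag)
    @elements.select { |name, _text| name == tag }.map { |_name, text| text }
  end
  alias_method :/, :search

  # Return the first match, or nil if there is none.
  def at(tag)
    search(tag).first
  end
end

# Hypothetical stand-in for a page that builds its document once and exposes it.
class ToyPage
  extend Forwardable

  attr_reader :parser
  alias root parser

  # Forward the query methods to the parsed document.
  def_delegators :parser, :search, :/, :at

  def initialize(elements)
    @parser = ToyDocument.new(elements)
  end
end

page = ToyPage.new([['tr', 'first row'], ['td', 'cell'], ['tr', 'second row']])
(page / 'tr').each { |row| puts row }
```

With a real page, the equivalent of the question's Hpricot(page.body) would simply be page.parser, so the HTML is not parsed a second time.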
On Sun, Sep 14, 2008 at 04:03:04AM +0900, Just Another Victim of the Ambient Morality wrote:
Is there a way to get this Hpricot document? ...or am I just totally
wrong about how Mechanize uses Hpricot?
--
Aaron Patterson
http://tenderlovemaking.com/