First I'll mention that I have used the search function and found some
useful topics, but I still haven't found a solution because of my lack
of Ruby and Hpricot/XPath knowledge.
The problem is the following: from http://users.telenet.be/weerstation.drongen/index.htm/Current_Vantage_Pro.htm
I need to scrape the Temperature and Today's Rain values (I need those
for an engineering project). With XPather and Firebug I looked up the
XPath to the Temperature value:
/html/body/table/tbody/tr[3]/td[2]/font/strong/small/font (according to
XPather).
But when I try to print the value in Ruby, I get nil.
Since this returned nil, I decided to find out at which step the nil
appears. Apparently /html/body/table/tbody is one step too far, because
/html/body/table still returns output and adding tbody returns nil.
I've read that I should try to rebuild the path now, but I really can't
find a way to do this. This is only my second serious Ruby script
(only the beginning of it, actually) and the first time I've used Hpricot.
I'm looking forward to replies, and I'm sorry to bother you with yet
another Hpricot-nil topic, but I'm kinda hopeless because of my
deadline...
It should work if you take the tbody out of the XPath. I have read
somewhere that tbody does not work with Hpricot; I don't know why.
Good luck.
xpath = "/html/body/table//tr[3]/td[2]/font/strong/small/font"
-- Posted via http://www.ruby-forum.com/.
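To illustrate the suggestion above, here is a minimal sketch of the two paths side by side. It makes several assumptions: it uses REXML from Ruby's standard distribution instead of Hpricot (so it runs without extra gems), and the markup is a made-up, well-formed stand-in for the weather page, including the placeholder value 12.3. The point is only that a path containing tbody finds nothing when the source has no tbody, while the relaxed path does.

```ruby
require 'rexml/document'

# Made-up, well-formed stand-in for the real page (an assumption, not the
# actual markup). Note that the source contains no <tbody> element.
SAMPLE = <<~HTML
  <html><body><table>
    <tr><td>Station</td><td>Example</td></tr>
    <tr><td>Time</td><td>12:00</td></tr>
    <tr><td>Temperature</td><td><font><strong><small><font>12.3</font></small></strong></font></td></tr>
  </table></body></html>
HTML

doc = REXML::Document.new(SAMPLE)

# Path as copied from Firebug/XPather, including the browser-inserted tbody.
firebug_path = "/html/body/table/tbody/tr[3]/td[2]/font/strong/small/font"
# Path from the reply above: tbody dropped, "//" matches tr at any depth.
relaxed_path = "/html/body/table//tr[3]/td[2]/font/strong/small/font"

with_tbody    = REXML::XPath.first(doc, firebug_path)
without_tbody = REXML::XPath.first(doc, relaxed_path)

puts with_tbody.inspect   # no match, because the source has no tbody
puts without_tbody.text   # text of the matched cell
```

With Hpricot the idea would be the same; only the parsing and lookup calls differ.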
There is more to it than "tbody does not work for hpricot".
When an HTML parser (Firefox and Hpricot in this case) parses an HTML page, it has to build a tree from it (a.k.a. the DOM).
The problem is that a lot (most?) of the HTML out there is badly formed, so building the DOM is very ambiguous: what if tags are not nested properly, or are never closed? Every parser handles these cases a bit differently (that's one reason why you get the 'works in IE but not in FF' kind of problems). Firefox even makes some effort to make the parsed HTML standards compliant, for example by inserting a tbody tag after a table tag if it's missing.
However, this is only one small difference between how Hpricot and Firefox parse the HTML and build the DOM tree (on which XPaths are evaluated). Hpricot tries to be as close to FF as possible, but this doesn't always happen (though _why said he considers these cases bugs).
Bottom line: you can't expect that an XPath yanked from Firebug will work with Hpricot/Mechanize (though it mostly does, and removing the tbody increases your chances even further).
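Since the tree a browser shows and the tree another parser builds can diverge, it can help to find the exact step where a copied path stops matching. The following is a rough sketch of that idea, not anything Hpricot provides: first_failing_prefix is a hypothetical helper, it uses REXML from Ruby's standard distribution rather than Hpricot, it only handles simple absolute paths with single slashes, and the one-cell page is made up.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the shortest prefix of a simple absolute XPath
# (single slashes only) that matches nothing, or nil if the whole path matches.
def first_failing_prefix(doc, xpath)
  steps = xpath.split('/').reject(&:empty?)
  steps.each_index do |i|
    prefix = '/' + steps[0..i].join('/')
    return prefix if REXML::XPath.first(doc, prefix).nil?
  end
  nil
end

# Made-up page without a tbody, as in most hand-written HTML.
page = REXML::Document.new(
  '<html><body><table><tr><td>made-up cell</td></tr></table></body></html>'
)

puts first_failing_prefix(page, '/html/body/table/tbody/tr/td')
# prints /html/body/table/tbody
```

Once the failing step is known, the path can be rebuilt from the last prefix that still matched.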
I'll try it in a minute, thank you for the answer.
@Peter, thank you for the very complete explanation.