I am trying to extract only "Slang" and "A close companion or comrade."
from the following web page (part of it) with Hpricot. There is a lot of
JavaScript in it, and I don't know the path or tag of the target.
There's not a whole lot of HTML structure there. If you can
definitively target the <td> with Hpricot, you can use regular
expressions to find the appropriate comments and grab the following
text.
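As a rough illustration of the comment-based approach, here is a minimal sketch in plain Ruby. The page itself is not shown in this thread, so the `TD_HTML` fragment, the comment labels ("part of speech", "definition"), and the helper name `text_after_comment` are all assumptions made up for the example; the real comments would have to be read off the actual page source.

```ruby
# Hypothetical stand-in for the contents of the target <td>.
# The real page was not posted, so these comment labels are assumptions.
TD_HTML = <<~HTML
  <td>
    <script type="text/javascript">track("x");</script>
    <!-- part of speech -->Slang
    <!-- definition -->A close companion or comrade.
  </td>
HTML

# Return the text that directly follows the comment named +label+,
# up to the next tag or comment, or nil if no such comment exists.
def text_after_comment(html, label)
  pattern = /<!--\s*#{Regexp.escape(label)}\s*-->([^<]*)/
  match = html.match(pattern)
  match && match[1].strip
end

part_of_speech = text_after_comment(TD_HTML, "part of speech")
definition     = text_after_comment(TD_HTML, "definition")

puts part_of_speech
puts definition
```

The capture group stops at the next `<`, so this only works when the wanted text contains no nested tags. If it does, the pattern would need to be loosened or the tags stripped first.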
You can get a little more specific with XPath expressions. The
following sample code (requires libxml-ruby) extracts the two values
from your sample code:
On Oct 8, 3:53 pm, Li Chen <chen_...@yahoo.com> wrote:
I am trying to extract only "Slang" and "A close companion or comrade."
from the following web page (part of it) with Hpricot. There is a lot of
JavaScript in it, and I don't know the path or tag of the target.
I also searched the forum and found an earlier post that helped me get the
job done. Its ideas are: 1) use regular expressions to remove
nonstandard HTML content such as JavaScript, and 2) let Hpricot
handle what remains. It works pretty well for me.
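For anyone finding this later, step 1 might look something like the sketch below. It is not the code from the earlier post. The `PAGE` fragment and the helper name `strip_scripts` are made up for illustration, and the pattern only removes `<script>` elements.

```ruby
# Hypothetical page fragment; the real markup was not posted in this thread.
PAGE = <<~HTML
  <table><tr><td>
    <script type="text/javascript">
      document.write("<b>ad</b>");
    </script>
    Slang
    <SCRIPT>var x = "</td>";</SCRIPT>
    A close companion or comrade.
  </td></tr></table>
HTML

# Remove every <script>...</script> element.
# /m lets "." match newlines; /i makes the tag name case-insensitive.
def strip_scripts(html)
  html.gsub(%r{<script\b[^>]*>.*?</script\s*>}mi, "")
end

cleaned = strip_scripts(PAGE)
puts cleaned

# Step 2, with the hpricot gem installed, would hand the cleaned
# string to the parser, for example:
#   doc = Hpricot(cleaned)
```

Stripping the scripts first matters because script bodies can contain strings that look like tags (such as the `"</td>"` above), and those can confuse a lenient HTML parser.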
Here is the title and author of that post/reply:
Re: HTML parser Hpricot? and how to get all text
Posted by SpringFlowers AutumnMoon (winterheat) on 03.11.2007 09:10