From a URL to XPath 2.0

Hi,

I am trying to write a small script that allows me to scrape HTML using
XPath 2.0. As much as I enjoyed using Hpricot, its lack of support for
indexed paths has forced me to look for a different tool (I've heard
REXML has the best XPath support). In order to use REXML, however, I need
to first convert the HTML to XML, and I have yet to find a good gem / plugin
to do that.

As I mentioned, however, my main interest is having index support for
XPath queries against an HTML page arbitrarily pulled from a generated
URL. Does anyone know of a good approach to handle this?

Thank you,

Ruby.new(user)

--
Posted via http://www.ruby-forum.com/.

Evan Senter wrote:

As I mentioned, however, my main interest is having index support for
XPath queries against an HTML page arbitrarily pulled from a generated
URL. Does anyone know of a good approach to handle this?

Hi, you might want to try HTML Tidy.

Project: http://tidy.sourceforge.net/
Try it online (output XML): HTML Tidy Online
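
For what it's worth, here is a rough sketch of how the pieces could fit together in Ruby: fetch the page, pipe it through the tidy command-line tool to get well-formed XHTML, then run an indexed XPath with REXML. It assumes a tidy binary is installed and on your PATH. The helper names (tidy_to_xml, xpath_texts, scrape) are made up for this example and are not part of any library. Note that REXML implements XPath 1.0 rather than 2.0, but indexed steps such as li[2] are already part of 1.0.

```ruby
require "net/http"
require "open3"
require "rexml/document"
require "uri"

# Hypothetical helper: run the tidy CLI (assumed to be on PATH) over raw
# HTML and return well-formed XHTML. -asxml asks for XML output, -numeric
# emits numeric entities so the XML parser does not trip over &nbsp; etc.
def tidy_to_xml(html)
  xml, _messages, status = Open3.capture3(
    "tidy", "-quiet", "-asxml", "-numeric",
    stdin_data: html
  )
  # tidy exits with 0 (clean) or 1 (warnings only) when the output is usable.
  raise "tidy could not repair the document" unless [0, 1].include?(status.exitstatus)
  xml
end

# Hypothetical helper: evaluate an XPath that selects elements and return
# the first text child of each matching element.
def xpath_texts(xml, path)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, path).map(&:text)
end

# Hypothetical helper: URL in, array of matching element texts out.
def scrape(url, path)
  html = Net::HTTP.get(URI(url))
  xpath_texts(tidy_to_xml(html), path)
end
```

Something like scrape("http://example.com/", "/html/body/div[2]/p[1]") would then give you the text of the first paragraph in the second div (the URL and path are placeholders). One caveat: tidy's XHTML output declares the XHTML namespace on the root element, so if unprefixed names do not match for you, try passing a prefix-to-namespace hash as the third argument to REXML::XPath.match and using that prefix in the path.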

--
View this message in context: http://www.nabble.com/From-a-URL-to-XPath-2.0-tp15599428p15604926.html
Sent from the ruby-talk mailing list archive at Nabble.com.