Hpricot - best way to parse based on comments

Jerome1 · 20 November 2006 22:52

I am trying to parse some files that contain comments like this:

<html>
<body>

images, text, etc...

Interesting text of site here.

</body>
</html>

I am wondering how to go about extracting the data within the comments
block using Hpricot. I am not aware of a way to refer to commented HTML
through CSS or XPath selectors.

Thanks for any ideas!

- Jerome


Keith_Fahlgren · 20 November 2006 23:50

The XPath comment() node test matches comment nodes, so //comment() selects every comment in the document:

For example, with xmlstarlet (the XPath expression follows the -m flag):
keith@devel ~ $ xml sel -t -m '//comment()' -v '.' -n simple.xml
one comment
two comment

keith@devel ~ $ cat simple.xml
<simple>
  <!--one comment-->
  <foo/>
  <!--two comment-->
  <bar/>
</simple>
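The same //comment() query can be run from Ruby. This is a minimal sketch using REXML (Ruby's XML library, shipped as a bundled gem) rather than Hpricot, so it says nothing about how Hpricot itself handles comment(). The comment text in the sample is an assumption inferred from the xml sel output above.

```ruby
require 'rexml/document'

# Same structure as simple.xml; the comment text is inferred from the
# "one comment" / "two comment" output of the xml sel command.
xml = <<~XML
  <simple>
    <!--one comment-->
    <foo/>
    <!--two comment-->
    <bar/>
  </simple>
XML

doc = REXML::Document.new(xml)

# '//comment()' matches comment nodes at any depth; Comment#string is the
# text between <!-- and -->.
comments = REXML::XPath.match(doc, '//comment()').map(&:string)
comments.each { |text| puts text }
```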

HTH,
Keith
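Once the comments can be selected, the data "within the comment block" is the run of sibling nodes between the opening and closing marker comments. The sketch below walks those siblings with REXML, not Hpricot. It assumes that the page is well-formed XML (REXML will reject typical tag soup) and that both markers share the same parent element. The marker labels "start" and "end" are hypothetical placeholders, and so are the helper names find_comment and nodes_between.

```ruby
require 'rexml/document'

# Hypothetical page; "start" and "end" stand in for the real marker comments.
html = <<~HTML
  <html>
  <body>
  <p>images, text, etc...</p>
  <!-- start -->
  <p>Interesting text of site here.</p>
  <!-- end -->
  </body>
  </html>
HTML

# Returns the first comment whose stripped text equals label (or nil).
def find_comment(doc, label)
  REXML::XPath.match(doc, '//comment()').find { |c| c.string.strip == label }
end

# Returns the siblings that follow start_comment, stopping before the sibling
# comment whose stripped text equals end_label (or at the last sibling).
def nodes_between(start_comment, end_label)
  siblings = start_comment.parent.children
  # Locate the start marker by identity, not equality.
  start_idx = siblings.index { |n| n.equal?(start_comment) }
  siblings.drop(start_idx + 1).take_while do |n|
    !(n.is_a?(REXML::Comment) && n.string.strip == end_label)
  end
end

doc = REXML::Document.new(html)
inner = nodes_between(find_comment(doc, 'start'), 'end')

# Keep only element nodes and take their text content.
texts = inner.grep(REXML::Element).map(&:text)
puts texts
```

Locating the start marker by identity matters here. Whitespace text nodes can compare equal to one another, so an equality-based lookup could pick the wrong position.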


Paul_Lutus · 24 November 2006 19:47

Jerome --- wrote:

I am trying to parse some files that contain comments like this:

<html>
<body>



images, text, etc...



Interesting text of site here.

</body>
</html>

I am wondering how to go about extracting the data within the comments
block using Hpricot.

The best and easiest way to parse this file using Hpricot with your required
specification ... is not to use Hpricot.

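# Set these to the exact comment strings that open and close the block.
start_mark = ""
end_mark = ""

# page_path is the path to the saved HTML page.
data = File.read(page_path)

# Regexp.escape makes the markers match literally, /m lets .*? span
# newlines, and the lazy .*? stops at the first end_mark after each
# start_mark. scan returns one [captured_text] array per block.
output = data.scan(%r{#{Regexp.escape(start_mark)}(.*?)#{Regexp.escape(end_mark)}}m)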
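Here is a self-contained version of that scan approach that can be run as-is. The marker strings "<!-- BEGIN INTERESTING -->" and "<!-- END INTERESTING -->" are hypothetical stand-ins, because the actual comment text from the original question was not preserved. The helper name extract_between is also made up for illustration.

```ruby
# Hypothetical marker comments; substitute the exact strings used in your pages.
START_MARK = "<!-- BEGIN INTERESTING -->"
END_MARK   = "<!-- END INTERESTING -->"

# Returns the raw text between each start/end marker pair, in document order.
def extract_between(data, start_mark, end_mark)
  # Regexp.escape treats the markers literally; /m lets .*? cross newlines,
  # and the lazy .*? stops at the first end marker after each start marker.
  pattern = /#{Regexp.escape(start_mark)}(.*?)#{Regexp.escape(end_mark)}/m
  # scan returns one [capture] array per match; flatten yields plain strings.
  data.scan(pattern).flatten
end

html = <<~HTML
  <html>
  <body>
  images, text, etc...
  <!-- BEGIN INTERESTING -->
  Interesting text of site here.
  <!-- END INTERESTING -->
  </body>
  </html>
HTML

blocks = extract_between(html, START_MARK, END_MARK).map(&:strip)
p blocks # prints ["Interesting text of site here."]
```

Because the match is lazy, a page containing several marked blocks yields one string per block rather than one string spanning from the first start marker to the last end marker.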

All done, finished, no poring over documentation, no considering rewriting
the library to get it to do what you actually want, done.

By the way, did I mention that inserting new data into the same page
structure is about the same level of difficulty?
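The same pattern can also replace the contents of a block. This is a minimal sketch that keeps the marker comments in place so the page can be updated again later. The helper name replace_between and the marker strings are hypothetical.

```ruby
# Replaces whatever sits between each start/end marker pair with new_content,
# keeping the markers themselves so the block can be found again later.
def replace_between(data, start_mark, end_mark, new_content)
  pattern = /#{Regexp.escape(start_mark)}.*?#{Regexp.escape(end_mark)}/m
  # Block form: new_content is inserted literally, so backslashes in it are
  # not treated as back-references as they would be in a replacement string.
  data.gsub(pattern) { "#{start_mark}#{new_content}#{end_mark}" }
end

# Hypothetical page and markers.
page = "<body>\n<!-- BEGIN -->\nold text\n<!-- END -->\n</body>\n"
updated = replace_between(page, "<!-- BEGIN -->", "<!-- END -->", "\nnew text\n")
puts updated
```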


--
Paul Lutus
http://www.arachnoid.com
