Hpricot - best way to parse based on comments

I am trying to parse some files that contain comments like this:

<html>
<body>

<!-- BEGIN ad_content -->

images, text, etc...

<!-- END ad_content -->

Interesting text of site here.

</body>
</html>

I am wondering how to go about extracting the data within the comments
block using Hpricot. I am not aware of a way to refer to commented HTML
through CSS or XPath selectors.

Thanks for any ideas!

- Jerome


The XPath comment() node test selects all comment nodes.

For example, using XMLStarlet (the XPath expression follows the -m flag):
keith@devel ~ $ xml sel -t -m '//comment()' -v '.' -n simple.xml
one comment
two comment

keith@devel ~ $ cat simple.xml
<simple>
  <!-- one comment -->
  <foo/>
  <!-- two comment -->
  <bar/>
</simple>
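
The same node test can be tried from Ruby. The following is only a sketch: it uses REXML rather than Hpricot (Hpricot's support for comment() is not confirmed anywhere in this thread), and the variable names are made up for illustration.

```ruby
require 'rexml/document'

# Same content as simple.xml above, inlined so the example is self-contained.
xml = <<~XML
  <simple>
    <!-- one comment -->
    <foo/>
    <!-- two comment -->
    <bar/>
  </simple>
XML

doc = REXML::Document.new(xml)

# comment() is the XPath node test that matches comment nodes;
# Comment#string returns the text between <!-- and -->.
comments = REXML::XPath.match(doc, '//comment()').map { |c| c.string.strip }

puts comments
```

Note that REXML expects well-formed XML, so this only works on real-world HTML that happens to be well-formed.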

HTH,
Keith


On 11/20/06, Jerome --- <jerome@tut0r.com> wrote:

I am trying to parse some files that contain comments like this:
...
I am not aware of a way to refer to commented HTML
through CSS or XPath selectors.
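
Selecting the comments is only half of the job; the content of interest sits between them. One possible way to collect it is to find the BEGIN comment and walk its following siblings until the END comment. This is a rough sketch using REXML instead of Hpricot, and it assumes well-formed markup with both markers as children of the same parent. The sample page is the one from the question with paragraph tags added for illustration.

```ruby
require 'rexml/document'

# Page from the question; the <p> tags are an illustrative addition.
html = <<~HTML
  <html>
  <body>
  <!-- BEGIN ad_content -->
  <p>images, text, etc...</p>
  <!-- END ad_content -->
  <p>Interesting text of site here.</p>
  </body>
  </html>
HTML

doc = REXML::Document.new(html)

# Locate the opening marker among all comment nodes.
comments   = REXML::XPath.match(doc, '//comment()')
begin_mark = comments.find { |c| c.string.strip == 'BEGIN ad_content' }

# Collect every sibling after the opening marker, stopping at the closing one.
between = []
node = begin_mark && begin_mark.next_sibling_node
until node.nil? ||
      (node.is_a?(REXML::Comment) && node.string.strip == 'END ad_content')
  between << node
  node = node.next_sibling_node
end

# Serialize the collected nodes back to markup.
ad_html = between.map(&:to_s).join.strip

puts ad_html
```

If the two markers are not siblings, this walk will not find the END comment, and a string-based approach may be simpler.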

Jerome --- wrote:

I am wondering how to go about extracting the data within the comments
block using Hpricot.

The best and easiest way to parse this file using Hpricot with your required
specification ... is not to use Hpricot.

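start_mark = "<!-- BEGIN ad_content -->"
end_mark = "<!-- END ad_content -->"

# page_path is the path to the HTML file being processed
data = File.read(page_path)

# Non-greedy, multiline match between the markers. Regexp.escape keeps
# the markers literal; scan returns one array per match, hence flatten.
pattern = %r{#{Regexp.escape(start_mark)}(.*?)#{Regexp.escape(end_mark)}}m
output = data.scan(pattern).flatten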

All done, finished, no poring over documentation, no considering rewriting
the library to get it to do what you actually want, done.

By the way, did I mention that inserting new data into the same page
structure is about the same level of difficulty?
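
As a rough sketch of that insertion, assuming the goal is to replace whatever sits between the two markers while keeping the markers themselves: the page is inlined here so the example runs on its own, and new_content is a hypothetical placeholder.

```ruby
start_mark = "<!-- BEGIN ad_content -->"
end_mark   = "<!-- END ad_content -->"

# Page from the question, inlined instead of read from a file.
page = <<~HTML
  <html>
  <body>

  <!-- BEGIN ad_content -->

  images, text, etc...

  <!-- END ad_content -->

  Interesting text of site here.

  </body>
  </html>
HTML

# Hypothetical replacement text for the marked region.
new_content = "\n\nfresh ad markup\n\n"

# Capture both markers so they can be written back around the new content.
pattern = %r{(#{Regexp.escape(start_mark)}).*?(#{Regexp.escape(end_mark)})}m

# The block form avoids backslash interpretation in the replacement text.
updated = page.gsub(pattern) { "#{$1}#{new_content}#{$2}" }

puts updated
```

Writing the result back is then a matter of File.write with the same path the page was read from.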


--
Paul Lutus
http://www.arachnoid.com