Correcting HTML in Ruby

Hello,

I need to get content out of files which have incorrect HTML.
Is there any library that does that?

I have been looking for an HTML->XML library, but every attempt I have
made so far has been futile.

Thanks for your time,

Roland

--
Posted via http://www.ruby-forum.com/.

It's not exactly what you're looking for, but the W3C has a tool called
HTML Tidy [1] which may be of help. It can fix a lot of brain-damaged
HTML, and even does wonders for the horrible HTML generated by MS
Word.

[1] Clean up your Web pages with HTML TIDY
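
For what it's worth, here is a rough sketch of how Tidy could be driven from Ruby by shelling out to its command-line tool. It assumes a `tidy` binary is installed and on your PATH; the helper name `tidy_html` and the constant `TIDY_COMMAND` are made up for illustration and are not part of any library. Treat it as a starting point, not a finished solution.

```ruby
require "open3"

# Default argv for the tidy CLI (assumption: tidy is installed and on PATH).
#   -quiet           suppress the summary banner
#   -asxml           emit well-formed XHTML
#   --force-output   still write output when errors were found
#   --show-warnings  keep warnings out of the error stream
TIDY_COMMAND = %w[tidy -quiet -asxml --force-output yes --show-warnings no].freeze

# Hypothetical helper: feeds +html+ to the command on stdin and returns
# whatever the command wrote to stdout.
def tidy_html(html, command: TIDY_COMMAND)
  out, err, _status = Open3.capture3(*command, stdin_data: html)
  # The exit status is deliberately ignored: tidy exits non-zero for mere
  # warnings, so an empty result is used as the failure signal instead.
  raise "#{command.first} produced no output: #{err}" if out.empty?
  out
end

# Usage (requires tidy to be installed):
#   xhtml = tidy_html(File.read("broken.html"))
```

Since `-asxml` asks Tidy for XHTML, the returned string should be something an XML parser can cope with, which is roughly the HTML->XML step the original question asks about.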

On 6/21/06, Roland Mai <roland.mai@gmail.com> wrote:

I need to get content out of files which have incorrect HTML.
Is there any library that does that?

There was an article on doing something like this posted to the list a short while back:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/197214

It seems to give a good overview of a few different solutions.

best of luck,
matthew smillie.

On Jun 21, 2006, at 16:53, Roland Mai wrote:

I need to get content out of files which have incorrect HTML.
Is there any library that does that?
