Owing to one urgent bug and many good speed/memory improvements,
here's a new release of Hpricot, a small and flexible HTML parser.
  gem install hpricot --source http://code.whytheluckystiff.net
It should appear on RubyForge soon.
The urgent bug was a problem with <script> and <style> tags. If
Hpricot encountered any kind of tag inside a <script> or <style>
block, the parser could treat it as a real tag (even a tag inside
a quoted string). As you can imagine, this causes problems with
pages that use document.write, and I can't stand for that.
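To make the failure concrete, here is a rough sketch in plain Ruby. It is not Hpricot's scanner, and the names naive_tags, raw_aware_tags and RAW_TEXT_TAGS are made up for illustration. The first method counts every "<name>" as a tag wherever it shows up, which is roughly what the bug looked like. The second treats everything between <script> or <style> and its closing tag as plain text.

```ruby
# Illustrative sketch only; this is NOT how Hpricot is implemented.
# All names below are hypothetical.

RAW_TEXT_TAGS = %w[script style].freeze
OPEN_TAG = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/

# Naive approach: every "<name ...>" counts as a tag, wherever it appears,
# including inside a quoted string in a script.
def naive_tags(html)
  html.scan(OPEN_TAG).flatten.map(&:downcase)
end

# Raw-text-aware approach: after <script> or <style>, jump ahead to the
# matching close tag and treat everything in between as plain text.
def raw_aware_tags(html)
  tags = []
  pos = 0
  while (m = OPEN_TAG.match(html, pos))
    name = m[1].downcase
    tags << name
    pos = m.end(0)
    next unless RAW_TEXT_TAGS.include?(name)

    close = html.index(%r{</#{name}\s*>}i, pos)
    # An unterminated block swallows the rest of the document.
    break if close.nil?

    pos = close
  end
  tags
end

html = <<~HTML
  <html><body>
  <script>document.write("<b>hello</b>");</script>
  <p>text</p>
  </body></html>
HTML

p naive_tags(html)      # => ["html", "body", "script", "b", "p"]
p raw_aware_tags(html)  # => ["html", "body", "script", "p"]
```

The quoted <b> inside document.write shows up as a real element in the naive version and stays out of the tree in the second one.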
I'm still doing a lot of work to trim down Hpricot's memory use.
The first part of this is to use an RStruct-based mechanism for
storing elements in memory, a change which is included in this release.
Okay, nothing more to add, that's it.