I recently did a small head-to-head with RubyfulSoup, Hpricot, and the up-and-coming (now in CVS, release in a few weeks) libxml-ruby binding to the libxml2 HTML parser. Running against the RubyfulSoup homepage (perhaps ironically, it's pretty badly formed) over 100 iterations, the attached benchmark gave the following results. Each benchmark parses the original HTML and then gets back a specific node set (Hpricot and libxml2 using XPath, RubyfulSoup using its own query API):
                               user     system      total        real
rubyful soup - simple     25.900000   0.710000  26.610000 ( 26.669350)
rubyful soup - trickier   26.220000   0.010000  26.230000 ( 26.252975)
hpricot - simple xpath     7.930000   0.000000   7.930000 (  7.950092)
hpricot - trickier xpath   8.200000   0.010000   8.210000 (  8.212230)
libxml2 - simple xpath     0.900000   0.000000   0.900000 (  0.899329)
libxml2 - trickier xpath   0.940000   0.000000   0.940000 (  1.217441)
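For anyone who doesn't want to open the attachment, the shape of each measurement is roughly the following. This is only a sketch, not the attached script: `time_parser` is a hypothetical helper, the tiny HTML string stands in for the real page, and the regex scan is a placeholder for the actual parse-and-query step so that it runs with nothing but the standard library.

```ruby
require 'benchmark'

ITERATIONS = 100

# Hypothetical helper (not from the attached script): times `iterations`
# runs of a block that parses the HTML and pulls out a node set.
def time_parser(label, html, iterations = ITERATIONS)
  Benchmark.measure(label) do
    iterations.times { yield html }
  end
end

# Assumption: a small, deliberately malformed stand-in for the real page.
html = "<html><body><p class='x'>one<p>two</body></html>"

# Placeholder "parser": a regex scan, so the sketch needs no extra gems.
# With Hpricot installed, the block body could instead be something like
#   Hpricot(doc).search("//p")
result = time_parser("regex stand-in", html) { |doc| doc.scan(/<p[^>]*>/) }

puts Benchmark::CAPTION
puts result.format("%n: %u %y %t %r")
```

Each parser gets its own block, and the parse is repeated inside the loop on purpose, since the numbers above include parsing the page every time as well as running the query.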
In terms of preserving the original HTML, I found the libxml2 and Hpricot parsers to be fairly even, with both doing a pretty good job of fixing up broken HTML. There were minor differences in the XML produced, and from a (biased, nitpicking) spec point of view I think libxml2's output is slightly more 'proper' (self-closing tags, etc). RubyfulSoup on the other hand seemed to have a few inconsistencies - it would occasionally lose tag attributes, and sometimes return varying results for the same query.
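If you want to check the "varying results" point for yourself, something along these lines would do. It is a minimal sketch: `consistent?` is a hypothetical helper, and the regex scan again stands in for whichever parser and query you are testing.

```ruby
# Hypothetical helper: runs the same parse-and-query block several times
# and reports whether every run produced an identical result.
def consistent?(html, runs = 10)
  results = Array.new(runs) { yield(html) }
  results.uniq.size == 1
end

# Assumption: a small stand-in document with unclosed <p> tags.
html = "<p class='a'>one<p class='b'>two"

# Placeholder query: collect the class attribute values with a regex.
stable = consistent?(html) { |doc| doc.scan(/class='(\w+)'/).flatten }
puts "stable: #{stable}"
```

The block should return something comparable with `==` (strings or arrays of strings, say) rather than parser node objects, otherwise two equal-looking results may still count as different.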
As for feature support, well, I don't want to rain on anyone's parade but the libxml HTML parser outputs an XML::Document with which you can transparently use all of libxml2's (many) features ...
I couldn't get XPath functions to work with Hpricot, but then I'm not sure how complete an XPath implementation it's aiming for, and apart from that it seems pretty solid. OTOH RubyfulSoup has no XPath support at all.
libxml-perfcomp.rb (1.63 KB)
On Tue, 21 Nov 2006 22:27:15 -0000, Wes Gamble <weyus@att.net> wrote:
Has anyone done a head to head comparison of Hpricot and Rubyful Soup
(both HTML parsers)?
If so, would you be willing to comment on which one a) is faster for an
average sized HTML page and b) preserves the original HTML better.
--
Ross Bamford - rosco@roscopeco.remove.co.uk