Testing speed of XML parsing in MRI and JRuby

FYI:

I'm hoping to use Hpricot for general XML processing instead of REXML or libxml in some projects, so I wanted to compare the speeds of different XML parsers in MRI and JRuby.

I was very impressed by how much faster JRuby is when running on Java 1.6 than on Java 1.5. On Java 1.6, Hpricot in JRuby was only about 10% slower than in MRI.

So far I've only got one test: parsing a 100k XML file and counting a certain type of element. I'm planning to add more tests that cover the kind of processing I need to do.

This is the test:

Do this 100 times:
   - parse a 100k XML file and count the 466 leaf nodes

The results shown below are the times after a "rehearsal". The JRuby times are faster once the JVM has been warmed up. The rehearsal has no effect on the MRI timings.
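A minimal sketch of this kind of test, for illustration only. It assumes Ruby's standard Benchmark.bmbm (whose rehearsal pass matches the description above, though the harness is not named here) and uses REXML because it ships with Ruby. The helpers sample_xml, count_leaves and run_benchmark are hypothetical names, and the generated document is a small stand-in for the real 100k file.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical stand-in for the 100k test file: builds a small document
# containing groups * leaves_per_group leaf elements.
def sample_xml(groups, leaves_per_group)
  body = (1..groups).map do |g|
    leaves = (1..leaves_per_group).map { |i| "<leaf id=\"#{g}-#{i}\">value</leaf>" }
    "<group>#{leaves.join}</group>"
  end
  "<root>#{body.join}</root>"
end

# Parse the XML and count the elements that have no child elements.
def count_leaves(xml)
  doc = REXML::Document.new(xml)
  count = 0
  doc.root.each_recursive { |el| count += 1 unless el.has_elements? }
  count
end

# bmbm runs every report twice: a rehearsal pass, then the measured pass.
def run_benchmark(xml, iterations)
  Benchmark.bmbm do |x|
    x.report('rexml') { iterations.times { count_leaves(xml) } }
  end
end

if __FILE__ == $PROGRAM_NAME
  xml = sample_xml(20, 10)
  puts "leaf nodes: #{count_leaves(xml)}"
  run_benchmark(xml, 100)
end
```

Other parsers could be compared by adding one x.report line per library inside the same bmbm block, so that each gets its own rehearsal.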

Platform and method                              total time
-----------------------------------------------------------
JRuby (Java 1.6.0) jdom_document_builder              0.363
MRI: libxml                                           0.389
JRuby (Java 1.6.0 server) jdom_document_builder       0.412
JRuby (server) jdom_document_builder                  0.617
JRuby: jdom_document_builder                          1.451
MRI: hpricot                                          2.056
JRuby (Java 1.6.0 server) hpricot                     2.272
JRuby (Java 1.6.0) hpricot                            2.273
JRuby (server) hpricot                                3.447
JRuby: hpricot                                        6.198
JRuby (Java 1.6.0 server) rexml                       6.251
JRuby (Java 1.6.0) rexml                              6.356
MRI: rexml                                            7.624
JRuby (server) rexml                                  9.609
JRuby: rexml                                         12.944
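
As a quick check of the comparisons drawn from these numbers, the ratios can be worked out directly. This is only a sketch: the times are copied from the table above, the helper names percent_slower and speedup are made up for illustration, and it assumes that the rows without a Java version are the Java 1.5 runs.

```ruby
# Total times copied from the results table above.
TIMES = {
  'MRI: hpricot'                      => 2.056,
  'JRuby (Java 1.6.0 server) hpricot' => 2.272,
  'JRuby (Java 1.6.0) hpricot'        => 2.273,
  'JRuby: hpricot'                    => 6.198  # assumed to be the Java 1.5 run
}

# How much slower `time` is than `baseline`, as a percentage.
def percent_slower(time, baseline)
  ((time / baseline - 1.0) * 100).round(1)
end

# How many times faster `new_time` is than `old_time`.
def speedup(old_time, new_time)
  (old_time / new_time).round(2)
end

# Hpricot in JRuby on Java 1.6 (server) versus MRI.
puts percent_slower(TIMES['JRuby (Java 1.6.0 server) hpricot'], TIMES['MRI: hpricot'])  # prints 10.5

# Hpricot in JRuby: assumed Java 1.5 run versus Java 1.6.
puts speedup(TIMES['JRuby: hpricot'], TIMES['JRuby (Java 1.6.0) hpricot'])  # prints 2.73
```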

* I'd also like to add tests for Ruby 1.9.

The timings reported here are taken from the second time the 100x loop is run for each platform/library test so the JVM should be warmed up.

Tested on:

   MacBook Pro
   2.33 GHz Intel Core 2 Duo
     4 GB memory
   running MacOS X 10.5.2

   Ruby versions tested:
     MRI: ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]
     JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.5.0_13
     JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.6.0_03 (Soylatte)

   Library versions MRI:
     libxml-ruby 0.5.4
     hpricot 0.6

   Library versions JRuby:
     hpricot 0.6.161

More details are available in the links below:

Benchmark code and data checked into subversion here:
https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks

Trac:
http://trac.cosmos.concord.org/projects/browser/trunk/common/ruby/xml_benchmarks

* Hpricot uses code generated by Ragel, a state machine compiler that can produce C or Java code, for the initial parsing. The Ragel => Java compiler can produce only one style of generated code, and it is not the fastest. The style Hpricot chooses for generating the C code produces a larger executable and is faster.