REXML Performance

I am using the tree parser in REXML. Parsing short documents is fast, but performance degrades significantly on large documents. Specifically, the slowdown occurs when the following line is executed:

xmlDoc = Document.new(File.new(fileToParse))

Any ideas?

There was a longish thread on this topic earlier this month. The punch
line is here:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/91212

Steve

Which version of REXML are you using? Recent versions had a speed
degradation bug which has been fixed in the current version.

However, this won’t entirely fix your problem, since processing XML
entirely in Ruby is always going to be relatively slow (compared to
compiled languages, such as C or Java).

You can improve performance a little by using the Pull or SAX2
parsers, but then you lose the tree functions. There is also a new,
incomplete, undocumented light-weight tree API that you can play with
that should be just a little slower than the Pull API (the fastest API
in REXML).
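
For anyone who wants to try the Pull route, here is a minimal sketch of what it can look like. It counts start tags by name and never builds a tree. The method name count_elements is made up for illustration, and the sketch assumes a REXML version that ships REXML::Parsers::PullParser; treat it as a starting point rather than a tuned solution.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: tally start tags by element name.
# `source` may be an XML String or an open IO such as a File.
def count_elements(source)
  counts = Hash.new(0)
  parser = REXML::Parsers::PullParser.new(source)
  while parser.has_next?
    event = parser.pull
    # For a start_element event, event[0] is the element name.
    counts[event[0]] += 1 if event.start_element?
  end
  counts
end
```

To run it against a file, something like File.open(fileToParse) { |f| count_elements(f) } should work, and the block form closes the file when parsing finishes. The trade-off is the one described above: you see each event once, in document order, and there is no tree to query afterwards.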

— SER