John Carter wrote:
Was it easier? Fewer lines of code? (By what ratio?)
Easier in most ways, and easier overall. The logic is different, so if you're used to doing a transformation in XSLT (an ostensibly functional language) and then try to do the same thing in REXML, you need to shift your perspective.
My impression is that XSLT is excellent at small pattern-matching and templating tasks, and just plain lousy at substantial logic.
Logic in XSLT requires an appreciation of functional programming. XSLT is really quite good at complex matching and templating, especially if you need to grab and match stuff from all over the document, or when you are not quite sure where something will be.
But for highly regular data sources, it can be overkill.
This task seems to be substantially string manipulation.
How about speed? The XSLT takes (almost) too long to chew through 14,500 lines of XML.
What XSLT engine are you using?
Xalan. It's a Java implementation.
Oh, sorry, I thought you had tried this in Ruby + XSLT.
I've found that a big bottleneck can be having to load a large document into memory for processing. Using a stream or pull parser alleviates much of that, but that may not be an option (depends on the sort of document and the nature of the transformation).
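To illustrate the streaming side of this: the sketch below uses REXML's stream (SAX-style) API, `REXML::Document.parse_stream` with a listener that mixes in `REXML::StreamListener`, so each event is handled as the parser reads it and no tree is ever held in memory. The `TagCounter` class and the sample XML are made up for the example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: counts start tags by name as the parser streams
# through the input, without ever building an in-memory DOM tree.
class TagCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called by the stream parser for every opening tag
  # (self-closing tags such as <entry/> included).
  def tag_start(name, _attrs)
    @counts[name] += 1
  end
end

xml = '<log><entry id="1"/><entry id="2"/><note>hi</note></log>'
counter = TagCounter.new
REXML::Document.parse_stream(xml, counter)
counter.counts.each { |name, n| puts "#{name}: #{n}" }
# prints "log: 1", "entry: 2", "note: 1"
```

For a real file you would pass an open IO rather than a string, e.g. `File.open('data.xml') { |f| REXML::Document.parse_stream(f, counter) }` (the filename is a placeholder).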
Though I have used REXML before, I can't remember whether it has a pull parser or not.
Yes, it does.
What I need is the ability to rapidly pull the XML document into native Array and Hash objects which would be _much_ smaller than the corresponding XML.
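A minimal sketch of that idea with REXML's pull parser (`REXML::Parsers::PullParser`), assuming a hypothetical flat layout of `<record>` elements with simple text children. Each record becomes a plain Hash and the records are collected in an Array; the method name `records_to_hashes` is made up, and all values stay strings, so any numeric conversion is up to the caller.

```ruby
require 'rexml/parsers/pullparser'

# Turns each <record> element into a Hash of child-name => text and
# collects them in an Array. Assumes a hypothetical flat layout:
#   <records><record><name>..</name><qty>..</qty></record>...</records>
def records_to_hashes(source)
  parser  = REXML::Parsers::PullParser.new(source)
  records = []
  current = nil   # Hash for the <record> being read, or nil
  field   = nil   # child element whose text we are collecting, or nil

  while parser.has_next?
    event = parser.pull
    case event.event_type
    when :start_element
      if event[0] == 'record'
        current = {}
      elsif current
        field = event[0]
        current[field] = +''
      end
    when :text
      # event[1] is the text with entities already expanded.
      current[field] << event[1] if current && field
    when :end_element
      if event[0] == 'record'
        records << current
        current = nil
      end
      field = nil
    end
  end
  records
end

xml = '<records><record><name>bolt</name><qty>12</qty></record>' \
      '<record><name>nut</name><qty>30</qty></record></records>'
records_to_hashes(xml).each { |r| puts "#{r['name']}: #{r['qty']}" }
# prints "bolt: 12" then "nut: 30"
```

Only the extracted Hashes are kept in memory, never a tree of the whole document.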
I've written a magazine article describing how to do XML transformations with REXML's pull parser, but it is currently in editorial limbo.
If the source data has readily identifiable demarcation points (e.g., a particular element or attribute), one can use the pull parser to keep yanking content off the input stream, stashing it in a buffer. When the demarcation point is encountered, the buffer can be processed using the REXML DOM and XPath, saved off someplace, and cleared for the next round.
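One way to sketch the buffer-and-demarcate approach, assuming `<record>` is the demarcation element (a hypothetical name): the pull parser rebuilds each record as a small detached `REXML::Element`, and when the record's end tag arrives that fragment is handed to a block for ordinary DOM and XPath work, then discarded. The method name `each_record` is likewise made up for illustration.

```ruby
require 'rexml/document'
require 'rexml/parsers/pullparser'

# Streams through the input; each <boundary> element (hypothetical name)
# is rebuilt as a small REXML::Element. When its end tag arrives, the
# fragment is yielded for DOM/XPath work and then dropped.
def each_record(source, boundary = 'record')
  parser = REXML::Parsers::PullParser.new(source)
  stack  = []   # open elements of the fragment currently being buffered

  while parser.has_next?
    event = parser.pull
    case event.event_type
    when :start_element
      # Ignore everything outside a boundary element.
      next if stack.empty? && event[0] != boundary
      el = REXML::Element.new(event[0])
      event[1].each { |k, v| el.add_attribute(k, v) }
      stack.last.add_element(el) unless stack.empty?
      stack.push(el)
    when :text
      stack.last.add_text(event[1]) unless stack.empty?
    when :end_element
      next if stack.empty?
      el = stack.pop
      yield el if stack.empty?   # demarcation point reached
    end
  end
end

xml = '<records><record id="1"><name>bolt</name><qty>12</qty></record>' \
      '<record id="2"><name>nut</name><qty>30</qty></record></records>'

each_record(xml) do |rec|
  # Ordinary REXML XPath queries, scoped to this one fragment.
  name = REXML::XPath.first(rec, 'name').text
  qty  = REXML::XPath.first(rec, 'qty').text.to_i
  puts "#{rec.attributes['id']}: #{name} x#{qty}"
end
# prints "1: bolt x12" then "2: nut x30"
```

Compared with loading the whole file into one DOM, memory use is bounded by the largest single record rather than by the document size.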
Better still, if you can manage it, is to keep pulling content and processing it right away, based on the current element/attribute values, avoiding the intermediate DOM objects. This typically requires your code to track more state in order to know what to do at any given point in the process.
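A sketch of the no-DOM variant, again with hypothetical element and attribute names: a stream listener totals `<item qty="...">` values per enclosing `<section name="...">`. The only state it tracks is which section is currently open, and each item is processed the moment its start tag is seen. `SectionTotals` is an illustrative name, not part of REXML.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Sums <item qty="..."> values per enclosing <section name="...">,
# acting on each event as it arrives; no tree is built.
# Element and attribute names are hypothetical.
class SectionTotals
  include REXML::StreamListener

  attr_reader :totals

  def initialize
    @totals  = Hash.new(0)
    @section = nil   # the state: which section we are currently inside
  end

  def tag_start(name, attrs)
    case name
    when 'section'
      @section = attrs['name']
    when 'item'
      # Items outside any section are ignored.
      @totals[@section] += attrs['qty'].to_i if @section
    end
  end

  def tag_end(name)
    @section = nil if name == 'section'
  end
end

xml = '<inventory><section name="a"><item qty="2"/><item qty="3"/></section>' \
      '<section name="b"><item qty="5"/></section></inventory>'
listener = SectionTotals.new
REXML::Document.parse_stream(xml, listener)
listener.totals.each { |sec, n| puts "#{sec}: #{n}" }
# prints "a: 5" then "b: 5"
```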
James
On Tue, 21 Jun 2005, James Britt wrote:
--
http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys