I thought I'd put this out to see if there's any interest.
Recently I wanted to do some XML reading with Ruby, but looking
(not deeply) at REXML and other packages like xmlcodec, I couldn't
find anything that seemed to fit the way I thought about things,
so I [of course... (:-)] wrote a little wrapper around REXML that
fit me a little better.
First, I wanted to have a 'stream' parser, rather than reading the
whole tree into memory and then working on it. The documents I was
interested in (XML representation of midifiles) are not very deep,
but can get very lengthy, and the processing I wanted to do was
mostly sequential.
However, the stream parsers I've seen -- including REXML::StreamListener --
simply pass the pieces of the document in turn to the app, without any
real notion of the tree, so the app has to keep track of all that itself.
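
To make that concrete, here is a minimal sketch of what the app ends up
doing with a bare REXML::StreamListener: keeping its own stack just to know
where it is in the document. The PathTracker class and the sample element
names are made up for illustration; only the REXML calls are real.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# A bare listener: REXML reports each event in turn, and the app has to
# maintain its own stack to know where in the tree it currently is.
class PathTracker
  include REXML::StreamListener

  attr_reader :seen

  def initialize
    @stack = []   # names of the currently open elements
    @seen  = []   # full path recorded at every start tag
  end

  def tag_start(name, _attrs)
    @stack.push(name)
    @seen << @stack.join('/')
  end

  def tag_end(_name)
    @stack.pop
  end
end

# Made-up sample document (not a real midifile schema).
xml = '<MIDIFile><Track><Event/><Event/></Track></MIDIFile>'
tracker = PathTracker.new
REXML::Document.parse_stream(xml, tracker)
puts tracker.seen
```

Every listener written this way repeats the same bookkeeping, which is
the part I wanted to factor out.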
In other languages, protocols, and situations (starting with IFF on
the Amiga, I guess!) I've had success with what would back then have been
called a "table driven" scheme. Now, it's more a "linked object" approach:
each element type in the document model gets a node that specifies what is to
be done when an element that it represents is encountered, and also has
a list (the 'table') of the possible subordinate nodes. You create
and develop these nodes before reading the document, then with a call to
the stream reader all the appropriate actions get taken as needed.
With Ruby it's a snap to create such a node structure. I just formalised
it a bit and provided an 'XMLStreamListener' class to extend REXML's
basic version, which keeps track of the node structure and dispatches
to the appropriate current one. The 'XMLSpec' nodes themselves have
methods to handle start, end, and empty tags and of course enclosed text.
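
For a rough idea of the shape of this, here is a stripped-down sketch. It
is NOT the actual module: SpecNode, DispatchListener, their method names,
and the sample element names are all invented for illustration, and the
real 'XMLSpec' and 'XMLStreamListener' classes may well look different.
Only the REXML calls are real.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical spec node: names the element it handles, holds optional
# callbacks, and keeps the 'table' of child specs allowed beneath it.
class SpecNode
  attr_reader :name, :children

  def initialize(name, on_start: nil, on_text: nil, on_end: nil)
    @name = name
    @on_start = on_start
    @on_text = on_text
    @on_end = on_end
    @children = {}
  end

  # Register a subordinate spec and return it, so trees are easy to build.
  def add(child)
    @children[child.name] = child
    child
  end

  def start(attrs)
    @on_start&.call(attrs)
  end

  def text(data)
    @on_text&.call(data)
  end

  def finish
    @on_end&.call
  end
end

# Hypothetical listener: tracks the current spec node and dispatches to it.
# Elements with no matching spec (and everything below them) are skipped.
class DispatchListener
  include REXML::StreamListener

  def initialize(root)
    @root = root
    @stack = [] # SpecNodes, or nil for elements that have no spec
  end

  def tag_start(name, attrs)
    spec = if @stack.empty?
             @root.name == name ? @root : nil
           else
             parent = @stack.last
             parent && parent.children[name]
           end
    @stack.push(spec)
    spec.start(attrs) if spec
  end

  def text(data)
    spec = @stack.last
    spec.text(data) if spec
  end

  def tag_end(_name)
    spec = @stack.pop
    spec.finish if spec
  end
end

# Build the node structure first, then let the stream reader drive it.
notes = []
root  = SpecNode.new('MIDIFile')
track = root.add(SpecNode.new('Track'))
track.add(SpecNode.new('NoteOn', on_start: ->(attrs) { notes << attrs['Note'] }))

xml = '<MIDIFile><Track><NoteOn Note="60"/><Other><NoteOn Note="1"/></Other>' \
      '<NoteOn Note="64"/></Track></MIDIFile>'
REXML::Document.parse_stream(xml, DispatchListener.new(root))
# notes is now ["60", "64"]; the NoteOn under the unspecified <Other> is skipped
```

The point is that the callbacks never look at a stack themselves: a
handler only fires when its element turns up in the place the node
structure says it belongs.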
So if anyone is interested in digging deeper, I've provided a web
page (with the module, example use, and downloadable archives) at
http://jwgibbs.cchem.berkeley.edu/pete/xmlstreamin/
Cheers,
-- Pete --