I’ve been looking at REXML, and I really like the architecture: it’s a very Ruby way to do standard XML parsing, both DOM-alike and SAX-alike. But it’s still hard to use, because you always have to be aware that you’re dealing with XML. Since XML is really just a serialization format for tree- and semi-graph-structured data, why is there no library that treats it as such?
I’m thinking that there has to be a much more Ruby way to deal with tree-structured data. After all, Ruby objects in core can be seen as a tree (or at least a graph, a tree being a special case of one), and XML even has limited graph-representation support through IDREFs.
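As a small illustration of the IDREF point, here is a sketch that resolves references in two passes so that both orders end up pointing at the very same Customer object, i.e. a graph rather than a tree. The Customer and Order classes and the document are hypothetical, and since no DTD is declared, the parser itself does not know that the `customer` attribute is an IDREF; the code treats it as one by convention.

```ruby
require 'rexml/document'

# Hypothetical classes: several orders refer to one shared customer.
class Customer; attr_accessor :id, :name; end
class Order;    attr_accessor :customer; end

# Hypothetical document. Without a DTD the parser doesn't know that
# order/@customer is an IDREF; this code treats it as one by convention.
xml = <<~XML
  <orders>
    <customer id="c1" name="John Doe"/>
    <order customer="c1"/>
    <order customer="c1"/>
  </orders>
XML
doc = REXML::Document.new(xml)

# Pass 1: instantiate every element that carries an ID, indexed by that ID.
customers = {}
REXML::XPath.match(doc, '//customer[@id]').each do |el|
  c = Customer.new
  c.id, c.name = el.attributes['id'], el.attributes['name']
  customers[c.id] = c
end

# Pass 2: replace each IDREF with a reference to the already-built object,
# so both orders share one Customer instead of holding copies.
orders = REXML::XPath.match(doc, '//order').map do |el|
  Order.new.tap { |o| o.customer = customers.fetch(el.attributes['customer']) }
end

p orders[0].customer.equal?(orders[1].customer)  # prints true
```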
What I’d like to see is a library (I /am/ working on code for this) that
would allow one to tell the parser what classes represent what tags in
what namespaces, and how to map them, so that all one ever needs to do
is pretend the XML file is a bunch of Ruby objects.
Here’s something more code-like to illustrate my idea:
John Doe Mixer-blender 15.32
and
class Invoice
  attr_accessor :customer
  attr_accessor :items
end
class Item
  attr_accessor :description
  attr_accessor :price
end
class Price < Numeric
  attr_accessor :currency
  attr_accessor :value
end
class Customer
  attr_accessor :name
end
and some sort of map of namespace-tag-class triples should be enough for the parser to instantiate an Invoice, instantiate a Customer, instantiate a String as the name within the Customer, instantiate an Array of items, connect the Customer to the Invoice, and connect the Array to the Invoice.
Any ideas on how to express such a map simply?
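One possible shape for the map is a plain Hash keyed by [namespace URI, tag name]. Below is a minimal sketch on top of REXML. It assumes the sample invoice is marked up with tags named after the attributes (invoice, customer, name, items, item, description, price) in a made-up default namespace `urn:example:invoice`, with a made-up `currency` attribute on the price. TAG_MAP's format and the `build` helper are hypothetical, not an existing API.

```ruby
require 'rexml/document'

# The classes from the example above (Price subclasses Numeric).
class Invoice;  attr_accessor :customer, :items; end
class Customer; attr_accessor :name; end
class Item;     attr_accessor :description, :price; end
class Price < Numeric; attr_accessor :currency, :value; end

# Hypothetical namespace URI for the invoice vocabulary.
INV = 'urn:example:invoice'

# Hypothetical map format: [namespace URI, tag name] => rule.
#   a Class -> instantiate it and assign each child element through its setter
#   :text   -> the element's text, as a String
#   :list   -> an Array of the built child elements
#   a Proc  -> full control over how the element becomes an object
TAG_MAP = {
  [INV, 'invoice']     => Invoice,
  [INV, 'customer']    => Customer,
  [INV, 'item']        => Item,
  [INV, 'name']        => :text,
  [INV, 'description'] => :text,
  [INV, 'items']       => :list,
  [INV, 'price']       => ->(el) {
    Price.new.tap do |price|
      price.currency = el.attributes['currency']
      price.value    = el.text.to_f
    end
  }
}

# Recursively turn a REXML::Element into Ruby objects according to the map.
def build(el, map = TAG_MAP)
  key  = [el.namespace.to_s, el.name]
  rule = map.fetch(key) { raise ArgumentError, "no mapping for #{key.inspect}" }
  case rule
  when :text then el.text.to_s
  when :list then el.elements.to_a.map { |child| build(child, map) }
  when Proc  then rule.call(el)
  when Class
    obj = rule.new
    # Connect each child to its parent via the matching attr_accessor.
    el.elements.each { |child| obj.public_send("#{child.name}=", build(child, map)) }
    obj
  else
    raise ArgumentError, "bad rule for #{key.inspect}: #{rule.inspect}"
  end
end

xml = <<~XML
  <invoice xmlns="urn:example:invoice">
    <customer><name>John Doe</name></customer>
    <items>
      <item>
        <description>Mixer-blender</description>
        <price currency="USD">15.32</price>
      </item>
    </items>
  </invoice>
XML

invoice = build(REXML::Document.new(xml).root)
puts invoice.customer.name  # prints John Doe
```

Keeping each rule as plain data (a Class, a Symbol, or a Proc) keeps the common cases declarative, while odd ones like Price can still drop down to code.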
I’m planning to implement the XML-specific stuff as a module that would be mixed into the classes instantiated from the XML file, if it hasn’t been mixed in already. Basically, the class structure becomes a schema. (It would be possible to tell the parser either to be lax and ignore or still connect nodes that are undefined in the map, or to be strict and raise an exception.)
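A rough sketch of the mix-in plus the strict/lax switch, again on REXML. XMLBacked, MAP, `load_mapped` and the three mode names are hypothetical, and for brevity every mapped child is read as a String. The module is extended onto an object only if its class doesn’t already include it, and setters are looked up on the class itself (the "schema"), so a stray tag can’t hit the module’s own accessors.

```ruby
require 'rexml/document'

# Hypothetical mix-in carrying the XML-specific state of a mapped object.
module XMLBacked
  attr_accessor :xml_namespace, :xml_tag, :xml_extras
end

class Customer
  attr_accessor :name
end

# Hypothetical map: [namespace URI ('' for none), tag] => class.
MAP = { ['', 'customer'] => Customer }

# mode: :strict raises on a child with no matching setter,
#       :lax_ignore drops it, :lax_keep keeps the raw node attached.
def load_mapped(el, mode: :strict)
  ns  = el.namespace.to_s
  obj = MAP.fetch([ns, el.name]).new
  obj.extend(XMLBacked) unless obj.is_a?(XMLBacked)   # mix in only if needed
  obj.xml_namespace = ns
  obj.xml_tag       = el.name
  obj.xml_extras    = []
  el.elements.each do |child|
    setter = "#{child.name}="
    if obj.class.public_method_defined?(setter)        # the class is the schema
      obj.public_send(setter, child.text.to_s)
    elsif mode == :strict
      raise ArgumentError, "<#{child.name}> is not mapped on #{obj.class}"
    elsif mode == :lax_keep
      obj.xml_extras << child
    end                                                 # :lax_ignore: skip it
  end
  obj
end

doc     = REXML::Document.new('<customer><name>John Doe</name><vip>yes</vip></customer>')
kept    = load_mapped(doc.root, mode: :lax_keep)
dropped = load_mapped(doc.root, mode: :lax_ignore)
p kept.xml_extras.map(&:name)  # prints ["vip"]
```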
I think it would also be possible to write a DOM module as a mix-in, so that standards-based (DOM-style) access would work on arbitrary classes.
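Something like the following could be a starting point for that DOM mix-in. DOMish and its snake_case method names are hypothetical; they only mirror DOM’s nodeName, childNodes and getElementsByTagName, and treating instance variables as child nodes (with lower-cased class names as node names) is an assumption of this sketch, not anything the DOM spec prescribes.

```ruby
# Hypothetical DOM-flavoured mix-in: presents an object's instance variables as
# its child nodes, so DOM-style navigation works on classes that know nothing
# about XML.
module DOMish
  # Like DOM's nodeName: here simply the lower-cased class name (an assumption).
  def node_name
    self.class.name.downcase
  end

  # Like DOM's childNodes: the values of all instance variables, with
  # Array-valued ones contributing one child per element.
  def child_nodes
    instance_variables.flat_map { |ivar| Array(instance_variable_get(ivar)) }
  end

  # Like DOM's getElementsByTagName: depth-first, in instance-variable order.
  # Values that don't include DOMish (Strings, numbers) are treated as leaves.
  def get_elements_by_tag_name(tag)
    child_nodes.flat_map do |child|
      next [] unless child.is_a?(DOMish)
      (child.node_name == tag ? [child] : []) + child.get_elements_by_tag_name(tag)
    end
  end
end

class Invoice;  include DOMish; attr_accessor :customer, :items; end
class Customer; include DOMish; attr_accessor :name; end
class Item;     include DOMish; attr_accessor :description; end

inv = Invoice.new
inv.customer = Customer.new.tap { |c| c.name = 'John Doe' }
inv.items    = [Item.new.tap { |i| i.description = 'Mixer-blender' }]

p inv.get_elements_by_tag_name('item').map(&:description)  # prints ["Mixer-blender"]
```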
I’d also love to design this so that the whole tree doesn’t have to be available at once. DOM doesn’t actually require the whole document to be in core for most operations, so I know it’s possible. I’d like the parser to be able to work SAX-alike, just emitting objects serially, and if they get garbage-collected, fine. (If they have ID attributes, though, I think I’d either record how to get them back from the file in another parse, or store the instantiated objects in a hash.) If they don’t get garbage-collected, the API automagically becomes more DOM-alike, since all the objects would be in core.
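The SAX-alike half might look roughly like this, built on REXML’s StreamParser and StreamListener. The Item class, the ItemStreamer listener and the sample data are hypothetical. Each item is handed to a block as soon as its closing tag is parsed and is otherwise forgotten, except that items carrying an `id` attribute are kept in a Hash (the second option above).

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical item class; every field is kept as the raw String from the file.
class Item
  attr_accessor :id, :description, :price
end

# Hypothetical listener: builds one Item per <item>, passes it to the block as
# soon as </item> is seen, and keeps only items with an id attribute reachable.
class ItemStreamer
  include REXML::StreamListener   # supplies no-op defaults for other events

  attr_reader :by_id

  def initialize(&on_item)
    @on_item = on_item
    @by_id   = {}    # id => Item (strong references)
    @current = nil   # the Item being built, if any
    @field   = nil   # child element whose text is being read
  end

  def tag_start(name, attrs)
    attrs = attrs.to_h  # a Hash in current REXML; to_h also accepts [[k, v], ...]
    if name == 'item'
      @current    = Item.new
      @current.id = attrs['id']
    elsif @current && Item.public_method_defined?("#{name}=")
      @field = name
    end
  end

  def text(text)
    @current.public_send("#{@field}=", text) if @current && @field
  end

  def tag_end(name)
    @field = nil
    return unless name == 'item'
    @by_id[@current.id] = @current if @current.id
    @on_item.call(@current)
    @current = nil  # drop our reference; unless registered or kept by the block, it can be GC'd
  end
end

# Hypothetical sample data.
xml = <<~XML
  <items>
    <item id="i1"><description>Mixer-blender</description><price>15.32</price></item>
    <item><description>Toaster</description><price>9.99</price></item>
  </items>
XML

seen = []
listener = ItemStreamer.new { |item| seen << item.description }
REXML::Parsers::StreamParser.new(xml, listener).parse
```

Because the Hash holds strong references, registered items are never collected; ObjectSpace::WeakMap would be one way to let them go while still finding the ones that happen to survive.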
This is probably already more words than the code to implement it would be, but I think what I have in mind is clear. I’d love to hear other Rubyists’ take on the problem space.
Ari
P.S. If I’m just a really lazy bum who should accept that dealing with
XML is hard, tell me so. I probably won’t believe you, though.