I’ve written an HTML parser that builds trees from HTML source. After
I wrote it, I discovered REXML, which does the same thing for XML.
Then I made an add-on that uses REXML’s XPath support to do XPath
queries on the resultant HTML tree. In my test version, these queries
return REXML tree elements, rather than my HTML tree elements.
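For illustration, here is a minimal sketch of the kind of query involved. It uses only REXML itself, not my parser, and the markup string is made up and deliberately well-formed so that REXML can parse it directly. The point is that the results are REXML::Element objects.

```ruby
require 'rexml/document'

# Made-up, well-formed markup; real-world HTML often is not well-formed.
source = '<html><body><p class="intro">Hello</p><p>World</p></body></html>'
doc = REXML::Document.new(source)

# XPath.match returns an Array of every matching node, in document order.
paragraphs = REXML::XPath.match(doc, '//p')

# XPath.first returns only the first matching node (or nil).
intro = REXML::XPath.first(doc, '//p[@class="intro"]')

# The results are REXML elements, so callers use REXML's API on them.
puts paragraphs.first.class      # REXML::Element
puts intro.attributes['class']   # intro
```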
Having two very similar tree structures (HTML and REXML) smells, to
me. The fact that they have somewhat different APIs confuses even me.
What I’m wondering (and would like your input on):
- Should I just require REXML and not bother with my own tree
  elements? I could, after all, just build a REXML document instead.
  This has the disadvantage of requiring yet another package to be
  installed.
- If I don’t build a REXML tree, I could still return my own
  elements from XPath queries. That is, I could use REXML transparently
  and not expose the user to any of REXML’s elements. Would this be a
  preferable way to provide XPath support?
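
A rough sketch of what the transparent approach could look like, under stated assumptions: HtmlElement is a hypothetical stand-in for my own tree element (the real class has a different API), and XPathBridge is an invented name. The bridge mirrors the HTML tree into a private REXML document, remembers which REXML element came from which HTML element, and maps query results back before returning them.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the parser's own element class.
class HtmlElement
  attr_reader :name, :attributes, :children

  def initialize(name, attributes = {}, children = [])
    @name = name
    @attributes = attributes
    @children = children
  end
end

# Hypothetical bridge: REXML is used internally and never exposed.
class XPathBridge
  def initialize(root)
    # Identity-keyed map from each REXML element to its source HtmlElement.
    @origin = {}.compare_by_identity
    @doc = REXML::Document.new
    mirror(root, @doc)
  end

  # Runs the XPath query and returns the matching HtmlElement objects.
  # Matches with no mirrored counterpart (e.g. attribute nodes) are dropped.
  def match(xpath)
    REXML::XPath.match(@doc, xpath).map { |node| @origin[node] }.compact
  end

  private

  # Recursively copies element names and attributes into the REXML tree.
  def mirror(element, parent)
    node = parent.add_element(element.name, element.attributes)
    @origin[node] = element
    element.children.each { |child| mirror(child, node) }
  end
end

intro  = HtmlElement.new('p', { 'class' => 'intro' })
plain  = HtmlElement.new('p')
body   = HtmlElement.new('body', {}, [intro, plain])
root   = HtmlElement.new('html', {}, [body])
bridge = XPathBridge.new(root)

found = bridge.match('//p[@class="intro"]')
puts found.first.equal?(intro)   # true: the caller gets the original element
```

The cost of this design is a second copy of the tree held in memory, and the mirror has to be rebuilt or updated whenever the HTML tree changes. Text nodes are left out of the sketch for brevity.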