XML Namespaces in sgml-parser

I posted this to c.l.ruby the other day. Any comments?

I've been working on extending the sgml-parser included with the standard
Windows distribution and have run into several issues.

  1. Tags with XML namespace qualifiers get split: dc:language
    ends up as tag="dc" with an attribute of "language".

  2. Attributes cannot contain namespace qualifiers, e.g. <l:link
    l:rel="http://purl.org/rss/1.0/modules/proposed/link/#permalink">
    ends up as [{l => ""}, {rel => "http…"}]

  3. Directives such as <![CDATA[ were not being recognized as "Special".
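
Items 1 and 2 can be reproduced outside the parser. This is only a minimal sketch, assuming the stock Tagfind pattern and the attribute-name part of Attrfind are as in the "<" lines of the diff further down (with their * quantifiers). The constant names are mine, not the parser's, and the rest of Attrfind is left out because the diff does not show it.

```ruby
# Stock patterns, as I read them from the "<" lines of the diff.
# TAGFIND_OLD / ATTRNAME_OLD are illustrative names, not parser constants.
TAGFIND_OLD  = /[a-zA-Z][a-zA-Z0-9.-]*/
ATTRNAME_OLD = /[\s,]*([a-zA-Z_][a-zA-Z_0-9.-]*)/

# Item 1: the tag name stops at the colon.
old_tag = TAGFIND_OLD.match("dc:language")[0]

# Item 2: the attribute name stops at the colon as well.
old_attr = ATTRNAME_OLD.match(' l:rel="x"')[1]

puts old_tag    # dc
puts old_attr   # l
```

Neither character class contains ":", so the match ends at the prefix and the local name is left behind to be picked up as something else.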

Here are my diffs; all three items are working for me now.

17c17
< Special = /<![^<>]*>/
---
> Special = /<!\[/
20,21c20,21
< Tagfind = /[a-zA-Z][a-zA-Z0-9.-]*/
< Attrfind = Regexp.compile('[\s,]*([a-zA-Z_][a-zA-Z_0-9.-]*)' +
---
> Tagfind = /[a-zA-Z][a-zA-Z0-9.-:]*/
> Attrfind = Regexp.compile('[\s,]*([a-zA-Z_][-:.a-zA-Z_0-9]*)' +
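
A quick check of the patched patterns, again as a sketch rather than a test of the parser itself. It assumes the ">" lines read with their * quantifiers in place, uses only the attribute-name part of Attrfind (the remainder is not shown in the diff), and the constant names are made up for the example.

```ruby
# Patched patterns from the ">" lines of the diff.
# TAGFIND_NEW / ATTRNAME_NEW are illustrative names, not parser constants.
TAGFIND_NEW  = /[a-zA-Z][a-zA-Z0-9.-:]*/
ATTRNAME_NEW = /[\s,]*([a-zA-Z_][-:.a-zA-Z_0-9]*)/

# Qualified names now survive intact.
new_tag  = TAGFIND_NEW.match("dc:language")[0]
new_attr = ATTRNAME_NEW.match(' l:rel="x"')[1]

# In TAGFIND_NEW, ".-:" is read as a range from "." to ":", which
# also covers "/". ATTRNAME_NEW puts "-" first, so it stays literal.
slash_tag  = TAGFIND_NEW.match("a/b")[0]
slash_attr = ATTRNAME_NEW.match("a/b")[1]

puts new_tag     # dc:language
puts new_attr    # l:rel
puts slash_tag   # a/b
puts slash_attr  # a
```

So both patterns accept the namespace prefix, but the Tagfind class as written also lets "/" through. Writing it with the hyphen first or last (as Attrfind does) would avoid that if it turns out to matter.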

Are these changes perceived as valuable? Any comments, gotchas?

-Jeff