I posted this to c.l.ruby the other day. Any comments?
I’ve been extending the sgml-parser included with the standard Windows
distribution and ran into several issues:
- Tags with XML namespace qualifiers were split apart: dc:language
  would end up as tag="dc" with an attribute named "language".
- Attribute names could not carry namespace qualifiers, e.g.
  <l:link l:rel="http://purl.org/rss/1.0/modules/proposed/link/#permalink">
  would end up as [{l => ""}, {rel => "http…"}].
- Directives such as <![CDATA[ were not recognized as "Special".
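A minimal sketch of the kind of pattern changes these three items call for. It is not the diff below and not sgml-parser's own code: QNAME, TAG_START, ATTR, CDATA and parse_start_tag are hypothetical names for illustration. It assumes that allowing a colon inside tag and attribute names is enough to keep prefixes such as dc: and l: intact, and, for the CDATA case, that the failure comes from [^<>]* in Special stopping at the first < or > inside the section.

```ruby
# Hypothetical patterns for illustration; they are NOT the regexes from
# sgml-parser. A qualified name may carry a namespace prefix ("dc:", "l:").
QNAME = /[A-Za-z_][-.:\w]*/

# Start tag: qualified tag name, then zero or more quoted attributes.
TAG_START = /\A<(#{QNAME})((?:\s+#{QNAME}\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*\/?>/

# One attribute: qualified name, then a double- or single-quoted value.
ATTR = /(#{QNAME})\s*=\s*(?:"([^"]*)"|'([^']*)')/

# CDATA section: everything up to the first "]]>", newlines included (/m).
CDATA = /<!\[CDATA\[(.*?)\]\]>/m

# The Special pattern quoted in the diff, kept here for comparison.
ORIGINAL_SPECIAL = /<![^<>]*>/

# Returns [tag_name, [[attr_name, value], ...]], or nil if text does not
# begin with a start tag.
def parse_start_tag(text)
  m = TAG_START.match(text) or return nil
  attrs = m[2].scan(ATTR).map { |name, dq, sq| [name, dq || sq] }
  [m[1], attrs]
end

p parse_start_tag('<dc:language>')
# => ["dc:language", []]
p parse_start_tag('<l:link l:rel="http://purl.org/rss/1.0/modules/proposed/link/#permalink">')
# => ["l:link", [["l:rel", "http://purl.org/rss/1.0/modules/proposed/link/#permalink"]]]
p CDATA.match('<![CDATA[a < b]]>')[1]          # => "a < b"
p ORIGINAL_SPECIAL.match('<![CDATA[a < b]]>')  # => nil
```

The prefix stays part of the name rather than being split off; a parser that needs the namespace separately could still split on the first colon afterwards.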
All three items now work for me. Here are my diffs:
17c17
< Special = /<![^<>]*>/