Well, there are three problems with writing a validating XML parser.
Problem 1, parse the XML. That has been done: that's REXML, and it does it very, very
nicely, thank you.
Problem 2, parse the DTD. That’s yucky. Can be done, but it’s no joy.
Solution. Don’t do it. Feed the DTD through one of the several DTD to
Relax NG converters, and then you can slurp it in using REXML. After all,
the really nifty thing about Relax is that it is in XML.
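To make that concrete, here is a rough sketch of slurping a Relax NG schema with REXML and listing the elements it declares. The little book/chapter schema and the element_names helper are made up for illustration; a converted DTD would be bigger but has the same shape.

```ruby
require 'rexml/document'

# A tiny hand-written Relax NG schema (hypothetical, for illustration only).
SCHEMA_XML = <<~XML
  <element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
    <oneOrMore>
      <element name="chapter">
        <text/>
      </element>
    </oneOrMore>
  </element>
XML

# Hypothetical helper: collect the name attribute of every <element>
# pattern, walking the schema tree depth first.
def element_names(node, found = [])
  found << node.attributes['name'] if node.name == 'element'
  node.elements.each { |child| element_names(child, found) }
  found
end

SCHEMA = REXML::Document.new(SCHEMA_XML)
NAMES = element_names(SCHEMA.root)
puts NAMES.inspect
```

The point is only that the schema is ordinary XML, so the same parser and the same tree-walking code work on it as on the documents it describes.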
Problem 3, check that the XML conforms to the Schema (DTD). This is the
missing bit, but it is not really hard.
It merely means you need to traverse the DOM (which REXML makes really
easy), and at each point check whether this item is valid here. REXML
makes querying the Relax Schema easy too.
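Something like the following is what I have in mind, as a minimal sketch only. It assumes a made-up book/chapter schema, and it only checks which child elements may appear under which parent; ordering, repetition counts, attributes and text are all ignored. The helper names (child_names, allowed_children, misplaced) are mine, not part of REXML.

```ruby
require 'rexml/document'

# Hypothetical schema: a <book> holds one or more <chapter> elements.
SCHEMA = REXML::Document.new(<<~XML)
  <element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
    <oneOrMore>
      <element name="chapter"><text/></element>
    </oneOrMore>
  </element>
XML

# Names of the <element> patterns nested directly inside a pattern, looking
# through wrappers such as <oneOrMore> but not into the child elements.
def child_names(pattern)
  pattern.elements.to_a.flat_map do |p|
    p.name == 'element' ? [p.attributes['name']] : child_names(p)
  end
end

# Query the schema: build { element name => allowed child names }.
def allowed_children(pattern, table = {})
  table[pattern.attributes['name']] = child_names(pattern) if pattern.name == 'element'
  pattern.elements.each { |p| allowed_children(p, table) }
  table
end

# Traverse the document and collect the names of elements that the table
# does not allow under their parent.
def misplaced(element, table, found = [])
  element.elements.each do |child|
    found << child.name unless table.fetch(element.name, []).include?(child.name)
    misplaced(child, table, found)
  end
  found
end

TABLE = allowed_children(SCHEMA.root)
DOC = REXML::Document.new('<book><chapter>One</chapter><appendix/></book>')
BAD = misplaced(DOC.root, TABLE)
puts BAD.inspect
```

A real validator would have to interpret the Relax NG patterns properly rather than flatten them into a lookup table, but the traversal itself stays about this small.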
One of the marks of a good compiler is that it can make sensible fixups
and keep going. But for the purposes of validating XML you usually just
want to say, “Aye! It fits”, or “Nay! It doesn’t, expected XXX at line NNN col MM”.
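A sketch of that stop-at-the-first-problem style of reporting might look like this. The RULES table is hypothetical and hard-coded (each element must contain exactly the listed children, in order), and this sketch does not track source positions, so it reports the XPath of the offending element via REXML's Element#xpath instead of a line and column.

```ruby
require 'rexml/document'

# Hypothetical rules: each element name maps to the exact sequence of child
# element names it must contain. A real validator would derive this from the
# Relax NG schema and handle repetition, choice and optional patterns.
RULES = { 'book' => %w[title chapter], 'title' => [], 'chapter' => [] }.freeze

# Return nil if the subtree fits, otherwise a message for the first mismatch.
def first_error(element)
  expected = RULES.fetch(element.name, [])
  actual = element.elements.to_a
  expected.each_with_index do |name, i|
    found = actual[i]
    next if found && found.name == name
    where = found ? found.xpath : element.xpath
    return "expected <#{name}> at #{where}"
  end
  # Any child beyond the expected sequence is also an error.
  extra = actual[expected.length]
  return "unexpected <#{extra.name}> at #{extra.xpath}" if extra
  # This level fits, so descend into the children.
  actual.each do |child|
    error = first_error(child)
    return error if error
  end
  nil
end

# Turn the result into the yes/no answer, with no attempt at fixups.
def verdict(xml)
  error = first_error(REXML::Document.new(xml).root)
  error ? "Nay! It doesn't, #{error}" : 'Aye! It fits'
end

puts verdict('<book><title>T</title><chapter>C</chapter></book>')
puts verdict('<book><chapter>C</chapter></book>')
```

Returning on the first mismatch is the design choice here: no recovery, no list of follow-on errors, just one answer.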
I bet it won’t be a large program at all.
I would write it now, except I have other things to do first. I mentioned
it now, since I can foresee a future where I will need it, and will have to
write it.
I just hoped that by the time I need it, someone else would have done so
first. (Larry Wall of Perl says the primary virtues of a programmer are
Laziness, Impatience and Hubris. I don’t have the pressing need right now,
so I’m not Impatient on this one, but Laziness tells me I will need it.
Hubris tells me my idea is a “Better way”.)
John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand
John’s law :-
All advances in computing have arisen through the creation of an
additional level of indirection, the trick is to work out which
indirection is actually useful.
On Thu, 20 Feb 2003, Sperberg, Roger wrote:
I’m not following you here. Your schema sets up the rules you want the data
to follow, whether it’s a DTD or Relax NG or W3C XML Schemas. Once it’s
parsed, you would then have to see that the data matched your rules.