Validating REXML (was Re: ANN: REXML 2.5.4)

I’m not following you here. Your schema sets up the rules you want the data
to follow, whether it’s a DTD or Relax NG or W3C XML Schemas. Once it’s
parsed, you would then have to see that the data matched your rules.

Instead of a “bit”, it looks to me like you’d need code that translates
your schema’s grammar into Ruby and REXML, and then checks every node for
the right sequence and the acceptable children.

Although a Relax NG schema might be easier to translate than a DTD, that
still seems like a significant amount of work.

Or am I missing something?

Roger Sperberg

···

On Tuesday, 18 Feb 2003, John Carter wrote:

Hmm. It occurs to me that you don’t need a validating parser.

You can parse and then validate.

Especially if instead of a DTD, you use James Clark’s Relax NG schema.

i.e. XML doc + Relax NG Schema + REXML + a bit of Ruby magic
and you have a validating parser.

All that is missing is the “bit of ruby magic”, which would probably be
quite small.

Well, there are three problems with writing a validating XML parser.

Problem 1, parsing XML, has been done: that’s REXML, and it does it very, very
nicely, thank you.

Problem 2, parse the DTD. That’s yucky. Can be done, but it’s no joy.
Solution: don’t do it. Feed the DTD through one of the several DTD to
Relax NG converters, and then you can slurp it in using REXML. After all,
the really nifty thing about Relax NG is that it is in XML.
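
To illustrate the point that a Relax NG schema is just another XML document, here is a minimal sketch that loads one with REXML and lists the element patterns it declares. The address-book schema and the constant names are made up for illustration; only the REXML calls and the Relax NG namespace URI are real.

```ruby
require 'rexml/document'

# A tiny Relax NG schema in XML syntax (hypothetical, for illustration only).
SCHEMA = <<~XML
  <element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
    <zeroOrMore>
      <element name="card">
        <element name="name"><text/></element>
        <element name="email"><text/></element>
      </element>
    </zeroOrMore>
  </element>
XML

# Prefix-to-URI mapping used by the XPath query below.
RNG_NS = { 'rng' => 'http://relaxng.org/ns/structure/1.0' }.freeze

# The schema parses like any other XML document.
schema = REXML::Document.new(SCHEMA)

# Collect the name of every <element> pattern declared in the schema.
element_names = REXML::XPath.match(schema, '//rng:element', RNG_NS)
                            .map { |pattern| pattern.attributes['name'] }

puts element_names.inspect
```

A real validator would also have to interpret patterns such as zeroOrMore, choice and optional, which this sketch ignores.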

Problem 3, check that the XML conforms to the Schema (DTD). This is the
missing bit.

But not really hard.

It merely means you need to traverse the DOM (which REXML makes really
easy), and at each point check whether this item is valid here. REXML
makes querying the Relax NG schema easy too.
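
As a rough sketch of that traversal, the code below walks a REXML tree depth-first and checks each child against a table of allowed children. The table (ALLOWED_CHILDREN) and the helper names are hypothetical; here the table is written by hand, where a real validator would derive it from the Relax NG schema. It checks only which children may appear, not their order or count, and it reports the offending element by name because the REXML tree does not keep line and column positions.

```ruby
require 'rexml/document'

# Hypothetical content model: element name => names allowed as children.
# In a real validator this would be derived from the Relax NG schema.
ALLOWED_CHILDREN = {
  'addressBook' => %w[card],
  'card'        => %w[name email],
  'name'        => [],
  'email'       => []
}.freeze

# Depth-first walk. Returns nil if the subtree fits the model,
# otherwise a message describing the first problem found.
def first_violation(element)
  allowed = ALLOWED_CHILDREN[element.name]
  return "unknown element <#{element.name}>" if allowed.nil?

  element.elements.each do |child|
    unless allowed.include?(child.name)
      return "unexpected <#{child.name}> inside <#{element.name}>, " \
             "expected one of: #{allowed.join(', ')}"
    end
    problem = first_violation(child)
    return problem if problem
  end
  nil
end

# Parse first, then validate, and give a yes/no verdict.
def verdict(xml)
  problem = first_violation(REXML::Document.new(xml).root)
  problem ? "Nay! It doesn't: #{problem}" : 'Aye! It fits'
end

puts verdict('<addressBook><card><name>A</name><email>a@x</email></card></addressBook>')
puts verdict('<addressBook><card><phone>1</phone></card></addressBook>')
```

Checking sequence and repetition as well is where the translation from the schema's grammar comes in, and that is the larger part of the job.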

One of the marks of a good compiler is that it can make sensible fixups
and keep going. But for the purposes of validating XML you usually just
want to say, “Aye! It fits”, or “Nay! It doesn’t, expected XXX at line NNN col MM”

I bet it won’t be a large program at all.

I would write it now, except I have other things to do first. I mentioned
it now, since I can foresee a future where I will need it, and will have to
write it.

I just hoped that by the time I need it, someone else would have done so
first. (Larry Wall of Perl says the primary virtues of a programmer are
Laziness, Impatience and Hubris. I don’t have the pressing need right now,
so I’m not Impatient on this one, but Laziness tells me I will need it.
Hubris tells me my idea is a “Better way”.)

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

John’s law :-

All advances in computing have arisen through the creation of an
additional level of indirection, the trick is to work out which
indirection is actually useful.

···

On Thu, 20 Feb 2003, Sperberg, Roger wrote:

I’m not following you here. Your schema sets up the rules you want the data
to follow, whether it’s a DTD or Relax NG or W3C XML Schemas. Once it’s
parsed, you would then have to see that the data matched your rules.

John Carter john.carter@tait.co.nz wrote in message news:Pine.LNX.4.50.0302200917430.12258-100000@localhost.localdomain

Problem 3, check that the XML conforms to the Schema (DTD). This is the
missing bit.

It merely means you need to traverse the DOM (which REXML makes really
easy), and at each point check whether this item is valid here. REXML
makes querying the Relax NG schema easy too.

One of the marks of a good compiler is that it can make sensible fixups
and keep going. But for the purposes of validating XML you usually just
want to say, “Aye! It fits”, or “Nay! It doesn’t, expected XXX at line NNN col MM”

I bet it won’t be a large program at all.

I would write it now, except I have other things to do first. I mentioned
it now, since I can foresee a future where I will need it, and will have to
write it.

I just hoped that by the time I need it, someone else would have done so
first. (Larry Wall of Perl says the primary virtues of a programmer are
Laziness, Impatience and Hubris. I don’t have the pressing need right now,
so I’m not Impatient on this one, but Laziness tells me I will need it.
Hubris tells me my idea is a “Better way”.)

John, shoot me an email, and we’ll collaborate on this.

My plan, as it has stood for the past 5 months, is to write a RelaxNG
→ Ruby state machine generator, then slap a SAX2 interface on it. To
validate, users will instantiate the validator and pass it to the
parser. The parser will call its usual listener notification events;
as far as REXML will be concerned, the validator will be just another
listener. The validator will just make sure that the events it
receives put the state machine into a valid state. I’ll need to add
some hooks into REXML to support validation in tree parsing, unless I
can think of a less invasive solution, but I’m more than willing to do
that work to get validation.
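
A minimal sketch of the listener idea, assuming a hand-written transition table in place of the output of the RelaxNG generator (which does not exist yet). The class name FsmValidator, the validate helper and the address-book grammar are all hypothetical; the parser and listener are REXML's existing SAX2 classes. The table accepts an addressBook holding zero or more card elements, each with a name followed by an email.

```ruby
require 'rexml/parsers/sax2parser'
require 'rexml/sax2listener'

# Hypothetical validator: a SAX2 listener driving a small state machine.
class FsmValidator
  include REXML::SAX2Listener

  # [current state, event kind, element name] => next state.
  # Written by hand here; a generator would emit this from the schema.
  TRANSITIONS = {
    [:start,       :open,  'addressBook'] => :book,
    [:book,        :open,  'card']        => :card,
    [:card,        :open,  'name']        => :in_name,
    [:in_name,     :close, 'name']        => :after_name,
    [:after_name,  :open,  'email']       => :in_email,
    [:in_email,    :close, 'email']       => :after_email,
    [:after_email, :close, 'card']        => :book,
    [:book,        :close, 'addressBook'] => :done
  }.freeze

  attr_reader :error

  def initialize
    @state = :start
    @error = nil
  end

  def start_element(_uri, local_name, _qname, _attributes)
    step(:open, local_name)
  end

  def end_element(_uri, local_name, _qname)
    step(:close, local_name)
  end

  # Valid only if no bad transition occurred and the machine finished.
  def valid?
    @error.nil? && @state == :done
  end

  private

  # Follow one transition, or record the first error and stop advancing.
  def step(kind, name)
    return if @error

    next_state = TRANSITIONS[[@state, kind, name]]
    if next_state
      @state = next_state
    else
      @error = "unexpected #{kind} of <#{name}> in state #{@state}"
    end
  end
end

# To the parser, the validator is just another listener.
def validate(xml)
  validator = FsmValidator.new
  parser = REXML::Parsers::SAX2Parser.new(xml)
  parser.listen(validator)
  parser.parse
  validator
end

result = validate('<addressBook><card><email>a@x</email></card></addressBook>')
puts result.valid? ? 'valid' : "invalid: #{result.error}"
```

Because the validator only sees events, the same object could in principle sit alongside other listeners on one parse; hooking it into tree parsing is the part that would need changes inside REXML.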

I’m hoping that other people will follow the RelaxNG example, and
write DTD → FSM and W3C XML Schema → FSM converters. I’d like this
mostly because I have zero interest in writing DTD /or/ W3C XSD
parsers, so if someone else doesn’t do them, they probably will never
be done. I’m happy to give contributors repository accounts and as
much support as they need to do this work.

I’ve started on this a number of times, and then gotten distracted
with other things (like bug fixes). My main task right now is to
improve namespace handling in the streaming and pull parsers; I think
I have just fixed the last namespace non-conformance in REXML for this
next release. I’ve also got a dozen other “important” sub-projects,
such as trying to separate the various parsers so that people can trim
down REXML to a minimal API subset, for inclusion in their own
applications. If REXML is ever bundled with Ruby, this won’t be as
important, but I’d still like to do it to clean up the code base. The
API documentation is in total chaos, I need to extend XPath support to
do more than return nodes, and then there’s XPath 2.0 looming –
that’ll be a huge job.

Anyway, if anyone is interested in collaborating on the RelaxNG
validator, let me know and we’ll get started.