Libxml SAX parser?

Hello,

I need to read XML documents from an open network socket.
REXML’s Document.parse_stream works fine, but is a tad slow
for this application.

I installed libxml, but while there is a wrapper for the SAX
parser, I haven’t found a way to set the callbacks. This limits
its usefulness severely. It isn’t in the TODO file either.
Have I missed something here?

None of the DOM parsers seems to have an option to stop as soon as
it has a valid (complete) document, which would be just what I
need.
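
For readers who want to see the "stop at the first complete document" idea in code, here is a minimal sketch using REXML's stream parser. The listener class, the `read_first_document` helper and the `:document_complete` tag are names made up for this illustration, and a StringIO stands in for the socket. The sketch counts element depth and uses throw/catch to leave the parser once the root element closes; how much of the underlying stream REXML has already buffered at that point is not something the sketch controls.

```ruby
require "rexml/document"
require "rexml/streamlistener"
require "stringio"

# Hypothetical listener: tracks element nesting and stops at the end of the root.
class FirstDocumentListener
  include REXML::StreamListener

  attr_reader :elements, :complete

  def initialize
    @depth = 0
    @elements = []
    @complete = false
  end

  def tag_start(name, _attrs)
    @depth += 1
    @elements << name
  end

  def tag_end(_name)
    @depth -= 1
    return unless @depth.zero?

    # The root element has just closed: the document is complete.
    @complete = true
    throw :document_complete
  end
end

# Hypothetical helper: parse from any IO-like object until one document is done.
def read_first_document(io)
  listener = FirstDocumentListener.new
  catch(:document_complete) do
    REXML::Document.parse_stream(io, listener)
  end
  listener
end

stream = StringIO.new("<msg id='1'><body>hi</body></msg><msg id='2'/>")
result = read_first_document(stream)
puts result.elements.inspect
puts result.complete
```

This does nothing about speed; it only shows that the stop condition can live in the callbacks rather than in the parser.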

Any suggestions? (Apart from “don’t use XML”.)

Cheers,

Han Holl

> I need to read XML documents from an open network socket. REXML's
> Document.parse_stream works fine, but is a tad slow for this
> application.
>
> I installed libxml, but while there is a wrapper for the SAX
> parser, I haven't found a way to set the callbacks. This limits
> its usefulness severely. It isn't in the TODO file either.
> Have I missed something here?
>
> None of the DOM parsers seems to have an option to stop as soon as
> it has a valid (complete) document, which would be just what I
> need.
>
> Any suggestions? (Apart from "don't use XML".)

I haven't completed the SAX handlers; there's just some infrastructure
for them. I prefer the text reader interface over SAX, but I haven't
done either, since I haven't had a need to support SAX. DOM + XPath
has satisfied all of my needs to date. Patches welcome, though. :)
Nag me enough about it and I'll get to it, but it's not that high on
my list of things to work on.
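
To illustrate the reader (pull) style mentioned above, as opposed to SAX callbacks, here is a small sketch using REXML's pull parser, since the libxml binding discussed here does not expose a reader yet. It is only an analogy to libxml2's text reader, not that API. The `pull_first_document` helper is a made-up name, and a StringIO stands in for a socket. The caller asks for one event at a time and can simply stop when the root element closes.

```ruby
require "rexml/document"
require "rexml/parsers/pullparser"
require "stringio"

# Hypothetical helper: pull events until the root element closes and
# return the text nodes seen along the way.
def pull_first_document(io)
  parser = REXML::Parsers::PullParser.new(io)
  depth = 0
  texts = []
  while parser.has_next?
    event = parser.pull
    if event.start_element?
      depth += 1
    elsif event.text?
      texts << event[0] # event[0] is the text content
    elsif event.end_element?
      depth -= 1
      break if depth.zero? # root closed: one complete document read
    end
  end
  texts
end

puts pull_first_document(StringIO.new("<msg><body>hi</body></msg>")).inspect
```

With this style the loop, not the parser, decides when to stop, which is why it suits reading one document at a time from a longer stream.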

http://people.FreeBSD.org/~seanc/TODO

-sc

--
Sean Chittenden

Sean Chittenden sean@chittenden.org wrote in message news:20030417011450.GN79923@perrin.int.nxad.com

> I haven’t completed the SAX handlers; there’s just some infrastructure
> for them. I prefer the text reader interface over SAX, but I haven’t
> done either, since I haven’t had a need to support SAX. DOM + XPath
> has satisfied all of my needs to date. Patches welcome, though. :)
> Nag me enough about it and I’ll get to it, but it’s not that high on
> my list of things to work on.

No, I won’t nag you about it. If you don’t need it, you don’t need it.
It’s just surprising how little choice there is if you need to grab
XML documents from an input stream.
As far as I can see, libxml2 doesn’t support this at all, so a Ruby
wrapper, even with SAX, would do me no good.
I tried xmlparser, which claims to have a stream constructor, but it
is so bug-ridden that I had to give up.
I could still try xmlscan, but it’s pure Ruby, and I doubt the performance
win over REXML would be earth-shattering. And xmlscan isn’t really well
documented.

So I’ll stick with REXML for the time being.

Cheers,

Han Holl

> > I haven't completed the SAX handlers; there's just some
> > infrastructure for them. I prefer the text reader interface over
> > SAX, but I haven't done either, since I haven't had a need to
> > support SAX. DOM + XPath has satisfied all of my needs to date.
> > Patches welcome, though. :) Nag me enough about it and I'll get
> > to it, but it's not that high on my list of things to work on.
>
> No, I won't nag you about it. If you don't need it, you don't need
> it. It's just surprising how little choice there is if you need to
> grab XML documents from an input stream.

Agreed.

> As far as I can see, libxml2 doesn't support this at all, so a Ruby
> wrapper, even with SAX, would do me no good.

As a matter of fact, it would, and libxml2's arguably the fastest SAX
parser out there. Its DOM is constructed via a set of SAX callbacks.

http://xmlbench.sourceforge.net/results/benchmark/index.html

Unless you're parsing documents that are hundreds of MB in size,
libxml2's pretty efficient. -sc

--
Sean Chittenden

Sean Chittenden sean@chittenden.org wrote in message news:20030417225405.GX79923@perrin.int.nxad.com

> > As far as I can see, libxml2 doesn’t support this at all, so a Ruby
> > wrapper, even with SAX, would do me no good.
>
> As a matter of fact, it would, and libxml2’s arguably the fastest SAX
> parser out there. Its DOM is constructed via a set of SAX callbacks.
>
> http://xmlbench.sourceforge.net/results/benchmark/index.html
>
> Unless you’re parsing documents that are hundreds of MB in size,
> libxml2’s pretty efficient. -sc

I don’t doubt for a moment that you know libxml2 better than I do, but the
only constructors I could find in the docs are xmlSAXUserParseFile and
xmlSAXUserParseMemory. The first takes a filename and the second a pointer
to char, so I assumed that libxml2 doesn’t do stream parsing.

I hope I’m wrong.

Cheers,

Han Holl