Libxml SAX parser?

Han_Holl3 · 16 April 2003 11:11

Hello,

I need to read XML documents from an open network socket.
REXML’s Document.parse_stream works fine, but is a tad slow
for this application.
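
For readers unfamiliar with the approach described above, here is a minimal sketch of REXML stream parsing. The `EventCollector` class and the sample XML are hypothetical, and a `StringIO` stands in for the open socket; the assumption is that any IO-like object can be passed in the same way.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: records each parser event in the order it arrives.
class EventCollector
  include REXML::StreamListener

  attr_reader :events

  def initialize
    @events = []
  end

  # Called for every opening tag; attributes are ignored in this sketch.
  def tag_start(name, _attrs)
    @events << [:start, name]
  end

  # Called for every closing tag.
  def tag_end(name)
    @events << [:end, name]
  end

  # Called for character data; whitespace-only runs are skipped.
  def text(data)
    @events << [:text, data] unless data.strip.empty?
  end
end

# StringIO is a stand-in for the network socket (assumption for illustration).
io = StringIO.new('<msg id="1"><body>hello</body></msg>')
collector = EventCollector.new
REXML::Document.parse_stream(io, collector)
collector.events.each { |event| p event }
```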

I installed libxml, but while there is a wrapper for the SAX
parser, I haven’t found a way to set the callbacks. This limits
its usefulness severely. It isn’t in the TODO file either.
Have I missed something here?

None of the DOM parsers seems to have an option to stop once it
has a valid (complete) document, which would be just what I
need.
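
One way to approximate "stop when the document is complete" with REXML's stream parser is to count element depth in the listener and bail out when the root element closes. This is only a sketch under assumptions: `RootCloseDetector`, `read_one_document`, and the `:document_complete` tag are hypothetical names, and `StringIO` again stands in for the socket.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: tracks nesting depth and signals when the
# root element has been closed.
class RootCloseDetector
  include REXML::StreamListener

  attr_reader :names

  def initialize
    @depth = 0
    @names = []
  end

  def tag_start(name, _attrs)
    @depth += 1
    @names << name
  end

  def tag_end(_name)
    @depth -= 1
    # Depth zero means the root element just closed: leave the parser.
    throw :document_complete if @depth.zero?
  end
end

# Parses from io until the first root element closes, then returns
# the element names seen so far.
def read_one_document(io)
  listener = RootCloseDetector.new
  catch(:document_complete) do
    REXML::Document.parse_stream(io, listener)
  end
  listener.names
end

# Two documents back to back; only the first should be reported.
io = StringIO.new('<a><b/></a><c/>')
p read_one_document(io)
```

Note that the parser may already have read past the end of the first document, so this does not guarantee the socket is left positioned at the start of the next one.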

Any suggestions (apart from “don’t use XML”)?

Cheers,

Han Holl

Sean_Chittenden2 · 17 April 2003 01:14

> I need to read XML documents from an open network socket. REXML's
> Document.parse_stream works fine, but is a tad slow for this
> application.
>
> I installed libxml, but while there is a wrapper for the SAX
> parser, I haven't found a way to set the callbacks. This limits
> its usefulness severely. It isn't in the TODO file either.
> Have I missed something here?
>
> None of the DOM parsers seems to have an option to stop once it
> has a valid (complete) document, which would be just what I
> need.
>
> Any suggestions (apart from "don't use XML")?

I haven't completed the SAX handlers; there's just some infrastructure
for them. I prefer the text reader interface over SAX, but I haven't
done either since I haven't had a need to support SAX. DOM + XPath
has satisfied all of my needs to date. Patches welcome, though.
Nag me enough about it and I'll get to it, but it's not that high on
my list of things to work on.

http://people.FreeBSD.org/~seanc/TODO

-sc

--
Sean Chittenden

Han_Holl3 · 17 April 2003 11:01

Sean Chittenden sean@chittenden.org wrote in message news:20030417011450.GN79923@perrin.int.nxad.com…

> I haven’t completed the SAX handlers; there’s just some infrastructure
> for them. I prefer the text reader interface over SAX, but I haven’t
> done either since I haven’t had a need to support SAX. DOM + XPath
> has satisfied all of my needs to date. Patches welcome, though.
> Nag me enough about it and I’ll get to it, but it’s not that high on
> my list of things to work on.

No, I won’t nag you about it. If you don’t need it, you don’t need it.
It’s just surprising how little choice there is if you need to grab
XML documents from an input stream.
As far as I can see, libxml2 doesn’t support this at all, so a Ruby
wrapper, even with SAX, would do me no good.
I tried xmlparser, which claims to have a stream constructor, but it
is so bug-ridden that I had to give up.
I could still try xmlscan, but it’s pure Ruby, and I doubt the performance
win over REXML would be earth-shattering. And xmlscan isn’t really well
documented.

So I’ll stick with REXML for the time being.

Cheers,

Han Holl

Sean_Chittenden2 · 17 April 2003 22:54

> > I haven't completed the SAX handlers; there's just some
> > infrastructure for them. I prefer the text reader interface over
> > SAX, but I haven't done either since I haven't had a need to
> > support SAX. DOM + XPath has satisfied all of my needs to date.
> > Patches welcome, though. Nag me enough about it and I'll get
> > to it, but it's not that high on my list of things to work on.
>
> No, I won't nag you about it. If you don't need it, you don't need
> it. It's just surprising how little choice there is if you need to
> grab XML documents from an input stream.

Agreed.

> As far as I can see, libxml2 doesn't support this at all, so a Ruby
> wrapper, even with SAX, would do me no good.

As a matter of fact, it would, and libxml2 is arguably the fastest SAX
parser out there. Its DOM is constructed via a set of SAX callbacks.

http://xmlbench.sourceforge.net/results/benchmark/index.html

Unless you're parsing documents that are hundreds of MB in size,
libxml2's pretty efficient. -sc

--
Sean Chittenden

Han_Holl3 · 18 April 2003 11:13

Sean Chittenden sean@chittenden.org wrote in message news:20030417225405.GX79923@perrin.int.nxad.com…

> > As far as I can see, libxml2 doesn’t support this at all, so a Ruby
> > wrapper, even with SAX, would do me no good.
>
> As a matter of fact, it would, and libxml2 is arguably the fastest SAX
> parser out there. Its DOM is constructed via a set of SAX callbacks.
>
> http://xmlbench.sourceforge.net/results/benchmark/index.html
>
> Unless you’re parsing documents that are hundreds of MB in size,
> libxml2’s pretty efficient. -sc

I don’t doubt for a moment that you know libxml2 better than I, but the only
constructors I could find in the docs are xmlSAXUserParseFile and
xmlSAXUserParseMemory. The first takes a filename, and the second a pointer to
char.
So I assumed that libxml2 doesn’t do stream parsing.

I hope I’m wrong.

Cheers,

Han Holl
