Rexml

Chilkat_Software · 8 November 2006 04:09

Thanks,

By the way, the Chilkat XML parser is not better than
REXML, it's just different. To give you a little history, it was originally
developed about 7 years ago to handle:

1) Large XML data files where the MSXML parser was s-l-o-w.
At one point, I remember Chilkat XML parsing files in a few seconds
that took MSXML minutes to parse. However, since then MSXML has
improved to the point where it's as good or better in speed...

2) I wanted to create a parser that was forgiving with errors.
Back then, it was a nightmare to have a large XML file with one
small error, perhaps a byte or two that didn't fit the charset encoding,
that would prevent the entire document from loading.

3) I wanted a parser that made it easy to do the common tasks
I'm always faced with in XML -- such as reading/writing config files.

4) I wanted to make it easy to do things not normally handled in
an API -- sorting, compression, encryption, loading / encoding binary data, etc.

If you give it a try -- let me know what you think. Send me a request for
an example or two and I'll be happy to provide what I can...

-Matt

At 09:30 PM 11/7/2006, you wrote:

Figured it out.

It was running into problems with videos and podcasts as they don't
have an artist tag.
Got round it by searching for children tags with the title of "Movie"
or "Podcast", and not running the print statements for those items.
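For anyone hitting the same NoMethodError: another way around it is to treat a missing key as nil rather than filtering by kind. The sketch below uses REXML (not the Chilkat API) on a heavily trimmed, made-up stand-in for the iTunes library layout. `SAMPLE_PLIST`, `plist_value` and `describe_tracks` are hypothetical names for illustration only.

```ruby
require 'rexml/document'

# Hypothetical, heavily trimmed stand-in for two iTunes library entries.
SAMPLE_PLIST = <<~PLIST
  <dict>
    <dict>
      <key>Name</key><string>Some Song</string>
      <key>Artist</key><string>Some Band</string>
    </dict>
    <dict>
      <key>Name</key><string>Some Podcast Episode</string>
      <key>Kind</key><string>Podcast</string>
    </dict>
  </dict>
PLIST

# Return the text of the element that follows <key>name</key>, or nil
# when the track has no such key (instead of calling a method on nil).
def plist_value(track, name)
  key = track.elements.to_a('key').find { |k| k.text == name }
  return nil if key.nil?
  value = key.next_element
  value && value.text
end

# Build one "title - artist" line per track, with a placeholder
# for entries (videos, podcasts) that carry no Artist key.
def describe_tracks(xml)
  doc = REXML::Document.new(xml)
  doc.root.elements.to_a('dict').map do |track|
    title  = plist_value(track, 'Name')
    artist = plist_value(track, 'Artist') || '(no artist)'
    "#{title} - #{artist}"
  end
end

puts describe_tracks(SAMPLE_PLIST)
```

The point is only that the lookup returns nil for a missing key and the caller decides what to do with it, so no sibling call is ever made on nil.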

pdg wrote:
> Hi thanks for the link, it seems to be working much better, but...
>
> It's getting to about the 1000th file and doing its job, but then
> returning the following error:
>
> undefined method 'NextSibling' for nil:NilClass (NoMethodError) from
> rtunes.rb (which is basically just your sample code).
>
> Does this mean I have a broken xml file? Or is something else the
> matter?
>
> Chilkat Software wrote:
> > I tested the Chilkat XML parser (an in-memory DOM) on a 21MB XML file
> > that looks like this:
> >
> > <phonebook>
> > <address><company>yuy25uiFfaku</company><street>A7ZbA3jP48rp</street><city>fSgWAhn3i3lD</city><state>p3rfNqf6kzUq</state><postal_code>lqVZ0b4daYWQ</postal_code><country>VjfXvb0AdxSt</country><extra>TEST</extra></address>
> > <address><company>Ki78Ypx8FlbZ</company><street>340PK6u2DsZQ</street><city>EqbFawBo0mCi</city><state>fTZK5YT0Tur8</state><postal_code>EXP29c5Hi2Hj</postal_code><country>sfGB4EzWR3Ft</country><extra>TEST</extra></address>
> > ..
> > </phonebook>
> >
> > (the data is random garbage...)
> >
> > The XML is parsed in 11.5 seconds on a 1.8GHz Pentium 4. Peak memory
> > usage is 180MB.
> > I don't think the parser would break a sweat on the 6MB file...
> >
> > I uploaded the XML test data to:
> > http://www.example-code.com/downloads/bigXml.zip
> > The code for parsing the iTunes XML is easy:
> > http://www.example-code.com/ruby/ruby-parse-itunes-xml.asp
> >
> > -Matt
> >
> > At 02:45 PM 11/7/2006, you wrote:
> >
> > >Hi David (or others)
> > >
> > >I am still not sure I get it, could you explain a little more?
> > >
> > >Thanks,
> > >Paul.
> > >
> > >David Vallner wrote:
> > > > pdg wrote:
> > > > > Assuming I go with the Ruby pull parser, how do I use this in my code?
> > > > > I see the code sample at the link, but I have no idea how to throw
> > > > > that into my code and make it work. Any suggestions?
> > > > >
> > > >
> > > > Generally, you should have some layer between XML input, and processing
> > > > the records themselves. E.g. a trivial Song class, or at least a hash.
> > > > Personally, I'd make an XMLSongList class that's enumerable (implements
> > > > #each), and rework the REXML code that works for small files into one
> > > > that yields a Song object for each of the records in succession by
> > > > querying the tree accordingly.
> > > >
> > > > It shouldn't then be too hard to rework that so that while #each is
> > > > running, it opens a pull parser and, for each yield, builds up a Song
> > > > object by going through the record in the order in which the elements
> > > > appear in the XML file, instead of a random one. Once you isolate the
> > > > code that manipulates the XML to the smallest significant unit (a song
> > > > record in this case, I presume), it shouldn't be conceptually that
> > > > difficult to rework from a tree parser to a pull parser. The code will
> > > > probably get a little messier and more verbose, but the main shift in
> > > > thinking is in not asking the XML for what your object needs, but
> > > > feeding an object what the XML has.
> > > >
> > > > > PS: idiot (slaps head). Yes it was 5-6meg not gig!
> > > > >
> > > >
> > > > 6MB is still Huge (tm) for a XML file.
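
A rough sketch of what David describes, in case it helps: an enumerable list whose #each drives REXML's pull parser and hands each Song whatever the XML offers, in document order. The `<songs>/<song>/<title>/<artist>` layout, the `Song` struct, `FIELDS` and `SAMPLE_SONGS` are assumptions made up for illustration; the real iTunes file is a plist of key/value pairs and would need different element handling.

```ruby
require 'rexml/parsers/pullparser'

# Trivial record type, as suggested in the thread.
Song = Struct.new(:title, :artist)

class XMLSongList
  include Enumerable

  # Child elements copied into a Song (assumed layout, not iTunes' own).
  FIELDS = %w[title artist].freeze

  # source is a String holding the XML; an IO also works, but can then
  # only be iterated once.
  def initialize(source)
    @source = source
  end

  def each
    return enum_for(:each) unless block_given?

    parser = REXML::Parsers::PullParser.new(@source)
    song  = nil   # Song currently being built, if any
    field = nil   # name of the field element we are inside, if any

    while parser.has_next?
      event = parser.pull
      if event.start_element?
        name = event[0]
        if name == 'song'
          song = Song.new
        elsif song && FIELDS.include?(name)
          field = name
        end
      elsif event.text?
        # Feed the object what the XML has, in whatever order it arrives.
        song[field] = song[field].to_s + event[0] if song && field
      elsif event.end_element?
        name = event[0]
        if name == 'song'
          yield song
          song = nil
        elsif name == field
          field = nil
        end
      end
    end
  end
end

SAMPLE_SONGS = <<~XML
  <songs>
    <song><title>First</title><artist>Alice</artist></song>
    <song><artist>Bob</artist><title>Second</title></song>
  </songs>
XML

XMLSongList.new(SAMPLE_SONGS).each { |s| puts "#{s.artist}: #{s.title}" }
```

Because the class includes Enumerable, callers can use map, select and friends without ever seeing the parser, and only one Song is held in memory at a time.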

Glenn_Smith · 8 November 2006 10:34

Is the rexml website down or is it just me?

All the best
Glenn
Aylesbury, UK
