As a first exercise with Ruby, I am going through the Pickaxe book and
creating a jukebox. I haven't even tried to create an array of songs
yet, because I got distracted and wanted to work this out. I am trying
to feed in the data from my iTunes XML file. I can get it to work if I
delete most of the XML file, but when it's 5-6 gig, REXML just seems to
die. I have vaguely heard that stream parsing may be the answer, but I
am totally unaware of how to use it.
Here is the code in my XML reading program so far (sample.rb basically
just creates song items):
require 'rexml/document'
require_relative 'sample'   # defines the Song class

# Parse the whole file into a REXML tree; the block form closes the file.
xml = File.open("iTunes.xml") { |file| REXML::Document.new(file) }

name = "name"
artist = "artist"
time = 60
cnt = 0
xml.elements.each("//key") do |k|
  if k.text == "Name"
    name = k.next_sibling.text
    cnt += 1
  end
  if k.text == "Artist"
    artist = k.next_sibling.text
  end
  if k.text == "Total Time"
    # iTunes stores the duration in milliseconds
    time = k.next_sibling.text.to_i / 1000.0
    song = Song.new(name, artist, time)
    puts song.to_s
  end
end
> I can get it to work if I delete most of the xml file, but when it's 5-6
> gig,
OMFG. That's a -huge- XML file. Probably all of my MP3s together would
fit into there with base64-encoded contents.
> rexml just seems to die. I have vaguely heard that stream parsing
> may be the answer, but am totally unaware of how to use it.
Well, time to learn. I have probably never even seen a computer that could
handle an XML file that size using straightforward DOM parsing - which
normally "blows up" the original XML document's size in bytes five times
and more. And REXML definitely doesn't have performance of any kind
amongst its qualities. (And for completeness' sake, I never 'clicked'
with the API either, but I'm in a minority there.)
You want a Ruby binding to a stream or pull parser - to the best of my
knowledge, REXML is neither. That means libxml2, expat, or Xerces.
Compiling required - I think the one-click installer comes with one of
these, buggered if I know which.
After that, Google is your friend. Look at the documentation for
whichever parser you decide to use - personally, I do little to no
non-tree XML parsing, so I'm mainly guessing here. The main difference
is that while with REXML you can arbitrarily look around the XML
document, with stream and pull parsing you can only process the document
in order, and have to keep the state of that processing (e.g. which
track you're currently "working on") in your Ruby code.
On Tue, Nov 07, 2006 at 08:03:40AM +0900, David Vallner wrote:
Actually, I recently had to rewrite an XML parser to go stream (SAX) style ... REXML made the task VERY easy ...
Yes, it's not the fastest thing there is, but it was "fast enough" ...
Definitely try writing it with REXML before taking the route of anything heavier.
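For anyone following along, here is a minimal sketch of what the stream (SAX-style) approach could look like with the StreamListener module that ships with REXML. Treat it as an illustration under assumptions rather than anyone's actual code: the TrackListener class, the tiny SAMPLE document, and the rule "a track is any <dict> nested three levels deep that has a Name key" are made up for this example, and a real iTunes file may need more care (playlists, for instance).

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical, cut-down stand-in for an iTunes library file.
SAMPLE = <<~XML
  <plist version="1.0">
  <dict>
    <key>Tracks</key>
    <dict>
      <key>1</key>
      <dict>
        <key>Name</key><string>First Song</string>
        <key>Artist</key><string>Some Band</string>
        <key>Total Time</key><integer>215000</integer>
      </dict>
      <key>2</key>
      <dict>
        <key>Name</key><string>Some Podcast</string>
        <key>Total Time</key><integer>60000</integer>
      </dict>
    </dict>
  </dict>
  </plist>
XML

# Collects one Hash per track while the parser streams through the document.
# All parsing state lives in this object, not in an XML tree.
class TrackListener
  include REXML::StreamListener

  attr_reader :tracks

  def initialize
    @tracks  = []
    @depth   = 0    # number of <dict> elements we are currently inside
    @current = nil  # Hash for the track being read, or nil outside a track
    @key     = nil  # text of the last <key> seen inside the current track
    @buffer  = nil  # text collected for the element being read
  end

  def tag_start(name, _attrs)
    if name == 'dict'
      @depth += 1
      @current = {} if @depth == 3   # assumption: tracks sit at this depth
    end
    @buffer = String.new
  end

  def text(data)
    @buffer << data if @buffer
  end

  def tag_end(name)
    if name == 'dict'
      if @depth == 3
        @tracks << @current if @current.key?('Name')
        @current = nil
        @key = nil
      end
      @depth -= 1
    elsif @current && name == 'key'
      @key = @buffer
    elsif @current && @key
      @current[@key] = @buffer       # value element that follows a <key>
      @key = nil
    end
    @buffer = nil
  end
end

listener = TrackListener.new
REXML::Document.parse_stream(SAMPLE, listener)
listener.tracks.each do |t|
  puts "#{t['Name']} - #{t['Artist'] || 'unknown'} (#{t['Total Time'].to_i / 1000.0}s)"
end
```

For a real file, the same listener could be fed an open File instead of the string, e.g. File.open("iTunes.xml") { |f| REXML::Document.parse_stream(f, listener) }.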
Is that a mistake? Out of curiosity I took a look on my wife's computer
(she's the iPod user) and her XML file was only 231KB. The structure
of it conforms to the code you shared, so I know it's the right file...
Did you mean to say MB instead of GB?
-Matt
···
At 05:03 PM 11/6/2006, you wrote:
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.13.28/518 - Release Date: 11/4/2006
> I probably never even saw a computer that could
> handle a XML file that size using straightforward DOM parsing
This is off-topic but I have a theory that it's possible using a variant of the Flyweight pattern with index offsets into the document and reparsing individual tags on demand. (I would use weak referencing to cache them after a parse.)
I've been meaning to code up a proof of concept here and just haven't had time yet...
> You want a Ruby binding to a stream or pull parser - to my best
> knowledge, REXML is neither.
Best to lean towards a database approach when you get to large files.
Neat thing working with XML & REX.
Then you can go to SleepyCat DBxml.
Though the routines are different, that's fer sure.
Someone has a neat Ruby lib for it out there.
Away from my machines for details.
Markt
He's not the one creating the file. So unless you can persuade Apple to
use an XML DB to store iTunes playlists...
(PS: The whole concept of XML DBs is an abomination. The XML Infoset
concept looks like a bloated cloudfest compared to relational data
storage...)
I wish I had the foggiest idea of what you guys were talking about.
(Roobist here)
I'm still working on Y's book.
···
On Tue, 2006-11-07 at 08:12 +0900, Jeff Wood wrote:
jd
--
You have a new sung; unsung.
I sing a song falling upon deaf ears,
unsung.
If you want speed, look at libxml-ruby. It is many many times faster than
REXML, and it supports SAX parsing as well.
Mark
···
On 11/6/06, Chilkat Software <support@chilkatsoft.com> wrote:
Assuming I go with the Ruby pull parser, how do I use this in my code?
I see the code sample from the link, but I have no idea how to throw
that into my code and make it work. Any suggestions?
Thanks for the discussion so far.
PS: idiot (slaps head). Yes, it was 5-6 meg, not gig!
> Assuming I go with the Ruby pull parser, how do I use this in my code.
> I see from the link the code sample, but I have no idea how to throw
> that into my code and make it work. Any suggestions.
Generally, you should have some layer between the XML input and the
processing of the records themselves, e.g. a trivial Song class, or at least a hash.
Personally, I'd make an XMLSongList class that's enumerable (implements #each), and rework the REXML code that works for small files into one
that yields a Song object for each of the records in succession by
querying the tree accordingly.
That shouldn't then be too hard to rework so that while #each is
running, it opens a pull parser and, for each yield, builds up a Song
object by going through the record in the order in which the elements
appear in the XML file, instead of an arbitrary one. Once you isolate the code that
manipulates the XML to the smallest significant unit (a song record in
this case, I presume), it shouldn't be conceptually that difficult to
rework from a tree parser to a pull parser. The code will probably get a
little messier and more verbose, but the main shift in thinking is in not
asking the XML for what your object needs, but feeding an object what
the XML has.
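To make that a bit more concrete, here is a rough sketch of such an enumerable list built on REXML's own pull parser (REXML::Parsers::PullParser). It is only one way it might look: the XMLSongList internals, the Struct standing in for the Pickaxe Song class, the LIBRARY_XML sample, and the assumption that track records are the <dict> elements three levels deep are all invented for illustration.

```ruby
require 'rexml/document'
require 'rexml/parsers/pullparser'

# Stand-in for the Song class from sample.rb (hypothetical field names).
Song = Struct.new(:name, :artist, :duration)

# Hypothetical, cut-down stand-in for an iTunes library file.
LIBRARY_XML = <<~XML
  <plist version="1.0">
  <dict>
    <key>Tracks</key>
    <dict>
      <key>1</key>
      <dict>
        <key>Name</key><string>First Song</string>
        <key>Artist</key><string>Some Band</string>
        <key>Total Time</key><integer>215000</integer>
      </dict>
      <key>2</key>
      <dict>
        <key>Name</key><string>Some Podcast</string>
        <key>Total Time</key><integer>60000</integer>
      </dict>
    </dict>
  </dict>
  </plist>
XML

class XMLSongList
  include Enumerable

  # source is a String or an open IO containing the XML
  def initialize(source)
    @source = source
  end

  # Walks the document in order and yields one Song per track record.
  def each
    return enum_for(:each) unless block_given?

    parser = REXML::Parsers::PullParser.new(@source)
    depth  = 0    # how many <dict> elements we are inside
    record = nil  # key/value pairs of the track being read
    key    = nil  # last <key> text seen in the current track
    text   = nil  # text of the element being read

    while parser.has_next?
      event = parser.pull
      if event.start_element?
        if event[0] == 'dict'
          depth += 1
          record = {} if depth == 3   # assumption: tracks sit at this depth
        end
        text = String.new
      elsif event.text?
        text << event[1] if text      # event[1] is the entity-decoded text
      elsif event.end_element?
        if event[0] == 'dict'
          if depth == 3
            if record['Name']
              yield Song.new(record['Name'], record['Artist'],
                             record['Total Time'].to_i / 1000.0)
            end
            record = nil
            key = nil
          end
          depth -= 1
        elsif record && event[0] == 'key'
          key = text
        elsif record && key
          record[key] = text          # the value element after a <key>
          key = nil
        end
        text = nil
      end
    end
  end
end

XMLSongList.new(LIBRARY_XML).each do |song|
  puts "#{song.name} (#{song.artist || 'unknown'}): #{song.duration}s"
end
```

Because the class includes Enumerable, the rest of the jukebox code can use map, select, and friends without knowing how the XML is read. Note that if an IO is passed in rather than a String, it can only be iterated once unless it is rewound.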
> PS: idiot (slaps head). Yes it was 5-6meg not gig!
6MB is still Huge (tm) for an XML file.
Unfortunately, it only runs on Windows. (Sorry!) It is freeware however.
Here's the example source. I suspect you won't have memory problems with it.
If you try it, please let me know how fast it runs and whether it uses
too much memory...
require 'chilkat'
# The Chilkat XML parser for Ruby is freeware.
xml = Chilkat::CkXml.new()
xml.LoadXmlFile("c:/temp/itunes.xml")
# Search for this node: <key>Tracks</key>
tracksKey = xml.SearchForContent(xml,"key","Tracks")
# Assuming it's found, the <dict> node is the next sibling
dict = tracksKey.NextSibling()
# Loop over the <dict> child nodes...
n = dict.NumChildrenHavingTag("dict")
for i in 0..(n-1)
  trackRec = dict.GetNthChildWithTag("dict",i)
  print "Name: " + trackRec.GetChildExact("key","Name").NextSibling().content + "\n"
  print "Artist: " + trackRec.GetChildExact("key","Artist").NextSibling().content + "\n"
  print "Time: " + trackRec.GetChildExact("key","Total Time").NextSibling().content + "\n"
end
-Matt
···
At 08:10 PM 11/6/2006, you wrote:
I am still not sure I get it, could you explain a little more?
Thanks,
Paul.
David Vallner wrote:
···
> > PS: idiot (slaps head). Yes it was 5-6meg not gig!
> >
>
> 6MB is still Huge (tm) for a XML file.
If you want, send me your test code + file and I'll have a look...
(tomorrow morning though)
Don't worry about a large attachment, just send it zipped...
Best Regards,
Matt
···
At 08:25 PM 11/7/2006, you wrote:
Hi, thanks for the link. It seems to be working much better, but...
It gets to about the 1000th file and does its job, but then
returns the following error:
undefined method 'NextSibling' for nil:NilClass (NoMethodError) from
rtunes.rb (which is basically just your sample code).
Does this mean I have a broken XML file? Or is something else the
matter?
Chilkat Software wrote:
> I tested the Chilkat XML parser (an in-memory DOM) on a 21MB XML file
> that looks like this:
>
> <phonebook>
> <address><company>yuy25uiFfaku</company><street>A7ZbA3jP48rp</street><city>fSgWAhn3i3lD</city><state>p3rfNqf6kzUq</state><postal_code>lqVZ0b4daYWQ</postal_code><country>VjfXvb0AdxSt</country><extra>TEST</extra></address>
> <address><company>Ki78Ypx8FlbZ</company><street>340PK6u2DsZQ</street><city>EqbFawBo0mCi</city><state>fTZK5YT0Tur8</state><postal_code>EXP29c5Hi2Hj</postal_code><country>sfGB4EzWR3Ft</country><extra>TEST</extra></address>
> ..
> </phonebook>
>
> (the data is random garbage...)
>
> The XML is parsed in 11.5 seconds on a 1.8 GHz Pentium 4. Peak memory
> usage is 180MB.
> I don't think the parser would break a sweat on the 6MB file...
>
> I uploaded the XML test data to:
> http://www.example-code.com/downloads/bigXml.zip
> The code for parsing the iTunes XML is easy:
> http://www.example-code.com/ruby/ruby-parse-itunes-xml.asp
>
> -Matt
It was running into problems with videos and podcasts, as they don't
have an artist tag.
I got round it by searching for child tags with the title "Movie"
or "Podcast", and not running the print statements for those items.
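An alternative to skipping those items would be to look each key up defensively, so that a missing Artist simply falls back to a default instead of raising on nil. The sketch below uses plain REXML rather than the Chilkat parser (whose API I can't vouch for beyond the sample above); the value_for helper, the PODCAST_XML sample, and the '/plist/dict/dict/dict' path to the track records are assumptions made for illustration.

```ruby
require 'rexml/document'

# Hypothetical sample: the second record has no Artist key, like a podcast.
PODCAST_XML = <<~XML
  <plist version="1.0"><dict><key>Tracks</key><dict>
    <key>1</key>
    <dict><key>Name</key><string>First Song</string><key>Artist</key><string>Some Band</string></dict>
    <key>2</key>
    <dict><key>Name</key><string>Some Podcast</string><key>Podcast</key><true/></dict>
  </dict></dict></plist>
XML

# Returns the text of the element that follows <key>label</key> inside a
# track record, or the default when the key (or its value) is missing.
def value_for(track, label, default = nil)
  key = track.get_elements('key').find { |k| k.text == label }
  value = key && key.next_element
  value ? value.text : default
end

doc = REXML::Document.new(PODCAST_XML)
lines = REXML::XPath.match(doc, '/plist/dict/dict/dict').map do |track|
  "#{value_for(track, 'Name')} by #{value_for(track, 'Artist', 'unknown artist')}"
end
puts lines
```

The nil check lives in one place, so every field lookup is safe and the podcast still shows up in the listing.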