REXML Optimization

So, I'm reading a document on using XML as a database. Of course, the
author uses this cryptic Perl script to parse the XML file:

#!/usr/bin/perl
use XML::LibXML;
my $parser = new XML::LibXML;
my $doc = $parser->parse_file( shift @ARGV );
my $balance = $doc->findvalue( '/checkbook/@balance-start' );
foreach my $record ( $doc->findnodes( '//debit' )) {
    $balance -= $record->findvalue( 'amount' );
}
foreach my $record ( $doc->findnodes( '//deposit' )) {
    $balance += $record->findvalue( 'amount' );
}
print "Current balance: $balance\n";

So, since I was trying to figure out how to use XML as a database and
how to use REXML, I gave it a whack and wrote a Ruby script to do the
same thing. Below are a sample of the XML file and the Ruby script:

<?xml version="1.0"?>
<checkbook balanceStart="2460.62">
<title>expenses: january 2002</title>

  <debit category="clothes">
    <amount>31.19</amount>
    <date><year>2002</year><month>1</month><day>3</day></date>
    <payto>Walking Store</payto>
    <description>shoes</description>
  </debit>

  <deposit category="salary">
    <amount>1549.58</amount>
    <date><year>2002</year><month>1</month><day>7</day></date>
    <payor>Bob's Bolts</payor>
  </deposit>
</checkbook>

#!/usr/bin/ruby -w
require 'rexml/document'

# Read in the XML document
doc = REXML::Document.new(File.open('cb.xml'))
# A future version should take the file name from the command line

# Find the starting balance and assign it to the float variable 'balance'
balance = doc.root.attributes['balanceStart'].to_f

# Subtract each debit from the balance
doc.elements.each("//debit/amount") {|o| balance -= o.text.to_f}
# Add each deposit to the balance
doc.elements.each("//deposit/amount") {|i| balance += i.text.to_f}

# Display the final balance:
puts balance

Of course, I was able to complete the same task as the Perl script in
Ruby with less code (not to mention easier-to-read code).

Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?

Thanks:)

SA

Bucco said:

Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?

It looks good to me. In fact, I didn't find the Perl that bad (which is
unusual for me...I'm not a big Perl fan.)

But you seem to have the Ruby "style" down pretty well.

Ryan

Bucco wrote:

Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?

The DOM is not a database, and it shows.

XPath queries can get real slow as the document size grows.

Suggestion: Read and parse the XML once, and store it internally in a format better suited for queries. XML is great for all sorts of things, particularly for inter-app data exchange, but once the data is inside your system that value drops. So, if the code is mainly concerned with executing queries and such, slurp in the XML and stash it in some optimized internal structure. Maybe use Madeleine for in-memory storage and queries.

If need be, add code to serialize the data back to XML for persistence when the app is shut down.
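
As a rough illustration of the "parse once, query an internal structure" idea (not code from this thread), the sketch below copies the checkbook into plain Ruby objects and answers later questions from those. The `Entry` struct, its field names, and the `load_checkbook` helper are hypothetical, and the sample data is an abridged copy of the file from the original post, inlined so the script runs on its own.

```ruby
require 'rexml/document'

# Abridged copy of the sample file from the original post.
SAMPLE_XML = <<~EOS
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
    <title>expenses: january 2002</title>
    <debit category="clothes">
      <amount>31.19</amount>
      <payto>Walking Store</payto>
    </debit>
    <deposit category="salary">
      <amount>1549.58</amount>
      <payor>Bob's Bolts</payor>
    </deposit>
  </checkbook>
EOS

# Hypothetical record type; the field names are illustrative assumptions.
Entry = Struct.new(:kind, :category, :amount)

# Parse the XML once and copy the interesting parts into plain Ruby objects.
def load_checkbook(xml)
  doc = REXML::Document.new(xml)
  start = doc.root.attributes['balanceStart'].to_f
  entries = []
  doc.root.each_element do |el|
    # Skip <title> and anything else that is not a transaction.
    next unless %w[debit deposit].include?(el.name)
    amount = el.elements['amount'].text.to_f
    entries << Entry.new(el.name.to_sym, el.attributes['category'], amount)
  end
  [start, entries]
end

start, entries = load_checkbook(SAMPLE_XML)

# Later queries run against the array, with no further XPath calls.
balance = entries.inject(start) do |sum, e|
  e.kind == :debit ? sum - e.amount : sum + e.amount
end
by_category = entries.group_by(&:category)

puts balance.round(2)
puts by_category.keys.inspect
```

Whether this pays off depends on how many queries follow the initial parse; for a single balance calculation the direct XPath version is probably simpler.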

Try to measure the start-up cost of parsing, restructuring, and indexing the data up front, versus the cost of running XPath calls over and over. See if it gains you anything.
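
One possible way to make that comparison is Ruby's standard `Benchmark` module. This is only a sketch under assumptions: the helper names `balance_via_xpath` and `extract_amounts` and the repeat count `RUNS` are made up for illustration, and the inlined sample is far too small to say anything about a real file, so the timings only show how the measurement could be set up.

```ruby
require 'benchmark'
require 'rexml/document'

# Abridged copy of the sample file from the original post.
SAMPLE_XML = <<~EOS
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
    <debit category="clothes"><amount>31.19</amount></debit>
    <deposit category="salary"><amount>1549.58</amount></deposit>
  </checkbook>
EOS

doc = REXML::Document.new(SAMPLE_XML)

# Approach 1: run the XPath queries every time the balance is needed.
def balance_via_xpath(doc)
  balance = doc.root.attributes['balanceStart'].to_f
  doc.elements.each('//debit/amount')   { |e| balance -= e.text.to_f }
  doc.elements.each('//deposit/amount') { |e| balance += e.text.to_f }
  balance
end

# Approach 2: pay once to extract signed amounts, then sum a plain array.
def extract_amounts(doc)
  amounts = []
  doc.elements.each('//debit/amount')   { |e| amounts << -e.text.to_f }
  doc.elements.each('//deposit/amount') { |e| amounts << e.text.to_f }
  [doc.root.attributes['balanceStart'].to_f, amounts]
end

RUNS = 200 # arbitrary repeat count, chosen only for illustration

Benchmark.bm(24) do |bm|
  bm.report('XPath on every query:') do
    RUNS.times { balance_via_xpath(doc) }
  end
  bm.report('extract once, then sum:') do
    start, amounts = extract_amounts(doc)
    RUNS.times { amounts.inject(start) { |sum, a| sum + a } }
  end
end
```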

James

Bucco wrote:

Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?

You could get rid of one traversal by iterating over all "amount"
elements and doing the calculation based on the name of each one's
parent element (debit or deposit).
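
A minimal sketch of that single traversal, assuming the same document layout as the sample in the original post (abridged and inlined here so the script runs on its own):

```ruby
require 'rexml/document'

# Abridged copy of the sample file from the original post.
SAMPLE_XML = <<~EOS
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
    <debit category="clothes"><amount>31.19</amount></debit>
    <deposit category="salary"><amount>1549.58</amount></deposit>
  </checkbook>
EOS

doc = REXML::Document.new(SAMPLE_XML)
balance = doc.root.attributes['balanceStart'].to_f

# One pass over every <amount>; the parent's element name decides the sign.
doc.elements.each('//amount') do |amount|
  value = amount.text.to_f
  case amount.parent.name
  when 'debit'   then balance -= value
  when 'deposit' then balance += value
  end
end

puts balance.round(2)
```

An `<amount>` under any other parent is ignored here; that is a choice made for this sketch, not something the thread specifies.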

Kind regards

    robert

Robert Klemme wrote:

Bucco wrote:

Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?

You could get rid of one traversal by iterating over all "amount"
elements and doing the calculation based on the name of each one's
parent element (debit or deposit).

If you want to speed things up even more, you can do stream processing
with REXML's SAX-like API:
http://www.germane-software.com/software/rexml/docs/tutorial.html#id2248482
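
For illustration, a stream-based version might look roughly like the sketch below. It uses `REXML::StreamListener` and `REXML::Document.parse_stream`; the `BalanceListener` class and its internals are hypothetical names made up for this example, and the sample data is an abridged, inlined copy of the file from the original post.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Abridged copy of the sample file from the original post.
SAMPLE_XML = <<~EOS
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
    <debit category="clothes"><amount>31.19</amount></debit>
    <deposit category="salary"><amount>1549.58</amount></deposit>
  </checkbook>
EOS

# Hypothetical listener: keeps a running balance without building a tree.
class BalanceListener
  include REXML::StreamListener

  attr_reader :balance

  def initialize
    @balance = 0.0
    @stack = []     # names of the currently open elements
    @buffer = +''   # text collected inside the current <amount>
  end

  def tag_start(name, attrs)
    @balance = attrs['balanceStart'].to_f if name == 'checkbook'
    @buffer = +'' if name == 'amount'
    @stack.push(name)
  end

  def text(data)
    @buffer << data if @stack.last == 'amount'
  end

  def tag_end(name)
    @stack.pop
    return unless name == 'amount'
    value = @buffer.to_f
    # After the pop, the top of the stack is the parent of <amount>.
    case @stack.last
    when 'debit'   then @balance -= value
    when 'deposit' then @balance += value
    end
  end
end

listener = BalanceListener.new
REXML::Document.parse_stream(SAMPLE_XML, listener)
puts listener.balance.round(2)
```

The trade-off is that no document tree is kept around, so this suits a single pass over a large file better than repeated ad hoc queries.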

Kind regards

    robert