Help needed with rexml

I've been struggling to properly parse some XML with rexml. I will fully
admit my XML ignorance in advance. It would be easy enough to parse
this with a regular expression instead, but I would prefer to use the
right tool.

Here's a sample XML response:

<?xml version='1.0' encoding="iso-8859-1" ?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
          <value>
            <int>5</int>
          </value>
        </member>
        <member>
          <name>faultString</name>
          <value>
            <string>system error (nologin)</string>
          </value>
        </member>
      </struct>
    </value>
  </fault>
</methodResponse>

However, I can't get anything useful out of it. For instance, I've been
trying something like this:

require 'rexml/document'

file = File.new("test.xml")
xml = REXML::Document.new(file)
xml.elements.each { |i|
  i.texts.each { |t|
    puts "Class: #{t.class}"
    puts "Value: #{t.value}"
    puts "String: #{t.to_s}"
  }
}

This doesn't print anything useful for the class. Where am I going wrong
with this? I've been digging through the documentation but I must not be
getting it.

For what it's worth, I can parse this in perl easily enough (which
suggests to me the XML is valid):

use Data::Dumper;
use XML::Simple; # AKA "XML For Idiots"

my $ref = XMLin("test.xml"); # A file containing the XML above
print Dumper $ref, "\n";

I can then use the results to figure out how to dereference $ref to pull
the error information returned by the server.

Responses to the list or the newsgroup, please, for future googlers
to find.

I'd start with something like this (untested, from memory):

xml.elements.each do |elem|
  p elem.node_type
  p elem.text
end

What exactly do you want to extract? You'll likely want some kind of
XPath expression with #each like in the tutorial.

Did you look at the tutorial?
http://www.germane-software.com/software/rexml/docs/tutorial.html

Kind regards

robert

2005/8/28, Michael <invalid@dev.null>:

Michael, I came across the same problem recently when using ruby/rexml
for the first time.

The reason you're not getting results is that each_element and
each_element_with_attribute only iterate through the element's
immediate children. They don't recurse through all the descendants. So
what you're probably getting is just the root element and none of the
children.
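
To see this concretely, here is a minimal sketch. The XML is a trimmed copy of the fault response, inlined as a string (my assumption, purely so the snippet runs on its own instead of reading test.xml). The document's element list holds only the root, and the root's own text nodes are just the whitespace between tags, which is presumably why the original loop printed nothing useful:

```ruby
require 'rexml/document'

# Assumption: a trimmed copy of the fault response, inlined for convenience.
xml_source = <<~EOS
  <methodResponse>
    <fault>
      <value>
        <struct>
          <member>
            <name>faultCode</name>
            <value><int>5</int></value>
          </member>
        </struct>
      </value>
    </fault>
  </methodResponse>
EOS

doc = REXML::Document.new(xml_source)

# The document's only child element is the root.
top_level = doc.elements.to_a.map(&:name)
puts top_level.inspect

# The root's direct text nodes are only the whitespace between tags.
root_texts = doc.root.texts.map(&:to_s)
puts root_texts.all? { |t| t.strip.empty? }

# One level further down there is still just <fault>.
root_children = doc.root.elements.to_a.map(&:name)
puts root_children.inspect
```

The interesting text lives several levels down, inside <name>, <int>, and <string>, so a loop over the top level never reaches it.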

If you need to iterate through all the elements in the whole document,
then use the XPath.each command. For example,
REXML::XPath.each(doc, '//methods') { |x| whatever you want to do with them }
should work (the first argument is the document or element to search
from). That's what I finally had to do in my recent experience. I'm not
sure what XPath search you would use to go through ALL of the elements
in the document, but with some experimentation you'll probably find it.
(And post what you find!)
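
For the record, the XPath expression `//*` selects every element in the document, so something along these lines should visit them all. A minimal sketch, with the XML inlined as a string (an assumption, so that it runs standalone):

```ruby
require 'rexml/document'

# Assumption: a cut-down version of the fault response.
xml_source = <<~EOS
  <methodResponse>
    <fault>
      <value><string>system error (nologin)</string></value>
    </fault>
  </methodResponse>
EOS

doc = REXML::Document.new(xml_source)

# XPath.each takes the starting node first, then the expression.
# '//*' matches every element at any depth.
visited = []
REXML::XPath.each(doc, '//*') { |elem| visited << elem.name }

puts visited.join(' ')
```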

There may be a better way to do this, and I posted something about this
a couple of days ago but received no response. It seems that
each_element and each_element_with_attribute should include an option
to recurse through all the descendants, but unfortunately they don't
seem to (or at least I couldn't find it).

Maybe that could be added to a new release of rexml (if the dev is
reading this?)

What type of information do you want to get out of this? You never posted what you *thought* your sample ruby code would give you. I ran your perl example, and it looks like the xml document but where < > are gsub'd for { }.

Here is an example which shows some XPath usage:

require 'rexml/document'

file = File.new("test.xml")
root = REXML::Document.new(file).root
fault_arr = root.elements.each( "fault" ) do |e1|
   e1.elements.each( "value/struct/member" ) do |e2|
     e2.elements.each( '*' ) { |e3| print e3.text.strip }
     e2.elements.each( '*/*' ){ |e3| puts " " + e3.text.strip }
   end
end

puts "Message response faulted!" if fault_arr.length > 0
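
If the goal is just the fault code and message, the same XPath approach can collect each member into a hash. A rough sketch, assuming the XML-RPC fault layout shown in the original post; the `fault` hash and the inlined XML string are additions for illustration, not part of the original code:

```ruby
require 'rexml/document'

# Assumption: the sample response, inlined instead of read from test.xml.
xml_source = <<~EOS
  <methodResponse>
    <fault>
      <value>
        <struct>
          <member>
            <name>faultCode</name>
            <value><int>5</int></value>
          </member>
          <member>
            <name>faultString</name>
            <value><string>system error (nologin)</string></value>
          </member>
        </struct>
      </value>
    </fault>
  </methodResponse>
EOS

doc = REXML::Document.new(xml_source)

fault = {}
REXML::XPath.each(doc, '/methodResponse/fault/value/struct/member') do |member|
  key = member.elements['name'].text
  # The payload is wrapped in a type element (<int>, <string>, ...).
  value = member.elements['value/*'].text
  fault[key] = value
end

puts "Fault #{fault['faultCode']}: #{fault['faultString']}" unless fault.empty?
```

Note that both values come back as strings; converting faultCode with to_i is left to the caller.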

Zach

PS. There's a great XML plugin for JEdit that will show you the results
of an XPath search on your document. That way you can try out different
variations until you get the result set that you want without having to
run your script each time to test it.

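Written off the top of my head, but you could write your own.

  require 'rexml/document'

  class REXML::Element
    # Yields every descendant element, children before their parent.
    def each_element_recurse(&block)
      each_element { |e|
        e.each_element_recurse(&block)  # pass the block down, or yield fails
        yield(e)
      }
    end
  end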

I made a first stab at it b/c I will probably need it myself soon.

(Yes, I know I'm reopening a standard class! :-p)
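
As far as I can tell, current REXML releases already ship something similar: `each_recursive`, provided by REXML::Node, walks all descendant elements of the receiver. A small sketch, assuming a REXML version that includes it; the sample XML is made up for the example:

```ruby
require 'rexml/document'

# Hypothetical sample document, just to have a few nested elements.
doc = REXML::Document.new('<a><b><c/></b><d/></a>')

names = []
# each_recursive yields every descendant element of the root.
doc.root.each_recursive { |elem| names << elem.name }

puts names.inspect
```

That avoids reopening the class, at the cost of depending on whatever traversal order the library chooses.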

T.

In article <1125257760.436346.52640@f14g2000cwb.googlegroups.com>,
  "zerohalo" <zerohalo@gmail.com> writes:

[...]

Thanks! This is what I'm looking for. I read through the tutorial and
the rdoc documentation but I just couldn't figure out what I was missing.

When I mentioned iterating through the document, what I'm really doing
is describing the process I've been using for deciding how to handle
some arbitrary XML document I wound up with. It's not ideal, I admit. It
might do me some good to read up on XML.
--Michael

In article <9e3fd2c8050828122755267c9@mail.gmail.com>,
  Robert Klemme <shortcutter@googlemail.com> writes:

[...]

What exactly do you want to extract? You'll likely want some kind of
XPath expression with #each like in the tutorial.
Did you look at the tutorial?
http://www.germane-software.com/software/rexml/docs/tutorial.html

Thanks for the reply. I did read the tutorial--I just wasn't making the
necessary connections.
--Michael

Correction to my last post. It's in the XSLT plugin.

If you really think you need to visit every element you may be better off using the stream or pull parsers.

James

--

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys

James, would you mind pointing to a link that explains how to do this?
I couldn't find reference to it in the rexml documentation. Tx!

I don't think there is much written about the pull parser (though I should see an article of mine on the topic published in a mainstream geek mag in the next few months. I hope.)

Back in 2001 I wrote an article on using the REXML stream parser that may be relevant, and possibly accurate:

http://www.rubyxml.com/articles/REXML/Stream_Parsing_with_REXML
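
For anyone who wants a feel for the stream style before reading the article, here is a minimal sketch of a listener. The `FaultListener` class and the inlined XML are hypothetical, made up for this example; `REXML::StreamListener` and `REXML::Document.parse_stream` are the library's own:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: records each element name with its text content.
class FaultListener
  include REXML::StreamListener

  attr_reader :seen

  def initialize
    @seen = []
    @current = nil
  end

  # Called for every opening tag.
  def tag_start(name, _attrs)
    @current = name
  end

  # Called for character data; skip whitespace-only runs.
  def text(data)
    return if data.strip.empty? || @current.nil?
    @seen << [@current, data.strip]
  end

  # Called for every closing tag.
  def tag_end(_name)
    @current = nil
  end
end

# Assumption: a flattened fragment of the fault response.
xml_source = '<fault><name>faultCode</name><value><int>5</int></value></fault>'

listener = FaultListener.new
REXML::Document.parse_stream(xml_source, listener)

listener.seen.each { |name, text| puts "#{name}: #{text}" }
```

The parser never builds a tree here; the listener only keeps what it chooses to remember, which is the appeal for large documents.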

The pull parser sits below all the other REXML parsers, and has a sparser API, but is quite handy for many things.

The basic idea is to pull events from the parser, see what you've got (start_element? end_element? text?), and act on it. You can also push things back onto the parse stream, too, as well as peek down the stream to see what's ahead (while not disrupting the current stream order).

# Simple example:
require 'rexml/parsers/pullparser'

# foo.xml has
# <foo>
# <baz>This is baz</baz>
# <bar>Ignore me!</bar>
# <baz>This is baz, also</baz>
# </foo>

File.open( 'foo.xml', 'r' ) do |f|
   parser = REXML::Parsers::PullParser.new( f )
   while parser.has_next?
     pull_event = parser.pull
     puts( "Element: " + pull_event[0] ) if pull_event.start_element?
     if pull_event.start_element? and pull_event[0] == 'baz'
       while !(pull_event = parser.pull).end_element?
         puts pull_event[0] if pull_event.text?
       end
     end
   end
end

Or something like that.

See also

http://www.ruby-doc.org/stdlib/libdoc/rexml/rdoc/classes/REXML/Parsers/PullParser.html

James

Thanks, James, I'll study that.

By the way, I've tried before to access rubyxml.com (which seems to be
your site?) which I had found when googling for rexml, and there
doesn't seem to be any way to get to past articles. Maybe there's a
sidebar or something but it doesn't show up in Firefox or Opera on
Linux (I can't try IE as I don't have it). Or am I missing something?

Google on REXML and you get some good results!
Start here:
http://raa.ruby-lang.org/project/rexml
and you'll get to
http://www.germane-software.com/software/rexml/
and
http://www.germane-software.com/software/rexml/docs/tutorial.html

Have fun!
Cheers,
David

2005/8/29, zerohalo <zerohalo@gmail.com>:

[...]

No, the site is missing some obvious UI clues for friendlier usage.

You can see past items by tweaking the URL:

http://rubyxml.com/index.rb/2004/12
   Shows items from December of 2004

http://rubyxml.com/index.rb/2005
   Shows items from 2005.

http://rubyxml.com/index.rb/Articles
   Shows items in the Articles category

http://rubyxml.com/index.rb/Applications
   Shows items in the Applications category

More or less.

James
