Help needed with rexml

Forrest · 28 August 2005 18:16

I've been struggling to properly parse some XML with rexml. I will fully
admit my XML ignorance in advance. It would be easy enough to parse
this with a regular expression instead, but I would prefer to use the
right tool.

Here's a sample XML response:

<?xml version='1.0' encoding="iso-8859-1" ?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
            <value>
              <int>5</int>
            </value>
        </member>
        <member>
          <name>faultString</name>
        <value>
          <string>system error (nologin)</string>
        </value>
      </member>
    </struct>
  </value>
  </fault>
</methodResponse>

However, I can't get anything useful out of it. For instance, I've been
trying something like this:

require 'rexml/document'

file = File.new("test.xml")
xml = REXML::Document.new(file)
xml.elements.each { |i|
  i.texts.each { |t|
    puts "Class: #{t.class}"
    puts "Value: #{t.value}"
    puts "String: #{t.to_s}"
  }
}

This doesn't print anything useful for the class. Where am I going wrong
with this? I've been digging through the documentation but I must not
be getting it.

For what it's worth, I can parse this in perl easily enough (which
suggests to me the XML is valid):

use Data::Dumper;
use XML::Simple; # AKA "XML For Idiots"

my $ref = XMLin("test.xml"); # A file containing the XML above
print Dumper $ref, "\n";

I can then use the results to figure out how to dereference $ref to pull
the error information returned by the server.

Responses to the list or the newsgroup, please, for future googlers
to find.

Robert_K1 · 28 August 2005 19:27

I'd start with something like this (untested, from memory):

xml.elements.each do |elem|
  p elem.node_type
  p elem.text
end

What exactly do you want to extract? You'll likely want some kind of
XPath expression with #each like in the tutorial.

Did you look at the tutorial?
http://www.germane-software.com/software/rexml/docs/tutorial.html
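
For future readers, here is a minimal sketch of the XPath-with-#each approach applied to the fault response from the original post. The XML is inlined (without the XML declaration) so the snippet runs on its own, and the variable names are only illustrative.

```ruby
require 'rexml/document'

# The fault response from the original post, inlined for the example.
xml = <<~XML
  <methodResponse>
    <fault>
      <value>
        <struct>
          <member>
            <name>faultCode</name>
            <value><int>5</int></value>
          </member>
          <member>
            <name>faultString</name>
            <value><string>system error (nologin)</string></value>
          </member>
        </struct>
      </value>
    </fault>
  </methodResponse>
XML

doc = REXML::Document.new(xml)

# Collect each <member> as name => value. The 'value/*' step matches
# whichever typed child (<int>, <string>, ...) the member carries.
fault = {}
doc.elements.each('methodResponse/fault/value/struct/member') do |member|
  name  = member.elements['name'].text
  value = member.elements['value/*'].text
  fault[name] = value
end

puts fault['faultCode']
puts fault['faultString']
```

Note that every value comes back as a string; converting faultCode to an integer is left to the caller.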

Kind regards

robert

Francois_Montel · 28 August 2005 19:36

Michael, I came across the same problem recently when using ruby/rexml
for the first time.

You're not getting results because each_element and
each_element_with_attribute only iterate through the element's
immediate children. They don't recurse through all the descendants. So
what you're probably getting is just the root element and none of the
children.

If you need to iterate through all the elements in the whole document,
then use XPath.each. For example, REXML::XPath.each(doc, '//member')
{ |x| whatever you want to do with them } should work. That's what I
finally had to do in my recent experience. I'm not sure what XPath
search you would use to go through ALL of the elements in the document,
but with some experimentation you'll probably find it. (And post what
you find!)
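
A small sketch of the difference, assuming a trimmed-down copy of the fault document: iterating the document's elements directly reaches only the root, while the XPath expression '//*' selects every element at any depth. The variable names are made up for the example.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <methodResponse>
    <fault>
      <value>
        <struct>
          <member>
            <name>faultCode</name>
            <value><int>5</int></value>
          </member>
        </struct>
      </value>
    </fault>
  </methodResponse>
XML

# Direct children only: the document has one child element, the root.
top_level = []
doc.elements.each { |e| top_level << e.name }

# '//*' matches every element in the document, however deeply nested.
all_names = []
REXML::XPath.each(doc, '//*') { |e| all_names << e.name }

puts "direct children: #{top_level.join(', ')}"
puts "all elements:    #{all_names.join(', ')}"
```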

There may be a better way to do this and I posted something about this
a couple of days ago, but received no response. It seems that
each_element and each_element_with_attribute should include an option
to recurse through all the descendants, but unfortunately they don't
seem to (or at least I couldn't find it).

Maybe that could be added to a new release of rexml (if the dev is
reading this?)

Zach_Dennis1 · 29 August 2005 05:15

What type of information do you want to get out of this? You never posted what you *thought* your sample ruby code would give you. I ran your perl example, and it looks like the xml document but where < > are gsub'd for { }.

Here is an example which shows some XPath usage:

require 'rexml/document'

file = File.new("test.xml")
root = REXML::Document.new(file).root
fault_arr = root.elements.each( "fault" ) do |e1|
   e1.elements.each( "value/struct/member" ) do |e2|
     e2.elements.each( '*' ) { |e3| print e3.text.strip }
     e2.elements.each( '*/*' ){ |e3| puts " " + e3.text.strip }
   end
end

puts "Message response faulted!" if fault_arr.length > 0

Zach

Francois_Montel · 28 August 2005 21:46

PS. There's a great XML plugin for JEdit that will show you the results
of an XPath search on your document. That way you can try out different
variations until you get the result set that you want without having to
run your script each time to test it.

7rans · 28 August 2005 21:51

Written off the top of my head, but you could write your own.

  class REXML::Element
    # Yields every descendant element, children before their parent.
    def each_element_recurse(&block)
      each_element { |e|
        e.each_element_recurse(&block)
        yield(e)
      }
    end
  end

I made a first stab at it b/c I will probably need it myself soon.

(Yes, I know I'm reopening a standard class! :-p)
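
For anyone who would rather not reopen the class, the same idea can be sketched as a standalone method. The name each_descendant is made up for this example and is not part of REXML; like the version above, it visits children before their parent.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): yields every descendant
# element of `element`, children before their parent.
def each_descendant(element, &block)
  element.each_element do |child|
    each_descendant(child, &block)
    block.call(child)
  end
end

doc = REXML::Document.new('<a><b><c>deep</c></b><d/></a>')

names = []
each_descendant(doc.root) { |e| names << e.name }
p names   # => ["c", "b", "d"]
```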

T.

Forrest · 29 August 2005 04:41

In article <1125257760.436346.52640@f14g2000cwb.googlegroups.com>,
"zerohalo" <zerohalo@gmail.com> writes:

Michael, I came across the same problem recently when using ruby/rexml
for the first time.

You're not getting results because each_element and
each_element_with_attribute only iterate through the element's
immediate children. They don't recurse through all the descendants. So
what you're probably getting is just the root element and none of the
children.

If you need to iterate through all the elements in the whole document,
then use XPath.each. For example, REXML::XPath.each(doc, '//member')
{ |x| whatever you want to do with them } should work. That's what I
finally had to do in my recent experience. I'm not sure what XPath
search you would use to go through ALL of the elements in the document,
but with some experimentation you'll probably find it. (And post what
you find!)

There may be a better way to do this and I posted something about this
a couple of days ago, but received no response. It seems that
each_element and each_element_with_attribute should include an option
to recurse through all the descendants, but unfortunately they don't
seem to (or at least I couldn't find it).

Thanks! This is what I'm looking for. I read through the tutorial and
the rdoc documentation but I just couldn't figure out what I was missing.

When I mentioned iterating through the document, what I'm really doing
is describing the process I've been using for deciding how to handle
some arbitrary XML document I wound up with. It's not ideal, I admit. It
might do me some good to read up on XML.
--Michael

Grice · 29 August 2005 04:46

In article <9e3fd2c8050828122755267c9@mail.gmail.com>,
Robert Klemme <shortcutter@googlemail.com> writes:

[...]

What exactly do you want to extract? You'll likely want some kind of
XPath expression with #each like in the tutorial.
Did you look at the tutorial?
http://www.germane-software.com/software/rexml/docs/tutorial.html

Thanks for the reply. I did read the tutorial--I just wasn't making the
necessary connections.
--Michael

Francois_Montel · 28 August 2005 21:51

Correction to my last post. It's in the XSLT plugin.

James_Britt4 · 28 August 2005 23:02

Trans wrote:

Written off the top of my head, but you could write your own.

  class REXML::Element
    # Yields every descendant element, children before their parent.
    def each_element_recurse(&block)
      each_element { |e|
        e.each_element_recurse(&block)
        yield(e)
      }
    end
  end

I made a first stab at it b/c I will probably need it myself soon.

(Yes, I know I'm reopening a standard class! :-p)

If you really think you need to visit every element you may be better off using the stream or pull parsers.

James

--

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys

Francois_Montel · 28 August 2005 23:16

James, would you mind pointing to a link that explains how to do this?
I couldn't find reference to it in the rexml documentation. Tx!

James_Britt4 · 29 August 2005 04:44

zerohalo wrote:

James, would you mind pointing to a link that explains how to do this?
I couldn't find reference to it in the rexml documentation. Tx!

I don't think there is much written about the pull parser (though I should see an article of mine on the topic published in a mainstream geek mag in the next few months. I hope.)

Back in 2001 I wrote an article on using the REXML stream parser that may be relevant, and possibly accurate:

http://www.rubyxml.com/articles/REXML/Stream_Parsing_with_REXML
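
As a rough illustration of the stream approach (a sketch, not taken from the article): you hand the parser a listener object, and it calls back as tags and text go by. The TextCollector class and the inline sample document are made up for this example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text found inside one tag name.
class TextCollector
  include REXML::StreamListener

  attr_reader :texts

  def initialize(wanted)
    @wanted = wanted
    @inside = false
    @texts  = []
  end

  # Called for every opening tag.
  def tag_start(name, _attrs)
    @inside = true if name == @wanted
  end

  # Called for every closing tag.
  def tag_end(name)
    @inside = false if name == @wanted
  end

  # Called for every run of character data.
  def text(data)
    @texts << data if @inside
  end
end

xml = '<foo><baz>This is baz</baz><bar>Ignore me!</bar>' \
      '<baz>This is baz, also</baz></foo>'

listener = TextCollector.new('baz')
REXML::Document.parse_stream(xml, listener)
puts listener.texts
```

No document tree is built, so the listener has to keep whatever state it needs (here, just the @inside flag).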

The pull parser sits below all the other REXML parsers, and has a sparser API, but is quite handy for many things.

The basic idea is to pull events from the parser, see what you've got (start_element? end_element? text?), and act on it. You can also push things back onto the parse stream, too, as well as peek down the stream to see what's ahead (while not disrupting the current stream order).

# Simple example:
require 'rexml/parsers/pullparser'

# foo.xml has
# <foo>
# <baz>This is baz</baz>
# <bar>Ignore me!</bar>
# <baz>This is baz, also</baz>
# </foo>

File.open( 'foo.xml', 'r' ) do |f|
   parser = REXML::Parsers::PullParser.new( f )
   while parser.has_next?
     pull_event = parser.pull
     puts( "Element: " + pull_event[0] ) if pull_event.start_element?
     if pull_event.start_element? and pull_event[0] == 'baz'
       while !(pull_event = parser.pull).end_element?
         puts pull_event[0] if pull_event.text?
       end
     end
   end
end

Or something like that.
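
The same pull loop can be pointed at the fault response from the start of the thread. This is only a sketch: the XML is inlined (without the XML declaration), and the path stack and variable names are my own bookkeeping, not something the parser provides.

```ruby
require 'rexml/parsers/pullparser'

xml = <<~XML
  <methodResponse>
    <fault>
      <value>
        <struct>
          <member>
            <name>faultCode</name>
            <value><int>5</int></value>
          </member>
          <member>
            <name>faultString</name>
            <value><string>system error (nologin)</string></value>
          </member>
        </struct>
      </value>
    </fault>
  </methodResponse>
XML

parser = REXML::Parsers::PullParser.new(xml)

fault        = {}   # member name => member value (as strings)
path         = []   # names of the elements currently open
current_name = nil  # most recent <name> text seen

while parser.has_next?
  event = parser.pull
  if event.start_element?
    path.push(event[0])
  elsif event.end_element?
    path.pop
  elsif event.text?
    text = event[0].strip
    next if text.empty?            # skip whitespace between tags
    if path.last == 'name'
      current_name = text
    elsif current_name && path[-2] == 'value'
      # Text inside <value><int>...</int></value> or similar.
      fault[current_name] = text
    end
  end
end

puts "#{fault['faultCode']}: #{fault['faultString']}"
```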
