I've been struggling to properly parse some XML with rexml. I will fully
admit my XML ignorance in advance. It would be easy enough to parse
this with a regular expression instead, but I would prefer to use the
right tool.
This doesn't print anything useful for the class. Where am I going wrong
with this? I've been digging through the documentation, but I must not
be getting it.
For what it's worth, I can parse this in perl easily enough (which
suggests to me the XML is valid):
use Data::Dumper;
use XML::Simple; # AKA "XML For Idiots"
my $ref = XMLin("test.xml"); # A file containing the XML above
print Dumper $ref, "\n";
I can then use the results to figure out how to dereference $ref to pull
the error information returned by the server.
Responses to the list or the newsgroup, please, for future googlers
to find.
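For comparison, here is a rough sketch of a Ruby counterpart to the XML::Simple dump. `element_to_hash` is a hypothetical helper, not part of REXML. It ignores attributes and assumes that text mixed in alongside child elements can be dropped when you only want to inspect the structure.

```ruby
require 'rexml/document'
require 'pp'

# Hypothetical helper (not part of REXML): convert an element into nested
# Ruby hashes, loosely like Perl's XML::Simple, so `pp` can show the shape.
# Attributes are ignored.
def element_to_hash(element)
  children = element.elements.to_a
  # Leaf element: return its trimmed text (an empty string if it has none).
  return element.text.to_s.strip if children.empty?

  children.each_with_object({}) do |child, hash|
    value = element_to_hash(child)
    if hash.key?(child.name)
      # Repeated child names are gathered into an array.
      hash[child.name] = [hash[child.name]] unless hash[child.name].is_a?(Array)
      hash[child.name] << value
    else
      hash[child.name] = value
    end
  end
end

doc = REXML::Document.new('<a><b>1</b><b>2</b><c><d>x</d></c></a>')
pp element_to_hash(doc.root)
```

Once you can see the nesting, it is easier to work out which XPath expression reaches the parts you need.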
I've been struggling to properly parse some XML with rexml. I will fully
admit my XML ignorance in advance. It would be easy enough to parse
this with a regular expression instead, but I would prefer to use the
right tool.
I'd start with something like this (untested, from memory):
xml.elements.each do |elem|
  p elem.node_type
  p elem.text
end
This doesn't print anything useful for the class. Where am I going wrong
with this? I've been digging through the documentation, but I must not
be getting it.
What exactly do you want to extract? You'll likely want some kind of
XPath expression with #each like in the tutorial.
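As a minimal sketch of passing an XPath expression to #each (the sample XML here is made up, since the original document isn't shown in the thread):

```ruby
require 'rexml/document'

# Hypothetical sample document; the thread's real XML is not shown.
doc = REXML::Document.new(<<~XML)
  <methods>
    <method name="add"><param>a</param><param>b</param></method>
    <method name="sub"><param>x</param></method>
  </methods>
XML

# Elements#each accepts an XPath expression, so a single call can reach
# nested elements: here, every <param> inside any <method>.
params = []
doc.elements.each('//method/param') { |e| params << e.text }
p params

# A relative path from the document works too; read each method's name attribute.
doc.elements.each('methods/method') { |m| puts m.attributes['name'] }
```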
Michael, I came across the same problem recently when using ruby/rexml
for the first time.
You're not getting results because each_element and
each_element_with_attribute only iterate through the element's
immediate children. They don't recurse through all the descendants, so
what you're probably getting is just the root element and none of the
children.
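A small sketch that makes the immediate-children behaviour visible, using a made-up three-level document:

```ruby
require 'rexml/document'

doc = REXML::Document.new('<root><a><b/></a><c/></root>')

# each_element only visits the receiver's immediate children,
# so <b> (a grandchild of <root>) is never yielded here.
children = []
doc.root.each_element { |e| children << e.name }
p children
```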
If you need to iterate through all the elements in the whole document,
use REXML::XPath.each. For example, REXML::XPath.each(doc, '//methods')
{ |x| whatever you want to do with them } should work. That's what I
finally had to do in my recent experience. I'm not sure which XPath
search you would use to go through ALL of the elements in the document,
but with some experimentation you'll probably find it. (And post what
you find!)
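For the "ALL of the elements" case, the XPath expression '//*' selects every element node at any depth. A minimal sketch on a made-up document:

```ruby
require 'rexml/document'

doc = REXML::Document.new('<root><a><b/></a><c/></root>')

# '//*' matches every element in the document, including the root
# and the nested <b> that each_element would skip.
all = []
REXML::XPath.each(doc, '//*') { |e| all << e.name }
p all
```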
There may be a better way to do this. I posted something about it
a couple of days ago but received no response. It seems that
each_element and each_element_with_attribute should include an option
to recurse through all the descendants, but unfortunately they don't
seem to (or at least I couldn't find it).
Maybe that could be added to a new release of rexml (if the dev is
reading this?)
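Until something like that exists, a recursive walk is easy to write yourself. This is only a sketch: `each_descendant` is a hypothetical helper built on each_element, not an REXML method.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): yield every descendant element
# of +element+, depth first, by recursing through each_element.
def each_descendant(element, &block)
  element.each_element do |child|
    block.call(child)
    each_descendant(child, &block)
  end
end

doc = REXML::Document.new('<root><a><b/></a><c/></root>')
names = []
each_descendant(doc.root) { |e| names << e.name }
p names
```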
What type of information do you want to get out of this? You never posted what you *thought* your sample Ruby code would give you. I ran your Perl example, and its output looks like the XML document with the < > swapped for { }.
Here is an example that shows some XPath usage:
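require 'rexml/document'

# Parse the whole response and keep its root element.
root = File.open("test.xml") { |file| REXML::Document.new(file).root }

# Each <fault> under the root holds a struct of name/value members.
fault_arr = root.elements.each( "fault" ) do |e1|
  e1.elements.each( "value/struct/member" ) do |e2|
    # Print the text of each direct child (<name> supplies the label);
    # to_s guards against children such as <value> that have no text.
    e2.elements.each( '*' ) { |e3| print e3.text.to_s.strip }
    # Print the typed value nested inside <value>, e.g. <int> or <string>.
    e2.elements.each( '*/*' ) { |e3| puts " " + e3.text.to_s.strip }
  end
end
puts "Message response faulted!" if fault_arr.length > 0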
Zach
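If the goal is just the fault code and message, a variation on the same idea is to collect the members into a hash with REXML::XPath.first. The sample XML-RPC fault below is made up for illustration; the thread's actual test.xml isn't shown.

```ruby
require 'rexml/document'

# Hypothetical sample fault response (assumed XML-RPC layout).
xml = '<methodResponse><fault><value><struct>' \
      '<member><name>faultCode</name><value><int>4</int></value></member>' \
      '<member><name>faultString</name><value><string>Too many parameters.</string></value></member>' \
      '</struct></value></fault></methodResponse>'

doc = REXML::Document.new(xml)

# For each <member>, the <name> text becomes the key and the text of the
# typed element inside <value> (e.g. <int>, <string>) becomes the value.
fault = {}
REXML::XPath.each(doc, '//fault//member') do |member|
  key   = REXML::XPath.first(member, 'name').text
  value = REXML::XPath.first(member, 'value/*').text
  fault[key] = value
end
p fault
```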
PS. There's a great XML plugin for jEdit that will show you the results
of an XPath search on your document. That way you can try out different
variations until you get the result set you want, without having to
run your script each time to test it.
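You can get a similar quick feedback loop from Ruby itself, for example in irb. A minimal sketch that tries several candidate expressions against a small made-up document:

```ruby
require 'rexml/document'

doc = REXML::Document.new('<list><item id="1">one</item><item id="2">two</item></list>')

# Try several candidate XPath expressions and show what each one matches.
['//item', '//item[@id="2"]', '/list/*', '//item/text()'].each do |path|
  matches = REXML::XPath.match(doc, path).map(&:to_s)
  puts "#{path} => #{matches.inspect}"
end
```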
In article <1125257760.436346.52640@f14g2000cwb.googlegroups.com>,
"zerohalo" <zerohalo@gmail.com> writes:
[snip]
Thanks! This is what I'm looking for. I read through the tutorial and
the rdoc documentation but I just couldn't figure out what I was missing.
When I mentioned iterating through the document, what I was really
describing is the process I've been using to decide how to handle
some arbitrary XML document I wound up with. It's not ideal, I admit. It
might do me some good to read up on XML.
--Michael
James, would you mind pointing to a link that explains how to do this?
I couldn't find reference to it in the rexml documentation. Tx!
I don't think there is much written about the pull parser (though I should see an article of mine on the topic published in a mainstream geek mag in the next few months. I hope.)
Back in 2001 I wrote an article on using the REXML stream parser that may be relevant, and possibly accurate:
The pull parser sits below all the other REXML parsers, and has a sparser API, but is quite handy for many things.
The basic idea is to pull events from the parser, see what you've got (start_element? end_element? text?), and act on it. You can also push events back onto the parse stream, and peek down the stream to see what's ahead without disrupting the current stream order.
require 'rexml/parsers/pullparser'

# foo.xml has
# <foo>
#   <baz>This is baz</baz>
#   <bar>Ignore me!</bar>
#   <baz>This is baz, also</baz>
# </foo>
File.open( 'foo.xml', 'r' ) do |f|
  parser = REXML::Parsers::PullParser.new( f )
  while parser.has_next?
    pull_event = parser.pull
    puts( "Element: " + pull_event[0] ) if pull_event.start_element?
    # Inside a <baz>, print its text events until the closing tag.
    if pull_event.start_element? and pull_event[0] == 'baz'
      while !(pull_event = parser.pull).end_element?
        puts pull_event[0] if pull_event.text?
      end
    end
  end
end
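The same pull loop can be wrapped in a method that returns the <baz> texts instead of printing them. This is a sketch: `baz_texts` is a hypothetical name, it reads from a string rather than foo.xml, and it assumes each <baz> contains only text (a nested element would end the inner loop early).

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: collect the text of every <baz> element using the
# pull parser, skipping everything else.
def baz_texts(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  texts = []
  while parser.has_next?
    event = parser.pull
    next unless event.start_element? && event[0] == 'baz'
    # Pull events until the matching end tag, keeping only text events.
    parts = []
    until (event = parser.pull).end_element?
      parts << event[0] if event.text?
    end
    texts << parts.join
  end
  texts
end

xml = '<foo><baz>This is baz</baz><bar>Ignore me!</bar><baz>This is baz, also</baz></foo>'
p baz_texts(xml)
```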
By the way, I've tried before to access rubyxml.com (which seems to be
your site?), which I found when googling for rexml, and there
doesn't seem to be any way to get to past articles. Maybe there's a
sidebar or something, but it doesn't show up in Firefox or Opera on
Linux (I can't try IE as I don't have it). Or am I missing something?
No, the site is missing some obvious UI clues for friendlier usage.