Ruby and XML

Hi Everyone,

I am new to Ruby and am trying to use it to parse XML files so that I can verify that name/value pairs in two (or more) XML files are defined consistently.

I have been trying to use the rexml API
http://www.germane-software.com/software/rexml/

The website has been down a lot lately and I haven't found many other sources of information.

Here is a snippet of my code

require "rexml/document"
include REXML

file1 = File.new( "test.xml" )
doc1 = REXML::Document.new file1
names1 = XPath.each(doc1, "//name") { |e|}
values1 = XPath.each(doc1, "//value") { |e|}
Now I can parse names1 and values1 and then do the same for the second XML file.
This approach is not great because it does not ensure that the name/value pairs are siblings.

I am wondering if there is a better way to do this.

Any help would be greatly appreciated. BTW - I would welcome articles on using Ruby for configuration, release and deployment management on CM Crossroads (www.cmcrossroads.com)

Bob Aiello
http://www.linkedin.com/in/BobAiello
bob.aiello@ieee.org

REXML is in the standard library, but it is slow and awkward. Try
Nokogiri instead: http://nokogiri.org/

First of all, I'd recommend a different library. Personally, I
found REXML awkward to use; Nokogiri (`gem install nokogiri`) is much
better. (It also parses HTML.)

We'll probably need the XML file to be able to help you.

-- Matma Rex

Welcome to Ruby. I concur with the other post that Nokogiri is even better than REXML.

I don't fully understand your goal or the problem you mentioned about siblings. Can you post a small sample XML and the output you'd like to get from it?

--
(-, /\ \/ / /\/

Can you provide this test.xml file or a sample file like it, and what the expected output is?
PS: Some docs for ReXML, which is now part of the Ruby standard library: http://furious-waterfall-55.heroku.com/yard_stdlib/REXML.html

Regards,
Chris White
Twitter: http://www.twitter.com/cwgem

<?xml version="1.0"?>
<note>

  <name>Tove</name>
  <value>Tove's value is: 10</value>
  <garbage>xxxx</garbage>

  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>

  <parent>
    <name>Jani</name>
    <value>Jani's value is: 20</value>
    <garbage>xxxx</garbage>
  </parent>

  <parent>
    <name>No sibling</name>
  </parent>
  <value>No sibling's value is: 30</value>

</note>

require 'rexml/document'

f = File.new("xml.xml")
doc = REXML::Document.new(f)

REXML::XPath.each(doc, "//name") do |element|
  puts "name: #{element.text}"
  if sibling = element.next_element
    puts sibling.text
  else
    puts "No <value> tag that is a sibling"
  end

  puts "-" * 20
end

--output:--
name: Tove
Tove's value is: 10
--------------------
name: Jani
Jani's value is: 20
--------------------
name: No sibling
No <value> tag that is a sibling
--------------------

--
Posted via http://www.ruby-forum.com/.
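
Note that `next_element` returns whatever element comes next, so the code above prints that element's text even when it is not a <value>. A minimal sketch of one way to tighten this, assuming it is enough to check the sibling's tag name: the inline XML is trimmed from the sample above, and the "Orphan" entry is a hypothetical addition used only to show the failing case.

```ruby
require "rexml/document"

# Inline sample trimmed from the XML above; the "Orphan" entry is
# hypothetical and exists only to exercise the failing case.
xml = <<~XML
  <note>
    <name>Tove</name>
    <value>Tove's value is: 10</value>
    <name>Orphan</name>
    <garbage>xxxx</garbage>
  </note>
XML

doc = REXML::Document.new(xml)

# Map each name to its value, or to nil when the next element is
# missing or is not a <value>.
results = {}
REXML::XPath.each(doc, "//name") do |name|
  sibling = name.next_element
  results[name.text] = (sibling && sibling.name == "value") ? sibling.text : nil
end

results.each do |name, value|
  puts "#{name}: #{value || 'no adjacent <value>'}"
end
```

This still requires the <value> to be the very next element after the <name>.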

For comparison, here's one way to write code with similar functionality using Nokogiri, while ensuring that the <value> element is a sibling of the <name> element. Unlike the above code, the following allows any number of elements between the <name> and the <value>:
    require 'nokogiri'

    doc = Nokogiri::XML(IO.read("my.xml"))
    doc.search('name').each do |name|
      puts "name: #{name.text}"
      if value = name.at_xpath('following-sibling::value')
        puts value.text
      else
        puts "No <value> tag that is a sibling"
      end
      puts "-" * 20
    end

Output:
    name: Tove
    Tove's value is: 10
    --------------------
    name: Jani
    Jani's value is: 20
    --------------------
    name: No sibling
    No <value> tag that is a sibling
    --------------------

Again using this XML:
    <?xml version="1.0"?>
    <note>
      <name>Tove</name>
      <value>Tove's value is: 10</value>
      <garbage>xxxx</garbage>
  
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>

      <parent>
        <name>Jani</name>
        <value>Jani's value is: 20</value>
        <garbage>xxxx</garbage>
      </parent>

      <parent>
         <name>No sibling</name>
      </parent>
      <value>No sibling's value is: 30</value>
    </note>

You can even use an XPath expression to find names which do not have
proper values:

//name[not(following-sibling::value)]
//name[following-sibling::*[1][name()!="value"]]
//name[count(following-sibling::value)!=1]
//name[following-sibling::*[1][name()!="value"]]|//name[count(following-sibling::value)!=1]

attached and also here: Solutions for ruby-talk 387408 · GitHub

Kind regards

robert

Attachments:
http://www.ruby-forum.com/attachment/6578/nv.rb
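
The checks behind the XPath expressions above can also be written as plain Ruby. The following is only a sketch under stated assumptions, not the attached solution: the <config>/<entry> wrapper elements and the sample data are hypothetical, and the only XPath passed to REXML is `following-sibling::value`, which is the expression already used in this thread.

```ruby
require "rexml/document"

# Hypothetical sample: one well-formed pair (a) and three problem cases.
xml = <<~XML
  <config>
    <entry><name>a</name><value>1</value></entry>
    <entry><name>b</name><garbage>x</garbage><value>2</value></entry>
    <entry><name>c</name><value>3</value><value>4</value></entry>
    <entry><name>d</name></entry>
  </config>
XML

doc = REXML::Document.new(xml)

no_value        = []  # like //name[not(following-sibling::value)]
not_adjacent    = []  # like //name[following-sibling::*[1][name()!="value"]]
not_exactly_one = []  # like //name[count(following-sibling::value)!=1]

REXML::XPath.each(doc, "//name") do |name|
  values = REXML::XPath.match(name, "following-sibling::value")
  nxt    = name.next_element

  no_value        << name.text if values.empty?
  not_adjacent    << name.text if nxt && nxt.name != "value"
  not_exactly_one << name.text if values.size != 1
end

puts "no value:        #{no_value.inspect}"
puts "not adjacent:    #{not_adjacent.inspect}"
puts "not exactly one: #{not_exactly_one.inspect}"
```

A name with no following element at all (d here) is reported by the first and third checks but not the second, which mirrors how the second XPath expression only matches when a following sibling exists.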

I couldn't figure out the syntax for XPath's following-sibling, but now
that I see it in Gavin's Nokogiri code, here it is in REXML:

require 'rexml/document'

f = File.new("xml.xml")
doc = REXML::Document.new(f)

REXML::XPath.each(doc, "//name") do |element|
  puts "name: #{element.text}"

  if sibling = REXML::XPath.match(element, "following-sibling::value").first
    puts sibling.text
  else
    puts "Can't find a <value> tag that is a sibling"
  end

  puts "-" * 20
end

And here is some trickier xml that exercises 'following-sibling::value':

<?xml version="1.0"?>
<note>

  <name>Tove</name>
  <garbage>xxxx</garbage>
  <value>Tove's value is: 10</value>

  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>

  <parent>
    <name>Jani</name>
    <garbage>xxxx</garbage>
    <garbage>xxxx</garbage>
    <value>Jani's value is: 20</value>
    <value>1200</value>
  </parent>

  <parent>
    <name>Diane</name>
  </parent>
  <value>Diane's value is: 30</value>

</note>

--output:--
name: Tove
Tove's value is: 10
--------------------
name: Jani
Jani's value is: 20
--------------------
name: Diane
Can't find a <value> tag that is a sibling
--------------------
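
To get back to the original question (checking that name/value pairs are consistent across two files), the `following-sibling::value` lookup above can be used to build a hash per document and then compare the hashes. This is a rough sketch rather than a finished tool: the helper names `name_value_pairs` and `inconsistencies`, the <cfg>/<item> elements, and the sample data are all hypothetical, and it assumes each name appears once per file.

```ruby
require "rexml/document"

# Pair each <name> with its first following <value> sibling
# (nil when there is none).
def name_value_pairs(xml)
  doc = REXML::Document.new(xml)
  pairs = {}
  REXML::XPath.each(doc, "//name") do |name|
    value = REXML::XPath.match(name, "following-sibling::value").first
    pairs[name.text] = value ? value.text : nil
  end
  pairs
end

# Return {name => [value_in_a, value_in_b]} for every name whose values
# differ; :missing marks a name that is absent from one side.
def inconsistencies(pairs_a, pairs_b)
  (pairs_a.keys | pairs_b.keys).each_with_object({}) do |key, diff|
    a = pairs_a.fetch(key, :missing)
    b = pairs_b.fetch(key, :missing)
    diff[key] = [a, b] unless a == b
  end
end

# Hypothetical inline documents standing in for two XML files.
xml_a = <<~XML
  <cfg>
    <item><name>host</name><value>alpha</value></item>
    <item><name>port</name><value>80</value></item>
  </cfg>
XML

xml_b = <<~XML
  <cfg>
    <item><name>host</name><value>alpha</value></item>
    <item><name>port</name><value>8080</value></item>
    <item><name>user</name><value>bob</value></item>
  </cfg>
XML

diff = inconsistencies(name_value_pairs(xml_a), name_value_pairs(xml_b))
diff.each { |name, (a, b)| puts "#{name}: #{a.inspect} vs #{b.inspect}" }
```

To compare real files, the strings could be replaced with something like `File.read("first.xml")` and `File.read("second.xml")` (file names assumed here).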
