Ruby and XML

Thanks for all of the excellent responses. I am going through them all.

I have attached a copy of an XML file (with some of the name/value pairs
deleted for brevity).

<?xml version="1.0" encoding="UTF-8"?>
<product-state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn://mycompany.com/ia/product-state" xsi:type="product-state">
<properties>
<!-- WebSphere properties -->
<property>
<name>core.was.home</name>
<value>/usr/IBM/WebSphere/AppServer1/profiles/AppSrvQA</value>
</property>
<property>
<name>core.was.username</name>
<value>admin</value>
</property>
<property>
<name>core.was.password</name>
<value>password</value>
</property>
<property>
<name>core.application.name</name>
<value>myapp</value>
</property>
<property>
<name>core.application.context.root</name>
<value>myroot</value>
</property>

</properties>
</product-state>

Essentially, I have hundreds of XML files like this one that contain
many, many name/value pairs. I am concerned that a value is defined
differently (I have actually seen this) in one or more of the XML files.
I want to take an XML file and parse its name/value pairs into a
list. Then I want to check that list against all of the other XML files
in the system that have the same name/value pairs.

So I might find that one XML defines

<name>core.application.context.root</name>
<value>myroot</value>

and another XML defines this name/value pair as

<name>core.application.context.root</name>
<value>bigroot</value>

That would indicate a configuration management error that needs to be
corrected (half the application is looking in the wrong place for
core.application.context.root).

Ultimately, I want to implement this as part of an application
deployment framework possibly using Puppet or Chef.
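
The cross-file check described above could be sketched roughly as follows. This is only an illustration, assuming the name/value pairs of each file have already been parsed into a Hash; the helper name `find_conflicts` and the file names `a.xml` and `b.xml` are hypothetical.

```ruby
# Hypothetical helper: pairs_by_file maps a file name to its { name => value } Hash.
def find_conflicts(pairs_by_file)
  seen = Hash.new { |hash, name| hash[name] = {} }

  pairs_by_file.each do |file, pairs|
    pairs.each do |name, value|
      # Remember every file that uses this particular value for this name
      (seen[name][value] ||= []) << file
    end
  end

  # Keep only the names that were given more than one distinct value
  seen.select { |_name, files_by_value| files_by_value.size > 1 }
end

# Made-up sample data standing in for two parsed XML files
conflicts = find_conflicts({
  'a.xml' => { 'core.application.context.root' => 'myroot',  'core.was.username' => 'admin' },
  'b.xml' => { 'core.application.context.root' => 'bigroot', 'core.was.username' => 'admin' }
})

conflicts.each do |name, files_by_value|
  puts "#{name}:"
  files_by_value.each { |value, files| puts "  #{value.inspect} in #{files.join(', ')}" }
end
```

Grouping by value first means the report can show which files agree with each other, not just that a mismatch exists.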

Bob Aiello
http://www.linkedin.com/in/BobAiello

test.xml (907 Bytes)


_______________________________________________________________

On 9/4/2011 3:12 PM, Bartosz Dziewoński wrote:

  First of all, I'd recommend a different library. Personally, I
  found REXML awkward to use; Nokogiri (`gem install nokogiri`) is much
  better. (It also parses HTML.)

  We'll probably need the XML file to be able to help you.

  -- Matma Rex

For your consideration, below is how I would write a script to handle this. It creates a Hash that maps each name to its value; when the same name is seen again with a new value, it keeps track of all values seen as an array. The "SourceFile" module attaches to each value string the file that it was defined in, so that you can later see where the values were defined. Using a module for this is both tricky

### collider.rb
    require 'nokogiri'

    # Perhaps use Marshal to load this from a file if it exists,
    # and save out the values seen so far at the end of the run.
    $all_values = {}

    # Find your file(s) to analyze however you want here
    files = %w[ my.xml ]

    module SourceFile
      attr_accessor :source_file
    end

    files.each do |file|
      doc = Nokogiri::XML(IO.read(file))
      doc.remove_namespaces!

      # Find every <name> that has a <value> sibling
      doc.xpath('//property/name[following-sibling::value]').each do |name|
        value = name.at_xpath('following-sibling::value').text
        # Record where this value came from
        value.extend(SourceFile); value.source_file = file

        name = name.text
        if $all_values.key?(name)
          old = $all_values[name]
          # old is either one value or an array of every value seen so far
          unless Array(old).include?(value)
            warn "#{name} is #{old.inspect} and #{value.inspect}"
            $all_values[name] = [*old,value]
          end
        else
          $all_values[name] = value
        end
      end
    end
    #=> core.app.root is "myroot" and "bigroot"
    #=> core.app.root is ["myroot", "bigroot"] and "sarsaparilla root"

    # Print any keys that point to an array of values...
    $all_values.select{ |key,val| val.is_a?(Array) }.each do |key,values|
      puts "#{key}:"
      puts values.map{ |v| "%20s: '%s'" % [v.source_file,v] }
    end
    #=> core.app.root:
    #=> my.xml: 'myroot'
    #=> my.xml: 'bigroot'
    #=> my.xml: 'sarsaparilla root'
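
The comment near the top of the script suggests using Marshal to keep the values between runs. A minimal sketch of that idea is below. The cache path and the helper names `load_values` and `save_values` are assumptions made for illustration, and it stores plain strings and arrays rather than the `SourceFile`-extended strings used above.

```ruby
require 'tmpdir'

# Hypothetical cache location; any writable path would do.
CACHE_FILE = File.join(Dir.tmpdir, 'collider_values.marshal')

# Load the values recorded by an earlier run, or start with an empty Hash.
def load_values(path)
  File.exist?(path) ? Marshal.load(File.binread(path)) : {}
end

# Write the values seen so far so that the next run can pick them up.
def save_values(path, values)
  File.binwrite(path, Marshal.dump(values))
end

all_values = load_values(CACHE_FILE)
all_values['core.app.root'] = %w[myroot bigroot]
save_values(CACHE_FILE, all_values)
```

Marshal data should only be loaded from files you trust, since loading can instantiate arbitrary objects.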

### my.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <product-state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="urn://mycompany.com/ia/product-state" xsi:type="product-state">
      <properties>
        <!-- WebSphere properties -->
        <property>
          <name>core.was.home</name>
          <value>/usr/IBM/WebSphere/AppServer1/profiles/AppSrvQA</value>
        </property>
        <property>
          <name>core.was.username</name>
          <value>admin</value>
        </property>
        <property>
          <name>core.was.password</name>
          <value>password</value>
        </property>
        <property>
          <name>core.application.name</name>
          <value>myapp</value>
        </property>
        <property>
          <name>core.app.root</name>
          <value>myroot</value>
        </property>
        <property>
          <name>core.app.root</name>
          <value>bigroot</value>
        </property>
        <property>
          <name>core.was.password</name>
          <value>password</value>
        </property>
        <property>
          <name>core.app.root</name>
          <value>sarsaparilla root</value>
        </property>
      </properties>
    </product-state>


On Sep 5, 2011, at 5:42 PM, Bob Aiello wrote:

  Essentially, I have hundreds of XML files like this one that contain
  many many name value pairs. I am concerned that the value is defined
  differently (actually I have seen this) in one or more of the XML. I
  want to take an xml file and then parse the name/value pairs into a
  list. Then I want to check that list against all of the other XML in the
  system that have the same name/value pairs.

Upon further reflection, I'd make one minor change to my code: instead of requiring that the name key always come first in the XML, the following tweak allows for the possibility of <property><value>…</value><name>…</name></property>:

    # Find all name elements that are in properties that also have a <value> element
    doc.xpath('//property[value]/name').each do |name|
      # Find the first 'value' child in the property (assumes only one)
      value = name.parent.at('value').text
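
To see the order-independence on its own, here is a small self-contained sketch. It is not the Nokogiri code from the script: it uses REXML only so that it can be tried without installing a gem, and it looks up both children from each `<property>` instead of using the XPath predicate. The sample XML is made up for the example.

```ruby
require 'rexml/document'

# Made-up sample: the second property lists <value> before <name>,
# and the third has no <value> at all.
xml = <<~XML
  <properties>
    <property><name>core.was.username</name><value>admin</value></property>
    <property><value>myroot</value><name>core.app.root</name></property>
    <property><name>core.orphan</name></property>
  </properties>
XML

doc = REXML::Document.new(xml)
pairs = {}

REXML::XPath.each(doc, '//property') do |property|
  # Look up each child by element name, so their order does not matter
  name  = property.elements['name']
  value = property.elements['value']
  next unless name && value # skip properties missing either child

  pairs[name.text] = value.text
end

p pairs
```

Like the tweak above, this assumes each property holds at most one `<name>` and one `<value>`.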


On Sep 5, 2011, at 10:21 PM, Gavin Kistner wrote:

     # Find every <name> that has a <value> sibling
     doc.xpath('//property/name[following-sibling::value]').each do |name|
       value = name.at_xpath('following-sibling::value').text