Parsing XML into a complete domain object

Brian_Cowdery · 19 May 2006 21:06

Recently at work we've decided to attempt to build a basic XML-driven
automation framework to work with Watir (a web development testing
library for Ruby).

I can't figure out how to loop through each level of the REXML document
to extract the data needed to build the complete object.

It seems that if I try to use any iterators on a root.elements[] object
it converts it to text, so I can't nest another iterator or loop to
access the innards.

My only recourse has been to resort to a ton of nested while loops, which
is ugly compared to most other Ruby loops.

e.g.
i = 1
while root.elements['cases'].elements[i] != nil

  n = 1
  while root.elements['cases'].elements[i].elements[n] != nil

    # more loops here. continue down the chain until I can build the
    # object from the inside out.

    n = n + 1
  end

  i = i + 1
end

my xml looks like this
<script>
<project-name></project-name>
<start-url></start-url>

<test-case id="1">
  <test-step id="1">
   <test>
    <interaction>double click</interaction>
    <element>
     <name>button 1</type>
     <type>button</type>
    <element>
   </test>
   <check>
    <element>
     <name>Page title</type>
     <type>text</type>
    <element>
   </check>
  </test-step>

  <test-step id="2">
    ...
  </test-step>
</test-case>

<test-case id="2">
...
</test-case>
</script>

Somehow I have to get THAT modeled into an object:
the script object contains test-cases, test-cases contain test-steps, etc.

Any thoughts? (Sorry for the long post... it's kind of hard to explain
without showing EVERYTHING.)

--
Posted via http://www.ruby-forum.com/.

Matthew_Desmarais · 19 May 2006 23:40

Hi Brian,

Have you given any thought to using YAML instead of XML?

If you're comfortable with a data format that's a little less self-descriptive than XML, you may find that YAML's ease of use could work for you. It's pretty nice to load up your YAML and have all of your Ruby objects pieced together for you. I can send you a small example if you'd like.
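A minimal sketch of what that could look like, assuming the test script were rewritten as YAML. The key names below are made up for illustration and simply mirror the XML in the original post. Loaded this way, the data arrives as plain hashes and arrays rather than custom classes:

```ruby
require 'yaml'

# Hypothetical YAML version of the test script; the key names are
# illustrative and simply mirror the XML in the original post.
script_text = <<~DOC
  project-name: My Project
  start-url: http://localhost:3000/
  test-cases:
    - id: 1
      test-steps:
        - id: 1
          test:
            interaction: double click
            element: { name: button 1, type: button }
          check:
            element: { name: Page title, type: text }
DOC

# safe_load only builds basic types (Hash, Array, String, Integer, ...).
script = YAML.safe_load(script_text)

script['test-cases'].each do |test_case|
  test_case['test-steps'].each do |step|
    element = step['test']['element']
    puts "case #{test_case['id']}, step #{step['id']}: " \
         "#{step['test']['interaction']} on #{element['name']}"
  end
end
```

From there the nested hashes can be walked with ordinary each blocks, with no XML iteration involved.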

Regards,
Matthew

Rob_Burrowes · 20 May 2006 09:04

If you just want to walk the tree from any entry point, through all the sub-levels, you can use the standard each_recursive method.

    #Recurse end to end, printing the tags
    @doc.elements.each("definitions/src") do |element|
      print "<", element.name.to_s, ">"
      element.each_recursive do |childElement|
        print "<", childElement.name.to_s, ">"
      end
    end

If you just want the next level of children, but no deeper, I'm not sure what you should call. I tried this when I played with REXML, and the obvious each_child doesn't give you only REXML::Element objects. It gives a REXML::Text node at the first iteration, then the next REXML::Element, then another REXML::Text node, etc. Not quite what you want. But adding this to your code will work.

module REXML
  class Element
    # Visit all child elements of this node, but don't recurse into their children.
    # Defined on Element (not directly in the REXML module) so that elements respond to it.
    def each_child_element(&block)
      elements.each(&block)
    end
  end
end

It probably exists in some form in the REXML module, but I couldn't find it, so I recreated it (by a little hacking of the module's each_recursive).

You can then
    #printing the tags of the immediate children.
    @doc = Document.new(File.new(format_file))
    @doc.elements.each("definitions") do |element|
      element.each_child_element do |childElement|
        print "<", childElement.name.to_s, ">"
      end
    end

or recursively walk the tree by calling each_child_element for each returned childElement (as with the first example)

  def recurse(the_element)
      the_element.each_child_element do |childElement|
        print "<", childElement.name.to_s, ">"
        recurse(childElement)
      end
  end

  @doc.elements.each("definitions/src") do |element|
    recurse(element)
  end
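For the "next level only" case, REXML's own Element#each_element appears to cover the same ground as the helper above, so the monkey patch may not be needed. A small sketch contrasting it with each_child (the sample document here is invented for illustration):

```ruby
require 'rexml/document'

# Invented sample document; only the shape matters.
doc = REXML::Document.new(<<~XML)
  <definitions>
    <src>one</src>
    <dst>two</dst>
  </definitions>
XML

root = doc.root

# each_child yields every child node, including the whitespace-only
# text nodes that sit between the elements.
child_classes = []
root.each_child { |node| child_classes << node.class }

# each_element yields only the child elements, one level deep.
element_names = []
root.each_element { |element| element_names << element.name }

p child_classes
p element_names
```

Because each_element hands back real REXML::Element objects, the block can call each_element again on each one, which gives nested iteration without index-based while loops.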

Ross_Bamford4 · 20 May 2006 11:35

Looks like an ideal DigestR[1] opportunity, if you're able to get
Libxml-ruby installed too(*):

#!/usr/local/bin/ruby
require 'xml/digestr'
require 'pp'

class Script
  attr_accessor :name, :starturl, :testcases
  def initialize; @testcases = []; end
end

class TestCase
  attr_accessor :id, :steps
  def initialize; @steps = []; end
end

class TestStep
  attr_accessor :id, :tests, :checks
  def initialize; @tests, @checks = [], []; end
end

class Check
  attr_accessor :elements
  def initialize; @elements = []; end
end

class Test < Check
  attr_accessor :interaction
end

class Element
  attr_accessor :name, :type
end

d = XML::Digester.new(true)
d.add_object_create('/script', Script)

d.add_call_method('/script/project-name', :name=)
d.add_call_param('/script/project-name')
d.add_call_method('/script/start-url', :starturl=)
d.add_call_param('/script/start-url')

d.add_object_create('/script/test-case', TestCase)
d.add_set_properties('/script/test-case')
d.add_link('/script/test-case') { |sc,tc| sc.testcases << tc }

d.add_object_create('/script/test-case/test-step', TestStep)
d.add_set_properties('/script/test-case/test-step')
d.add_link('/script/test-case/test-step') { |tc,ts| tc.steps << ts }

d.add_object_create('/script/test-case/test-step/test', Test)
d.add_link('/script/test-case/test-step/test') { |ts, t| ts.tests << t }
d.add_call_method('/script/test-case/test-step/test/interaction', :interaction=)
d.add_call_param('/script/test-case/test-step/test/interaction')

d.add_object_create('/script/test-case/test-step/check', Check)
d.add_link('/script/test-case/test-step/test') { |ts, t| ts.checks << t }

d.add_object_create('*/element', Element)
d.add_link('*/element') { |p, ele| p.elements << ele }

d.add_call_method('*/element/name', :name=)
d.add_call_param('*/element/name')
d.add_call_method('*/element/type', :type=)
d.add_call_param('*/element/type')

script = d.parse_file('watir.xml')

pp script
__END__

This outputs (with the data you posted, with some mismatched close tags
fixed up):

#<Script:0xb7e8d64c
@name="My Project",
@starturl="http://localhost:3000/",
@testcases=
  [#<TestCase:0xb7edcaec
    @id="1",
    @steps=
     [#<TestStep:0xb7edae90
       @checks=
        [#<Test:0xb7ed9978
          @elements=[#<Element:0xb7ed7dbc @name="button 1", @type="button">],
          @interaction="double click">],
       @id="1",
       @tests=
        [#<Test:0xb7ed9978
          @elements=[#<Element:0xb7ed7dbc @name="button 1", @type="button">],
          @interaction="double click">]>,
      #<TestStep:0xb7e63ff0 @checks=[], @id="2", @tests=[]>]>,
   #<TestCase:0xb7e63a14 @id="2", @steps=[]>]>

Which I think is what you're after?

[1]: http://digestr.rubyforge.org/
(*): If you can't/won't install native extensions, DigestR's API is
intended to be mostly compatible with an older, REXML-based (IIRC)
digester at http://rubyforge.org/projects/xmldigester


--
Ross Bamford - rosco@roscopeco.REMOVE.co.uk

Henrik_Martensson · 24 May 2006 06:11

You might find a treewalker useful:

module XmlUtil

  class TreeWalker

    def initialize(strategy)
      @strategy = strategy
    end

    def walk(node)
      @strategy.execute_before(node) if @strategy.respond_to?(:execute_before)
      if node.instance_of?(REXML::Document)
        walk(node.root)
      elsif node.instance_of?(REXML::Element)
        node.children.each { |child|
          walk(child)
        }
      end
      @strategy.execute_after(node) if @strategy.respond_to?(:execute_after)
    end
  end
end

The treewalker will walk the XML document, calling the execute_before
and execute_after methods of a strategy object.

You also need a strategy object. The strategy object looks something
like this:

class MyStrategy

  def execute_before(node)
    # Process start tags
    case node
    when REXML::Document
      # Do nothing with Document nodes.
      # Necessary because Document inherits Element
    when REXML::Element
      # Do something with the element
    end
  end

  def execute_after(node)
    # Process end tags
  end
end
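
To make the walker and strategy concrete, here is a small self-contained sketch. The walker is repeated from above (without the XmlUtil module wrapper) so the example runs on its own; OutlineStrategy is a hypothetical strategy invented for illustration that records an indented outline of element names.

```ruby
require 'rexml/document'

# Same walker as above, repeated so the example is self-contained.
class TreeWalker
  def initialize(strategy)
    @strategy = strategy
  end

  def walk(node)
    @strategy.execute_before(node) if @strategy.respond_to?(:execute_before)
    if node.instance_of?(REXML::Document)
      walk(node.root)
    elsif node.instance_of?(REXML::Element)
      node.children.each { |child| walk(child) }
    end
    @strategy.execute_after(node) if @strategy.respond_to?(:execute_after)
  end
end

# Hypothetical strategy: collects element names, indented by depth.
class OutlineStrategy
  attr_reader :lines

  def initialize
    @lines = []
    @depth = 0
  end

  # Called on the way down; only plain elements are recorded.
  def execute_before(node)
    return unless node.instance_of?(REXML::Element)

    @lines << (('  ' * @depth) + node.name)
    @depth += 1
  end

  # Called on the way back up; undo the indentation for elements.
  def execute_after(node)
    @depth -= 1 if node.instance_of?(REXML::Element)
  end
end

doc = REXML::Document.new(
  '<script><test-case id="1"><test-step id="1"/></test-case></script>'
)
strategy = OutlineStrategy.new
TreeWalker.new(strategy).walk(doc)
puts strategy.lines
```

A strategy that builds domain objects could follow the same pattern, pushing a new object onto a stack in execute_before and attaching it to its parent in execute_after.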

If the treewalker does not suit your needs, a node iterator (Java Xerces
style) might do the trick. Let me know if you need one. I've got working
code, but the implementation could be more elegant. (One of my first
Ruby classes.)

/Henrik

--
http://kallokain.blogspot.com/ - Blogging from the trenches of software
development
http://www.henrikmartensson.org/ - Reflections on software development
http://tocsim.rubyforge.com/ - Process simulation
http://testunitxml.rubyforge.org/ - XML test framework
http://declan.rubyforge.org/ - Declarative XML processing

James_Britt4 · 20 May 2006 03:14

Brian Cowdery wrote:
> I can't figure out how to loop through each level of the REXML document to extract the data needed to build the complete object.

Have you looked at REXML's pull parser?
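
A rough sketch of the pull-parser approach, assuming a cut-down version of the posted document (the sample XML and the steps_by_case hash are invented for illustration). The parser hands over one event at a time, so the code decides what to keep as it goes:

```ruby
require 'rexml/parsers/pullparser'

# Cut-down sample in the shape of the posted XML.
xml = '<script><test-case id="1">' \
      '<test-step id="1"/><test-step id="2"/>' \
      '</test-case></script>'

parser = REXML::Parsers::PullParser.new(xml)
steps_by_case = {}
current_case = nil

while parser.has_next?
  event = parser.pull
  next unless event.start_element?

  # For a start-element event, index 0 is the tag name and
  # index 1 is a hash of attribute names to values.
  name = event[0]
  attributes = event[1]

  case name
  when 'test-case'
    current_case = attributes['id']
    steps_by_case[current_case] = []
  when 'test-step'
    steps_by_case[current_case] << attributes['id']
  end
end

p steps_by_case
```

The trade-off is that the code has to track its own context (here, current_case) instead of relying on the tree structure.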

Matthew Desmarais wrote:
> Have you given any thought to using YAML instead of XML?

Why not just use Ruby to describe the data?
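
One possible reading of that suggestion, sketched with invented names: the test script is written directly as Ruby data, so there is nothing to parse. The Struct definitions and the SCRIPT constant below are hypothetical and only mirror the posted XML.

```ruby
# Hypothetical plain-Ruby description of the same test script.
Target = Struct.new(:name, :type)
Step   = Struct.new(:id, :interaction, :target, :check)
Case   = Struct.new(:id, :steps)

SCRIPT = {
  project_name: 'My Project',
  start_url: 'http://localhost:3000/',
  test_cases: [
    Case.new(1, [
      Step.new(1, 'double click',
               Target.new('button 1', 'button'),
               Target.new('Page title', 'text'))
    ])
  ]
}.freeze

# The data is already a set of objects, so it can be walked directly.
SCRIPT[:test_cases].each do |test_case|
  test_case.steps.each do |step|
    puts "#{test_case.id}.#{step.id}: #{step.interaction} #{step.target.name}, " \
         "then check #{step.check.name}"
  end
end
```

The cost is that the test definitions are then only readable by Ruby, which is the language-independence point raised elsewhere in the thread.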

···

--
James Britt

http://www.ruby-doc.org - Ruby Help & Documentation
Ruby Code & Style - The Journal By & For Rubyists
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.30secondrule.com - Building Better Tools

Ross_Bamford4 · 20 May 2006 12:56

Oops, small bugfix:

d.add_call_method('/script/test-case/test-step/test/interaction', :interaction=)
d.add_call_param('/script/test-case/test-step/test/interaction')

d.add_object_create('/script/test-case/test-step/check', Check)

- d.add_link('/script/test-case/test-step/test') { |ts, t| ts.checks << t }
+ d.add_link('/script/test-case/test-step/check') { |ts, t| ts.checks << t }

--
Ross Bamford - rosco@roscopeco.REMOVE.co.uk

Matthew_Desmarais · 20 May 2006 04:41

YAML buys you a small amount of language independence. I've chosen YAML
before because I like how well it plays with Ruby. I've been _able_ to
choose YAML because of how well it plays with other languages.

···

On Sat, 2006-05-20 at 12:14 +0900, James Britt wrote:

Matthew Desmarais wrote:
> Have you given any thought to using YAML instead of XML?
>
Why not just use Ruby to describe the data?

James_Britt4 · 20 May 2006 05:19

Matthew Desmarais wrote:
> James Britt wrote:
>> Why not just use Ruby to describe the data?
>
> YAML buys you a small amount of language independence. I've chosen YAML
> before because I like how well it plays with Ruby. I've been _able_ to
> choose YAML because of how well it plays with other languages.

Perhaps, though more and more I run into YAML files with custom object-specific serializations (e.g. the YAML files used in Ruby's ri system); XML tends to do better on that count, with far less coupling of data and types.

Still, if one is using WATIR, then I suspect that cross-language configuration is not a concern. (And if it becomes a requirement, then the Ruby used to define the tests can be exported as XML or YAML or whatever works best.)
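
The coupling of data and types mentioned above can be seen in a small sketch. The Element class here is a stand-in invented for illustration; the point is the Ruby-specific tag that YAML.dump writes for a custom object, which a plain-data loader then refuses.

```ruby
require 'yaml'

# Stand-in class for illustration only.
class Element
  attr_accessor :name, :type
end

element = Element.new
element.name = 'button 1'
element.type = 'button'

# Dumping a custom object embeds its class name in the document.
dumped = YAML.dump(element)
puts dumped

# A reader that only accepts plain data rejects the Ruby-specific tag.
begin
  YAML.safe_load(dumped)
  rejected = false
rescue Psych::DisallowedClass
  rejected = true
end
puts "safe_load rejected it: #{rejected}"

# Dumping plain hashes instead keeps class names out of the file.
plain = YAML.dump({ 'name' => element.name, 'type' => element.type })
puts plain
```

Sticking to hashes, arrays, and scalars keeps a YAML file readable from other languages, at the cost of rebuilding the objects by hand after loading.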

--
James Britt

http://www.ruby-doc.org - Ruby Help & Documentation
Ruby Code & Style - The Journal By & For Rubyists
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.30secondrule.com - Building Better Tools

James_Edward_Gray_II · 21 May 2006 00:26

Right, and if it's something I need to hand edit, I find my brain can remember XML syntax more easily than YAML's myriad choices.

James Edward Gray II
