Comparing xml

a11 · 23 March 2009 19:06

this is a rough start

https://gist.github.com/ahoward/83721

gistfile1.rb

# comparing xml is always a b-i-a-t-c-h in any testing environment.  here is a
# little snippet for ruby that, i think, is a good first pass at making it
# easier.  comment with your improvements please!
#
#
  require 'rubygems'
  require 'xmlsimple'

  def xml_cmp a, b
    eq_all_but_zero = Object.new.instance_eval do

This file has been truncated.

care to improve?

kind regards.

a @ http://codeforpeople.com/

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Phlip1 · 23 March 2009 19:22

ara.t.howard wrote:

this is a rough start

83721’s gists · GitHub

care to improve?

Use LibXML-Ruby, Nokogiri, or REXML, read both documents, and convert them to DOMs.

Recursively compare each node, and all its children, to the matching node in the other document, and fault if anything's out of tolerance.
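A minimal sketch of that recursive walk, assuming REXML (the helper names `attributes_of`, `text_of`, `same_element?` and `same_xml?` are made up for illustration and are not taken from the gist). It compares element names, attributes as hashes, stripped text, and then child elements pairwise in document order:

```ruby
require 'rexml/document'

# Hypothetical helper: an element's attributes as a plain Hash,
# so that attribute order does not affect equality.
def attributes_of(element)
  hash = {}
  element.attributes.each_attribute { |attr| hash[attr.expanded_name] = attr.value }
  hash
end

# Hypothetical helper: all direct text of an element, joined and stripped,
# so indentation between child elements is ignored.
def text_of(element)
  element.texts.map(&:value).join.strip
end

# Recursively compare two REXML elements and all of their child elements.
def same_element?(a, b)
  return false unless a.expanded_name == b.expanded_name
  return false unless attributes_of(a) == attributes_of(b)
  return false unless text_of(a) == text_of(b)

  kids_a = a.elements.to_a
  kids_b = b.elements.to_a
  return false unless kids_a.size == kids_b.size

  kids_a.zip(kids_b).all? { |x, y| same_element?(x, y) }
end

# Parse both strings and compare their root elements.
def same_xml?(xml_a, xml_b)
  same_element?(REXML::Document.new(xml_a).root, REXML::Document.new(xml_b).root)
end
```

With this sketch, `same_xml?('<a x="1" y="2"><b>hi</b></a>', "<a y='2' x='1'>\n  <b>hi</b>\n</a>")` is true, while changing `hi` to `ho` on one side makes it false. Child order is treated as significant here; that is a choice of this sketch, not something the thread settles.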

a11 · 23 March 2009 19:27

err - that is precisely what that code is doing?

a @ http://codeforpeople.com/

Phlip1 · 23 March 2009 21:42

ara.t.howard wrote:

Recursively compare each node, and all its children, to the matching node in the other document, and fault if anything's out of tolerance.

err - that is precisely what that code is doing?

I should have explored that. Isn't the code simply printing out both XMLs, with consistent blanks and indenting, and then comparing their strings for pure equality?

If so, would that break over details like attributes out of order?
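To make the attribute-order concern concrete, here is a small sketch using REXML. The sample documents and the helper names `attribute_hash` and `raw_strings_equal?` are made up for illustration. Two documents that differ only in attribute order are unequal as raw strings, but equal once the attributes are collected into a Hash:

```ruby
require 'rexml/document'

# Hypothetical helper: the root element's attributes as a plain Hash.
# Ruby's Hash#== ignores key order, which is the property we want here.
def attribute_hash(xml)
  hash = {}
  REXML::Document.new(xml).root.attributes.each_attribute do |attr|
    hash[attr.expanded_name] = attr.value
  end
  hash
end

# Naive comparison: plain string equality on the source text.
def raw_strings_equal?(xml_a, xml_b)
  xml_a == xml_b
end
```

For `<point x="1" y="2"/>` and `<point y="2" x="1"/>`, `raw_strings_equal?` is false while the two `attribute_hash` results compare equal. Whether a re-serialized string comparison trips over this depends on how the serializer orders attributes, so it is worth testing against the library version in use.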

a11 · 23 March 2009 22:42

I should have explored that. Isn't the code simply printing out both XMLs, with consistent blanks and indenting, and then comparing their strings for pure equality?

it *is* comparing strings, but strings built up inside rexml using the approach you outlined.

If so, would that break over details like attributes out of order?

ah - good catch - i'll check on that. my alternate approach, comparing xmlsimple data structures will not, i believe, suffer from that, but i wanted to avoid a dependency.

i'll check and report back

cheers.

a @ http://codeforpeople.com/

Phlip1 · 23 March 2009 23:07

If so, would that break over details like attributes out of order?

ah - good catch - i'll check on that. my alternate approach, comparing xmlsimple data structures will not, i believe, suffer from that, but i wanted to avoid a dependency.

I caught it because I just recently solved a subset of your problem. assert_xhtml uses Nokogiri to match a subset of HTML within a page. The code is too weird for you to use, but I indeed had to defeat all the issues you will encounter!

a11 · 23 March 2009 23:13

next version is up

83721’s gists · GitHub

a @ http://codeforpeople.com/

Phlip1 · 24 March 2009 00:26

ara.t.howard wrote:

next version is up

83721’s gists · GitHub

2kewt. Now you are using XmlSimple.==, so it will walk the object model for you, recursively. It takes care of the attribute order issue, and you then only need to tell XmlSimple to normalize blanks.

What about excess spaces in attributes? And what about class='foo bar' matching class='bar foo'? (Feel free to ignore them!..)
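Both questions can be handled by normalizing attribute values before comparing them. A rough sketch in plain Ruby, where `normalize_attribute` is a hypothetical helper and not part of the gist or of XmlSimple: it collapses runs of whitespace in any attribute, and for `class` it also sorts the tokens so their order stops mattering.

```ruby
# Hypothetical helper: canonical form of an attribute value.
# - every attribute: leading/trailing blanks dropped, inner runs collapsed
# - class attribute only: tokens sorted, so 'foo bar' and 'bar foo' match
def normalize_attribute(name, value)
  tokens = value.split          # splits on any run of whitespace
  tokens = tokens.sort if name == 'class'
  tokens.join(' ')
end
```

With this, `normalize_attribute('class', ' foo  bar ')` and `normalize_attribute('class', 'bar foo')` both give `'bar foo'`, while a `title` of `'  hello   world '` becomes `'hello world'` with its word order left alone. Collapsing inner whitespace is an assumption that suits token-like attributes; values where blanks are significant would need to be exempted.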

a11 · 24 March 2009 00:31

latest version handles the first and i'm ok with the latter. feeling like this is reasonably complete now. crazy none of the ruby xml libs offer a good doc==other.

cheers.

a @ http://codeforpeople.com/

Phlip1 · 24 March 2009 02:26

ara.t.howard wrote:

latest version handles the first and i'm ok with the latter. feeling like this is reasonably complete now. crazy none of the ruby xml libs offer a good doc==other.

That is _supposed_ to be XSLT's space. Don't hold your breath. I'm beginning to suspect XSLT just might be a mission-statement without a company for it to guide! (-:

a11 · 24 March 2009 02:49

that is the one with 'bacon' isn't it?

a @ http://codeforpeople.com/
