Hpricot test for equivalence of two xml segments?

I'm looking through what documentation I can find for Hpricot (Nokogiri wouldn't install for me, and I just want a quick and simple solution), and I cannot find a simple method to take two XML strings and find out if they are equivalent. I'm getting a bunch of XHTML back from our rendering agent with random permutations of attributes inside the tags, and I want a quick and easy Ruby way to find out if segments are equivalent without writing my own regex-based parser...???

It seems like there should be a simple method for this. If I had written Hpricot, equivalence of segments would have been the first method I would have written...???

xc

···

--
"It's the preponderance, stupid!" - Professor Stephen Schneider, IPCC member

I can think of a few definitions for equivalence. One definition would
simply require unifying the case of both strings and checking if they are
the same. A second definition would require building a tree of the structure
in each string, including attributes, sorting it, and looping over them to
check if they contain the same elements (Nokogiri's XML::NodeSet does
something like this with ==). A third definition would build on the second
one, while treating certain tags as equivalent to other tags (for example q
is equivalent to blockquote).
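
As a rough illustration of the second definition, here is a minimal sketch using REXML from Ruby's standard library rather than Hpricot or Nokogiri. The helper names `canonical_form` and `equivalent?` are made up for this example, and it assumes each string is well-formed XML with a single root element. Attributes are sorted so their order is ignored; element order and text (including whitespace) still have to match exactly.

```ruby
require 'rexml/document'

# Reduce a node to a nested array: [name, sorted attributes, children].
# Sorting the attribute pairs discards their original order.
# Node types other than elements and text (comments etc.) are ignored.
def canonical_form(node)
  case node
  when REXML::Element
    attrs = []
    node.attributes.each_attribute { |a| attrs << [a.expanded_name, a.value] }
    children = node.children.map { |child| canonical_form(child) }.compact
    [node.expanded_name, attrs.sort, children]
  when REXML::Text
    node.value
  end
end

# Hypothetical helper: true when both strings parse to the same canonical form.
def equivalent?(xml_a, xml_b)
  canonical_form(REXML::Document.new(xml_a).root) ==
    canonical_form(REXML::Document.new(xml_b).root)
end

a = '<li><img src="/my/image/path/thingy.jpg" alt="alt text" />My Text</li>'
b = '<li><img alt="alt text" src="/my/image/path/thingy.jpg" />My Text</li>'
puts equivalent?(a, b)
```

For fragments with several top-level elements, wrapping both strings in a dummy root element before parsing is probably the simplest workaround.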

What's *your* definition of equivalence for two xml documents or fragments?

Ammar

···

On Sat, Jul 17, 2010 at 12:28 AM, Xeno Campanoli / Eskimo North and Gmail < xeno.campanoli@gmail.com> wrote:

The only thing I am concerned about is permutations of attributes inside the tags. Everything else I'm seeing is regular. Is there something where I can parse all the tags in a segment and tell if they are equivalent and just have the attributes in different orders? I'm not even concerned about different tag forms. We don't see that. A typical example is:

< <li><img src="/my/image/path/thingy.jpg" alt="alt text" />My Text</li>
> <li><img alt="alt text" src="/my/image/path/thingy.jpg" />My Text</li>

I need to have something that can help me judge such things as equivalent. Again, I NEVER see tag permutations, but just attribute permutations.
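
If those `<` and `>` lines come from diffing the rendered output, one option is to normalize both sides before diffing so that attribute order no longer shows up as a difference. The following is only a sketch under that assumption: it uses REXML from the standard library, `normalize_xml` and `serialize` are hypothetical names, and it handles elements and text only (comments and processing instructions are dropped).

```ruby
require 'rexml/document'

# Serialize a node with its attributes in alphabetical order.
# Elements and text are kept; any other node type is dropped.
def serialize(node)
  case node
  when REXML::Element
    attrs = []
    node.attributes.each_attribute { |a| attrs << a }
    attr_text = attrs.sort_by(&:expanded_name).map do |a|
      value = a.to_s.gsub('"', '&quot;') # keep the output safely double-quoted
      %( #{a.expanded_name}="#{value}")
    end.join
    name  = node.expanded_name
    inner = node.children.map { |child| serialize(child) }.join
    if inner.empty?
      "<#{name}#{attr_text} />"
    else
      "<#{name}#{attr_text}>#{inner}</#{name}>"
    end
  when REXML::Text
    node.to_s
  else
    ''
  end
end

# Hypothetical entry point: parse a single-rooted string and re-serialize it.
def normalize_xml(xml)
  serialize(REXML::Document.new(xml).root)
end

left  = '<li><img src="/my/image/path/thingy.jpg" alt="alt text" />My Text</li>'
right = '<li><img alt="alt text" src="/my/image/path/thingy.jpg" />My Text</li>'
puts normalize_xml(left) == normalize_xml(right)
```

Writing the normalized strings to files and diffing those should then report only real differences, at the cost of losing the original attribute order and quoting style.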

Thank you for your response.

Sincerely, Xeno

···

On 10-07-16 03:16 PM, Ammar Ali wrote:


You should take a look at Lorax:

GitHub - flavorjones/lorax: XML/HTML diff generator, based on Nokogiri.

which is Nokogiri-based.

Your definition of equivalence (the semantically correct one, imho) can be
tested with:

    Lorax::Signature.new(Nokogiri::XML(string1).root).signature ==
      Lorax::Signature.new(Nokogiri::XML(string2).root).signature

And note that Nokogiri will also allow you to parse XML fragments.

HTH,
-m

···

On Fri, Jul 16, 2010 at 6:52 PM, Xeno Campanoli / Eskimo North and Gmail < xeno.campanoli@gmail.com> wrote:

I believe you. Nokogiri wouldn't install though...yes, and nor did Lorax...

Looks like there's an install site, but I hesitate to use something this outside the mainstream on a project like this. I don't want to impose needless maintenance problems on my environment.

···


>I believe you. Nokogiri wouldn't install though...yes, and nor did Lorax...

Do you mind emailing our list with the problems? We do our best to make
sure that nokogiri works on most systems, so if you're having trouble
we'd love to hear about it:

  http://groups.google.com/group/nokogiri-talk

>Looks like there's an install site, but I hesitate to use something
>this outside the mainstream on a project like this. I don't want to
>impose needless maintenance problems on my environment.

I'm not sure that nokogiri is outside the mainstream. Take a look at
our gem downloads vs the hpricot gem downloads:

  nokogiri | RubyGems.org | your community gem host
  hpricot | RubyGems.org | your community gem host

Or the frequency of commits:

  http://github.com/tenderlove/nokogiri/commits/master
  Commits · hpricot/hpricot · GitHub

Or the mailing list activity:

  http://groups.google.com/group/nokogiri-talk
  http://librelist.com/browser/hpricot/

But "mainstream" is a judgement for you to make! :)

···

On Sat, Jul 17, 2010 at 09:03:21AM +0900, Xeno Campanoli / Eskimo North and Gmail wrote:

--
Aaron Patterson
http://tenderlovemaking.com/