Comparing hashes based on their keys

Hi list,
     let's say we want to write a method that compares two hashes for equality based on their keys.

Each key is a symbol, and the corrisponding value can either be a string or another hash with this same properties.

Two hashes, let's call them h1 and h2, are to be considered equal if:

* they have the same keys

* h1[:x] and h2[:x] are both hashes, and they are equal according to the very same rules you are reading :wink:

* they are both nil

In short, we only care about the values when they are hashes.
If they are strings, we don't care whether they are equal or not.

To make an example, when fed these two hashes the method should return true:

h1 = {
   :one => "one",
   :two => "two",
   :three => {
     :alpha => "alpha",
     :bravo => "bravo",
     :charlie => "charlie"
   }
}

h2 = {
   :one => "whatever",
   :two => "hey",
   :three => {
     :alpha => "zulu",
     :bravo => "oscar",
     :charlie => "brown"
   }
}

When fed these other two arrays, the method should return false:

h3 = h1

h4 = {
   :one => "one",
   :two => "two",
   :three => {
     :alpha => "alpha",
     :bravo => "bravo",
   }
}

The difference is that Ole :charlie is missing in h2.
The values don't change to make it clear that we don't care about them.

I came up with the following implementation that seems to work (at least according to the specs I wrote), but what I would like to ask is if it could be any simpler/more idiomatic/more elegant.

Corner cases that one might spot are also good to know about.

def compare(hash1, hash2)
   args = [hash1, hash2]

   return true if args.all? {|h| h.nil?}
   return false if args.one? {|h| h.nil?}

   hash1.each_key do |k|
     values = [hash1[k], hash2[k]]

     if values.all? {|h| h.is_a?(Hash)}
       return compare(*values)
     else
       return false if values.one? {|value| value.nil? }
     end
   end

   true
end

Should the code turn out to be unreadable, I posted it here[0] too.

[0] = https://gist.github.com/1084916

Thanks in advance.

···

--
Stefano

Hi list,
let's say we want to write a method that compares two hashes for equality
based on their keys.

Each key is a symbol, and the corrisponding value can either be a string or
another hash with this same properties.

Two hashes, let's call them h1 and h2, are to be considered equal if:

* they have the same keys

* h1[:x] and h2[:x] are both hashes, and they are equal according to the
very same rules you are reading :wink:

* they are both nil

It seems your spec is not very precise because your bullet list mixes
AND and OR. I assume you meant

Two objects o1 and o2 are considered equal for this relation if and only if

( o1 is nil AND o2 is nil ) OR
( o1 is a Hash AND o2 is a Hash AND
  o1.keys == o2.keys AND
  for each key k ( o1[k] is not a Hash OR
                          o2[k] is not a Hash OR
                          o1[k] equals o2[k] according to this relation ) )

In short, we only care about the values when they are hashes.
If they are strings, we don't care whether they are equal or not.

To make an example, when fed these two hashes the method should return true:

h1 = {
:one => "one",
:two => "two",
:three => {
:alpha => "alpha",
:bravo => "bravo",
:charlie => "charlie"
}
}

h2 = {
:one => "whatever",
:two => "hey",
:three => {
:alpha => "zulu",
:bravo => "oscar",
:charlie => "brown"
}
}

When fed these other two arrays, the method should return false:

h3 = h1

h4 = {
:one => "one",
:two => "two",
:three => {
:alpha => "alpha",
:bravo => "bravo",
}
}

The difference is that Ole :charlie is missing in h2.
The values don't change to make it clear that we don't care about them.

I came up with the following implementation that seems to work (at least
according to the specs I wrote), but what I would like to ask is if it could
be any simpler/more idiomatic/more elegant.

Corner cases that one might spot are also good to know about.

def compare(hash1, hash2)

I would rename arguments because these need not be Hashes.

args = [hash1, hash2]

return true if args.all? {|h| h.nil?}
return false if args.one? {|h| h.nil?}

hash1.each_key do |k|
values = [hash1[k], hash2[k]]

if values.all? {|h| h.is_a?(Hash)}
return compare(*values)
else
return false if values.one? {|value| value.nil? }
end
end

true
end

Hm, I'd probably remove #all?, #any? and the like and code conditions
directly - especially since you are always dealing with two items
because it's likely faster. Your code creates a lot of temporary
Arrays along the way.

Btw, I believe your implementation misses the case where hash2's key
set is a superset of hash1's, i.e. contains additional keys.

Kind regards

robert

···

On Fri, Jul 15, 2011 at 5:38 PM, Stefano Mioli <stefano.mioli@gmail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

What would be especially cool is if you could post your specs, then it would
be a lot easier to try and refactor it.

···

On Fri, Jul 15, 2011 at 10:38 AM, Stefano Mioli <stefano.mioli@gmail.com>wrote:

Should the code turn out to be unreadable, I posted it here[0] too.

[0] = https://gist.github.com/**1084916 <https://gist.github.com/1084916&gt;

Looks like you've probably got a bug. You explicitly return in your code,
causing it to stop evaluating after the first hash:

def compare(hash1, hash2)
args = [hash1, hash2]

return true if args.all? {|h| h.nil?}
return false if args.one? {|h| h.nil?}

hash1.each_key do |k|
   values = [hash1[k], hash2[k]]

   if values.all? {|h| h.is_a?(Hash)}
     return compare(*values)
   else
     return false if values.one? {|value| value.nil? }
   end
end

true
end

h1 = {
:one => "one",
:two => { :alpha => "alpha" },
:three => { :alpha => "alpha" },
}

h2 = h1.merge :three => { :bravo => 'bravo' }

h1 # => {:one=>"one", :two=>{:alpha=>"alpha"}, :three=>{:alpha=>"alpha"}}
h2 # => {:one=>"one", :two=>{:alpha=>"alpha"}, :three=>{:bravo=>"bravo"}}
compare h1, h2 # => true
RUBY_VERSION # => "1.9.2"

···

On Fri, Jul 15, 2011 at 10:38 AM, Stefano Mioli <stefano.mioli@gmail.com>wrote:

Hi list,
   let's say we want to write a method that compares two hashes for
equality based on their keys.

Each key is a symbol, and the corrisponding value can either be a string or
another hash with this same properties.

Two hashes, let's call them h1 and h2, are to be considered equal if:

* they have the same keys

* h1[:x] and h2[:x] are both hashes, and they are equal according to the
very same rules you are reading :wink:

* they are both nil

In short, we only care about the values when they are hashes.
If they are strings, we don't care whether they are equal or not.

To make an example, when fed these two hashes the method should return
true:

h1 = {
:one => "one",
:two => "two",
:three => {
   :alpha => "alpha",
   :bravo => "bravo",
   :charlie => "charlie"
}
}

h2 = {
:one => "whatever",
:two => "hey",
:three => {
   :alpha => "zulu",
   :bravo => "oscar",
   :charlie => "brown"
}
}

When fed these other two arrays, the method should return false:

h3 = h1

h4 = {
:one => "one",
:two => "two",
:three => {
   :alpha => "alpha",
   :bravo => "bravo",
}
}

The difference is that Ole :charlie is missing in h2.
The values don't change to make it clear that we don't care about them.

I came up with the following implementation that seems to work (at least
according to the specs I wrote), but what I would like to ask is if it could
be any simpler/more idiomatic/more elegant.

Corner cases that one might spot are also good to know about.

def compare(hash1, hash2)
args = [hash1, hash2]

return true if args.all? {|h| h.nil?}
return false if args.one? {|h| h.nil?}

hash1.each_key do |k|
   values = [hash1[k], hash2[k]]

   if values.all? {|h| h.is_a?(Hash)}
     return compare(*values)
   else
     return false if values.one? {|value| value.nil? }
   end
end

true
end

Should the code turn out to be unreadable, I posted it here[0] too.

[0] = https://gist.github.com/**1084916 <https://gist.github.com/1084916&gt;

Thanks in advance.

--
Stefano

Hi list,
   let's say we want to write a method that compares two hashes for equality
based on their keys.

Each key is a symbol, and the corrisponding value can either be a string or
another hash with this same properties.

Two hashes, let's call them h1 and h2, are to be considered equal if:

* they have the same keys

* h1[:x] and h2[:x] are both hashes, and they are equal according to the
very same rules you are reading :wink:

* they are both nil

It seems your spec is not very precise because your bullet list mixes
AND and OR. I assume you meant

Two objects o1 and o2 are considered equal for this relation if and only if

( o1 is nil AND o2 is nil ) OR
( o1 is a Hash AND o2 is a Hash AND
o1.keys == o2.keys AND
for each key k ( o1[k] is not a Hash OR
                         o2[k] is not a Hash OR
                         o1[k] equals o2[k] according to this relation ) )

I don't see why you couldn't just add your own method to Hash. (Tested on ruby-1.9.2)

class Hash
   def same_key_structure?(other)
     return false unless other.is_a?(Hash)
     return false unless self.keys == other.keys && self.keys.all?{|_|_.is_a?(Symbol)}
     self.keys.all? {|_|self[_].is_a?(Hash) ? self[_].same_key_structure?(other[_]) : !other[_].is_a?(Hash)}
   end
end

If you really cared about the nil case, you could do:

class NilClass
   def same_key_structure?(other)
     other.nil?
   end
end

But that's probably a stretch and only *you* can decided if it's right for the situation.

-Rob

P.S. I'm assuming that you don't consider supersets of keys to be "equal" and that you really do care that all the keys are symbols.

In short, we only care about the values when they are hashes.
If they are strings, we don't care whether they are equal or not.

To make an example, when fed these two hashes the method should return true:

h1 = {
:one => "one",
:two => "two",
:three => {
   :alpha => "alpha",
   :bravo => "bravo",
   :charlie => "charlie"
}
}

h2 = {
:one => "whatever",
:two => "hey",
:three => {
   :alpha => "zulu",
   :bravo => "oscar",
   :charlie => "brown"
}
}

When fed these other two arrays, the method should return false:

h3 = h1

h4 = {
:one => "one",
:two => "two",
:three => {
   :alpha => "alpha",
   :bravo => "bravo",
}
}

The difference is that Ole :charlie is missing in h2.
The values don't change to make it clear that we don't care about them.

I came up with the following implementation that seems to work (at least
according to the specs I wrote), but what I would like to ask is if it could
be any simpler/more idiomatic/more elegant.

Corner cases that one might spot are also good to know about.

def compare(hash1, hash2)

I would rename arguments because these need not be Hashes.

args = [hash1, hash2]

return true if args.all? {|h| h.nil?}
return false if args.one? {|h| h.nil?}

hash1.each_key do |k|
   values = [hash1[k], hash2[k]]

   if values.all? {|h| h.is_a?(Hash)}
     return compare(*values)
   else
     return false if values.one? {|value| value.nil? }
   end
end

true
end

Hm, I'd probably remove #all?, #any? and the like and code conditions
directly - especially since you are always dealing with two items
because it's likely faster. Your code creates a lot of temporary
Arrays along the way.

Btw, I believe your implementation misses the case where hash2's key
set is a superset of hash1's, i.e. contains additional keys.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

···

On Jul 15, 2011, at 12:04 PM, Robert Klemme wrote:

On Fri, Jul 15, 2011 at 5:38 PM, Stefano Mioli <stefano.mioli@gmail.com > > wrote:

Here are the specs:

  hash comparison spec · GitHub

And here is a slightly modified version of the code that adds the method to Hash, as suggested by Rob Biederhan.

  Comparing hashes based on their keys · GitHub

@Rob Biederhan: I removed the check that ensures that all the keys are symbols, because I know they will be.

···

On 7/15/11 7:21 PM, Josh Cheek wrote:

What would be especially cool is if you could post your specs, then it would
be a lot easier to try and refactor it.

--
Stefano Mioli

Two hashes, let's call them h1 and h2, are to be considered equal if:

* they have the same keys

* h1[:x] and h2[:x] are both hashes, and they are equal according to the
very same rules you are reading :wink:

* they are both nil

It seems your spec is not very precise because your bullet list mixes
AND and OR. I assume you meant

Two objects o1 and o2 are considered equal for this relation if and only if

( o1 is nil AND o2 is nil ) OR
( o1 is a Hash AND o2 is a Hash AND
   o1.keys == o2.keys AND
   for each key k ( o1[k] is not a Hash OR
                           o2[k] is not a Hash OR
                           o1[k] equals o2[k] according to this relation ) )

Exactly.
Sorry about the confusion.

def compare(hash1, hash2)

I would rename arguments because these need not be Hashes.

Fair enough.

  args = [hash1, hash2]

  return true if args.all? {|h| h.nil?}
  return false if args.one? {|h| h.nil?}

  hash1.each_key do |k|
    values = [hash1[k], hash2[k]]

    if values.all? {|h| h.is_a?(Hash)}
      return compare(*values)
    else
      return false if values.one? {|value| value.nil? }
    end
  end

  true
end

Hm, I'd probably remove #all?, #any? and the like and code conditions
directly - especially since you are always dealing with two items
because it's likely faster. Your code creates a lot of temporary
Arrays along the way.

True.
On the other hand, the hashes will be fairly small, so performance wasn't the main goal.

That, and I the fact that I find:

  [x, y].all? {|x| x.is_a?(Hash) }

to be a bit more nice-looking than

  if x.is_a?(Hash) && y.is_a?(Hash)

But you make a very good point, thanks.

Btw, I believe your implementation misses the case where hash2's key
set is a superset of hash1's, i.e. contains additional keys.

True. I posted again the link to the gist with a revised version according to Rob's suggestions.

Thank you very much, Robert.
Ah, there are so many things to learn.

···

On 7/15/11 6:04 PM, Robert Klemme wrote:

--
Stefano Mioli

[snip]

Very true.
This should work better. It still returns explicitly, but only if compare() itself returns false.

def compare(hash1, hash2)
  args = [hash1, hash2]

  return true if args.all? {|h| h.nil?}
  return false if args.one? {|h| h.nil?}

  hash1.each_key do |k|
    values = [hash1[k], hash2[k]]

    if values.all? {|h| h.is_a?(Hash)}
      return false unless compare(*values)
    else
      return false if values.one? {|value| value.nil? }
    end
  end

  true
end

Thank you so much.

···

On 7/15/11 7:31 PM, Josh Cheek wrote:

Looks like you've probably got a bug. You explicitly return in your code,
causing it to stop evaluating after the first hash:

--
Stefano Mioli

I don't see why you couldn't just add your own method to Hash. (Tested
on ruby-1.9.2)

[snip]

But that's probably a stretch and only *you* can decided if it's right
for the situation.

I posted in the same gist a revised version of your method (see comment below).
I ended up not monkey-patching NilClass. I will just throw an ArgumentError if the argument is nil.

P.S. I'm assuming that you don't consider supersets of keys to be
"equal" and that you really do care that all the keys are symbols.

You're right about the supersets: if hash x as a key k and hash y misses that key, the two hashes are different.

About the symbols: I don't really care, because I know they will be symbols.

Thank you very much.

···

On 7/15/11 7:18 PM, Rob Biedenharn wrote:

--
Stefano Mioli

class Hash
   def same_key_structure?(other)
     return false unless other.is_a?(Hash)
     return false unless self.keys == other.keys && self.keys.all?{|_|_.is_a?(Symbol)}
     self.keys.all? {|_|self[_].is_a?(Hash) ? self[_].same_key_structure?(other[_]) : !other[_].is_a?(Hash)}
   end
end

Now that I think about it, that

  self.keys == other.keys

might cause problems, because I didn't specify (sorry, my bad) that I don't care about which order the keys appear in.

In other words, I consider these two hashes to be equal:

  h1 = {:x => nil, :y => nil}
  h2 = {:y => nil, :x => nil}

but h1.keys == h2.keys would return false in this context.

Array could probably benefit from something like RSpec's '=~' to test arrays for equality regardless of the order of their elements, wouldn't it?
That is, unless I'm missing some existing method that does that.

···

On 7/15/11 7:18 PM, Rob Biedenharn wrote:

--
Stefano Mioli

class Hash
def same_key_structure?(other)
   return false unless other.is_a?(Hash)
   return false unless self.keys == other.keys && self.keys.all?{|_|_.is_a?(Symbol)}
   self.keys.all? {|_|self[_].is_a?(Hash) ? self[_].same_key_structure?(other[_]) : !other[_].is_a?(Hash)}
end
end

Now that I think about it, that

  self.keys == other.keys

might cause problems, because I didn't specify (sorry, my bad) that I don't care about which order the keys appear in.

In other words, I consider these two hashes to be equal:

  h1 = {:x => nil, :y => nil}
  h2 = {:y => nil, :x => nil}

but h1.keys == h2.keys would return false in this context.

   then sort them:

   return false unless self.keys.sort == other.keys.sort

Since they are Hashes, I'd certainly hope that the order of the keys didn't matter. :wink:

I should have realized that particularly since 1.9.2's Hashes return their keys in an order dependent on their insertion to the Hash. (And I gave a talk to the Cincinnati Ruby Brigade just last month about differences between 1.8.6 and 1.9.2 that included this fact.)

-Rob

Array could probably benefit from something like RSpec's '=~' to test arrays for equality regardless of the order of their elements, wouldn't it?
That is, unless I'm missing some existing method that does that.

--
Stefano Mioli

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

···

On Jul 15, 2011, at 3:01 PM, Stefano Mioli wrote:

On 7/15/11 7:18 PM, Rob Biedenharn wrote:

Hah! True.
I must be very tired! :wink:

Thanks again.

···

On 7/15/11 9:47 PM, Rob Biedenharn wrote:

On Jul 15, 2011, at 3:01 PM, Stefano Mioli wrote:

but h1.keys == h2.keys would return false in this context.

then sort them:

return false unless self.keys.sort == other.keys.sort

Since they are Hashes, I'd certainly hope that the order of the keys
didn't matter. :wink:

--
Stefano Mioli

1st send attempt died in spamassassin...

···

---------- Forwarded message ----------
From: Robert Klemme <shortcutter@googlemail.com>
Date: Sat, Jul 16, 2011 at 12:21 AM
Subject: Re: Comparing hashes based on their keys
To: ruby-talk@ruby-lang.org

On Fri, Jul 15, 2011 at 9:47 PM, Rob Biedenharn <Rob@agileconsultingllc.com> wrote:

On Jul 15, 2011, at 3:01 PM, Stefano Mioli wrote:

On 7/15/11 7:18 PM, Rob Biedenharn wrote:

class Hash
def same_key_structure?(other)
return false unless other.is_a?(Hash)
return false unless self.keys == other.keys &&
self.keys.all?{|_|_.is_a?(Symbol)}
self.keys.all? {|_|self[_].is_a?(Hash) ?
self[_].same_key_structure?(other[_]) : !other[_].is_a?(Hash)}
end
end

Now that I think about it, that

   self\.keys == other\.keys

might cause problems, because I didn't specify (sorry, my bad) that I
don't care about which order the keys appear in.

In other words, I consider these two hashes to be equal:

   h1 = \{:x =&gt; nil, :y =&gt; nil\}
   h2 = \{:y =&gt; nil, :x =&gt; nil\}

but h1.keys == h2.keys would return false in this context.

then sort them:

return false unless self.keys.sort == other.keys.sort

There is one reason to not sort: Hash keys are not required to
implement <=>. Now, we "know" that keys will be only Symbols but what
if not? You might see an exception from #sort. Comparing via Set
would be more robust here since we know that keys fulfill the
requirements for a Set element. So that would be

return false unless keys.to_set == other.keys.to_set

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/