> > Just about every class in the standard library implements == and eql? as
> > I describe in the article, i.e. eql? tests for equal values and == tests
> > for "natural" equality (which normally means equal values).
>
> > Hash, however, is an exception. Hash#== tests for equal values.
> > Hash.eql?, however, tests for object identity:
>
> > Why is hash the odd one out? I'm sure that there must be a good reason
> > (Matz?) but I can't at the moment work out what it might be.
>
> I think the reason is twofold:
>
> 1) Using hashs as keys in another hash is not a common use case. I'm a
> little hard-pressed to think of why I'd want to, although I'm famous
> for lack of imagination.
I've wanted it on 3 occasions (that I can remember) now. Here's a
contrived example derived from the real-world use case I can no longer
remember:
You have a file like this...
alpha,beta,15
gamma,delta,3
beta,alpha,4
alpha,alpha,3
delta,alpha,5
gamma,delta,7
...and you want to sum up the numbers for each unique pair of greek
letters. Naively, I'd do (and initially tried) something like:
sums = Hash.new{ 0 }
DATA.each{ |line|
_, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
sums[ { g1=>true, g2=>true } ] += num.to_i
}
I believe I instead resorted to sorting the keys and using a nested
hash to drill down to the value. It was annoying.
Well in the above code you can change the line:
sums[ { g1=>true, g2=>true } ] += num.to_i
to
sums[[g1,g2].sort] += num.to_i
which I actually think looks cleaner
> 2) Because of the requirement that obj1.eql? obj2 => obj1.hash ==
> obj2.hash, implementing Hash#hash requires iterating over the keys and
> values and would be fairly expensive and make accessing a hash with
> hash keys by key impractical.
That logic seems slightly mothering, though. "Ruby prevents you from
doing A because if you did A it might be slow." Ruby doesn't prevent
me from writing:
my_huge_array.delete_if{ |v1|
my_huge_array.find{ |v2| (v1 - v2).abs < mu }
}
I suppose the distinction is that the above is a foolish pairing of
individually-reasonable parts, while Hash#hash is an atomic method
written to optimize speed for one (reasonably useless) use case at the
expense of allowing another use case.
I suspect that it's just pragmatism
As a related aside:
Having never written a hashing function, I'm uncertain how I'd write
Hash#hash in a way that reasonably prevented two hashes with different
keys and/or values from ending up with the same value. (Multiply
the .hash values of all keys and values in the Hash and then mod them
on a big prime number?) Has anyone taken a stab at implementing this?
That's not the requirement, the requirement is that if two objects are
eql? then their hashes must also be ==, there's nothing to prevent two
objects which aren't eql? to have the same hash value.
In fact let me offer:
require "benchmark"
DATA = <<-END
alpha,beta,15
gamma,delta,3
beta,alpha,4
alpha,alpha,3
delta,alpha,5
gamma,delta,7
END
def hash_key_impl
sums = Hash.new(0)
DATA.each{ |line|
_, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
sums[ { g1=>true, g2=>true } ] += num.to_i
}
sums
end
class HashableHash < Hash
def hash
to_a.sort.hash
end
def eql?(other)
self == other
end
end
def hashable_key_impl
sums = Hash.new(0)
DATA.each{ |line|
_, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
sums[ HashableHash[g1=>true, g2=>true ] ] += num.to_i
}
sums
end
class HashableHash2 < Hash
def hash
1
end
def eql?(other)
self == other
end
end
def hashable2_key_impl
sums = Hash.new(0)
DATA.each{ |line|
_, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
sums[ HashableHash2[g1=>true, g2=>true ] ] += num.to_i
}
sums
end
def array_key_impl
sums = Hash.new(0)
DATA.each{ |line|
_, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
sums[ [g1,g2].sort ] += num.to_i
}
sums
end
p hash_key_impl
p hashable_key_impl
p hashable2_key_impl
p array_key_impl
TESTS = 1000
Benchmark.bmbm do |results|
results.report("hash key:") do
TESTS.times do
hash_key_impl
end
end
results.report("hashable key:") do
TESTS.times do
hashable_key_impl
end
end
results.report("hashable2 key:") do
TESTS.times do
hashable2_key_impl
end
end
results.report("array key:") do
TESTS.times do
array_key_impl
end
end
end
{{"delta"=>true, "gamma"=>true}=>7, {"alpha"=>true, "delta"=>true}=>5,
{"alpha"=>true}=>3, {"alpha"=>true, "beta"=>true}=>4, {"delta"=>true,
"gamma"=>true}=>3, {"alpha"=>true, "beta"=>true}=>15}
{{"delta"=>true, "gamma"=>true}=>10, {"alpha"=>true}=>3,
{"alpha"=>true, "beta"=>true}=>19, {"alpha"=>true, "delta"=>true}=>5}
{{"alpha"=>true, "delta"=>true}=>5, {"alpha"=>true}=>3,
{"delta"=>true, "gamma"=>true}=>10, {"alpha"=>true, "beta"=>true}=>19}
{["alpha", "alpha"]=>3, ["delta", "gamma"]=>10, ["alpha", "beta"]=>19,
["alpha", "delta"]=>5}
Rehearsal --------------------------------------------------
hash key: 0.050000 0.000000 0.050000 ( 0.080213)
hashable key: 0.100000 0.000000 0.100000 ( 0.104884)
hashable2 key: 0.080000 0.000000 0.080000 ( 0.079243)
array key: 0.050000 0.000000 0.050000 ( 0.055140)
----------------------------------------- total: 0.280000sec
user system total real
hash key: 0.050000 0.000000 0.050000 ( 0.050082)
hashable key: 0.100000 0.010000 0.110000 ( 0.101206)
hashable2 key: 0.070000 0.000000 0.070000 ( 0.077322)
array key: 0.050000 0.000000 0.050000 ( 0.051378)
The original hash implementation performs almost exactly the same as
my simple substitution of sorted arrays as the keys, but it has the
distinct disadvantage of getting the wrong answer
I did two different implementations of a HashableHash which use == for
eql? and differ in that one uses information in the hash to compute a
hash value, while the other returns a constant value for hash.
Both are slower than the array key implementation. HashableHash2 is
somewhat faster than HashableHash since the hash method runs in
constant time, but I'm not sure that mapping the hash value to a
constant is a good idea.
···
On 10/17/07, Phrogz <phrogz@mac.com> wrote:
On Oct 17, 6:12 am, "Rick DeNatale" <rick.denat...@gmail.com> wrote:
> On 10/17/07, Paul Butcher <p...@texperts.com> wrote:
--
Rick DeNatale
My blog on Ruby
http://talklikeaduck.denhaven2.com/