Newbie question > Hash workings

Hi,

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

h = Hash.new([])
h[0] = [1,2,3]
p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}
h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12] ?
p h[2] #>
[20] ???
h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]} ?
h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
21, 22]} ???

Grtz. André

Hi,

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

There's no bug involved.

h = Hash.new()

Here you create Hash with an empty array as
default value. For every lookup of a key that is
not in the hash, this single array instance is
returned, but not automatically stored in the hash.
Again: It's a single object, not a new one on every
lookup.

h[0] = [1,2,3]

Here you associate 0 with an array. The hash default
value is not involved.

p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}

This is a syntactic shortcut for "h[0] = h[0] + [4]".
It looks up the value of 0 in the hash, which is
an array. array + array creates a new array,
which overwrites the previous value of h[0].

h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}

Again, this is a shortcut for "h[1] = h[1] + [10, 11]".
The hash doesn't have a value for 1 yet, so
the default value () is returned. + [10, 11] is
the new array [10, 11] which is then assigned
to h[1].

h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}

Look up the existing array stored in h[1] and append
12. No new array created.

h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12] ?

The hash doesn't have a value for 2, so the default
value, , is returned. The default value is destructively
modified by appending 20. _No new hash entry is
created (there is no assignment!)._

Future lookups of non-existent keys will return
the modified default value!

p h[2] #> [20] ???

The hash still doesn't have an entry for 2, so
the default value is returned, which by now is
our modified array.

h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]} ?

Still no entry for 2, default value is returned and modified
by appending 21.

h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20, 21, 22]} ???

Again, this is equivalent to "h[2] = h[2] + [22]".
Still no entry for h[2], so the default value is returned,
which by now is [20, 21]. Then the two arrays
[20, 21] and [22] are concatenated, returning the
new array [20, 21, 22] which is then finally associated
with 2 in the hash (see: here is an assignment!).

What you probably want to achieve can be
done using Hash's block constructor:

  h = Hash.new { |hash, key| hash[key] = }

This automatically stores a new empty array
on every lookup of a non-existent key.

HTH,
  Stefan

···

2009/2/28 AD60 <andredeen@gmail.com>:

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

p works correctly but...

h = Hash.new()
h[0] = [1,2,3]
p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}
h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12] ?
p h[2] #>
[20] ???
h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]} ?
h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
21, 22]} ???

... you are modifying the default value of the Hash in those cases where you place the question marks. The default value is the one returned in case a key does not exist. But there is no automatic insertion, that's why you do not see key 2 after doing h[2] << 20.

You probably rather want this famous idiom:

h = Hash.new {|ha,ke| ha[ke] = }

Now run your tests again. :slight_smile:

Kind regards

  robert

···

On 28.02.2009 22:15, AD60 wrote:

Thank you both for seeing the light at the end of the tunnel. I fully
understand what I have to do to get it working but just to clarify;

By doing "h[2] << 20" I modified the original empty array I set as a
default value and inserted no hash key because it is not an assignment
to h?

Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
[]}" and just doing "p h[2]" before any assigment to h[2] has the
surprising/logical side effect of creating a new key "2", thank god
with value "[]" so no problems there, but still... I am now thinking
that stating "h = Hash.new([])" is problematic idiom under all
circumstances. Am I correct?

Kind regards,
André

Thank you both for seeing the light at the end of the tunnel. I fully
understand what I have to do to get it working but just to clarify;

Good!

By doing "h[2] << 20" I modified the original empty array I set as a
default value and inserted no hash key because it is not an assignment
to h?

Yes. Although "assignment to h" is a bit ambiguous. I guess you
meant the correct thing but keep in mind that normally "assignment to
h" means "h = something" and not "h[key] = something". The latter is
actually a method defined in class Hash.

Replacing the "h = Hash.new()" with "h = Hash.new { |h,k| h[k] =
}" and just doing "p h[2]" before any assigment to h[2] has the
surprising/logical side effect of creating a new key "2", thank god
with value "" so no problems there, but still... I am now thinking
that stating "h = Hash.new()" is problematic idiom under all
circumstances. Am I correct?

No, because it works well in other cases - the most popular probably being

counters = Hash.new 0
...
counters[foo] += 1

Kind regards

robert

···

2009/3/1 AD60 <andredeen@gmail.com>:

--
remember.guy do |as, often| as.you_can - without end

I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Just to detail the difference:

hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]
p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

hiv = Hash.new 0
hiv[0] += 1
p hiv # {0=>1}
p hiv[1] # 0
p hiv # {0=>1}

it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Kind regards,
André

I see the use with an integer default, I specifically meant the
Hash.new() syntax as problematic, but it probably all depends on
knowing what your doing.

Ah. Since you talked about "idiom" I thought you were referring to the general pattern.

Just to detail the difference:

hav = Hash.new { |h,k| h[k] = }
hav[0] += [1]

Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you'll end up creating and deleting Array instances all the time without any benefit.

p hav # {0=>[1]}
p hav[1] #
p hav # {0=>[1], 1=>}

hiv = Hash.new 0
hiv[0] += 1
p hiv # {0=>1}
p hiv[1] # 0
p hiv # {0=>1}

it's the difference between "{0=>[1], 1=>}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Good.

Kind regards

  robert

···

On 01.03.2009 12:35, AD60 wrote:

Robert Klemme wrote:

I see the use with an integer default, I specifically meant the
Hash.new() syntax as problematic, but it probably all depends on
knowing what your doing.

Ah. Since you talked about "idiom" I thought you were referring to the
general pattern.

Just to detail the difference:

hav = Hash.new { |h,k| h[k] = }
hav[0] += [1]

Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you'll end up creating and deleting Array instances
all the time without any benefit.

it's the difference between "{0=>[1], 1=>}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Good.

Kind regards

  robert

The method I would use is this:

hav = Hash.new {}

This gives you the benefit of creating a new array when necessary
(rather than a single default array which can be destructively modified
in troubling ways), but means that you only get permanent effects if you
explicitly use assignment, so you don't get this

p hav # {0=>[1]}
p hav[1] #
p hav # {0=>[1], 1=>}

Of course, it means you can't use the << operator to stick values into
non-present keys, but I rarely use the << operator anyway.

···

On 01.03.2009 12:35, AD60 wrote:

--
Posted via http://www.ruby-forum.com/\.

You probably should get rid of that habit. And here's why: your approach has no means to efficiently work with Arrays as Hash values. Either, you have to use += all the time which is inefficient because it will create a lot of garbage - or you have to explicitly test for presence of the key, which is less efficient than letting Hash's C implementation do it. It is much more elegant to just always do hash[key] << value.

If you just want to make sure that you always get an Array back, you could as well use Hash.new(.freeze) to avoid accidentally appending to the single default all the time.

But it's a free country so if you like to stick with your habits...

Cheers

  robert

···

On 02.03.2009 19:18, Adam Gardner wrote:

Robert Klemme wrote:

On 01.03.2009 12:35, AD60 wrote:

I see the use with an integer default, I specifically meant the
Hash.new() syntax as problematic, but it probably all depends on
knowing what your doing.

Ah. Since you talked about "idiom" I thought you were referring to the
general pattern.

Just to detail the difference:

hav = Hash.new { |h,k| h[k] = }
hav[0] += [1]

Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you'll end up creating and deleting Array instances
all the time without any benefit.

it's the difference between "{0=>[1], 1=>}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Good.

The method I would use is this:

hav = Hash.new {}

This gives you the benefit of creating a new array when necessary (rather than a single default array which can be destructively modified in troubling ways), but means that you only get permanent effects if you explicitly use assignment, so you don't get this

p hav # {0=>[1]}
p hav[1] #
p hav # {0=>[1], 1=>}

Of course, it means you can't use the << operator to stick values into non-present keys, but I rarely use the << operator anyway.

* Robert Klemme <shortcutter@googlemail.com> (2009-03-02) schrieb:

You probably should get rid of that habit. And here's why: your
approach has no means to efficiently work with Arrays as Hash values.
Either, you have to use += all the time which is inefficient because it
will create a lot of garbage

The problem is creating a lot of useless objects and copying over data,
not leaving garbage behind.

mfg, simon .... l

Well, both (creation of objects as well as garbage collection left overs) contribute to wasted CPU and thus slower execution.

  robert

···

On 05.03.2009 14:47, Simon Krahnke wrote:

* Robert Klemme <shortcutter@googlemail.com> (2009-03-02) schrieb:

You probably should get rid of that habit. And here's why: your
approach has no means to efficiently work with Arrays as Hash values.
Either, you have to use += all the time which is inefficient because it
will create a lot of garbage

The problem is creating a lot of useless objects and copying over data,
not leaving garbage behind.