Unexpected problem: hash[key] << value

# ruby 1.9.2p180 (2011-02-18) [i386-mingw32]

get_page_hash = {}
get_page_hash.default = []

File.foreach("page.txt") do |line|
  word, page = line.chomp.split(':')
  get_page_hash[word] << page # the problem is here
end

p get_page_hash['Aword'] # => ["1", "2", "3", "4", "5"]
p get_page_hash['Bword'] # => ["1", "2", "3", "4", "5"]
p get_page_hash.default # => ["1", "2", "3", "4", "5"]

__END__

content of page.txt:
Aword:1
Bword:2
Cword:3
Aword:4
Dword:5

Simple program, clear purpose. I don't know why get_page_hash.default
becomes ["1", "2", "3", "4", "5"]; it seems ridiculous.

Only when I change that line to:

  get_page_hash[word] += [page]

I get what I want:

p get_page_hash['Aword'] # => ["1", "4"]
p get_page_hash['Bword'] # => ["2"]
p get_page_hash.default # => []

I think using "<<" is intuitive, but the result is unexpected. What's
wrong with it?

Thank you!

Joey

--
Posted via http://www.ruby-forum.com/.

Hash.default refers to the same Array for every missing key.
You cannot use the first version, because "<<" modifies that shared
default array instead of giving each key its own array.

Here is a good example from the manual, but it is in Japanese, sorry:
http://www.ruby-lang.org/ja/man/html/trap_Hash.html
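A minimal reproduction sketch of the behaviour described above. Joey's page.txt lines are inlined as a word list (an assumption, to keep it self-contained without the file). It shows that "<<" mutates the one shared default array and that no key is ever stored:

```ruby
# Joey's page.txt entries, inlined instead of read from the file (assumed data)
lines = %w[Aword:1 Bword:2 Cword:3 Aword:4 Dword:5]

get_page_hash = {}
get_page_hash.default = []

lines.each do |line|
  word, page = line.split(':')
  # A missing key returns the shared default array; << mutates it in place.
  # There is no assignment, so the key itself is never stored in the hash.
  get_page_hash[word] << page
end

p get_page_hash.default   # => ["1", "2", "3", "4", "5"]
p get_page_hash.keys      # => []
p get_page_hash['Aword'].equal?(get_page_hash['Zword'])  # => true (one object)
```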

On Wed, Mar 9, 2011 at 12:45 PM, Joey Zhou <yimutang@gmail.com> wrote:

I think using "<<" is intuitive, but the result is unexpected. What's
wrong with it?

--
Haruka YAGNI
hyagni@gmail.com

I just stumbled across this surprising behavior myself. It's the first
counter-intuitive mechanism I have come across in my short, sweet
experience with Ruby.

Check this thread for an elaborate discussion (in English) of this
behavior:

http://www.ruby-forum.com/topic/134424#new

Here's my take:

Before we look at your case, let's look at a case that actually works as
you'd expect, initializing a hash with a Fixnum default:

Code
h = Hash.new(0)
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"
h['key1'] += 1
puts "after updating key1"
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"

Result:
h['key1']: 0
h['key2']: 0
after updating key1
h['key1']: 1
h['key2']: 0

Perfect! Mighty handy for word count programs and all sorts of other use
cases.
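A word-count sketch of why Hash.new(0) is safe, using Joey's entries inlined as assumed data: `+=` on an Integer never mutates the shared 0. It computes a new Integer and assigns it under the key, and Integers are immutable anyway.

```ruby
# Joey's page.txt entries, inlined (assumed data for this sketch)
lines = %w[Aword:1 Bword:2 Cword:3 Aword:4 Dword:5]

counts = Hash.new(0)
lines.each do |line|
  word, _page = line.split(':')
  # Expands to counts[word] = counts[word] + 1: a new Integer is assigned
  # under the key, so the shared default 0 is never changed.
  counts[word] += 1
end

p counts['Aword']       # => 2
p counts['Zword']       # => 0 (default returned, nothing stored)
p counts.key?('Zword')  # => false
p counts.default        # => 0
```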

Which would lead you to expect the following behavior when you
initialize a hash with an empty array, then append:

Code
h = Hash.new([])
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"
h['key1'] << 1
puts "after updating key1"
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"

Result:
h['key1']: []
h['key2']: []
after updating key1
h['key1']: [1]
h['key2']: [] #<-- what you'd expect, but NOT what you get

The actual result is the following:

h['key1']: []
h['key2']: []
after updating key1
h['key1']: [1]
h['key2']: [1]
. . . and so on

The problem is that when you initialize a hash with a mutable default
value, all of the defaults are actually references to THE SAME OBJECT.
So when you append to the default array through one key, you're
actually changing what every missing key returns. Witness:

puts "#{h['key1'].object_id}"
puts "#{h['key2'].object_id}"
puts "#{h['key3'].object_id}"

Result:
116528
116528
116528

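By contrast, when you update a value with the += construction rather
than <<, you're actually creating a new array object for that value:
h[k] += [v] is shorthand for h[k] = h[k] + [v], so Array#+ builds a new
array and the assignment stores it under that key. That particular key
therefore no longer refers to the default value.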
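A sketch of Joey's working `+=` version, with his entries inlined as assumed data, checking that each key ends up with its own array while the default stays empty:

```ruby
h = Hash.new([])

%w[Aword:1 Bword:2 Cword:3 Aword:4 Dword:5].each do |line|
  word, page = line.split(':')
  # Same as h[word] = h[word] + [page]: Array#+ returns a new array,
  # and the assignment stores it under word. The default is only read.
  h[word] += [page]
end

p h['Aword']  # => ["1", "4"]
p h['Bword']  # => ["2"]
p h.default   # => []
p h.keys      # => ["Aword", "Bword", "Cword", "Dword"]
```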

The thread referred to above mentions other ways to get what you'd
expect with a default empty array. Still, I gotta admit that I simply
don't understand why Hash.new([]) works the way it does. Who would want
to create a Hash table where changing a single value can potentially
change all other values, past, present, and to come? Talk about side
effects gone wild!

If anyone can explain the rationale for this behavior, I'd really
appreciate it. I'm probably just missing something.


Mark Beek wrote in post #986380:

Which would lead you to expect the following behavior when you
initialize a hash with an empty array, then append:

Code
h = Hash.new([])
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"
h['key1'] << 1
puts "after updating key1"
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"

Result:
h['key1']: []
h['key2']: []
after updating key1
h['key1']: [1]
h['key2']: [] #<-- what you'd expect, but NOT what you get

To get that behaviour, you need the Hash to create a *new* empty array
for every unknown key. What I do is:

h = Hash.new { |o,k| o[k] = [] }
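Applied to Joey's original problem, a sketch of this block form (his entries are inlined as assumed data; the block parameters are renamed to hash and key only for readability). The block runs on each miss, creates a fresh array and stores it, so "<<" then appends to that key's own array:

```ruby
get_page_hash = Hash.new { |hash, key| hash[key] = [] }

%w[Aword:1 Bword:2 Cword:3 Aword:4 Dword:5].each do |line|
  word, page = line.split(':')
  get_page_hash[word] << page   # appends to this key's own array
end

p get_page_hash['Aword']  # => ["1", "4"]
p get_page_hash['Bword']  # => ["2"]
p get_page_hash.default   # => nil (a default proc is used instead)
```

One trade-off: because the block assigns, even a plain read such as get_page_hash['Zword'] adds an empty entry. Hash#fetch with a default argument, e.g. fetch('Zword', []), looks without storing.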

The problem is that when you initialize a hash with a mutable default
value, all of the defaults are actually references to THE SAME OBJECT.

...

If anyone can explain the rationale for this behavior, I'd really
appreciate it. I'm probably just missing something.

The question is, how else could it work in the general case?

Perhaps you pass a prototype object, and the Hash constructor would call
.dup on that object every time it needs a new distinct instance? No,
that doesn't work, because .dup is only a shallow copy. Check out:

a = [[1,2],[3,4]]
b = a.dup
b[0] << 3
p a  # => [[1, 2, 3], [3, 4]]  (the inner arrays are shared, not copied)
p b  # => [[1, 2, 3], [3, 4]]
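To make the prototype-plus-dup idea concrete, here is a sketch of it as a hypothetical helper, dup_default_hash (not part of Ruby). It behaves for a flat array, but with a nested prototype the inner arrays remain shared, exactly as the a/b example shows:

```ruby
# Hypothetical helper: each missing key gets a shallow copy of prototype.
def dup_default_hash(prototype)
  Hash.new { |hash, key| hash[key] = prototype.dup }
end

flat = dup_default_hash([])
flat['a'] << 1
p flat['b']          # => []     (flat prototype: copies are independent)

nested = dup_default_hash([[]])
nested['a'][0] << 1  # mutates the inner array, which dup did not copy
p nested['b']        # => [[1]]  (inner array shared with the prototype)
```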

Perhaps you could pass a Class, and then Hash would call your class's
.new method every time it wanted an instance? Sure, you could pass Array
in this case, but it's quite restrictive. And the simple case of
Hash.new(0) wouldn't work.
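A sketch of the class-based alternative, again as a hypothetical helper, class_default_hash: calling klass.new on each miss works for Array, but Integer has no new method, so the Hash.new(0) counting case cannot be expressed this way:

```ruby
# Hypothetical helper: each missing key gets klass.new.
def class_default_hash(klass)
  Hash.new { |hash, key| hash[key] = klass.new }
end

lists = class_default_hash(Array)
lists['a'] << 1
p lists['b']   # => []

counts = class_default_hash(Integer)
integer_ok =
  begin
    counts['a'] += 1
    true
  rescue NoMethodError
    false        # Integer.new is undefined
  end
p integer_ok   # => false
```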

So to work in the general case you have to give it some code to execute
to create a new object every time one is needed - a factory block.
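Because the block can run arbitrary code, it also covers cases a prototype or a class cannot. One sketch, with an invented per-word, per-page tally as the example, puts a fresh counting hash under each key:

```ruby
# Each missing word gets its own new Hash.new(0) counter.
tally = Hash.new { |hash, word| hash[word] = Hash.new(0) }

%w[Aword:1 Bword:2 Aword:1 Aword:4].each do |line|
  word, page = line.split(':')
  tally[word][page] += 1
end

p tally['Aword']['1']  # => 2
p tally['Aword']['4']  # => 1
p tally['Bword']['1']  # => 0
```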

The same applies with arrays: compare

a = Array.new(5, [])     # one array, referenced five times
b = Array.new(5) { [] }  # block runs five times: five distinct arrays
puts a.map { |x| x.object_id }
puts b.map { |x| x.object_id }
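The same comparison as a sketch with checks instead of printed ids, including what happens when the first element is mutated:

```ruby
a = Array.new(5, [])     # five references to one array
b = Array.new(5) { [] }  # five separate arrays

p a.map(&:object_id).uniq.size  # => 1
p b.map(&:object_id).uniq.size  # => 5

a[0] << :x
b[0] << :x
p a  # => [[:x], [:x], [:x], [:x], [:x]]
p b  # => [[:x], [], [], [], []]
```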

Regards,

Brian.



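By contrast, when you update a value with the += construction rather
than <<, you're actually creating a new array object for that value:
h[k] += [v] is shorthand for h[k] = h[k] + [v], so Array#+ builds a new
array and the assignment stores it under that key. That particular key
therefore no longer refers to the default value.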

Well, in this case actually the better idiom is this:

h = Hash.new {|h,k| h[k] = []}
...

h[key] << something

Reason: Array#+ will create a new object every time you add something
while the idiom presented above only ever creates one Array per key.
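A sketch of that difference, with invented data: it keeps every array observed under the key and counts distinct objects. With += a new array is stored on each addition; with the block idiom the same array is mutated in place.

```ruby
plus = Hash.new([])
seen_plus = []                    # keeps each observed array alive
3.times do |i|
  plus[:k] += [i]                 # Array#+ builds a new array each time
  seen_plus << plus[:k]
end

append = Hash.new { |hash, key| hash[key] = [] }
seen_append = []
3.times do |i|
  append[:k] << i                 # one array per key, mutated in place
  seen_append << append[:k]
end

p seen_plus.map(&:object_id).uniq.size    # => 3
p seen_append.map(&:object_id).uniq.size  # => 1
p plus[:k] == append[:k]                  # => true (both [0, 1, 2])
```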

The thread referred to above mentions other ways to get what you'd
expect with a default empty array. Still, I gotta admit that I simply
don't understand why Hash.new([]) works the way it does. Who would want
to create a Hash table where changing a single value can potentially
change all other values, past, present, and to come? Talk about side
effects gone wild!

Well, first of all, this is the default return value, which does not
necessarily mean that it will be modified. You might do something
like

h = Hash.new("missing".freeze)
...

puts h[key]
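A sketch of such a read-only default: every missing key returns the same frozen string, which is harmless as long as nobody mutates it, and an accidental "<<" fails loudly. Since Ruby 2.5 the error is FrozenError, a subclass of RuntimeError (older versions raise RuntimeError directly), so rescuing RuntimeError covers both.

```ruby
h = Hash.new("missing".freeze)
h['known'] = "present"

puts h['known']   # prints: present
puts h['other']   # prints: missing

mutated =
  begin
    h['other'] << "!"   # tries to change the shared default
    true
  rescue RuntimeError   # FrozenError (2.5+) is a RuntimeError subclass
    false
  end
p mutated         # => false
p h.default       # => "missing"
```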

And then of course there is a very common idiom:

counters = Hash.new 0
...
counters[key] += 1

If anyone can explain the rationale for this behavior, I'd really
appreciate it. I'm probably just missing something.

Hopefully that explanation helps.

Kind regards

robert

On Wed, Mar 9, 2011 at 6:35 AM, Mark Beek <markbeek@carolina.rr.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Haruka YAGNI wrote in post #986377:

On Wed, Mar 9, 2011 at 12:45 PM, Joey Zhou <yimutang@gmail.com> wrote:
Hash.default refers to the same Array for every missing key.
You cannot use the first version, because "<<" modifies that shared
default array instead of giving each key its own array.

Here is a good example from the manual, but it is in Japanese, sorry:
http://www.ruby-lang.org/ja/man/html/trap_Hash.html

Thank you. I can read the code :)


Robert Klemme wrote in post #986401:

Well, in this case actually the better idiom is this:

h = Hash.new {|h,k| h[k] = []}

This is exactly what I need. Thank you.
