Iterating a changing Hash under 1.9.1

Gavin_Kistner3 · 15 February 2009 14:48

The following code shows that Hash#each under 1.9.1p0 does not iterate
over keys added during iteration:
  a = [ 1, 2, 3 ]; h = { 0=>0 }
  h.each{ |k,v| h[a[k]] = a[k] }
  p h
  #=> {0=>0, 1=>1}

However, this code (on 1.9.1p0) results in a ruby process with
unending 100% CPU usage, presumably due to an unending loop that keeps
traversing newly-added items:
  h = { 1=>nil, 2=>nil }
  h.each{ |k,v| h.delete(k); h[k]=v }
(Credit to "tama" for posting this on the ramaze group.)

I'm assuming that one of these two are a bug, since they seem
contradictory. I'd like to think that the latter behavior is the bug,
and that hashes aren't supposed to iterate over keys added or moved
during iteration. Can anyone confirm or deny this?

David_A_Black1 · 15 February 2009 15:22

Hi --

> The following code shows that Hash#each under 1.9.1p0 does not iterate
> over keys added during iteration:
> a = [ 1, 2, 3 ]; h = { 0=>0 }
> h.each{ |k,v| h[a[k]] = a[k] }
> p h
> #=> {0=>0, 1=>1}

> However, this code (on 1.9.1p0) results in a ruby process with
> unending 100% CPU usage, presumably due to an unending loop that keeps
> traversing newly-added items:
> h = { 1=>nil, 2=>nil }
> h.each{ |k,v| h.delete(k); h[k]=v }
> (Credit to "tama" for posting this on the ramaze group.)

> I'm assuming that one of these two are a bug, since they seem
> contradictory. I'd like to think that the latter behavior is the bug,
> and that hashes aren't supposed to iterate over keys added or moved
> during iteration. Can anyone confirm or deny this?

I can only add what I think is another interesting example:

h = { 1=>2, 3=>4 }

h.select {|k,v|
  h[rand] = 1
  v > 4
}

This exits in 1.8 but goes on forever in 1.9. I'm not sure whether
it's because of the override that Hash#select does of
Enumerable#select.

David

···

On Sun, 15 Feb 2009, Phrogz wrote:

--
David A. Black / Ruby Power and Light, LLC
Ruby/Rails consulting & training: http://www.rubypal.com
Coming in 2009: The Well-Grounded Rubyist (http://manning.com/black2)

http://www.wishsight.com => Independent, social wishlist management!

Robert_K1 · 15 February 2009 17:18

I agree with Pit: the bug is to iterate and modify a collection at the same time.

Cheers

robert

···

On 15.02.2009 15:48, Phrogz wrote:

The following code shows that Hash#each under 1.9.1p0 does not iterate
over keys added during iteration:
  a = [ 1, 2, 3 ]; h = { 0=>0 }
  h.each{ |k,v| h[a[k]] = a[k] }
  p h
  #=> {0=>0, 1=>1}

However, this code (on 1.9.1p0) results in a ruby process with
unending 100% CPU usage, presumably due to an unending loop that keeps
traversing newly-added items:
  h = { 1=>nil, 2=>nil }
  h.each{ |k,v| h.delete(k); h[k]=v }
(Credit to "tama" for posting this on the ramaze group.)

I'm assuming that one of these two are a bug, since they seem
contradictory. I'd like to think that the latter behavior is the bug,
and that hashes aren't supposed to iterate over keys added or moved
during iteration. Can anyone confirm or deny this?

Charles_Oliver_Nutte · 15 February 2009 21:55

Phrogz wrote:

> The following code shows that Hash#each under 1.9.1p0 does not iterate
> over keys added during iteration:
>   a = [ 1, 2, 3 ]; h = { 0=>0 }
>   h.each{ |k,v| h[a[k]] = a[k] }
>   p h
>   #=> {0=>0, 1=>1}

In this case, you're only reassigning the same keys over and over again. Since they're just being reassigned, they don't get pushed to the end of the iteration and you don't loop forever.

Lesson one: reassigning an existing key does not move it to the end of iteration order.

> However, this code (on 1.9.1p0) results in a ruby process with
> unending 100% CPU usage, presumably due to an unending loop that keeps
> traversing newly-added items:
> h = { 1=>nil, 2=>nil }
> h.each{ |k,v| h.delete(k); h[k]=v }
> (Credit to "tama" for posting this on the ramaze group.)

Here, you are deleting the key before assigning it. That removes it from the original order and re-adds it at the end. So the iteration runs forever because there's always another key to walk...the one you've just re-added.

Lesson two: Keys deleted and re-added or keys newly added appear at the end of iteration order.
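
The two lessons can be checked without touching the hash during an iteration at all. This is a minimal sketch, assuming an interpreter with insertion-ordered hashes (1.9 or later); the sample keys and variable names are made up for illustration.

```ruby
# Insertion order of a Hash, observed between (not during) iterations.
h = { 1 => :a, 2 => :b, 3 => :c }

h[1] = :z                  # reassign an existing key
after_reassign = h.keys    # key 1 keeps its original slot

h.delete(1)                # remove the key...
h[1] = :z                  # ...and add it back
after_readd = h.keys       # key 1 now sits at the end

h[4] = :d                  # a brand-new key is appended as well
after_new_key = h.keys

p after_reassign   # [1, 2, 3]
p after_readd      # [2, 3, 1]
p after_new_key    # [2, 3, 1, 4]
```

An iterator that simply walks this order to its current end will keep finding whatever key was re-added last, which is the loop described above.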

- Charlie

Pit_Capitain · 15 February 2009 15:35

I think modifying a collection while iterating over it is undefined.

Regards,
Pit

···

2009/2/15 David A. Black <dblack@rubypal.com>:

On Sun, 15 Feb 2009, Phrogz wrote:

(...)

Gavin_Kistner3 · 15 February 2009 23:03

I think you're wrong.
Initial state: { 0=>0 }
First loop: k is 0, a[0] is 1, assign h[1]=>1
A brand new key/value has been inserted into the hash at this point.
If #each then covered the next key/value pair:
Second loop: k is 1, a[1] is 2, assign h[2]=>2 (and so on)

···

On Feb 15, 2:55 pm, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

Phrogz wrote:
> The following code shows that Hash#each under 1.9.1p0 does not iterate
> over keys added during iteration:
> a = [ 1, 2, 3 ]; h = { 0=>0 }
> h.each{ |k,v| h[a[k]] = a[k] }
> p h
> #=> {0=>0, 1=>1}

In this case, you're only reassigning the same keys over and over again.
Since they're just being reassigned, they don't get pushed to the end of
the iteration and you don't loop forever.

W_James · 15 February 2009 20:14

Pit Capitain wrote:

···

2009/2/15 David A. Black <dblack@rubypal.com>:
> On Sun, 15 Feb 2009, Phrogz wrote:
>> (...)

> I think modifying a collection while iterating over it is undefined.

+1

It seems a very poor practice to me.

Gavin_Kistner3 · 15 February 2009 23:32

Ah, doggone it, that was my third choice between "is it a bug or is it
not". And I kept saying to myself as I wrote that up "Remember, you
should never claim something is a bug unless you're really really
sure."

I'll accept this answer as reasonable, though it seems a bit of a
shame. Always relegating deletions to a delete_if or reject or
explicit calls while iterating a duplicate may be inefficient in some
cases where complex logic needs to happen and ideally happen in a
single pass through the data.

It's 'safest' to say "DON'T TRUST ANYTHING, EVER", but that provides the
least flexibility during coding. It's most dangerous to say "YOU CAN
EXPLOIT WHATEVER IMPLEMENTATION QUIRKS AND BUGS THE CURRENT VERSION
HAS, WE PROMISE TO KEEP THOSE IN FUTURE VERSIONS", but that may also
provide for convenience or efficiency via 'tricky' coding. Somewhere in
between is (my) ideal.

Imagine if SQL said "Inserting multiple records using a select on the
table you are inserting into is undefined." Programmers everywhere
would nod their heads wisely and say "Good call". And some
implementations would allow it, some wouldn't, and some people would
accidentally rely on it, and others would be pissed to not be able to.

Before it seems like I'm taking a strong stand for modification during
iteration: In my opinion, the only problem in this situation is simply
that the documentation for Hash#each provides no information. "We've
gotta be able to get some kind of reading on that shield, up or down!"

Ideally, in my mind, we should document:

a) [Implementation] How each iterating method happens to behave
currently with respect to additions, modifications, and deletions
during traversal, and

b) [Design] What is intended to be true about the implementation for
(foreseeable) future versions, and may be relied upon.

If the documentation for Hash#each said that inserting new entries
during traversal might cause an infinite loop, I never would have even
started this topic. And (possibly) the real-world bugs that happened
to exist in Ramaze that caused it to hang when moving from 1.8 to 1.9
would never have been coded.

I'll take this to ruby-core and see if I can gather details on Design
for a variety of methods, and offer up a doc patch. If anyone here can
provide any details about either Implementation or (for sure) Design,
I'd be happy to hear it.

···

On Feb 15, 8:35 am, Pit Capitain <pit.capit...@gmail.com> wrote:

2009/2/15 David A. Black <dbl...@rubypal.com>:

> On Sun, 15 Feb 2009, Phrogz wrote:
>> (...)

I think modifying a collection while iterating over it is undefined.

Charles_Oliver_Nutte · 16 February 2009 00:07

Phrogz wrote:

> I think you're wrong.
> Initial state: { 0=>0 }
> First loop: k is 0, a[0] is 1, assign h[1]=>1
> A brand new key/value has been inserted into the hash at this point.
> If #each then covered the next key/value pair:
> Second loop: k is 1, a[1] is 2, assign h[2]=>2 (and so on)

There are a maximum of four keys possible and you never remove any. The second time through you're reassigning keys that are already there from the first time.

- Charlie

Charles_Oliver_Nutte · 16 February 2009 00:54

Charles Oliver Nutter wrote:

> Phrogz wrote:
>
>> I think you're wrong.
>> Initial state: { 0=>0 }
>> First loop: k is 0, a[0] is 1, assign h[1]=>1
>> A brand new key/value has been inserted into the hash at this point.
>> If #each then covered the next key/value pair:
>> Second loop: k is 1, a[1] is 2, assign h[2]=>2 (and so on)
>
> There are a maximum of four keys possible and you never remove any. The second time through you're reassigning keys that are already there from the first time.

Actually it looks like 1.9 is slightly different than what I described (which was what I know of ordered hash iteration in JRuby). Ruby 1.9 appears to 'each' only once, since it's already at the end of the list. In JRuby, we continue to iterate as long as you keep adding new keys:

a = [ 1, 2, 3 ]; h = { 0=>0 }
h.each{ |k,v| p [k,v]; p a[k]; h[a[k]] = a[k] }
p h

JRuby:

[0, 0]
1
[1, 1]
2
[2, 2]
3
[3, 3]
nil
[nil, nil]

Ruby 1.9.1:

[0, 0]
1
{0=>0, 1=>1}

Could be just a minor difference or a bug in one or the other.

- Charlie

Gavin_Kistner3 · 16 February 2009 00:54

I'm confused. As far as I can tell, the hash starts with 1 key, and
ends up with 2. What 'first time' are the keys being set?

···

On Feb 15, 5:07 pm, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

Phrogz wrote:
> I think you're wrong.
> Initial state: { 0=>0 }
> First loop: k is 0, a[0] is 1, assign h[1]=>1
> A brand new key/value has been inserted into the hash at this point.
> If #each then covered the next key/value pair:
> Second loop: k is 1, a[1] is 2, assign h[2]=>2 (and so on)

There are a maximum of four keys possible and you never remove any. The
second time through you're reassigning keys that are already there from
the first time.

W_James · 16 February 2009 02:53

Phrogz wrote:

> Always relegating deletions to a delete_if or reject or
> explicit calls while iterating a duplicate may be inefficient in some
> cases where complex logic needs to happen and ideally happen in a
> single pass through the data.

h={:b,22, :c,33, :d,44, :e,55}
==>{:b=>22, :c=>33, :e=>55, :d=>44}
h.merge( {:b,202, :e,505, :f,66} )
==>{:b=>202, :c=>33, :f=>66, :e=>505, :d=>44}

ThoML · 16 February 2009 05:28

> I think modifying a collection while iterating over it is undefined.
> It seems a very poor practice to me.

It shouldn't go into an infinite loop though. IMHO an exception
("modification during iteration" or something like that) would be
nice.
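
To see which way a given interpreter goes, a small probe like the following can help. It is only a sketch: the method name and the cut-off at five keys are invented here, and the result depends on the Ruby version. More recent MRI releases, as far as I know, raise a RuntimeError when a new key is added during iteration, while other interpreters may keep walking or stop quietly.

```ruby
# Returns a symbol or string describing what this interpreter did
# when a new key was inserted while Hash#each was running.
def insert_during_each
  h = { 0 => 0 }
  h.each do |k, v|
    return :kept_iterating if k >= 5   # safety stop if new keys keep being visited
    h[k + 1] = v                       # adds a key that was not there before
  end
  :finished_quietly
rescue RuntimeError => e
  "raised #{e.class}: #{e.message}"
end

result = insert_during_each
puts result
```

The guard on `k >= 5` is there only so the probe terminates on an interpreter that follows newly added keys forever.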

Robert_K1 · 16 February 2009 08:30

>> (...)

>> I think modifying a collection while iterating over it is undefined.

> Ah, doggone it, that was my third choice between "is it a bug or is it
> not". And I kept saying to myself as I wrote that up "Remember, you
> should never claim something is a bug unless you're really really
> sure."

> I'll accept this answer as reasonable, though it seems a bit of a
> shame. Always relegating deletions to a delete_if or reject or
> explicit calls while iterating a duplicate may be inefficient in some
> cases where complex logic needs to happen and ideally happen in a
> single pass through the data.

I think you left out plenty of options here. Of course, it depends on
the issue at hand but you can do at least these

1. Iterate through keys only

hash.keys.each {|k| ... }

This is safe for inserts and deletions because Hash#keys creates a
new, unrelated Array. This fits your original example well, since you
are using keys only anyway.

2. Using delete_if directly

hash.delete_if {|k,v| ...}

This method iterates all entries as well as safely deleting particular items.

3. Storing keys marked for deletion separately

del = []
hash.each {|k,v| .... del << k if ...}
del.each {|k| hash.delete k}
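
The three options above, filled in as a runnable sketch. The sample data, the "drop odd values" rule and the variable names are assumptions made up for illustration; only `Hash#keys`, `Hash#delete_if`, `Hash#each` and `Hash#delete` come from the list.

```ruby
def sample
  { :a => 1, :b => 2, :c => 3, :d => 4 }
end

# 1. Iterate a snapshot of the keys; the hash itself may change freely.
h1 = sample
h1.keys.each do |k|
  if h1[k].odd?
    h1.delete(k)                      # deleting is safe here
  else
    h1["#{k}_squared"] = h1[k]**2     # so is inserting
  end
end

# 2. Let delete_if do the traversal and the deletion together.
h2 = sample
h2.delete_if { |_k, v| v.odd? }

# 3. Collect the keys first, delete them after the traversal is over.
h3 = sample
del = []
h3.each { |k, v| del << k if v.odd? }
del.each { |k| h3.delete(k) }

p h1
p h2
p h3
```

Option 1 costs one extra Array of keys, option 3 one extra Array of doomed keys; option 2 allocates nothing but only covers deletion.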

> It's 'safest' to say "DON'T TRUST ANYTHING, EVER", but that provides the
> least flexibility during coding. It's most dangerous to say "YOU CAN
> EXPLOIT WHATEVER IMPLEMENTATION QUIRKS AND BUGS THE CURRENT VERSION
> HAS, WE PROMISE TO KEEP THOSE IN FUTURE VERSIONS", but that may also
> provide for convenience or efficiency via 'tricky' coding. Somewhere in
> between is (my) ideal.

If changing a collection during iteration yields undefined results, it is
perfectly OK for it to loop endlessly in one case and do something else
in the other. Code which does something forbidden is never safe.

> Imagine if SQL said "Inserting multiple records using a select on the
> table you are inserting into is undefined." Programmers everywhere
> would nod their heads wisely and say "Good call". And some
> implementations would allow it, some wouldn't, and some people would
> accidentally rely on it, and others would be pissed to not be able to.

IMHO SQL is a bad language for an example because it has a dramatically
different nature than Ruby: SQL is declarative while Ruby is
procedural. Apart from that, there is an SQL standard which defines
legal and illegal constructs.

> Before it seems like I'm taking a strong stand for modification during
> iteration: In my opinion, the only problem in this situation is simply
> that the documentation for Hash#each provides no information. "We've
> gotta be able to get some kind of reading on that shield, up or down!"

> Ideally, in my mind, we should document:

> a) [Implementation] How each iterating method happens to behave
> currently with respect to additions, modifications, and deletions
> during traversal, and

> b) [Design] What is intended to be true about the implementation for
> (foreseeable) future versions, and may be relied upon.

I agree, this should be documented. OTOH it is very common for
programming languages to not allow modifications of at least some
types of collections during iteration. So I personally do not expect
it to work unless explicitly stated - especially for hash based
structures which can change dramatically with the insertion of a
single entry just because of the way hash tables work.

> I'll take this to ruby-core and see if I can gather details on Design
> for a variety of methods, and offer up a doc patch. If anyone here can
> provide any details about either Implementation or (for sure) Design,
> I'd be happy to hear it.

AFAIK modification during iteration with #each is never safe for
Array, Hash (and thus also Set).

Kind regards

robert

···

2009/2/16 Phrogz <phrogz@mac.com>:

On Feb 15, 8:35 am, Pit Capitain <pit.capit...@gmail.com> wrote:

2009/2/15 David A. Black <dbl...@rubypal.com>:
> On Sun, 15 Feb 2009, Phrogz wrote:

--
remember.guy do |as, often| as.you_can - without end

Nobuyoshi_Nakada1 · 16 February 2009 01:23

Hi,

At Mon, 16 Feb 2009 09:54:44 +0900,
Charles Oliver Nutter wrote in [ruby-talk:328324]:

> Ruby 1.9.1:
>
> [0, 0]
> 1
> {0=>0, 1=>1}
>
> Could be just a minor difference or a bug in one or the other.

It's a bug fixed already in the trunk.

···

--
Nobu Nakada

Charles_Oliver_Nutte · 16 February 2009 04:01

Phrogz wrote:

> I'm confused. As far as I can tell, the hash starts with 1 key, and
> ends up with 2. What 'first time' are the keys being set?

I mean the first time passing through the set of keys, in comparison to some additional times you see in the repeating case (example #2 in your original email).

Note Nobu's response to my other email also. The JRuby behavior appears to be the correct one, and ends up with five keys for 0-3 plus nil.
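
The five-key result can be reproduced without modifying the hash inside #each, by walking the key list with an explicit index. This is only an emulation of the "keep visiting appended keys" behavior under that assumption; the index loop and the `visited` array are my own stand-ins, not how either interpreter implements iteration.

```ruby
a = [1, 2, 3]
h = { 0 => 0 }
visited = []

i = 0
while i < h.size      # h.size is re-read each pass, so appended keys get visited
  k = h.keys[i]       # fresh snapshot of the keys, indexed by position
  visited << k
  break if k.nil?     # a[nil] would raise TypeError, so stop here
  h[a[k]] = a[k]      # a[3] is nil, which inserts the nil => nil pair
  i += 1
end

p visited   # [0, 1, 2, 3, nil]
p h
```

The walk visits 0, 1, 2, 3 and finally nil, matching the order of the [k, v] pairs in the JRuby trace earlier in the thread.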

- Charlie

Yukihiro_Matsumoto2 · 16 February 2009 06:32

Hi,

···

In message "Re: Iterating a changing Hash under 1.9.1" on Mon, 16 Feb 2009 14:28:07 +0900, Tom Link <micathom@gmail.com> writes:

> It shouldn't go into an infinite loop though. IMHO an exception
> ("modification during iteration" or something like that) would be
> nice.

Do you think speed decrease for normal case is acceptable?

matz.

Joel_VanderWerf1 · 16 February 2009 06:39

Yukihiro Matsumoto wrote:

> Hi,
>
>> It shouldn't go into an infinite loop though. IMHO an exception
>> ("modification during iteration" or something like that) would be
>> nice.
>
> Do you think speed decrease for normal case is acceptable?
>
> matz.

It's also worrying that there is no clear definition of "during iteration", bearing in mind that any method which yields is an iterator, of sorts. Or would this only apply to a fixed set of core methods?

···

In message "Re: Iterating a changing Hash under 1.9.1" on Mon, 16 Feb 2009 14:28:07 +0900, Tom Link <micathom@gmail.com> writes:

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
