Why Does Hash Apparently Reorder Its Internal Representation And Other Associated Ponderings

I was surprised to find the following...

irb(main):001:0> {:a => 'a', :b => 'b'}
=> {:a=>"a", :b=>"b"}
irb(main):002:0> {:a => 'a', :b => 'b', :c => 'c'}
=> {:a=>"a", :b=>"b", :c=>"c"}
irb(main):003:0> {:a => 'a', :b => 'b', :c => 'c', :d => 'd'}
=> {:a=>"a", :d=>"d", :b=>"b", :c=>"c"}
irb(main):004:0> {:a => 'a', :b => 'b', :c => 'c', :d => 'd', :e => 'e'}
=> {:a=>"a", :d=>"d", :b=>"b", :e=>"e", :c=>"c"}
irb(main):005:0> {:a => 'a', :b => 'b', :c => 'c', :d => 'd', :e => 'e', :f => 'f'}
=> {:a=>"a", :d=>"d", :b=>"b", :e=>"e", :c=>"c", :f=>"f"}

I expected that the sequence of hashes returned would be identical in order to the presented sequence. Or, that there would be some discernable sequence such as if a stack, if not as a queue.

I realise that it is a hash and that access is expected to be by key, but sometimes retaining the presented order may be desired. And anyway, why reorder it at all?

Furthermore, the returned order is different from a previous sequence I did! So, it appears that this is somewhat random, however unlikely that it is.

I then assumed that there might be some kind of optimisation going on, but how much optimisation of {:a => 'a', :b => 'b', ... } can there be?

It doesn't seem very PoLS to have it reordered, although perhaps one shouldn't be surprised that a hash is unordered? Perhaps Matz is convincing us of this statement? Said Matz unto the flock in a loud, Godly voice: "Make no assumptions about the order of hashes!"

And would eval %{ instance_of_hash[instance_of_fixnum] } really be so evil? Perhaps that was a little obscure... What's wrong with ordered access, using a numeric element reference as with Array, to Hash? Excepting that the order can't be relied upon, but assume that it could. Even simpler might have been to give an example:

h = {:a => 'a', :b => 'b'}
h[0]
=> 'a'

Similarly one might be able to treat an Array like a Hash as in eval %{ instance_of_array['instance_of_string_representation_of_a_fixnum'] }, such as with:

a = [ 1, 2 ]
a['0']
=> 1

There's no great call for the immediately above I would think, but if I implemented one then I would implement the other also, simply for the purposes of symmetry. I'm not even sure there is any need for either, such that I may be trying to achieve the unnecessary?...

Even so, would some unwritten law be being broken if I did this stuff?

I was surprised to find the following...

Really? Take a comsci course at uni

I realise that it is a hash and that access is expected to be by key,
but sometimes retaining the presented order may be desired. And
anyway, why reorder it at all?

It's not 'reordered', it's stored in a hash table. Think of how MD5
represents a unique key for almost every possible value, but the hash
has no real connection to the original value. A hash table is stored
using similar keys, though you are allowed duplicate keys for a hash
table, and if you go in to the mathematical theory of it there is a
lot of speed gained this way especially if you're using large keys,
such as in i18n applications where each string is stored in a hash
table. That's why it's called a hash and not an array.

···

On 8/21/06, thoran @ thoran. com <thoran@thoran.com> wrote:

--
Phillip Hutchings
http://www.sitharus.com/

I was surprised to find the following...

irb(main):001:0> {:a => 'a', :b => 'b'}
=> {:a=>"a", :b=>"b"}

[..]

I then assumed that there might be some kind of optimisation going on,
but how much optimisation of {:a => 'a', :b => 'b', ... } can there
be?

Hashes are unordered so that Ruby can use the hash function of the
keys. Read up on hashes at wikipedia:

There was a thread about making an "ordered hash" in the last few
weeks. Worth searching for.

;Daniel

···

On 8/21/06, thoran @ thoran. com <thoran@thoran.com> wrote:

--
Daniel Baird
http://tiddlyspot.com (free, effortless TiddlyWiki hosting)
http://danielbaird.com (TiddlyW;nks! :: Whiteboard Koala :: Blog ::
Things That Suck)

thoran@thoran.com wrote:

I was surprised to find the following...

[snip]

This has been discussed many times over the last
six years. Do a search of the archives.

It doesn't seem very PoLS to have it reordered, although perhaps one shouldn't be surprised that a hash is unordered? Perhaps Matz is convincing us of this statement? Said Matz unto the flock in a loud, Godly voice: "Make no assumptions about the order of hashes!"

Don't invoke POLS. It's Matz's surprise that matters, not yours or
mine. And his voice is anything but loud.

Even so, would some unwritten law be being broken if I did this stuff?

Personally I favor introducing an ordered hash of some form into Ruby.
Other people don't. Many want the original Hash kept as it is for speed
(though I am still unconvinced that merely keeping a sequence number
along with each key would impact speed dramatically).

What might be best is a class that inherits from Hash and preserves
order. We'd need a literal notation, however.

Hal

<thoran@thoran.com> wrote in message
news:2d4684b07b9f965e5078fdb46754fac3@shiny...

I was surprised to find the following...

I realise that it is a hash and that access is expected to be by key, but
sometimes retaining the presented order may be desired. And anyway, why
reorder it at all?

    I am always surprised whenever this question comes up. Whenever it
does, it's obvious that the poster does not know what a hash is.

    I think what you are thinking of is a red black tree (or just a binary
tree, in general) and not a hash...

It doesn't seem very PoLS to have it reordered, although perhaps one
shouldn't be surprised that a hash is unordered? Perhaps Matz is
convincing us of this statement? Said Matz unto the flock in a loud,
Godly voice: "Make no assumptions about the order of hashes!"

    People invoke the PoLS way too often considering it's a claim that Matz
has never made.
    Personally, the Principle of Least Surprise is more about consistency
than it is about how often you're literally surprised. Often, you're
surprised because of your own ignorance (as in this instance), so you can
hardly blame Ruby for that! Ruby is very consistent and, in my opinion,
follows the PoLS. Whenever it doesn't, it ususally does so for a good
(pragmatic) reason...

    Honestly, complaining that hashes aren't ordered is like complaining
that rand() doesn't return the same number every time. Pray tell, what
made you think it should be ordered? That was an honest question, by the
way! You know enough about programming to come here and decree that this
is very weird yet you didn't already know what a hash is or how one works.
Very strange...

<thoran@thoran.com> wrote in message
news:2d4684b07b9f965e5078fdb46754fac3@shiny...

And would eval %{ instance_of_hash[instance_of_fixnum] } really be so
evil? Perhaps that was a little obscure... What's wrong with ordered
access, using a numeric element reference as with Array, to Hash?
Excepting that the order can't be relied upon, but assume that it could.
Even simpler might have been to give an example:

h = {:a => 'a', :b => 'b'}
h[0]
=> 'a'

Similarly one might be able to treat an Array like a Hash as in eval %{
instance_of_array['instance_of_string_representation_of_a_fixnum'] },
such as with:

a = [ 1, 2 ]
a['0']
=> 1

There's no great call for the immediately above I would think, but if I
implemented one then I would implement the other also, simply for the
purposes of symmetry. I'm not even sure there is any need for either,
such that I may be trying to achieve the unnecessary?...

    How would this even work?
    For instance:

hash = { 0 => 'a', 1 => 'b', 'c' => 'd' }

# for me, this prints 'a', 'd', 'b', in that order...
hash.each { |k,e| puts e }

puts hash[1] # what will this print?

    There's no way to tell whether the last line of this code is trying to
access the second element of the hash or the one that maps to the Fixnum
1...

Hi,

···

In message "Re: Why Does Hash Apparently Reorder Its Internal Representation And Other Associated Ponderings" on Mon, 21 Aug 2006 16:37:00 +0900, Hal Fulton <hal9000@hypermetrics.com> writes:

Personally I favor introducing an ordered hash of some form into Ruby.
Other people don't. Many want the original Hash kept as it is for speed
(though I am still unconvinced that merely keeping a sequence number
along with each key would impact speed dramatically).

I am open to introduce order into 1.9 Hash, as long as we can
accomplish reasonable performance. I haven't yet read the "A use case
for an ordered hash" thread yet. I am facing a huge mountain of mails
after the vacation.

              matz.

OMG! That's why my project is so buggy. :wink:

···

On Aug 21, 2006, at 9:05 AM, Just Another Victim of the Ambient Morality wrote:

s like complaining
that rand() doesn't return the same number every time

--
Vegetarians eat Vegetables, Humanitarians frighten me

Just Another Victim of the Ambient Morality wrote:

<thoran@thoran.com> wrote in message news:2d4684b07b9f965e5078fdb46754fac3@shiny...
  

I was surprised to find the following...

I realise that it is a hash and that access is expected to be by key, but sometimes retaining the presented order may be desired. And anyway, why reorder it at all?
    
    I am always surprised whenever this question comes up. Whenever it does, it's obvious that the poster does not know what a hash is.

    I think what you are thinking of is a red black tree (or just a binary tree, in general) and not a hash...

It doesn't seem very PoLS to have it reordered, although perhaps one shouldn't be surprised that a hash is unordered? Perhaps Matz is convincing us of this statement? Said Matz unto the flock in a loud, Godly voice: "Make no assumptions about the order of hashes!"
    
    People invoke the PoLS way too often considering it's a claim that Matz has never made.
    Personally, the Principle of Least Surprise is more about consistency than it is about how often you're literally surprised. Often, you're surprised because of your own ignorance (as in this instance), so you can hardly blame Ruby for that! Ruby is very consistent and, in my opinion, follows the PoLS. Whenever it doesn't, it ususally does so for a good (pragmatic) reason...

    Honestly, complaining that hashes aren't ordered is like complaining that rand() doesn't return the same number every time. Pray tell, what made you think it should be ordered? That was an honest question, by the way! You know enough about programming to come here and decree that this is very weird yet you didn't already know what a hash is or how one works. Very strange...
  
I think people expect it to be an associative array, like in PHP or Javascript. (I am pretty sure those preserve insertion order...)

But that's just a guess.

-Justin

    I think what you are thinking of is a red black tree (or just a binary
tree, in general) and not a hash...

No, people are thinking of an associative array or an association list
as someone else called it in this thread. I'm not sure about r/b
trees, but binary trees are most *definitely* not what is wanted since
what is wanted is insertion order in most cases, not an arbitrary
sorted order.

    Honestly, complaining that hashes aren't ordered is like complaining
that rand() doesn't return the same number every time. Pray tell, what
made you think it should be ordered? That was an honest question, by the
way! You know enough about programming to come here and decree that this
is very weird yet you didn't already know what a hash is or how one works.
Very strange...

PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

-austin

···

On 8/21/06, Just Another Victim of the Ambient Morality <ihatespam@rogers.com> wrote:
--
Austin Ziegler * halostatue@gmail.com * http://www.halostatue.ca/
               * austin@halostatue.ca * You are in a maze of twisty little passages, all alike. // halo • statue
               * austin@zieglers.ca

Yukihiro Matsumoto wrote:

Hi,

>Personally I favor introducing an ordered hash of some form into Ruby.
>Other people don't. Many want the original Hash kept as it is for speed
>(though I am still unconvinced that merely keeping a sequence number
>along with each key would impact speed dramatically).

I am open to introduce order into 1.9 Hash, as long as we can
accomplish reasonable performance. I haven't yet read the "A use case
for an ordered hash" thread yet. I am facing a huge mountain of mails
after the vacation.

My use case (I started that thread) may not be compelling. Other people,
including Bil Kleb, have said it would be useful to them also.

Bil's example was related to NASA's CEV. So an ordered hash would help
put people back on the moon. :wink:

Hal

···

In message "Re: Why Does Hash Apparently Reorder Its Internal Representation And Other Associated Ponderings" > on Mon, 21 Aug 2006 16:37:00 +0900, Hal Fulton <hal9000@hypermetrics.com> writes:

Yukihiro Matsumoto wrote:

Hi,

>Personally I favor introducing an ordered hash of some form into Ruby.
>Other people don't. Many want the original Hash kept as it is for speed
>(though I am still unconvinced that merely keeping a sequence number
>along with each key would impact speed dramatically).

I am open to introduce order into 1.9 Hash, as long as we can
accomplish reasonable performance.

Ruby is already slow enough. Those who are griping should be
using association lists.

···

In message "Re: Why Does Hash Apparently Reorder Its Internal Representation And Other Associated Ponderings" > on Mon, 21 Aug 2006 16:37:00 +0900, Hal Fulton <hal9000@hypermetrics.com> writes:

Honestly I would prefer a subclass and Mixins allowing to define the order
of retrieval.
As a matter of fact that is the only "order definition" that makes sense to
me

Now as it comes to implementation of course there must be a price to pay,
either on ordered access or on storage. So I agree with the others who
pointed out they are not willing to pay that price.
Hash is for fast lookup after all.
But having a subclass or a mixin for ordered hash access in the ruby core
would be *great*!!!

I am just dreaming:

ohhhhhh = OrderedHash.new { |k1,k2| tell_my_friend_what_order_is(k1, k2) }

class MyOH < OrderedHash
       def key_order(k1, k2) ### Here we could define Mixins for common key
orders

And would asking for having two like the following be too much?

class FastOrderdHash (key order is stored in an AST e.g.)
class SlimOrderedHash (keys are sorted on ordered access only)

just my micromoney worth

Cheers
Robert

···

On 8/21/06, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:

Hi,

In message "Re: Why Does Hash Apparently Reorder Its Internal > Representation And Other Associated Ponderings" > on Mon, 21 Aug 2006 16:37:00 +0900, Hal Fulton < > hal9000@hypermetrics.com> writes:

>Personally I favor introducing an ordered hash of some form into Ruby.
>Other people don't. Many want the original Hash kept as it is for speed
>(though I am still unconvinced that merely keeping a sequence number
>along with each key would impact speed dramatically).

I am open to introduce order into 1.9 Hash, as long as we can
accomplish reasonable performance. I haven't yet read the "A use case
for an ordered hash" thread yet. I am facing a huge mountain of mails
after the vacation.

                                                        matz.

--
Deux choses sont infinies : l'univers et la bêtise humaine ; en ce qui
concerne l'univers, je n'en ai pas acquis la certitude absolue.

- Albert Einstein

Austin Ziegler wrote:

PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

And, for what it's worth, JavaScript's Object primitive is also an
associative array that retains insertion order for iterating keys,
while exhibiting performance characteristics of a hash.

"Austin Ziegler" <halostatue@gmail.com> wrote in message
news:9e7db9110608211246t5ba563e6sfc8ac48b3a5e517d@mail.gmail.com...

    I think what you are thinking of is a red black tree (or just a
binary
tree, in general) and not a hash...

No, people are thinking of an associative array or an association list
as someone else called it in this thread. I'm not sure about r/b
trees, but binary trees are most *definitely* not what is wanted since
what is wanted is insertion order in most cases, not an arbitrary
sorted order.

    Ah, of course. I get the two (sorted order and insertion order) mixed
up since they are both rather popular in terms of what people
(surprisingly) expect of a hash...

    Honestly, complaining that hashes aren't ordered is like complaining
that rand() doesn't return the same number every time. Pray tell, what
made you think it should be ordered? That was an honest question, by
the
way! You know enough about programming to come here and decree that
this
is very weird yet you didn't already know what a hash is or how one
works.
Very strange...

PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

    I see... This explains a lot, thank you...

···

On 8/21/06, Just Another Victim of the Ambient Morality > <ihatespam@rogers.com> wrote:

Austin Ziegler wrote:

    I think what you are thinking of is a red black tree (or just a binary
tree, in general) and not a hash...

No, people are thinking of an associative array or an association list
as someone else called it in this thread. I'm not sure about r/b
trees, but binary trees are most *definitely* not what is wanted since
what is wanted is insertion order in most cases, not an arbitrary
sorted order.

    Honestly, complaining that hashes aren't ordered is like complaining
that rand() doesn't return the same number every time. Pray tell, what
made you think it should be ordered? That was an honest question, by the
way! You know enough about programming to come here and decree that this
is very weird yet you didn't already know what a hash is or how one works.
Very strange...

PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

-austin

I don't think 'ordered' is a good name for something sorted by _insertion order_.

I am probably 'tainted' by my exposure to the Java API, but just like I expect an ordered tree to be sorted by value, I'd expect an ordered map to be sorted by key.

Insertion order maps are very useful too, and I'd love to see both added to the Ruby stdlib. I(o?)Hash and OHash? Let's keep Hash as lean and mean as possible.

Isak

···

On 8/21/06, Just Another Victim of the Ambient Morality > <ihatespam@rogers.com> wrote:

Austin Ziegler wrote:

···

On 8/21/06, Just Another Victim of the Ambient Morality > <ihatespam@rogers.com> wrote:

    I think what you are thinking of is a red black tree (or just a binary
tree, in general) and not a hash...

No, people are thinking of an associative array or an association list
as someone else called it in this thread. I'm not sure about r/b
trees, but binary trees are most *definitely* not what is wanted since
what is wanted is insertion order in most cases, not an arbitrary
sorted order.

Red-black trees are (like binary trees) sorted by keys, rather than insertion order (but you could use the insertion ordinal as the key to get insertion order behavior). But they don't scale as well as Hash.

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Austin Ziegler wrote:

PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

That probably speaks for me. I've never used PHP, but
an associative array is what I want.

I tend to think in terms of interface rather than
implementation. A hash has the interface I want,
*except* that it doesn't preserve order.

Hal

Hal Fulton wrote:

My use case (I started that thread) may not be compelling. Other people,
including Bil Kleb, have said it would be useful to them also.

So far, I've managed to duplicate the functionality of two
Java codes having 834 and 1084 lines of codes with 24 and 55
lines of Ruby, respectively. The second one is so long[1]
due to the lack of ordered Hashes (and my lack of Ruby mastery).

FWIW, the Ruby versions are more robust and general than the
Java versions due to the mini DSL I wrote for expressing
tolerances in a natural way.

Thanks again for Ruby!

Regards,

···

--
Bil
http://fun3d.larc.nasa.gov

[1] If you can call 55 LOC long!

William James wrote:

Ruby is already slow enough. Those who are griping should be
using association lists.

What's an association list?

Hal