Hi,
I switched from 'my' thread because this seems more appropriate to this thread...
Previously "Re: Why Does Hash Apparently Reorder Its Internal Representation And Other Associated Ponderings"
Or, Is It Possible To Have One's Cake And Eat It Too?
First, a taxa of the different sorts of beasts of which we are speaking might help:
a. associative array
i. unordered
ii. insertion order-ordered
iii. sort function-ordered
1. manual (adhoc)
2. automatic (upon insertion)
b. hash
i. unordered
ii. insertion order-ordered
iii. sort function-ordered
1. manual (adhoc)
2. automatic (upon insertion)
Does this cover it or is there a redundancy or something missing? For instance are a.i. and a.ii. really the same thing by virtue of there not being a hashing function? Does a.i. make any sense and can be done away with from the get-go? And are a.ii. and b.ii, and a.iii. and b.iii. really the same thing also? Not if the hash still retained the hash as you would it expect it to.
And is there a term which could reasonably cover all of a. and b.? I might have used the term 'hash' to refer to anything which is a key-value pair, but this is apparently confusing or inaccurate.
I tried set, but that's wrong. I tried enumerable, but that's wrong too.
I propose that the term, list, be used to cover anything which would be able to mix in Enumerable and that the constituent parts of a list are called items, rather than an element, which is a set/array term and not appropriate for a hash or associative array. Still that doesn't give a name for associative lists. I love that when it just comes out like that... AssociativeList it is then!
And so are these three sorts of data structures (array, associative array, and hash) sufficiently different to warrant none subclassing from the other? If so, then it would also probably warrant a syntax addition and a core implementation (as discussed). And they are presumably sufficiently similar that it might make sense to subclass all of them from the same base class; list for instance, and then mixin Enumerable there instead. Although one may as well just put the methods in directly to List then however?
There have been concerns that a sort function-ordered or an insertion-ordered, or just ordered hash will cause slowdowns and memory bloat, so I was wondering if it was possible to adjust according to need. So, thank you to Charles, as I am reminded of what I did in compsci all that time ago!
Now given that it is possible or desirable to have a Hash redefine the number of buckets, or rehash, is there any reason why it can't also reassign its internal type at a deeper level depending on access?
Thought experiment time...
What if all Hashes were able to be ordered, but rehashable for storage type depending on how it is instantiated and how it is accessed? One could specify storage organisation at instantiation or later, or allow it to decide for itself what sort of hash it will be.
m = [:a => 'a'] # m is for map. I could go for this term too as it has brevity.
=> [:a => 'a']
m[0]
=> [:a => 'a'] # m remains as an associative array.
m[:a]
=> {:a => 'a'} # m becomes an explicit hash.
m[0]
=> {:a => 'a'} # m becomes an associative array.
m[:a]
=> {:a => 'a'} # m becomes an explicit hash.
m[[0]]
=> [:a => 'a'] # m becomes an associative array.
{} = m, or
m {}= m, or
m.to_h
=> {:a => 'a'}
This would be read as "make m equal to itself if it is already a hash, else make it so". Similarly, going the other way:
[]= m, or
m []= m, or
m.to_assoc
=> [:a => 'a']
Then perhaps one might like to also do something like this?...
n = [:b => 'b']
=> [:b => 'b']
o {}= n
=> {:b => 'b'}
Useful? Anyhow, that was an aside.
Now, why not take this further?... What about a generalized list? As in:
l = [:a => 'a', :b, {:c => 'c'}] # which is really, or at least appears to be as if it were: {0 => [:a => 'a'], 1 => :b, 2 => {:c => 'c'}}
The method of access determines how it is to be treated...
l{0}
=> {:a => 'a'} # or should that be {0 => [:a => 'a']}?
l[[0]]
=> 'a'
l[0]
=> [:a => 'a']
l{1}
=> {1 => :b} # or should that be {1 => {:b => nil}}?
l[1]
=> [1 => :b] # or should that simply be :b?
l[[1]]
=> :b
l[2]
=> {:c => 'c'}
l[[2]]
=> 'c'
l{2}
=> {2 => {:c => 'c'}} # or should that be {:c => 'c'}?
Having only lists might then do away with the literal instantiation of an empty AssociativeArray problem (assuming a double bracket notation):
a = [[]] # Is this a nested array or an associative array?
=> [[]]
list = []
array = [] # Really a list.
hash = {} # Explicit hash list sub-type declaration.
list = {:a => 'a'}
=> {:a => 'a'}
By assigning a non key-value pair to the hash it becomes a more general list internally and this is represented by the form of notation, from {} to [].
list[1] = 'b'
=> [:a => 'a', 'b']
In a list the numeric value of an item is always present, so an assignment which uses a numeric index will assume that it is a simple placement at the position given by the numeric index of whatever object is presented to the list.
An assignment which treats this as an associative array will work the same as if it were an assignment without:
list[[2]] = 'c'
=> [:a => 'a', 'b', 'c']
Another example might be good:
list[3] = {:d => 'd'}
=> [:a => 'a', 'b', 'c', {:d => 'd'}]
list[3]
=> {:d => 'd'}
list[[4]] = {:e => 'e'} # Allowable?
=> [:a => 'a', 'b', 'c', {:d => 'd'}, {:e => 'e'}]
list[4]
=> {:e => 'e'}
list[[4]]
=> 'e'
However,
list[0]
=> [:a => 'a']
list[[0]]
=> 'a'
This is all of the top of my head, so it probably needs some work...
And as for a dynamically altering Hash, List also could change its internal organisation on the fly depending on need. Such that if all elements are homogenous, then the internal organisation would be optimised for that homogeneity. If that changes to be heterogenous, then a more general and less optimised form of internal organisation can be chosen dynamically to accomodate.
I realise the overhead if types and methods of access changed all the time, but does it?
t
P.S. The irony of my apparently having no thoughts about language design is choking me.