Proposal: Array#to_h, to simplify hash generation

dblack@superlink.net wrote:

But the special case of converting an associative array to a hash is
different from the “classic” (in terms of volume of ruby-talk devoted
to it, and how long we’ve been discussing it :slight_smile: array-to-hash
conversion, as per RCR #12 and its definition of “hashify” (a term I
proposed reluctantly, knowing people would hate it :slight_smile: but it seemed
the most accurate for what I was describing). Modularization is a
good idea, though, particularly for the various home-grown
[{to_(h}ash]ify) variants in circulation, though organizing that kind
of thing community-wide is something I’ve never figured out how to do.

To me, ‘hashify’ implies taking an assoc array and converting it to hash
form (or perhaps the perl-influenced [a, b, c, d] → {a=>b, c=>d}). I
still can’t think of a name for the useful case :slight_smile: make_hash, perhaps …

*(1..10).make_hash {|i| f(i)}

or maybe the complementary hash_to and hash_from, where the block is
respectively the value and the key for the corresponding array entry :slight_smile:

martin

In article 004f01c34ec6$99a00c80$0100000a@chrismo,

class Array
  def to_s
    res = ''
    each do |element|
      res << ', ' if !res.empty?
      res << element
    end
    res
  end
end

It may be simpler to achieve this by saying:

class Array
  def to_s
    join(', ')
  end
end

Mike

···

Chris Morris chrismo@clabs.org wrote:

mike@stok.co.uk | The “`Stok’ disclaimers” apply.
http://www.stok.co.uk/~mike/ | GPG PGP Key 1024D/059913DA
mike@exegenix.com | Fingerprint 0570 71CD 6790 7C28 3D60
http://www.exegenix.com/ | 75D2 9EC4 C1C0 0599 13DA

To me, ‘hashify’ implies taking an assoc array and converting it to hash
form (or perhaps the perl-influenced [a, b, c, d] → {a=>b, c=>d}). I
still can’t think of a name for the useful case :slight_smile: make_hash, perhaps …

*(1..10).make_hash {|i| f(i)}

or maybe the complementary hash_to and hash_from, where the block is
respectively the value and the key for the corresponding array entry :slight_smile:

martin

I like “make_hash”. We would get more flexibility if “make_hash” insisted
on receiving two values for the block: one for key and one for value. In
one instance recently, I wanted to map the “filename” part of a data
object to the object itself. This, I think, is readable:

map = receipts.make_hash { |r| r.filename, r }

Whereas in my pet case of mapping filename to size, we have

map = filenames.make_hash { |fn| fn, File.stat(fn).size }

And your example comes out as

(1..10).make_hash { |i| i, f(i) }

(I don’t know why you put the asterisk there; Ranges are Enumerable.)

And, of course, the obligatory reference implementation:

module Enumerable
  def make_hash
    result = {}
    each do |elt|
      key, value = yield(elt)
      result[key] = value
    end
    result
  end
end
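
For what it's worth, the implementation above does run once the block returns a two-element array, which is the form the parser accepts. A minimal sketch, assuming a stand-in lambda `f` and made-up sample data (neither comes from the thread):

```ruby
module Enumerable
  # Same method as above: the block must return a [key, value] pair.
  def make_hash
    result = {}
    each do |elt|
      key, value = yield(elt)
      result[key] = value
    end
    result
  end
end

# f is a placeholder for whatever function the caller has in mind.
f = ->(i) { i * i }

squares = (1..5).make_hash { |i| [i, f.call(i)] }
lengths = %w[apple fig].make_hash { |w| [w, w.length] }
```

Wrapping the pair in brackets costs two characters and keeps the key and value visibly together.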

I’ve raised an RCR for this (#148).

Cheers,
Gavin

To me, ‘hashify’ implies taking an assoc array and converting it to hash
form (or perhaps the perl-influenced [a, b, c, d] → {a=>b, c=>d}). I
still can’t think of a name for the useful case :slight_smile: make_hash, perhaps …

*(1..10).make_hash {|i| f(i)}

or maybe the complementary hash_to and hash_from, where the block is
respectively the value and the key for the corresponding array entry :slight_smile:

I’d like the code to be something like this:

module Enumerable
  def to_h
    h = Hash.new
    if block_given?
      self.each { |e| h[e] = yield(e) }
    else
      self.each { |key, value| h[key] = value }
    end
    return h
  end
end

(1..5).to_h { |n| n*n }
=> {5=>25, 1=>1, 2=>4, 3=>9, 4=>16}
[ [1,2], [3,4] ].to_h
=> {1=>2, 3=>4}
[ [1,2], [3,4] ].to_h.to_a
=> [[1, 2], [3, 4]]
[ [1,2], [3,4] ].to_h.to_a.to_h
=> {1=>2, 3=>4}
Dir["/bin/d*"].to_h { |f| File.size(f) }
=> {"/bin/dnsdomainname"=>9332, "/bin/date"=>25728, "/bin/dd"=>29492, "/bin/dmesg"=>3924, "/bin/df"=>27368, "/bin/domainname"=>9332}

I would almost prefer [1,2,3,4].to_h => {1=>2, 3=>4}, but Hash#to_a returns a
nested array, so that’s what this code does. Plus it’s easier to implement. :slight_smile:

Jason Creighton

···

On Mon, 21 Jul 2003 05:32:25 GMT Martin DeMello martindemello@yahoo.com wrote:

Hi –

I like “make_hash”. We would get more flexibility if “make_hash” insisted
on receiving two values for the block: one for key and one for value. In
one instance recently, I wanted to map the “filename” part of a data
object to the object itself. This, I think, is readable:
map = receipts.make_hash { |r| r.filename, r }

Whereas in my pet case of mapping filename to size, we have

map = filenames.make_hash { |fn| fn, File.stat(fn).size }

And your example comes out as

(1..10).make_hash { |i| i, f(i) }

(Wouldn’t you have to wrap your two return values in an array to get
the above to parse?)

I’ve raised an RCR for this (#148).

I’m not sure how this differs from (rejected) RCR#12 (except for
having to return a key as well as a value).

David

···

On Mon, 21 Jul 2003, Gavin Sinclair wrote:


David Alan Black
home: dblack@superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

I like “make_hash”. We would get more flexibility if “make_hash” insisted
on receiving two values for the block: one for key and one for value. In
one instance recently, I wanted to map the “filename” part of a data
object to the object itself. This, I think, is readable:

map = receipts.make_hash { |r| r.filename, r }

It’d be nice if => were merely an alias for , so that we could say

.make_hash {|r| r.filename => r}

with two arguments, there’s always the risk of confusing them.

Perhaps .make_hash {|r| {r.filename => r}}, where it updates the hash
with the anon hash. Inefficient, though, and it has the ugly {{ }}.
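
The anonymous-hash idea can be sketched roughly as follows. The method name `make_hash_merging` and the `Receipt` struct are invented here for illustration; nothing in the thread defines them.

```ruby
module Enumerable
  # Hypothetical variant: the block returns a small hash, which is merged
  # into the result with Hash#update.
  def make_hash_merging
    result = {}
    each { |elt| result.update(yield(elt)) }
    result
  end
end

# Made-up data object standing in for the "receipts" mentioned above.
Receipt = Struct.new(:filename, :total)
receipts = [Receipt.new("a.txt", 10), Receipt.new("b.txt", 25)]

by_name = receipts.make_hash_merging { |r| { r.filename => r } }
```

Each iteration allocates a throwaway one-entry hash, which is the inefficiency mentioned above; the gain is that key and value cannot be swapped by accident.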

(I don’t know why you put the asterisk there; Ranges are Enumerable.)

Yeah, I forget that from time to time.

martin

···

Gavin Sinclair gsinclair@soyabean.com.au wrote:

You have that already:

irb(main):001:0> Hash[1,2,3,4]
=> {1=>2, 3=>4}
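
When the values already sit in an array, the same call works through a splat. A small sketch with made-up data; the pair form `Hash[pairs]` is an assumption about the reader's interpreter (later Ruby releases accept it) rather than something shown in this thread:

```ruby
flat  = [1, 2, 3, 4]
pairs = [[1, 2], [3, 4]]

# Splat the flat list into Hash[]; an odd number of elements raises ArgumentError.
from_flat = Hash[*flat]

# Pair form: one argument holding [key, value] arrays.
from_pairs = Hash[pairs]

# Going the other way: regroup a flat list into pairs.
regrouped = flat.each_slice(2).to_a
```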

Cheers,

Brian.

···

On Tue, Jul 22, 2003 at 01:02:53AM +0900, Jason Creighton wrote:

I would almost prefer [1,2,3,4].to_h => {1=>2, 3=>4}

Hi –

I like “make_hash”. We would get more flexibility if “make_hash” insisted
on receiving two values for the block: one for key and one for value. In
one instance recently, I wanted to map the “filename” part of a data
object to the object itself. This, I think, is readable:
map = receipts.make_hash { |r| r.filename, r }

Whereas in my pet case of mapping filename to size, we have

map = filenames.make_hash { |fn| fn, File.stat(fn).size }

And your example comes out as

(1..10).make_hash { |i| i, f(i) }

(Wouldn’t you have to wrap your two return values in an array to get
the above to parse?)

I was kinda hoping not, but so be it. The thin veneer of presenting
tested code vanishes before everyone’s eyes. I was surprised to
discover today that code (1) below works, but not code (2).

(1) def foo; 2,4; end
(2) def foo; return 2,4; end

I’ve raised an RCR for this (#148).

I’m not sure how this differs from (rejected) RCR#12 (except for
having to return a key as well as a value).

How did O.J. Simpson’s second trial differ from his first? :wink:

Anyway, I think returning a key as well as a value is a significant
difference:

  • much more flexible (I create all kinds of hashes all the time in
    my code, and could really use that flexibility)
  • less magical, more scrutable: having two values makes it clear what
    is going on, given that we’re dealing with a hash. With the
    single-value to_hash/hashify, I had to keep reminding myself what
    it meant; not so with the new “make_hash”.

Gavin

···

On Monday, July 21, 2003, 8:18:50 PM, dblack wrote:

On Mon, 21 Jul 2003, Gavin Sinclair wrote:

tested code vanishes before everyone's eyes. I was surprised to
discover today that code (1) below works, but not code (2).

  (1) def foo; 2,4; end
  (2) def foo; return 2,4; end

Well, you mean to say that (2) works but not (1), no?

Guy Decoux
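
The difference between the two forms can be checked directly. A minimal sketch; the method names `pair` and `bare` are made up, and the bare form goes through `eval` only because it would otherwise stop the whole file from parsing:

```ruby
def pair
  return 2, 4   # several values after return come back as one array
end

# Inside a block, next plays the role that return plays in a method.
block_value = [1].map { |i| next i, i * 2 }.first

# The bare "2,4" form is rejected by the parser.
bare_form_parses =
  begin
    eval("def bare; 2,4; end")
    true
  rescue SyntaxError
    false
  end
```

So a block that wants to hand back a key and a value either wraps them in brackets or uses `next key, value`.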

Can I suggest another name - “collect_hash” - since that’s basically what it
is?

collect {… return [x,y] } =>> [[a,b], [c,d], …]
collect_hash {… return [x,y] } =>> {a=>b, c=>d, …}

In which case it clearly belongs in Enumerable - see Gavin’s implementation
in [RubyTalk:76446]

It’s still not an inverse operation to Hash#to_a, and I think there could
still be value in that. You could simulate it of course, using

myhash = myarray.collect_hash { |pair| pair }

If we didn’t have Hash#to_a then it could also be implemented as

myarray = myhash.collect { |pair| pair }

But we do, so we don’t bother.
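
A rough sketch of the round trip described above, assuming collect_hash is implemented like Gavin's make_hash (the method does not exist in Ruby itself, and the sample data is made up):

```ruby
module Enumerable
  # Hypothetical collect_hash: like collect, but gathers [key, value] pairs.
  def collect_hash
    result = {}
    each do |elt|
      key, value = yield(elt)
      result[key] = value
    end
    result
  end
end

myarray = [["a", 1], ["b", 2]]

# The identity block turns an array of pairs into a hash ...
myhash = myarray.collect_hash { |pair| pair }

# ... and Hash#to_a takes it back again.
round_trip = myhash.to_a

# Because Hash#each yields pairs, swapping them inverts the hash.
inverted = myhash.collect_hash { |k, v| [v, k] }
```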

Regards,

Brian.

···

On Mon, Jul 21, 2003 at 11:34:39PM +0900, Gavin Sinclair wrote:

I’m not sure how this differs from (rejected) RCR#12 (except for
having to return a key as well as a value).

How did O.J. Simpson’s second trial differ from his first? :wink:

Anyway, I think returning a key as well as a value is a significant
difference:

  • much more flexible (I create all kinds of hashes all the time in
    my code, and could really use that flexibility)
  • less magical, more scrutable: having two values makes it clear what
    is going on, given that we’re dealing with a hash. With the
    single-value to_hash/hashify, I had to keep reminding myself what
    it meant; not so with the new “make_hash”.

Thanks. I really do get things wrong far too much.

Gavin

···

On Tuesday, July 22, 2003, 12:43:19 AM, ts wrote:

tested code vanishes before everyone’s eyes. I was surprised to
discover today that code (1) below works, but not code (2).

(1) def foo; 2,4; end
(2) def foo; return 2,4; end

Well, you mean to say that (2) works but not (1), no?

‘collect_hash’ sounds nice

p [1, 2, 3].collect_hash { |i| [i, "f(#{i})"] }
#=> {1=>"f(1)", 2=>"f(2)", 3=>"f(3)"}

Other possible names (less great):

  • build_hash
  • make_hash
  • to_hash
  • map_hash
  • combine(_to_hash)

May I suggest an Enum#build routine, working like this pseudo code

[1, 2, 3].build(Hash) { |i| i, f(i) }
#=> {1=>"f(1)", 2=>"f(2)", 3=>"f(3)"}

[1, 2, 3].build(String) { |i| i.class }
#=> "FixnumFixnumFixnum"

[1, 2, 3].build(Array) { |i| Regex.new(i.to_s) }
#=> [/1/, /2/, /3/]

Is this madness, appending elements this way?
Especially, will this work for Hash?

···

On Tue, 22 Jul 2003 00:59:50 +0900, Brian Candler wrote:

Can I suggest another name - “collect_hash” - since that’s basically what it
is?


Simon Strandgaard

Hmm, interesting idea: a way to delegate collection to different classes
without having to clutter up Enumerable with collect_hash, collect_string
etc.

If a class implements a standard “add element” method, perhaps <<, then how
about this:

class Array # repeat for module Enumerable as well
  def collect(dest=Array,*args)
    res = dest.new(*args)
    each do |item|
      res << yield(item)
    end
    res
  end
end

class Hash
  def <<(kvpair)
    self[kvpair[0]] = kvpair[1]
  end
end

a = [["one","two"],["three","four"],["five","six"]]

p a.collect {|e| e[0]} #>> ["one", "three", "five"]
p a.collect(String,"") {|e| e[1]} #>> "twofoursix"
p a.collect(Hash) {|e| e} #>> {"five"=>"six", "three"=>"four", "one"=>"two"}

The bit I don’t like is Hash#<< taking an array of two elements, rather than
having two arguments, but I guess that’s what you get from having a method
rather than a proc.

A minor quibble comes out of this: why doesn’t String.new with no arguments
just return an empty string? In 1.6.8 it gives an ArgumentError exception.

Your examples work, with minor modification:

def f(i) i*i; end
p [1, 2, 3].collect(Hash) { |i| [i, f(i)] }

{1=>1, 2=>4, 3=>9}

p [1, 2, 3].collect(String,"") { |i| i.class.to_s }

"FixnumFixnumFixnum"

p [1, 2, 3].collect(Array) { |i| Regexp.new(i.to_s) }

[/1/, /2/, /3/]

Cheers,

Brian.

···

On Tue, Jul 22, 2003 at 01:23:07AM +0900, Simon Strandgaard wrote:

May I suggest an Enum#build routine, working like this pseudo code

[1, 2, 3].build(Hash) { |i| i, f(i) }
#=> {1=>"f(1)", 2=>"f(2)", 3=>"f(3)"}

[1, 2, 3].build(String) { |i| i.class }
#=> "FixnumFixnumFixnum"

[1, 2, 3].build(Array) { |i| Regex.new(i.to_s) }
#=> [/1/, /2/, /3/]

Is this madness, appending elements this way?
Especially, will this work for Hash?

I forgot about using Method#arity. New version attached… Brian.

class Array # ditto for Enumerable
  def collect(dest=Array,*args)
    res = dest.new(*args)
    if res.method(:<<).arity == 1
      each do |item|
        res << yield(item)
      end
    else
      each do |item|
        res.send(:<<, *yield(item))
      end
    end
    res
  end
end

class Hash
  def <<(key,val)
    self[key] = val
  end
end

a = [["one","two"],["three","four"],["five","six"]]

p a.collect {|e| e[0]}
p a.collect(String,"") {|e| e[1]}
p a.collect(Hash) {|e| e}
p a.collect(Hash) {|k,v| [v,k]} # like Hash#invert

def f(i) i*i; end

p [1, 2, 3].collect(Hash) { |i| [i, f(i)] }
p [1, 2, 3].collect(String,"") { |i| i.class.to_s }
p [1, 2, 3].collect(Array) { |i| Regexp.new(i.to_s) }

···

On Tue, Jul 22, 2003 at 01:43:46AM +0900, Brian Candler wrote:

The bit I don’t like is Hash#<< taking an array of two elements, rather than
having two arguments

A minor quibble comes out of this: why doesn’t String.new with no arguments
just return an empty string? In 1.6.8 it gives an ArgumentError exception.

~$ ruby -v
ruby 1.8.0 (2003-06-23) [i686-linux]
~$ irb

String.new
=> ""

Jason Creighton

···

On Tue, 22 Jul 2003 01:43:46 +0900 Brian Candler B.Candler@pobox.com wrote:

Thanks. Must get 1.8pre installed somewhere.

Another idea for code just posted: change two lines as follows,

def collect(res=Array,*args)
res = res.new(*args) if res.is_a? Class

Then collect could append to an existing object as well:

[1, 2, 3].collect("fred") { |x| x.to_s} #>> "fred123"

h = {"one"=>"two"}
[["three","four"]].collect(h) {|x| x} #>> {"one"=>"two","three"=>"four"}

The dangerous thing is that you are modifying an existing object, whereas
collect normally generates a new instance of something.

But this pattern covers a whole lot of existing cases: e.g.

h1.update(h2)

is just a short form for

h2.collect(h1) {|x| x}

Perhaps “collect_into” would be a better name for this new method?

Cheers,

Brian.

···

On Tue, Jul 22, 2003 at 02:04:08AM +0900, Jason Creighton wrote:

A minor quibble comes out of this: why doesn’t String.new with no arguments
just return an empty string? In 1.6.8 it gives an ArgumentError exception.

~$ ruby -v
ruby 1.8.0 (2003-06-23) [i686-linux]
~$ irb

String.new
=> ""

Maybe it’s better to do the #new outside collect?

Like this:
[1, 2, 3].collect(String.new("Test: ")) { |i| i.class.to_s }

"Test: FixnumFixnumFixnum"

res = Hash.new
def f(v) v*v; end
[1, 2].map(res) {|i| [i, f(i)] }
[3, 4].map(res) {|i| [i, f(i-2)] }
p res # print the accumulated result

{1=>1, 2=>4, 3=>1, 4=>4}

Accumulating is good :slight_smile:

···

On Tue, 22 Jul 2003 02:57:40 +0900, Brian Candler wrote:

On Tue, Jul 22, 2003 at 01:43:46AM +0900, Brian Candler wrote:

The bit I don’t like is Hash#<< taking an array of two elements, rather than
having two arguments

I forgot about using Method#arity. New version attached… Brian.

class Array # ditto for Enumerable
def collect(dest=Array,*args)
res = dest.new(*args)


Simon Strandgaard

Yes, I worked that out in parallel with you :slight_smile: Thinking while walking home,
the pattern

foo.collect_into(bar) {|v| v}

seems to be pretty useful, so I think that should be the default if no block
is provided. Then arr.to_h would be just:

arr.collect_into({})

which seems pretty intuitive to me. And h1.update(h2) would be equivalent to

h2.collect_into(h1)

which is also pretty clear.

Since not all objects necessarily have a ‘<<’ method then maybe it’s worth
having that as a parameter as well. The attached code lets you do that, or
guesses at a suitable method if one isn’t provided.
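
The attachment itself is not reproduced here, so the following is only a guess at the shape of such a method, not the code from collect_into.rb. The fallback to `store`, the arity check, and all sample data are assumptions made for this sketch:

```ruby
module Enumerable
  # Hypothetical sketch only; NOT the attached collect_into.rb.
  def collect_into(dest, meth = nil)
    # Guess an append method when none is given: << if present, else store.
    meth ||= dest.respond_to?(:<<) ? :<< : :store
    single_arg = dest.method(meth).arity == 1
    each do |item|
      value = block_given? ? yield(item) : item   # no block: pass item through
      if single_arg
        dest.send(meth, value)
      else
        dest.send(meth, *value)                   # e.g. store(key, value)
      end
    end
    dest
  end
end

arr_to_h = [[1, 2], [3, 4]].collect_into({})

h1 = { "one" => "two" }
{ "three" => "four" }.collect_into(h1)            # behaves like h1.update(h2)

joined = [1, 2, 3].collect_into(String.new("fred")) { |x| x.to_s }
picked = [1, 2, 3].collect_into([], :unshift) { |x| x * 10 }
```

Note that the destination is modified in place and also returned, so passing a fresh `{}` or `String.new` is the way to get the non-destructive behaviour of plain collect.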

Cheers,

Brian.

collect_into.rb (2.04 KB)

···

On Tue, Jul 22, 2003 at 02:24:35AM +0900, Simon Strandgaard wrote:

On Tue, 22 Jul 2003 02:57:40 +0900, Brian Candler wrote:

On Tue, Jul 22, 2003 at 01:43:46AM +0900, Brian Candler wrote:

The bit I don’t like is Hash#<< taking an array of two elements, rather than
having two arguments

I forgot about using Method#arity. New version attached… Brian.

class Array # ditto for Enumerable
def collect(dest=Array,*args)
res = dest.new(*args)

Maybe it’s better to do the #new outside collect?

The bit I don’t like is Hash#<< taking an array of two elements, rather than
having two arguments

I forgot about using Method#arity. New version attached… Brian.

class Array # ditto for Enumerable
def collect(dest=Array,*args)
res = dest.new(*args)

Maybe it’s better to do the #new outside collect?

Yes, I worked that out in parallel with you :slight_smile: Thinking while walking home,
the pattern

:slight_smile:

Since not all objects necessarily have a ‘<<’ method then maybe it’s worth
having that as a parameter as well. The attached code lets you do that, or
guesses at a suitable method if one isn’t provided.

Good idea to test for different append methods

[snip code]

#collect_into / #map_into is nice.
How about abbreviating the name, so it’s just: #into

p [1, "2", 3].into({}) { |i| [i.class, i] }

{"Fixnum"=>[1, 3], "String"=>["2"]}

Important: Hash#append should not overwrite earlier entries.

···

On Tue, 22 Jul 2003 06:00:53 +0900, Brian Candler wrote:

On Tue, Jul 22, 2003 at 02:24:35AM +0900, Simon Strandgaard wrote:

On Tue, 22 Jul 2003 02:57:40 +0900, Brian Candler wrote:

On Tue, Jul 22, 2003 at 01:43:46AM +0900, Brian Candler wrote:


Simon Strandgaard

Hmm. What should it do then? Raise an exception? h1.update(h2) will
overwrite corresponding keys in h1, so there is a precedent for collect_into
working in that way.

I would rule out, say, automatically promoting a value into an array. It’s
too application-specific. If your application does in fact hold a hash of
arrays, then I think you’d code your append method accordingly; in other
cases, the structure may be different (a hash of hashes, say), in which case
the append method is different.
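
The three policies under discussion (overwrite, collect into arrays, refuse duplicates) might look like this in plain Ruby. The lambda `store_once` and the sample data are invented for illustration:

```ruby
# Policy 1: Hash#update overwrites; the later value for a shared key wins.
h1 = { "a" => 1, "b" => 2 }
h1.update({ "b" => 20, "c" => 30 })

# Policy 2: an application that wants a hash of arrays supplies its own
# append step, here via a default block that creates the array on demand.
groups = Hash.new { |hash, key| hash[key] = [] }
[1, "2", 3].each { |i| groups[i.class] << i }

# Policy 3: raise on a duplicate key instead of overwriting.
store_once = lambda do |hash, key, value|
  raise KeyError, "duplicate key: #{key.inspect}" if hash.key?(key)
  hash[key] = value
end

strict = {}
store_once.call(strict, :k, 1)
```

None of these is obviously the right default for a general collect_into, which is the argument for leaving the append method to the caller.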

Cheers,

Brian.

···

On Tue, Jul 22, 2003 at 05:46:35AM +0900, Simon Strandgaard wrote:

Important: Hash#append should not overwrite earlier entries.