Proposal: Array#to_h, to simplify hash generation

Hi -talk,

Ruby has wonderful support for chewing and spitting arrays. For
instance, it’s easy to produce an array from any Enumerable using
#map. With hashes, however, it’s a bit more cumbersome.

For example, the following method is typical of my code:

  # return { filename -> size }

  def get_local_gz_files
    files = {}
    Dir["*.gz"].each do |filename|
      files[filename] = File.stat(filename).size
    end
    files
  end

The pattern is: create an empty hash, populate it, and return it. Now
Ruby is a wonderfully expressive and terse language. Accordingly, the
two lines devoted to initialising and returning the hash in the above
code seem wasted.

If Ruby had Array#to_h, then I could rewrite it as:

  # return { filename -> size }

  def get_local_gz_files
    Dir["*.gz"].map { |filename|
      [ filename, File.stat(filename).size ]
    }.to_h
  end

The proposed implementation of Array#to_h is per the following code:

  class Array
    def to_h
      hash = {}
      self.each do |elt|
        raise TypeError unless elt.is_a? Array
        # take the first two items of each element as key and value
        key, value = elt[0..1]
        hash[key] = value
      end
      hash
    end
  end

For the final justification, note that this is the logical reverse of
Hash#to_a:

h = {:x => 5, :y => 10, :z => -1 }
a = h.to_a # => [[:z, -1], [:x, 5], [:y, 10]]

And now, for my next trick…

a.to_h == h # => true (gosh, that actually worked)

Thoughts?

Gavin

  def get_local_gz_files
    files = {}
    Dir["*.gz"].each do |filename|
      files[filename] = File.stat(filename).size
    end
    files
  end

svg% cat b.rb
#!/usr/bin/ruby
def get_local_c_files
   Hash[*Dir["*.c"].map do |filename|
      [filename, File.stat(filename).size]
   end.flatten]
end
p get_local_c_files
svg%

svg% b.rb
{"st.c"=>10714, "range.c"=>10706, "enum.c"=>11250, "util.c"=>22676,
"sprintf.c"=>12332, "re.c"=>38877, "version.c"=>1094, "random.c"=>6485,
"object.c"=>34530, "class.c"=>17870, "main.c"=>988, "compar.c"=>2720,
"array.c"=>43170, "process.c"=>30792, "io.c"=>82748, "dln.c"=>39614,
"variable.c"=>35056, "time.c"=>32796, "string.c"=>69845, "regex.c"=>123352,
"numeric.c"=>36979, "inits.c"=>1765, "dmyext.c"=>20, "dir.c"=>21761,
"signal.c"=>13318, "pack.c"=>39965, "math.c"=>6199, "hash.c"=>39087,
"error.c"=>25114, "parse.c"=>348857, "ruby.c"=>22725, "marshal.c"=>27620,
"lex.c"=>4480, "bignum.c"=>34051, "struct.c"=>15141, "prec.c"=>1677,
"gc.c"=>34935, "file.c"=>58392, "eval.c"=>219839}
svg%

Guy Decoux

It does, almost:

irb(main):001:0> a = ["cat","one","dog","two"]
=> ["cat", "one", "dog", "two"]
irb(main):002:0> Hash[*a]
=> {"cat"=>"one", "dog"=>"two"}

I don’t remember seeing an exact inverse of Hash#to_a though, i.e. one which
converts [[a,b],[c,d]] to {a=>b, c=>d}

You can always ‘flatten’ your array, as long as the elements of the hash
you’re creating aren’t themselves arrays.
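
To make that caveat concrete, here is a small sketch of where flattening goes wrong. It assumes a Ruby version in which Array#flatten accepts a depth argument, which is not the Ruby being discussed in this thread, and the sample data is invented for illustration.

```ruby
# Invented sample data: the values are themselves arrays.
pairs = [["a", [1, 2]], ["b", [3, 4]]]

# A full flatten also flattens the values, so the key/value pairing is lost.
flat   = pairs.flatten          # ["a", 1, 2, "b", 3, 4]
broken = Hash[*flat]            # {"a"=>1, 2=>"b", 3=>4}

# Flattening by one level only (assumes flatten takes a depth argument)
# keeps each value intact.
intact = Hash[*pairs.flatten(1)]  # {"a"=>[1, 2], "b"=>[3, 4]}
```

The first result is silently wrong rather than an error, which is why the restriction matters.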

Regards,

Brian.

···

On Sat, Jul 19, 2003 at 11:22:20PM +0900, Gavin Sinclair wrote:

If Ruby had Array#to_h, then I could rewrite it as:

Hi,

···

In message “Proposal: Array#to_h, to simplify hash generation” on 03/07/19, Gavin Sinclair gsinclair@soyabean.com.au writes:

If Ruby had Array#to_h, then I could rewrite it as:

  # return { filename -> size }

  def get_local_gz_files
    Dir["*.gz"].map { |filename|
      [ filename, File.stat(filename).size ]
    }.to_h
  end

It has been proposed several times. The issues are

  • whether the name “to_h” is a good name or not. somebody came up
    with the name “hashify”. I’m not excited by both names.

  • what if the original array is not an assoc array (array of arrays
    of two elements). raise error? ignore?

    					matz.
    

Hi -talk,

Ruby has wonderful support for chewing and spitting arrays. For
instance, it’s easy to produce an array from any Enumerable using
#map. With hashes, however, it’s a bit more cumbersome.

For example, the following method is typical of my code:

  # return { filename -> size }

  def get_local_gz_files
    files = {}
    Dir["*.gz"].each do |filename|
      files[filename] = File.stat(filename).size
    end
    files
  end

One option, in this case, is to hijack the Hash#new block form:

files = Hash.new { |hash, key| hash[key] = File.stat(key).size }

files is now a “magic” hash that will stat any file that’s used as a key.
If you’re not just doing random access, you could fill it like so:

  Dir["*.gz"].each { |f| files[f] }

The block will be called to return a value instead of nil when a key is missing.
We assign to the hash to save that value as well. You can do all sorts of
weird stuff using this feature:

e = Hash.new { |h, k| eval(k) }
=> {}
e["Time.now"]
=> Tue Jul 22 13:32:56 MDT 2003

e["Time.now"]
=> Tue Jul 22 13:33:03 MDT 2003
e = Hash.new { |h, k| h[k] = eval(k) }
=> {}
e["Time.now"]
=> Tue Jul 22 13:35:09 MDT 2003

e["Time.now"]
=> Tue Jul 22 13:35:09 MDT 2003

Jason Creighton

···

On Sat, 19 Jul 2003 23:22:20 +0900 Gavin Sinclair gsinclair@soyabean.com.au wrote:

Yes, it’s a bit ugly, though, IMO. There’s a need for a nicer way.

Gavin

···

On Sunday, July 20, 2003, 12:41:40 AM, ts wrote:

  def get_local_gz_files
    files = {}
    Dir["*.gz"].each do |filename|
      files[filename] = File.stat(filename).size
    end
    files
  end

svg% cat b.rb
#!/usr/bin/ruby
def get_local_c_files
   Hash[*Dir["*.c"].map do |filename|
      [filename, File.stat(filename).size]
   end.flatten]
end
p get_local_c_files
svg%

Hi,

If Ruby had Array#to_h, then I could rewrite it as:

  # return { filename -> size }

  def get_local_gz_files
    Dir["*.gz"].map { |filename|
      [ filename, File.stat(filename).size ]
    }.to_h
  end

It has been proposed several times.

I thought it sounded familiar, but didn’t see an RCR.

The issues are

  • whether the name “to_h” is a good name or not. somebody came up
    with the name “hashify”. I’m not excited by both names.

#to_h sounds good to me - we already have to_s, to_a, to_i, etc. It’s
just too sweet that Hash#to_a and Array#to_h should be the inverse of
each other.

What don’t you like about #to_h?

#to_hash is fine by me too, but I don’t really know the nuances of
to_s/to_str, to_a/to_ary, …

  • what if the original array is not an assoc array (array of arrays
    of two elements). raise error? ignore?

Raise error. #to_h is clearly a method to be used with care. People
are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
could be the equivalent to Hash[1,2,3,4]. But then there’s the corner
case: [ [1,2], “x”, [7,8], “g” ].to_h.

I think I would insist on the input being an assoc array.
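
As a rough illustration of the raise-on-bad-input position, the sketch below uses a standalone helper called strict_to_h. That name is hypothetical and not a Ruby method, and the two-element size check is my own assumption: it is stricter than the implementation proposed at the top of the thread, which only checks that each element is an Array.

```ruby
# Hypothetical helper (not part of Ruby): builds a hash from an assoc
# array and rejects any element that is not a two-element array.
def strict_to_h(array)
  array.each_with_object({}) do |elt, hash|
    unless elt.is_a?(Array) && elt.size == 2
      raise TypeError, "expected a [key, value] pair, got #{elt.inspect}"
    end
    key, value = elt
    hash[key] = value
  end
end

ok = strict_to_h([[1, 2], [7, 8]])          # {1=>2, 7=>8}

begin
  strict_to_h([[1, 2], "x", [7, 8], "g"])   # the corner case from above
  rejected = false
rescue TypeError
  rejected = true                           # "x" is not a pair
end
```

With this policy the mixed array fails loudly at the first bad element instead of producing a half-built hash.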

Gavin

···

On Sunday, July 20, 2003, 1:31:42 AM, Yukihiro wrote:

In message “Proposal: Array#to_h, to simplify hash generation” on 03/07/19, Gavin Sinclair gsinclair@soyabean.com.au writes:

Hi,

I thought it sounded familiar, but didn’t see an RCR.

I don’t remember the RCR number. Search for “hashify”.

What don’t you like about #to_h?

I just didn’t feel we had consensus. Besides, “to_h” you’ve proposed
work for arrays with specific structure (assoc like).

#to_hash is fine by me too, but I don’t really know the nuances of
to_s/to_str, to_a/to_ary, …

Longer versions are for implicit conversion. An object that has
“to_str” works like a string if it’s given as an argument.
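
A minimal sketch of that distinction, for anyone who (like Gavin) has not met it before. Label and StringLike are made-up classes for illustration only.

```ruby
# Label only has the explicit converter, to_s.
class Label
  def initialize(text)
    @text = text
  end

  def to_s
    @text
  end
end

# StringLike also defines to_str, so methods expecting a String accept it.
class StringLike < Label
  def to_str
    @text
  end
end

implicit = "rating: " + StringLike.new("so-so")   # to_str is called for us

begin
  "rating: " + Label.new("so-so")                 # no to_str available
  explicit_only_failed = false
rescue TypeError
  explicit_only_failed = true
end

explicit = "rating: #{Label.new("so-so")}"        # interpolation uses to_s
```

So the short name is something you call yourself (or that interpolation calls), while the long name is a promise that the object can stand in for a String.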

Note we have “to_hash” already. But this would not be the reason for
“to_h”. We have “to_io” without the shorter version, for example.

  • what if the original array is not an assoc array (array of arrays
    of two elements). raise error? ignore?

Raise error. #to_h is clearly a method to be used with care. People
are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
could be the equivalent to Hash[1,2,3,4]. But then there’s the corner
case: [ [1,2], “x”, [7,8], “g” ].to_h.

I think I would insist on the input being an assoc array.

TypeError? or ArgumentError?

I just remembered that I thought Hash[ary] might be the better
solution. I’m not sure why I didn’t implement it. I have very loose
memory.
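
For anyone reading this thread on a current Ruby: as far as I know, both ideas discussed here later became available, with Hash[] accepting a single array of pairs and Array#to_h in core. The sketch below assumes such a version (Ruby 2.1 or later) and does not describe the interpreter this thread is about.

```ruby
h = { :x => 5, :y => 10, :z => -1 }
pairs = h.to_a                 # array of [key, value] pairs

from_brackets = Hash[pairs]    # Hash[] given a single assoc array
from_to_h     = pairs.to_h     # the method proposed in this thread

# Both forms give back a hash equal to the original.
round_trip_ok = (from_brackets == h) && (from_to_h == h)
```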

						matz.
···

In message “Re: Proposal: Array#to_h, to simplify hash generation” on 03/07/20, Gavin Sinclair gsinclair@soyabean.com.au writes:

And we already have Array methods that assume an associative array.

m.

···

Gavin Sinclair gsinclair@soyabean.com.au wrote:

Raise error. #to_h is clearly a method to be used with care. People
are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
could be the equivalent to Hash[1,2,3,4]. But then there’s the corner
case: [ [1,2], “x”, [7,8], “g” ].to_h.

I think I would insist on the input being an assoc array.

It’s #12. Interesting: I like the #hashify idea better than my proposal.

My original code could be written

  # return { filename -> size }

  def get_local_gz_files
    Dir["*.gz"].to_hash { |filename| File.stat(filename).size }
  end

That does away with the intermediate assoc array, and is overall very
elegant. Best of all, it can be used with any Enumerable type, and it
doesn’t have any requirement on the structure of the receiver.

module Enumerable
def to_hash
result = {}
each do |elt|
result[elt] = yield(elt)
end
result
end
end

That is capturing the very idiom I have repeated so many times.

Alternatives to #to_hash are:
hashify (the original and the worst :-)
map_hash
hash_map (it is, after all, mapping a collection into a hash)

I think I like “map_hash” the best.

  ["cat", "dog", "mouse"].map { |s| s.length }
  # -> [3, 3, 5]

  ["cat", "dog", "mouse"].map_hash { |s| s.length }
  # -> {"cat"=>3, "mouse"=>5, "dog"=>3}

Gavin

···

On Sunday, July 20, 2003, 2:56:08 AM, Yukihiro wrote:

Hi,

In message “Re: Proposal: Array#to_h, to simplify hash generation” on 03/07/20, Gavin Sinclair gsinclair@soyabean.com.au writes:

I thought it sounded familiar, but didn’t see an RCR.

I don’t remember the RCR number. Search for “hashify”.

While we’re on the subject of “to_s” and similar friends, I have a
question about Array#to_s and Hash#to_s.

I’ve never found them very useful the way they work by default:

irb(main):026:0> [1, 2, 3, 4].to_s
"1234"

irb(main):028:0> {1 =>2, 3 => 4}.to_s
"1234"

The main problem here is that Array#to_s calls join with the default
field separator, which for some reason is “”. To me, this isn’t
intuitive. Is there some historical reason why this behavior exists?
Even less intuitive to me is Hash#to_s, because the way the conversion
is done you lose any concept it was a hash.

Both of these default behaviors can be changed by setting $,

irb(main):033:0> $, = ", "
", "
irb(main):034:0> [1, 2, 3, 4].to_s
"1, 2, 3, 4"
irb(main):035:0> {1 =>2, 3 => 4}.to_s
"1, 2, 3, 4"

This is great for an array, but less great for a hash, you still lose
the key=>value association.

The main issue I have with these default to_s calls is that you seem to
lose a lot of information in the conversion. On the other hand, I
think the output from Array#inspect and Hash#inspect is great. It’s
easily readable and contains all the info I want.

I’m having trouble coming up with a good example of where having a
reasonable output from some_random_object.to_s should be useful, but
how about this:

def giveRating(obj)
  puts "My rating for the movie is: #{obj}"
end

irb(main):049:0> giveRating("***")
My rating for the movie is: ***
nil
irb(main):050:0> giveRating("so-so")
My rating for the movie is: so-so
nil
irb(main):051:0> giveRating(7)
My rating for the movie is: 7
nil
irb(main):052:0> giveRating(["good plot", "bad writing"])
My rating for the movie is: good plotbad writing
nil
irb(main):053:0> giveRating({"acting" => 5, "music" => "awful"})
My rating for the movie is: musicawfulacting5
nil

It would be more intuitive to me if the complete default output were
something like:

irb(main):052:0> giveRating(["good plot", "bad writing"])
My rating for the movie is: good plot, bad writing
nil
irb(main):053:0> giveRating({"acting" => 5, "music" => "awful"})
My rating for the movie is: music => awful, acting => 5
nil

Any thoughts, comments, harsh criticism?

Ben

···

On Saturday, July 19, 2003, at 12:56 PM, Yukihiro Matsumoto wrote:

#to_hash is fine by me too, but I don’t really know the nuances of
to_s/to_str, to_a/to_ary, …

Longer versions are for implicit conversion. An object that has
“to_str” works like a string if it’s given as an argument.

I just didn’t feel we had consensus. Besides, “to_h” you’ve proposed
work for arrays with specific structure (assoc like).

Far be it from me to say anything of much value, but I definitely think
that an instance function of Class Array should have a defined behavior
for all Arrays. Is there any argument to the contrary?

-Kurt

Hi,

···

In message “Re: Proposal: Array#to_h, to simplify hash generation” on 03/07/20, Martin DeMello martindemello@yahoo.com writes:

I think I would insist on the input being an assoc array.

And we already have Array methods that assume an associative array.

I think you mean assoc and rassoc. But they are look-up methods. No
harm would happen for non assoc input for them. I feel like Hash
creation is little bit different.

						matz.

The main problem here is that Array#to_s calls join with the default
field separator, which for some reason is “”. To me, this isn’t
intuitive. Is there some historical reason why this behavior exists?
Even less intuitive to me is Hash#to_s, because the way the conversion
is done you lose any concept it was a hash.

It’s intuitive because it’s the opposite of taking a string and putting
each character as an element of an array.

"foobar" -> ['f','o','o','b','a','r'] -> "foobar"

If you want a different .to_s you can just join with something else.
It’s pretty easy to just do foobararray.join(',') if you want
"f,o,o,b,a,r", and additionally it’s a little easier to read.
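
Spelled out as code, the round trip looks roughly like this. Using split("") to get the characters is my choice for the sketch; the variable names are illustrative.

```ruby
chars = "foobar".split("")    # ["f", "o", "o", "b", "a", "r"]

back   = chars.join           # no separator given: "foobar"
commas = chars.join(",")      # explicit separator: "f,o,o,b,a,r"
```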

-Kurt

pack, assoc, and rassoc

···

On Sun, 20 Jul 2003 03:20:50 +0900, Kurt M. Dresner wrote:

I just didn’t feel we had consensus. Besides, “to_h” you’ve proposed
work for arrays with specific structure (assoc like).

Far be it from me to say anything of much value, but I definitely think
that an instance function of Class Array should have a defined behavior
for all Arrays. Is there any argument to the contrary?

-Kurt

Hi,

The main issue I have with these default to_s calls is that you seem to
lose a lot of information in the conversion. On the other hand, I
think the output from Array#inspect and Hash#inspect is great. It’s
easily readable and contains all the info I want.

Hmm, “to_s” means “stringify”, that is making a string out of an
object. It may or may not produce human readable output.

I’m having trouble coming up with a good example of where having a
reasonable output from some_random_object.to_s should be useful, but
how about this:

I don’t deny your desire. It is understandable. But:

  • I don’t think we will change “to_s” behavior for compatibility
    reason. we may add a new string converter method for this
    purpose.

  • you have to define your desire in detail; purpose, behavior, and
    corner cases, to show us it’s generic enough to add in the core.

Note you can add methods very easily even to the predefined classes in
Ruby.

						matz.
···

In message “Array and Hash to_s” on 03/07/20, Ben Giddings ben@thingmagic.com writes:

Actually, I’ve always felt those were out of place in Array too. And if
they were factored out into an AssocArray mixin, we could conveniently
put hashify there.

martin

···

Yukihiro Matsumoto matz@ruby-lang.org wrote:

In message “Re: Proposal: Array#to_h, to simplify hash generation” on 03/07/20, Martin DeMello martindemello@yahoo.com writes:

And we already have Array methods that assume an associative array.

I think you mean assoc and rassoc. But they are look-up methods. No
harm would happen for non assoc input for them. I feel like Hash
creation is little bit different.

Aha! So that’s it. But that means it’s intuitive to people who think
of strings as character arrays. For people who have always thought of
Strings as objects rather than arrays, it’s less intuitive. Also, for
the vast majority of arrays which don’t contain characters the behavior
isn’t expected.

What about the Hash#to_s method?

Ben

···

On Saturday, July 19, 2003, at 02:23 PM, Kurt M. Dresner wrote:

The main problem here is that Array#to_s calls join with the default
field separator, which for some reason is “”. To me, this isn’t
intuitive. Is there some historical reason why this behavior exists?
Even less intuitive to me is Hash#to_s, because the way the conversion
is done you lose any concept it was a hash.

It’s intuitive because it’s the opposite of taking a string and putting
each character as an element of an array.

"foobar" -> ['f','o','o','b','a','r'] -> "foobar"

Hi –

And we already have Array methods that assume an associative array.

I think you mean assoc and rassoc. But they are look-up methods. No
harm would happen for non assoc input for them. I feel like Hash
creation is little bit different.

Actually, I’ve always felt those were out of place in Array too. And if
they were factored out into an AssocArray mixin, we could conveniently
put hashify there.

But the special case of converting an associative array to a hash is
different from the “classic” (in terms of volume of ruby-talk devoted
to it, and how long we’ve been discussing it :-) array-to-hash
conversion, as per RCR #12 and its definition of “hashify” (a term I
proposed reluctantly, knowing people would hate it :-) but it seemed
the most accurate for what I was describing). Modularization is a
good idea, though, particularly for the various home-grown
[{to_(h}ash]ify) variants in circulation, though organizing that kind
of thing community-wide is something I’ve never figured out how to do.

David

···

On Mon, 21 Jul 2003, Martin DeMello wrote:

Yukihiro Matsumoto matz@ruby-lang.org wrote:

In message “Re: Proposal: Array#to_h, to simplify hash generation” on 03/07/20, Martin DeMello martindemello@yahoo.com writes:


David Alan Black
home: dblack@superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Also, for
the vast majority of arrays which don’t contain characters the behavior
isn’t expected.

I think you may be making some strong assumptions about the vast majority of
arrays :-) I don’t really know, but my guess is that by now, changing this
behavior could have a big impact on existing code, especially code that is
concerned with processing data per character.

And one of the great things about Ruby, is you can redefine this behavior,
put it in a personal lib and then always require that lib:

# code not tested

class Array
  def to_s
    res = ''
    each do |element|
      res << ', ' if !res.empty?
      res << element.to_s   # to_s so non-string elements can be appended
    end
    res
  end
end

class Hash
def to_s
# …
end
end
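
One possible shape for this, written as separately named methods so that the built-in to_s is left alone. It is not a reconstruction of the elided code above: to_readable is a hypothetical name, and the formatting is just one guess at the output asked for earlier in the thread.

```ruby
class Array
  # Hypothetical method: elements separated by ", ".
  def to_readable
    map { |element| element.to_s }.join(", ")
  end
end

class Hash
  # Hypothetical method: "key => value" pairs separated by ", ".
  def to_readable
    map { |key, value| "#{key} => #{value}" }.join(", ")
  end
end

array_text = ["good plot", "bad writing"].to_readable
hash_text  = { "acting" => 5 }.to_readable
```

With more than one key, the order of the pairs in the output follows whatever order the hash iterates in.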

···


Chris
http://clabs.org/blogki