Idiom for creating hash from two arrays

Hello all,

I'm currently using the following code for creating a hash from two arrays,
where one array contains keys and the other the corresponding values:

  keys = ['one','two','three']
  values = [1,2,3]

  h = Hash[*keys.zip(values).flatten]
  #=> {"three"=>3, "two"=>2, "one"=>1}

Can anyone suggest a better way of doing this? I'm slightly put off
by the many steps involved in my present solution.

Thanks.
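
For readers on a newer Ruby, here is a minimal sketch of two shorter spellings. It assumes Ruby 2.1 or later for Array#to_h, and a Ruby whose Hash[] accepts an array of pairs. It also shows one pitfall of the flatten version: values that are themselves arrays get flattened too, so the pairing is lost.

```ruby
keys   = ['one', 'two', 'three']
values = [1, 2, 3]

# Array#to_h turns an array of [key, value] pairs into a hash (Ruby 2.1+).
h1 = keys.zip(values).to_h

# Hash[] also accepts the array of pairs directly, so no splat or flatten is needed.
h2 = Hash[keys.zip(values)]

# Pitfall of the flatten version: nested arrays in the values are flattened as well.
nested   = [[1, 2], 3, 4]
pairs_ok = keys.zip(nested).to_h        # 'one' still maps to [1, 2]
flat     = keys.zip(nested).flatten     # 7 elements, no longer key/value pairs
```

Passing `flat` to `Hash[*flat]` raises an ArgumentError here because the number of elements is odd. With an even count it would silently build the wrong hash.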

"Jonathan Paisley" <jp-www@dcs.gla.ac.uk> wrote in news post
news:pan.2004.12.07.13.40.03.395907@dcs.gla.ac.uk...

Hello all,

I'm currently using the following code for creating a hash from two arrays,
where one array contains keys and the other the corresponding values:

  keys = ['one','two','three']
  values = [1,2,3]

  h = Hash[*keys.zip(values).flatten]
  #=> {"three"=>3, "two"=>2, "one"=>1}

Can anyone suggest a better way of doing this? I'm slightly put off
by the many steps involved in my present solution.

Here are some others - but IMHO your solution is much nicer:

>> keys = ['one','two','three']
=> ["one", "two", "three"]
>> values = [1,2,3]
=> [1, 2, 3]
>> h={}
=> {}
>> keys.each_with_index {|e,i| h[e]=values[i]}
=> ["one", "two", "three"]
>> h
=> {"three"=>3, "two"=>2, "one"=>1}

>> keys = ['one','two','three']
=> ["one", "two", "three"]
>> values = [1,2,3]
=> [1, 2, 3]
>> h={}
=> {}
>> keys.inject(values) do |v,k|
?> h[k]=v.shift
>>   v
>> end
=> []
>> h
=> {"three"=>3, "two"=>2, "one"=>1}

Kind regards

    robert

Of these (including my original), I think the each_with_index solution
is the most transparent and readable, but I don't like having to
separately initialize h beforehand. I think I'll stick with what I've
got for now, or perhaps implement:

def Hash.from_pairs(keys,values) # or some better name
  h = {}
  keys.zip(values) do |k,v|
    h[k] = v
  end
  h
end

(using zip with a block has just occurred to me)
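
A quick usage sketch of the method above, including what happens when the two arrays differ in length. The method is repeated so the snippet runs on its own; the sample data is made up for illustration.

```ruby
def Hash.from_pairs(keys, values)
  h = {}
  # With a block, zip yields each [key, value] pair instead of building a new array.
  keys.zip(values) { |k, v| h[k] = v }
  h
end

full  = Hash.from_pairs(['one', 'two', 'three'], [1, 2, 3])
short = Hash.from_pairs(['one', 'two', 'three'], [1])   # missing values become nil
long  = Hash.from_pairs(['one'], [1, 2, 3])             # surplus values are dropped
```

Since zip is driven by the receiver, the keys array decides how many entries the hash gets.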

I did some benchmark tests of all four implementations (my original,
your two, and the above). The last is about twice as fast as
each_with_index and inject, and at least an order of magnitude faster
than my original method involving flattening.

Many thanks for your input! Most appreciated.

On Tue, 07 Dec 2004 15:08:50 +0100, Robert Klemme wrote:

"Jonathan Paisley" <jp-www@dcs.gla.ac.uk>

  h = Hash[*keys.zip(values).flatten]
  #=> {"three"=>3, "two"=>2, "one"=>1}

Can anyone suggest a better way of doing this? I'm slightly put off by
the many steps involved in my present solution.

Here are some others - but IMHO your solution is much nicer:

>> h={}
=> {}
>> keys.each_with_index {|e,i| h[e]=values[i]}
=> ["one", "two", "three"]

>> h={}
=> {}
>> keys.inject(values) do |v,k|
?> h[k]=v.shift
>>   v
>> end

Jonathan Paisley wrote:

I did some benchmark tests of all four implementations (my original, your two, and the above). The last is about twice as fast as each_with_index and inject, and at least an order of magnitude faster than my original method involving flattening.

Try this implementation, too:

hash = {}
keys.size.times { |i| hash[ keys[i] ] = values[i] }

--
Florian Frank

zipnew ?

T.

On Tuesday 07 December 2004 12:42 pm, Jonathan Paisley wrote:

def Hash.from_pairs(keys,values) # or some better name
  h = {}
  keys.zip(values) do |k,v|
    h[k] = v
  end
  h
end

Below are some benchmark results from four different implementations
(excluding the *zip.flatten approach, since it's way too slow). The
key and value arrays just contain integers.

The second set of results has GC disabled, and the GC time after
each run is included. It's interesting that the times are so different:
your size.times approach is under 3.5 seconds either way, but the
other approaches are very much slower when GC is enabled. I assume this
means that the other three are causing a lot of temporary objects to
be created.

** I suppose these objects could be the pair argument arrays for each of
the block invocations. Can anybody confirm this or suggest otherwise?
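
One hedged way to check the temporary-object theory on a current Ruby is to count allocations around each variant. This sketch assumes Ruby 2.2 or later for `GC.stat(:total_allocated_objects)`; the `allocations` helper is a made-up name, and the numbers for the 1.8 interpreter discussed in this thread may well differ.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

n      = 10_000
keys   = (0...n).to_a
values = keys.dup

counts = {
  'each_with_index' => allocations { h = {}; keys.each_with_index { |e, i| h[e] = values[i] } },
  'zip with block'  => allocations { h = {}; keys.zip(values) { |k, v| h[k] = v } },
  'size.times'      => allocations { h = {}; keys.size.times { |i| h[keys[i]] = values[i] } }
}

# Print one line per variant; a count near n suggests one temporary object per pair.
counts.each { |label, count| puts format('%-16s %d', label, count) }
```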

I realise now that my original tests (where I said that the
zip-with-block solution was twice as fast as each_with_index and
inject) were flawed since I wasn't using a large enough data set.

                          user     system      total        real
each_with_index      28.090000   0.110000  28.200000 ( 28.794794)
inject               27.180000   0.040000  27.220000 ( 27.742072)
zip with block       27.610000   0.030000  27.640000 ( 28.192968)
size.times            3.270000   0.060000   3.330000 (  3.381180)

                          user     system      total        real
each_with_index       4.040000   0.260000   4.300000 (  4.421269)
each_with_index(GC)   0.720000   0.000000   0.720000 (  0.767091)
inject                4.470000   0.090000   4.560000 (  4.760866)
inject(GC)            1.150000   0.010000   1.160000 (  1.157983)
zip with block        3.500000   0.000000   3.500000 (  3.630590)
zip with block(GC)    1.130000   0.000000   1.130000 (  1.136200)
size.times            2.920000   0.010000   2.930000 (  3.017372)
size.times(GC)        0.470000   0.000000   0.470000 (  0.478058)

On Wed, 08 Dec 2004 03:15:10 +0900, Florian Frank wrote:

Jonathan Paisley wrote:

I did some benchmark tests of all four implementations (my original, your
two, and the above). The last is about twice as fast as each_with_index
and inject, and at least an order of magnitude faster than my original
method involving flattening.

Try this implementation, too:

hash = {}
keys.size.times { |i| hash[ keys[i] ] = values[i] }

=================================================================
require 'benchmark'

n = 1000000
keys = (0...n).to_a
values = keys.dup

def Hash.from_pairs_a(keys,values)
  h = {}
  keys.each_with_index {|e,i| h[e] = values[i]}
  h
end

def Hash.from_pairs_b(keys,values)
  Hash[*keys.zip(values).flatten]
end

def Hash.from_pairs_c(keys,values)
  h = {}
  keys.inject(values) do |v,k|
    h[k] = v.shift
    v
  end
  h
end

def Hash.from_pairs_d(keys,values)
  h = {}
  keys.zip(values) do |k,v|
    h[k] = v
  end
  h
end

def Hash.from_pairs_e(keys,values)
  hash = {}
  keys.size.times { |i| hash[ keys[i] ] = values[i] }
  hash
end

$no_gc = ARGV[0]

Benchmark.bm(20) do |x|
  def x.gcreport(label,&block)
    if $no_gc then GC.enable; GC.start; GC.disable; end
    report(label,&block)
    if $no_gc then
      report(label + "(GC)") { GC.enable; GC.start; GC.disable}
    end
  end
  x.gcreport("each_with_index") { Hash.from_pairs_a(keys,values) }
  #x.report("*zip.flatten") { Hash.from_pairs_b(keys,values) }
  x.gcreport("inject") { Hash.from_pairs_c(keys,values) }
  x.gcreport("zip with block") { Hash.from_pairs_d(keys,values) }
  x.gcreport("size.times") { Hash.from_pairs_e(keys,values) }
end
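
One caveat about the script above, as far as I can tell: from_pairs_c shifts elements off the very array it is handed, so `values` is empty once the inject run has finished, and the reports that follow it pair the keys with nil. A small sketch showing the effect, plus a variant that works on a copy (the `_copy` name and the dup are my assumption, not part of the original script):

```ruby
def Hash.from_pairs_c(keys, values)
  h = {}
  keys.inject(values) do |v, k|
    h[k] = v.shift   # removes the first element from the caller's array
    v
  end
  h
end

# Hypothetical variant: shift from a copy so the caller's array survives.
def Hash.from_pairs_c_copy(keys, values)
  h = {}
  keys.inject(values.dup) do |v, k|
    h[k] = v.shift
    v
  end
  h
end

keys   = [0, 1, 2]
values = [0, 1, 2]

Hash.from_pairs_c_copy(keys, values)
after_copy = values.dup    # still holds all three elements

Hash.from_pairs_c(keys, values)
after_shift = values.dup   # now empty
```

The copy adds its own cost, so timings taken with it are not directly comparable to the figures posted earlier.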

Hash.zipnew ... I quite like the similarity to Hash.new, and the inclusion
of the already familiar zip idea is also a good thing.

Thinking again about the name 'from_pairs' makes me think that it would
take a single array of pairs and make a hash from them (equivalent to
Hash[*pairs.flatten]).

So, I think I'll take your suggestion!

Thanks.

On Wed, 08 Dec 2004 05:43:42 +0900, trans. (T. Onoma) wrote:

On Tuesday 07 December 2004 12:42 pm, Jonathan Paisley wrote:
> def Hash.from_pairs(keys,values) # or some better name

zipnew ?

Jonathan Paisley wrote:

On Wed, 08 Dec 2004 05:43:42 +0900, trans. (T. Onoma) wrote:

On Tuesday 07 December 2004 12:42 pm, Jonathan Paisley wrote:

def Hash.from_pairs(keys,values) # or some better name

zipnew ?

Hash.zipnew ... I quite like the similarity to Hash.new, and the inclusion
of the already familiar zip idea is also a good thing.

Why does it need a new name? Instead, why not extend Hash.new to do this when it is passed two arrays? It seems pretty natural to me, and I don't see any conflict with existing usage.

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

Hi --

On Wed, 8 Dec 2004, Glenn Parker wrote:

Jonathan Paisley wrote:

On Wed, 08 Dec 2004 05:43:42 +0900, trans. (T. Onoma) wrote:

On Tuesday 07 December 2004 12:42 pm, Jonathan Paisley wrote:
> def Hash.from_pairs(keys,values) # or some better name

zipnew ?

Hash.zipnew ... I quite like the similarity to Hash.new, and the inclusion
of the already familiar zip idea is also a good thing.

Why does it need a new name? Instead, why not extend Hash.new to do this when it is passed two arrays? It seems pretty natural to me, and I don't see any conflict with existing usage.

What if you wanted to set a default value for the hash?

David

--
David A. Black
dblack@wobblini.net

I guess b/c Hash.new already takes a hash-default. Perhaps with keyword
arguments...

T.

On Wednesday 08 December 2004 08:30 am, Glenn Parker wrote:

Jonathan Paisley wrote:
> On Wed, 08 Dec 2004 05:43:42 +0900, trans. (T. Onoma) wrote:
>>On Tuesday 07 December 2004 12:42 pm, Jonathan Paisley wrote:
>>> def Hash.from_pairs(keys,values) # or some better name
>>
>>zipnew ?
>
> Hash.zipnew ... I quite like the similarity to Hash.new, and the
> inclusion of the already familiar zip idea is also a good thing.

Why does it need a new name? Instead, why not extend Hash.new to do
this when it is passed two arrays? It seems pretty natural to me, and I
don't see any conflict with existing usage.

That's certainly a possibility. I suppose the disadvantage of this is that
the semantics would then be quite different between one and two arguments:

1 argument : specify the default value
2 arguments : populate the hash with pairings of keys and values

I think it would be more natural to extend the existing Hash[] method, if
that didn't cause a conflict with the existing usage. The question, then,
is whether you'd ever want to construct a hash with just one entry, whose
key and value are arrays.

  h = Hash[keys,values]

I suppose if you did want the one-entry-hash, you could always do:

  h = Hash[key=>value]
or even just
  h = { key=>value }
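
For reference, a short sketch of what the two-argument form already does today, which is the conflict in question: Hash[] reads two arguments as one key followed by one value.

```ruby
keys   = ['one', 'two', 'three']
values = [1, 2, 3]

# Two arguments are treated as a single key/value pair: the whole keys array
# becomes the key and the whole values array becomes the value.
single = Hash[keys, values]

# The pairwise reading needs the zipped form instead.
paired = Hash[keys.zip(values)]
```

So `single` has exactly one entry, while `paired` has three.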

On Wed, 08 Dec 2004 22:30:08 +0900, Glenn Parker wrote:

Jonathan Paisley wrote:

Hash.zipnew ... I quite like the similarity to Hash.new, and the inclusion
of the already familiar zip idea is also a good thing.

Why does it need a new name? Instead, why not extend Hash.new to do
this when it is passed two arrays? It seems pretty natural to me, and I
don't see any conflict with existing usage.

How do you set the default value after the fact?

  h = {}
  h.default=x #?

Personally I have never liked how Hash.new works. I've always felt that #new
for core classes should be able to take an instance of their own class:

  h1 = { :a=>1, :b=>2 }
  h2 = Hash.new(h1)

Default makes more sense as an option:

  h2 = Hash.new(h1, :default=>0)

Although that takes away one somewhat useful construction pattern, namely

  h2 = Hash.new(:a=>1, :b=>2, :default=>0) #would not work

Man, how cool might it be if Ruby had a secondary argument divider to group
sets of parameters.

T.

On Wednesday 08 December 2004 09:02 am, Jonathan Paisley wrote:

On Wed, 08 Dec 2004 22:30:08 +0900, Glenn Parker wrote:
> Jonathan Paisley wrote:
>> Hash.zipnew ... I quite like the similarity to Hash.new, and the
>> inclusion of the already familiar zip idea is also a good thing.
>
> Why does it need a new name? Instead, why not extend Hash.new to do
> this when it is passed two arrays? It seems pretty natural to me, and I
> don't see any conflict with existing usage.

That's certainly a possibility. I suppose the disadvantage of this is that
the semantics would then be quite different between one and two arguments:

1 argument : specify the default value
2 arguments : populate the hash with pairings of keys and values

I think it would be more natural to extend the existing Hash[] method, if
that didn't cause a conflict with the existing usage. The question, then,
is whether you'd ever want to construct a hash with just one entry, whose
key and value are arrays.

  h = Hash[keys,values]

I suppose if you did want the one-entry-hash, you could always do:

  h = Hash[key=>value]
or even just
  h = { key=>value }

David A. Black wrote:

Why does it need a new name? Instead, why not extend Hash.new to do this when it is passed two arrays? It seems pretty natural to me, and I don't see any conflict with existing usage.

What if you wanted to set a default value for the hash?

Hash.new is already variadic:

      Hash.new => hash
      Hash.new(obj) => aHash
      Hash.new {|hash, key| block } => aHash

The second form specifies a default value. The third form specifies a block to generate missing values. I was suggesting that we add one more form:

      Hash.new(keyArray, valueArray) => aHash

This form is easily distinguishable from the original three, so you could still create a hash with a default value using the second or third form.

However, the more I think about it, the more I see this as a typical operation on existing hashes, not simply a constructor. So, I would rather see alternate forms for merge and update:

      Hash#merge(keyArray, valueArray)
      Hash#update(keyArray, valueArray)

Then the original problem would be expressed as:

      h = Hash.new.merge(keyArray, valueArray)
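
A rough sketch of what such an update could look like. Hash#merge and Hash#update do not accept a key array and a value array, so the name `update_pairs` is hypothetical, and it is written as a plain helper rather than as a change to Hash.

```ruby
# Hypothetical helper: add key/value pairings to an existing hash, in place.
def update_pairs(hash, keys, values)
  keys.zip(values) { |k, v| hash[k] = v }
  hash
end

h = { 'zero' => 0 }
update_pairs(h, ['one', 'two'], [1, 2])

# Building a fresh hash is then just an update of an empty one.
fresh = update_pairs({}, ['one', 'two'], [1, 2])
```

Returning the hash keeps the call chainable, in the same way Hash#update returns its receiver.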

On Wed, 8 Dec 2004, Glenn Parker wrote:

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

How do you set the default value after the fact?

  h = {}
  h.default=x #?

Yes.

Default makes more sense as an option:

  h2 = Hash.new(h1, :default=>0)

I'm quite happy with being able to do h.default=. It's certainly clearer
than the single argument to Hash.new.

On Wed, 08 Dec 2004 23:13:12 +0900, trans. (T. Onoma) wrote:

Hi --

David A. Black wrote:

Why does it need a new name? Instead, why not extend Hash.new to do this when it is passed two arrays? It seems pretty natural to me, and I don't see any conflict with existing usage.

What if you wanted to set a default value for the hash?

Hash.new is already variadic:

    Hash.new => hash
    Hash.new(obj) => aHash
    Hash.new {|hash, key| block } => aHash

The second form specifies a default value. The third form specifies a block to generate missing values. I was suggesting that we add one more form:

    Hash.new(keyArray, valueArray) => aHash

This form is easily distinguishable from the original three, so you could still create a hash with a default value using the second or third form.

But what about a hash from two arrays with a default value?

However, the more I think about it, the more I see this as a typical operation on existing hashes, not simply a constructor. So, I would rather see alternate forms for merge and update:

    Hash#merge(keyArray, valueArray)
    Hash#update(keyArray, valueArray)

Then the original problem would be expressed as:

    h = Hash.new.merge(keyArray, valueArray)

I'm actually more drawn to some kind of #to_h-ish operation on arrays,
like:

   key_array.hash_with(value_array)

rather similar to Array#zip, but producing a hash. (I think I
submitted an RCR to this effect at some point... but I can't remember
its status.)

David

On Wed, 8 Dec 2004, Glenn Parker wrote:

--
David A. Black
dblack@wobblini.net

That's a good point. Given that, I don't think it's really necessary to
make any alterations to the constructors.

I suppose in the example above using merge! or update would be appropriate,
since the original hash isn't needed.

  h = Hash.new.merge!(keys,values)

In addition, it would be nice if keys and values need not be arrays -
just Enumerables. Implementing this may be a bit tricky, however.

On Wed, 08 Dec 2004 23:25:10 +0900, Glenn Parker wrote:

However, the more I think about it, the more I see this as a typical
operation on existing hashes, not simply a constructor. So, I would
rather see alternate forms for merge and update:

      Hash#merge(keyArray, valueArray)
      Hash#update(keyArray, valueArray)

Then the original problem would be expressed as:

      h = Hash.new.merge(keyArray, valueArray)

Very :-) I wouldn't want to create a hash from one of these:

   class Forever
     include Enumerable
     def each
       i = 1
       loop { yield i += 1 }
     end
   end

Arrays are a useful, and much used, way to "normalize" certain kinds
of enumerable operations and input (like find, map, etc.). I think
any proto-hashing would be best done array-wise.

David

On Wed, 8 Dec 2004, Jonathan Paisley wrote:

On Wed, 08 Dec 2004 23:25:10 +0900, Glenn Parker wrote:

However, the more I think about it, the more I see this as a typical
operation on existing hashes, not simply a constructor. So, I would
rather see alternate forms for merge and update:

      Hash#merge(keyArray, valueArray)
      Hash#update(keyArray, valueArray)

Then the original problem would be expressed as:

      h = Hash.new.merge(keyArray, valueArray)

That's a good point. Given that, I don't think it's really necessary to
make any alterations to the constructors.

I suppose in the example above using merge! or update would be appropriate,
since the original hash isn't needed.

h = Hash.new.merge!(keys,values)

In addition, it would be nice if keys and values need not be arrays -
just Enumerables. Implementing this may be a bit tricky, however.

--
David A. Black
dblack@wobblini.net

David A. Black wrote:

But what about a hash from two arrays with a default value?

At some point, it becomes more legible to use separate operations:

h = Hash.new(keys, values)
h.default = "defaultValue"

Hash#default and Hash#default= are already defined.

Oddly, there is no Hash#default_proc= to complement Hash#default_proc.
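
To make the two-step version concrete on a current Ruby, here is a sketch in which `keys.zip(values).to_h` stands in for the proposed `Hash.new(keys, values)`, which does not exist. Later Ruby versions did gain Hash#default_proc=, shown at the end; the sample data is made up.

```ruby
keys   = ['one', 'two', 'three']
values = [1, 2, 3]

h = keys.zip(values).to_h     # stand-in for the proposed Hash.new(keys, values)
h.default = 'defaultValue'

found   = h['one']            # an existing key is unaffected by the default
missing = h['four']           # a missing key yields the default value

# Hash#default_proc= exists in later Ruby versions; setting it replaces the plain default.
h.default_proc = proc { |_hash, key| "no #{key}" }
computed = h['four']
```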

I'm actually more drawn to some kind of #to_h-ish operation on arrays,
like:

  key_array.hash_with(value_array)

rather similar to Array#zip, but producing a hash.

This style of expression can only create a new hash instead of updating an existing hash. I think the "core" operation is updating/merging an existing hash with a set of matched keys and values.

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

David A. Black wrote:

On Wed, 8 Dec 2004, Jonathan Paisley wrote:

In addition, it would be nice if keys and values need not be arrays -
just Enumerables. Implementing this may be a bit tricky, however.

Very :-) I wouldn't want to create a hash from one of these:

  class Forever
    include Enumerable
    def each
      i = 1
      loop { yield i += 1 }
    end
  end

That's a fairly pathological case. Should we therefore deny all the other more useful applications of Enumerable in this context?
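
One possible middle ground, sketched under the assumption that the keys are finite: take only as many values as there are keys, so even an endless Enumerable terminates. The helper name `hash_from` is hypothetical, and `to_h` needs Ruby 2.1 or later.

```ruby
class Forever
  include Enumerable
  def each
    i = 1
    loop { yield i += 1 }
  end
end

# Hypothetical helper: pair a finite list of keys with any Enumerable of values.
def hash_from(keys, values)
  keys = keys.to_a                          # the keys must be finite
  keys.zip(values.first(keys.size)).to_h    # first(n) stops after n items
end

h = hash_from(%w[a b c], Forever.new)
```

Because Forever yields `i += 1` starting from 1, the three values taken are 2, 3 and 4. An endless Enumerable of keys would still hang in `to_a`, so this only bounds the values side.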

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

Hi --

David A. Black wrote:

But what about a hash from two arrays with a default value?

At some point, it becomes more legible to use separate operations:

h = Hash.new(keys, values)
h.default = "defaultValue"

Hash#default and Hash#default= are already defined.

I know. I'm still not in favor of changing the signature of Hash.new
to accommodate this particular case.

Oddly, there is no Hash#default_proc= to complement Hash#default_proc.

I'm actually more drawn to some kind of #to_h-ish operation on arrays,
like:

  key_array.hash_with(value_array)

rather similar to Array#zip, but producing a hash.

This style of expression can only create a new hash instead of updating an existing hash. I think the "core" operation is updating/merging an existing hash with a set of matched keys and values.

hash.update(key_array.hash_with(value_array)) :-)
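
Spelled out as a runnable sketch: Array#hash_with is not a core method, so the definition below is a hypothetical implementation of the idea, reopening Array only for illustration.

```ruby
# Hypothetical Array#hash_with, named after the suggestion in this thread.
class Array
  def hash_with(values)
    h = {}
    zip(values) { |k, v| h[k] = v }
    h
  end
end

key_array   = ['one', 'two', 'three']
value_array = [1, 2, 3]

# Creating a new hash from the two arrays.
fresh = key_array.hash_with(value_array)

# Updating an existing hash composes with the ordinary Hash#update.
hash = { 'zero' => 0 }
hash.update(key_array.hash_with(value_array))
```

The update form builds a temporary hash first, which is the price of keeping both operations separate.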

David

On Thu, 9 Dec 2004, Glenn Parker wrote:

--
David A. Black
dblack@wobblini.net