Small array/hash question

Hi all,

I'm trying to loop across a dataset and create a hash where each value is an array so that later I can loop over the hash and for each key (it's important I store the key), I can loop over the array contained and spit out some results.

Without looking at the docs I wanted to do something like...

(in pseudo-code)
loop across data here
  work_types[nsc_id]=do |types|
       types << data[7]
  end
end loop
where work_types is the hash and types is the array I want to accumulate data in

This doesn't work, so I'm wondering what the ruby idiom for this kind of thing would be. Essentially for each piece of data I want to get the appropriate value from the Hash and append the value on to the end of the array associated with the key, or if it doesn't exist in the Hash, create a new entry with a new array populated with the value.

I'm sure there's a very simple way of doing this, but I can't see the method I want in the standard library docs - I thought it might be collect, but it doesn't look like it

Thanks
Kev

i think you want something like this:

   harp:~ > irb
   irb(main):001:0> work_types = Hash::new{|h,k| h[k] = []}
   => {}
   irb(main):002:0> work_types[ 'foo' ] << 42
   => [42]
   irb(main):003:0> work_types[ 'foo' ] << 42
   => [42, 42]
   irb(main):004:0> work_types[ 'bar' ] << 'forty-two'
   => ["forty-two"]
   irb(main):005:0> work_types
   => {"foo"=>[42, 42], "bar"=>["forty-two"]}

if not you'll have to post more about your exact problem and some sample data.
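
For illustration, here is a minimal sketch of how the default-block hash above might slot into the loop from the question. The sample rows are invented, and the column layout (id in column 0, work type in column 7, matching `data[7]`) is an assumption, as is the helper name `group_work_types`.

```ruby
# Hypothetical sample rows: column 0 stands in for nsc_id and column 7 for
# the work type (data[7] in the question). Both are assumptions.
ROWS = [
  ['n1', nil, nil, nil, nil, nil, nil, 'welding'],
  ['n2', nil, nil, nil, nil, nil, nil, 'painting'],
  ['n1', nil, nil, nil, nil, nil, nil, 'fitting']
]

# Group the work types by id, creating each array on first access.
def group_work_types(rows)
  # The block runs only for missing keys and stores a fresh array per key.
  work_types = Hash.new { |h, k| h[k] = [] }
  rows.each do |data|
    nsc_id = data[0]
    work_types[nsc_id] << data[7]
  end
  work_types
end

# Later pass: the key is still available alongside its accumulated array.
group_work_types(ROWS).each do |nsc_id, types|
  puts "#{nsc_id}: #{types.join(', ')}"
end
```

The block form matters here: `Hash.new([])` would hand back the same single array for every missing key and never store it in the hash.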

hth.

-a

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells among the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna

===============================================================================

I got the output I wanted with this

work_types = Hash.new
if work_types.has_key?(nsc_id) then
    work_types[nsc_id]= work_types[nsc_id].include?(work_type) ? work_types[nsc_id] : work_types[nsc_id] << work_type
else
    work_types[nsc_id]= [work_type]
end

So the problem is solved, but I wonder if there's a more elegant way of doing it (especially the check to see if the value is already in the array). My first assumption was that assignment to a Hash took a block (hence the pseudo-code); I was actually a little surprised that it didn't ;)

Kev

How about this?

work_types = Hash.new {|h, k| h[k] = [] }
work_types[nsc_id] << work_type unless work_types[nsc_id].include?(work_type)

Cheers,
Dave

this is one easy way

   work_types = Hash::new{|h,k| h[k] = []}

   work_types[ nsc_id ].push( work_type ).uniq!

but does a bit of extra work. another way would be to use set

   require 'set'

   work_types = Hash::new{|h,k| h[k] = Set::new}

   work_types[ nsc_id ] << work_type

but you must understand set and its notion of equality. plus you lose data
order but, since you are ignoring dups, i guess this isn't important.
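
To make the point about equality concrete, here is a small sketch. The sample pairs and the helper name `unique_work_types` are made up for illustration. Set membership is decided by `hash` and `eql?`, not by `==`.

```ruby
require 'set'

# Collect values per key, letting Set discard duplicates.
def unique_work_types(pairs)
  work_types = Hash.new { |h, k| h[k] = Set.new }
  pairs.each { |nsc_id, work_type| work_types[nsc_id] << work_type }
  work_types
end

# Hypothetical sample pairs of [nsc_id, work_type].
SAMPLE_PAIRS = [['n1', 'welding'], ['n1', 'welding'], ['n1', 1], ['n1', 1.0]]

sets = unique_work_types(SAMPLE_PAIRS)

# The two equal strings collapse into one member, but 1 and 1.0 both stay,
# because 1.eql?(1.0) is false even though 1 == 1.0 is true.
puts sets['n1'].size
```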

or perhaps you can model your data with a nested hash?

   work_types = Hash::new{|h,k| h[k] = {}}

   work_types[ nsc_id ][ work_type ] = true

and then use

   values = work_types[ nsc_id ].keys

or just make your own approach more compact

   work_types = Hash::new

   work_types[ nsc_id ] = [ work_types[ nsc_id ], work_type ].flatten.compact.uniq
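
For comparison, a sketch of the same update with a plain hash and `||=`, which stays close to the original if/else version. `add_work_type` is a hypothetical helper name and the sample values are invented.

```ruby
# Append work_type under nsc_id unless it is already present.
def add_work_type(work_types, nsc_id, work_type)
  # ||= assigns an empty array only when the key is missing (or nil).
  types = (work_types[nsc_id] ||= [])
  types << work_type unless types.include?(work_type)
  work_types
end

work_types = {}
add_work_type(work_types, 'n1', 'welding')
add_work_type(work_types, 'n1', 'welding')  # duplicate, ignored
add_work_type(work_types, 'n1', 'fitting')
p work_types
```

This keeps insertion order and needs no default block, at the cost of one `include?` scan per append.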

you have options - and there is always sqlite if you start to feel like you
are rolling query logic on top of this data structure ;)

regards.

-a

===============================================================================

Ara.T.Howard wrote:

this is one easy way

  work_types = Hash::new{|h,k| h[k] = []}

  work_types[ nsc_id ].push( work_type ).uniq!

That looks promising - I see it essentially relies on defining Hash and setting the initial values, so that I can avoid the "else work_types[nsc_id]= [work_type] end" part

uniq! certainly looks like it would shorten my code.
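
One caveat worth sketching: `uniq!` changes the array in place but returns nil when it finds nothing to remove, so its return value is best ignored. The second half shows array union (`|=`) as an alternative that appends and de-duplicates in one step. `push_unique` is a hypothetical helper name.

```ruby
# uniq! mutates the receiver; its return value is nil if nothing was removed.
types = ['welding']
no_dups_found = types.push('fitting').uniq!   # nothing removed, so nil
dups_removed  = types.push('fitting').uniq!   # duplicate removed, so the array
puts no_dups_found.inspect
puts dups_removed.inspect

# Alternative: Array#| returns the union without duplicates, keeping the
# order of first appearance.
def push_unique(work_types, nsc_id, work_type)
  work_types[nsc_id] |= [work_type]
  work_types
end

work_types = Hash.new { |h, k| h[k] = [] }
push_unique(work_types, 'n1', 'welding')
push_unique(work_types, 'n1', 'welding')
push_unique(work_types, 'n1', 'fitting')
```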

but does a bit of extra work. another way would be to use set

  require 'set'

  work_types = Hash::new{|h,k| h[k] = Set::new}

  work_types[ nsc_id ] << work_type

Yeah I was thinking of Set, but I want to require/include as little as possible to keep complexity down for other people to maintain.

but you must understand set and its notion of equality. plus you lose data
order but, since you are ignoring dups, i guess this isn't important.

or perhaps you can model your data with a nested hash?

  work_types = Hash::new{|h,k| h[k] = {}}

  work_types[ nsc_id ][ work_type ] = true

and then use

  values = work_types[ nsc_id ].keys

or just make your own approach more compact

  work_types = Hash::new

  work_types[ nsc_id ] = [ work_types[ nsc_id ], work_type ].flatten.compact.uniq

I'm not sure how this works in a manner that's similar to the (verbose but easy to understand) version I have already. In fact I can't understand half of what's going on here!
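
A step-by-step reading of the compact one-liner may help. The sketch below is one interpretation rather than the exact code posted: `merge_work_type` is a hypothetical helper, and it includes a `flatten` step, without which the second call would nest the previously stored array inside the new one.

```ruby
# Rebuild the value stored under one key from the old value plus a new item.
def merge_work_type(existing, work_type)
  combined = [existing, work_type]  # existing is nil the first time
  flat     = combined.flatten       # unwraps the array stored earlier
  present  = flat.compact           # drops the nil left by a missing key
  present.uniq                      # drops a repeated work_type
end

work_types = {}
['welding', 'fitting', 'welding'].each do |work_type|
  work_types['n1'] = merge_work_type(work_types['n1'], work_type)
end
p work_types['n1']
```

A plain `Hash.new` returns nil for a missing key, which is why `compact` is there; `uniq` plays the role of the `include?` check in the verbose version.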

you have options - and there is always sqlite if you start to feel like you
are rolling query logic on top of this data structure ;)

Dear god no! :). I'm only munging data dumps from Oracle, I'd hate to have to store them in a database just to transform them!

Kev
