Group by unique entries of a hash

Ne_Scripter · 29 September 2009 11:43

I have two data sets loaded into a hash to give the following output

"2efa4ba470", "00000005"
"2efa4ba470", "00000004"
"02adecfd5c", "00000002"
"c0784b5de101", "00000006"
"68c4bf10539", "00000003"
"c0784b5de101", "00000001"

My code to get this is as follows:

  source= "C:\\dummyFile.txt"
  hashMapping = Hash.new
  ocrIDMapping = Hash.new

  IO.foreach(source.to_s) do |data|
    fields = data.split(",")
    hash = fields[0]
    ocrID = fields[1]
    hashMapping[ocrID] = hash
  end

  hashMapping.sort { |a, b| a[1] <=> b[1] }.each { |elem|
    puts "#{elem[1]}, #{elem[0]}"
  }

I would like to alter my output to group by the first value to give an
output like this:

"2efa4ba470", "00000005", "00000004"
"02adecfd5c", "00000002"
"c0784b5de101", "00000006", "00000001"
"68c4bf10539", "00000003"

As you can see, only unique values are now shown in the first field,
while the corresponding second-field values are collected into a list,
grouping the results. I could do something like this in SQL, but I have
never come across it in Ruby. Does anyone have any pointers?

Many thanks

--
Posted via http://www.ruby-forum.com/.

Paul_Smith1 · 29 September 2009 11:55

You want a hash where the key is the element you want to group on, and
the 'item' is an array of all items with the shared key. A bit like
(untested):

hashMapping = {}

IO.foreach(source.to_s) do |data|
   fields = data.split(",")
   hash = fields[0]
   ocrID = fields[1]

   # If hashMapping has never seen this key before, make an empty array
   hashMapping[ocrID] ||= []

   # Add the new element to the array for this key
   hashMapping[ocrID] << hash
end
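
For anyone reading along, here is a minimal runnable sketch of the same idea. It assumes the input lines look like the sample data in the question and that the grouping key is the first field, so the key and value are the other way round from the snippet above. The `lines` array is only a stand-in for the file contents, and the names `grouped`, `code` and `ocr_id` are made up for illustration.

```ruby
# Stand-in for the lines of the input file (an assumption; the real file is not shown).
lines = [
  '"2efa4ba470", "00000005"',
  '"2efa4ba470", "00000004"',
  '"02adecfd5c", "00000002"',
  '"c0784b5de101", "00000006"',
  '"68c4bf10539", "00000003"',
  '"c0784b5de101", "00000001"',
]

grouped = {}

lines.each do |line|
  # Split on the comma, then drop the surrounding quotes and whitespace.
  code, ocr_id = line.split(",").map { |field| field.delete('"').strip }

  grouped[code] ||= []     # first time this code is seen: start an empty list
  grouped[code] << ocr_id  # append the ID to the list for this code
end

# Print one line per code, followed by all of its IDs.
grouped.each do |code, ids|
  row = [code, *ids].map { |field| "\"#{field}\"" }.join(", ")
  puts row
end
```

On Ruby 1.9 or later a Hash keeps insertion order, so the codes come out in the order they were first seen.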

--
Paul Smith
http://www.nomadicfun.co.uk

paul@pollyandpaul.co.uk

Robert_K1 · 29 September 2009 13:38

It is slightly more efficient to do it in one step:

(hashMapping[ocrID] ||= []) << hash

Even nicer:

hashMapping = Hash.new {|h,k| h[k] = []}
...
hashMapping[ocrID] << hash
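
A small self-contained sketch of the block form, using made-up `[code, ocr_id]` pairs instead of the file and grouping on the first field (both are assumptions for illustration). The `group_by` variant at the end is an alternative that was not suggested in the thread; `Hash#transform_values` needs Ruby 2.4 or later.

```ruby
# Hypothetical sample pairs: [code, ocr_id].
pairs = [
  ["2efa4ba470",   "00000005"],
  ["2efa4ba470",   "00000004"],
  ["02adecfd5c",   "00000002"],
  ["c0784b5de101", "00000006"],
  ["68c4bf10539",  "00000003"],
  ["c0784b5de101", "00000001"],
]

# The block runs only when a missing key is looked up; it stores a fresh array.
grouped = Hash.new { |hash, key| hash[key] = [] }
pairs.each { |code, ocr_id| grouped[code] << ocr_id }

# Alternative: group the pairs by their first element, then keep only the IDs.
grouped_alt = pairs.group_by(&:first).transform_values { |rows| rows.map(&:last) }
```

Both hashes end up with the same contents; the first approach builds the result while reading, the second needs all pairs in memory first.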

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Ne_Scripter · 29 September 2009 15:28

Ah yes, that makes perfect sense.

Thanks

Simon_Krahnke · 30 September 2009 16:05

* Paul Smith <paul@pollyandpaul.co.uk> (2009-09-29) wrote:

fields = data.split(",")
hash = fields[0]
ocrID = fields[1]

BTW, you could write this as:

hash, ocrID = *data.split(",")

or

hash, ocrID, *ignored = *data.split(",")

if you want to ignore everything after a second comma, or

hash, ocrID = *data.split(",", 2)

if you want to include a second comma and everything after it in the
ocrID string.
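
To make the difference between the three forms concrete, here is a short sketch on a made-up line with extra fields. The string in `data` and the variable names are assumptions for illustration only.

```ruby
# Hypothetical input line with more than two comma-separated fields.
data = "2efa4ba470,00000005,extra,fields"

# Plain destructuring: fields beyond the second are silently dropped.
code, ocr_id = *data.split(",")

# Explicit catch-all: the remaining fields are collected into an array.
code2, ocr_id2, *ignored = *data.split(",")

# Limit of 2: split at the first comma only, so the rest stays in one string.
code3, rest = *data.split(",", 2)

puts code, ocr_id
puts ignored.inspect
puts rest
```

Lines read with IO.foreach still carry their trailing newline, so calling `chomp` on the line before splitting may be needed in real use.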

Kind regards, simon .... which color is the green bill? blue!

Paul_Smith1 · 29 September 2009 14:22

Even nicer

hashMapping = Hash.new {|h,k| h[k] = []}

Is this defining a default element for the hash? I had a vague
recollection you could do this but completely forgot how.

I'd also rename the 'hash' variable to 'key' or something, I think
it's less confusing. Then your hashMapping can either be given the
name 'hash', because that's what it is, or a name that's actually
useful for describing what the mystical contents of the hash are.

Ne_Scripter · 29 September 2009 17:40

So, if we were to take this further: we now have our hashMapping list

"2efa4ba470", "00000005""00000004"
"c0784b5de101", "00000006""00000003"
"02adecfd5c", "00000002"
"c0784b5de101", "00000001"

Now we split the hash to give only the ID values (column 2) again, doing
something like this:

hashMapping.each do |itemDetail|
  newID = itemDetail[1].to_s.delete("\"").strip
end

I have another declared array in my code, with many more ID values like
those shown above:

moreIDs = ["00000001", "00000003", "00000004", "00000005", "00000007",
"00000008"]

What I want to do is search for all newIDs that match moreIDs and output
the matching ID and the corresponding code shown in column one. So the
sample output would be like this:

"2efa4ba470", "00000005""00000004"
"c0784b5de101", "00000003"
"c0784b5de101", "00000001"

So this shows the string code for an ID that is present in both of my
lists. I had thought of something like moreID == newID, but that creates
a lot of do loops. Also, the thought of array intersections crossed my
mind, but we have a hash and an array here, so I was unsure how to make
this work.

Any assistance is greatly appreciated.
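
One possible approach, sketched under assumptions: if the hash maps each code to an array of its IDs (as in the grouped output from the first post, which is used here rather than the list above), `Array#&` gives the intersection with moreIDs for each entry without nested loops. The names `grouped`, `more_ids` and `matches` are made up for this example.

```ruby
# Assumed shape of the grouped data: code => list of IDs.
grouped = {
  "2efa4ba470"   => ["00000005", "00000004"],
  "02adecfd5c"   => ["00000002"],
  "c0784b5de101" => ["00000006", "00000001"],
  "68c4bf10539"  => ["00000003"],
}

more_ids = ["00000001", "00000003", "00000004", "00000005", "00000007", "00000008"]

matches = {}
grouped.each do |code, ids|
  common = ids & more_ids  # Array#& keeps the elements present in both arrays
  matches[code] = common unless common.empty?
end

# Print each code with the IDs that also appear in more_ids.
matches.each do |code, ids|
  puts [code, *ids].map { |field| "\"#{field}\"" }.join(", ")
end
```

Codes with no ID in common with more_ids (here "02adecfd5c") are left out of the result.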

David_A_Black1 · 30 September 2009 17:14

I don't think you need the * for any of them. This:

hash, ocrID = data.split(',')

should be fine for getting the first two values.

David

--
David A. Black, Director
Ruby Power and Light, LLC (http://www.rubypal.com)
Ruby/Rails training, consulting, mentoring, code review
Book: The Well-Grounded Rubyist (http://www.manning.com/black2)

Jesus_Gabriel_y_Gala · 29 September 2009 14:47

With that constructor you pass a block, which is executed every time a
missing key is looked up. The value of the block is returned as the
default value and, as in this case, the block can also have the side
effect of modifying the hash. If the block does not store anything in
the hash, the lookup itself leaves the hash unchanged.

Two examples:

irb(main):001:0> h = Hash.new {|h,k| 0}
=> {}
irb(main):002:0> h[:a]
=> 0
irb(main):003:0> h[:a] += 1
=> 1
irb(main):004:0> h
=> {:a=>1}
irb(main):005:0> h2 = Hash.new {|h,k| h[k] = []}
=> {}
irb(main):007:0> h2[:a]
=> []
irb(main):008:0> h2
=> {:a=>[]}

Jesus.

Robert_K1 · 29 September 2009 14:59

Is this defining a default element for the hash? I had a vague
recollection you could do this but completely forgot how.

No, this is defining a hook which is executed each time a key is
requested which is not present. In this case the hook stores a new
Array in the Hash but you could do other things as well.

A default value is defined via Hash.new([]), which does not work in
this case because every missing key would share the same Array object.
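
A short sketch of the difference, assuming the default-value form meant here is passing an object to Hash.new (for example `Hash.new([])`). The variable names are made up for illustration.

```ruby
# Default VALUE: one array object is handed out for every missing key.
shared = Hash.new([])
shared[:a] << 1   # mutates the single default array; nothing is stored under :a
shared[:b] << 2   # same array again

puts shared.keys.inspect  # no keys were ever stored
puts shared[:c].inspect   # any missing key now shows the shared array

# Default BLOCK: a fresh array is created and stored for each missing key.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key[:a] << 1
per_key[:b] << 2

puts per_key.inspect
```

With the shared default the hash stays empty and all appended values pile up in one array, which is why the block form is the one to use for grouping.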

I'd also rename the 'hash' variable to 'key' or something, I think
it's less confusing. Then your hashMapping can either be given the
name 'hash', because that's what it is, or a name that's actually
useful for describing what the mystical contents of the hash are.

Absolutely. I just did not want to cause extra confusion by starting
to rename everything.

Cheers

robert

Ne_Scripter · 1 October 2009 13:15

I am having real problems with my array. As expected, its contents look
like this:

"00000004 00000005"
"00000003 00000006"
"00000001"
"00000002"

I now want 6 individual elements instead of 4. I have tried to split the
double entries up with split, however they remain together. Is there
something fundamental I am missing? I want an array like so:

array = ["00000004", "00000005", "00000003", "00000006", "00000001",
"00000002"]

Sorry if this is simple, I have spent several hours chopping and
changing.

Thanks

S

Simon_Krahnke · 1 October 2009 19:35

* David A. Black <dblack@rubypal.com> (2009-09-30) wrote:

I don't think you need the * for any of them.

I always include it anyway, to be explicit.

This:

hash, ocrID = data.split(',')

should be fine for getting the first two values.

Right, I just tried it. I thought you might be getting an array into
ocrID, but you don't.

Kind regards, simon

Josh_Cheek · 1 October 2009 14:00

x = [
  "00000004 00000005",
  "00000003 00000006",
  "00000001",
  "00000002",
]

x.map! { |str| str.split(/\s+/) }.flatten!

x # => ["00000004", "00000005", "00000003", "00000006", "00000001", "00000002"]

David_A_Black1 · 1 October 2009 14:05

Hi --

You can even dispense with the argument to split if it's just
whitespace.

Another option, though perhaps a slightly memory-wasting one:

x.join(' ').split
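
A quick side-by-side sketch of the variants discussed, on the sample array from the question. The `flat_map` line is an extra option not mentioned in the thread and needs Ruby 1.9.2 or later; the result variable names are made up.

```ruby
x = ["00000004 00000005", "00000003 00000006", "00000001", "00000002"]

# split with no argument splits on runs of whitespace.
via_map      = x.map { |str| str.split }.flatten

# Join everything into one string, then split once.
via_join     = x.join(' ').split

# Map and flatten one level in a single step.
via_flat_map = x.flat_map { |str| str.split }

puts via_map.inspect
```

All three leave `x` itself untouched and return a new array, unlike the `map!` / `flatten!` version, which modifies `x` in place.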

David

Ne_Scripter · 1 October 2009 14:47

Ah map. Perfect.
