Array and hash iteration questions

I have a CSV file and I’m trying to do a few things with it. Essentially
what it boils down to is: count the number of times a certain value is
seen, then count the number of times another value is seen in conjunction
with the first one.

I’m iterating over the lines of the file, and splitting them into an array
with arr = line.split(/,/). That part works well, but there are a few
questions about how to do something efficiently.

In order to count the number of times something is seen, I took the approach:

cases = Hash.new(0)

cases[arr[324]] += 1

But now I want to save the number of cases where another value occurs with
the first one. (Essentially errors indexed by case)

The approach I have now is:

cases = Hash.new(0)
errors = Hash.new(0)

ca = arr[324]   # "case" is a reserved word in Ruby, so another name is needed
cases[ca] += 1
if arr[532] =~ /Error/
  errors[ca] += 1
end

That works, but it seems to me that I really should be doing this with one
hash, not two. Any suggestions?

Next, I want to print out the values. It is easy to do this with
cases.each, but I’d like to print them out, sorted by case. The best
solution I have so far uses cases.keys.sort.each, then inside the block
uses cases[key] (and errors[key]).
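
The two-hash approach described above can be sketched end to end as follows. The sample rows and the column indexes 0 and 1 are assumptions made to keep the example short; the real data uses columns 324 and 532.

```ruby
# Hypothetical sample data standing in for the real CSV file.
case_col  = 0   # assumed column holding the case name
error_col = 1   # assumed column holding the status text

lines = [
  "beta,ok",
  "alpha,Error: timeout",
  "beta,Error: disk",
  "alpha,ok",
  "alpha,ok",
]

cases  = Hash.new(0)   # missing keys read as 0, so += 1 works the first time
errors = Hash.new(0)

lines.each do |line|
  arr  = line.split(/,/)
  name = arr[case_col]
  cases[name]  += 1
  errors[name] += 1 if arr[error_col] =~ /Error/
end

# Walk the keys in sorted order and look each count up again.
cases.keys.sort.each do |key|
  puts "#{key}: #{cases[key]} seen, #{errors[key]} errors"
end
```

Reading errors[key] for a case that never had an error returns the default 0 without adding the key to the hash.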

Any ideas would be appreciated.

Ben

“Ben Giddings” bg-rubytalk@infofiend.com wrote in message
news:3F79D516.9050509@infofiend.com

I have a CSV file and I’m trying to do a few things with it. Essentially
what it boils down to is: count the number of times a certain value is
seen, then count the number of times another value is seen in conjunction
with the first one.

I’m iterating over the lines of the file, and splitting them into an array
with arr = line.split(/,/). That part works well, but there are a few
questions about how to do something efficiently.

In order to count the number of times something is seen, I took the approach:

cases = Hash.new(0)

cases[arr[324]] += 1

But now I want to save the number of cases where another value occurs with
the first one. (Essentially errors indexed by case)

The approach I have now is:

cases = Hash.new(0)
errors = Hash.new(0)

ca = arr[324]
cases[ca] += 1
if arr[532] =~ /Error/
  errors[ca] += 1
end

That works, but it seems to me that I really should be doing this with one
hash, not two. Any suggestions?

cases = Hash.new {|h,k| h[k] = [0, 0]}

ca = arr[324]
counter = cases[ca]
counter[0] += 1

counter[1] += 1 if /Error/ =~ arr[532]

Next, I want to print out the values. It is easy to do this with
cases.each, but I’d like to print them out, sorted by case. The best
solution I have so far uses cases.keys.sort.each, then inside the block
uses cases[key] (and errors[key]).

cases.sort.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end
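
Put together, the single-hash suggestion above might look like the following sketch. The sample rows and the column indexes 0 and 1 are assumptions for illustration; the thread's data uses columns 324 and 532.

```ruby
# Hypothetical sample data standing in for the real CSV file.
case_col  = 0   # assumed column holding the case name
error_col = 1   # assumed column holding the status text

lines = [
  "beta,ok",
  "alpha,Error: timeout",
  "beta,Error: disk",
  "alpha,ok",
  "alpha,ok",
]

# The block runs once per missing key and stores a fresh [seen, errors] pair.
cases = Hash.new { |h, k| h[k] = [0, 0] }

lines.each do |line|
  arr     = line.split(/,/)
  counter = cases[arr[case_col]]
  counter[0] += 1
  counter[1] += 1 if /Error/ =~ arr[error_col]
end

# Hash#sort yields [key, value] pairs ordered by key.
cases.sort.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end
```

The block form matters here: Hash.new([0, 0]) would hand the same array object to every missing key, so all cases would end up sharing one pair of counters.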

Regards

robert

Robert Klemme wrote:

cases = Hash.new {|h,k| h[k] = [0, 0]}

Ah. I couldn’t remember how to use the block form properly. I’m actually
going to use:

cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}

Because it will make some of the later stuff more clear like

cases[ca]['Number'] += 1
cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/
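
As a small runnable sketch of this hash-of-hashes variant (the rows below are made-up sample data, already split into a case name and a status string):

```ruby
# Outer hash: creates an inner counting hash the first time a case is seen.
# Inner hash: missing counters read as 0.
cases = Hash.new { |hash, key| hash[key] = Hash.new(0) }

# Hypothetical rows: [case name, status text].
rows = [
  ["alpha", "ok"],
  ["alpha", "Error 3"],
  ["beta",  "ok"],
]

rows.each do |name, status|
  cases[name]['Number'] += 1
  cases[name]['Errors'] += 1 if status =~ /Error/
end

cases.each do |name, counts|
  puts "#{name}: #{counts['Number']} seen, #{counts['Errors']} errors"
end
```

A case with no errors never gets an 'Errors' key, but reading it still returns 0 because of the inner default.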

cases.sort.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end

Aha, I just assumed hash didn’t have a sort method, because the concept of
a “sorted hash” seemed meaningless, but since it actually returns an array
containing [key, value] pairs, that’s perfect!
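
To make the return value concrete, here is a short sketch with made-up counts. It also shows sort_by, which is one way to order by something other than the key.

```ruby
# Hypothetical counts: case name => [seen, errors].
counts = { "gamma" => [1, 0], "alpha" => [3, 1], "beta" => [2, 2] }

# Hash#sort returns an Array of [key, value] pairs, ordered by key here
# because the keys are unique.
pairs = counts.sort
pairs.each { |name, counter| puts "#{name}: #{counter.inspect}" }

# sort_by also returns an Array of pairs; this orders by error count, highest first.
by_errors = counts.sort_by { |_name, counter| -counter[1] }
puts by_errors.map { |name, _counter| name }.join(", ")
```

The original hash is left untouched in both cases.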

Thanks Robert

Ben

“Ben Giddings” bg-rubytalk@infofiend.com wrote in message
news:3F7B0BAC.7030305@infofiend.com

Robert Klemme wrote:

cases = Hash.new {|h,k| h[k] = [0, 0]}

Ah. I couldn’t remember how to use the block form properly. I’m actually
going to use:

cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}

Because it will make some of the later stuff more clear like

cases[ca]['Number'] += 1
cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/

No need to use a Hash for this…

Number = 0
Errors = 1

cases[ca][Number] += 1
cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

I might be a bit picky, but storing the array ref saves one hash lookup.
It can affect performance if you have a large number of cases… (see
below; although the timing is dominated by the iteration here, you can see
that the array is faster)

counters = cases[ca]
counters[Number] += 1
counters[Errors] += 1 if arr[OFFSET] =~ /Error/

You could as well do

cases[ca].instance_eval do
  self[Number] += 1
  self[Errors] += 1 if arr[OFFSET] =~ /Error/
end

I’m getting carried away… :-)

cases.sort.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end

Aha, I just assumed hash didn’t have a sort method, because the concept of
a “sorted hash” seemed meaningless, but since it actually returns an array
containing [key, value] pairs, that’s perfect!

It is! Thanks to Matz’s wisdom.

Thanks Robert

You’re welcome.

Kind regards

robert

10:17:02 [ruby]: ruby -rprofile lookups.rb
     % cumulative      self              self     total
  time    seconds   seconds    calls  ms/call   ms/call  name
 62.50      13.93     13.93        2  6962.50  11140.50  Integer#upto
 26.22      19.77      5.84   100001     0.06      0.06  Hash#[]
 11.28      22.28      2.51   100001     0.03      0.03  Array#[]
  0.07      22.30      0.01        1    15.00     15.00  Profiler__.start_profile
  0.00      22.30      0.00        2     0.00  11140.50  Object#test
  0.00      22.30      0.00        3     0.00      0.00  Module#method_added
  0.00      22.30      0.00        1     0.00  11171.00  Object#testArray
  0.00      22.30      0.00        1     0.00  22281.00  #toplevel
  0.00      22.30      0.00        1     0.00  11110.00  Object#testHash
10:17:25 [ruby]: cat lookups.rb

def test(coll)
  0.upto( 100000 ) do
    coll[2]
  end
end

def testHash
  test( { 0 => 0, 1 => 1, 2 => 2 } )
end

def testArray
  test( [0, 1, 2] )
end

testHash
testArray

10:18:15 [ruby]:
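
Because the profiler adds its own overhead to every call, a second measurement with the standard library's Benchmark module may be a useful cross-check. The following is only a sketch that reuses the same collections and iteration count as lookups.rb; the timings depend on the machine and Ruby version, so none are shown here.

```ruby
require 'benchmark'

n     = 100_000
hash  = { 0 => 0, 1 => 1, 2 => 2 }
array = [0, 1, 2]

# Benchmark.realtime returns the elapsed wall-clock time in seconds as a Float.
hash_time  = Benchmark.realtime { n.times { hash[2] } }
array_time = Benchmark.realtime { n.times { array[2] } }

printf "Hash#[]:  %.6f s\n", hash_time
printf "Array#[]: %.6f s\n", array_time
```

As in the profile above, the loop itself is likely to account for much of the time, so repeating the run a few times gives a fairer picture than a single pass.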

“Robert Klemme” bob.news@gmx.net wrote in message
news:blgp2a$bvnb8$1@ID-52924.news.uni-berlin.de

“Ben Giddings” bg-rubytalk@infofiend.com wrote in message
news:3F7B0BAC.7030305@infofiend.com

Robert Klemme wrote:

cases = Hash.new {|h,k| h[k] = [0, 0]}

Ah. I couldn’t remember how to use the block form properly. I’m actually
going to use:

cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}

Because it will make some of the later stuff more clear like

cases[ca]['Number'] += 1
cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/

No need to use a Hash for this…

Number = 0
Errors = 1

cases[ca][Number] += 1
cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

I might be a bit picky, but storing the array ref saves one hash lookup.

It can affect performance if you have a large number of cases… (see
below; although the timing is dominated by the iteration here, you can see
that the array is faster)

This sentence should really have appeared several lines above: it’s the
argument in favour of using arrays instead of hashes for the counters.

Regards

robert

“Robert Klemme” bob.news@gmx.net wrote in message news:blgp2a$bvnb8$1@ID-52924.news.uni-berlin.de

No need to use a Hash for this…

Number = 0
Errors = 1

cases[ca][Number] += 1
cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

I might be a bit picky, but storing the array ref saves one hash lookup.
It can affect performance if you have a large number of cases… (see
below; although the timing is dominated by the iteration here, you can see
that the array is faster)

I’m not sure if my testing method is quite consistent, but making a specific
record object looks like it could speed things up even more…

ruby -rprofile lookups.rb
     % cumulative      self              self     total
  time    seconds   seconds    calls  ms/call   ms/call  name
 73.74      13.08     13.08        3  4359.00   5911.67  Integer#upto
 14.47      15.64      2.57   100001     0.03      0.03  Hash#[]
 11.79      17.73      2.09   100001     0.02      0.02  Array#[]
  0.08      17.75      0.01        1    15.00     15.00  Profiler__.start_profile
  0.00      17.75      0.00        1     0.00  17735.00  #toplevel
  0.00      17.75      0.00        1     0.00      0.00  Class#inherited
  0.00      17.75      0.00        1     0.00   1329.00  Object#testObj
  0.00      17.75      0.00        2     0.00   8203.00  Object#test
  0.00      17.75      0.00        1     0.00      0.00  TestObj#initialize
  0.00      17.75      0.00        1     0.00   8203.00  Object#testArray
  0.00      17.75      0.00        9     0.00      0.00  Module#method_added
  0.00      17.75      0.00        1     0.00   8203.00  Object#testHash
  0.00      17.75      0.00        1     0.00      0.00  Module#attr_accessor
  0.00      17.75      0.00        1     0.00      0.00  Class#new
type lookups.rb
def test(coll)
  0.upto( 100000 ) do
    coll[2]
  end
end

def testHash
  test( { 0 => 0, 1 => 1, 2 => 2 } )
end

def testArray
  test( [0, 1, 2] )
end

# a simple record class…

class TestObj
  attr_accessor :num, :err
  def initialize
    @num = 0
    @err = 0
  end
end

def testObj
  to = TestObj.new
  0.upto( 100000 ) do
    to.err
  end
end

testHash
testArray
testObj
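
If a record object is used for the counters, Ruby's built-in Struct can generate an equivalent class with less code. The sketch below is one possible way to combine it with the default-block hash from earlier in the thread; the rows are made-up sample data and the names counter_class, num and err are chosen only for illustration.

```ruby
# Struct.new builds a small class with num and err accessors.
counter_class = Struct.new(:num, :err)

# One fresh record per case, created the first time the case is seen.
cases = Hash.new { |h, k| h[k] = counter_class.new(0, 0) }

# Hypothetical rows: [case name, status text].
rows = [
  ["beta",  "ok"],
  ["alpha", "Error 3"],
  ["alpha", "ok"],
]

rows.each do |name, status|
  counter = cases[name]          # one hash lookup, then plain accessor calls
  counter.num += 1
  counter.err += 1 if status =~ /Error/
end

cases.sort_by { |name, _counter| name }.each do |name, counter|
  puts "#{name}: #{counter.num} seen, #{counter.err} errors"
end
```

Named accessors also read more clearly than counter[0] and counter[1], which addresses the readability concern that motivated the nested hash.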
