I have a CSV file and I’m trying to do a few things with it. Essentially
what it boils down to is: count the number of times a certain value is
seen, then count the number of times another value is seen in conjunction
with the first one.
I’m iterating over the lines of the file and splitting each one into an array
with arr = line.split(/,/). That part works well, but I have a few
questions about how to do the rest efficiently.
In order to count the number of times something is seen, I took the approach:
cases = Hash.new(0)
…
cases[arr[324]] += 1
…
But now I also want to count the cases where another value occurs together
with the first one (essentially, errors indexed by case).
The approach I have now is:
cases = Hash.new(0)
errors = Hash.new(0)
…
key = arr[324]  # "case" is a reserved word in Ruby, so it can't be a variable name
cases[key] += 1
if arr[532] =~ /Error/
  errors[key] += 1
end
…
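For concreteness, here is a minimal self-contained sketch of the two-hash approach above. The helper name tally, the sample lines, and the small column indices (0 and 1 standing in for 324 and 532) are made up for illustration; the real file obviously has many more columns.

```ruby
# Hypothetical helper: tallies cases and errors from CSV-style lines.
# case_col and error_col stand in for the real columns 324 and 532.
def tally(lines, case_col, error_col)
  cases  = Hash.new(0)  # missing keys read as 0, so += 1 works on first sight
  errors = Hash.new(0)
  lines.each do |line|
    arr = line.chomp.split(/,/)
    key = arr[case_col]
    cases[key] += 1
    errors[key] += 1 if arr[error_col] =~ /Error/
  end
  [cases, errors]
end

# Made-up sample data, not taken from the real file.
sample = ["b,ok", "a,Error: timeout", "b,Error: disk", "a,ok", "a,ok"]
cases, errors = tally(sample, 0, 1)
p cases   # "a" is seen 3 times, "b" twice
p errors  # one error each for "a" and "b"
```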
That works, but it seems to me that I really should be doing this with one
hash, not two. Any suggestions?
Next, I want to print out the values. It is easy to do this with
cases.each, but I’d like to print them out, sorted by case. The best
solution I have so far uses cases.keys.sort.each, then inside the block
uses cases[key] (and errors[key]).
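Roughly, the sorted printing I have now looks like the sketch below. The helper name report_lines, the output format, and the sample counts are made up for illustration; the point is only the keys.sort.each pattern with a lookup into each hash.

```ruby
# Hypothetical helper: builds one report line per case, in sorted key order.
def report_lines(cases, errors)
  cases.keys.sort.map do |key|
    # errors was created with Hash.new(0), so cases without errors report 0
    "#{key}: #{cases[key]} seen, #{errors[key]} errors"
  end
end

# Made-up counts, not taken from the real file.
cases = Hash.new(0)
cases["b"] = 2
cases["a"] = 3
cases["c"] = 1
errors = Hash.new(0)
errors["a"] = 1
errors["b"] = 1

report_lines(cases, errors).each { |line| puts line }
```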
Any ideas would be appreciated.
Ben