Better way to accumulate totals

I am just learning Ruby and am very impressed. In one week I've built three routines that will save my group 6 to 7 hours each week, and they run in 2 minutes. Pretty cool. I would appreciate a suggestion on how to do something. I have a script that correctly parses a directory full of text files and extracts key data. The data lines look like this:

" category value"

I have 15 categories and the code below works, but it is crude. I create 15 global arrays and match each line's "category" to the text string in the case statement. Would a hash or an object work better? Just looking for a pointer on which way to direct my research. Thanks!

Below is part of a big loop that goes through each found line for each document, and is in its own method.

case top_event[0].lstrip.rstrip
when "Redo size:"
  $redo_cnt[0] += 1
  $redo_cnt[1] += top_event[1].to_f
when "Logical reads:"
  $log_read_cnt[0] += 1
  $log_read_cnt[1] += top_event[1].to_f

···

------------------------------
Robert
rlkeller@yahoo.com

count = Hash.new{|h,k| h.update k => Hash.new{|h,k| h.update k => 0}}

case top_event[0].strip
   when /redo size:/i
     count[:redo][0] += 1
     count[:redo][1] += Float(top_event[1])

   when /logical reads:/i
     count[:read][0] += 1
     count[:read][1] += Float(top_event[1])

...

require 'yaml'

puts count.to_yaml

is one approach.

cheers.

a @ http://codeforpeople.com/

···

On Oct 30, 2007, at 2:41 PM, Robert Keller wrote:

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Robert Keller wrote:

The data lines look like this:

" category value"

case top_event[0].lstrip.rstrip
when "Redo size:"
  $redo_cnt[0] += 1
  $redo_cnt[1] += top_event[1].to_f
when "Logical reads:"
  $log_read_cnt[0] += 1
  $log_read_cnt[1] += top_event[1].to_f

This might be easier to understand:

#Create a hash, so that when you use a
#key that doesn't exist, it creates the key,
#assigns it the array [0,0], and returns the
#array:
categories = Hash.new() do |hash, key|
  hash[key] = [0,0]
end

#Loop over each line in a file:
File.foreach("data.txt") do |line|
  arr = line.split(":")
  cat = arr[0].strip
  val = Float(arr[1]) #causes an error if can't convert

  categories[cat][0] += 1
  categories[cat][1] += val
end

p categories
puts categories["Make pie"][0]
puts categories["Redo size"][1]

Using this data:

Redo size: 1.1
Logical reads: 2.1
Redo size: 1.1
Hello world: 3.1
Make pie: 4.1
Hello world: 3.1
Make pie: 4.1
Make pie: 4.1

this is the output:

{"Make pie"=>[3, 12.3], "Hello world"=>[2, 6.2], "Redo size"=>[2, 2.2],
"Logical reads"=>[1, 2.1]}
3
2.2
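
Once the hash is filled, a per-category report is a short loop. The following is a minimal sketch, assuming the same [count, sum] layout as above. The inline `lines` array is a hypothetical stand-in for data.txt, and the average column is added purely for illustration.

```ruby
# Hypothetical inline data standing in for data.txt.
lines = [
  "Redo size: 1.1",
  "Logical reads: 2.1",
  "Redo size: 1.1",
]

# Same default-block hash as above: unknown keys start at [0, 0].
categories = Hash.new { |hash, key| hash[key] = [0, 0] }

lines.each do |line|
  name, value = line.split(":", 2)   # split on the first colon only
  entry = categories[name.strip]
  entry[0] += 1                      # count
  entry[1] += Float(value)           # running total
end

# One report line per category: count, total, average.
report = categories.map do |name, (count, total)|
  format("%-15s count=%d total=%.1f avg=%.2f", name, count, total, total / count)
end
puts report
```

Passing a limit of 2 to `split` keeps any later colons inside the value part, which may or may not matter for the real files.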

···

--
Posted via http://www.ruby-forum.com/.

Here is a different approach. I start by creating a statistics object:

Statistics = Struct.new :count, :sum

stats = Hash.new {|h,k| h[k] = Statistics.new(0, 0)}

File.foreach "foo.dat" do |line|
  if /^\s+\b(.+?)\b\s+(\d+\.\d+)/ =~ line
    s = stats[$1]
    s.count += 1
    s.sum += Float($2)
  end
end

Note, you might have to tweak the regexp.
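
As one possible tweak, here is a sketch that assumes the lines really look like "  Redo size:   12.5" (optional leading whitespace, a name ending in a colon, then a number). The sample lines, the local names `statistics` and `line_re`, and the exact pattern are assumptions for illustration; adjust them to the real files.

```ruby
statistics = Struct.new(:count, :sum)

stats = Hash.new { |h, k| h[k] = statistics.new(0, 0.0) }

# Hypothetical sample lines; the real input comes from the report files.
sample = [
  "  Redo size:       12.5",
  "  Logical reads:   100",
  "  Redo size:       7.5",
  "  not a data line",
]

# Name = everything up to the first colon; value = integer or decimal.
line_re = /^\s*([^:]+):\s+(\d+(?:\.\d+)?)\s*$/

sample.each do |line|
  m = line_re.match(line)
  next unless m                 # skip lines that are not "name: number"
  s = stats[m[1].strip]
  s.count += 1
  s.sum += Float(m[2])
end

stats.each { |name, s| puts "#{name}: #{s.count} lines, total #{s.sum}" }
```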

Kind regards

robert

···

2007/10/30, Robert Keller <rlkeller@yahoo.com>:

--
use.inject do |as, often| as.you_can - without end

ara.t.howard wrote:

count = Hash.new{|h,k| h.update k => Hash.new{|h,k| h.update k => 0}}

This works...

count = Hash.new{|h,k| h[k] = Hash.new{|h1,k1| h1[k1] = 0}}
=> {}
count[:foo][1] += 1
=> 1
count[:foo][1] += 1
=> 2
count[:foo][1] += 1
=> 3

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

oh yeah, of course ;) i'm in the habit of using 'update' from 'inject', where it saves you having to return the hash itself... shorter is better always.
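
For context, the 'inject' habit mentioned here looks roughly like the sketch below, using made-up data. Inside `inject` the block has to return the accumulator, and `Hash#update` (an alias of `merge!`) returns the hash it was called on, so it does both jobs in one expression.

```ruby
pairs = [[:redo, 1], [:read, 2]]   # made-up sample data

# update returns the receiver, so the block's value is the accumulator.
with_update = pairs.inject({}) { |h, (k, v)| h.update(k => v) }

# With plain assignment the hash must be returned explicitly,
# because h[k] = v evaluates to v, not to h.
with_assign = pairs.inject({}) { |h, (k, v)| h[k] = v; h }

p with_update
p with_assign
```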

cheers.

a @ http://codeforpeople.com/

···

On Oct 30, 2007, at 3:40 PM, Joel VanderWerf wrote:

--
it is not enough to be compassionate. you must act.
h.h. the 14th dalai lama

ara.t.howard wrote:

oh yeah, of course ;) i'm in the habit of using 'update' from 'inject', where it saves you having to return the hash itself... shorter is better always.

...but only if returning the hash is the right thing to do. In this case, we want the "leaf" value, not the hash:

irb(main):034:0> count = Hash.new{|h,k| h.update k => Hash.new{|hh,kk| hh.update kk => 0}}
=> {}
irb(main):035:0> count[:foo][1] += 1
NoMethodError: undefined method `+' for {1=>{}, :foo=>{}}:Hash
         from (irb):35
irb(main):036:0> count
=> {1=>{}, :foo=>{}}
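
The difference comes down to what the default block returns, since that return value is what a lookup evaluates to when the key is missing. A small single-level sketch (the variable names are illustrative):

```ruby
# Block ends with an assignment: h[k] = 0 evaluates to 0,
# so a missing key yields the new leaf value.
by_assign = Hash.new { |h, k| h[k] = 0 }
v1 = by_assign[:a]

# Block ends with update: it stores the pair but evaluates to the
# hash itself, so a missing key yields the whole hash.
by_update = Hash.new { |h, k| h.update(k => 0) }
v2 = by_update[:a]

p v1                    # 0
p v2.equal?(by_update)  # true: the lookup returned the hash, not 0
p by_update[:a]         # 0 on the second lookup, because the key now exists
```

That is why `+=` fails on the first access with the `update` form: it ends up calling `+` on a Hash.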

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407