Efficient Ruby equivalent to Excel's "SUMIF"?

7stud2 · 4 March 2013 14:21

I'm getting some relatively slow performance when doing calculations
against data, and I'm wondering whether there's a more efficient way to
do this.

I'm taking a multidimensional array and then totalling a "column" based
on multiple criteria. It looks like a much longer version of this:

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

The way I'm doing this at present is to take the main key: "Part Number"
and unique it, then loop through all those values and select chunks of
the data which match. In order to accelerate the process I'm also
diminishing the source array each time so by the time I get to the end
the process is much faster.

Something like this:

···

_________________
#Make a 2D array of unique refs (+header) to store the output
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

#Set up a 2D array to hold the output, with 1 of each Ref + a header
output = [ data.transpose[0].uniq ].transpose

output.map.with_index { |ar,idx|
  if idx == 0 #Headers
    [ ar[0], 'Total', 'Here', 'There' ]
  else
    #Get the current set to look at
    current_data = data.select { |a| a[0] =~ /#{ ar[0] }/i }
    #Remove it from the main pool
    data -= current_data
    #Populate the current row with totals
    b = [
      ar[0],
      nil,
      current_data.select { |a| a[2] == 'Here' }.inject(0) { |sum,a| sum
+ a[1].to_i },
      current_data.select { |a| a[2] == 'There' }.inject(0) { |sum,a|
sum + a[1].to_i }
    ]
    #Add a total
    b[1] = b[2] + b[3]
    #This one's done
    b
  end
}
_________________

I'm doing this with around 9000 rows of data using over 2000 unique
references. The way I'm doing it at the moment seems quite slow. Is
there a faster way to do this?

Thanks for reading this.

--
Posted via http://www.ruby-forum.com/.

Adam_Prescott · 4 March 2013 14:42

I think part of the problem is you're using arrays quite heavily. You can
do it in one pass if you pick a Hash that you later convert to a more
appropriate structure, or simply define your data as a class and
instantiate relevant objects, etc.

Anyway, here's an example:

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

desired_output = [
  ["Ref", "Total", "Here", "There"],
  ["Part1", 5, 3, 2],
  ["Part2", 1, 1, 0]
]

# bump off the first element
headers = data.shift

references = {}

data.each do |reference, count, location|
  # if the references hash hasn't "seen" this reference yet,
  # create a hash to keep track of totals and locations.
  references[reference] ||= { :total => 0, :locations => {} }
  r = references[reference]

# contribute to the total for this reference.
r[:total] += count.to_i

  # if this location hasn't been seen for this reference yet,
  # start it off at zero. could also use Hash.new(0) in the
  # above definition of references[reference]
  r[:locations][location] ||= 0

# now contribute to the location's count
r[:locations][location] += count.to_i
end

# references now looks something like:

···

#
# {
# "Part1" => { :total => 5,
# :locations => { "Here" => 3, "There" => 2 } },
# "Part2" => { :total => 1,
# :locations => { "Here" => 1 } }
# }

# This corresponds to desired_output, just in a different way.
#
# You can convert this to the original desired array if you wish, by
# iterating over references, or you can pick a more appropriate Hash
# structure and just use that.

7stud2 · 4 March 2013 15:09

Nice thread it is.

···

--
Posted via http://www.ruby-forum.com/.

7stud2 · 4 March 2013 15:09

Thanks Adam, that's a great way of looking at this!
I've been using arrays quite a lot in Ruby since they're so much easier
to use than in VBA. When your only tool is a hammer...
I knew the logical answer was some sort of incremental system, but I
hadn't been able to visualise a way to implement it. Hashes look like a
great solution.

···

--
Posted via http://www.ruby-forum.com/.

7stud2 · 4 March 2013 15:12

i made you this:

selects = ["Here", "There"]

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

data[1..-1].group_by(&:first).map {|k,d|
[k,d.inject(0){|s,v|s+v[1].to_i}] + selects.map {|s|
d.select{|v|v[2]==s}.inject(0){|s,v|s+v[1].to_i} }
}

#=> [["Part1", 5, 3, 2], ["Part2", 1, 1, 0]]

···

--
Posted via http://www.ruby-forum.com/.

7stud2 · 4 March 2013 15:24

Ooh, pretty. Nice one-liner Hans. I can just about follow it.
I hadn't seen "group_by" before. Looks like hashes are definitely the
right direction to be working towards for this.

···

--
Posted via http://www.ruby-forum.com/.

7stud2 · 4 March 2013 16:01

I incorporated a mixture of these suggestions into my own data
construct, and it completes the task in a couple of seconds now rather
than in minutes. I'll have to see if there's anywhere else I could use
this kind of approach. Thanks for the help

···

--
Posted via http://www.ruby-forum.com/.

Topic		Replies	Views
Premature Optimization ruby-talk	7	72	30 October 2006
Select "columns" from multidimensional array? ruby-talk	59	1098	30 November 2013
Better way to accumulate totals ruby-talk	6	111	31 October 2007
Better way to accumulate totals ruby-talk	0	104	31 October 2007
Most compact command for associate array 'totalling'? ruby-talk	9	122	9 August 2008

Efficient Ruby equivalent to Excel's "SUMIF"?

Related topics