Select "columns" from multidimensional array?

7stud2 · 31 January 2013 09:46

There's probably a simpler answer to this than the ways I've come up
with.
What's the best way to select columns from a two-dimensional array?

I build arrays to match excel-style formatting, like this but larger:

__________________

a = [
  [ 'A1', 'A2', 'A3' ],
  [ 'B1', 'B2', 'B3' ],
  [ 'C1', 'C2', 'C3' ]
]

def get_cols (multi_array, headers )

  indices = []
  headers.each { |val| indices << multi_array[0].index(val) }
  indices.compact!

  multi_array.map do |ar|
    indices.map { |idx| ar[idx] }
  end

end

get_cols a, %w(A1 A3)

=> [["A1", "A3"], ["B1", "B3"], ["C1", "C3"]]
__________________

I haven't been able to work out a way to do this without writing
long-winded code. Is there a simple solution?

Thanks.

--
Posted via http://www.ruby-forum.com/.

Jesus_Gabriel_y_Gala · 31 January 2013 10:25

Is this the desired output?

For getting the columns filtering the ones that have a header:

def get_cols(multi_array, headers)
  multi_array.transpose.select { |(header, _)| headers.include? header }
end

1.9.2p290 :001 > a = [
1.9.2p290 :002 > [ 'A1', 'A2', 'A3' ],
1.9.2p290 :003 > [ 'B1', 'B2', 'B3' ],
1.9.2p290 :004 > [ 'C1', 'C2', 'C3' ]
1.9.2p290 :005?> ]
1.9.2p290 :008 > def get_cols(multi_array, headers)
1.9.2p290 :009?> multi_array.transpose.select {|(header,_)| headers.include? header}
1.9.2p290 :010?> end
=> nil
1.9.2p290 :011 > get_cols a, %w(A1 A3)
=> [["A1", "B1", "C1"], ["A3", "B3", "C3"]]

Jesus.


7stud2 · 31 January 2013 10:34

Praise Jesus!

My initial output was the desired one, but the "transpose" method is
what I was missing. Thanks


7stud2 · 31 January 2013 12:09

Documenting the end result.
I added a sort to keep the input column order, and reordered the inputs
to make the array itself optional for ease-of-use within the parent
class.

def get_cols( headers, multi_array=nil )
  multi_array = @data if multi_array.nil?
  multi_array.transpose
             .select { |header,_| headers.include?(header) }
             .sort_by { |header,_| headers.index(header) || headers.length }
             .transpose
end

irb(main):017:0> get_cols %w(A3 A1), a
=> [["A3", "A1"], ["B3", "B1"], ["C3", "C1"]]
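
For comparison, here is a minimal sketch of an alternative that avoids transposing altogether: look up each header's position in the first row, then slice every row with Array#values_at. The name pick_columns is made up for this illustration, and silently skipping headers that are not found is an assumption about the desired behaviour.

```ruby
# Hypothetical helper (not from the thread): select columns by header
# without transposing, preserving the order given in `headers`.
def pick_columns(headers, rows)
  # Position of each requested header in the first row; unknown headers are dropped.
  indices = headers.map { |h| rows.first.index(h) }.compact
  # values_at returns the elements at the given indices, in that order.
  rows.map { |row| row.values_at(*indices) }
end

a = [
  %w(A1 A2 A3),
  %w(B1 B2 B3),
  %w(C1 C2 C3)
]

p pick_columns(%w(A3 A1), a)
# => [["A3", "A1"], ["B3", "B1"], ["C3", "C1"]]
```

Because the lookup happens once and each row is sliced directly, the column order follows the order of the requested headers without a separate sort step.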


7stud2 · 31 January 2013 14:14

Thanks Robert

My current approach is
HTML Table -> Nokogiri Nodeset -> Multidimensional Array -> Excel / TSV

A Matrix looks like a useful way of grabbing the values I need when I
have to alter specifics in the data.


7stud2 · 31 January 2013 15:45

Looks like good advice, both. My concern with matrices is being able to
modify elements in the same way as I could in a multidimensional
array, but I assume that's the reason for creating a child class.
There's a whole thread full of people waxing philosophical about the
subject!

I've never written a class based on someone else's before, sounds like
fun. I'll see what happens when I play with it a bit.


7stud2 · 3 February 2013 22:43

I decided to try and build on the Array class as I don't really
understand Matrices yet. I've added a few handy methods. The hidden Bang
stuff is justified, I think, as this class is intended to mimic Excel's
layout.

I'll add more useful bits as I come up with them, this is just an
experiment at the moment.

class Excel_Sheet<Array

  def initialize( val=[] )
    fail ArgumentError, 'Must be multidimensional array' unless val[0].class == Array || val.empty?
    super( val )
  end

  def columns
    ensure_shape
    self[0].length
  end

  def rows
    self.length
  end

  # Pad short rows with nil so that every row has the same length.
  def ensure_shape
    return self if empty?
    max_size = self.max_by(&:length).length
    self.map! { |ar| ar.length == max_size ? ar : ar + Array.new( max_size - ar.length, nil ) }
  end

  def get_cols( headers )
    ensure_shape
    self.transpose
        .select { |header,_| headers.include?(header) }
        .sort_by { |header,_| headers.index(header) || headers.length }
        .transpose
  end

  # In-place variant: replaces the contents with the selected columns.
  def get_cols!( headers )
    self.replace get_cols( headers )
  end

  # Tab-separated output, one row per line.
  def to_s
    self.map { |ar| ar.map { |el| "#{el}".strip.gsub( /\s/, ' ' ) }.join( "\t" ) }.join( "\n" )
  end

end
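
The reason the padding step matters is that Array#transpose raises an IndexError when the rows have different lengths. A minimal stand-alone sketch of the same idea follows; pad_rows is a hypothetical name, and returning a new array rather than modifying the receiver is a choice made only for this illustration.

```ruby
# Hypothetical stand-alone version of the padding step.
def pad_rows(rows)
  return rows if rows.empty?
  width = rows.map(&:length).max
  # Append nils so every row has `width` cells; full-width rows gain nothing.
  rows.map { |row| row + Array.new(width - row.length) }
end

ragged = [%w(A1 A2 A3), %w(B1), %w(C1 C2)]

begin
  ragged.transpose
rescue IndexError => e
  puts "transpose failed: #{e.message}"
end

p pad_rows(ragged).transpose
# => [["A1", "B1", "C1"], ["A2", nil, "C2"], ["A3", nil, nil]]
```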


7stud2 · 3 February 2013 23:10

And my first attempt at using a block with it:

def skip_headers
  yield self[1..-1]
end

test.skip_headers do |row|
  p row
end

It works!


7stud2 · 12 February 2013 12:19

I've decided to inherit from array after all, since all I want to do
with this is extend support for multidimensional arrays, but without
overwriting any of Array's methods.

Anyway, the obstacle I've hit is one I can avoid, but I was wondering
whether I'm doing something wrong, or whether there's a nice Rubyish way
around this. Here's a simplified version to demonstrate the issue:

___________________
class Excel_Sheet<Array

  def initialize( val=[] )
    val = %w(A1 B1 C1 A2 B2 C2 A3 B3 C3).each_slice(3).to_a if val == 'test'
    super( val )
  end

  def skip_headers
    block_given? ? ( [ self[0] ] + yield( self[1..-1] ) ) : ( self[1..-1] )
  end

  def filter( header, regex )
    idx = self[0].index header
    skip_headers { |xl| xl.select { |ar| ar[idx] =~ regex } }
  end

end
___________________

When I do this sort of thing:
result = object.filter('Header', /value1|value2/)

I get the return as an Array, so I can't use my extra methods on it
anymore.

Here's my current workaround. It's the only way I could think of doing
this but it doesn't look right.
___________________
def filter( header, regex )
  idx = self[0].index header
  Excel_Sheet.new skip_headers { |xl| xl.select { |ar| ar[idx] =~ regex } }
end
___________________

So in short, my question is how can I return my class type after using
Array's methods on my child-class?
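
One possible shape for this, sketched under assumptions: methods such as select hand back a plain Array, so the result has to be wrapped again, and wrapping with self.class.new instead of a hard-coded class name keeps the method working for further subclasses. Sheet and wrap are hypothetical names standing in for the class above.

```ruby
# Simplified stand-in for the Excel_Sheet class above; `Sheet` and the
# private `wrap` helper are hypothetical names for this sketch.
class Sheet < Array
  def skip_headers
    block_given? ? [self[0]] + yield(self[1..-1]) : self[1..-1]
  end

  def filter(header, regex)
    idx = self[0].index(header)
    wrap(skip_headers { |rows| rows.select { |row| row[idx] =~ regex } })
  end

  private

  # Re-wrap a plain Array in whatever class the receiver is, so that
  # subclasses of Sheet also get their own type back.
  def wrap(rows)
    self.class.new(rows)
  end
end

sheet  = Sheet.new([%w(Type ID), %w(Type1 A001), %w(Type2 A002)])
result = sheet.filter('Type', /1$/)

p result.class   # => Sheet
p result         # => [["Type", "ID"], ["Type1", "A001"]]
```

Keeping the wrapping in one private helper means each new method only needs a single call to it.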


7stud2 · 12 February 2013 20:26

Thanks for the advice and examples, I'll see whether I can understand
how the classes and methods work with each other there and set about
experimenting with them.

One thing which put me off generating a custom class "from scratch" is
that Array appears to be equal to its content (I assume this is a
language shortcut), but it seems "custom" objects' values have to be
accessed via their accessors.
I was hoping for some more succinct syntax than this sort of thing:
puts [] #Array is so easy to create
puts CustomObject.new([]).value #This looks clunky next to that
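
A minimal sketch of how a wrapper object can be made less clunky, assuming composition rather than inheritance: defining each plus including Enumerable gives map, select and friends, a few delegators cover indexing, and defining to_s lets puts print the object directly. The class name Table and everything inside it are hypothetical.

```ruby
require 'forwardable'

# Hypothetical wrapper class; none of these names come from the thread.
class Table
  include Enumerable
  extend Forwardable

  # Delegate a few read-only Array methods to the wrapped rows.
  def_delegators :@rows, :[], :length, :empty?

  def initialize(rows = [])
    @rows = rows
  end

  # Enumerable builds map, select, first, etc. on top of each.
  def each(&block)
    @rows.each(&block)
    self
  end

  # puts calls to_s on objects that are not arrays.
  def to_s
    @rows.map { |row| row.join("\t") }.join("\n")
  end
end

t = Table.new([%w(A1 B1), %w(A2 B2)])
puts t             # prints the rows as tab-separated lines
p t.map(&:first)   # => ["A1", "A2"]
p t[1]             # => ["A2", "B2"]
```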

I'd love to get accustomed to proper OO thinking, but I'll inevitably make
all the rookie mistakes in the process. It's a lot to get used to all at
once given that I've been using Ruby for less than a year, and I have no
training other than helpful hints and googling. Thanks again for your
patience.


7stud2 · 13 February 2013 19:47

I haven't had a chance to look into your example yet; I've been reading
up on OOP.
I intend to take the ideas I've been coming up with for ease-of-use
within the Array class and use those, your Matrix example, and whatever
else occurs to me to form a new set of classes which can handle my data
and the operations I regularly need to perform. Then it's time to play
with scenarios and see what happens.


7stud2 · 14 February 2013 22:50

Interesting Matrix build. It's giving me a bit of a headache just trying
to figure out the links involved.

So MatrixPart defines the methods and the "parent" matrix (held as an
instance variable); and row and column both use these methods and both
access the variable which points to the matrix they're part of.

The rows and columns can be selected based on given headers, and each
will reference the other... and this is where my head explodes:

def index( row, col )
  @row_headers.index( row ) * @col_headers.size + @col_headers.index( col )
end

It takes a bit of getting used to, but thanks to Ruby's flexible array
class adding nil values automatically when you specify an index higher
than the upper boundary, that works.
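
The arithmetic can be worked through with a small sketch. The header lists below are illustrative (four rows, three columns) and flat_index is a hypothetical free-standing version of the index method, with the header lists passed in as arguments instead of being read from instance variables.

```ruby
# Row-major position in a flat array: skip the full rows that come
# before this one, then step across to the column.
def flat_index(row_headers, col_headers, row, col)
  row_headers.index(row) * col_headers.size + col_headers.index(col)
end

rows = %w(1 2 3 4)
cols = %w(A B C)

p flat_index(rows, cols, '1', 'A')  # => 0
p flat_index(rows, cols, '2', 'B')  # => 4
p flat_index(rows, cols, '4', 'C')  # => 11

# Assigning past the end of an Array pads the gap with nil.
data = []
data[flat_index(rows, cols, '2', 'B')] = 123
p data  # => [nil, nil, nil, nil, 123]
```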

I guess with a bit more poking and prodding I could figure out how to
append, insert, and delete rows and columns. After all, it's only a math
problem in the end. All the interconnected references (especially the
layered yields) still make my head spin though


7stud2 · 15 February 2013 14:55

Hah, I wrote that head exploding comment first and then managed to work
out what it did afterwards. Still took a few minutes of smashing my head
into the desk to make room for the new thought though ;¬)

Using a Hash sounds like a good idea. I already tried rewriting the
selector into something a bit more excel-like (although I won't bore you
with all the little changes):

  def []( addr )
    col, row = addr.upcase.scan( /([A-Z]+)(\d+)/ ).flatten
    data[ index( row, col ) ]
  end

  def []=( addr, val )
    col, row = addr.upcase.scan( /([A-Z]+)(\d+)/ ).flatten
    data[ index( row, col ) ] = val
  end

m = Matrix.new(%w{A B C}, %w{1 2 3 4})
m["A1"] = 123
m["B4"] = 123

I haven't gotten around to changing all the "row, col" to "col, row"
references, so it looks a bit weird, but I'm just experimenting with
options at the moment. I'll have a go at Hashing it up as well.
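
A minimal sketch of what the Hash-backed version might look like, assuming the cell address itself (such as "A1") is used as the key so that no index arithmetic is needed. HashGrid, normalize and the validation rule are hypothetical; only the address style comes from the selectors above.

```ruby
# Hypothetical Hash-backed grid; class and method names are illustrative.
class HashGrid
  ADDRESS = /\A([A-Z]+)(\d+)\z/

  def initialize
    @data = {}   # "A1" => value
  end

  def [](addr)
    @data[normalize(addr)]
  end

  def []=(addr, val)
    @data[normalize(addr)] = val
  end

  # Highest row number that currently holds a value (0 when empty).
  def maxrow
    @data.keys.map { |k| k[/\d+/].to_i }.max || 0
  end

  private

  # Upcase the address and reject anything that is not letters then digits.
  def normalize(addr)
    key = addr.to_s.upcase
    raise ArgumentError, "invalid address: #{addr.inspect}" unless key =~ ADDRESS
    key
  end
end

g = HashGrid.new
g['a1'] = 123
g['B4'] = 456
p g['A1']    # => 123
p g['b4']    # => 456
p g['C2']    # => nil
p g.maxrow   # => 4
```

Unset cells simply have no key, so sparse sheets cost nothing; the trade-off is that row and column order has to be derived from the keys.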

Naturally I have many questions floating around in my head, but I'll try
to work them out through the scientific method of repeated failed
attempts


7stud2 · 17 February 2013 20:14

I've attached my attempt at converting your code to suit mine (hope you
don't mind the plagiarism).
I have a list of some of my plans to add functionality at the top, and
I've rewritten your test at the bottom to suit the new options.

I'd be interested to know whether there are any things I'm doing
drastically wrong... I think the rows? and columns? might be able to be
done more succinctly, for example.

Attachments:
http://www.ruby-forum.com/attachment/8146/XL.rb


7stud2 · 17 February 2013 20:30

Oopsie, this:
data.keys.map { |k| k[/\d+/] }.max.to_i
should be this:
data.keys.map { |k| k[/\d+/].to_i }.max

Seeing mistakes already
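
The difference between the two lines shows up as soon as a row number reaches two digits, because strings are compared character by character. A small sketch with made-up keys (the method names are hypothetical):

```ruby
# Takes the maximum of the digit strings, then converts: compares text.
def max_row_from_strings(keys)
  keys.map { |k| k[/\d+/] }.max.to_i
end

# Converts each digit string first, then takes the maximum: compares numbers.
def max_row_from_integers(keys)
  keys.map { |k| k[/\d+/].to_i }.max
end

keys = %w(A2 B10 C9)
p max_row_from_strings(keys)    # => 9  ("9" sorts after "10" as text)
p max_row_from_integers(keys)   # => 10
```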


7stud2 · 18 February 2013 12:05

Nice tips! Thanks for the help again.

I had no idea how to use to_enum, I'll have to read up on that. I've
done all the Ruby courses I could find at Codecademy which filled in a
few gaps I had in my knowledge. I'm still reading the Book of Ruby as
well.

Hopefully this one is more stable:

I've decided to leave the "Matrix" class name alone in case I need to
use it within the same scope later. I've renamed this "RubyExcel" for
want of a better term.

I fixed all the things you mentioned (I think).

I've added the ability to upload a multidimensional array into the data.
It carries the option to overwrite or append as a switch.

I set the reference list of column references to a Constant.

I've removed "array" and added "to_a" and "to_s".

I've added "find" to return a "cell address" when given a value

I still have a long list of things I want to add, and I'm sure I'll
think of more. I'm surprised I haven't found anything equivalent out
there, to be honest. Maybe all the real pros are using databases to
parse their output

Attachments:
http://www.ruby-forum.com/attachment/8149/RubyExcel.rb


7stud2 · 18 February 2013 13:43

Nice link. I agree with the sentiment there, and I'll think more
carefully about using boolean switches in future.
I've split that method into "load" and "append", each passing arguments
to private "import_data".
I added the rescue when I realised the method was returning the number
of rows and I wanted it to return success or failure as a boolean, I
forgot it was catching my exceptions as well. Now it's true or
exception.
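
The attachment is not reproduced here, so the following is only a sketch of the shape described: two public methods with clear names that both delegate to one private method, which returns true on success and lets exceptions propagate. The method names load, append and import_data come from the post; the class name SheetData, the rows accessor and the method bodies are assumptions.

```ruby
# Hypothetical sketch of the load/append split described above.
class SheetData
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Replace any existing rows.
  def load(new_rows)
    import_data(new_rows, false)
  end

  # Add to the existing rows.
  def append(new_rows)
    import_data(new_rows, true)
  end

  private

  # Returns true on success; invalid input raises instead of being rescued.
  def import_data(new_rows, keep_existing)
    unless new_rows.is_a?(Array) && new_rows.all? { |r| r.is_a?(Array) }
      raise ArgumentError, 'Must be multidimensional array'
    end
    @rows = keep_existing ? @rows + new_rows : new_rows.dup
    true
  end
end

s = SheetData.new
p s.load([%w(A1 B1)])     # => true
p s.append([%w(A2 B2)])   # => true
p s.rows                  # => [["A1", "B1"], ["A2", "B2"]]
```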

I do use switches occasionally, here's one example where I think it's
justified (from my older Excel_Sheet<Array class):

def filter( header, regex, switch=true )
  fail ArgumentError, "#{regex} is not valid Regexp" unless regex.class == Regexp
  idx = self[0].index header
  fail ArgumentError, "#{header} is not a valid header" if idx.nil?
  operator = ( switch ? :=~ : :!~ )
  Excel_Sheet.new skip_headers { |xl| xl.select { |ar| ar[idx].send( operator, regex ) } }
end

Mostly I just did that because I was learning how to use symbols, but it
makes the Regex more flexible with the minimum amount of repetition or
long-winded "if" statements.
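
The symbol trick can be seen in isolation with a small sketch: =~ and !~ are ordinary methods, so the one to use can be chosen once and then dispatched with send. The helper name matching and the sample values are made up for this illustration.

```ruby
# Pick the matching operator once, then apply it to every value.
def matching(values, regex, keep_matches = true)
  operator = keep_matches ? :=~ : :!~
  values.select { |v| v.send(operator, regex) }
end

accounts = %w(P100 Q200 P300)
p matching(accounts, /^P/)          # => ["P100", "P300"]
p matching(accounts, /^P/, false)   # => ["Q200"]
```

Note that =~ returns the match position (0 here), which still counts as true in Ruby, while !~ returns a plain true or false.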


7stud2 · 18 February 2013 15:50

I went with "filter" with an optional true/false regex switch because it
seemed like the simplest way to use it, and closest to my own experience
in using Excel's filters.
Passing the symbol feels less intuitive, and yielding to a block means
writing more code, particularly when I'm writing a quick method chain.
The notation I set up feels natural to me when chaining criteria. For
example I can just do this:
data.filter( 'Account', /^P/ ).filter( 'Type', /^Large/, false )

Regarding the usage of skip_headers
Say I have this data:

Type    Flag    Unique_ID
Type1   1       A001
Type2   0       A002
Type1   0       A003
Type3   1       A004
Type1   1       A005

If I only want to keep Parts of "Type1" and "Type3" then I could use
"select" and some Regex, but I might pick up the Header as well if I'm
not careful.
Using a method like "skip_headers" allows me to select or reject
elements of the data without losing the identifiers in the first row,
which I'm almost always going to need at the end when I output the data
into human-readable format.
I'm also dealing with entire rows rather than individual cells, and
since the source data can change its content and order, using the
headers to identify the data source for a given operation is essential.
Using skip_headers both allows me to preserve them while sorting through
data, and also puts them back on again for the next time I need to
reference them.
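
The sample data above makes the point concrete. In this sketch, with_headers_kept is a hypothetical free-standing version of skip_headers that takes the rows as an argument; the filter on the Flag column is just an example.

```ruby
# Yield only the data rows, then put the header row back on top.
def with_headers_kept(rows)
  header, *body = rows
  [header] + yield(body)
end

data = [
  %w(Type  Flag Unique_ID),
  %w(Type1 1    A001),
  %w(Type2 0    A002),
  %w(Type1 0    A003),
  %w(Type3 1    A004),
  %w(Type1 1    A005)
]

# A plain select would drop the header, since "Flag" is not "1".
p data.select { |row| row[1] == '1' }.first
# => ["Type1", "1", "A001"]

flagged = with_headers_kept(data) { |rows| rows.select { |row| row[1] == '1' } }
p flagged.first   # => ["Type", "Flag", "Unique_ID"]
p flagged.length  # => 4
```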


7stud2 · 18 February 2013 17:23

That Regexp to proc idea looks good. I could use proc form for a
positive match and a normal block for the negative. I'll see if I can
get something like this working when I write the filter method for
RubyExcel.
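
The suggestion being replied to is not part of this thread extract, so the following is only one way to get a similar effect without patching Regexp: a Method object can be passed with & wherever a block is expected. The helper names are hypothetical, and Regexp#match? needs Ruby 2.4 or later.

```ruby
# Keep the values the regex matches, using the regex's own match? method
# as the block.
def keep_matching(values, regex)
  values.select(&regex.method(:match?))
end

# The negative case: reject with the same block.
def drop_matching(values, regex)
  values.reject(&regex.method(:match?))
end

values = %w(P100 Q200 P300)
p keep_matching(values, /^P/)   # => ["P100", "P300"]
p drop_matching(values, /^P/)   # => ["Q200"]

# For flat lists, grep and grep_v do the same job via ===.
p values.grep(/^P/)     # => ["P100", "P300"]
p values.grep_v(/^P/)   # => ["Q200"]
```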

Using the new class I can implement something like skip_headers by
passing a starting value to "rows" or "columns". This makes it more
flexible as well. I've rewritten those iterators using optional start
and end points:

def rows( start_row = 1, end_row = maxrow )
  fail TypeError, 'Data is empty' if maxrow == 0
  fail ArgumentError, 'The starting row must be less than the maximum row' if maxrow < start_row
  return to_enum(:rows) unless block_given?
  ( start_row..end_row ).each do |idx|
    yield row( idx )
  end
  self
end

Now I can use rows(2) to skip the headers if necessary. It might be a
bit confusing when rows(1) actually returns from 1 to the end, but I've
already got row(1) for that purpose and it makes it shorter to iterate
through all of them. Plus it means I can do "rows.count", which is the
same as VBA syntax.
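
One detail worth checking in the block-less form: to_enum(:rows) with no further arguments re-invokes rows with its defaults, so rows(2) without a block would still enumerate from row 1. Passing the arguments through to to_enum avoids that. The sketch below uses a made-up MiniSheet class that stores rows in a plain Array and numbers them from 1; it is not the RubyExcel class.

```ruby
# Hypothetical miniature of the rows iterator.
class MiniSheet
  def initialize(rows)
    @rows = rows
  end

  def maxrow
    @rows.length
  end

  # Rows are numbered from 1, spreadsheet style.
  def row(idx)
    @rows[idx - 1]
  end

  def rows(start_row = 1, end_row = maxrow)
    # Forward the arguments so the enumerator covers the same range.
    return to_enum(:rows, start_row, end_row) unless block_given?
    (start_row..end_row).each { |idx| yield row(idx) }
    self
  end
end

sheet = MiniSheet.new([%w(Type Flag), %w(Type1 1), %w(Type2 0)])
p sheet.rows.count     # => 3
p sheet.rows(2).to_a   # => [["Type1", "1"], ["Type2", "0"]]
```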

I vaguely understand the idea of passing something in to compare to a
header type. I'm not sure how I'd implement it though, since the only
headers I ever deal with are row 1, and they tend to look pretty similar
to the data itself.


7stud2 · 18 February 2013 23:47

Nice catch on the arguments, I completely missed that.

After more face-to-keyboard action, I came up with a working filter
system. It modifies self at the moment rather than returning a copy,
which is something I'll have to look into since I'm not sure I want that
to be the default behaviour.

I've added an index option for row and column, and also added row and
column methods to String. Since those methods didn't exist before, and
you gave me the idea of modifying an existing class (Regexp), I thought
this would be quite a useful way to get the index values straight from
the hash keys.

In order to get the filter working properly I've created some compact
methods which will reconfigure the hash keys and values. That could
probably be refactored but it took me so long to get it working properly
I dare not touch it again yet!

I added empty? to the columns and rows as a helper for the compact
method.

I didn't like the inspect output so I tidied it up a bit as well, and
redefined "to_s" for each type.

I've added each_with_address as an option for the columns and rows since
they don't access the data hash directly. There might be a neater way to
implement this, but I couldn't figure it out.

I'm too tired for rational thought now so I'd better call it a day
before I find myself thinking that adding ASCII art comments in the
shape of ponies and rainbows would improve the code...

Attachments:
http://www.ruby-forum.com/attachment/8153/RubyExcel.rb

