Select "columns" from multidimensional array?

Using the filter method on my previous attempt would return a copy of
the object containing only the required data. I was hoping for something
similar with this so I can split into multiple sections using filter
logic but still keep the original if required, but I'm not sure how to
do this. This is the kind of thing I'd probably want to use:

m = #Main matrix
n = m.filter args #Creates Subset without altering "m"
m.filter! other_args #Alters its own data

A class method sounds like a good idea. I'll have to play with that and
see what I can come up with...
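
One way to get that filter / filter! pairing is sketched below. Grid, rows
and the block-based arguments are hypothetical names chosen for
illustration, not the API of the attached class: the non-bang method builds
a new instance from a filtered copy, and the bang method swaps the
receiver's own data.

```ruby
# Hypothetical Grid class; the names Grid, rows, filter and filter! are illustrative only.
class Grid
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Non-bang: build a new instance from a filtered copy, leaving self untouched.
  def filter(&block)
    self.class.new(@rows.select(&block))
  end

  # Bang: replace this instance's own data and return self.
  def filter!(&block)
    @rows = @rows.select(&block)
    self
  end
end

m = Grid.new([[1, 'a'], [2, 'b'], [3, 'c']])
n = m.filter { |row| row[0] > 1 }   # subset; m still holds all three rows
m.filter! { |row| row[0] < 3 }      # m now holds only its first two rows
```

Both objects still share the inner row arrays, so editing a cell in place
would show up in both; duplicating each row (rows.map(&:dup)) avoids that
if it matters.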

--
Posted via http://www.ruby-forum.com/.

Ok... new changes:

Used "Me" to reference the class so I don't have to keep typing the
classname when referencing class methods.

Removed String alterations in favour of the class methods "self.row" and
"self.column" which pass a string as an argument.

Combined various checks against multidimensional array input into a
method "multi_array?"

Added "insert_columns" and "insert_rows", which after long hours of
threatening my laptop with physical violence I've finally gotten into
some sort of working order.

Added "delete_column" and "delete_row"

Modified the "compact" methods to work with the above.

Modified "calc_columns" and "calc_rows" to accept a multidimensional
array as an optional argument, otherwise they reference "data".

Added "cells" so the API gives the option to list all populated cell
addresses.

Added strip! and upcase!

Modified the example case at the end to work with the major changes.

It's still far from complete, but it's taking form well enough that I
might be able to start replacing my old class with it in future
projects.
If I can get a clean and tidy version eventually I might make it into a
gem.

Sadly I did succumb to the urge to add ASCII art comments. I'll probably
remove it when I drift back closer to sanity.

Attachments:
http://www.ruby-forum.com/attachment/8154/RubyExcel.rb

--

I've now had time to do a bit more testing and debugging, and I've
managed to come up with a (relatively) stable and usable framework. Even
more surprising, I now have a working version of pretty much everything
on my wish list. I suppose I can just call .dup or .clone for when I
want to take different sections of the data down different paths.

I've tried to refactor some of the more esoteric code, and improve the
"encapsulation" by hiding the underlying hash wherever I spotted it
poking out.

I've actually managed to use recursion (in the "unique" method) without
permanently looping! I'll just gloss over the amount of permanent loops
I went through to get that right...

I've had to make some hefty use of reverse_each to sort out deleting
entire sections and compacting the rest afterwards.
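
The reason reverse_each helps is that deleting from the highest index
downwards never shifts the position of anything still waiting to be
deleted. A minimal sketch on a plain nested array (the attached class
stores its data differently, so this only illustrates the idea):

```ruby
rows = [[1, 2], [nil, nil], [3, 4], [nil, nil]]

# Collect the indices of the empty rows first...
empty = rows.each_index.select { |i| rows[i].all?(&:nil?) }

# ...then delete from the end, so earlier indices stay valid.
empty.reverse_each { |i| rows.delete_at(i) }
```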

As usual, I've probably made many silly mistakes and done things the
hard way. Any advice would be appreciated (although not demanded :wink: ).

I think the next thing to do is put this class through some real-world
scenarios, so I'll be using it as a replacement for ye olde Excel_Sheet
class.

Attachments:
http://www.ruby-forum.com/attachment/8156/RubyExcel.rb

--

Thanks Timo.

I have used the spreadsheet gem and parseexcel, but I switched to using
win32ole for more advanced options and for full compatibility with
Office 2010 formats. I haven't kept up with their progress since then,
though it's always worth looking into them. I haven't seen hyogen or
statsample before; I'll have a look.

There are a few reasons I've set out to build my own class(es) for this:

1) To create an API which feels natural for me. My previous scripting
experience is mostly in VBA, so I'm accustomed to using Excel's API and
my own custom functions. I wanted to get as close to this feel as
possible in order to get comfortable learning Ruby. Of course, in the
process I'm picking up on the concepts of OOP, so my approach is
starting to change as well.

2) To work alongside my existing code. This is a side-project to my main
one, which is a sort of web-scraper specifically designed to work with
my company's online database. I have a few other classes which perform
tasks such as automated reporting with advanced criteria, filling in
forms quickly with minimal user input, and just general helper
applications to improve productivity. It seemed only natural to create a
class capable of handling all the data analysis tasks which I had
previously written using standard Array methods.

3) The actual interaction I have with Excel tends to be minimal. I have
a few data dump methods which will take an array or a hash of arrays,
and output a TSV or Excel file automatically. The focus of the gems you
mentioned seems to be more on read-write operations like this, and I was
after something which would let me analyse and restructure large amounts
of data into an ordered summary. The reason I tend to order my
operations as if they were manual Excel interactions is that this is how
customers describe their current methodology to me, and my API makes it
easier for me to translate their approach into code.

4) To learn how to do it myself rather than relying on existing programs
:slight_smile:

I'd be happy to share my newfound knowledge with you, although I have to
give the credit mostly to Robert. I'd never have made it this far
without his guidance.

Incidentally I wrote a Matrix movie "Digital Rain" effect (which works
via win32ole) while I was trying to work out how to interact with Excel
efficiently. I've attached it in case you're interested.

Attachments:
http://www.ruby-forum.com/attachment/8160/matrix.rbw

--

From what I've seen of statsample so far it looks more numerically
oriented, whereas I tend to be dealing with lots of strings, some
numbers, and a few dates.

Hyogen also looks more interested in the read-write than the analysis in
the interim. My input tends to come from Nokogiri and I can already
output to excel with some advanced formatting options using my own code.

I have attached one of the functions I wrote to output the data to Excel
(part of a larger class).
As you can see, it's pretty simple to use; and I always have the option
of just handing over the win32ole object if I need more advanced options
(which happens occasionally). It could use some refactoring, but it does
the job.

As I already have this kind of functionality available, it's the ability
to sort through a table of data efficiently in the memory which
interests me. I know a database would be ideal for that kind of
operation, but I have portability in mind as a given report program may
be on a variety of machines, and the only thing I know they have for
sure is Office 2007 or higher.

Attachments:
http://www.ruby-forum.com/attachment/8161/Demo.rb

--

Sounds messy. Is the data on the page graphical rather than text based?
Why do you need Word?

--

I thought I could avoid the issue of splitting the data into multiple
outputs by creating copies of the class, but apparently I was wrong:

irb(main):002:0> m = RubyExcel.new.load [['a','b']]
=> columns: 2, rows: 1, values: 2

irb(main):003:0> a = m.clone

irb(main):004:0> a['A1'] = nil

irb(main):005:0> puts a
        b

irb(main):006:0> puts m
        b

What am I doing wrong? Is @data inside m not being duplicated along with
the class instance?
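
For reference, clone and dup only make a shallow copy: the new object gets
its own set of instance variables, but they point at the same objects, so
both instances end up sharing one @data. A minimal sketch of one way round
it, assuming the data lives in a Hash held in @data (Sheet is a
hypothetical stand-in for the real class): Ruby calls initialize_copy on
the new object for both dup and clone, so that is the place to copy the
Hash.

```ruby
# Hypothetical stand-in for the real class; only the copying behaviour matters here.
class Sheet
  attr_reader :data

  def initialize
    @data = {}
  end

  def []=(addr, val)
    @data[addr] = val
  end

  def [](addr)
    @data[addr]
  end

  # Runs on the new copy for both dup and clone; give it its own Hash.
  def initialize_copy(source)
    super
    @data = source.data.dup
  end
end

m = Sheet.new
m['A1'] = 'a'
a = m.clone
a['A1'] = nil   # changes the copy only
m['A1']         # still 'a'
```

Hash#dup is itself shallow, which is enough here because cells are
replaced rather than edited in place.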

--

Thanks!
Before I start rewriting everything to try and use this, though...
Should I really be doing this sort of thing just to be able to
differentiate between bang and non-bang methods? Would my time be better
spent trying to find a way of returning a separate instance of the class
without modifying the current one?

I did consider doing something like modifying a local variable, passing
it to the "load" method of a new instance, then returning that new
instance; but it seemed a rather long-winded method, and I think I'd
have to rewrite all the methods around it.

--

Oh, that was simpler than I thought. I need to do some further testing
but this seems to work:

______________________________________________
def dup
  Me.new.load @data.clone
end

def load ( multi_array )
  if multi_array.is_a? Hash
    @data = multi_array
    calc_dimensions
    self
  else
    @data = {}
    import_data multi_array
  end
end
______________________________________________

irb(main):006:0> m = RubyExcel.new.load [['a','b']]
=> columns: 2, rows: 1, values: 2
a b

irb(main):007:0> a = m.dup
=> columns: 2, rows: 1, values: 2
a b

irb(main):010:0> a['A1'] = nil
columns: 2, rows: 1, values: 1
        b

irb(main):013:0> p m
columns: 2, rows: 1, values: 2
a b

--

There are various ways to create a database from excel files, so that
should be fine. I can't really think of a more effective way to pull the
data from books, apart from trying out different OCR programs. The
necessity of scanning and manual checking pretty much renders moot any
attempt to accelerate that process.

Depending on the amount of data, you might actually get better
performance out of a skilled typist doing transcription. It might be
worth timing the two against each other.

I assume there's no chance of getting the original data these books were
printed from? That would be the ideal solution.

--

Maybe I should start a new thread for this...

Anyway, I've worked out some of the kinks (probably in an unnecessarily
convoluted fashion) and added some documentation code after learning the
basics of "yard". I've managed to add plenty of little helpers for
modifying code, and I'm about to start real-life testing.

As always, any constructive criticism is welcome!

Attachments:
http://www.ruby-forum.com/attachment/8176/RubyExcel.rb

--

Hah, I'm such an idiot. I've replaced the hideously inefficient
compact_rows & columns with these:

def compact_rows
  load to_a.reject { |ar| ar.all? { |el| el.nil? } }
end

def compact_columns
  # transpose back after dropping the empty columns so the orientation is kept
  load to_a.transpose.reject { |ar| ar.all? { |el| el.nil? } }.transpose
end

Much better :slight_smile:
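
A worked example of the same idea on a plain nested array, using
standalone helper functions rather than the class's own methods (so load
and to_a don't appear). Note that the column version transposes twice:
once to treat columns as rows, and once more to restore the layout.

```ruby
# Drop rows in which every cell is nil.
def compact_rows(table)
  table.reject { |row| row.all?(&:nil?) }
end

# Drop columns in which every cell is nil; assumes all rows have equal length.
def compact_columns(table)
  table.transpose.reject { |col| col.all?(&:nil?) }.transpose
end

table = [['a', nil, 'b'],
         [nil, nil, nil],
         ['c', nil, 'd']]

compact_columns(compact_rows(table))  # => [["a", "b"], ["c", "d"]]
```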

--

A related question... on values from nested arrays / json

https://www.ruby-forum.com/topic/4418971

Kindly advise

--

I'd really start by creating a class for this - or use Matrix from the
standard library.

irb(main):008:0> m = Matrix[[1,2,3],[4,5,6]]
=> Matrix[[1, 2, 3], [4, 5, 6]]
irb(main):009:0> m.row 1
=> Vector[4, 5, 6]
irb(main):010:0> m.column 1
=> Vector[2, 5]

Kind regards

robert

On Thu, Jan 31, 2013 at 11:34 AM, Joel Pearson <lists@ruby-forum.com> wrote:

Praise Jesus!

My initial output was the desired one, but the "transpose" method is
what I was missing. Thanks :slight_smile:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Thanks robert

You're welcome!

My current approach is
HTML Table -> Nokogiri Nodeset -> Multidimensional Array -> Excel / TSV

A Matrix looks like a useful way of grabbing the values I need when I
have to alter specifics in the data.

Whatever you do - reuse class Matrix, write your own - it is the most
reasonable thing to have a specific class for handling this instead of
writing functions which work with a nested Array structure. It will
make your life much easier because then your Matrix class can enforce
proper internal state, which you cannot do as easily when using a set of
functions to manipulate an Array structure.

Kind regards

robert

On Thu, Jan 31, 2013 at 3:14 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--

Hi Joel,

Worth mentioning that transpose requires each "row" to contain the same
number of "columns" or you will get an index error.

Here's a Gist on how to alter Array#transpose to allow a block for
populating missing elements.
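
The Gist itself isn't reproduced here, so the following is only a sketch
of the same idea with a hypothetical helper, padded_transpose, that takes
a filler value rather than a block: pad every row out to the widest row,
then transpose as usual.

```ruby
# Hypothetical helper (not the Gist's code): pad short rows before transposing.
def padded_transpose(rows, filler = nil)
  width = rows.map(&:size).max || 0
  rows.map { |row| row + [filler] * (width - row.size) }.transpose
end

ragged = [[1, 2, 3], [4, 5], [6]]

# ragged.transpose would raise IndexError because the rows differ in length.
padded_transpose(ragged)      # => [[1, 4, 6], [2, 5, nil], [3, nil, nil]]
padded_transpose(ragged, 0)   # => [[1, 4, 6], [2, 5, 0], [3, 0, 0]]
```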

Best

randym

On Thu, Jan 31, 2013 at 11:14 PM, Joel Pearson <lists@ruby-forum.com> wrote:


I wouldn't do that. With the basic types it is usually much better to
use delegation (i.e. have a member of that type) than exposing the
full API via inheritance. The whole point of OO is to control
internal state which is usually quite difficult when exposing a
complete API of Array because anybody can insert and remove elements.
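
A minimal sketch of the delegation approach, assuming a hypothetical
ExcelSheet class that keeps its rows in an instance variable and forwards
only a few read-only methods to it (Forwardable is in Ruby's standard
library, Enumerable in core):

```ruby
require 'forwardable'

# Hypothetical sheet class: it *has* an Array rather than *being* one.
class ExcelSheet
  extend Forwardable
  include Enumerable

  # Forward only read-only operations; there is no way to add or remove rows from outside.
  def_delegators :@rows, :size, :empty?

  def initialize(rows)
    @rows = rows.map(&:dup)   # copy each row so later changes to the input don't leak in
  end

  # Enumerable builds map, select, etc. on top of this.
  def each(&block)
    @rows.each(&block)
  end

  # Return a copy so callers cannot edit the header row in place.
  def headers
    @rows.first.dup
  end
end

sheet = ExcelSheet.new([%w[Name Age], ['Ann', 30]])
sheet.size              # => 2
sheet.respond_to?(:<<)  # => false
```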

Btw. the "self." in your code are superfluous.

Kind regards

robert

On Sun, Feb 3, 2013 at 11:43 PM, Joel Pearson <lists@ruby-forum.com> wrote:

I decided to try and build on the Array class as I don't really
understand Matrices yet. I've added a few handy methods. The hidden Bang
stuff is justified, I think, as this class is intended to mimic Excel's
layout.

class Excel_Sheet<Array

--

I've decided to inherit from array after all, since all I want to do
with this is extend support for multidimensional arrays, but without
overwriting any of Array's methods.

I usually do not engage in predictions since I don't have a crystal
ball, but in this case I'll say: you won't be happy with that
approach. For example, anybody can overwrite header values or complete
rows / columns, violating your class's idea of internal state.

Anyway, the obstacle I've hit is one I can avoid, but I was wondering
whether I'm doing something wrong, or whether there's a nice Rubyish way
around this. Here's a simplified version to demonstrate the issue:

___________________
class Excel_Sheet<Array

  def initialize( val=[] )
    val = %w(A1 B1 C1 A2 B2 C2 A3 B3 C3).each_slice(3).to_a if val == 'test'
    super( val )
  end

  def skip_headers
    block_given? ? ( [ self[0] ] + yield( self[1..-1] ) ) : ( self[1..-1] )
  end
  end

What is this supposed to do? Ah, I think I see. I'd probably name it
differently, i.e. each_data_cell or something.

  def filter( header, regex )
    idx = self[0].index header
    skip_headers { |xl| xl.select { |ar| ar[idx] =~ regex } }
  end

end

That combines too much logic in one method IMHO. I'd rather select a
row based on header and then I would use #select on that.

So in short, my question is how can I return my class type after using
Array's methods on my child-class?

Do you mean as return value from #map and the like? Well, you can't
without overriding all methods with this approach, I'm afraid. That's
one of the reasons why this approach does not work well. :slight_smile:
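
To illustrate the point: Array#map and Array#select hand back a plain
Array even when called on a subclass, so each method would need its own
wrapper. The select override below is a hypothetical example of one such
wrapper, not a recommendation:

```ruby
class Excel_Sheet < Array
  # Hypothetical wrapper: re-wrap the plain Array that Array#select returns.
  def select(&block)
    return to_enum(:select) unless block
    self.class.new(super)
  end
end

sheet = Excel_Sheet.new([[1, 2], [3, 4]])

sheet.map { |row| row }.class       # => Array (not overridden)
sheet.select { |row| true }.class   # => Excel_Sheet (overridden above)
```

Every other Array method that returns a new collection would need the same
treatment.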

Kind regards

robert

On Tue, Feb 12, 2013 at 1:19 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--

Thanks for the advice and examples, I'll see whether I can understand
how the classes and methods work with each other there and set about
experimenting with them.

I didn't put commenting in the gist. If there's anything unclear feel
free to ask.

One thing which put me off generating a custom class "from scratch" is
that Array appears to be equal to its content (I assume this is a
language shortcut), but it seems "custom" objects' values have to be
accessed via their accessors.
I was hoping for some more succinct syntax than this sort of thing:
puts #Array is so easy to create
puts CustomObject.new().value #This looks clunky next to that

You can get quite close, for example you can do

def M(*a)
  YourCustomMatrix.new(a)
end

# use
M(1,2,3,4)

or

M = Object.new
def M.[](*a)
  YourCustomMatrix.new(a)
end

# use
M[1,2,3,4]

I'd love to get accustomed to proper OO thinking, but I'll inevitably make
all the rookie mistakes in the process.

Yes, it will take time. Mistakes are what you will learn from. Given
that, I should probably shut up and let you make your personal
mistakes. :slight_smile:

It's a lot to get used to all at
once given that I've been using Ruby for less than a year, and I have no
training other than helpful hints and googling. Thanks again for your
patience.

You're welcome!

Kind regards

robert

On Tue, Feb 12, 2013 at 9:26 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--

Interesting Matrix build. It's giving me a bit of a headache just trying
to figure out the links involved.

So MatrixPart defines the methods and the "parent" matrix (held as an
instance variable); and row and column both use these methods and both
access the variable which points to the matrix they're part of.

Yes, Row and Column are a facade to the "real" data and provide a
different interface to it which presents a different abstraction:
while the Matrix has two dimensions a Row and a Column only have one.

The rows and columns can be selected based on given headers, and each
will reference the other... and this is where my head explodes:

def index( row, col )
  @row_headers.index( row ) * @col_headers.size + @col_headers.index( col )
end

It takes a bit of getting used to, but thanks to Ruby's flexible array
class adding nil values automatically when you specify an index higher
than the upper boundary, that works.

Since you obviously understood the method now I am not sure why you
say your head explodes over this piece of code.

Btw, with a small change you can change storage of data from an Array
to a Hash making the Matrix class better suited for sparse matrices.
And here comes an important aspect of that implementation: only the
Matrix class had to change, there was absolutely no change necessary
for the other three classes! This shows how Matrix's API isolated
client code from inner workings of this class. This is what OO is
about.
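
The gist's code isn't shown in this thread, so here is a cut-down sketch
of the idea with hypothetical names (SparseMatrix, and headers passed to
the constructor). The index method is the row-major calculation quoted
above; the only difference from an Array-backed version is the initial
value of @data.

```ruby
# Hypothetical, cut-down matrix; not the code from the gist.
class SparseMatrix
  def initialize(row_headers, col_headers)
    @row_headers = row_headers
    @col_headers = col_headers
    @data = {}   # an Array-backed version would use [] here; nothing else changes
  end

  def [](row, col)
    @data[index(row, col)]
  end

  def []=(row, col, value)
    @data[index(row, col)] = value
  end

  private

  # Row-major position: skip (row number * row width) cells, then step to the column.
  def index(row, col)
    @row_headers.index(row) * @col_headers.size + @col_headers.index(col)
  end
end

m = SparseMatrix.new(%w[r1 r2], %w[a b c])
m['r2', 'b'] = 42   # stored under flat index 1 * 3 + 1 = 4
m['r2', 'b']        # => 42
m['r1', 'a']        # => nil (never stored)
```

Callers only ever see the header-based [] and []= methods, which is why the
storage can change without touching them.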

I guess with a bit more poking and prodding I could figure out how to
append, insert, and delete rows and columns. After all, it's only a math
problem in the end. All the interconnected references (especially the
layered yields) still make my head spin though :slight_smile:

You'll get used to that - and with a bit of oil the squeaking goes
away as well. :wink:

Kind regards

robert

On Thu, Feb 14, 2013 at 11:50 PM, Joel Pearson <lists@ruby-forum.com> wrote: