RubyExcel class. Useful?

Thanks for your input, Robert.

You're welcome!

I did wonder whether I should convert the underlying dataset into an
Array rather than using a Hash, since spreadsheets are "structured" and
I find the easiest way to manipulate the structure is with Array
methods.

It does not really matter what you do. You could even use a hybrid
approach where you start with an Array based storage and exchange it
with a Hash based storage once sparseness is too large (for your
particular measure of "too large"). If you follow the layered
approach (see at end) you could have two implementations of the plain
data store which have the exact same API but one uses a Hash
internally and the other on an Array...

Enumerable seems to be Array-based as well, and I'm still

No, not at all. Enumerable is just a module which relies solely on
the existence of a method #each. It's as simple as

irb(main):001:0> class X
irb(main):002:1> def each; yield 1; self end
irb(main):003:1> include Enumerable
irb(main):004:1> end
=> X
irb(main):005:0> x = X.new
=> #<X:0x802ec310>
irb(main):006:0> x.to_a
=> [1]
irb(main):007:0> x.select {|n| n.odd?}
=> [1]
irb(main):008:0> x.select {|n| n.even?}
=>
irb(main):009:0> x.find {|n| n > 0}
=> 1

Or a simplistic integer range:

irb(main):014:0> class IntRange
irb(main):015:1> include Enumerable
irb(main):016:1> def initialize(a, b)
irb(main):017:2> @low, @high = [Integer(a), Integer(b)].sort
irb(main):018:2> end
irb(main):019:1> def each
irb(main):020:2> n = @low
irb(main):021:2> while n < @high
irb(main):022:3> yield n
irb(main):023:3> n += 1
irb(main):024:3> end
irb(main):025:2> self
irb(main):026:2> end
irb(main):027:1> end
=> nil
irb(main):028:0> ir = IntRange.new 3, 8
=> #<IntRange:0x80280048 @low=3, @high=8>
irb(main):029:0> ir.to_a
=> [3, 4, 5, 6, 7]
irb(main):030:0> ir.each {|x| p x}
3
4
5
6
7
=> #<IntRange:0x80280048 @low=3, @high=8>

No Arrays around. :slight_smile:

rather hazy on when to override methods like "map", or when to rely on
the methods already available through "each".

You should normally not override Array methods. You generally
shouldn't inherit from Array either. Those core classes are best used
through delegation.

Still, I've learned a lot about Hashes while writing this code, so even
if I do abandon their use for the main data storage I'll still find good
use for them elsewhere.

That's good! I am glad you see it that way. Others might view these
exercises as useless detours - but they underestimate the learning
effects which come from that. You certainly learned a lot more than
by placing questions for every detailed question that occurred to you
as a few other members of the community seem to have chosen to do
recently.

My reasoning behind the prevalence of headers is simply that if you
wanted data without headers you'd just use arrays rather than this
class. One of the big things I find helpful with this is that code is
much more readable if I can reference a header rather than an index.

Then I would at least make the number of header rows and header
columns a property of the individual instance - and not as a constant
in the implementation. Still, I believe that with introduction of the
concept of "headers" in this class you may make things too complex too
fast.

I'm not sure what a test class or wrapper class is. I'll look them up.

A test class would be a class implementing unit tests. A wrapper
class simply wraps around your class RubyExcel in much the same way
that RubyExcel wraps a Hash. In other words: it presents a different
abstraction. It's a general approach in software engineering to
create several layers of abstractions which makes it easier to deal
with only few aspects on one layer. The ISO 7 layer model is a famous
representative of that approach.
http://www.technology-training.co.uk/understandingtheiso7layermodel_10.php

I see your point about to_s. I suppose I should differentiate between
using interpolation for multiple variables and to_s for single cases.

+1

The +, -, and << methods are recent additions; mostly because I only
just learned that you can define these. I'm sure there are multiple ways
to write these;

I wrote about numeric operators in Ruby a while back:
http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html

my first attempt was very poor in performance... and my
thinking was to avoid re-inventing the wheel by using the Array methods
written by someone much smarter than me :slight_smile:

... which is perfectly understandable and OK. In this case the
conversion to an Array based structure might burn a lot of memory
though.

All in all, this still needs a lot of work to make it useful; but now
that I have a better feel of what the weaknesses and strengths are, I
hope to improve on this starting point and eventually build something
genuinely useful to others as well as myself.

That's a good approach.

I'd be particularly interested on the question of Array vs Hash for the
internals. Hash is great because of the simplicity of addresses and its
efficient way of coping with blank space, but Arrays can keep their
"form" much more effectively and already support things like sorting,
rows, and columns.

In the end it does not matter that much what you use internally for
representation. The important bit is to use the proper API to your
storage to allow for a consistent view of the model and easy of use.
For the moment I'd stick with Hash but it may make sense to use a
layered approach: split class RubyExcel into (at least) two where one
is only responsible for providing a consistent API to your data with
the minimal operations needed to make it work. Use that internally as
storage and put everything else like header handling, those
convenience methods mentioned or reading from file and writing to file
in the wrapper class. That way you get a clean separation of
concerns: you have a proper abstraction of the storage and you get a
second layer which adds all the whistles and bells you need to work
efficiently with it (like treading header rows and columns
differently).

Kind regards

robert

···

On Sat, Mar 9, 2013 at 4:31 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Wow! Lots of good advice, as usual :slight_smile: It'll take me a while to process
all this so I'll just get on with it.

I have attached the skeleton of a class for spreadsheet storage to
illustrate what I was saying earlier. Note for example that this
class does not have textual cell indexes. I thought it might be
easier to use int ranges when focusing on the basic manipulations
which will move cells. Depending on storage chosen that may be tricky
enough already.

A next step could be to create a subclass which allows for
CellReferences to register with the class and have them automatically
update on successful move operations. I do not know whether you need
that but when trying to recreate a spreadsheet application in Ruby
that might be an interesting feature to have.

As step towards the user your spreadsheet class could wrap the storage
class and translate indexes back and forth so the user of your class
would only see addresses like "A4" instead of [0, 3].

Incidentally, I hadn't seen this before:
Class: CSV::Table (Ruby 1.9.3)
It looks similar in many ways to what I'm trying to build. I'll have to
see what I can learn from it as well.

Yes, but keep in mind that CSV has a different mental model: in CSV a
row is typically a record, so all the data in a row belongs together.
In a spread sheet both dimensions are much more equal. With CSV you
won't do operations like inserting a rectangular area somewhere in
between while that could be quite common with spread sheets.

Kind regards

robert

spread-store.rb (1.19 KB)

···

On Sat, Mar 9, 2013 at 10:05 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Aha! I have learned how to use Github.

+1

My hideous-looking, still under-construction code is here:
GitHub - VirtuosoJoel/RubyExcel: Spreadsheet storage class in Ruby with Excel-style API

I've split it into something more resembling an Excel workbook, and
tried to reduce the redundancy of address-based code by adding a module
called Address.

I'll keep hammering bits onto it, cutting away the dross, and testing it
until it turns into a reasonably serviceable tool.

As always, any advice is welcome!

This doesn't look good:
def range=( first_cell, last_cell=nil, val )

Calls will look like this

sheet.range= 1, 2, 3

There is no visual separation between coordinates and the value like
there is with =.

I always find constructs like this suspicious:

def cell( row, col )
  Element.new( self, indices_to_address( row, col ) )
end

The purpose of a constructor is to construct something and return it
so the caller can do something with it. If you do not do that
everybody is left wondering what happens to the newly created
instance. If you need it only temporary then another method name
would be better. Or change it so that not Element stores itself
somewhere in "self" but rather "self" itself.

Kind regards

robert

···

On Wed, Mar 13, 2013 at 4:45 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

No, I was being stupid. Somehow I overlooked that the whole point of
the method was to return the Element instance. Just forget it. Sorry
for the noise.

Cheers

robert

···

On Wed, Mar 13, 2013 at 8:50 PM, Joel Pearson <lists@ruby-forum.com> wrote:

Maybe I'm just being stupid, but I'm having trouble understanding your
meaning. Do you mean that I shouldn't always use an "Element" class, and
just work directly with the data?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Took a quick look at the code; this bit looks wrong:

def delete(ref)
...
@sheets.delete_at( ref + 1 )

I think it should be (ref - 1). Which brings me to my main point, that
1-indexing is tricky and should be bolstered by a lot of tests in case
you inadvertently slip up. Also @sheets.delete_at changes the array
whereas @sheets.reject does not; it should be reject! for consistency.
(Again, tests would have caught this.)

martin

···

On Wed, Mar 13, 2013 at 8:45 AM, Joel Pearson <lists@ruby-forum.com> wrote:

Aha! I have learned how to use Github.

My hideous-looking, still under-construction code is here:
GitHub - VirtuosoJoel/RubyExcel: Spreadsheet storage class in Ruby with Excel-style API

I've split it into something more resembling an Excel workbook, and
tried to reduce the redundancy of address-based code by adding a module
called Address.

I'll keep hammering bits onto it, cutting away the dross, and testing it
until it turns into a reasonably serviceable tool.

As always, any advice is welcome!

--
Posted via http://www.ruby-forum.com/\.

Uh oh, I think I've backed myself into a corner...

:slight_smile:

I assume that I'll need some sort of "dup" method in order to easily
differentiate between modifying an object in place and returning a copy.
However, in my attempts to interlink all the elements, I think I've
created a situation where I'll end up in a loop.

RubyExcel holds "child" Sheets
Sheet links to "parent" RubyExcel
I know they aren't really parent and child, but I wasn't sure of the
best way to link them together.

In order to dup RubyExcel I need to dup each sheet.
In order to dup a Sheet I need to dup RubyExcel.

I've ended up with a circular reference and I'm not sure of the best way
to proceed from here.

It may be an indication that you got your design wrong. Why did you
think you need the circular references in the first place?

Also, is it sensible to have the namespaces stacked the way I've done
it? Like RubyExcel::Sheet::Element.

I'd probably only have one level of namespaces for this, e.g. module
RubyExcel and put all classes in there. You won't have too many
classes to distribute across a multitude of namespaces.

Kind regards

robert

···

On Thu, Mar 14, 2013 at 11:04 AM, Joel Pearson <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme wrote in post #1101459:

This doesn't look good:
def range=( first_cell, last_cell=nil, val )
There is no visual separation between coordinates and the value like
there is with =.

Thanks for spotting that, I wasn't sure how to use it. I've completely
removed that method; range now returns an object and range.value can get
and set its content.

I always find constructs like this suspicious:

def cell( row, col )
  Element.new( self, indices_to_address( row, col ) )
end

The purpose of a constructor is to construct something and return it
so the caller can do something with it. If you do not do that
everybody is left wondering what happens to the newly created
instance. If you need it only temporary then another method name
would be better. Or change it so that not Element stores itself
somewhere in "self" but rather "self" itself.

I'm not really sure what you mean by this. My intention is that cell and
cells are like alises for range, but you can select by 1-based indices
(again, copying excel). Range is also capable of referring to more than
1 cell, and can take cells as arguments. This allow for purely numerical
selection of any area, I've allowed columns to be selected numerically
as well.

irb(main):053:0> s.cell(1,1)
=> RubyExcel::Sheet::Element: A1
irb(main):054:0> s.cell(1,1).value
=> "A1"

···

--
Posted via http://www.ruby-forum.com/\.

Robert Klemme wrote in post #1101581:

I've ended up with a circular reference and I'm not sure of the best way
to proceed from here.

It may be an indication that you got your design wrong. Why did you
think you need the circular references in the first place?

I need a Sheet to belong to a workbook, but also be able to be copied or
moved from one workbook to another. I need to be able to find the sheet
from the workbook, and find the workbook from the sheet.
Originally I thought that would be some sort of inheritance chain, but I
can't use inheritance because Sheet is not a type of Workbook.

I'd probably only have one level of namespaces for this, e.g. module
RubyExcel and put all classes in there. You won't have too many
classes to distribute across a multitude of namespaces.

Perhaps a RubyExcel Module with the first class called Workbook would be
more suitable.

···

--
Posted via http://www.ruby-forum.com/\.

Martin DeMello wrote in post #1101514:

Took a quick look at the code; this bit looks wrong:

def delete(ref)
...
@sheets.delete_at( ref + 1 )

I think it should be (ref - 1). Which brings me to my main point, that
1-indexing is tricky and should be bolstered by a lot of tests in case
you inadvertently slip up. Also @sheets.delete_at changes the array
whereas @sheets.reject does not; it should be reject! for consistency.
(Again, tests would have caught this.)

martin

When I "delete" a sheet, do I just need to stop referring to it as an
object? As I understand it; once it's removed from "@sheets", it should
be garbage collected.

···

--
Posted via http://www.ruby-forum.com/\.

It's about the constructor (Element.new) - not addressing methods.
Bottom line: don't abuse constructors for doing algorithmic things.
Constructors are for constructing something so it is in a consistent
state and can be worked with. Placing the complete logic in a
constructor will prevent flexible use of classes.

Cheers

robert

···

On Wed, Mar 13, 2013 at 6:18 PM, Joel Pearson <lists@ruby-forum.com> wrote:

Robert Klemme wrote in post #1101459:

I always find constructs like this suspicious:

def cell( row, col )
  Element.new( self, indices_to_address( row, col ) )
end

The purpose of a constructor is to construct something and return it
so the caller can do something with it. If you do not do that
everybody is left wondering what happens to the newly created
instance. If you need it only temporary then another method name
would be better. Or change it so that not Element stores itself
somewhere in "self" but rather "self" itself.

I'm not really sure what you mean by this. My intention is that cell and
cells are like alises for range, but you can select by 1-based indices
(again, copying excel). Range is also capable of referring to more than
1 cell, and can take cells as arguments. This allow for purely numerical
selection of any area, I've allowed columns to be selected numerically
as well.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme wrote in post #1101581:

I've ended up with a circular reference and I'm not sure of the best way
to proceed from here.

It may be an indication that you got your design wrong. Why did you
think you need the circular references in the first place?

I need a Sheet to belong to a workbook, but also be able to be copied or
moved from one workbook to another. I need to be able to find the sheet
from the workbook,

OK, that sounds reasonable.

and find the workbook from the sheet.

What functionality for?

Originally I thought that would be some sort of inheritance chain, but I
can't use inheritance because Sheet is not a type of Workbook.

This is certainly not a case for inheritance.

Cheers

robert

···

On Thu, Mar 14, 2013 at 1:37 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme wrote in post #1101592:

and find the workbook from the sheet.

What functionality for?

It's quite useful in case I lose track of the object which is the
"parent" of the sheet. Consider this example of the current code:

#Shortcut to avoid having to go through the workbook every time
sheet = RubyExcel.new.load data
#But now I want the workbook:
myRubyExcel = sheet.ruby_excel

I wanted something akin to Excel VBA's approach
ActiveSheet.parent
=> workbook

···

--
Posted via http://www.ruby-forum.com/\.