RubyExcel class. Useful?

I've managed to create a (relatively) stable data-processing class which
behaves a bit like Excel, but with some of Ruby's awesome syntax styles.
I've used it on a few projects now so I'm happy with how it works,
although I'm always trying to improve it.

The 2 files are intended to be placed in the same directory. I might
break it down further in a later build.

This was written to be documented with "yard" if that makes the random
layout easier on the eyes.

The intention is to help simplify scripts which have to sift through
table data, by allowing the user to work with a variety of ways to
access
and loop though the data. The class is built around the assumption that
it is dealing with a 2D array of data with headers in the first row,
which it generally tries to avoid touching while looping through the
data. The
indexing is 1-based to mimic Excel's API.

There are multiple ways to reference the data. You can do this by Row,
Column, Cell indices, Ranges, and individual addresses using [].

I've found this a happy medium between the slowness of Excel's clunky
VBA, and the frustration of having to constantly move Ruby's Array
indices around to avoid affecting the headers.

If I can get rid of the bugs in this and add enough useful functionality
then I might publish it as a gem for anyone who regularly has to deal
with tables (HTML in my case) and wants a simple tool to do it with.

Any comments, advice, requests or constructive criticism are welcome!

Special thanks to Robert Klemme for providing the impetus to get this
started.

Attachments:
http://www.ruby-forum.com/attachment/8208/RubyExcel.rb
http://www.ruby-forum.com/attachment/8209/RubyExcel_Components.rb

···

--
Posted via http://www.ruby-forum.com/.

Oops, I had a broken line in there.

This:
ruby_XL.send( self.class.to_sym, self.index ,ruby_XL )

Doesn't cause an error anymore if changed to this:
ruby_XL.send( self.class.name[/row|column/i].downcase.to_sym, self.index
)

···

--
Posted via http://www.ruby-forum.com/.

Are there tests that demonstrate the use of this?

···

On Fri, Mar 8, 2013 at 10:49 AM, Joel Pearson <lists@ruby-forum.com> wrote:

I've managed to create a (relatively) stable data-processing class which
behaves a bit like Excel, but with some of Ruby's awesome syntax styles.
I've used it on a few projects now so I'm happy with how it works,
although I'm always trying to improve it.

The 2 files are intended to be placed in the same directory. I might
break it down further in a later build.

This was written to be documented with "yard" if that makes the random
layout easier on the eyes.

The intention is to help simplify scripts which have to sift through
table data, by allowing the user to work with a variety of ways to
access
and loop though the data. The class is built around the assumption that
it is dealing with a 2D array of data with headers in the first row,
which it generally tries to avoid touching while looping through the
data. The
indexing is 1-based to mimic Excel's API.

There are multiple ways to reference the data. You can do this by Row,
Column, Cell indices, Ranges, and individual addresses using .

I've found this a happy medium between the slowness of Excel's clunky
VBA, and the frustration of having to constantly move Ruby's Array
indices around to avoid affecting the headers.

If I can get rid of the bugs in this and add enough useful functionality
then I might publish it as a gem for anyone who regularly has to deal
with tables (HTML in my case) and wants a simple tool to do it with.

Any comments, advice, requests or constructive criticism are welcome!

Special thanks to Robert Klemme for providing the impetus to get this
started.

Attachments:
http://www.ruby-forum.com/attachment/8208/RubyExcel.rb
http://www.ruby-forum.com/attachment/8209/RubyExcel_Components.rb

--
Posted via http://www.ruby-forum.com/\.

Good point Andrew. Writing a thorough demo is a good way to
systematically test every method as well.
I just tried to demo some of the newer stuff I added and I've already
found a couple of bugs where I have class methods referenced rather than
the mix-in methods I changed to recently. I'll run a thorough series of
tests, build up some examples, and then try again.

This covers some of the basics anyway:

irb(main):002:0> re = RubyExcel.new.test
=> RubyExcel => columns: 4, rows: 6, values: 24
irb(main):003:0> puts re
a b c d
e f g h
i j k l
m n o p
q r s t
u v w x
=> nil
irb(main):004:0> re['B2'] = 'z'
=> "z"
irb(main):005:0> re.column_by_header('c').map! { |el| el.next }
=> Class: RubyExcel::Column, Index: C
irb(main):006:0> puts re
a b d d
e z h h
i j l l
m n p p
q r t t
u v x x
=> nil
irb(main):007:0> re.delete_column('D')
=> RubyExcel => columns: 3, rows: 6, values: 18
irb(main):008:0> re.cell(2,3).value = 'z'
=> "z"
irb(main):009:0> puts re
a b d
e z z
i j l
m n p
q r t
u v x
=> nil
irb(main):010:0> re += [ %w(1 2 3) ]
=> RubyExcel => columns: 3, rows: 7, values: 21
irb(main):011:0> puts re
a b d
e z z
i j l
m n p
q r t
u v x
1 2 3
=> nil
irb(main):012:0> re.filter!('b') { |el| el =~ /[a-r]/ }
=> RubyExcel => columns: 3, rows: 4, values: 12
irb(main):013:0> puts re
a b d
i j l
m n p
q r t
=> nil
irb(main):014:0> re.row(2)['A'] = 'q'
=> "q"
irb(main):015:0> re.summarise 'a'
=> {"q"=>2, "m"=>1}
irb(main):016:0> puts re
a b d
q j l
m n p
q r t
=> nil
irb(main):017:0> puts re.uniq! 'a'
a b d
q j l
m n p
=> nil

···

--
Posted via http://www.ruby-forum.com/.

Looks nice! Could you toss it up on github?

Also I'd recommend calling it something more generic like
RubySpreadsheet or Tabular, since it is neither interfacing with excel
nor reading excel files.

martin

···

On Fri, Mar 8, 2013 at 7:49 AM, Joel Pearson <lists@ruby-forum.com> wrote:

I've managed to create a (relatively) stable data-processing class which
behaves a bit like Excel, but with some of Ruby's awesome syntax styles.
I've used it on a few projects now so I'm happy with how it works,
although I'm always trying to improve it.

The 2 files are intended to be placed in the same directory. I might
break it down further in a later build.

This was written to be documented with "yard" if that makes the random
layout easier on the eyes.

The intention is to help simplify scripts which have to sift through
table data, by allowing the user to work with a variety of ways to
access
and loop though the data. The class is built around the assumption that
it is dealing with a 2D array of data with headers in the first row,
which it generally tries to avoid touching while looping through the
data. The
indexing is 1-based to mimic Excel's API.

There are multiple ways to reference the data. You can do this by Row,
Column, Cell indices, Ranges, and individual addresses using .

I've found this a happy medium between the slowness of Excel's clunky
VBA, and the frustration of having to constantly move Ruby's Array
indices around to avoid affecting the headers.

If I can get rid of the bugs in this and add enough useful functionality
then I might publish it as a gem for anyone who regularly has to deal
with tables (HTML in my case) and wants a simple tool to do it with.

Any comments, advice, requests or constructive criticism are welcome!

Special thanks to Robert Klemme for providing the impetus to get this
started.

Attachments:
http://www.ruby-forum.com/attachment/8208/RubyExcel.rb
http://www.ruby-forum.com/attachment/8209/RubyExcel_Components.rb

--
Posted via http://www.ruby-forum.com/\.

Thanks for the feedback. I'd certainly like to make the code concise and
efficient. I'm trying to refactor regularly but I also add new options
and variations whenever I think of them, so it's a bit of a struggle in
both directions.

I've tried to use the "Write Once / DRY" approach, which is why I've
started to put certain shared methods into the Helper module, but I'm
still a n00b at this so I'm sure I'm making parts needlessly
complicated.

Thanks for the name suggestions. I picked the name "RubyExcel" out of
the air so I'm open to changing it. The reason I went with that name was
because I'm trying to make something with the best of Excel's and Ruby's
APIs together.

I'll have to learn how to use github now :slight_smile: I'll weed out the bugs that
I've noticed and then figure out how to upload there.

I wrote this because I work with browser-based reports which appear in
HTML tables. I then have to extract the data into excel. This is a tool
to handle the data in the interim, and to make all of the changes to the
data in the memory, using an API which allows me to easily. read back
what it's doing.
I'll add more detail on usage cases later.

Thanks for the advice, guys. I'll act on this and come back with some
updates.

···

--
Posted via http://www.ruby-forum.com/.

Good, I see you're making progress!

One thing I noticed on quick glancing is that you are overusing string
interpolation.

self =~ "#{s}" -> self =~ s.to_s

if "#{ val }".empty? -> if val.to_s.empty?

In = you are evaluating the same condition twice.

I also share Andrew's feeling that you might get away with less code.
Somehow it feels like you put too much into the class. For example,
methods #strip! and #upcase! deal with individual cell manipulations.
They may make sense as convenience methods but would first try to
focus on the core functionality of the spreadsheet: inserting and
removing rows and columns. And method #test does not belong there this
should go into a test class.

Another thing that stroke me odd is that you assume there will always
be one header row. IMHO that is a too specific assumption. If any I
would only put this in a subclass or a wrapper class.

Finally I believe you are working too much with Arrays. For example
in method #+ instances of Row, Column and RubyExcel are converted to
Arrays before they are appended. Since these classes belong to this
library you know how their internals look and you should have a better
way to transfer their data over to this sheet. Same for #- and
probably others as well.

Kind regards

robert

···

On Fri, Mar 8, 2013 at 4:49 PM, Joel Pearson <lists@ruby-forum.com> wrote:

I've managed to create a (relatively) stable data-processing class which
behaves a bit like Excel, but with some of Ruby's awesome syntax styles.
I've used it on a few projects now so I'm happy with how it works,
although I'm always trying to improve it.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Thanks for your input, Robert.

I did wonder whether I should convert the underlying dataset into an
Array rather than using a Hash, since spreadsheets are "structured" and
I find the easiest way to manipulate the structure is with Array
methods. Enumerable seems to be Array-based as well, and I'm still
rather hazy on when to override methods like "map", or when to rely on
the methods already available through "each".
Still, I've learned a lot about Hashes while writing this code, so even
if I do abandon their use for the main data storage I'll still find good
use for them elsewhere.

My reasoning behind the prevalence of headers is simply that if you
wanted data without headers you'd just use arrays rather than this
class. One of the big things I find helpful with this is that code is
much more readable if I can reference a header rather than an index.
I'm not sure what a test class or wrapper class is. I'll look them up.

I see your point about to_s. I suppose I should differentiate between
using interpolation for multiple variables and to_s for single cases.

The +, -, and << methods are recent additions; mostly because I only
just learned that you can define these. I'm sure there are multiple ways
to write these; my first attempt was very poor in performance... and my
thinking was to avoid re-inventing the wheel by using the Array methods
written by someone much smarter than me :slight_smile:

I hadn't realised that I was evaluating "empty?" twice in "[]=". That
was a performance-increasing experiment I'd tried to avoid recalculating
the dimensions after every write operation. It made quite a difference
at the time and I hadn't looked into simplifying it yet.

All in all, this still needs a lot of work to make it useful; but now
that I have a better feel of what the weaknesses and strengths are, I
hope to improve on this starting point and eventually build something
genuinely useful to others as well as myself.

I'd be particularly interested on the question of Array vs Hash for the
internals. Hash is great because of the simplicity of addresses and its
efficient way of coping with blank space, but Arrays can keep their
"form" much more effectively and already support things like sorting,
rows, and columns.

···

--
Posted via http://www.ruby-forum.com/.

Wow! Lots of good advice, as usual :slight_smile: It'll take me a while to process
all this so I'll just get on with it.

Incidentally, I hadn't seen this before:
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV/Table.html
It looks similar in many ways to what I'm trying to build. I'll have to
see what I can learn from it as well.

···

--
Posted via http://www.ruby-forum.com/.

Ok... having looked over what I want to accomplish with this I've
decided to go with the following:

Completely rewrite using Arrays as primary internal storage (there's
always the option to swap with or include Hash later). I rarely deal
with large gaps in my data so this seems the best approach for now.

Rebuild the internal classes for the different functions (data storage,
sections, element handling, accessing, etc.) within a class which
controls the API. I want to try and separate the different types of
logic and make this easier to work on.

Plan the layout before writing any code this time, and rewrite as
idiomatically as possible rather than copy-pasting the older code.

This is going to take some time, due to my other work. Still, once it's
complete it should improve the speed with which I can can write related
code.
Apologies to anyone who wanted to play around with this anytime soon,
but I'd rather do it right than do it quickly.

···

--
Posted via http://www.ruby-forum.com/.

Aha! I have learned how to use Github.

My hideous-looking, still under-construction code is here:

I've split it into something more resembling an Excel workbook, and
tried to reduce the redundancy of address-based code by adding a module
called Address.

I'll keep hammering bits onto it, cutting away the dross, and testing it
until it turns into a reasonably serviceable tool.

As always, any advice is welcome!

···

--
Posted via http://www.ruby-forum.com/.

Maybe I'm just being stupid, but I'm having trouble understanding your
meaning. Do you mean that I shouldn't always use an "Element" class, and
just work directly with the data?

So, for instance, "cell" and "[]" could reference the data directly,
whereas "range" could be an object to be used in different ways?

···

--
Posted via http://www.ruby-forum.com/.

Rare honesty, admirable :slight_smile:

···

--
Posted via http://www.ruby-forum.com/.

Thanks for that, I hadn't gotten around to doing much testing with
sheets yet.

The 1-based indexing can be tricky to keep working as a standard, but
one thing I thought this might help with is transitioning between VBA
and Ruby. If the addressing and objects look similar, you don't need to
change much of the individual project detail.
I have some VBA code which I think would be much simpler in Ruby, and if
I can set this up I might even be able to write something to translate
VBA into Ruby working with this class. That's more of a vague idea for a
future project though.

I do need to add more checks, particularly validating input. What I want
to do is to try and have most validation pass through the Address
module, so I can make any changes there rather than running around the
entire project looking for duplicate code.

This is still very much under construction, but it's good practice for
me to use Github; and it's amazingly useful to have advice from more
experienced programmers while I struggle to build a viable tool. Thanks
again for your time :slight_smile:

···

--
Posted via http://www.ruby-forum.com/.

Uh oh, I think I've backed myself into a corner...

I assume that I'll need some sort of "dup" method in order to easily
differentiate between modifying an object in place and returning a copy.
However, in my attempts to interlink all the elements, I think I've
created a situation where I'll end up in a loop.

RubyExcel holds "child" Sheets
Sheet links to "parent" RubyExcel
I know they aren't really parent and child, but I wasn't sure of the
best way to link them together.

In order to dup RubyExcel I need to dup each sheet.
In order to dup a Sheet I need to dup RubyExcel.

I've ended up with a circular reference and I'm not sure of the best way
to proceed from here.

Also, is it sensible to have the namespaces stacked the way I've done
it? Like RubyExcel::Sheet::Element.

···

--
Posted via http://www.ruby-forum.com/.

I think I've found a way around this.

I've made Sheet's "workbook" an accessor rather than read-only. When I
dup a workbook, I dup each sheet, and then overwrite the "workbook"
attribute with the new one.

When I "dup" a Sheet, it has the same parent as the original, so I could
overwrite the original if required (like +=).

···

--
Posted via http://www.ruby-forum.com/.

The RubyExcel Module is now bashed into something resembling operational
status. I still have plenty of ideas to implement with it, but it's
finally something I can use (and have used) to handle data and output it
into a neat excel format.

I think the next thing I need to do is try and add documentation to the
code and upload it (YARD-style, I think) to github as well. Poorly
documented APIs are the bane of the earnest programmer :slight_smile:

Hopefully I've learned something along the way and not made a complete
mess of it; but if any intrepid code explorers spot any glaring
mistakes, correction would be appreciated.

Thanks,

Joel

···

--
Posted via http://www.ruby-forum.com/.

I've now learned how to publish this as a gem. Still lacking proper
documentation at present, but it'll let me work on it and test it out
from multiple locations.
It's currently living on RubyGems here:
https://rubygems.org/gems/rubyexcel
Minutes after I uploaded it the downloads went to 4. Not sure if that's
genuine downloads or some automatic thing.
Anyway, feel free to have a play with it and see if it does anything
unusual or lacks any features which would be handy. I'll add any
requests onto my to-do list... or to my list of things to learn how to
do :slight_smile:

···

--
Posted via http://www.ruby-forum.com/.

Hi all, just an update for anyone who's interested.

I've now managed to get this gem working sufficiently well to use it in
a few of my projects.

I've incorporated some WIN32OLE tools for things that I do a lot, also
lending credence to the name RubyExcel.

I think I've weeded out most of the bugs, and I've written a more
comprehensive guide which is the ReadMe at Github.

I've tried to take on board all the suggestions so far, although I've
dropped the header columns idea as tricky to implement alongside header
rows; and currently of no use to me. Maybe I'll be able to add them
later.

I've now managed to implement all the basics to the point that I'm just
adding new bits as and when I need them. The core sections all seem to
be working as intended, although of course it's still possible to cause
problems by feeding it invalid input. I might add more extensive
validation if that becomes enough of an issue.

I've even managed to get a wobbly looking bit of recursive code to
import a nested Hash into a table array; which is something I'm having
to do at work with increasing regularity. I spent long hours into the
night trying to build it and I'm still not entirely sure how it works.
It was more of a half-understood trial and error thing.
Actually if anyone can come up with a better way to do this I'd be
fascinated, since I always have trouble visualising recursive methods.
I'll attach the relevant code to this post.

The gem still lives here: https://rubygems.org/gems/rubyexcel
The code still lives here: https://github.com/VirtuosoJoel/RubyExcel

I still need to work out what (if any) documentation-style comments to
add to the code itself. What do people use the most? The rubygems
documentation link says it's a "YARD documentation server", but the
local install only dropped some ".ri" files, and I've only just learned
how to use those.

Attachments:
http://www.ruby-forum.com/attachment/8318/convert_hash.rb

···

--
Posted via http://www.ruby-forum.com/.

Thanks. One thing that strikes me, looking through the code, is that Column
and Row are very similar. It wouldn't surprise me if you could pull a lot
more logic up into the base class. I also have the vague feeling that this
could generally be done with a lot less code. But talk is cheap, so take
that with a grain of salt. What are you using this for? It's certainly an
interesting exercise, if nothing else.

···

On Fri, Mar 8, 2013 at 8:33 PM, Joel Pearson <lists@ruby-forum.com> wrote:

Good point Andrew. Writing a thorough demo is a good way to
systematically test every method as well.
I just tried to demo some of the newer stuff I added and I've already
found a couple of bugs where I have class methods referenced rather than
the mix-in methods I changed to recently. I'll run a thorough series of
tests, build up some examples, and then try again.

This covers some of the basics anyway:

irb(main):002:0> re = RubyExcel.new.test
=> RubyExcel => columns: 4, rows: 6, values: 24
irb(main):003:0> puts re
a b c d
e f g h
i j k l
m n o p
q r s t
u v w x
=> nil
irb(main):004:0> re['B2'] = 'z'
=> "z"
irb(main):005:0> re.column_by_header('c').map! { |el| el.next }
=> Class: RubyExcel::Column, Index: C
irb(main):006:0> puts re
a b d d
e z h h
i j l l
m n p p
q r t t
u v x x
=> nil
irb(main):007:0> re.delete_column('D')
=> RubyExcel => columns: 3, rows: 6, values: 18
irb(main):008:0> re.cell(2,3).value = 'z'
=> "z"
irb(main):009:0> puts re
a b d
e z z
i j l
m n p
q r t
u v x
=> nil
irb(main):010:0> re += [ %w(1 2 3) ]
=> RubyExcel => columns: 3, rows: 7, values: 21
irb(main):011:0> puts re
a b d
e z z
i j l
m n p
q r t
u v x
1 2 3
=> nil
irb(main):012:0> re.filter!('b') { |el| el =~ /[a-r]/ }
=> RubyExcel => columns: 3, rows: 4, values: 12
irb(main):013:0> puts re
a b d
i j l
m n p
q r t
=> nil
irb(main):014:0> re.row(2)['A'] = 'q'
=> "q"
irb(main):015:0> re.summarise 'a'
=> {"q"=>2, "m"=>1}
irb(main):016:0> puts re
a b d
q j l
m n p
q r t
=> nil
irb(main):017:0> puts re.uniq! 'a'
a b d
q j l
m n p
=> nil

--
Posted via http://www.ruby-forum.com/\.