here are some things i think of:
- use class variables of some sort : never store in an object what should be
stored in a class
- write lazy attribute getters : if your parser loads 15,000 fields from a
header but normally only 4 or 5 are used in code, write the parser to parse
each field on demand and cache the result
- don't write explicit IO : use mmap and let the kernel worry about it
- use processes not threads : ipc is so easy with ruby (using local drb
objects) that it's ridiculous not to use multiple processes so
multi-processor machines can actually take best advantage of your code.
sometimes threads are better for a problem - but rarely - especially when
you count debugging time
- don't check things : assume an arg is an int - the stack trace is plenty
good to tell you it wasn't and the check slows things down
- use rbtree : it's amazing
- use sqlite : it's amazing
- ruby inline : it's amazing
- use iterator methods for filesystem operations : prefer
find('.'){|f| ... }
over
Dir['*/**'].each{|f| ... }
the second kills you when a dir has 100,000 files in it.
- compose things out of arrays, hashes, and strings whenever possible. these
objects are pretty optimized in ruby and serialize great as yaml, meaning
you never need to write hand parsers.
- eliminate variable creation where possible : prefer
n = nil
obj.each{|elem| n = elem ** 2 and p n}
over
obj.each{|elem| n = elem ** 2 and p n}
- anchor every single regular expression : an unanchored pattern can be
retried at every starting position in the string, so match time blows up
on long input otherwise
- compile regular expressions only once unless they require variable
substitution
- cache anything possible, for instance methods dynamically generated from
method_missing : permanently add the method to the object instead of
generating it every time
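to make the class-storage point concrete, here's a minimal sketch -- all the
names (GeoLookup, coords, the data) are invented for illustration:

```ruby
# a lookup table built once at the class level and shared by every
# instance, instead of being rebuilt per object
class GeoLookup
  def self.table
    # built on first use, then shared by all GeoLookup objects
    @table ||= { 'denver' => [39.7, -104.9], 'boulder' => [40.0, -105.3] }
  end

  def coords(city)
    self.class.table[city]
  end
end

GeoLookup.new.coords('denver')  # => [39.7, -104.9]
```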
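the lazy-getter idea might look like this -- the toy "key=value;..." header
format here is made up for the example:

```ruby
# parse a field only when it's first asked for, then cache the result
class Header
  def initialize(raw)
    @raw   = raw   # imagine thousands of "key=value" fields in here
    @cache = {}
  end

  def field(name)
    @cache[name] ||= begin
      m = /(?:\A|;)#{Regexp.escape(name)}=([^;]*)/.match(@raw)
      m && m[1]
    end
  end
end

h = Header.new('width=640;height=480;depth=8')
h.field('height')  # => "480" (parsed now; subsequent calls hit the cache)
```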
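a self-contained sketch of the drb point -- in real use the served object
would live in its own process, but client and server share one process here
so the snippet runs as-is (Counter is an invented example class):

```ruby
require 'drb'

# any plain ruby object can be served over drb
class Counter
  def initialize; @n = 0; end
  def incr;  @n += 1; end
  def value; @n;      end
end

DRb.start_service('druby://127.0.0.1:0', Counter.new)  # port 0 = pick a free port
remote = DRbObject.new_with_uri(DRb.uri)               # client-side proxy
3.times { remote.incr }
remote.value  # => 3
```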
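the find-vs-glob point, sketched against a throwaway tmpdir:

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Find.find yields one path at a time instead of building the entire
# listing in memory up front, which is what hurts in huge directories
count = Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'a.rb'))
  FileUtils.mkdir_p(File.join(dir, 'sub'))
  FileUtils.touch(File.join(dir, 'sub', 'b.rb'))

  ruby_files = []
  Find.find(dir) { |path| ruby_files << path if path.end_with?('.rb') }
  ruby_files.size
end
count  # => 2
```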
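and the yaml round-trip for plain arrays/hashes/strings (the record contents
are invented):

```ruby
require 'yaml'

record = {
  'station'  => 'ksea',                # plain strings, arrays, floats...
  'readings' => [12.1, 13.4, 11.9],
  'tags'     => %w[hourly raw],
}
restored = YAML.load(record.to_yaml)   # ...round-trip with no hand parser
restored == record  # => true
```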
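one way to apply the regexp advice -- compile once at load time and anchor
both ends (the pattern and names are made up):

```ruby
# a constant regexp is compiled exactly once; \A and \z anchor it so a
# failed match gives up immediately instead of retrying down the string
KEY_VALUE = /\A(\w+)\s*=\s*(\S+)\z/

def parse_pair(line)
  m = KEY_VALUE.match(line)
  m && [m[1], m[2]]
end

parse_pair('timeout = 30')  # => ["timeout", "30"]
parse_pair('not a pair!!')  # => nil
```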
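and caching a method_missing hit by defining a real method, so the hook only
fires once per name (the Record class is hypothetical):

```ruby
class Record
  def initialize(fields)
    @fields = fields
  end

  def method_missing(name, *args)
    key = name.to_s
    return super unless args.empty? && @fields.key?(key)
    # permanently add the accessor instead of regenerating it every time
    self.class.send(:define_method, name) { @fields[key] }
    send(name)
  end

  def respond_to_missing?(name, include_private = false)
    @fields.key?(name.to_s) || super
  end
end

r = Record.new('lat' => 39.7)
r.lat                         # => 39.7 (defines Record#lat as a side effect)
Record.method_defined?(:lat)  # => true
```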
now, i'm not saying these are 'right' - just that they are what i think of
when trying to make ruby fast.
cheers.
-a
On Fri, 2 Sep 2005, Hugh Sasse wrote:
I've been doing some stuff with CSV recently, having data in one flat file
that I'm trying to normalize into tables. Thus, I'm trying to ignore things
I've seen before so I don't create 2 entries for them, and that sort of
thing.
Jon Bentley points out that "Data Structures Programs" -- i.e. how the data
is "shaped" determines how the program will be designed and perform. Often
the best solution is not immediately obvious, as he discusses. So, for
example, one can test for "seen before" by
array = []
unless array.include? item
  array << item
end
which is much slower for clumped data than
unless array.include? item
  array.unshift item
end
or one could use a hash, or maybe a set.
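For example, a Set gives (on average) constant-time membership tests, versus
the linear scan of Array#include? -- a small sketch, not from the original
mail:

```ruby
require 'set'

seen  = Set.new
items = %w[a b a c b a]
# Set#add? returns nil when the element was already present
uniq  = items.select { |item| seen.add?(item) }
uniq  # => ["a", "b", "c"]
```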
So, my question is [a member of the set of "How long is a piece of string?"
type questions]:
Are there any heuristics for performance of *Ruby* data structures
available, which guide one to designing code to *often* have good
performance?
The answer will be very much "it depends, you need to test it, it's a
function of data size, ...", but I suspect the implementation, i.e. Ruby,
puts useful bounds on the problem.
My searching has not turned up anything, though there are good performance
hints in the PickAxe. I'm wondering if the info exists or if it would be
worth trying to create it, or if it is just too broad a problem to be worth
it. However, since we Rubyists often speak of the ability to put code
together quickly, having hints around on how to do so and yield efficient
code seems of some worth.
--
email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells among the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna
===============================================================================