Ruby, Analysis, and Tons of RAM

Does anyone have experience with using Ruby for analysis (*lots* of
maths), on a machine with a ridiculous amount of RAM? For example, a
hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

Are there any "gotchas" I should be aware of? Would all the RAM be
addressable by a given Ruby process? Or would I still have to be
forking a number of processes, each allocated a bit of the address
space (blech)?

Thanks, oh Ruby masters.

ben@somethingmodern.com wrote:

Does anyone have experience with using Ruby for analysis (*lots* of
maths), on a machine with a ridiculous amount of RAM? For example, a
hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

No... but I'm always willing to help out! Just send me one such
workstation and I'll send you the results post haste! ;-)

Regards,
Jordan

Does anyone have experience with using Ruby for analysis (*lots* of
maths), on a machine with a ridiculous amount of RAM? For example, a
hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

Are there any "gotchas" I should be aware of? Would all the RAM be
addressable by a given Ruby process? Or would I still have to be
forking a number of processes, each allocated a bit of the address
space (blech)?

Not having done this myself, you should take everything I say with a
grain of salt, but since ruby allocations (eventually) go through malloc,
how much of this massive address space the process gets strikes me as
something that is entirely up to the operating system. (Excepting things
in C extensions, which may use mmap or whatever.)
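
If it helps, there's a quick way to check what a given ruby can see (a
rough sketch; the exact numbers assume a stock MRI build on Linux):

    # quick sanity check: is this ruby a 64-bit build?
    puts 1.size    # => 8 on a 64-bit build, 4 on a 32-bit one

    # crude smoke test: grab a few GB in one process and see if malloc obliges
    chunks = Array.new(4) { 'x' * (1 << 30) }   # four 1 GB strings
    puts "allocated #{chunks.size} GB"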

···

On Sat, Sep 23, 2006 at 08:25:18PM +0900, ben@somethingmodern.com wrote:

Thanks, oh Ruby masters.

Does anyone have experience with using Ruby for analysis (*lots* of
maths), on a machine with a ridiculous amount of RAM? For example, a
hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

i've had issues using mmap with files larger than 32gb - i'm not sure if the
latest release has fixed this or not... in general you can run into issues
with extensions, since ruby fixnums reserve a bit to mark them as immediate
values rather than object pointers...
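
you can see that bit directly - on a 64-bit build fixnums top out one bit
short of the machine word (sketch, assumes stock mri):

    # one bit of the machine word is a tag marking the value as an
    # immediate fixnum, so a 64-bit build gives 63-bit fixnums, not 64:
    p((2**62 - 1).class)   # => Fixnum
    p((2**62).class)       # => Bignum - silently slower, not wrong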

Are there any "gotchas" I should be aware of? Would all the RAM be
addressable by a given Ruby process? Or would I still have to be forking a
number of processes, each allocated a bit of the address space (blech)?

assuming you have two or four cpus this might not be a bad idea - ipc is so
dang easy with ruby it's trivial to coordinate processes. i have a slave
class i've used for this before:

http://codeforpeople.com/lib/ruby/slave/
http://codeforpeople.com/lib/ruby/slave/slave-0.0.1/README
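
the guts of it are just fork plus drb from the stdlib, if you'd rather roll
your own - something like this (rough sketch, no error handling):

    require 'drb'

    class Worker
      # stand-in for the real maths
      def crunch(chunk) chunk.map { |x| x * x } end
    end

    uri = 'druby://localhost:9999'

    # the child owns its own slice of the address space and serves it over drb
    pid = fork do
      DRb.start_service uri, Worker.new
      DRb.thread.join
    end

    sleep 1                          # crude: give the child time to bind
    DRb.start_service
    worker = DRbObject.new_with_uri uri
    p worker.crunch([1, 2, 3])       # => [1, 4, 9]

    Process.kill 'TERM', pid
    Process.wait pid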

regards.

-a

···

On Sat, 23 Sep 2006 ben@somethingmodern.com wrote:
--
in order to be effective truth must penetrate like an arrow - and that is
likely to hurt. -- wei wu wei

ben@somethingmodern.com wrote:

Does anyone have experience with using Ruby for analysis (*lots* of
maths), on a machine with a ridiculous amount of RAM? For example, a
hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

Are there any "gotchas" I should be aware of? Would all the RAM be
addressable by a given Ruby process? Or would I still have to be
forking a number of processes, each allocated a bit of the address
space (blech)?

Thanks, oh Ruby masters.

1. Again, you can contact me off list for some ideas ... without knowing your goal, it's difficult for me to know what steps you should take to reach it.

2. Assume a properly working state-of-the-art 64-bit dual-core AMD or Intel hardware platform with 64 GB of RAM and an appropriate SAN for storage from a major vendor like IBM. That severely limits your OS choices; last time I looked you needed to be running either RHEL or SUSE Enterprise Linux. I don't know about the other vendors, but IBM has a marvelous document on performance tuning humongous servers at

http://www.redbooks.ibm.com/redbooks/pdfs/sg245287.pdf

3. OK, now you've purchased a high-end server and a *supported* enterprise-grade Linux, and you want to do some serious number crunching on it, and you want to do it in Ruby, possibly augmented by libraries in C, Fortran or assembler for speed. You will need to recompile *everything* -- Ruby, the math libraries, and the compiler itself -- to use 64-bit addressing. There are some hacks and workarounds, but pretty much this is required. If you end up with an Intel server, you might want to have a look at the Intel compilers instead of GCC. Intel also has some highly-tuned math libraries, as does AMD.

My point here is that you are "exploring" realms in Ruby that are "usually" addressed using "more traditional" techniques, so you're going to need to do a fair amount of testing. That kind of server costs a lot of money, and for that kind of money, you'll get lots of support from the vendor, coupled with strong incentives to do your job in ways that are tested and proven to work and supported by said vendor. That may or may not include Ruby, and if it does include Ruby, it may or may not involve a small number of monolithic Ruby scripts directly addressing a large address space.

There is a lot of help available on the Internet from people like me who love challenges like this. :-)

Believe it or not, make or Rake or something like that is your friend.
(I tend to roll a few lines of Ruby to do the heart of it.)

Break your computation into a pipeline of processes and store
intermediate results on disk.

Since Life's a Bitch, your program will have bugs / crash / produce wrong
data / ....

So you fix the offending input, and the Makefile (or whatever) knows the
dependency net and recomputes only the steps needed. (You did say Big,
didn't you? In my experience that means lots and lots of wall-clock time
for each run. This approach shrinks your run, debug, fix cycle time hugely.)

Since you have multiple processes, the problems you mention fade away.

If you have multiple CPUs or machines, distributing the load becomes
easy.
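
A rough sketch of the idea as a Rakefile (the file and script names here
are made up; rake ships with ruby):

    # Rakefile -- every step reads files and writes files, and rake reruns
    # only the steps whose inputs have actually changed.
    file 'counts.dat' => ['corpus.txt'] do
      sh 'ruby count.rb corpus.txt > counts.dat'
    end

    file 'model.dat' => ['counts.dat'] do
      sh 'ruby estimate.rb counts.dat > model.dat'
    end

    file 'report.txt' => ['model.dat'] do
      sh 'ruby report.rb model.dat > report.txt'
    end

    task :default => 'report.txt'

Fix a bug in estimate.rb, type rake, and only the last two steps rerun.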

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

"We have more to fear from
  The Bungling of the Incompetent
  Than from the Machinations of the Wicked." (source unknown)

···

On Sat, 23 Sep 2006, ben@somethingmodern.com wrote:

Does anyone have experience with using Ruby for analysis (*lots* of
maths), on a machine with a ridiculous amount of RAM? For example, a
hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

MonkeeSage wrote:

···

ben@somethingmodern.com wrote:

Does anyone have experience with using Ruby for analysis (*lots* of
maths), on a machine with a ridiculous amount of RAM? For example, a
hip 64-bit Linux kernel on a machine with 32 or 64 GB of physical RAM.

No... but I'm always willing to help out! Just send me one such
workstation and I'll send you the results post haste! ;-)

I have lots of experience doing that sort of thing, but none of it is in Ruby. You can contact me off list for some ideas.

Ah, someone *has* done some of this! What compiler did you use to recompile Ruby for 64-bit addressing? Did it work out of the box?

What's the bottleneck in Ruby's built-in IPC? Network traffic to "localhost" and to the other hosts? System V IPC? Something else?

I haven't really looked at the whole "lots of coordinated tiny processes" thing in Ruby, since Erlang seems to have nailed that approach and made it the core of its way of doing things. I'm not a big fan of re-inventing wheels; I'd much rather just get my numbers crunched.

···

ara.t.howard@noaa.gov wrote:

i've had issues using mmap with files larger than 32gb - i'm not sure if the
latest release has fixed this or not... in general you can run into issues
with extensions, since ruby fixnums reserve a bit to mark them as immediate
values rather than object pointers...

ipc is so dang easy with ruby it's trivial to coordinate processes. i have a
slave class i've used for this before:

http://codeforpeople.com/lib/ruby/slave/
http://codeforpeople.com/lib/ruby/slave/slave-0.0.1/README

Ara, why does Slave.new keep a copy of the object (the one you want to be served up) on both sides of the fork?

Wouldn't it make more sense for it to work like this (modifying the example in slave.rb):

    class Server
      def add_two n
        n + 2
      end
    end

    slave = Slave.new {Server.new} # <-- note addition of {...}
    server = slave.object

    p server.add_two(40) #=> 42

Slave.new would call the block _only_ in the child, and the parent would never have an instance of Server, only the drb handle to it.

This might matter if Server.new consumes resources, sets up a data structure in memory, opens files, etc.

···

ara.t.howard@noaa.gov wrote:

http://codeforpeople.com/lib/ruby/slave/
http://codeforpeople.com/lib/ruby/slave/slave-0.0.1/README

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Thank you all for the excellent suggestions and links.

More about the problem domain: linguistic modeling with lots of posterior
(Bayesian) inference maths. The probability matrices involved can easily
grow into the multiple-gigabyte range, and obviously I'm completely hosed
if I keep things disk-based. (I've tried. Even with hip and sexy paging.)
The process is only somewhat parallelizable, and I've taken a nasty hit
from IPC in the past. (That hit was probably my fault -- shipping every
float as text, damn "3.14159".to_f.)
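
(For the curious: the cure was keeping the bytes binary end to end.
Something like this -- a sketch of the difference:)

    # shipping floats as text means a parse per element at the far end;
    # packing them as native doubles does not.
    floats = Array.new(1_000_000) { rand }

    text   = floats.join(' ')      # slow path: "3.14159".to_f on receipt
    binary = floats.pack('E*')     # fast path: 8 bytes per float, no parsing

    decoded = binary.unpack('E*')  # straight back to Floats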

Obviously, I'm doing the real maths with C routines called from Ruby.

I'm quite happy to hack around and compile everything from scratch. On
the other hand, I'm not happy with expensive support agreements or
using "traditional techniques" (FORTRAN, shudder). So maybe it's
reasonable to consider me 1) short on cash, 2) short on processing
time, 3) long on Linux admin skillZ, 4) long-ish on coding time.

Thanks again, folks.

M. Edward (Ed) Borasky wrote:

···

My point here is that you are "exploring" realms in Ruby that are
"usually" addressed using "more traditional" techniques, so you're going
to need to do a fair amount of testing. That kind of server costs a lot
of money, and for that kind of money, you'll get lots of support from
the vendor, coupled with strong incentives to do your job in ways that
are tested and proven to work and supported by said vendor.

that's a great point joel. i'll add that capability and release on monday or
so. right now, if a block is given it's called with the object - but i can
detect the case based on whether or not an obj is also passed.
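
e.g. something along these lines (sketch of the dispatch only, not the
actual slave.rb source):

    class Slave
      def initialize(object = nil, &block)
        raise ArgumentError, 'object or block, not both' if object && block
        if block
          # fork, then call the block only in the child - the parent
          # never holds the real object, just the drb handle to it
        else
          # current behaviour: serve up the object we were handed
        end
      end
    end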

thanks a bunch.

-a

···

On Sat, 23 Sep 2006, Joel VanderWerf wrote:

Ara, why does Slave.new keep a copy of the object (the one you want to be
served up) on both sides of the fork?

Slave.new would call the block _only_ in the child, and the parent would
never have an instance of Server, only the drb handle to it.

--
in order to be effective truth must penetrate like an arrow - and that is
likely to hurt. -- wei wu wei

ben@somethingmodern.com wrote:

Thank you all for the excellent suggestions and links.

More about the problem domain: linguistic modeling with lots of posterior
(Bayesian) inference maths. The probability matrices involved can easily
grow into the multiple-gigabyte range, and obviously I'm completely hosed
if I keep things disk-based. (I've tried. Even with hip and sexy paging.)
The process is only somewhat parallelizable, and I've taken a nasty hit
from IPC in the past. (That hit was probably my fault -- shipping every
float as text, damn "3.14159".to_f.)

Hmmm ... large matrices and "only somewhat parallelizable" ... that's counterintuitive to me. Dense or sparse?

Obviously, I'm doing the real maths with C routines called from Ruby.

Who does the memory management? Ruby? C? Linux?

I'm quite happy to hack around and compile everything from scratch. On
the other hand, I'm not happy with expensive support agreements or
using "traditional techniques" (FORTRAN, shudder). So maybe it's
reasonable to consider me 1) short on cash, 2) short on processing
time, 3) long on Linux admin skillZ, 4) long-ish on coding time.

This sounds to me more like a computational linear algebra problem than a Linux system administration problem -- at least, once you've got a 64-bit Linux distro and toolchain up and running. :-) Given that you've gone to C, I can't imagine there not being an efficient open-source C library that can handle your problem at near-optimal speeds, at least on dense matrices.

Although -- in my application area, performance modelling, most of the well-known existing packages are academically licensed rather than true open source. You can get them free if you're an academic researcher, but if you want to use them commercially, you have to pay for them. Which is why I'm writing Rameau. But I don't have a large-memory SMP machine, and my matrices are either sparse, small and dense, or easily converted into, say, a Kronecker product of small dense matrices.

If nobody has invited you yet, check out

http://sciruby.codeforpeople.com/sr.cgi/FrontPage

M. Edward (Ed) Borasky wrote:

> Thank you all for the excellent suggestions and links.
>
> More about the problem domain: linguistic modeling with lots of posterior
> (Bayesian) inference maths. The probability matrices involved can easily
> grow into the multiple-gigabyte range, and obviously I'm completely hosed
> if I keep things disk-based. (I've tried. Even with hip and sexy paging.)
> The process is only somewhat parallelizable, and I've taken a nasty hit
> from IPC in the past. (That hit was probably my fault -- shipping every
> float as text, damn "3.14159".to_f.)

Hmmm ... large matrices and "only somewhat parallelizable" ... that's
counterintuitive to me. Dense or sparse?

Then maybe my terminology is weak. So the matrices are very large: 3
dimensions, about 30000x1000x3 elements, each element a float (though the
3rd dimension could hold ints). Very dense -- typically only about 30%
zeros. And by "only somewhat parallelizable" I just mean that the algorithm
that builds the matrix bounces around like crazy -- it does not work on any
particularly local area of the matrix. (Read element (2123, 501, 1), mutate
(1, 991, 3), read (29820, 11, 2), and so on.)
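
(To make that concrete: the whole thing lives in one flat buffer and gets
indexed roughly like this -- a sketch with the dimensions hard-coded:)

    # the 30000x1000x3 matrix as one flat packed buffer of native doubles:
    # 30_000 * 1_000 * 3 * 8 bytes, ~720 MB per matrix
    DIMS = [30_000, 1_000, 3]

    def offset(i, j, k)              # row-major flat offset of (i, j, k)
      (i * DIMS[1] + j) * DIMS[2] + k
    end

    buf = "\0" * (DIMS.inject(1) { |a, d| a * d } * 8)

    def read(buf, i, j, k)
      buf[offset(i, j, k) * 8, 8].unpack('d').first
    end

    def write(buf, i, j, k, x)
      buf[offset(i, j, k) * 8, 8] = [x].pack('d')
    end

    write(buf, 2123, 501, 1, 0.25)
    p read(buf, 2123, 501, 1)        # => 0.25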

> Obviously, I'm doing the real maths with C routines called from Ruby.

Who does the memory management? Ruby? C? Linux?

A mix between Ruby -- previously allocated arrays -- and C malloc()'ing
temporary arrays for scratch space.

> I'm quite happy to hack around and compile everything from scratch. On
> the other hand, I'm not happy with expensive support agreements or
> using "traditional techniques" (FORTRAN, shudder). So maybe it's
> reasonable to consider me 1) short on cash, 2) short on processing
> time, 3) long on Linux admin skillZ, 4) long-ish on coding time.

This sounds to me more like a computational linear algebra problem than
a Linux system administration problem -- at least, once you've got a
64-bit Linux distro and toolchain up and running. :-) Given that you've
gone to C, I can't imagine there not being an efficient open-source C
library that can handle your problem at near-optimal speeds, at least
on dense matrices.

The comfortably-licensed libraries I might shoe-horn into working --
for linguistic inference, that is -- are either groddy
proofs-of-concept or optimized for much smaller dimensions. :-( I'm in
relatively new territory here.

Although -- in my application area, performance modelling, most of the
well-known existing packages are academically licensed rather than true
open source. You can get them free if you're an academic researcher, but if
you want to use them commercially, you have to pay for them. Which is
why I'm writing Rameau. But I don't have a large-memory SMP machine, and
my matrices are either sparse, small and dense, or easily converted
into, say, a Kronecker product of small dense matrices.

If nobody has invited you yet, check out

http://sciruby.codeforpeople.com/sr.cgi/FrontPage

Thanks for the link. Gotta love the Web -- one of my projects
("integral") is already indexed as an InterestingProject.

···

ben@somethingmodern.com wrote: