M. Edward (Ed) Borasky wrote:
>> Thank you all for the excellent suggestions and links.
>>
>> More about the problem domain: Linguistic modeling with lots of
>> posterior (Bayesian) inference maths. The probability matrices involved
>> can easily grow into the multiple GB's range, and obviously I'm
>> completely hosed if I keep things disk-based. (I've tried. Even with
>> hip and sexy paging.) The process is only somewhat parallelizable, but
>> I've gotten a nasty hit in the past from the IPC's. (This IPC hit was
>> probably my fault, damn "3.14159".to_f.)
> Hmmm ... large matrices and "only somewhat parallelizable" ... that's
> counterintuitive to me. Dense or sparse?
Then maybe my terminology is weak. So the matrices are very large: 3
dimensions, about 30000x1000x3 elements, each element a float, though
the 3rd dimension could be an int. Very dense -- typically only about
30% zeros.
And by only somewhat parallelizable, I just mean that the algorithm
that builds the matrix bounces around like crazy -- it does not work
in any particular "local" area of the matrix. (Read 2123x501x1, mutate
1x991x3, read 29820x11x2, and so on.)
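For concreteness -- and this isn't my real code, just a back-of-envelope
sketch with made-up names -- I picture each matrix as one flat C block
plus index arithmetic. In doubles that's already on the order of 720 MB
apiece, so a handful of these is exactly what lands me in the
multiple-GB range:

#include <stdio.h>
#include <stdlib.h>

#define D1 30000
#define D2 1000
#define D3 3

/* flat row-major block: element (i, j, k) lives at (i*D2 + j)*D3 + k.
   In doubles: 30000*1000*3*8 bytes, roughly 720 MB per matrix.        */
#define IDX(i, j, k) (((size_t)(i) * D2 + (j)) * D3 + (k))

int main(void)
{
    double *m = malloc((size_t)D1 * D2 * D3 * sizeof *m);
    if (!m) { perror("malloc"); return 1; }

    /* the "bouncing around" pattern: touches cells far apart, so there
       is very little cache or page locality to exploit                 */
    m[IDX(2122, 500, 0)]  = 0.25;                 /* 1-based (2123, 501, 1) */
    m[IDX(0, 990, 2)]    += m[IDX(2122, 500, 0)]; /* 1-based (1, 991, 3)    */
    m[IDX(29819, 10, 1)]  = m[IDX(0, 990, 2)];    /* 1-based (29820, 11, 2) */

    printf("%.0f MB per matrix\n",
           (double)D1 * D2 * D3 * sizeof(double) / 1e6);
    free(m);
    return 0;
}

Single precision would roughly halve that, and packing the 3rd axis as
ints shaves a little more, but either way the whole thing has to live
in RAM.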
>> Obviously, I'm doing the real maths with C routines called from Ruby.
> Who does the memory management? Ruby? C? Linux?
A mix between Ruby -- previously allocated arrays -- and C malloc()'ing
temporary arrays for scratch space.
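In sketch form (function and argument names invented purely for
illustration), the division of labour looks like this -- the big block
outlives every call, and the C-side scratch never escapes one:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical worker: `table` is the big, long-lived block the Ruby
   side owns (never freed here); `scratch` is malloc()'d per call and
   released before returning, so C holds nothing between calls.       */
static int update_rows(double *table, size_t ncols, size_t src, size_t dst)
{
    double *scratch = malloc(ncols * sizeof *scratch);
    if (!scratch)
        return -1;

    memcpy(scratch, table + src * ncols, ncols * sizeof *scratch);
    for (size_t j = 0; j < ncols; j++)
        scratch[j] *= 0.5;                    /* stand-in for the real maths */
    memcpy(table + dst * ncols, scratch, ncols * sizeof *scratch);

    free(scratch);
    return 0;
}

int main(void)
{
    /* tiny stand-in for the Ruby-allocated array, just to exercise the call */
    double table[4 * 5] = { [0] = 2.0, [1] = 4.0 };
    if (update_rows(table, 5, 0, 3) != 0)
        return 1;
    printf("%g %g\n", table[3 * 5 + 0], table[3 * 5 + 1]);   /* prints: 1 2 */
    return 0;
}

The Ruby side hands the same previously-allocated block in on every
call, so nothing gets copied or re-parsed across the boundary.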
>> I'm quite happy to hack around and compile everything from scratch. On
>> the other hand, I'm not happy with expensive support agreements or
>> using "traditional techniques" (FORTRAN, shudder). So maybe it's
>> reasonable to consider me 1) short on cash, 2) short on processing
>> time, 3) long on Linux admin skillZ, 4) long-ish on coding time.
> This sounds to me more like a computational linear algebra problem than
> a Linux system administration problem -- at least, once you've got a
> 64-bit Linux distro and toolchain up and running. Given that you've
> gone to C, I can't imagine there not being an efficient open-source C
> library that will handle your problem at near-optimal speeds, at least
> on dense matrices.
The comfortably-licensed libraries I might shoe-horn into working --
for linguistic inference, that is -- are either groddy
proofs-of-concept or optimized for much smaller dimensions. I'm in
relatively new territory here.
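To be fair, for the purely dense kernels a free CBLAS (ATLAS, say) does
look usable -- a dgemm over one 30000x1000 slice would be roughly the
sketch below (purely illustrative, not benchmarked, and the link line
varies by distro). It's the inference-specific bookkeeping around such
kernels where the pickings get thin.

/* compile with something like: cc dense_demo.c -lcblas -latlas */
#include <stdlib.h>
#include <cblas.h>

int main(void)
{
    const int m = 30000, k = 1000, n = 1000;
    double *a = calloc((size_t)m * k, sizeof *a);  /* one 30000x1000 slice     */
    double *b = calloc((size_t)k * n, sizeof *b);  /* some 1000x1000 transform */
    double *c = calloc((size_t)m * n, sizeof *c);  /* result                   */
    if (!a || !b || !c) return 1;

    /* C = 1.0 * A * B + 0.0 * C, all row-major doubles; the BLAS handles
       the blocking and vectorization, so this part is not my bottleneck */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, a, k, b, n, 0.0, c, n);

    free(a); free(b); free(c);
    return 0;
}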
> Although -- in my application area, performance modelling, most of the
> well-known existing packages are academically licensed rather than true
> open source. You can get them free if you're an academic researcher,
> but if you want to use them commercially, you have to pay for them.
> Which is why I'm writing Rameau. But I don't have a large-memory SMP
> machine, and my matrices are either sparse, small and dense, or easily
> converted into, say, a Kronecker product of small dense matrices.
> If nobody has invited you yet, check out
> http://sciruby.codeforpeople.com/sr.cgi/FrontPage
Thanks for the link. Gotta love the Web -- one of my projects
("integral") is already indexed as an InterestingProject.