Ruby IPC In an OpenMosix Cluster

I'm undertaking a project that will eventually become a processing
pipeline application of sorts. It will receive data in a common
format, transform it into one of _many_ other formats, compile and send
it off to various endpoints (sounds spammish, but it's all solicited,
really :).

My team and I have tentatively decided on openMosix to provide an
easily scalable cluster, and Ruby for the application itself. I'm very
new to Ruby, and fairly new to IPC concepts -- The Little Book of
Semaphores and the many threads I've read here have helped me out a
_lot_.

Our application will rely on 3 sets of consumer process pools; each
pool is spawned by a daemon responsible for each basic operation (think
MTA): receive, transform, compile/send. By using processes instead of
threads we allow openMosix to migrate each process and make use of the
entire cluster.

So we have the model down, but I need a bit of advice on how to most
efficiently get these processes talking. What is the best form of IPC
to use here? It seems there are tons of Ruby examples on concurrency
and communication between threads, but I can't seem to find anything
definitive on IPC (to more than one child at least). I tried the sysv
extension off RAA, but couldn't get it to compile -- though I didn't
try my best.

Things I'm considering:

- DRb
- UNIXSocket
- mkfifo
- SysV message queue (openMosix doesn't support shmem segments)
- popen (though I can't see how to do it without round robin
producing)
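
For the UNIXSocket option, one way to avoid round-robin producing is to give each worker its own socket pair and let the parent hand the next job to whichever worker has just reported back. What follows is only a minimal sketch under assumptions, not a tested openMosix setup: the names `spawn_pool`, `run_jobs` and `shutdown` are made up for illustration, jobs and results are assumed to be single-line strings, and how well openMosix migrates processes that hold open sockets is not addressed here.

```ruby
require 'socket'

# Fork `size` workers. Each worker reads one job per line from its own
# socket, passes it to the given block, and writes the result back as a line.
def spawn_pool(size)
  workers = []
  size.times do
    parent_end, child_end = UNIXSocket.pair
    pid = fork do
      # Close descriptors the child does not need, including the ones
      # inherited for workers spawned earlier.
      parent_end.close
      workers.each { |w| w[:io].close }
      while (job = child_end.gets)
        child_end.puts(yield(job.chomp))
      end
      exit!(0)
    end
    child_end.close
    workers << { pid: pid, io: parent_end }
  end
  workers
end

# Feed jobs to whichever worker is idle. IO.select tells us which busy
# workers have a result waiting, so fast workers get more jobs than slow
# ones. (No handling of crashed workers; this is a sketch.)
def run_jobs(workers, jobs)
  pending = jobs.dup
  idle    = workers.map { |w| w[:io] }
  busy    = []
  results = []
  until pending.empty? && busy.empty?
    while !idle.empty? && !pending.empty?
      io = idle.shift
      io.puts(pending.shift)
      busy << io
    end
    ready, = IO.select(busy)
    ready.each do |io|
      results << io.gets.chomp
      busy.delete(io)
      idle << io
    end
  end
  results
end

# Closing the parent's end gives each worker EOF, which ends its loop.
def shutdown(workers)
  workers.each { |w| w[:io].close }
  workers.each { |w| Process.wait(w[:pid]) }
end

if __FILE__ == $PROGRAM_NAME
  pool = spawn_pool(3) { |job| job.upcase }
  puts run_jobs(pool, %w[a b c d e]).sort.inspect
  shutdown(pool)
end
```

Results come back in completion order, not submission order, so a real pipeline would probably tag each job with an id.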

If anyone has any advice, please shove me in the right direction.

Thanks in advance!

Also, thanks to matz for the great language (code blocks are uber
bueno)!

Best,
Dan

i recently built a system __exactly__ like this for noaa. it's built upon
ruby queue (rq) for the clustering and dirwatch for the event driven
components. both are on rubyforge and/or raa. the system uses a unique library
that allows classes to be parameterized, loaded, and run on input data in a few
short lines. here is one class representing a processing flow

   class Flo5 < NRT::OLSSubscription::Geotiffed
     mode "production"

     roi 47,16,39,27
     satellites %w( F15 F16 )
     extensions %w( OIS )

     solarelevations -180, -12, 10.0

     hold 0

     username "flo"
     password "xxx"

     orbital_start_direction "descending"
   end

this is one hundred percent of the coding needed to inject a new processing
flow into the system, have incoming files spawn jobs for it, distribute
processing to a cluster, and to package and deliver data.

i'd like to do a write up about it in the near future but i've just got alpha
and am still pretty busy. in any case there are at least 6 or 7 components
that can be used from this system in any other such system. ping me on or off
line and i can give you some more info... right now i'm late for a meeting...

   Linux Clustering with Ruby Queue: Small Is Beautiful | Linux Journal
   http://raa.ruby-lang.org/project/rq/
   http://raa.ruby-lang.org/project/dirwatch/

kind regards.

-a

On Thu, 23 Mar 2006 the.liberal.media@gmail.com wrote:

[snip]

--
share your knowledge. it's a way to achieve immortality.
- h.h. the 14th dalai lama

Right on, Ara. Thanks for your input!

I had looked at rq very briefly, but at first glance it seemed like it
might be a hassle to maintain as we add nodes (having to
update/kill/restart the application on all nodes). What's been your
experience with regards to maintenance?

Thanks,
Dan

Ok, so I went back and actually read through the entire rq article this
time (and noticed who wrote it -- many props Ara :).

From what I understood, you're suggesting something like this:

1. Use dirwatch to wait for incoming data (files) on an NFS exported
dir
2. Inject jobs into rq for each incoming file
3. rq executes commands on each node that read in each file from the
NFS mount

How fast is a setup like this? I would think there would be a lot of
overhead in forking processes for each job, and even more in the
NFS/file IO. We're shooting for 100 jobs/second, starting with a
fairly small cluster and then scaling up. Each piece of data is 4-10k.

My thought was to spawn a pool of processes once, then start feeding
them the data via [unknown IPC]. Seems like that would be a faster
solution as long as openMosix is efficient in redirecting the IO across
nodes. Of course, this may be a development nightmare (learning
experience), since neither my team nor I have a lot of experience with
multiprocessing.

If rq would satisfy our speed requirements, then I would love to avoid
the extra development time. Perhaps we'll just have to build a basic
prototype and run some tests. :)

Best,
Dan

hmmm. not quite clear on what you are asking - but we regularly add and
remove nodes. you don't need to stop all nodes to do this at all - to add a
node simply start a feeder on it, to remove a node simply stop that node's
feeder.

is that what you are asking?

in summary we regularly use rq to put together 'ad-hoc' clusters of 3-10 nodes
and this generally takes less than 5 minutes or so.

regards.

-a

Ok, so I went back and actually read through the entire rq article this time
(and noticed who wrote it -- many props Ara :).

From what I understood, you're suggesting something like this:

1. Use dirwatch to wait for incoming data (files) on an NFS exported
dir
2. Inject jobs into rq for each incoming file
3. rq executes commands on each node that read in each file from the
NFS mount

How fast is a setup like this? I would think there would be a lot of
overhead in forking processes for each job, and even more in the
NFS/file IO. We're shooting for 100 jobs/second, starting with a
fairly small cluster and then scaling up. Each piece of data is 4-10k.

yup this would definitely push the limits __unless__ you can batch them. i'm
actually working with a group now that will be ingesting data at almost that
exact same rate. in their case it's sufficient to bundle jobs up - higher
latency but also high throughput. so basically a dirwatch would watch an
incoming directory and, perhaps once per minute, scan the directory and submit
bunches of 500 files for processing. for something really simple you might
not even need dirwatch - just move files to another directory once they've
been submitted - eg. sweep the directory as you submit. rq now supports
providing stdin (and saving stdout and stderr) so submitting jobs that process
500 files is really easy. anyhow, nfs scales pretty dang well on gigE with
tuned tcp/ip stacks and fast disk, we certainly abuse ours with little
problems. however, we also use vsftpd to access data and this is very easy to
setup and extremely fast to use - about as fast as you can get. in any case
you'll have the same issue with any cluster: distributing jobs is easier than
distributing data. for instance, many of our inputs are 3gb-600gb - one has
to be careful with this sort of payload! ;)
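
The "sweep the directory as you submit" idea could look roughly like the sketch below. It is plain Ruby and does not call rq or dirwatch at all. The `sweep` method, the `batch-...` directory naming and the directory layout are assumptions made for illustration. Each returned batch directory would then be handed to a single job by whatever submission mechanism is in use.

```ruby
require 'fileutils'
require 'tmpdir'

# Move every regular file found in `incoming` into fresh subdirectories of
# `staging`, at most `batch_size` files each, and return the batch
# directories. Moving files out of `incoming` is what marks them as
# submitted. Caveat: a file that is still being written may get swept too.
def sweep(incoming, staging, batch_size: 500)
  files = Dir.children(incoming).sort
             .map { |name| File.join(incoming, name) }
             .select { |path| File.file?(path) }
  files.each_slice(batch_size).with_index.map do |batch, i|
    dir = File.join(staging, format('batch-%d-%04d', Time.now.to_i, i))
    FileUtils.mkdir_p(dir)
    FileUtils.mv(batch, dir)
    dir
  end
end

if __FILE__ == $PROGRAM_NAME
  Dir.mktmpdir do |root|
    incoming = File.join(root, 'incoming')
    staging  = File.join(root, 'staging')
    FileUtils.mkdir_p([incoming, staging])
    1200.times { |i| File.write(File.join(incoming, "f#{i}.dat"), 'x') }
    sweep(incoming, staging).each do |dir|
      puts "#{dir}: #{Dir.children(dir).size} files"
    end
  end
end
```

At 100 files per second, a once-per-minute sweep would pick up about 6000 files, which is 12 batches of 500.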

My thought was to spawn a pool of processes once, then start feeding them
the data via [unknown IPC]. Seems like that would be a faster solution as
long as openMosix is efficient in redirecting the IO across nodes. Of
course, this may be a development nightmare (learning experience), since
neither my team nor I have a lot of experience with multiprocessing.

i did look into this a bit - if i recall openmosix makes io transparent and
can migrate process memory from node to node... in our case this would be a
disaster : code must follow data and not the other way around. from what i
know at the moment i can't imagine that kernel level io network multiplexing
would be faster than pulling data across and sending it back in huge chunks...
the resident network expert here seems to think people need to move that way
to optimize newer networks with things like jumbo frames. but i can't say for
certain i'm just a hacker!

If rq would satisfy our speed requirements, then I would love to avoid the
extra development time. Perhaps we'll just have to build a basic prototype
and run some tests. :)

yes. we ran a simulation here, just producing false input and running a busy
loop for a few seconds to approximate a system similar to yours, and
it seemed fine. however they are not in production yet so i cannot say for
sure. the good part is that all the bits are free and it should only take a
few hours to mock something up - even on a stock linux machine with no root
privs...

let me know if you go this route as i have upgrades to both dirwatch and rq
that you'll surely want.

regards.

-a

hmmm. not quite clear on what you are asking - but we regularly add and
remove nodes. you don't need to stop all nodes to do this at all - to add a
node simply start a feeder on it, to remove a node simply stop that node's
feeder.

is that what you are asking?

No, actually I really just spat out the wrong question before I read
the entire article and understood the rq setup. :)

Dan

yup this would definitely push the limits __unless__ you can batch them.

Not sure if this is an option yet.

let me know if you go this route as i have upgrades to both dirwatch and rq
that you'll surely want.

I will definitely let you know. I'm going to experiment with a more
generic drb setup first, and see how that goes. Do your updates
contain the stdin/stdout support, or is that already in the newest
release?
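
A generic DRb setup of the kind mentioned above might start out like the following sketch: one process exports a small job board, and a pool of forked workers pulls jobs from it over DRb. Everything named here (`JobBoard`, `run_drb_demo`, the `:done` sentinel, the loopback URI) is an assumption for illustration only. The workers are forked before the service starts and receive the URI over a pipe, because a child forked after `DRb.start_service` would inherit the server object and call its own local copy of the board instead of the remote one.

```ruby
require 'drb/drb'

# Front object exported over DRb: one queue of jobs, one of results.
class JobBoard
  def initialize
    @jobs    = Queue.new
    @results = Queue.new
  end

  def add(job)
    @jobs << job
    nil
  end

  # Blocks (on the server side) until a job is available.
  def take
    @jobs.pop
  end

  def report(result)
    @results << result
    nil
  end

  def collect
    @results.pop
  end
end

def run_drb_demo(jobs, worker_count)
  # Fork the workers first; each waits on its own pipe for the server URI.
  workers = Array.new(worker_count) do
    reader, writer = IO.pipe
    pid = fork do
      writer.close
      remote = DRbObject.new_with_uri(reader.gets.chomp)
      while (job = remote.take) != :done
        remote.report(job.upcase) # stand-in for the real transform step
      end
      exit!(0)
    end
    reader.close
    [pid, writer]
  end

  board = JobBoard.new
  DRb.start_service('druby://127.0.0.1:0', board) # port 0: pick a free port
  workers.each do |_pid, writer|
    writer.puts(DRb.uri)
    writer.close
  end

  jobs.each { |job| board.add(job) }
  worker_count.times { board.add(:done) } # one stop marker per worker
  results = jobs.map { board.collect }
  workers.each { |pid, _writer| Process.wait(pid) }
  DRb.stop_service
  results.sort
end

if __FILE__ == $PROGRAM_NAME
  puts run_drb_demo(%w[a b c d e], 3).inspect
end
```

Because every worker blocks in `take` until a job arrives, jobs go to whichever worker is free rather than round robin. On a real cluster the URI would need to be reachable from the other nodes instead of bound to loopback.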

Thanks again for all your help.

Dan