···
On Sat, 21 May 2005, Luke Kanies wrote:
I'm in the process of writing a kind of distributed application, where one
or more central servers does some initial processing of a set of files, and
a bunch of clients then connect and get an appropriate subset of the
processed information. In addition, each of the clients needs to be
queryable, so I can always figure out their status and get metrics and such.
Obviously there are many ways to do this, but given the industry I'm
targeting with this and the applications with which I expect to need to
integrate, it seems like some kind of semi-standardized web service makes
the most sense.
So, using some examples online, I hacked up a quick webrick/soap4r server on
both my client and server, and I'm successfully passing information around.
Well, kind of. The problem is that webrick seems to require that my process
be entirely reactive -- both my client and server want to sit there waiting
for someone to connect, when obviously that won't work. I need to get
separate actions going on each process, but webrick seems to want to require
that all action is entirely reactive. So, I'm now in the situation where
the server works entirely reactively, and the client can contact it fine
before I start the client's webrick server, but after the server starts I
lose control of the process.
What I'm really looking for is something like Perl's POE: Something that
allows me to set up multiple sub-processes, none of which are blocking, and
all of which run based on callbacks. On the server side, I want to respond
to requests, and periodically reprocess files as necessary (as they change
or whatever). On the client side, I want to periodically connect to the
server and get new data, and the data I have all has a period on which it is
reassessed -- e.g., every hour verify X is still true. The client needs to
also respond to requests for metrics and such when they come in.
I've been considering setting up the server as a Rails server, although that
is certainly overkill at this point in the game and might be overkill in the
long term. I think that's too heavyweight for the client, though, and I'm
not sure I would get the features I want out of Rails anyway.
Can anyone recommend anything I can use to get this kind of behaviour? Are
threads the only answer? (Please say they aren't.)
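fwiw threads are pretty much the standard ruby answer here - run the listener in one thread and do the periodic work in others. a minimal stdlib sketch of the reactive-plus-periodic pattern (TCPServer stands in for webrick/soap4r; the Servant name, status payload, and line protocol are all invented for illustration):

```ruby
require 'socket'

# minimal sketch of the reactive-plus-periodic pattern. TCPServer
# stands in for webrick/soap4r; the Servant name, the status payload,
# and the one-line protocol are invented for illustration.
class Servant
  def initialize(port = 0)
    @server = TCPServer.new('127.0.0.1', port) # port 0 => pick a free port
    @status = 'idle'
    @lock   = Mutex.new
  end

  def port
    @server.addr[1]
  end

  # reactive half: answer status queries in a background thread
  def serve
    Thread.new do
      loop do
        client = @server.accept
        client.puts(@lock.synchronize { @status })
        client.close
      end
    end
  end

  # proactive half: run a block every +seconds+ seconds
  def every(seconds)
    Thread.new do
      loop do
        yield
        sleep seconds
      end
    end
  end

  def status=(s)
    @lock.synchronize { @status = s }
  end
end
```

usage would be along the lines of `s = Servant.new; s.serve; s.every(3600) { ... }` - the block being whatever hourly 'verify X is still true' work the client does. select/io-multiplexing is the other non-thread route, but for a handful of tasks threads are far less code.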
---
jobs:
  pending: 243
  holding: 0
  running: 36
  finished: 501
  dead: 0
  total: 780
temporal:
  pending:
    earliest: { jid: 619, metric: submitted, time: 2005-05-12 11:31:42.919905 }
    latest: { jid: 1275, metric: submitted, time: 2005-05-20 14:20:15.163355 }
    shortest:
    longest:
  holding:
    earliest:
    latest:
    shortest:
    longest:
  running:
    earliest: { jid: 613, metric: started, time: 2005-05-19 19:46:09.532144 }
    latest: { jid: 1197, metric: started, time: 2005-05-20 15:26:14.373168 }
    shortest: { jid: 1197, duration: 00:01:1.258993 }
    longest: { jid: 613, duration: 19:41:41.339677 }
  finished:
    earliest: { jid: 781, metric: finished, time: 2005-05-12 13:35:31.757662 }
    latest: { jid: 723, metric: finished, time: 2005-05-20 15:26:13.962584 }
    shortest: { jid: 546, duration: 00:11:11.688514 }
    longest: { jid: 976, duration: 30:18:18.852480 }
  dead:
    earliest:
    latest:
    shortest:
    longest:
performance:
  avg_time_per_job: 13:02:2.998790
  n_jobs_in_last_1_hrs: 3
  n_jobs_in_last_2_hrs: 6
  n_jobs_in_last_4_hrs: 10
  n_jobs_in_last_8_hrs: 23
  n_jobs_in_last_16_hrs: 44
  n_jobs_in_last_32_hrs: 91
exit_status:
  successes: 501
  failures: 0
we've run about half a million jobs through our system now with zero failures
or bugs. if your nfs server/clients are set up right you can install in about 5
minutes without root privileges.
basically the concept is to have each client/server pull jobs from a queue,
where all the queues live on a central nfs mount. so every node can submit jobs
to every other node and all nodes can run jobs. this is a 'servant'
architecture.
so, for example, working on an nfs mount, on two nodes of mine - jib and carp -
we can set up a queue for each node:
jib:~/shared > rq `hostname`.q create
---
q: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q
db: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/db
schema: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/db.schema
lock: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/lock
carp:~/shared > rq `hostname`.q create
---
q: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q
db: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/db
schema: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/db.schema
lock: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/lock
so now each node has a queue located on a central nfs mount
carp submits a job to jib:
carp:~/shared > rq jib.ngdc.noaa.gov.q/ submit echo 42
---
-
  jid: 1
  priority: 0
  state: pending
  submitted: 2005-05-20 15:32:54.664324
  started:
  finished:
  elapsed:
  submitter: carp.ngdc.noaa.gov
  runner:
  pid:
  exit_status:
  tag:
  restartable:
  command: echo 42
jib submits a job to carp:
jib:~/shared > rq carp.ngdc.noaa.gov.q/ submit echo 42
---
-
  jid: 1
  priority: 0
  state: pending
  submitted: 2005-05-20 15:33:31.209160
  started:
  finished:
  elapsed:
  submitter: jib.ngdc.noaa.gov
  runner:
  pid:
  exit_status:
  tag:
  restartable:
  command: echo 42
'feeders' (processes that take jobs from the queue, run them, and return them
to the queue) are started on each node. (normally these are daemons and can be
cron'd to be made 'immortal' - they restart if they die)
carp:~/shared > rq carp.ngdc.noaa.gov.q/ feed --log=/dev/null
42
jib:~/shared > rq jib.ngdc.noaa.gov.q/ feed --log=/dev/null
42
so carp ran jib's job and jib ran carp's job. we can see this by:
carp:~/shared > rq jib.ngdc.noaa.gov.q/ query jid=1
---
-
  jid: 1
  priority: 0
  state: finished
  submitted: 2005-05-20 15:32:54.664324
  started: 2005-05-20 15:39:33.309159
  finished: 2005-05-20 15:39:33.438110
  elapsed: 0.128951
  submitter: carp.ngdc.noaa.gov
  runner: jib.ngdc.noaa.gov
  pid: 26632
  exit_status: 0
  tag:
  restartable:
  command: echo 42
jib:~/shared > rq carp.ngdc.noaa.gov.q/ query jid=1
---
-
  jid: 1
  priority: 0
  state: finished
  submitted: 2005-05-20 15:33:31.209160
  started: 2005-05-20 15:38:43.503715
  finished: 2005-05-20 15:38:43.779134
  elapsed: 0.275419
  submitter: jib.ngdc.noaa.gov
  runner: carp.ngdc.noaa.gov
  pid: 20500
  exit_status: 0
  tag:
  restartable:
  command: echo 42
all the output is available as yaml and much of it can be fed back in as input
to other commands. in addition the queue is directly accessible via an api, so
it's pretty easy to code decision making based on some other node's queue
contents.
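for example, since the query output is yaml, a script can load it with the stdlib and act on it. the yaml document below is inlined for illustration; in real use you'd capture it with e.g. `%x(rq jib.ngdc.noaa.gov.q query jid=1)`:

```ruby
require 'yaml'

# `rq ... query` emits yaml, so driving decisions from another node's
# queue is a few lines. the document is inlined here for illustration;
# normally you'd capture the rq command's stdout instead.
doc = <<~YML
  ---
  -
    jid: 1
    state: finished
    exit_status: 0
    runner: jib.ngdc.noaa.gov
    command: echo 42
YML

jobs   = YAML.load(doc)
failed = jobs.select { |j| j['state'] == 'finished' && j['exit_status'] != 0 }
puts "resubmit #{failed.size} job(s)" unless failed.empty?
```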
i also have a piece of software called 'dirwatch' (on the raa too) that makes
it trivial to set up 'watches' on directories to trigger actions when files are
created, modified, deleted, etc. it's under revision as we speak and is
undergoing a major internal overhaul - but the basic functionality and user
interface won't change much.
hth.
-a
--
email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso
===============================================================================