i have. 
On Wed, 22 Sep 2004, Kevin McConnell wrote:
---
pending : 5875
running : 36
finished : 1108
dead : 0
yacht:~/shared > rq queue list running | head -20
---
-
jid: 1324
priority: 0
state: running
submitted: 2004-09-20 09:16:39.449169
started: 2004-09-22 03:55:24.914682
finished:
elapsed:
submitter: jib.ngdc.noaa.gov
runner: redfish.ngdc.noaa.gov
pid: 11519
exit_status:
command: /dmsp/moby-1-1/cfadmin/shared/jobs/wavgjob /dmsp/moby-1-1/conf/avg_dn/filelists/F142000.included F142000.cloud2.light1.tile8 /dmsp/moby-1-1/conf/avg_dn/cloud2.light1.tile8.conf cfd2://cfd2-3/F142000/
-
jid: 1325
priority: 0
state: running
submitted: 2004-09-20 09:16:39.449169
started: 2004-09-22 04:12:32.758249
this stack of work will take about a week to complete using 18 nodes.
from the man page of the main command-line program 'rq':
NAME
rq v0.1.2
SYNOPSIS
rq [queue] mode [mode_args]* [options]*
DESCRIPTION
rq is an __experimental__ tool used to manage nfs mounted work
queues. multiple instances of rq on multiple hosts can work from
these queues to distribute processing load to 'n' nodes - bringing many dozens
of otherwise powerful cpus to their knees with a single blow. clearly this
software should be kept out of the hands of radicals, SETI enthusiasts, and
one mr. jeff safran.
rq operates in one of the modes create, submit, feed, list, delete,
query, or help. depending on the mode of operation and the options used the
meaning of mode_args may change, sometimes wildly and unpredictably (i jest, of
course).
MODES
modes may be abbreviated to uniqueness, therefore the following shortcuts
apply :
c => create
s => submit
f => feed
l => list
d => delete
q => query
h => help
create, c :
creates a queue. the queue MUST be located on an nfs mounted file system
visible from all nodes intended to run jobs from it.
examples :
0) to create a queue
~ > rq q create
or simply
~ > rq q c
list, l :
show combinations of pending, running, dead, or finished jobs. for this
command mode_args must be one of pending, running, dead, finished, or all.
the default is all.
mode_args may be abbreviated to uniqueness, therefore the following
shortcuts apply :
p => pending
r => running
f => finished
d => dead
a => all
examples :
0) show everything in q
~ > rq q list all
or
~ > rq q l all
or
~ > export RQ_Q=q
~ > rq l
0) show q's pending jobs
~ > rq q list pending
1) show q's running jobs
~ > rq q list running
2) show q's finished jobs
~ > rq q list finished
submit, s :
submit jobs to a queue to be processed by any feeding node. any mode_args
are taken as the command to run. note that mode_args are subject to shell
expansion - if you don't understand what this means do not use this feature.
when running in submit mode a file may be specified as a list of commands to
run using the '--infile, -i' option. this file is taken to be a newline
separated list of commands to submit, blank lines and comments (#) are
allowed. if submitting a large number of jobs the input file method is MUCH
more efficient. if no commands are specified on the command line rq
automatically reads them from STDIN. yaml formatted files are also allowed
as input (http://www.yaml.org/) - note that output of nearly all rq
commands is valid yaml and may, therefore, be piped as input into the submit
command.
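the '--infile' format above (newline separated commands, blank lines and '#'
comments allowed) is simple enough to sketch in a few lines. a minimal
illustration (the function name parse_commands is made up here, not part of rq):

```python
def parse_commands(text):
    """split a newline separated command list into individual commands,
    skipping blank lines and '#' comment lines, as rq's --infile allows."""
    commands = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        commands.append(line)
    return commands
```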
the '--priority, -p' option can be used here to determine the priority of
jobs. priorities may be any integer in the range [0, 9]; therefore 9 is the maximum
priority. submitting a high priority job will NOT supplant currently
running low priority jobs, but higher priority jobs will always migrate
above lower priority jobs in the queue in order that they be run sooner.
note that constant submission of high priority jobs may create a starvation
situation whereby low priority jobs are never allowed to run. avoiding this
situation is the responsibility of the user.
examples :
0) submit the job ls to run on some feeding host
~ > rq q s ls
1) submit the job ls to run on some feeding host, at priority 9
~ > rq -p9 q s ls
2) submit 42000 jobs (quietly) to run from a command file.
~ > wc -l cmdfile
42000
~ > rq q s -q < cmdfile
3) submit 42 jobs to run at priority 9 from a command file.
~ > wc -l cmdfile
42
~ > rq -p9 q s < cmdfile
4) re-submit all finished jobs
~ > rq q l f | rq q s
feed, f :
take jobs from the queue and run them on behalf of the submitter. jobs are
taken from the queue in an 'oldest highest priority' order.
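the 'oldest highest priority' ordering can be pictured as a compound sort key:
priority descending, then submission time ascending. a sketch (field names
follow the list output shown earlier; this is illustrative, not rq's actual
implementation):

```python
# jobs as dicts carrying the fields shown in 'rq q list' output
jobs = [
    {"jid": 1, "priority": 0, "submitted": "2004-09-20 09:16:39"},
    {"jid": 2, "priority": 9, "submitted": "2004-09-21 10:00:00"},
    {"jid": 3, "priority": 9, "submitted": "2004-09-20 08:00:00"},
]

# highest priority first; among equal priorities, oldest submission first.
# timestamps in this format sort chronologically as plain strings.
ordered = sorted(jobs, key=lambda j: (-j["priority"], j["submitted"]))
# → jids 3, 2, 1
```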
feeders can be run from any number of nodes allowing you to harness the CPU
power of many nodes simultaneously in order to more effectively clobber
your network.
the most useful method of feeding from a queue is to do so in daemon mode so
that the process detaches from its controlling terminal and will not exit when
you exit your terminal session. use the '--daemon, -d' option to accomplish
this. by default only one feeding process per host per queue is allowed to
run at any given moment. because of this it is acceptable to start a feeder
at some regular interval from a cron entry since, if a feeder is already
running, the process will simply exit and otherwise a new feeder will be
started. in this way you may keep a feeder process running even across
machine reboots.
examples :
0) feed from a queue verbosely for debugging purposes, using a minimum and
maximum polling time of 2 and 4 seconds respectively
~ > rq q feed -v4 -m2 -M4
1) feed from a queue in daemon mode logging into /home/ahoward/rq.log
~ > rq q feed -d -l/home/ahoward/rq.log
2) use something like this sample crontab entry to keep a feeder running
forever (it attempts to (re)start every fifteen minutes)
#
# your crontab file
#
*/15 * * * * /full/path/to/bin/rq /full/path/to/nfs/mounted/q f -d -l/home/user/rq.log
log rolling while running in daemon mode is automatic.
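the one-feeder-per-host-per-queue behaviour described above amounts to a
non-blocking lock on a lockfile: if the lock is already held, the new process
exits quietly, which is what makes the cron trick safe. a generic sketch using
fcntl.flock (the lockfile path is made up, and note that flock is not reliable
over NFS, so rq itself presumably uses an NFS-safe locking scheme):

```python
import fcntl
import sys

def acquire_feeder_lock(path):
    """try to take an exclusive, non-blocking lock on path.
    returns the open file on success, None if another feeder holds it."""
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f  # keep the file open for the life of the feeder

lock = acquire_feeder_lock("/tmp/rq-feeder.lock")
if lock is None:
    sys.exit(0)  # a feeder is already running on this host; exit quietly
```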
delete, d :
delete combinations of pending, running, finished, dead, or specific jobs.
the delete mode is capable of parsing the output of list mode, making it
possible to create filters to delete jobs meeting very specific conditions.
mode_args are the same as for 'list', including 'running'. note that it is
possible to 'delete' a running job, but there is no way to actually STOP it
mid execution since the node doing the deleting has no way to communicate
this information to the (possibly) remote execution host. therefore you
should use the 'delete running' feature with care and only for housekeeping
purposes or to prevent future jobs from being scheduled.
examples :
0) delete all pending, running, and finished jobs from a queue
~ > rq q d all
1) delete all pending jobs from a queue
~ > rq q d p
2) delete all finished jobs from a queue
~ > rq q d f
3) delete jobs via hand crafted filter program
~ > rq q list | filter_prog | rq q d
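a hand crafted filter_prog only has to parse the record format shown at the top
of this message (records separated by '-' lines, 'key: value' fields) and pass
matching records through. a rough sketch of the parsing and selection (the
criterion is arbitrary; real rq output is full yaml, for which a yaml parser
would be more robust):

```python
def parse_records(lines):
    """parse '-' separated records of 'key: value' lines into dicts."""
    records, current = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "-":
            if current:
                records.append(current)
            current = {}
        elif ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records

def matches(record):
    # example criterion: keep only jobs that ran on a particular host
    return record.get("runner") == "redfish.ngdc.noaa.gov"

# a real filter would read sys.stdin and re-emit matching records as yaml;
# here we select from a parsed sample
sample = ["-", "jid: 1324", "runner: redfish.ngdc.noaa.gov",
          "-", "jid: 1325", "runner: jib.ngdc.noaa.gov"]
selected = [r for r in parse_records(sample) if matches(r)]
```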
query, q :
query exposes the database more directly to the user, evaluating the where
clause specified on the command line (or from STDIN). this feature can be
used to make a fine grained selection of jobs for reporting or as input into
the delete command. you must have a basic understanding of SQL syntax to
use this feature, but it is fairly intuitive in this capacity.
examples:
0) show all jobs started within a specific 10 second range
~ > rq q query "started >= '2004-06-29 22:51:00' and started < '2004-06-29 22:51:10'"
1) shell quoting can be tricky here so input on STDIN is also allowed
~ > cat constraints
started >= '2004-06-29 22:51:00' and
started < '2004-06-29 22:51:10'
~ > rq q query < constraints
or (same thing)
~ > cat constraints | rq q query
2) this query output may then be used to delete specific jobs
~ > cat constraints | rq q query | rq q d
3) show all jobs which are either finished or dead
~ > rq q q state=finished or state=dead
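conceptually, query mode appends the user's clause to a SELECT over a jobs
table. a sketch using python's sqlite3 (the schema is guessed from the fields
in the list output; rq's real schema may differ, and real code must treat a
user-supplied clause far more carefully than this sketch does):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table jobs (jid integer, state text, started text)")
db.executemany("insert into jobs values (?, ?, ?)", [
    (1324, "running",  "2004-09-22 03:55:24"),
    (1325, "finished", "2004-09-22 04:12:32"),
    (1326, "dead",     "2004-09-21 11:00:00"),
])

where = "state='finished' or state='dead'"   # the user-supplied clause
rows = db.execute("select jid from jobs where " + where).fetchall()
# → jobs 1325 and 1326
```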
NOTES
- realize that your job is going to be running on a remote host and this has
implications. paths, for example, should be absolute, not relative.
specifically the submitted job must be visible from all hosts currently
feeding from a q.
- you need to consider __CAREFULLY__ what the ramifications of having multiple
instances of your program all running at the same time will be. it is
beyond the scope of rq to ensure multiple instances of a program
will not overwrite each others output files, for instance. coordination of
programs is left entirely to the user.
- the list of finished jobs will grow without bound unless you sometimes
delete some (all) of them. the reason for this is that rq cannot
know when the user has collected the exit_status, etc. from a job and so
keeps this information in the queue until instructed to delete it.
- if you are using the crontab feature to maintain an immortal feeder on a
host then that feeder will be running in the environment provided by cron.
this is NOT the same environment found in a login shell and you may be
surprised at the range of commands which do not function. if you want
submitted jobs to behave as closely as possible to their behaviour when
typed interactively you'll need to wrap each job in a shell script that
looks like the following:
#!/bin/bash --login
commands_for_your_job
and submit that script
ENVIRONMENT
RQ_Q: full path to queue
the queue argument to all commands may be omitted if, and only if, the
environment variable 'RQ_Q' contains the full path to the q. eg.
~ > export RQ_Q=/full/path/to/my/q
this feature can save a considerable amount of typing for those weak of wrist
DIAGNOSTICS
success => $? == 0
failure => $? != 0
AUTHOR
ara.t.howard@noaa.gov
BUGS
1 < bugno && bugno <= 42
OPTIONS
-f, --feed=appetite
-p, --priority=priority
--name
-d, --daemon
-q, --quiet
-e, --select
-i, --infile=infile
-M, --max_sleep=seconds
-m, --min_sleep=seconds
-l, --log=path
-v=0-4|debug|info|warn|error|fatal
--verbosity
--log_age=log_age
--log_size=log_size
-c, --config=path
--template=template
-h, --help
so far it looks like the solution to my problem was to close the database after
forking (if it was open), but i'm still testing this approach.
kind regards.
-a
--
EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen
===============================================================================