[ANN] ruby queue : rq-0.1.2

(very sorry if this is posted multiple times - for some reason i have not
seen my original post)

rubyists-

rq (ruby queue) is a project aimed at filling the void between
roll-your-own distributed processing using ssh/rsh and full-blown
clustering software like sun grid engine. it is a tool designed to throw
a bunch of nodes at a list of tasks in a hurry. it is highly fault
tolerant due to its decentralized design and simple to use, requiring
only a few minutes to set up and the use of three or four simple
commands. at this point documentation is scant and this release carries
an experimental status; however, our site has run nearly a million jobs
through rq over the last few months with no problems and i am excited to
gather opinions about the initial design before starting in earnest on an
alpha release. please feel free to contact me either on- or offline with
any questions or assistance getting set up, as i am eager to find some
willing testers.

for now the project lives at

    http://raa.ruby-lang.org/project/rq/

though a rubyforge/gem dist will accompany the alpha release

cheers.

-a

from 'rq -help'

    NAME
      rq v0.1.2

    SYNOPSIS
      rq [queue] mode [mode_args]* [options]*

    DESCRIPTION
      rq is an __experimental__ tool used to manage nfs mounted work
      queues. multiple instances of rq on multiple hosts can work from
      these queues to distribute processing load to 'n' nodes - bringing
      many dozens of otherwise powerful cpus to their knees with a single
      blow. clearly this software should be kept out of the hands of
      radicals, SETI enthusiasts, and one mr. jeff safran.

      rq operates in one of the modes create, submit, feed, list, delete,
      query, or help. depending on the mode of operation and the options
      used, the meaning of mode_args may change, sometimes wildly and
      unpredictably (i jest, of course).

    MODES

      modes may be abbreviated to uniqueness; therefore the following
      shortcuts apply :

        c => create
        s => submit
        f => feed
        l => list
        d => delete
        q => query
        h => help

      create, c :

        creates a queue. the queue MUST be located on an nfs mounted file
        system visible from all nodes intended to run jobs from it.

        examples :

          0) to create a queue
              ~ > rq q create
            or simply
              ~ > rq q c

      list, l :

        show combinations of pending, running, dead, or finished jobs.
        for this command mode_args must be one of pending, running, dead,
        finished, or all. the default is all.

        mode_args may be abbreviated to uniqueness; therefore the
        following shortcuts apply :

          p => pending
          r => running
          f => finished
          d => dead
          a => all

        examples :

          0) show everything in q
              ~ > rq q list all
            or
              ~ > rq q l all
            or
              ~ > export RQ_Q=q
              ~ > rq l

          1) show q's pending jobs
              ~ > rq q list pending

          2) show q's running jobs
              ~ > rq q list running

          3) show q's finished jobs
              ~ > rq q list finished

      submit, s :

        submit jobs to a queue to be processed by any feeding node. any
        mode_args are taken as the command to run. note that mode_args
        are subject to shell expansion - if you don't understand what
        this means do not use this feature.

        when running in submit mode a file may be specified as a list of
        commands to run using the '--infile, -i' option. this file is
        taken to be a newline separated list of commands to submit; blank
        lines and comments (#) are allowed. if submitting a large number
        of jobs the input file method is MUCH more efficient. if no
        commands are specified on the command line rq automatically reads
        them from STDIN. yaml formatted files are also allowed as input
        (http://www.yaml.org/) - note that the output of nearly all rq
        commands is valid yaml and may, therefore, be piped as input into
        the submit command.
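        a command file in this format can be reduced to a list of jobs
        with just a few lines of ruby - a sketch of the documented format
        (newline separated commands, blank lines and '#' comments
        allowed), not rq's actual parser :

```ruby
# sketch: parse an rq-style command file - a newline separated list of
# commands where blank lines and '#' comments are allowed. this mirrors
# only the format documented above; it is not rq's own code.
def read_command_file(path)
  File.readlines(path)
      .map { |line| line.sub(/#.*$/, '').strip }  # drop comments, trim
      .reject(&:empty?)                           # drop blank lines
end

File.write('cmdfile', <<~EOF)
  # nightly processing
  ls -l /data

  md5sum /data/input.dat   # checksum the input
EOF

p read_command_file('cmdfile')   # => ["ls -l /data", "md5sum /data/input.dat"]
```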

        the '--priority, -p' option can be used here to determine the
        priority of jobs. priorities may be any number in [0, 10);
        therefore 9 is the maximum priority. submitting a high priority
        job will NOT supplant currently running low priority jobs, but
        higher priority jobs will always migrate above lower priority
        jobs in the queue in order that they be run sooner. note that
        constant submission of high priority jobs may create a starvation
        situation whereby low priority jobs are never allowed to run.
        avoiding this situation is the responsibility of the user.

        examples :

          0) submit the job ls to run on some feeding host

            ~ > rq q s ls

          1) submit the job ls to run on some feeding host, at priority 9

            ~ > rq -p9 q s ls

          2) submit 42000 jobs (quietly) to run from a command file.

            ~ > wc -l cmdfile
            42000
            ~ > rq q s -q < cmdfile

          3) submit 42 jobs to run at priority 9 from a command file.

            ~ > wc -l cmdfile
            42
            ~ > rq -p9 q s < cmdfile

          4) re-submit all finished jobs

            ~ > rq q l f | rq q s

      feed, f :

        take jobs from the queue and run them on behalf of the submitter.
        jobs are taken from the queue in an 'oldest highest priority'
        order.
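        the 'oldest highest priority' order amounts to sorting on two
        keys - highest priority first, then oldest submission time. a
        sketch with made-up job records (they are not rq's actual
        structures) :

```ruby
require 'time'

# 'oldest highest priority' order: sort on two keys - priority
# descending, then submission time ascending. the job hashes here are
# invented for illustration only; they are not rq's actual records.
jobs = [
  { cmd: 'ls',     priority: 0, submitted: Time.parse('2004-06-29 22:51:00') },
  { cmd: 'md5sum', priority: 9, submitted: Time.parse('2004-06-29 22:52:00') },
  { cmd: 'date',   priority: 9, submitted: Time.parse('2004-06-29 22:51:30') },
]

next_up = jobs.sort_by { |j| [-j[:priority], j[:submitted]] }
p next_up.map { |j| j[:cmd] }   # => ["date", "md5sum", "ls"]
```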

        feeders can be run from any number of nodes allowing you to
        harness the CPU power of many nodes simultaneously in order to
        more effectively clobber your network.

        the most useful method of feeding from a queue is to do so in
        daemon mode so that the process loses its controlling terminal
        and will not exit when you exit your terminal session. use the
        '--daemon, -d' option to accomplish this. by default only one
        feeding process per host per queue is allowed to run at any given
        moment. because of this it is acceptable to start a feeder at
        some regular interval from a cron entry since, if a feeder is
        already running, the process will simply exit and otherwise a new
        feeder will be started. in this way you may keep a feeder process
        running even across machine reboots.
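        the 'one feeder per host per queue' behaviour can be pictured as
        a non-blocking exclusive lock on a per-queue lockfile - a sketch
        of the idea only (the lockfile path is invented, and rq itself
        relies on fcntl locking rather than flock) :

```ruby
require 'tmpdir'

# sketch: allow only one feeder per host by taking a non-blocking
# exclusive lock on a lockfile; a second invocation exits quietly.
# the path is invented for illustration - it is not rq's layout, and
# rq itself relies on fcntl locks rather than flock.
LOCKFILE = File.join(Dir.tmpdir, 'q.feeder.lock')

lock = File.open(LOCKFILE, File::RDWR | File::CREAT)
unless lock.flock(File::LOCK_EX | File::LOCK_NB) == 0
  puts 'feeder already running on this host - exiting'
  exit 0
end

puts "feeder #{Process.pid} holds the lock"
# ... the main feed loop would run here; the lock dies with the process
```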

        examples :

          0) feed from a queue verbosely for debugging purposes, using a
             minimum and maximum polling time of 2 and 4 seconds
             respectively

            ~ > rq q feed -v4 -m2 -M4

          1) feed from a queue in daemon mode logging into
             /home/ahoward/rq.log

            ~ > rq q feed -d -l/home/ahoward/rq.log

          2) use something like this sample crontab entry to keep a
             feeder running forever (it attempts to (re)start every
             fifteen minutes)

            #
            # your crontab file
            #

            */15 * * * * /full/path/to/bin/rq /full/path/to/nfs/mounted/q f -d -l/home/user/rq.log

            log rolling while running in daemon mode is automatic.

      delete, d :

        delete combinations of pending, running, finished, dead, or
        specific jobs. the delete mode is capable of parsing the output
        of list mode, making it possible to create filters to delete jobs
        meeting very specific conditions.

        mode_args are the same as for 'list', including 'running'. note
        that it is possible to 'delete' a running job, but there is no
        way to actually STOP it mid execution since the node doing the
        deleting has no way to communicate this information to the
        (possibly) remote execution host. therefore you should use the
        'delete running' feature with care and only for housekeeping
        purposes or to prevent future jobs from being scheduled.

        examples :

          0) delete all pending, running, and finished jobs from a queue

            ~ > rq q d all

          1) delete all pending jobs from a queue

            ~ > rq q d p

          2) delete all finished jobs from a queue

            ~ > rq q d f

          3) delete jobs via hand crafted filter program

            ~ > rq q list | filter_prog | rq q d

      query, q :

        query exposes the database more directly to the user, evaluating
        the where clause specified on the command line (or from STDIN).
        this feature can be used to make a fine grained selection of jobs
        for reporting or as input into the delete command. you must have
        a basic understanding of SQL syntax to use this feature, but it
        is fairly intuitive in this capacity.

        examples:

          0) show all jobs submitted within a specific 10 minute range

            ~ > rq q query "started >= '2004-06-29 22:51:00' and started < '2004-06-29 22:51:10'"

          1) shell quoting can be tricky here so input on STDIN is also
             allowed

            ~ > cat constraints
            started >= '2004-06-29 22:51:00' and
            started < '2004-06-29 22:51:10'

            ~ > rq q query < constraints
              or (same thing)

            ~ > cat constraints | rq q query

          2) this query output may then be used to delete specific jobs

            ~ > cat constraints | rq q query | rq q d

          3) show all jobs which are either finished or dead

            ~ > rq q q state=finished or state=dead

    NOTES
      - realize that your job is going to be running on a remote host and
        this has implications. paths, for example, should be absolute,
        not relative. specifically, the submitted job must be visible
        from all hosts currently feeding from a q.

      - you need to consider __CAREFULLY__ what the ramifications of
        having multiple instances of your program all running at the same
        time will be. it is beyond the scope of rq to ensure multiple
        instances of a program will not overwrite each other's output
        files, for instance. coordination of programs is left entirely to
        the user.

      - the list of finished jobs will grow without bound unless you
        sometimes delete some (or all) of them. the reason for this is
        that rq cannot know when the user has collected the exit_status,
        etc. from a job and so keeps this information in the queue until
        instructed to delete it.

      - if you are using the crontab feature to maintain an immortal
        feeder on a host then that feeder will be running in the
        environment provided by cron. this is NOT the same environment
        found in a login shell and you may be surprised at the range of
        commands which do not function. if you want submitted jobs to
        behave as closely as possible to their behaviour when typed
        interactively you'll need to wrap each job in a shell script that
        looks like the following:

          #!/bin/bash --login
          commands_for_your_job

        and submit that script

    ENVIRONMENT
      RQ_Q: full path to queue

        the queue argument to all commands may be omitted if, and only
        if, the environment variable 'RQ_Q' contains the full path to the
        q. eg.

          ~ > export RQ_Q=/full/path/to/my/q

        this feature can save a considerable amount of typing for those
        weak of wrist

    DIAGNOSTICS
     success => $? == 0
     failure => $? != 0

    AUTHOR
     ara.t.howard@noaa.gov

    BUGS
     1 < bugno && bugno <= 42

    OPTIONS

      -f, --feed=appetite
      -p, --priority=priority
          --name
      -d, --daemon
      -q, --quiet
      -e, --select
      -i, --infile=infile
      -M, --max_sleep=seconds
      -m, --min_sleep=seconds
      -l, --log=path
      -v=0-4|debug|info|warn|error|fatal
          --verbosity
          --log_age=log_age
          --log_size=log_size
      -c, --config=path
          --template=template
      -h, --help

Congrats, Ara! You've been working on this one for a while now, I believe. I'm excited to play with it, though I regret that there's probably not much I could use it on at work... I'll definitely add it to my toolbox, though.

Thanks!

- Jamis

ara howard wrote:

···

<snip>

--
Jamis Buck
jgb3@email.byu.edu
http://www.jamisbuck.org/jamis

"I use octal until I get to 8, and then I switch to decimal."

Hi Ara,

<snip>

    DESCRIPTION
      rq is an __experimental__ tool used to manage
      nfs mounted work queues.
      ^^^^^^^^^^^^^

Has this been tested/tried on a mix of *ix and Windows
platforms, and if yes, which NFS (free or commercial)
package was used on the Windows side?

Thanks,
-- shanko

···

--- ara howard <ahoward@fsl.noaa.gov> wrote:


this has not been tested on any windows machines.

our environment is 100% linux - latest enterprise kernel on all boxes,
including nfs servers. the code SHOULD work on any box for which nfs
locking works (both my code and sqlite depend on fcntl locks working). if
you are interested in trying it out please let me know and i'll help you
get a test system up.

cheers.

-a

···

On Tue, 17 Aug 2004, Shashank Date wrote:

<snip>

--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

Hi Ara,

"Ara.T.Howard" <ahoward@noaa.gov> wrote in message

this has not been tested on any window machines.

our environment is 100% linux - latest enterprise kernel on all boxes,
including nfs servers. the code SHOULD work on any box for which nfs
locking works (both my code and sqlite depend on fcntl locks working). if
you are interested in trying it out please let me know and i'll help you
get a test system up.

Definitely interested, but unfortunately won't be able to devote testing
time until early November (if at all). I am looking at a project in the
pipe which may have a mix of Linux and Windows for which this would be
just perfect.

If and when we start using it, I will be able to contribute towards:

0. GUI Console,
1. dynamic load-balancer
2. rudimentary work-flow engine

in that order....if it does not have these by then, that is.

Wish you the very best.
-- shanko

"Shashank Date" <sdate@everestkc.net> wrote in message

If and when we start using it, I will be able to contribute towards:

0. GUI Console,
1. dynamic load-balancer
2. rudimentary work-flow engine

                          ^^^^^^^^^^^^^^
Forgot to add a link which lists some W/F engines out there:
http://jbpm.org/article.html#wftk

Hi Ara,

"Ara.T.Howard" <ahoward@noaa.gov> wrote in message

<snip>

Definitely interested, but unfortunately won't be able to devote testing
time until early November (if at all). I am looking at a project in the
pipe which may have a mix of Linux and Windows for which this would be
just perfect.

you'll probably want to test your nfs setup first - i can give you a
little script that should determine if mixed windows/linux nfs access
works with locking - it's pure ruby so it should just run. do you have a
few nodes you can test on?
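such a script boils down to checking that a lock held by one process is
actually refused to another - a rough sketch (it uses flock for brevity,
while the real concern over nfs is fcntl locking, and the probe path is
made up - point it at the nfs mount):

```ruby
# rough sketch of such a lock test: hold an exclusive lock on a shared
# file and verify a second process is refused it. substitute a path on
# the nfs mount and run the second opener from another node. flock is
# used here for brevity - rq and sqlite actually depend on fcntl locks.
require 'tmpdir'

path = File.join(Dir.tmpdir, 'nfs-lock-probe')   # substitute an nfs path

File.open(path, File::RDWR | File::CREAT) do |f|
  raise 'could not take the lock' unless f.flock(File::LOCK_EX | File::LOCK_NB) == 0
  # while we hold it, a second process must be refused
  pid = fork do
    File.open(path, File::RDWR) do |g|
      exit(g.flock(File::LOCK_EX | File::LOCK_NB) == false ? 0 : 1)
    end
  end
  Process.wait(pid)
  puts($?.success? ? 'locking looks sane' : 'second process also got the lock!')
end
```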

If and when we start using it, I will be able to contribute towards:

0. GUI Console,

that'd be great - it's on the TODO list...

1. dynamic load-balancer

it doesn't need one! :wink: all nodes access the queue taking jobs in a
'highest priority oldest out first' fashion. in other words, all nodes
bail water as fast as possible - if the boat is sinking the only answers
are:

   - write faster jobs
   - make the network faster (vsftp is great!)
   - buy faster nodes
   - buy more nodes

this is one of the great things about the totally decentralized
architecture - there is no need for load balancing or scheduling!

2. rudimentary work-flow engine

you mean dependencies? yes that would be nice...

in that order....if it does not have these by then, that is.

Wish you the very best.
-- shanko

it all sounds great - i'll try to get some more docs out in the next few weeks
so you can read about it.

cheers.

-a

···

On Mon, 16 Aug 2004, Shashank Date wrote:
--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

Hello Ara,

"Ara.T.Howard" <ahoward@noaa.gov> wrote in message

you'll probably want to test you nfs setup first - i can give a little
script that should determine if mixed windows/linux nfs access works with
locking - it's pure ruby so it should just run. do you have a few nodes
you can test on?

Not right away ... but I will try and get a 2-node configuration working
over the (already crowded :wink: ) week-end. No promises though.

If I succeed, I will have Win XP (Home) on one node and SuSE 8.2 on the
other. But please go ahead and email the script which determines
if it works or not.

>
> If and when we start using it, I will be able to contribute towards:

> 0. GUI Console,

that'd be great - it's on the TODO list...

Good ... I will be using wxRuby if it comes to me.

> 1. dynamic load-balancer

it doesn't need one! :wink: all nodes access the queue taking jobs in a
'highest priority oldest out first' fashion. in otherwords, all nodes
bail water as fast as possible - if the boat is sinking the only answers
are:

   - write faster jobs
   - make network faster (vsftp is great!)
   - buy faster nodes
   - buy more nodes

this is one of the great things about the totally decentralized
architechture - there is no need for load balancing or scheduling!

Hmmm... then may be I have mis-understood the nature of your project.
I was thinking of using it for CPU intensive (eg. image analysis) jobs
over a cluster of heterogeneous (in terms of CPU power, O.S. and in terms
of functionality) nodes and wanted the ability to "farm" (push) work
requests to least busy CPUs (provided there is a way to determine that,
of course). Phil Tomson's TaskMaster comes to mind. See:
http://raa.ruby-lang.org/project/taskmaster/

[Phil: if you are reading this, will love some feedback from you:
specifically are you planning to work on it in near future? Or is the
project closed?]

From your description above it appears to me that work will be
"fetched" (pulled) by least busy CPUs. Am I correct? (We can take this
discussion off line if it starts becoming [OT]).

> 2. rudimentary work-flow engine

you mean dependancies? yes that would be nice...

Well, a little more than that. Right now, this is still "pie in the sky"
kind of an idea... but I can see some of the patterns being implemented.

See: http://tmitwww.tm.tue.nl/research/patterns/patterns.htm
for some details.

This is very likely to be extremely specific to the problem we are trying
to solve and may also be proprietary. I hope your licensing terms
will permit me that.

it all sounds great - i'll try to get some more docs out in the next few
weeks so you can read about it.

Fantastic ... I am all ears (ummm eyes ;-)!

cheers.

Thanks,
-- shanko

i did something like this once:

   http://raa.ruby-lang.org/project/flow/

thanks for the link.

the biggest issue in cluster flows, i think, is HOW to collect the exit status
w/o polling... this is especially true w/my design since there is no central
brain (controlling process). this is one of the disadvantages of a
decentralized system - but, i think, the advantages outweigh this and other
negatives.

cheers.

-a

···

On Mon, 16 Aug 2004, Shashank Date wrote:

"Shashank Date" <sdate@everestkc.net> wrote in message

If and when we start using it, I will be able to contribute towards:

0. GUI Console,
1. dynamic load-balancer
2. rudimentary work-flow engine

                         ^^^^^^^^^^^^^^
Forgot to add a link which lists some W/F engines out there:
http://jbpm.org/article.html#wftk

--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

Not right away ... but I will try and get a 2-node configuration working
over the (already crowded :wink:) week-end. No promises though.

If I succeed, I will have Win XP (Home) on one node and SuSE 8.2 on the
other. But please go ahead and email the script which determines if it works
or not.

i'll tar it up and send it your way. any chance you could try compiling this
on windows?

   http://raa.ruby-lang.org/project/posixlock/

i don't have access to a windows machine with a compiler toolchain and don't
even know if windows offers a posix fcntl - but i'm hoping it does. sqlite
compiles on windows so it must - but i may have to add some #ifdefs to that
code to get it working... i should also add that you can simply pack a struct
to get fcntl working (thanks matz for this pure ruby solution) so a c
extension is not strictly needed for access to fcntl locking. however, some
form of it is required so we should probably look into that pronto. i have an
RCR out there (i think) for posix locking in ruby but haven't had time to
pursue it - the lack of it is a problem IMHO....

Hmmm... then maybe I have misunderstood the nature of your project. I was
thinking of using it for CPU intensive (e.g. image analysis) jobs over a
cluster of heterogeneous (in terms of CPU power, O.S. and in terms of
functionality) nodes and wanted the ability to "farm" (push) work requests
to the least busy CPUs (provided there is a way to determine that, of
course). Phil Tomson's TaskMaster comes to mind. See:
http://raa.ruby-lang.org/project/taskmaster/

[Phil: if you are reading this, I would love some feedback from you:
specifically, are you planning to work on it in the near future? Or is the
project closed?]

From your description above it appears to me that work will be "fetched"
(pulled) by the least busy CPUs. Am I correct? (We can take this discussion
off line if it starts becoming [OT]).

exactly. the flow is something like

   def feed
     daemon do
       loop do
         start_jobs until busy?
         reap_jobs
       end
     end
   end

but obviously a bit more complicated. the 'busy?' method only returns true if
a predefined number of jobs are already running (we set it to two for dual CPU
nodes) but i've got plans for this method to hook into resource monitoring so
a node may become 'busy' if some critical resource is maxed and, of course, so
that resources may be requested.

this approach totally avoids needing a scheduler since, as you correctly
state, jobs are 'fetched' from the queue and the strongest, fastest nodes
simply get more work done. it is working __really__ well for us - we see our
node performance line up exactly as we would have predicted as the number of
jobs grows.

i think this approach should work very well for the scenario you describe.
i'm assuming you've also checked out technologies like condor, sge, etc? all
the software i looked at was extremely bloated, full of complexity bugs, and
didn't support one of the main features i'll eventually need (full boolean
resource requests) which is what led me down this path. if you get a chance,
send a message offline (or online, i suppose, if it's relevant) detailing
your planned processing flow if you don't mind. it's handy to know how people
might use your software since, sometimes, choices can be arbitrary and this
knowledge might help me make better ones.

cheers.

This is very likely to be extremely specific to the problem we are trying to
solve and may also be proprietary. I hope your licensing terms will permit
me that.

ruby's license - so it should.

kind regards.

-a

···

On Mon, 16 Aug 2004, Shashank Date wrote:
--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================