[ANN] Drake: Distributed Rake

= DRAKE -- Distributed Rake

A branch of Rake supporting parallel task execution.

== Synopsis

Run up to three tasks in parallel:

  % drake -j3

or equivalently,

  % drake --threads 3

== Installation

  % gem install drake

== Notes

=== Compatibility

Drake is 100% compatible with Rake. The code path for --threads=1 is
effectively identical to Rake's. Drake passes all of Rake's unit
tests with any number of threads from 1 to 1000 (the most I tested).

=== Dependencies

In a given Rakefile, it is possible (even likely) that the dependency
tree has not been properly defined. Consider

   task :a => [:x, :y, :z]

With single-threaded Rake, _x_,_y_,_z_ will be invoked *in that order*
before _a_ is invoked. However with drake --threads=N (for N > 1),
one should not expect any particular order of execution. Since there
is no dependency specified between _x_,_y_,_z_ above, Drake is free to
run them in any order.

If you wish _x_,_y_,_z_ to be invoked sequentially, then write

   task :a => seq[:x, :y, :z]

This is shorthand for

   task :a => :z
   task :z => :y
   task :y => :x

Upon invoking _a_, the above rules say: "Can't do _a_ until _z_ is
complete; can't do _z_ until _y_ is complete; can't do _y_ until _x_
is complete; therefore do _x_." In this fashion the sequence
_x_,_y_,_z_ is enforced.
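
As a rough illustration of the expansion described above, here is a minimal sketch in plain Ruby. The helper name +chain_pairs+ is hypothetical and is not part of Drake or Rake; it only shows how an ordered list could be turned into the chained (task, prerequisite) pairs listed above.

```ruby
# Hypothetical helper (not Drake's API): expand an ordered list of
# prerequisites into chained [task, prerequisite] pairs.
def chain_pairs(target, ordered)
  chain = ordered + [target]                     # e.g. [:x, :y, :z, :a]
  chain.each_cons(2)                             # [:x, :y], [:y, :z], [:z, :a]
       .map { |prereq, task| [task, prereq] }    # flip into [task, prereq]
       .reverse                                  # list the target's edge first
end

pairs = chain_pairs(:a, [:x, :y, :z])
pairs.each { |task, prereq| puts "task :#{task} => :#{prereq}" }
```

Each printed line has the same shape as the three explicit declarations, so the chain itself, not the position in an array, carries the ordering.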

The problem of insufficient dependencies plagues Makefiles as well.
Package maintainers affectionately call it "not j-safe."

=== MultiTask

The use of +multitask+ is deprecated. Tasks which may properly be run
in parallel will be run in parallel; those which cannot, will not. It
is not the user's job to decide.

Drake's +multitask+ is an alias of +task+.

=== Task#invoke inside Task#invoke

Parallelizing code means surrendering control over the
micro-management of its execution. Manually invoking tasks inside
other tasks is rather contrary to this notion, throwing a monkey
wrench into the system. An exception will be raised when this is
attempted in non-single-threaded mode.

== Links

* Download: http://rubyforge.org/frs/?group_id=6530
* Rubyforge home: http://rubyforge.org/projects/drake
* Repository: http://github.com/quix/rake

== Author

* James M. Lawrence <quixoticsycophant@gmail.com>

My first reaction was, "So how is this different than Rake again? Rake has
multitask..."

···

On Tuesday 09 September 2008 01:58:40 quixoticsycophant@gmail.com wrote:

=== MultiTask

The use of +multitask+ is deprecated. Tasks which may properly be run
in parallel will be run in parallel; those which cannot, will not. It
is not the user's job to decide.

Drake's +multitask+ is an alias of +task+.

Aha.

So what do you do with things which aren't thread-safe? Or "j-safe"?

And how is this different than running Rake with 'task' set to an alias
of 'multitask'?

I forgot to mention that there is a good reason for the gem-only
release. Despite outward appearances, Drake is internally the same as
Rake, down to using the same file names and top-level module named
'Rake'. This is to make a mainline merge easier, if Jim decides to do
so. (The fork stems from the latest Rake repository.)

Since Rubygems installs each gem in a separate directory, it is safe to
have Rake and Drake installed at the same time. However, if you bypass
gems by executing drake's install.rb, your rake will be the parallelized
one.

I also forgot to thank Jim, who transitioned to github in order to
help me do this.

Thanks--
J

Does Drake properly clean up its children if it is aborted with SIGINT? ISTR
multitask in rake leaving orphans running.

···

--
Jos Backus
jos at catnook.com

Hmmm.... this is not backward compatible. Things could go very badly
if I tried -j3 on my "badly" written Rakefiles.

May I make a suggestion? Have

    task :a => [:x, :y, :z]

translate into the task :a => :z => :y => :x thing. And then

    task :a => [[:x, :y, :z]]

run in parallel.

That way all old scripts work fine, and as we get smart and make our
tasks j-safe we can add the extra "j-array".

And besides it sort of looks like parallel marks || x || :wink:

Other than this one thing, I say very nice work.

T.

···

On Sep 9, 2:58 am, quixoticsycoph...@gmail.com wrote:

In a given Rakefile, it is possible (even likely) that the dependency
tree has not been properly defined. Consider

task :a => [:x, :y, :z]

With single-threaded Rake, _x_,_y_,_z_ will be invoked *in that order*
before _a_ is invoked. However with drake --threads=N (for N > 1),
one should not expect any particular order of execution. Since there
is no dependency specified between _x_,_y_,_z_ above, Drake is free to
run them in any order.

If you wish _x_,_y_,_z_ to be invoked sequentially, then write

task :a => seq[:x, :y, :z]

This is shorthand for

task :a => :z
task :z => :y
task :y => :x

Upon invoking _a_, the above rules say: "Can't do _a_ until _z_ is
complete; can't do _z_ until _y_ is complete; can't do _y_ until _x_
is complete; therefore do _x_." In this fashion the sequence
_x_,_y_,_z_ is enforced.

The problem of insufficient dependencies plagues Makefiles as well.
Package maintainers affectionately call it "not j-safe."

== Installation

  % gem install drake

% gem install drake
Successfully installed drake-0.8.1.11.0.1

% which drake
drake not found

···

--
Posted via http://www.ruby-forum.com/.

James M. Lawrence wrote:

In a given Rakefile, it is possible (even likely) that the dependency
tree has not been properly defined. Consider

   task :a => [:x, :y, :z]

With single-threaded Rake, _x_,_y_,_z_ will be invoked *in that order*
before _a_ is invoked.

Just to clarify: In standard rake x, y and z will be invoked by task a
in that order. However, that doesn't provide any guarantees that they
will be executed in that order.

For example, consider the following additional dependencies:

task :x => :z

Then the code for z will be executed before task x.

The moral of the story is that depending upon ordering of dependencies
to determine the ordering of execution is a bug in standard rake too.
(It's just more likely that Drake will make this kind of bug
manifest.)
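
To make the point concrete, here is a toy model in plain Ruby. It is not Rake's actual implementation; it only assumes that each task runs at most once and that a task's body runs after its prerequisites have been invoked in the listed order. With the extra `task :x => :z` dependency, z executes first even though it is listed last.

```ruby
# Toy model of single-threaded invocation; not Rake's real code.
PREREQS = {
  a: [:x, :y, :z],
  x: [:z],          # the additional dependency from the example
  y: [],
  z: []
}.freeze

def invoke(name, done, order)
  return if done[name]                 # each task runs at most once
  done[name] = true
  PREREQS.fetch(name).each { |pre| invoke(pre, done, order) }
  order << name                        # body executes after its prerequisites
end

order = []
invoke(:a, {}, order)
p order   # => [:z, :x, :y, :a]
```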

BTW, good job James.

···

--
Posted via http://www.ruby-forum.com/.

I have restored the original 'multitask' for single-threaded mode
only. Now Drake and Rake should have functionally identical code paths
for single-threaded mode (the default behavior); my previous assertion
to that effect wasn't quite true.

I have also taken Thomas Sawyer's suggestion for a randomizing option
(credited in the ChangeLog).

New sections of the README:

=== Migrating to -j

First of all, do you want to bother with -j? If you are satisfied
with your build time, then there is really no reason to use it.

If on the other hand your build takes twenty minutes to complete, you
may be interested in investing some time getting the full dependency
tree correct in order to take advantage of multiple CPUs or cores.

Though there is no way for Drake to fathom what *you* mean by a
correct dependency, there is a tool available which helps you get
closer to saying what you mean.

  % drake --rand[=SEED]

This will randomize the order of sibling prerequisites for each task.
When given the optional SEED integer, it will call srand(SEED) to
produce the same permutation each time. The randomize option also
disables +multitask+.

Though this option may produce an error due to an unspecified
dependency, at least it will be an error which is exactly the same on
each run (with SEED). In addition, you'll have the major debugging
advantage of using a single thread.

This option will also work in multi-threaded mode. After all, once
-jN is running smoothly there is *still* no guarantee that you have it
right. However with each successful execution of drake -jN --rand,
the probability of correctness approaches 1 (though asymptotically
so).

(The only way to prove correctness is to test all such permutations,
which for any non-trivial project would be prohibitively large,
especially those which meaningfully benefit from -j.)
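
The idea of a reproducible shuffle can be sketched in plain Ruby. This is only an illustration: the helper +randomized_prereqs+ is hypothetical, and it uses a local +Random+ instance rather than the global srand(SEED) call described above, so that the example does not disturb other random state.

```ruby
# Hypothetical helper, not Drake's implementation: shuffle sibling
# prerequisites, reproducibly when a seed is given.
def randomized_prereqs(prereqs, seed = nil)
  rng = seed ? Random.new(seed) : Random.new
  prereqs.shuffle(random: rng)
end

first  = randomized_prereqs([:x, :y, :z], 42)
second = randomized_prereqs([:x, :y, :z], 42)

puts first == second   # same seed, same permutation
```

A failure found under a given seed can then be replayed with the same seed until the missing dependency is identified.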

=== MultiTask

When more than one thread is given, +multitask+ behaves just like
+task+. Those tasks which may properly be run in parallel will be run
in parallel; those which cannot, will not. It is not the user's job
to decide. In other words, for -jN (N > 1), +multitask+ is an alias
of +task+.

For -j1 (default), +multitask+ behaves as the original.

···

--
Posted via http://www.ruby-forum.com/.

So what do you do with things which aren't thread-safe? Or "j-safe"?

The same thing you do with a Makefile that isn't j-safe: (1) write the
dependencies correctly, which makes it j-safe, or (2) don't run it
with -j.

And how is this different than running Rake with 'task' set to an alias
of 'multitask'?

If 'task' became 'multitask', Rake would run all your tasks at once --
all at the same time. That's probably not what you want :slight_smile:

Thinking in terms of parallel execution has a little learning curve.
It's certainly not a natural transition coming from single-threaded
thinking.

Incidentally there is a good litmus test for determining whether you
get the gist of parallelism: once it becomes obvious that 'multitask'
is a mistake, then you probably get it. The dependency graph tells us
what can be run in parallel and what can't. It's a math problem.
'multitask' stomps it all to pieces, having the power to declare 2 + 5
= 8 if it so chooses.

My advice for Rakefile writers is to incrementally move toward -j
correctness. Start with the bottom tasks first (those executed last)
and work your way up, testing each new task subtree.

Regards,
J

···

On Sep 9, 4:52 am, David Masover <ni...@slaphack.com> wrote:

Anton Ivanov wrote:

== Installation

  % gem install drake

% gem install drake
Successfully installed drake-0.8.1.11.0.1

% which drake
drake not found

% sudo chmod +x /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake
% /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake -j2
/var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake: invalid option -- j
% /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake --threads 2
/var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake: unrecognized option
`--threads'

···

--
Posted via http://www.ruby-forum.com/.

There is a mathematical reality we cannot avoid, from which
special-case syntax and backwards-compatibility acrobatics cannot save
us. The problem is in our thinking. We didn't specify what depends
on what. We thought we did, but it turns out we were fooling
ourselves all along.

The dependency graph is at root a clear and simple concept. Defining
node relations is also clear and simple:

  task :a => :b

Child nodes are on the left, pointing to parent nodes on the right.
Task a depends on task b. Clear. Simple. Now, are we certain that
we want to create a superstructure around this concept, and if so,
what exactly will it be?

Trans suggested that this

    task :a => [:x, :y, :z]

should be translated into this

    task :a => :z
    task :z => :y
    task :y => :x

while this

    task :a => [[:x, :y, :z]]

is translated into this

    task :a => :x
    task :a => :y
    task :a => :z

OK, but there are a million ways in which a programmer can
insufficiently define dependencies. This will not come close to
saving us. Let's take just one scenario from this example.

Nodes x,y,z are added *in that order* to the parent a (excuse me for
speaking in graph terms). We can still be accidentally depending on
that. The problem in our thinking is not solved.

Do we want

    task :a => :x
    task :a => :y
    task :a => :z

to be equivalent to

    task :a => :z
    task :a => :x
    task :a => :y

Yes or no? If a programmer wants them to mean different things, how
shall we accommodate him?

Again I am taken back to the cold, hard, mathematical reality of the
graph. We must think carefully about whether it actually represents
the necessary relations. Nothing can do this for us, not tricks of
grammar or syntax, which may even obscure the relations.

There is already a historical precedent with Makefiles. A new syntax
could have been added to Makefiles, but none was. The Makefiles had
bugs, but instead of timidly skirting around the problems while
praising the gods of backwards compatibility, people faced them
head-on, solving them one at a time. It is my hope that rubyists
will do the same.

While I endeavor to keep an open mind, I am not entirely convinced
that everyone clearly sees what the problem is (I am speaking of the
various conversations I've had about parallelism, not here
specifically). The problem is not obvious at all. It requires some
reflection. I understand the value of backwards compatibility, which
is why drake==rake for -j1. In this circumstance, however, it appears
to me that the answer is forced upon us by the sheer mathematics of
the situation.

Can you construct a test case? The process table is clean after I hit
ctrl-c during drake -j100 on this

100.times { |i|
  name = i.to_s.to_sym
  task name do
    fork {
      loop { }
    }
  end
  task :default => name
}

···

On Sep 9, 11:25 am, Jos Backus <j...@catnook.com> wrote:

Does Drake properly clean up its children if it is aborted with SIGINT? ISTR
multitask in rake leaving orphans running.

I thought someone else might notice it and elaborate but there is
another potential benefit of this notation. Eg.

     task :a => [[:x, :y, :z], [:m, :n], :r]

Where :x, :y, :z can be run in parallel, as can :m and :n, but the
groups must run one before the other.

T.

···

On Sep 9, 1:54 pm, Trans <transf...@gmail.com> wrote:


Hmmm.... this is not backward compatible. Things could go very badly
if I tried -j3 on my "badly" written Rakefiles.

May I make a suggestion? Have

task :a => [:x, :y, :z]

translate into the task :a => :z => :y => :x thing. And then

task :a => [[:x, :y, :z]]

run in parallel.

That way all old scripts work fine, and as we get smart and make our
tasks j-safe we can add the extra "j-array".

Ah, so by design you consider it a bug. You could have fixed that from
day one by randomizing the order of the prerequisites. Now you have a
situation where many Rakefiles depend on that bug. So, why not turn
lemons into lemonade, and make this bug a feature?

T.

···

On Sep 10, 7:59 am, Jim Weirich <j...@weirichhouse.org> wrote:

James M. Lawrence wrote:
> In a given Rakefile, it is possible (even likely) that the dependency
> tree has not been properly defined. Consider

> task :a => [:x, :y, :z]

> With single-threaded Rake, _x_,_y_,_z_ will be invoked *in that order*
> before _a_ is invoked.

Just to clarify: In standard rake x, y and z will be invoked by task a
in that order. However, that doesn't provide any guarantees that they
will be executed in that order.

For example, consider the following additional dependencies:

task :x => :z

Then the code for z will be executed before task x.

The moral of the story is that depending upon ordering of dependencies
to determine the ordering of execution is a bug in standard rake too.
(It's just more likely that Drake will make this kind of bug
manifest.)

>
> So what do you do with things which aren't thread-safe? Or "j-safe"?
>

The same thing you do with a Makefile that isn't j-safe: (1) write the
dependencies correctly, which makes it j-safe, or (2) don't run it
with -j.

Still going to be a fair number of cases of (3), I imagine: use locks to
synchronize non-thread-safe libraries, for which there's still a benefit to
running those tasks in parallel.
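
A minimal sketch of option (3), in plain Ruby without Rake: the independent work runs in parallel threads, and only the call into the shared, non-thread-safe resource is guarded by a lock. The names +LIBRARY_LOCK+ and +results+ are illustrative only.

```ruby
# Illustrative only: serialize access to a shared, non-thread-safe
# resource while letting the rest of each task run in parallel.
LIBRARY_LOCK = Mutex.new
results = []          # stands in for the non-thread-safe library

threads = 4.times.map do |i|
  Thread.new do
    value = i * i                                  # independent work, no lock needed
    LIBRARY_LOCK.synchronize { results << value }  # guarded section
  end
end
threads.each(&:join)

p results.sort   # => [0, 1, 4, 9]
```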

If 'task' became 'multitask', Rake would run all your tasks at once --
all at the same time. That's probably not what you want :slight_smile:

Actually, no, I assumed that 'multitask' only ran that specific task in
parallel.

Actually, I hadn't thought about it thoroughly enough to realize that this
wasn't what was happening:

The dependency graph tells us
what can be run in parallel and what can't.

I understand make -j, and I think I understand the difference with
multitask -- if I understand it:

multitask :foo ...
multitask :bar ...

In the above example, will everything really run concurrently? I'd assumed
that foo would run concurrently, and then bar would run concurrently.

In either case, I see what Drake is doing (real make -j behavior). Thanks for
explaining this -- it looks cool!

One more thing: I'm not sure what the best way to do this is, but I think it
would still be useful to have the task/multitask dichotomy, for legacy
programs. Multitasks would operate as properly parallelized Drake tasks. Plain
old tasks would run in complete isolation, with the exception that if they
invoke a multitask, that multitask (and all its remaining dependencies) run
in j-parallelized mode.

That would certainly break the purity of it, and it would be a bit more work,
but I think it could be made to work. The benefit is, you could translate an
existing project iteratively, without having to verify that the whole thing
is correct, first.

···

On Tuesday 09 September 2008 05:03:43 quixoticsycophant@gmail.com wrote:

On Sep 9, 4:52 am, David Masover <ni...@slaphack.com> wrote:

Yes! If you want it differently, you write in the ordering explicitly

task :a => :z
task :z => :y
task :y => :x

You have no reason to expect one operation to come before another if
there is not an explicit dependency chain between them

martin

···

On Tue, Sep 9, 2008 at 3:38 PM, . <quixoticsycophant@gmail.com> wrote:

Do we want

   task :a => :x
   task :a => :y
   task :a => :z

to be equivalent to

   task :a => :z
   task :a => :x
   task :a => :y

Yes or no? If a programmer wants them to mean different things, how
shall we accommodate him?

> Does Drake properly clean up its children if it is aborted with SIGINT? ISTR
> multitask in rake leaving orphans running.

I'm misremembering. SIGINT seems to work okay, it's SIGTERM that leaves
orphaned children (with ppid 1) around with rake, presumably because it
doesn't catch that signal. Same with drake (0.8.1.11.0.1)

Can you construct a test case? The process table is clean after I hit
ctrl-c during drake -j100 on this

100.times { |i|
  name = i.to_s.to_sym
  task name do
    fork {
      loop { }
    }
  end
  task :default => name
}

Thanks, I tried `drake &' followed by `kill $!' (using bash) and the forked
drake children revert from ppid $! to ppid 1. A `killall drake' is required to
clean up.
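
For what it's worth, a parent process can forward SIGTERM to the children it forked. The sketch below is plain Ruby and is only an assumption about how one might do this by hand on a POSIX system; it is not something rake or drake 0.8.1.11.0.1 is known to do.

```ruby
# Sketch: forward TERM to forked children so they are not orphaned.
children = 2.times.map { fork { sleep } }   # children sleep until signaled

terminated = false
Signal.trap("TERM") do
  children.each do |pid|
    begin
      Process.kill("TERM", pid)             # pass the signal on
    rescue Errno::ESRCH
      # child already gone; nothing to do
    end
  end
  terminated = true
end

Process.kill("TERM", Process.pid)           # simulate `kill $!` from the shell
sleep 0.1 until terminated

statuses = children.map { |pid| Process.wait2(pid).last }
puts statuses.all?(&:signaled?)             # children were ended by the signal
```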

···

On Wed, Sep 10, 2008 at 08:38:35AM +0900, . wrote:

On Sep 9, 11:25 am, Jos Backus <j...@catnook.com> wrote:

--
Jos Backus
jos at catnook.com

It must be pulling the old rake libs. In all my tests the gem has won
the $LOAD_PATH order. It is easy to check,

  task :default do
    puts $LOAD_PATH
  end

% rake
/usr/local/stow/ruby-1.8.7-p72/lib/ruby/gems/1.8/gems/rake-0.8.1/bin
/usr/local/stow/ruby-1.8.7-p72/lib/ruby/gems/1.8/gems/rake-0.8.1/lib
[...]

% drake
/usr/local/stow/ruby-1.8.7-p72/lib/ruby/gems/1.8/gems/
drake-0.8.1.11.0.1/bin
/usr/local/stow/ruby-1.8.7-p72/lib/ruby/gems/1.8/gems/
drake-0.8.1.11.0.1/lib
[...]

What does yours say? Maybe your RUBYLIB environment var is pointing
at a lib directory containing the old rake.rb?

···

On Sep 9, 2:44 pm, Anton Ivanov <foral...@gmail.com> wrote:

Anton Ivanov wrote:

>> == Installation

>> % gem install drake

> % gem install drake
> Successfully installed drake-0.8.1.11.0.1

> % which drake
> drake not found

% sudo chmod +x /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake
% /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake -j2
/var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake: invalid option -- j
% /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake --threads 2
/var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake: unrecognized option
`--threads'
--
Posted via http://www.ruby-forum.com/.

unknown wrote:

If 'task' became 'multitask', Rake would run all your tasks at once --
all at the same time. That's probably not what you want :slight_smile:

[...] It's a math problem.
'multitask' stomps it all to pieces, having the power to declare 2 + 5
= 8 if it so chooses.

I'm not quite sure what you are saying here, but if you are trying to
imply that multitask does not honor dependencies in ordering, you are
incorrect. If there is a dependency declared, then a task won't run until
all of its dependencies have finished.
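
A toy model of that behavior, in plain Ruby threads rather than Rake itself: the prerequisites run concurrently, and the task body runs only after every prerequisite thread has been joined. The helper +run_multitask+ is hypothetical.

```ruby
# Toy model, not Rake's code: prerequisites run in parallel threads,
# then the task body runs once all of them have finished.
def run_multitask(name, prereqs, log, lock)
  prereqs.map { |pre| Thread.new { lock.synchronize { log << pre } } }
         .each(&:join)                 # wait for every prerequisite
  log << name                          # body runs last
end

log = []
run_multitask(:a, [:x, :y, :z], log, Mutex.new)
p log.last   # => :a
```

The relative order of x, y and z in the log may vary from run to run; only the position of a is fixed.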

That being said, there is a known bug in multitask where failures in
dependencies are not properly transmitted to all dependent tasks. But I
don't think you were referring to that.

-- Jim Weirich

···

--
Posted via http://www.ruby-forum.com/.

Thomas Sawyer wrote:

···

On Sep 9, 1:54 pm, Trans <transf...@gmail.com> wrote:

> before _a_ is invoked. However with drake --threads=N (for N > 1),
>    task :a => :z

run in parallel.

That way all old scripts work fine, and as we get smart and make our
tasks j-safe we can add the extra "j-array".

I thought someone else might notice it and elaborate but there is
another potential benefit of this notation. Eg.

     task :a => [[:x, :y, :z], [:m, :n], :r]

Where :x, :y, :z can be run in parallel, as can :m and :n, but the
groups must run one before the other.

As stated before, assuming execution order among dependencies is a bug
even in standard rake. If groups of tasks need to be ordered in time,
declare a dependency. Anything else is just wrong.

-- Jim Weirich
--
Posted via http://www.ruby-forum.com/.