Rake dependencies unknown prior to running tasks

Say I don't know what all the dependencies are until I've already begun executing tasks? To what extent can I add new tasks and dependencies on the fly? At first I suspected that adding tasks during task execution wasn't safe, and a few toy examples I made seemed to confirm this. So I started thinking that I would need to have Rake call Rake, which seemed a bit clumsy. But then I read this discussion mentioned by Jim Weirich, which seemed to imply that a single Rakefile ought to be able to do the job (http://markmail.org/message/jttfqf6wstvgahec#query:+page:1+mid:zlc5qjj5r6abfcse+state:results). However, I couldn't find any details on how to accomplish this. Are there better ways of handling this problem that don't involve multiple Rakefiles and Rake calling Rake?

Joe Wölfel wrote:

Say I don't know what all the dependencies are until I've already
begun executing tasks? To what extent can I add new tasks and
dependencies on the fly?

This is what 'import' is for,

  import 'moretasks.rb'

moretasks.rb is run after the Rakefile has been loaded but before any
tasks are invoked.

Actually I see no reason why it has to be a file. It looks like
'import' should take an optional block.
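The following is a minimal sketch of what `import` does. It is driven from a plain Ruby script so it runs outside a Rakefile. The generated file and its `:generated` task are hypothetical stand-ins for whatever an earlier step produces. `Rake.application.add_import` is the call behind the DSL's `import`. `Rake.application.load_imports` is the step rake itself runs once the Rakefile has been loaded.

```ruby
require 'rake'
require 'tmpdir'

Rake::Task.clear
IMPORT_LOG = []

Dir.mktmpdir do |dir|
  more = File.join(dir, 'moretasks.rb')
  # Hypothetical generator output: in a real build an earlier program would
  # write this file; here it is simply written out by hand.
  File.write(more, <<~RUBY)
    task :generated do
      IMPORT_LOG << :generated
    end
    task :default => :generated
  RUBY

  # Equivalent of `import 'moretasks.rb'` in a Rakefile: queue the file...
  Rake.application.add_import(more)
  # ...then load it, as rake does after reading the Rakefile itself.
  Rake.application.load_imports
end

# Only now, with the imported tasks defined, is anything invoked.
Rake::Task[:default].invoke
p IMPORT_LOG
```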

--
Posted via http://www.ruby-forum.com/.

Mike Gold wrote:

Joe Wölfel wrote:

Say I don't know what all the dependencies are until I've already
begun executing tasks? To what extent can I add new tasks and
dependencies on the fly?

This is what 'import' is for

Sorry I just realized you meant that you actually create new tasks after
the task invocations have begun.

In this case, are you certain those things creating tasks should be
tasks? It seems like you should have normal ruby classes/methods which
determine which tasks to create, then create them. That is what I do.

I think this strategy covers all cases, even though you may need to
restructure your code. But in the end it's a cleaner approach, IMO.
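The sketch below illustrates this approach: ordinary Ruby decides which tasks exist, and Rake only executes them. The planner `plan_targets` and the `.src`/`.out` naming are hypothetical. Plain tasks stand in for file tasks so the sketch runs without touching the disk.

```ruby
require 'rake'

Rake::Task.clear
BUILT = []

# Hypothetical planner: plain Ruby, not a task, decides what must be built.
# Here it maps each "*.src" input to a "*.out" target.
def plan_targets(inputs)
  inputs.grep(/\.src\z/).map { |name| name.sub(/\.src\z/, '.out') }
end

targets = plan_targets(%w[a.src b.src notes.txt])

# Create one task per planned target, plus a task depending on all of them.
targets.each do |target|
  Rake::Task.define_task(target) { BUILT << target }
end
Rake::Task.define_task(all: targets)

# The whole graph exists before the first invoke.
Rake::Task[:all].invoke
p BUILT
```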

Cleaner, maybe. But inefficient in my case. That would mean a lot of unnecessary rebuilding. Unfortunately, efficiency matters in this case. It can take days or weeks even with parallel builds. And it needs to be done often.

It seems like the wrong way to do it, but the only efficient solution I've come up with so far is to have Rake call itself with a different task. So basically I have dependency graph 1, which is known at the outset, and dependency graph 2, which is only known after running tasks in dependency graph 1, and dependency graph 2 is itself dependent on dependency graph 1.

It seems like a common problem. I've run into a number of build systems that needed to be restarted several times to get around similar issues. But if there's a better solution already out there I'd like to use it.

On 24 sept. 08, at 12:02, Mike Gold wrote:

Mike Gold wrote:

Joe Wölfel wrote:

Say I don't know what all the dependencies are until I've already
begun executing tasks? To what extent can I add new tasks and
dependencies on the fly?

This is what 'import' is for

Sorry I just realized you meant that you actually create new tasks after
the task invocations have begun.

In this case, are you certain those things creating tasks should be
tasks? It seems like you should have normal ruby classes/methods which
determine which tasks to create, then create them. That is what I do.

I think this strategy covers all cases, even though you may need to
restructure your code. But in the end it's a cleaner approach, IMO.

Joe Wölfel wrote:

Cleaner, maybe. But inefficient in my case. That would mean a lot
of unnecessary rebuilding. Unfortunately, efficiency matters in
this case. It can take days or weeks even with parallel builds. And
it needs to be done often.

It seems like the wrong way to do it, but the only efficient solution
I've come up with so far is to have Rake call itself with a different
task. So basically I have dependency graph 1, which is known at the
outset, and dependency graph 2, which is only known after running tasks
in dependency graph 1, and dependency graph 2 is itself dependent on
dependency graph 1.

It seems like a common problem. I've run into a number of build
systems that needed to be restarted several times to get around
similar issues. But if there's a better solution already out there
I'd like to use it.

I don't see why it would be inefficient or require unnecessary
rebuilding.

If you follow the strategy I mentioned, making your changes to the graph
before the first invoke, and avoiding tasks creating tasks (which is
forbidden anyway with the new parallel -j support in Drake), then you've
removed the dependency between graph 1 and graph 2 you describe.

By removing that dependency, it becomes *more* efficient because more
tasks can be parallelized, whereas before graph 1 and graph 2 had to be
executed sequentially (this may not be significant in your case, but is
very much so in other cases).

Any build system in which the only entry point is a task -- that is, you
must make a graph in order to make a graph -- would have to be re-run
to compensate for its lack of dynamic support. Makefiles, for example.
That is why Rake is different -- you have the whole ruby language to
define your tasks, and then you say "go". This two-step approach is the
solution you seek.
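This sketch illustrates the parallelism point. When both graphs are known before the first invoke, nothing forces them into two sequential phases. It uses Rake's built-in `multitask` (via `Rake::MultiTask.define_task`), which runs prerequisites on Rake's thread pool, in the spirit of Drake's `-j`. The task names are hypothetical.

```ruby
require 'rake'

Rake::Task.clear
FINISHED = Queue.new # thread-safe, since prerequisites may run concurrently

# Hypothetical tasks from "graph 1" and "graph 2", all defined up front.
names = %w[graph1_a graph1_b graph2_a graph2_b]
names.each do |name|
  Rake::Task.define_task(name) { FINISHED << name }
end

# A MultiTask invokes its prerequisites concurrently, so independent work
# from both graphs can overlap instead of running in two phases.
Rake::MultiTask.define_task(everything: names)
Rake::Task[:everything].invoke

# Completion order is not deterministic, so collect and sort.
done = []
done << FINISHED.pop until FINISHED.empty?
p done.sort
```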

I don't see why it would be inefficient or require unnecessary
rebuilding.

The reason is that I have to build things before I know (or can even determine programmatically) what other things need to be built.

Joe Wölfel wrote:

I don't see why it would be inefficient or require unnecessary
rebuilding.

The reason is that I have to build things before I know (or can
even determine programmatically) what other things need to be built.

If you can't determine programmatically what is built, then how does a
program build it?

Even C/C++ dependencies, where you have no clue what g++ -MM is going to
spit out, can be handled with 'import' and the makefile loader.

If you are executing some other program which generates stuff, perhaps
you can add a flag where the program outputs what it *would* generate.
Capture that and 'import' it.
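The following is a small sketch of the makefile loader mentioned above. The dependency text is hand-written as a stand-in for real `g++ -MM` output, and the file names are hypothetical. `Rake::MakefileLoader#load` turns each `target: deps` rule into a `file` task. Rake's documentation pairs `require 'rake/loaders/makefile'` with `import` of a `.mf` file so this happens automatically.

```ruby
require 'rake'
require 'rake/loaders/makefile'
require 'tmpdir'

Rake::Task.clear

Dir.mktmpdir do |dir|
  deps = File.join(dir, 'depends.mf')
  # Hand-written stand-in for `g++ -MM` output, including a continuation line.
  File.write(deps, <<~MF)
    main.o: main.cc util.h \\
      config.h
    util.o: util.cc util.h
  MF

  # Each rule becomes `file target => prerequisites`.
  Rake::MakefileLoader.new.load(deps)
end

p Rake::Task['main.o'].prerequisites
p Rake::Task['util.o'].prerequisites
```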

And if you can't add that flag, or if you otherwise don't know what is
being generated, then your hands are tied anyway. You can't know what's
going to happen, so you can't do anything about it. The two graphs are
worlds apart, and never the twain shall meet. In this case I wonder
what solution you could have expected.

I didn't say that what was being built couldn't be determined programmatically. I said it couldn't be determined until certain portions were already built. To build those initial things I need a build tool, such as Rake. If the suggestion is that I shouldn't actually execute any Rake tasks until after I've determined all possible tasks, then the catch-22 you're talking about actually occurs. The only practical solution I've come up with so far is to have Rake build the initial targets and then call itself again to determine the rest of the dependency graph and build the remaining targets. If there were a way to augment the initial dependency graph dynamically, this wouldn't be necessary. I just don't happen to know of one.

On 24 sept. 08, at 14:37, Mike Gold wrote:

Joe Wölfel wrote:

I don't see why it would be inefficient or require unnecessary
rebuilding.

The reason is that I have to build things before I know (or can
even determine programmatically) what other things need to be built.

If you can't determine programmatically what is built, then how does a
program build it?

Even C/C++ dependencies, where you have no clue what g++ -MM is going to
spit out, can be handled with 'import' and the makefile loader.

If you are executing some other program which generates stuff, perhaps
you can add a flag where the program outputs what it *would* generate.
Capture that and 'import' it.

And if you can't add that flag, or if you otherwise don't know what is
being generated, then your hands are tied anyway. You can't know what's
going to happen, so you can't do anything about it. The two graphs are
worlds apart, and never the twain shall meet. In this case I wonder
what solution you could have expected.

Joe Wölfel wrote:

I didn't say that what was being built couldn't be determined
programmatically. I said it couldn't be determined until certain
portions were already built. To build those initial things I
need a build tool, such as Rake. If the suggestion is that I
shouldn't actually execute any Rake tasks until after I've determined
all possible tasks, then the catch-22 you're talking about actually
occurs. The only practical solution I've come up with so far is to
have Rake build the initial targets and then call itself again to
determine the rest of the dependency graph and build the remaining
targets. If there were a way to augment the initial dependency
graph dynamically, this wouldn't be necessary. I just don't
happen to know of one.

If you really cannot know what is going to be built, for example if a
program generates files whose names are taken from /dev/random and then
other tasks depend on those files, then you are in a pickle. Normally
this kind of thing is handled by 'import', but this assumes tasks can be
determined (for example, by examining the makedepend output).

What do you think of this:

  task :setup_a do
    puts "setup_a"
  end

  task :setup_b do
    puts "setup_b"
  end

  task :setup => [:setup_a, :setup_b] do
    puts "setup phase complete. defining new tasks..."

    task :main_a do
      puts "main_a"
    end

    task :main_b do
      puts "main_b"
    end

    puts "restarting..."
    throw :restart
  end

  task :main => [:main_a, :main_b] do
    puts "main phase complete."
  end
  task :main_a => :setup
  task :main_b => :setup

  task :default => :main do
    puts "all done."
  end

% rake -f test/Rakefile.restart-flag
(in /Users/jlawrence/work/rake)
setup_a
setup_b
setup phase complete. defining new tasks...
restarting...
main_a
main_b
main phase complete.
all done.

I may be inflicting hardship on myself since this would complicate drake
(http://drake.rubyforge.org), but anyway... This patch is for regular
rake; the git branch is the same thing.

  % git clone git://github.com/quix/rake.git
  % cd rake
  % git checkout -b restart-flag origin/restart-flag

diff --git a/lib/rake.rb b/lib/rake.rb
index 7c84f57..3010261 100755
--- a/lib/rake.rb
+++ b/lib/rake.rb
@@ -560,8 +560,15 @@ module Rake

     # Invoke the task if it is needed. Prerequites are invoked first.
     def invoke(*args)
-      task_args = TaskArguments.new(arg_names, args)
-      invoke_with_call_chain(task_args, InvocationChain::EMPTY)
+      catch(:done) {
+        loop {
+          catch(:restart) {
+            task_args = TaskArguments.new(arg_names, args)
+            invoke_with_call_chain(task_args, InvocationChain::EMPTY)
+            throw :done
+          }
+        }
+      }
     end

     # Same as invoke, but explicitly pass a call chain to detect
@@ -573,8 +580,8 @@ module Rake
           puts "** Invoke #{name} #{format_trace_flags}"
         end
         return if @already_invoked
-        @already_invoked = true
         invoke_prerequisites(task_args, new_chain)
+        @already_invoked = true
         execute(task_args) if needed?
       end
     end
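To see why the patch moves `@already_invoked = true` below `invoke_prerequisites`, here is a toy model in plain Ruby. `ToyTask` and its `REGISTRY` are hypothetical stand-ins, not Rake classes. Only the two pieces of the patch are modeled: the mark-after-prerequisites ordering and the catch/throw restart loop in `invoke`. The tasks mirror the restart example above.

```ruby
# Toy model of the patched Task#invoke (hypothetical; not Rake's classes).
class ToyTask
  REGISTRY = {}

  attr_reader :prereqs, :actions

  # Define or enhance a task, like `task name => prereqs do ... end`.
  def self.define(name, prereqs = [], &action)
    task = (REGISTRY[name] ||= new)
    task.prereqs.concat(prereqs)
    task.actions << action if action
    task
  end

  def initialize
    @prereqs = []
    @actions = []
    @already_invoked = false
  end

  # Patched ordering: mark this task only after its prerequisites return,
  # so a :restart thrown beneath it leaves it eligible for the next pass.
  def invoke_with_call_chain
    return if @already_invoked
    @prereqs.dup.each { |name| REGISTRY.fetch(name).invoke_with_call_chain }
    @already_invoked = true
    @actions.each(&:call)
  end

  # Patched invoke: start over from this task whenever :restart is thrown.
  def invoke
    catch(:done) do
      loop do
        catch(:restart) do
          invoke_with_call_chain
          throw :done
        end
      end
    end
  end
end

TOY_LOG = []
ToyTask.define(:setup_a) { TOY_LOG << 'setup_a' }
ToyTask.define(:setup_b) { TOY_LOG << 'setup_b' }
ToyTask.define(:setup, [:setup_a, :setup_b]) do
  TOY_LOG << 'setup'
  ToyTask.define(:main_a) { TOY_LOG << 'main_a' } # actions added mid-run
  ToyTask.define(:main_b) { TOY_LOG << 'main_b' }
  throw :restart
end
ToyTask.define(:main, [:main_a, :main_b]) { TOY_LOG << 'main' }
ToyTask.define(:main_a, [:setup])
ToyTask.define(:main_b, [:setup])
ToyTask.define(:default, [:main]) { TOY_LOG << 'default' }

ToyTask::REGISTRY.fetch(:default).invoke
p TOY_LOG # prints ["setup_a", "setup_b", "setup", "main_a", "main_b", "main", "default"]
```

With the original ordering (marking before the prerequisite loop), `:default` is already marked when `:setup` throws. The second pass therefore returns immediately, and nothing after `:setup` runs.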

If only the Internet came with an Undo button...

Since in the previous example Rake complains unless main_a and main_b
are defined, it sort of defeats the whole purpose. This works:

  task :setup_a do
    puts "setup_a"
  end

  task :setup_b do
    puts "setup_b"
  end

  task :setup => [:setup_a, :setup_b] do
    puts "setup phase complete. defining new tasks..."

    task :main_a do
      puts "main_a"
    end

    task :main_b do
      puts "main_b"
    end

    puts "restarting..."
    throw :restart
  end

  task :main => [:main_a, :main_b] do
    puts "main phase complete."
  end

  task :default => [:setup, :main] do
    puts "all done."
  end

However this defeats Drake, which I suppose is another matter.

Thanks for the patch. Here's a clunkier variation on your suggestion that seems to work with Drake. Stage 1 serializes an unpredictable set of tasks. Stage 2 creates instances of them and runs them if necessary. There might be a better way that involves making the dependency tree modifiable dynamically. I think allowing all possible dependency changes would get complicated. Maybe that would require reevaluating the entire tree constantly and there's no way to un-execute a task anyway. But most of the real world problems I can think of seem to involve adding tasks that wouldn't have been exercised yet anyway. Could this be solved with an improved dependency tree walking algorithm?

require 'rake/clean'

# Stage 1 writes a random set of numbers to a file
STAGE_ONE_RESULTS = "s1.txt"
file STAGE_ONE_RESULTS do
  File.open(STAGE_ONE_RESULTS, 'w') do |f|
    Array.new(5) { rand 10 }.uniq.each do |i|
      puts "stage1 creating dependency #{i}"
      f.puts i
    end
  end
end
task :stage1 => STAGE_ONE_RESULTS

# Stage 2 creates tasks based on those random numbers. On a fresh build
# s1.txt does not exist when this Rakefile is first loaded, which is why
# :all starts a second drake process to run :stage2.
task :stage2 => :stage1
if File.exist? STAGE_ONE_RESULTS
  # chomp: true strips the trailing newline so task names are "3", not "3\n"
  File.readlines(STAGE_ONE_RESULTS, chomp: true).each do |task_info|
    task task_info do
      puts "stage2 executing #{task_info}"
    end
    task :stage2 => task_info
  end
end

# sh streams the child's output and fails the task if drake fails
task :all => :stage1 do
  sh "drake -j4 stage2"
end

CLEAN.include STAGE_ONE_RESULTS
task :default => :all

On 24 sept. 08, at 20:29, James M. Lawrence wrote:

If only the Internet came with an Undo button...

Since in the previous example Rake complains unless main_a and main_b
are defined, it sort of defeats the whole purpose. This works:

  task :setup_a do
    puts "setup_a"
  end

  task :setup_b do
    puts "setup_b"
  end

  task :setup => [:setup_a, :setup_b] do
    puts "setup phase complete. defining new tasks..."

    task :main_a do
      puts "main_a"
    end

    task :main_b do
      puts "main_b"
    end

    puts "restarting..."
    throw :restart
  end

  task :main => [:main_a, :main_b] do
    puts "main phase complete."
  end

  task :default => [:setup, :main] do
    puts "all done."
  end

However this defeats Drake, which I suppose is another matter.

The following is a better implementation which could be made to work
with Drake. A patch for regular Rake follows.

  task :setup_a do
    puts "setup_a"
  end

  task :setup_b do
    puts "setup_b"
  end

  task :setup => [:setup_a, :setup_b] do
    puts "setup phase complete. defining new tasks..."

    task :main_a do
      puts "main_a"
    end

    task :main_b do
      puts "main_b"
    end

    task :main => [:main_a, :main_b] do
      puts "main phase complete."
    end

    puts "restarting..."
    throw :restart
  end

  task :main

  task :default => [:setup, :main] do
    puts "all done."
  end

diff --git a/lib/rake.rb b/lib/rake.rb
index 36c2734..1e360a6 100755
--- a/lib/rake.rb
+++ b/lib/rake.rb
@@ -573,8 +573,8 @@ module Rake
           puts "** Invoke #{name} #{format_trace_flags}"
         end
         return if @already_invoked
-        @already_invoked = true
         invoke_prerequisites(task_args, new_chain)
+        @already_invoked = true
         execute(task_args) if needed?
       end
     end
@@ -1994,7 +1994,14 @@ module Rake
         elsif options.show_prereqs
           display_prerequisites
         else
-        top_level_tasks.each { |task_name| invoke_task(task_name) }
+        catch(:done) {
+          loop {
+            catch(:restart) {
+              top_level_tasks.each { |task_name| invoke_task(task_name) }
+              throw :done
+            }
+          }
+        }
         end
       end
     end

Joe Wölfel wrote:

Thanks for the patch. Here's a clunkier variation on your
suggestion that seems to work with Drake. Stage 1 serializes an
unpredictable set of tasks. Stage 2 creates instances of them and
runs them if necessary. There might be a better way that involves
making the dependency tree modifiable dynamically. I think allowing
all possible dependency changes would get complicated. Maybe that
would require reevaluating the entire tree constantly and there's no
way to un-execute a task anyway. But most of the real world problems
I can think of seem to involve adding tasks that wouldn't have been
exercised yet anyway. Could this be solved with an improved
dependency tree walking algorithm?

I think the best strategy is to cache the dynamic changes and update
only when the :restart flag is thrown. Fortunately, Drake is already
structured to work this way.

Drake does a dry run to collect all tasks to be executed, then passes
the dependency graph to my CompTree package which executes it in
parallel (CompTree is a kind of modest Erlang-in-Ruby).

It would be safe to add tasks during execution, as CompTree is running a
shallow copy of the dependency graph and will be unaware of any new
tasks or dependencies. I don't foresee any serious issues with simply
restarting the computation with a new shallow copy of the appended
graph.

Though Drake copies the dependency tree for unrelated reasons, it turns
out to be coincidentally useful here because the copy acts as a cache
while the user appends to the original.
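Here is a minimal sketch of the caching idea. It assumes the dependency graph is held as a Hash from task name to prerequisite names. `topo_order` and `run_with_restarts` are hypothetical helpers, not Drake or CompTree APIs. Each pass runs from a copy of the graph, and tasks may append to the live graph. Throwing `:restart` starts a new pass from a fresh copy, and already-finished tasks are skipped. Unlike a purely shallow copy, the sketch also duplicates each prerequisite list, so dependencies appended to existing tasks stay invisible until the restart.

```ruby
# Depth-first ordering: prerequisites before the tasks that need them.
# (No cycle detection; the sketch assumes an acyclic graph.)
def topo_order(graph)
  order = []
  seen = {}
  visit = lambda do |name|
    next if seen[name]
    seen[name] = true
    graph.fetch(name, []).each { |dep| visit.call(dep) }
    order << name
  end
  graph.each_key { |name| visit.call(name) }
  order
end

# Run passes over snapshots of `graph`; actions receive the live graph.
# A task is recorded as done before its action runs, so the task that
# throws :restart is not executed again on the next pass.
def run_with_restarts(graph, actions)
  done = []
  loop do
    snapshot = graph.transform_values(&:dup)
    finished = catch(:restart) do
      topo_order(snapshot).each do |name|
        next if done.include?(name)
        done << name
        actions.fetch(name, ->(_live) {}).call(graph)
      end
      true
    end
    return done if finished
  end
end

graph = {
  setup_a: [], setup_b: [], setup: [:setup_a, :setup_b],
  main: [], default: [:setup, :main]
}
actions = {
  setup: lambda do |live|
    live[:main_a] = []      # a brand-new task
    live[:main] = [:main_a] # a new dependency for a task not yet run
    throw :restart
  end
}
order = run_with_restarts(graph, actions)
p order # prints [:setup_a, :setup_b, :setup, :main_a, :main, :default]
```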

Though this is mostly brainstorming, I do see a need for a restart
feature, whether or not these ideas pan out. The :setup phase may
execute non-trivial tasks which will be obliviously re-executed by :main
in the separate process. We could assume the two stages comprise
disjoint sets, but that would be difficult to enforce. It's an
artificial restriction which will eventually fail.

In the example above, :main gets executed in Rake and Drake for
entirely different reasons. In single-threaded Rake, the restart
happens before it even gets to :main, so :main did not get marked as
@already_invoked. In multi-threaded Drake, :main *does* get marked;
however, after the restart its newly-created child nodes will still be
executed because CompTree will not even *consider* a node until all its
children have been computed.

One difference: in single-threaded Rake you must be careful to add tasks
"in the future", some place ahead in the sequential order of execution.
In my example :setup modifies :main, which is fine since the order given
is [:setup, :main]. In multi-threaded Drake you don't have to worry
about it, for reasons mentioned in the previous paragraph.
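The following is a small sketch of the "in the future" rule against stock, unpatched Rake. The task names are hypothetical. `Rake::Task#enhance` adds prerequisites to an existing task. A prerequisite added to a task that has already run is ignored for the rest of the invocation, while one added to a task that has not run yet is honored.

```ruby
require 'rake'

Rake::Task.clear
log = []

Rake::Task.define_task(:early) { log << :early }
Rake::Task.define_task(:later) { log << :later }

Rake::Task.define_task(:setup) do
  log << :setup
  # :early has already run, so this prerequisite lands "in the past".
  Rake::Task.define_task(:early_extra) { log << :early_extra }
  Rake::Task[:early].enhance([:early_extra])
  # :later has not been invoked yet, so this one lands "in the future".
  Rake::Task.define_task(:later_extra) { log << :later_extra }
  Rake::Task[:later].enhance([:later_extra])
end

# Sequential order: :early, then :setup, then :later.
Rake::Task.define_task(default: [:early, :setup, :later])
Rake::Task[:default].invoke
p log # prints [:early, :setup, :later_extra, :later]
```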

James M. Lawrence

I've had a chance to play with your solution a bit. It is much better than my earlier, cruder solution that relaunched Rake. With your solution it seems all you need to do is throw :restart at the end of a task that creates new tasks. It's very simple. Everything just seems to work, and previously executed tasks aren't executed twice. In my solution they are executed twice, which is bad (or at least expensive). Also, in my solution I have to hard-code Rake parameters, and it gets especially hairy when I have more than one task that creates other tasks.

I noticed that your patch doesn't seem to work with multitask for some reason. I'm not sure why. Also, as simple as it is to throw :restart, I'm wondering if this could be done automatically, maybe with a warning for users who do it inadvertently. If a task is defined at any point while another task is executing, :restart could be thrown automatically when the executing task completes. Then Rake could automatically support dynamic task creation. Would that make sense?
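The automatic restart idea can be sketched on stock Rake without the patch. Both helpers here are hypothetical. `dynamic_task` compares `Rake::Task.tasks.size` before and after its body and throws `:restart` if the count grew. `invoke_with_restarts` plays the role of the patched top-level loop. Where the patch moves `@already_invoked`, this helper instead calls `Rake::Task#reenable` on the top-level task, so it assumes the task that defines new work is a direct prerequisite of that top-level task.

```ruby
require 'rake'

Rake::Task.clear
RESTART_LOG = []

# Hypothetical helper: after the body runs, throw :restart automatically
# if the body defined any brand-new tasks (with a warning, as suggested).
def dynamic_task(*args, &body)
  Rake::Task.define_task(*args) do |t|
    before = Rake::Task.tasks.size
    body.call
    if Rake::Task.tasks.size > before
      warn "#{t.name} defined new tasks; restarting"
      throw :restart
    end
  end
end

# Hypothetical driver: unpatched Rake marks a task invoked before its
# prerequisites run, so the top-level task is re-enabled before retrying.
def invoke_with_restarts(name)
  catch(:done) do
    loop do
      catch(:restart) do
        Rake::Task[name].invoke
        throw :done
      end
      Rake::Task[name].reenable
    end
  end
end

Rake::Task.define_task(:setup_a) { RESTART_LOG << 'setup_a' }
Rake::Task.define_task(:setup_b) { RESTART_LOG << 'setup_b' }
dynamic_task(setup: [:setup_a, :setup_b]) do
  RESTART_LOG << 'setup'
  Rake::Task.define_task(:main_a) { RESTART_LOG << 'main_a' }
  Rake::Task.define_task(:main_b) { RESTART_LOG << 'main_b' }
  Rake::Task.define_task(main: [:main_a, :main_b]) { RESTART_LOG << 'main' }
end
Rake::Task.define_task(:main)
Rake::Task.define_task(default: [:setup, :main]) { RESTART_LOG << 'default' }

invoke_with_restarts(:default)
p RESTART_LOG
```

Counting tasks only notices brand-new tasks. A body that merely adds prerequisites to existing tasks would not trigger a restart under this sketch.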

On 25 sept. 08, at 19:23, James M. Lawrence wrote:

The following is a better implementation which could be made to work
with Drake. A patch for regular Rake follows.

  task :setup_a do
    puts "setup_a"
  end

  task :setup_b do
    puts "setup_b"
  end

  task :setup => [:setup_a, :setup_b] do
    puts "setup phase complete. defining new tasks..."

    task :main_a do
      puts "main_a"
    end

    task :main_b do
      puts "main_b"
    end

    task :main => [:main_a, :main_b] do
      puts "main phase complete."
    end

    puts "restarting..."
    throw :restart
  end

  task :main

  task :default => [:setup, :main] do
    puts "all done."
  end

diff --git a/lib/rake.rb b/lib/rake.rb
index 36c2734..1e360a6 100755
--- a/lib/rake.rb
+++ b/lib/rake.rb
@@ -573,8 +573,8 @@ module Rake
           puts "** Invoke #{name} #{format_trace_flags}"
         end
         return if @already_invoked
-        @already_invoked = true
         invoke_prerequisites(task_args, new_chain)
+        @already_invoked = true
         execute(task_args) if needed?
       end
     end
@@ -1994,7 +1994,14 @@ module Rake
         elsif options.show_prereqs
           display_prerequisites
         else
-        top_level_tasks.each { |task_name| invoke_task(task_name) }
+        catch(:done) {
+          loop {
+            catch(:restart) {
+              top_level_tasks.each { |task_name| invoke_task(task_name) }
+              throw :done
+            }
+          }
+        }
         end
       end
     end

Joe Wölfel wrote:

Thanks for the patch. Here's a clunkier variation on your
suggestion that seems to work with Drake. Stage 1 serializes an
unpredictable set of tasks. Stage 2 creates instances of them and
runs them if necessary. There might be a better way that involves
making the dependency tree modifiable dynamically. I think allowing
all possible dependency changes would get complicated. Maybe that
would require reevaluating the entire tree constantly and there's no
way to un-execute a task anyway. But most of the real world problems
I can think of seem to involve adding tasks that wouldn't have been
exercised yet anyway. Could this be solved with an improved
dependency tree walking algorithm?

I think the best strategy is to cache the dynamic changes and update
only when the :restart flag is thrown. Fortunately, Drake is already
structured to work this way.

Drake does a dry run to collect all tasks to be executed, then passes
the dependency graph to my CompTree package which executes it in
parallel (CompTree is a kind of modest Erlang-in-Ruby).

It would be safe to add tasks during execution, as CompTree is running a
shallow copy of the dependency graph and will be unaware of any new
tasks or dependencies. I don't foresee any serious issues with simply
restarting the computation with a new shallow copy of the appended
graph.

Though Drake copies the dependency tree for unrelated reasons, it turns
out to be coincidentally useful here because the copy acts as a cache
while the user appends to the original.

Though this is mostly brainstorming, I do see a need for a restart
feature, whether or not these ideas pan out. The :setup phase may
execute non-trivial tasks which will be obliviously re-executed by :main
in the separate process. We could assume the two stages comprise
disjoint sets, but that would be difficult to enforce. It's an
artificial restriction which will eventually fail.

In the example above, :main gets executed in Rake and Drake for
entirely different reasons. In single-threaded Rake, the restart
happens before it even gets to :main, so :main did not get marked as
@already_invoked. In multi-threaded Drake, :main *does* get marked;
however, after the restart its newly-created child nodes will still be
executed because CompTree will not even *consider* a node until all its
children have been computed.

One difference: in single-threaded Rake you must be careful to add tasks
"in the future", some place ahead in the sequential order of execution.
In my example :setup modifies :main, which is fine since the order given
is [:setup, :main]. In multi-threaded Drake you don't have to worry
about it, for reasons mentioned in the previous paragraph.

James M. Lawrence