Why not call Thread.join?

Take this code from the Ruby Cookbook:

module Enumerable
   def each_simultaneously
     threads = []
     each { |e| threads << Thread.new { yield e } }
     return threads
   end
end

It is used on an array so that you may do this:
[1,2,3].each_simultaneously do |i|
  sleep 5
  puts i
end

And it works!

But why don't I need to call threads.each {|t| t.join }?

And if I did, would it slow it down?

Thanks,
Ari
-------------------------------------------|
Nietzsche is my copilot

Take this code from the Ruby Cookbook:

module Enumerable
   def each_simultaneously
     threads = []
     each { |e| threads << Thread.new { yield e } }
     return threads
   end
end

It is used on an array so that you may do this:
[1,2,3].each_simultaneously do |i|
  sleep 5
  puts i
end

And it works!

What did you expect to happen?
The example you provided will do nothing but create threads and
exit.

But why don't I need to call threads.each {|t| t.join }?

Any running threads are killed when the program exits.

And if I did, would it slow it down?

Generally speaking, the only thing it would slow down (stop really) is
the execution path of the main thread.

Now if for some reason your main thread has to do other work, a join
would delay that, of course.
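
Both answers can be checked at once by running the same snippet in a child interpreter, with and without a join. This is only a sketch: the helper name `run_snippet` and the 0.3 second sleep are made up for illustration, and it assumes the current interpreter can be launched again through `RbConfig.ruby`.

```ruby
require 'open3'
require 'rbconfig'

# Run a one-line script in a fresh Ruby process and return what it printed.
# (run_snippet is a hypothetical helper, not part of any library.)
def run_snippet(code)
  out, _status = Open3.capture2(RbConfig.ruby, '-e', code)
  out
end

# Main thread exits right away, so the worker is killed before it can print.
without_join = run_snippet('Thread.new { sleep 0.3; puts "worker done" }')

# Main thread waits for the worker, so the output appears.
with_join = run_snippet('Thread.new { sleep 0.3; puts "worker done" }.join')

puts "without join: #{without_join.inspect}"
puts "with join:    #{with_join.inspect}"
```

The only difference between the two runs is the trailing `.join`, which keeps the main thread alive until the worker has finished.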

···

On Dec 30, 9:02 pm, thefed <fed...@gmail.com> wrote:

Take this code from the Ruby Cookbook:

module Enumerable
   def each_simultaneously
     threads = []
     each { |e| threads << Thread.new { yield e } }
     return threads
   end
end

It is used on an array so that you may do this:
[1,2,3].each_simultaneously do |i|
  sleep 5
  puts i
end

When I ran this (not in IRB) it didn't work. The interpreter terminated
before any of the threads finished sleeping for 5 seconds. In any case,
you want to join each thread so that the next statement will only execute
after all of the threads have finished their work (otherwise your next
statement will see an undetermined intermediate view of the array).

OK, I understand it better. But why does each {|t| t.join} join them
all at the same time (ish), and not wait for the first one to finish
executing before joining the others?

It joins them one at a time in order. But while your main thread is
waiting for a specific thread to finish, any other thread is also allowed
to execute, and possibly terminate. If thread b terminates while thread a
is joined, then you call join on thread b, join will return immediately
since there's nothing to wait for. Hence, each{|t| t.join} finishes
practically immediately when the longest running thread finishes.
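
A small sketch makes this visible by timing each join separately. The helper name `timed_joins` and the durations are invented for the example; the exact numbers printed will vary from machine to machine.

```ruby
# Start one sleeping thread per duration, then join them in order and
# record how long each individual join call blocked the main thread.
# (timed_joins is a hypothetical helper for illustration.)
def timed_joins(durations)
  threads = durations.map { |d| Thread.new { sleep d } }
  threads.map do |t|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    t.join
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# The longest-running thread is joined first on purpose.
waits = timed_joins([0.3, 0.1, 0.2])
waits.each_with_index do |w, i|
  puts format("join %d blocked for %.2fs", i + 1, w)
end
puts format("total time spent in join: %.2fs", waits.sum)
```

The first join should account for almost all of the waiting, because the other two threads finish while the main thread is blocked on it. The total stays close to the longest duration rather than the sum of all three.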

--Ken

···

On Mon, 31 Dec 2007 00:02:10 -0500, thefed wrote:

--
Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

Nevertheless it's good practice to join. If main has other work to do then you should join once that is done, i.e. at the end of the script. If those threads have terminated already you basically only have the overhead of the Threads Array iteration - but you get robustness in return, i.e. you ensure that all those Threads can terminate properly (assuming that they are written in a way to do that eventually).
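
As a rough sketch of that pattern (the method name `run_with_late_join` and the toy workload are assumptions for illustration only): start the threads, let main do its own work, and join at the very end.

```ruby
require 'thread' # provides Queue; a no-op on modern Rubies

# Start workers, do main's own work, and only then join.
# (run_with_late_join is a hypothetical name for this example.)
def run_with_late_join
  results = Queue.new # thread-safe container for the workers' output

  workers = [1, 2, 3].map do |i|
    Thread.new { sleep 0.1; results << i * 10 }
  end

  # Main's other work happens while the workers are running.
  main_result = (1..1000).sum

  # Join last: cheap if the workers are already done, essential if not.
  workers.each(&:join)

  collected = Array.new(results.size) { results.pop }.sort
  [main_result, collected]
end

main_result, collected = run_with_late_join
puts "main computed #{main_result}, workers produced #{collected.inspect}"
```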

Kind regards

  robert

···

On 31.12.2007 06:45, Skye Shaw!@#$ wrote:

OK, I understand it better. But why does each {|t| t.join} join them all at the same time (ish), and not wait for the first one to finish executing before joining the others?

···

On Dec 31, 2007, at 12:49 AM, Skye Shaw!@#$ wrote:

They are not joined at the same time but one after the other.

Cheers

  robert

···

On 31.12.2007 17:02, thefed wrote:

But then why doesn't this take 15 seconds? t.join is called in the main thread, so shouldn't the next Thread#join not get called until the first one finishes?

  module Enumerable
    def each_simultaneously
      threads = []
      # Note: >> is a typo for << here; the corrected version follows later in the thread.
      each { |e| threads >> Thread.new { yield e } }
      return threads
    end
  end

start_time = Time.now
[7,8,9].each_simultaneously do |e|
    sleep(5) # Simulate a long, high-latency operation
    print "Completed operation for #{e}!\n"
end
# Completed operation for 8!
# Completed operation for 7!
# Completed operation for 9!
Time.now - start_time # => 5.009334

···

On Dec 31, 2007, at 11:15 AM, Robert Klemme wrote:

  module Enumerable
    def each_simultaneously
      threads = []
      each { |e| threads << Thread.new { yield e } }
      return threads
    end
  end

Sorry all, THIS is the fixed up version of each_simultaneously. Turns out Ruby Cookbook has errors, too!

try looking at the crude timeline below...

sec  0         1         2         3         4         5         6         7
     >---------|---------|---------|---------|---------|---------|---------|
main ====@=================================================
t[1] ===================================================
t[2] ===================================================
t[3] ===================================================

The @ on the main thread marks the point where t.join is first called. In this simple case it waits for t[1] to finish its work (sleeping for 5 seconds), then waits for t[2]. Since t[2] has been working all this time as well, it only blocks the main thread for another 0.1 sec or so before finishing. The same goes for t[3]. So this contrived example takes 5 seconds plus whatever overhead there is for starting the threads.

You could throw more instrumentation in there if you wish and do things like adding additional calls to sleep to simulate extra thread overhead to make it more obvious.
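
One way such instrumentation might look is sketched below. Everything here is an assumption made for the example: the name `traced_run`, the shortened 0.2 second "work", and the extra `sleep` before each `Thread.new` that exaggerates thread startup overhead.

```ruby
# Record a timestamped event whenever a thread starts, finishes,
# or has its join return. (traced_run is a hypothetical helper.)
def traced_run(count, work: 0.2, stagger: 0.05)
  t0  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  log = Queue.new # thread-safe event log
  stamp = lambda do |label|
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
    log << [label, elapsed.round(2)]
  end

  threads = (1..count).map do |n|
    sleep stagger # simulated startup overhead, to spread the threads out
    Thread.new do
      stamp.call("t[#{n}] start")
      sleep work
      stamp.call("t[#{n}] done")
    end
  end

  # Join in order and note when each join hands control back to main.
  threads.each_with_index do |t, i|
    t.join
    stamp.call("join t[#{i + 1}] returned")
  end

  Array.new(log.size) { log.pop }
end

traced_run(3).each { |label, at| puts format("%5.2fs  %s", at, label) }
```

Each "join returned" line should appear right after the matching "done" line, and the gaps between successive joins should be roughly the stagger rather than the full work time.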

Thank you SO MUCH! This really clears threading up for me. In retrospect it was less than obvious, but evident nonetheless. But this timeline really made the difference for me. Thank you!

- Ari

···

On Dec 31, 2007, at 3:46 PM, Craig Beck wrote:

To me the important point in addition to the parallelism is that, when
run in batch mode, say with SciTE, main takes less than a second and
kills all the threads. Hence the messages are never seen. To see
the reports you have to do something like

start_time = Time.now
[7,8,9].each_simultaneously do |e|
    sleep(5) # Simulate a long, high-latency operation
    print "Completed operation for #{e}!\n"
end
sleep 5 #######main must take at least 5 seconds!!!!
# Completed operation for 8!
# Completed operation for 7!
# Completed operation for 9!
Time.now - start_time # => 5.009334

to guarantee that the threads have 5 seconds to finish
their operation. Or you can use

module Enumerable
   def each_simultaneously
     collect {|e| Thread.new {yield e}}.each {|t| t.join}
   end
end

which guarantees that the threads will finish before
control is returned to main.

In reality it is also important that threads spend a large
part of their operation just waiting when there is only one
CPU.

I think the problem arose because the example on page 760
of the Ruby Cookbook does not mention the necessity of the
main thread lasting long enough and does not show code to
make it happen.

I realize that much of this may have been obvious to some
who replied, but as a newbie it wasn't to me until I read
the section and played with the code.

Ian

···

--
Posted via http://www.ruby-forum.com/.

Sorry to say that, but the sleep 5 workaround is a bogus solution. Using sleep for this is not a good idea: if the tasks take longer than the sleep, you will lose output anyway, or even risk that some tasks do not finish properly; if all tasks finish much faster, you waste time.

The thread killing is the exact reason why #each_simultaneously was built to return an Array of Thread objects. That way you can join all the threads.
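
The difference can be sketched with tasks that outlast the guessed sleep. The method name `finished_counts` and the timings are made up for the example.

```ruby
# Compare "sleep and hope" with an actual join.
# (finished_counts is a hypothetical helper for illustration.)
def finished_counts(task_time:, guess:)
  done = Queue.new
  threads = [7, 8, 9].map do |e|
    Thread.new { sleep task_time; done << e }
  end

  sleep guess              # the fixed-sleep approach: wait a guessed amount
  after_sleep = done.size  # how many tasks had finished by then

  threads.each(&:join)     # the join approach: wait exactly as long as needed
  [after_sleep, done.size]
end

# The tasks take 0.3s but the guess was only 0.1s.
after_sleep, after_join = finished_counts(task_time: 0.3, guess: 0.1)
puts "after sleeping: #{after_sleep} of 3 tasks finished"
puts "after joining:  #{after_join} of 3 tasks finished"
```

Had the script exited right after the sleep, the unfinished tasks would simply have been killed.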

I prefer the solution that does not join in the method but returns Threads. If you think about it, that version is significantly more flexible. You can join those threads immediately

an_enum.each_simultaneously {|e| ... }.each {|th| th.join}

but you can as well do some work in between

threads = an_enum.each_simultaneously {|e| ... }
do_some_work
...
threads.each {|th| th.join}

When I was initially confronted with multithreading it also took me a while. For me at the time it was difficult to not confuse Thread objects with threads. This was in Java which decouples Thread object creation and thread execution, which probably makes it a bit easier to grasp the concepts.

It is important to keep this distinction in mind: a Thread object is an object like any other, with the added twist that it *may* be associated with an independent thread of execution. In Java it is not associated until the thread starts, nor after the thread terminates. In Ruby the association is there right from the start, because threads are started immediately, and it lasts until the thread terminates.
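
A minimal Ruby sketch of that distinction follows. The method name `thread_lifecycle` is hypothetical, and the Queue is used only as a gate to keep the thread alive until main lets it finish.

```ruby
# Show that a Thread object outlives the thread of execution it represents.
# (thread_lifecycle is a hypothetical helper for illustration.)
def thread_lifecycle
  gate = Queue.new
  t = Thread.new { gate.pop; :finished } # starts running immediately

  alive_before = t.alive? # thread exists and is waiting on the gate
  gate << :go             # let the thread finish
  t.join

  # The thread of execution is gone, but the object still answers questions.
  [alive_before, t.alive?, t.value]
end

before, after, value = thread_lifecycle
puts "alive right after Thread.new: #{before}"        # true
puts "alive after join:             #{after}"         # false
puts "value still available:        #{value.inspect}" # :finished
```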

Kind regards

  robert

···

On 01.01.2008 03:25, Ian Whitlock wrote:

Thanks. That helps both with my understanding the significance
of collect and threads.

Ian
