Understanding Threads

Matt_White · 9 November 2009 20:45

I am writing an app that retrieves multiple web pages in one method
call. Threading has improved performance drastically for me, but I
need some help understanding how exactly the call to join is going to
affect my program.

Here's some code:

def method(string)
  result = {}
  mutex = Mutex.new
  threads = []

  %w{methodname1 methodname2 methodname3 methodname4}.each do |method|
    threads << Thread.new(method) do |m|
      r = eval("#{m}(string)") # each method call makes an HTTP request
      mutex.synchronize { result.merge!(r) }
    end
  end
  threads.each { |t| t.join }
  result
end

Seems like the call to join on each thread is necessary to keep the
script from getting ahead of itself, but if I exclude that line, it
doesn't seem to hurt my results and the program runs a lot faster.
Also, sometimes I get deadlocked somehow if I do use the call to join
and I'm not certain as to why. Can someone help shed some light on the
situation? Do I need to call join? Any idea why I'm deadlocking?
Thanks!

Jason_R · 9 November 2009 21:14

Thread#join simply says "Wait here until this thread has finished
executing".

So what you're doing is waiting for all threads to finish before execution
continues, i.e. blocking the main thread. Without the #join, the contents of
result will be nondeterministic: the method can return before some or all of
the threads have merged in their values. Any perceived deadlock is probably
one of the calls inside your eval never timing out. You'll have to watch out
carefully for that.
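
To make the point about #join concrete, here is a minimal sketch. slow_fetch is a hypothetical stand-in for a method that makes an HTTP request, not anything from the original code. It also shows that Thread#join accepts an optional limit in seconds, which is one way to avoid waiting forever on a request that never returns:

```ruby
# slow_fetch is a hypothetical stand-in for a method that makes an HTTP request.
def slow_fetch(name, delay)
  sleep(delay)
  { name => "done" }
end

result = {}
mutex  = Mutex.new

threads = [["a", 0.05], ["b", 0.1]].map do |name, delay|
  Thread.new do
    r = slow_fetch(name, delay)
    mutex.synchronize { result.merge!(r) }
  end
end

# Reading result here, before any join, could see zero, one or two keys.
threads.each { |t| t.join }    # block until every worker has finished
complete = result.dup          # now holds the values from both workers

# Thread#join(limit) returns nil if the thread is still running
# when the limit (in seconds) expires.
stuck = Thread.new { sleep }   # sleeps forever, like a request that never returns
timed_out = stuck.join(0.1).nil?
stuck.kill                     # clean up the thread we gave up on
```

Whether killing a stuck thread is acceptable depends on what it holds; setting a timeout on the HTTP call itself is usually the cleaner fix.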

Jason

Judson_Lester1 · 9 November 2009 21:24

I completely agree with Jason's diagnosis. I'd like to make two
observations, though.

First, you can avoid the mutex entirely by using thread-local variables:

  threads << Thread.new { Thread.current[:result] = method1(string) }

  results = threads.inject({}) do |results, thread|
    thread.join
    results.merge(thread[:result])
  end

Second, and (possibly) more controversially, just because you can eval
doesn't mean you should. To my eye, this looks nicer:

  threads = [
   Thread.new { Thread.current[:result] = method1(string) },
   Thread.new { Thread.current[:result] = method2(string) },
   Thread.new { Thread.current[:result] = method3(string) },
   Thread.new { Thread.current[:result] = method4(string) }
  ]

And exception handling, etc, will be ever so much clearer.
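
As a rough illustration of the exception-handling point, the sketch below relies on the fact that Thread#join re-raises an exception that terminated the thread. fetch_ok, fetch_bad and gather are hypothetical names made up for this example:

```ruby
# fetch_ok and fetch_bad are hypothetical stand-ins for the HTTP-backed methods.
def fetch_ok(string)
  { :ok => string.upcase }
end

def fetch_bad(string)
  raise ArgumentError, "could not fetch #{string}"
end

def gather(string)
  threads = [
    Thread.new { Thread.current[:result] = fetch_ok(string) },
    Thread.new { Thread.current[:result] = fetch_bad(string) }
  ]
  errors = []
  results = threads.inject({}) do |merged, thread|
    begin
      thread.join                     # re-raises whatever terminated the thread
      merged.merge(thread[:result])
    rescue ArgumentError => e
      errors << e.message
      merged                          # keep what has been collected so far
    end
  end
  [results, errors]
end

results, errors = gather("page")
```

Recent Ruby versions may also print a report to stderr when a thread dies with an exception; that does not affect the values collected here.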

Judson

Robert_K1 · 9 November 2009 22:01

> I completely agree with Jason's diagnosis. I'd like to make two
> observations, though.
>
> First, you can avoid the mutex entirely by using thread-local variables:
>
>   threads << Thread.new { Thread.current[:result] = method1(string) }
>
>   results = threads.inject({}) do |results, thread|
>     thread.join
>     results.merge(thread[:result])
>   end

Even better: we have Thread#value, which joins the thread and returns the value of its block. If you collect the values from a single thread, no additional synchronization is needed:

irb(main):001:0> t = (1..5).map {|i| Thread.new(i) {|x| "value #{x}"} }
=> [#<Thread:0x9c4f618 dead>, #<Thread:0x9c4f58c dead>, #<Thread:0x9c4f4b0 run>, #<Thread:0x9c4f424 run>, #<Thread:0x9c4f398 run>]
irb(main):002:0> t.map {|th| th.value}
=> ["value 1", "value 2", "value 3", "value 4", "value 5"]

> Second, and (possibly) more controversially, just because you can eval
> doesn't mean you should. To my eye, this looks nicer:
>
>   threads = [
>    Thread.new { Thread.current[:result] = method1(string) },
>    Thread.new { Thread.current[:result] = method2(string) },
>    Thread.new { Thread.current[:result] = method3(string) },
>    Thread.new { Thread.current[:result] = method4(string) }
>   ]
>
> And exception handling, etc, will be ever so much clearer.

You can just as well do

def method(string)
  %w{
    methodname1
    methodname2
    methodname3
    methodname4
  }.map do |method|
    Thread.new(method) do |m|
      send(m, string) # each method call makes an HTTP request
    end
  end.map {|th| th.value} # one return value per thread, in order
end

and be done.
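
If the goal is a single merged Hash, as in the original question, the array of values can be folded together afterwards. A minimal sketch, assuming each method returns a Hash; page_a, page_b and fetch_all are hypothetical names used only for this example:

```ruby
# page_a and page_b are hypothetical stand-ins; each returns a Hash,
# as the methods in the original question appear to.
def page_a(string)
  { :a => "#{string} from a" }
end

def page_b(string)
  { :b => "#{string} from b" }
end

def fetch_all(string)
  threads = %w{page_a page_b}.map do |name|
    Thread.new(name) { |m| send(m, string) }
  end
  # Thread#value joins the thread and returns the result of its block,
  # so no mutex is needed: the merge happens only in the calling thread.
  threads.map { |th| th.value }.inject({}) { |merged, h| merged.merge(h) }
end

merged = fetch_all("hello")
```

Because the merge runs in the calling thread after each value has been collected, the workers never touch shared state.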

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
