Point me to help w/ multithreading in 1.9.2-p0

Hi Folks - A week or two ago, I pinged this list for recommendations on
a load testing gem. Unfortunately, I didn't see a response
that pointed me in the right direction. So I've set about writing my
own, using threads, and can't find proper resources to help me
understand what's going on w/ said threads.

Here are the two key methods I'm using, which I figure are the source of my
troubles. First question is whether there are any glaring errors here
that I'm missing:

class ThreadedLoadTester

  def initialize(users, apis, session=nil)

    @threads = []
    @count = users
    @session = session
    1.upto @count.to_i do

      @threads << Thread.new do
        Thread.current[:hi5] = Hi5fbapi.new 'redacted params'
        Thread.current["api_calls"] = []

        apis.each do |api|
          # get_call returns a proc object w/ a snippet of test code
          Thread.current["api_calls"] << get_call(api)
        end

        Thread.stop
      end
    end
  end

  def run_threads()

    @threads.each do |thr|
      thr[:api_calls].each do |api|
        p api.call thr[:hi5], @session
      end
    end

    @threads.each {|t| t.wakeup.join}
    return nil
  end

Basically, I spin up a bunch of threads when I initialize the object,
populate an array in each thread w/ a series of proc objects, and then
put them to sleep. Then, when I run_threads(), the idea is to iterate
over each thread, iterate over the thread's array of procs, and call
each one. Oh, and this object is consumed by a sinatra-based web app.

This works sometimes. And sometimes it segfaults in a nasty way. I'm
having a hard time finding the technical details of how threading in
Ruby, particularly 1.9.2-p0, works. Any pointers?

Thanks,
Alex

Never mind... figured it out.

Though I still wonder if there's any highly detailed technical
documentation on Ruby threading? (Or is the interpreter source
considered the documentation?)

-Alex

···

On Mon, 2010-09-27 at 20:36 -0500, Alex Stahl wrote:

[snip]

Mind to share?

thanks,
- Markus

···

On 28.09.2010 04:35, Alex Stahl wrote:

Nevermind... figured it out.

No prob... but I'm not sure it's quite what you're looking for. I'm
refactoring code I can't get to work instead of finding the root cause
in the threading. (BTW - any thread insight would be appreciated based
on this write-up - cuz there's still problems! I'm starting to consider
that a Threaded Load Tester Gem might be handy...)

After staring at the screen for too long, I took a break to ponder
whether the organization of my threads was fundamentally flawed. It
occurred to me that one of the problems I was having - a deadlock -
could be avoided by instantiating the threads only when needed.

If you review my original code snippet, I spin up all the threads in
initialize(), each storing its proc objects, then .stop them. Then
run_threads() performs a .call on each element of each thread's api
array, followed by a .wakeup and a .join on each thread.

...
1.upto @count do
  @threads << Thread.new do
    Thread.current[:hi5] = Hi5fbapi.new 'redacted params'
    Thread.current["api_calls"] = []
    apis.each do |api|
      Thread.current["api_calls"] << get_call(api) #pushes a proc obj
    end
    Thread.stop
  end
end
...
@threads.each do |thr|
  thr[:api_calls].each do |api|
    p api.call thr[:hi5], @session
  end
end
@threads.each {|t| t.wakeup.join}
...

On 1.8.7 w/ a single core processor, this is a highly deterministic
sequence, and would not deadlock. Once deployed to a multi-core VM
running 1.9.2-p0 (selected specifically for concurrency), not so much.

There I encountered more deadlocks and also "NoMethodError"s from
Sinatra (undefined method `bytesize' for #<Thread:0xa36ae0c dead>).
This would occur during the .each where I would .join. So it's trying
to join a 'dead' thread. Except that if I add anything to prevent that,
such as

... unless t.status == 'dead'...

it would still deadlock or NoMethodError.

But further testing showed that adding any operation to the main thread,
prior to calling .join, would prevent the deadlock:

...
@threads.each do |thr|
  p thr.inspect
...

Anyway, I rewrote things where the threads are created in the method,
not initialize(), so '@api_calls' is already populated by procs, and it
works fine at small scale:

...
thr = []
1.upto @count do
  thr << Thread.new do
    @api_calls.each do |api|
      p api.call @hi5, @session
      Thread.pass
    end
  end
end
thr.each {|t| t.join}
...

This runs fine on 1.8.7/single and 1.9.2/multi at like 5-10 threads.
But when I ramp up to, say, 5,000 (it is a *load* test!), 1.8.7 is fine
but 1.9.2 segfaults.

Even 500 threads on 1.9.2 is segfaulting right now (but not 1.8.7). I
get the handy output:

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension
libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

I'll write it up tomorrow.

Some open questions:
1. When .join is called on a multi-core system, what qualifies as the
calling thread? The .main on the main processor, or the thread which
instantiates my ThreadedLoadTester object? (i.e. what if
ThreadedLoadTester is created from a sinatra thread which itself isn't
main?)

2. Sinatra regularly reports a 'NoMethodError' for 'bytesize' when the
last thread is dead but joined to the main thread. But only when the
main thread originates w/in sinatra, and not an inline call.

3. Is there a theoretical maximum to the number of concurrent threads
which can be created which all access a network interface? This is
admittedly a poor theory - what might really cause a segfault in 1.9.2
when 500 threads all try to access the network?
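
On question 1, a small sketch may help. Thread#join blocks whichever thread executes the call, and that need not be Thread.main; the number of cores does not enter into it. The variable names below are illustrative only and do not come from the load tester.

```ruby
# Thread#join suspends the thread that calls it (Thread.current at the
# call site), which is not necessarily Thread.main.
joiner_was_main = nil

outer = Thread.new do
  inner = Thread.new { :inner_done }
  # Here the calling thread is `outer`, so `outer` is the one that waits.
  inner.join
  joiner_was_main = (Thread.current == Thread.main)
  inner.value
end

result = outer.value   # the main thread waits for `outer` here
puts "join ran on main thread? #{joiner_was_main}"
puts "inner result: #{result.inspect}"
```

So if the ThreadedLoadTester is built and run from a Sinatra request thread, that request thread would be the one that waits in .join.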

Thanks for asking :)
-Alex

Are you opening multiple file descriptors per thread? If you exceed
1024 file descriptors per-process, then the select() interface will
overrun the buffers and segfault. You can fork() your process to get
around this (and get more CPU/memory concurrency), or do all your IO
over something like Rev[1] or EventMachine[2] which use epoll or kqueue.

Which OS is this on? There may be some lingering pthreads portability
issues for non-NPTL. Definitely talk to ruby-core about this.

[1] - http://rev.rubyforge.org/
[2] - http://rubyeventmachine.com/
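
A quick way to see how close a process can get to that figure is to read its descriptor limit. This is only a sketch: the 1024 constant is the number quoted above (FD_SETSIZE on typical Linux builds) and is hard-coded here as an assumption, not read from Ruby.

```ruby
# Compare the per-process descriptor limit with the 1024 figure mentioned
# above. ASSUMED_FD_SETSIZE is an assumption, not a value Ruby exposes.
ASSUMED_FD_SETSIZE = 1024

soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "open-file limit: soft=#{soft} hard=#{hard}"

if soft > ASSUMED_FD_SETSIZE
  puts "the process may open descriptors numbered above #{ASSUMED_FD_SETSIZE - 1}"
else
  puts "the soft limit keeps descriptor numbers below #{ASSUMED_FD_SETSIZE}"
end
```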

···

Alex Stahl <astahl@hi5.com> wrote:

3. Is there a theoretical maximum to the number of concurrent threads
which can be created which all access a network interface? This is
admittedly a poor theory - what might really cause a segfault in 1.9.2
when 500 threads all try to access the network?

--
Eric Wong

Without going into too much detail, I believe one flaw of your design here is that you are not using thread synchronization but are instead trying to explicitly start and stop threads and yield execution. It may be that this is causing your core dumps, but I really don't know.

What I would do:

1. Use a condition variable to let all threads start at the same time.

2. Use Thread#value to collect results.

require 'thread'

lock = Mutex.new
cond = ConditionVariable.new
start = false

threads = (1..10).map do
   Thread.new do
     lock.synchronize do
       until start
         cond.wait(lock)
       end
     end

     # work
     # return results
     [rand(10), rand(100)]
   end
end

lock.synchronize do
   start = true
   cond.signal
end

threads.each do |th|
   p th.value
end

You can probably get away without the condition variable by just acquiring the lock (lock.synchronize) in the main thread before you create all threads and let all threads just synchronize with an empty block.
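
A minimal sketch of that lock-only variant, assuming the work is a placeholder expression (`i * 2` stands in for the real test code):

```ruby
require 'thread'  # needed on 1.9; harmless on later Rubies

gate = Mutex.new

threads = gate.synchronize do
  # The main thread holds the gate while the workers are created, so each
  # worker blocks at its own synchronize call.
  (1..10).map do |i|
    Thread.new do
      gate.synchronize { }   # wait until the main thread releases the gate
      i * 2                  # placeholder for the real work
    end
  end
end                          # leaving the block lets the workers through in turn

results = threads.map { |th| th.value }
p results
```

Workers pass through the gate one after another, so this gives a near-simultaneous start rather than a strictly simultaneous one.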

Kind regards

  robert

···

On 28.09.2010 11:16, Alex Stahl wrote:

[snip]

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Thanks for the note, Eric. Each thread is only opening one file
descriptor, so I haven't encountered a limit there. And a refactor
eliminated the segfaults.

Although, after further rewriting, I'm now bumping into a maximum of
~2,950 threads when running on 1.9.2-p0 (on Ubuntu 10.04). When I run
w/ >2,950, there's one of three errors I encounter:

-can't create Thread (11) (ThreadError) (from my class)
-out of memory error (from my class; sorry, missed a chance to copy)
-Cannot assign requested address - connect(2) (Errno::EADDRNOTAVAIL)
(from HTTP class)

Changed up the class to be a little more concise:

  def run_threads()
    thr = []
    1.upto @count do
      thr << Thread.new do
        @tests.each do |test|
          p test.call @obj, @session
          Thread.pass
        end
      end
    end
    thr.each {|t| t.join}
  end

So what limits might I be bumping into now? The "can't create Thread" error
seems to be the most common one. What could cause that? Is my best course
of action to download the source, grep for that string, and analyze from
there? Or is this possibly a bug? Or am I beyond expected threading usage?
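
One place to start looking, sketched under assumptions: the 11 in "can't create Thread (11)" looks like an errno value, and on Linux errno 11 is EAGAIN, which thread creation can report when a resource limit is reached. That reading is inferred from the number alone. The helper name `limit_pair` is made up for this example.

```ruby
# Print resource limits that commonly cap thread creation.
# Which of these (if any) was hit here is not established by this thread.
def limit_pair(resource)
  soft, hard = Process.getrlimit(resource)
  { :soft => soft, :hard => hard }
end

limits = {
  :nproc => limit_pair(Process::RLIMIT_NPROC),  # processes/threads per user
  :stack => limit_pair(Process::RLIMIT_STACK)   # stack size in bytes
}

limits.each do |name, pair|
  puts "#{name}: soft=#{pair[:soft]} hard=#{pair[:hard]}"
end

eagain = Errno::EAGAIN::Errno
puts "EAGAIN on this platform is errno #{eagain}"
```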

Thanks,
Alex

···

On Tue, 2010-09-28 at 14:37 -0500, Eric Wong wrote:

[snip]

Without going into too much detail I believe one flaw of your design
here is that you are not using thread synchronization but instead try to
explicitly start and stop threads and yield execution. It may be that
this is causing your cores, but I really don't know.

I agree that in general it's better to do this kind of thing using the
appropriate synchronization data structures... however, I should
expect that it is not possible to crash the ruby interpreter purely by
writing ruby code, regardless of the presence of bugs in it. I don't
know whether it is possible to actually achieve this level of
reliability when dealing with threading code that contains race
conditions, tho.

What I would do:

1. Use a condition variable to let all threads start at the same time.

[snip]

lock.synchronize do
   start = true
   cond.signal
end

Putting aside my prejudices against ConditionVariable, there is
another problem with this: ConditionVariable#signal awakens only one
thread waiting on the condvar. You'd want to use
ConditionVariable#broadcast instead. But even then, there is a race
condition; you're not guaranteed that all the threads have blocked
waiting on the condvar. Some may still be running the code before that
point, and they would end up never running to completion. A counting
semaphore could solve that, but ruby doesn't actually have one of
those (sigh).

Thread synchronization is a real PITA.

I see now that the first line of ConditionVariable#broadcast is this:
  # TODO: imcomplete
So, maybe there's some kind of problem with it?

You can probably get away without the condition variable by just
acquiring the lock (lock.synchronize) in the main thread before you
create all threads and let all threads just synchronize with an empty block.

I can't see any holes in this scheme, so it's probably the best idea.

···

On 9/29/10, Robert Klemme <shortcutter@googlemail.com> wrote:

Without going into too much detail I believe one flaw of your design
here is that you are not using thread synchronization but instead try to
explicitly start and stop threads and yield execution. It may be that
this is causing your cores, but I really don't know.

I agree that in general it's better to do this kind of thing using the
appropriate synchronization data structures... however, I should
expect that it is not possible to crash the ruby interpreter purely by
writing ruby code, regardless of the presence of bugs in it. I don't
know whether it is possible to actually achieve this level of
reliability when dealing with threading code that contains race
conditions, tho.

Absolutely agree. But since 1.9.2 is pretty fresh I'd expect the more traditional code (proper thread sync) to be less likely to crash than obscure variants.

What I would do:

1. Use a condition variable to let all threads start at the same time.

[snip]

lock.synchronize do
    start = true
    cond.signal
end

Putting aside my prejudices against ConditionVariable, there is
another problem with this: ConditionVariable#signal awakens only one
thread waiting on the condvar. You'd want to use
ConditionVariable#broadcast instead.

Right, sorry for mixing this up.

But even then, there is a race
condition; you're not guaranteed that all the threads have blocked
waiting on the condvar. Some may still be running the code before that
point, and they would end up never running to completion.

This is not true. The only negative thing that would happen is that they would not start doing their work at the "same" time as the other threads. Other than that, they would do their work just like the other threads.

A counting
semaphore could solve that, but ruby doesn't actually have one of
those (sigh).

Thread synchronization is a real PITA.

Well, your answer kind of confirms this. :)

I see now that the first line of ConditionVariable#broadcast is this:
   # TODO: imcomplete
So, maybe there's some kind of problem with it?

You can probably get away without the condition variable by just
acquiring the lock (lock.synchronize) in the main thread before you
create all threads and let all threads just synchronize with an empty block.

I can't see any holes in this scheme, so it's probably the best idea.

Even then it could be that some threads execute code before the synchronize and thus not start concurrently with other threads. Frankly, for a test scenario I would not bother to try to let threads start really concurrently. Unless there is huge preparation overhead I would simply create those threads and let them do their work. There is no guarantee anyway that they can work in parallel because in a non realtime OS there are no guarantees as to when the scheduler decides to give CPU to threads.
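
A sketch of that simpler approach, with no start gate at all. `simulated_call` is a hypothetical stand-in for one API request and is not part of the original load tester.

```ruby
# No start gate: create the threads and let the scheduler run them.
# `simulated_call` is a hypothetical stand-in for one API request.
def simulated_call(user_id)
  sleep 0.01
  "user #{user_id} ok"
end

threads = (1..5).map do |user_id|
  Thread.new(user_id) { |id| simulated_call(id) }
end

# Thread#value joins the thread and returns the block's result, so no
# thread-local storage or explicit wakeup is needed.
outcomes = threads.map { |th| th.value }
puts outcomes
```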

Kind regards

  robert

···

On 03.10.2010 02:38, Caleb Clausen wrote:

But even then, there is a race
condition; you're not guaranteed that all the threads have blocked
waiting on the condvar. Some may still be running the code before that
point, and they would end up never running to completion.

This is not true. The only negative thing that would happen is that
they would not start doing their work at the "same" time as the other
threads. Other than that they would do their work as the other threads.

Oh, you are right. The boolean variable start prevents the case I was
worried about. This is why I find condvars confusing.
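
For reference, a sketch of the condvar version with both points from this exchange applied: broadcast instead of signal, and the start flag re-checked in a loop. The placeholder result `i` stands in for real work.

```ruby
require 'thread'  # needed on 1.9; harmless on later Rubies

lock  = Mutex.new
cond  = ConditionVariable.new
start = false

threads = (1..10).map do |i|
  Thread.new do
    lock.synchronize do
      # Re-check the flag in a loop: a thread that gets here after the
      # broadcast sees start == true and never waits.
      cond.wait(lock) until start
    end
    i   # placeholder result
  end
end

lock.synchronize do
  start = true
  cond.broadcast   # wake every waiter, not just one
end

values = threads.map { |th| th.value }
p values
```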

You can probably get away without the condition variable by just
acquiring the lock (lock.synchronize) in the main thread before you
create all threads and let all threads just synchronize with an empty
block.

I can't see any holes in this scheme, so it's probably the best idea.

Even then it could be that some threads execute code before the
synchronize and thus not start concurrently with other threads.

But you made the synchronize statement at the very start of the thread
body, so this would seem to not be a concern in this case.

···

On 10/3/10, Robert Klemme <shortcutter@googlemail.com> wrote:

On 03.10.2010 02:38, Caleb Clausen wrote:

The access to the cond var in the synchronize block was also the first statement in the thread body:

   Thread.new do
     lock.synchronize do
       until start
         cond.wait(lock)
       end
     end

     # work
     # return results
     [rand(10), rand(100)]
   end

So there is really not that much difference. In practice this will usually not be a problem but from a more formal perspective it does not matter how many operations are performed before the synchronization - it still may be that a thread does not get CPU to get there. That's why I said that for a test scenario I would only bother to have all threads started concurrently if there was a lot of ramp up work to do.

Kind regards

  robert

···

On 03.10.2010 20:30, Caleb Clausen wrote:

[snip]