Ruby 1.8.6, threadpooling and blocking sockets - advice/help

Hi,
I think I'm running up against ruby 1.8.6's not so
stellar threading system. Was hoping someone
could confirm or otherwise point out some flaws.

Note: I get reasonable performance when running on
ruby 1.9; it's just 1.8.6 that hangs as if
deadlocked when I start using too many threads in
one of my test scripts. (My focus is actually
on 1.9 and JRuby anyway.)

To give you an idea:

I might get a pool of 10 acceptor threads to run
something like the following (each runs its own
copy of this code):

    client, client_sockaddr = @socket.accept
      # Threads block on #accept.
    data = client.recvfrom( 40 )[0].chomp
    @mutex.synchronize do
      puts "#{Thread.current} received #{data}... "
    end
    client.close

on @socket which was set up like this:

    @socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    @sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost')
    @socket.bind( @sockaddr )
    @socket.listen( 100 )

I wanted to create a barrage of requests, so next I
create a pool of requester threads which each run
something like this:

  socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
  sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost' )
  socket.connect( sockaddr )
  socket.puts "request #{i}"
  socket.close

All of this is in one script. If I have as few as
2 requester threads in addition to the 10
acceptors waiting to receive their requests, 1.8.6
just seizes up before processing anything. If I
use 2 acceptors and 2 requesters, it works. If I
use 10 acceptors and 1 requester, it works. When it
does work, however, it doesn't appear to schedule
threads too well; it just seems to use one all the
time - although this seems to happen only when
using sockets as opposed to a more general job
queue.

I haven't submitted the full code because it uses
a threadpool library I'm still building/reviewing.
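
For reference, here is a minimal self-contained sketch of the test,
with plain Thread arrays standing in for the pool library and port
2200 on localhost as above:

    require 'socket'
    require 'thread'

    mutex  = Mutex.new
    server = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
    server.bind(Socket.pack_sockaddr_in(2200, 'localhost'))
    server.listen(100)

    # 10 acceptor threads, all blocking in #accept on the same socket
    acceptors = (1..10).map do
      Thread.new do
        client, _addr = server.accept
        data = client.recvfrom(40)[0].chomp
        mutex.synchronize { puts "#{Thread.current} received #{data}..." }
        client.close
      end
    end

    # 2 requester threads firing one request each
    requesters = (1..2).map do |i|
      Thread.new do
        sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
        sock.connect(Socket.pack_sockaddr_in(2200, 'localhost'))
        sock.puts "request #{i}"
        sock.close
      end
    end

    requesters.each { |t| t.join }
    sleep 1   # give the serviced acceptors a chance to print; the rest stay blocked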

Regards,

Daniel Bush

On 10/19/2009 02:51 PM, Daniel Bush wrote:

Hi,
I think I'm running up against ruby 1.8.6's not so
stellar threading system. Was hoping someone
could confirm or otherwise point out some flaws.

Note: I get reasonable performance when running on
ruby 1.9 it's just 1.8.6 that hangs like a
deadlock when I start using too many threads in
one of my test scripts. (My focus is actually
on 1.9 and jruby anyway).

Give you an idea:

I might get a pool of 10 acceptor threads to run
something like the following (each has their own
version of this code):

    client, client_sockaddr = @socket.accept
      # Threads block on #accept.
    data = client.recvfrom( 40 )[0].chomp
    @mutex.synchronize do
      puts "#{Thread.current} received #{data}... "
    end
    client.close

on @socket which was set up like this:

    @socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    @sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost')
    @socket.bind( @sockaddr )
    @socket.listen( 100 )

This won't work. You can have only 1 acceptor thread per server socket. Typically you dispatch processing *after* the accept to a thread (either newly created or taken from a pool).

I have no idea what the interpreter is going to do if you have multiple threads trying to accept from the same socket. In the best case, #accept is synchronized and only one thread gets to enter it; in worse scenarios, anything may happen.

I wanted to create a barrage of requests so next I
create a pool of requester threads which each run
something like this:

  socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
  sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost' )
  socket.connect( sockaddr )
  socket.puts "request #{i}"
  socket.close

Btw, why don't you use TCPServer and TCPSocket?
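
For reference, the same setup collapses to roughly this with those
classes (host and port taken from your snippets):

    require 'socket'

    # server side: bind + listen in one call
    server = TCPServer.new('localhost', 2200)
    client = server.accept            # returns a TCPSocket

    # client side
    sock = TCPSocket.new('localhost', 2200)
    sock.puts "request"
    sock.close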

All of this in one script. If I have so much as
2 requester threads in addition to the 10
acceptors waiting to receive their requests, 1.8.6
just seizes up before processing anything. If I
use 2 acceptors and 2 requesters, it works. If I
use 10 acceptors, 1 requester it works. When it
does work however, it doesn't appear to schedule
threads too well; it just seems to use one all the
time - although this seems to happen only when
using sockets as opposed to a more general job
queue.

See above.

I haven't submitted the full code because it uses
a threadpool library I'm still building/reviewing.

I would rather do something like this (sketched):

require 'socket'
require 'thread'

queue  = Queue.new
@mutex = Mutex.new    # as in your original code
workers = (1..10).map do
  Thread.new queue do |q|
    until (cl = q.deq).equal? q
      # process data from / for client cl
      begin
        data = cl.gets.chomp
        @mutex.synchronize do
          puts "#{Thread.current} received #{data}..."
        end
      ensure
        cl.close
      end
    end
  end
end

server = TCPServer.new('localhost', 2200)   # host/port from your example

while client = server.accept
  queue.enq client
end

# elsewhere

TCPSocket.open('localhost', 2200) do |sock|
  sock.puts "request"
end

Kind regards

  robert


Robert Klemme wrote:

      puts "#{Thread.current} received #{data}... "
    end
    client.close

on @socket which was set up like this:

    @socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    @sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost')
    @socket.bind( @sockaddr )
    @socket.listen( 100 )

This won't work. You can have only 1 acceptor thread per server socket.
  Typically you dispatch processing *after* the accept to a thread
(either newly created or taken from a pool).

I have no idea what the interpreter is going to do if you have multiple
threads trying to accept from the same socket. In the best case #accept
is synchronized and only one thread gets to enter it. In worse
scenarios anything bad may happen.

Ok, I wasn't sure if it was appropriate having >1 thread per socket
instance. It *appears* to work ok on ruby 1.9 up to about 100 socket
connections - not that that means anything when it comes to testing
stuff with threads. Maybe if I do 100,000+ I might elicit some type of
error.

I was intending to process the result of accept in another pool, but I
was toying with the idea of having 2-3 threads waiting on #accept,
assuming no synchronisation issues. I didn't know if it really mattered
or not. It might make a difference if you have a large number of
connections coming in, depending on what else the acceptor is doing;
I wasn't sure.

I guess I'll have to scupper that idea or exhaustively test it to prove
it works and has benefit - both of which are questionable at this point.

I wanted to create a barrage of requests so next I
create a pool of requester threads which each run
something like this:

  socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
  sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost' )
  socket.connect( sockaddr )
  socket.puts "request #{i}"
  socket.close

Btw, why don't you use TCPServer and TCPSocket?

Yeah, I was going to; I was just going off some examples in the
documentation, trying to cut my teeth on them and writing some tests.
But I was heading that way.


I haven't submitted the full code because it uses
a threadpool library I'm still building/reviewing.

I would rather do something like this (sketched):

require 'thread'
queue = Queue.new
workers = (1..10).map do
   Thread.new queue do |q|
     until (cl = q.deq).equal? q
       # process data from / for client cl
       begin
         data = cl.gets.chomp
         @mutex.synchronize do
           puts "#{Thread.current} received #{data}..."
         end
       ensure
         cl.close
       end
     end
   end
end

server = TCPServer.new ...

while client = server.accept
   queue.enq client
end

# elsewhere

TCPSocket.open do |sock|
    sock.puts "request"
end

Thanks for the example.
I am scratching my head a little with this line:
  until (cl = q.deq).equal? q

I'm familiar with Queue and its behaviour, but I don't quite see what
this check is for.

Cheers,
Daniel Bush

On 20.10.2009 02:31, Daniel Bush wrote:

Robert Klemme wrote:

      puts "#{Thread.current} received #{data}... "
    end
    client.close

on @socket which was set up like this:

    @socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    @sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost')
    @socket.bind( @sockaddr )
    @socket.listen( 100 )

This won't work. You can have only 1 acceptor thread per server socket.
  Typically you dispatch processing *after* the accept to a thread
(either newly created or taken from a pool).

I have no idea what the interpreter is going to do if you have multiple
threads trying to accept from the same socket. In the best case #accept
is synchronized and only one thread gets to enter it. In worse
scenarios anything bad may happen.

Ok, I wasn't sure if it was appropriate having >1 thread per socket instance. It *appears* to work ok on ruby 1.9 up to about 100 socket connections - not that that means anything when it comes to testing stuff with threads. Maybe if I do 100,000+ I might elicit some type of error.

I was intending to process the result of accept in another pool but I was toying with the idea of having 2-3 threads waiting on #accept assuming no synchronisation issues. I didn't know if it really mattered or not. It might make a difference if you have a large number of connections coming in depending on what the acceptor is doing in addition; I wasn't sure.

I guess I'll have to scupper that idea or exhaustively test it to prove it works and has benefit - both of which are questionable at this point.

Frankly, I wouldn't invest that effort: every example I have seen, in any programming language, has just a single acceptor thread. Accepting socket connections is not an expensive operation, so as long as you refrain from further processing, a single thread is completely sufficient for handling accepts.

I wanted to create a barrage of requests so next I
create a pool of requester threads which each run
something like this:

  socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
  sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost' )
  socket.connect( sockaddr )
  socket.puts "request #{i}"
  socket.close

Btw, why don't you use TCPServer and TCPSocket?

yeah I was going to, I was just going off some examples in the documentation and trying to cut my teeth on them and writing some tests. But I was heading that way.


I haven't submitted the full code because it uses
a threadpool library I'm still building/reviewing.

I would rather do something like this (sketched):

require 'thread'
queue = Queue.new
workers = (1..10).map do
   Thread.new queue do |q|
     until (cl = q.deq).equal? q
       # process data from / for client cl
       begin
         data = cl.gets.chomp
         @mutex.synchronize do
           puts "#{Thread.current} received #{data}..."
         end
       ensure
         cl.close
       end
     end
   end
end

server = TCPServer.new ...

while client = server.accept
   queue.enq client
end

# elsewhere

TCPSocket.open do |sock|
    sock.puts "request"
end

Thanks for the example.
I am scratching my head a little with this line:
  until (cl = q.deq).equal? q

I'm familiar with Queue and its behaviour.

That's the worker thread termination code, which works by checking whether the item fetched from the Queue is the Queue instance itself. I omitted the other half of the code (the place which puts the queue instance into itself, once per worker) because I didn't want to make the code more complex, and the termination condition was unknown anyway (it may be a signal, a number of handled connections etc.).

If you want to make the termination check more readable, you can also do something like this:

QueueTermination = Object.new
...
until QueueTermination.equal?(cl = q.deq)
   ...
end

or

until QueueTermination == (cl = q.deq)
   ...
end

or

until QueueTermination === (cl = q.deq)
   ...
end

The basic idea is to stuff something into the queue which is unambiguously identifiable as non-work content.
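
For completeness, the omitted shutdown half could be sketched roughly
like this (reusing the queue-as-sentinel convention from above):

    # once you decide to stop (a signal, a request count, ...):
    workers.size.times { queue.enq queue }   # one sentinel per worker
    workers.each { |t| t.join }              # wait for the workers to drain and exit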

Kind regards

  robert


Robert Klemme wrote:

On 20.10.2009 02:31, Daniel Bush wrote:

I guess I'll have to scupper that idea or exhaustively test it to prove
it works and has benefit - both of which are questionable at this point.

Frankly, I wouldn't invest that effort: every example in all programming
languages I have seen has just a single acceptor thread. Accepting
socket connections is not an expensive operation so as long as you
refrain from further processing a single thread is completely sufficient
for handling accepts.

I am scratching my head a little with this line:
  until (cl = q.deq).equal? q

I'm familiar with Queue and its behaviour.

That's the worker thread termination code which basically works by
checking whether the item fetched from the Queue is the Queue instance
itself. Actually I omitted the other half of the code (the place which
puts all those q instances in itself) because I didn't want to make the
code more complex and also termination condition was unknown (may be a
signal, a number of handled connections etc.).

Ok, that's cool. I was pushing termination jobs onto the queue in the
thing I was playing with, although what you're doing there might be
cleaner!

Thanks for the advice.
Cheers,

Daniel Bush

Robert Klemme wrote:

Frankly, I wouldn't invest that effort: every example in all programming
languages I have seen has just a single acceptor thread.

...or else serializes them so that only one thread accept()s at a time.
For a proper example look at Apache with preforked workers, and the
AcceptMutex directive.
http://httpd.apache.org/docs/2.0/mod/mpm_common.html

You could try the same approach, and use a ruby Mutex to protect your
socket#accept - but that could turn out to be more expensive than having
a single accept thread which dispatches to your worker pool, if you're
going to have a separate worker pool anyway.
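
Sketched roughly, assuming the @socket from your original setup and a
queue feeding the worker pool:

    require 'thread'

    accept_mutex = Mutex.new
    work_queue   = Queue.new      # whatever feeds your worker pool

    acceptors = (1..3).map do
      Thread.new do
        loop do
          # only one thread at a time sits in #accept; the others wait on the mutex
          client, _addr = accept_mutex.synchronize { @socket.accept }
          work_queue.enq client   # hand the accepted client off to the workers
        end
      end
    end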


Brian Candler wrote:

Robert Klemme wrote:

Frankly, I wouldn't invest that effort: every example in all programming
languages I have seen has just a single acceptor thread.

...or else serializes them so that only one thread accept()s at a time.
For a proper example look at Apache with preforked workers, and the
AcceptMutex directive.
http://httpd.apache.org/docs/2.0/mod/mpm_common.html

Cool. Didn't even think to look at what the big boys do.
Thanks for the pointer.

You could try the same approach, and use a ruby Mutex to protect your
socket#accept - but that could turn out to be more expensive than having
a single accept thread which dispatches to your worker pool, if you're
going to have a separate worker pool anyway.

Yeah, I have a worker pool. I was sort of extrapolating from that and
having an acceptor pool based around the socket in addition to the
worker pool.

I don't have a lot of experience with heavy traffic; but the (naive)
motivation for this whole thing was to have one acceptor thread
receiving while the other was pushing onto the queue, then swapping
over and over[1] -- at least to allow people to experiment with that
sort of thing if they wanted to. But synchronisation issues with the
extra thread might make things worse. I'm used to trying out duff
ideas, so heck, maybe I'll take a look at it at some point - if only
to get a better feel for what's going on at that level.

Cheers,
Daniel Bush

[1] actually, I naively wanted all the threads to block on the socket
just like they would on a queue. oh well.

On 21.10.2009 13:49, Daniel Bush wrote:

Brian Candler wrote:

You could try the same approach, and use a ruby Mutex to protect your socket#accept - but that could turn out to be more expensive than having a single accept thread which dispatches to your worker pool, if you're going to have a separate worker pool anyway.

Yeah, I have a worker pool. I was sort of extrapolating from that and having an acceptor pool based around the socket in addition to the worker pool.

I don't have a lot of experience with heavy traffic; but the (naive) motivation for this whole thing was to have one acceptor thread receiving while the other was pushing on the queue and then swapping over and over[1]

You need to synchronize anyway (at least on the queue), so adding another synchronization point (at accept) won't gain you much, I guess. As Brian said, the effect can be the opposite - and nobody seems to do it anyway. As I said, accepting connections is a pretty cheap operation.

[1] actually, I naively wanted all the threads to block on the socket just like they would on a queue. oh well.

You should also note that the network layer has its own queue at the socket (you can control its size as well). So even if a single thread were temporarily not sufficient, connection requests are not necessarily rejected. Basically you have

connect -> [network layer waiting queue] -> accept -> [ruby processing queue]
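
In code terms, a rough sketch of those two queues (port reused from the earlier examples):

    require 'socket'
    require 'thread'

    server = TCPServer.new('localhost', 2200)
    server.listen(100)       # network-layer queue: up to ~100 pending, un-accepted connections

    work_queue = Queue.new   # ruby-side queue: accepted clients waiting for a worker
    Thread.new do
      loop { work_queue.enq(server.accept) }   # a single acceptor drains the kernel's queue
    end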

Kind regards

  robert


You might look at an event framework like EventMachine or my own Rev
(http://rev.rubyforge.org/) as a less error-prone and high-performance
alternative to threads.

The disadvantage of this approach is the need to invert control (event
frameworks are asynchronous); however, it will resolve the
synchronization issues.
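
For example, a minimal EventMachine version of the server side might
look roughly like this (callbacks instead of blocking accept/recv; port
taken from the earlier examples):

    require 'rubygems'
    require 'eventmachine'

    # EventMachine invokes these callbacks per connection; nothing blocks on accept or read
    module RequestHandler
      def receive_data(data)
        puts "received #{data.chomp}..."
        close_connection_after_writing
      end
    end

    EventMachine.run do
      EventMachine.start_server('localhost', 2200, RequestHandler)
    end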


On Wed, Oct 21, 2009 at 5:49 AM, Daniel Bush <dlb.id.au@gmail.com> wrote:

I don't have a lot of experience with heavy traffic; but the (naive)
motivation for this whole thing was to have one acceptor thread
receiving while the other was pushing on the queue and then swapping
over and over[1] -- at least to allow people to experiment with that
sort of thing if they wanted to. But synchronisation issues with the
extra thread might make things worse. I'm used to trying out duff ideas
so heck maybe I might take a look at it at some point - if only to get a
better feel for what's going on at that level.

--
Tony Arcieri
Medioh/Nagravision