[ANN] Multiplexer - linear non-blocking I/O

Blocking I/O is really easy to use. But when you use it to write
servers, you run into problems: you can't run two blocking syscalls
simultaneously. So if you're writing a huge file to some guy, every
other client is stalled, and no one new can connect. Unacceptable, for
many types of servers. They need non-blocking I/O.

Non-blocking I/O is a lot more annoying to use. Instead of going

write "Hello. What's your name?"
name = read_line
write "How do you do, #{name}?"
state = read_line
write "It's nice to hear that you're #{state}, #{name}."

we have to make weird state machines.

The good news: we can use callcc to make the non-blocking nature
practically invisible. Multiplexer does that. Here's how you'd write a
hello server:

class Test < Multiplexer::Handler
  def handle
    write_line "Hello. What's your name?"
    name = read_line
    write_line "How do you do, #{name}?"
    state = read_line
    write_line "It's nice to hear that you're #{state}, #{name}."
    disconnect
  end
end

def test
  multiplexer = Multiplexer::Multiplexer.new 0.5
  multiplexer.listen 31337, Test
  multiplexer.run
end

test

It'll run in one thread. Multiplexer handles the select(3) calls.

Get it:
http://www.phubuh.org/Projects/nntpu/_darcs/current/multiplexer.rb

"Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
news:87vfbszr5m.fsf@igloo.phubuh.org...

Blocking I/O is really easy to use. But when you use it to write
servers, you run into problems: you can't run two blocking syscalls
simultaneously. So if you're writing a huge file to some guy, every
other client is stalled, and no one new can connect. Unacceptable, for
many types of servers. They need non-blocking I/O.

Non-blocking I/O is a lot more annoying to use. Instead of going

<snip/>

It'll run in one thread. Multiplexer handles the select(3) calls.

What is the advantage over a solution with threads? IOW, why should I use
multiplexer over individual threads per connection?

Kind regards

    robert

"Robert Klemme" <bob.news@gmx.net> writes:

"Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
news:87vfbszr5m.fsf@igloo.phubuh.org...

> Blocking I/O is really easy to use. But when you use it to write
> servers, you run into problems: you can't run two blocking syscalls
> simultaneously. So if you're writing a huge file to some guy, every
> other client is stalled, and no one new can connect. Unacceptable,
> for many types of servers. They need non-blocking I/O.
>
> Non-blocking I/O is a lot more annoying to use. Instead of going

<snip/>

> It'll run in one thread. Multiplexer handles the select(3) calls.

What is the advantage over a solution with threads? IOW, why should I
use multiplexer over individual threads per connection?

Since Ruby's threads aren't native, you can't do I/O from several at a
time. So one IO#read call blocking for a long time will block the other
threads, too. You could loop a read with a time-out, I guess. But with
a single thread running select, the whole process can stall completely
while waiting for I/O. And it's more elegant. :slight_smile:

"Robert Klemme" <bob.news@gmx.net> writes:

[ ... ]

What is the advantage over a solution with threads? IOW, why should I
use multiplexer over individual threads per connection?

The issues raised in subsequent posts illustrate the classic arguments
between one-thread-per-connection proponents and select-loop proponents
that I have heard since the mid 1990's.

Given good thread and select/non-blocking-io implementations, both
methods can solve the same problems and can work just fine.

As for me, I believe that it's good to have both methods to choose from.

The java folks have recently offered a 'select' methodology in addition
to their traditional thread-based approach. Perl has not-so-recently
added thread support on top of its traditional preference for
'select'.

And I'm glad that I also have both options in ruby.

···

--
Lloyd Zusman
ljz@asfast.com
God bless you.

This is not true. Ruby goes non-blocking I/O from threads, so in general you'll see overlapped execution.

Thread.new do
   loop do
     puts "You said #{gets}"
   end
end

10.times do
   sleep(1)
   puts "Say something"
end

Cheers

Dave

···

On Nov 26, 2004, at 8:19, Mikael Brockman wrote:

Since Ruby's threads aren't native, you can't do I/O from several at a
time. ]

"Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
news:87r7mgzo8x.fsf@igloo.phubuh.org...

"Robert Klemme" <bob.news@gmx.net> writes:

> "Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
> news:87vfbszr5m.fsf@igloo.phubuh.org...
>
> > Blocking I/O is really easy to use. But when you use it to write
> > servers, you run into problems: you can't run two blocking syscalls
> > simultaneously. So if you're writing a huge file to some guy, every
> > other client is stalled, and no one new can connect. Unacceptable,
> > for many types of servers. They need non-blocking I/O.
> >
> > Non-blocking I/O is a lot more annoying to use. Instead of going
>
> <snip/>
>
> > It'll run in one thread. Multiplexer handles the select(3) calls.
>
> What is the advantage over a solution with threads? IOW, why should I
> use multiplexer over individual threads per connection?

Since Ruby's threads aren't native, you can't do I/O from several at a
time.

That's not true.

So one IO#read call blocking for a long time will block the other
threads, too.

Also wrong: execute this script

tickers = [$stdout, $stderr].map do |io|
  Thread.new do
    100.times do |i|
      io.puts "#{io.fileno}: #{Time.now}: Tick #{i}"
      sleep 1
    end
  end
end

puts "PROMPT"
# blocks in next line:
input = gets
puts "ENTERED #{input}"

tickers.each {|th| th.join }

16:39:44 [ruby]: ruby ticker.rb
1: Fri Nov 26 17:40:05 GMT+2:00 2004: Tick 0
2: Fri Nov 26 17:40:05 GMT+2:00 2004: Tick 0
PROMPT
2: Fri Nov 26 17:40:06 GMT+2:00 2004: Tick 1
1: Fri Nov 26 17:40:06 GMT+2:00 2004: Tick 1
2: Fri Nov 26 17:40:07 GMT+2:00 2004: Tick 2
1: Fri Nov 26 17:40:07 GMT+2:00 2004: Tick 2
2: Fri Nov 26 17:40:08 GMT+2:00 2004: Tick 3
1: Fri Nov 26 17:40:08 GMT+2:00 2004: Tick 3
foo
ENTERED foo
2: Fri Nov 26 17:40:09 GMT+2:00 2004: Tick 4
1: Fri Nov 26 17:40:09 GMT+2:00 2004: Tick 4
2: Fri Nov 26 17:40:10 GMT+2:00 2004: Tick 5
1: Fri Nov 26 17:40:10 GMT+2:00 2004: Tick 5
2: Fri Nov 26 17:40:11 GMT+2:00 2004: Tick 6
1: Fri Nov 26 17:40:11 GMT+2:00 2004: Tick 6
2: Fri Nov 26 17:40:12 GMT+2:00 2004: Tick 7
1: Fri Nov 26 17:40:12 GMT+2:00 2004: Tick 7

You could loop a read with a time-out, I guess. But with
a single thread running select, the whole process can stall completely
while waiting for I/O. And it's more elegant. :slight_smile:

IMHO threads are more elegant.

Regards

    robert

In article <87r7mgzo8x.fsf@igloo.phubuh.org>,

···

Mikael Brockman <mikael@phubuh.org> wrote:

"Robert Klemme" <bob.news@gmx.net> writes:

"Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
news:87vfbszr5m.fsf@igloo.phubuh.org...

> Blocking I/O is really easy to use. But when you use it to write
> servers, you run into problems: you can't run two blocking syscalls
> simultaneously. So if you're writing a huge file to some guy, every
> other client is stalled, and no one new can connect. Unacceptable,
> for many types of servers. They need non-blocking I/O.
>
> Non-blocking I/O is a lot more annoying to use. Instead of going

<snip/>

> It'll run in one thread. Multiplexer handles the select(3) calls.

What is the advantage over a solution with threads? IOW, why should I
use multiplexer over individual threads per connection?

Since Ruby's threads aren't native, you can't do I/O from several at a
time. So one IO#read call blocking for a long time will block the other
threads, too. You could loop a read with a time-out, I guess. But with
a single thread running select, the whole process can stall completely
while waiting for I/O. And it's more elegant. :slight_smile:

But if I'm not mistaken, isn't callcc implemented with threads in Ruby?

Phil

"Robert Klemme" <bob.news@gmx.net> writes:

"Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
news:87r7mgzo8x.fsf@igloo.phubuh.org...
> "Robert Klemme" <bob.news@gmx.net> writes:
>
> > "Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
> > news:87vfbszr5m.fsf@igloo.phubuh.org...
> >
> > > Blocking I/O is really easy to use. But when you use it to
> > > write servers, you run into problems: you can't run two blocking
> > > syscalls simultaneously. So if you're writing a huge file to
> > > some guy, every other client is stalled, and no one new can
> > > connect. Unacceptable, for many types of servers. They need
> > > non-blocking I/O.
> > >
> > > Non-blocking I/O is a lot more annoying to use. Instead of
> > > going
> >
> > <snip/>
> >
> > > It'll run in one thread. Multiplexer handles the select(3) calls.
> >
> > What is the advantage over a solution with threads? IOW, why
> > should I use multiplexer over individual threads per connection?
>
> Since Ruby's threads aren't native, you can't do I/O from several at
> a time.

That's not true.

> So one IO#read call blocking for a long time will block the other
> threads, too.

Also wrong: execute this script
[snip]

Sorry: faulty generalization. The problem is demonstrated in this
script:

require 'socket'

server = TCPServer.new 12345

t_a = Thread.start do
  a = server.accept
  data = "foo" * 10000000
  a << data
  a.close
end

b = server.accept
b << "b"
b.close

t_a.join

The second client isn't accepted until the huge batch of data is sent.
I guess you could solve the problem by splitting it into a bunch of
smaller batches.

Another appeal could be that by keeping it single-threaded, you have
fewer concurrency issues to worry about.

ptkwt@aracnet.com (Phil Tomson) writes:

In article <87r7mgzo8x.fsf@igloo.phubuh.org>,
>"Robert Klemme" <bob.news@gmx.net> writes:
>
>> "Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
>> news:87vfbszr5m.fsf@igloo.phubuh.org...
>>
>> > Blocking I/O is really easy to use. But when you use it to write
>> > servers, you run into problems: you can't run two blocking syscalls
>> > simultaneously. So if you're writing a huge file to some guy, every
>> > other client is stalled, and no one new can connect. Unacceptable,
>> > for many types of servers. They need non-blocking I/O.
>> >
>> > Non-blocking I/O is a lot more annoying to use. Instead of going
>>
>> <snip/>
>>
>> > It'll run in one thread. Multiplexer handles the select(3) calls.
>>
>> What is the advantage over a solution with threads? IOW, why should I
>> use multiplexer over individual threads per connection?
>
>Since Ruby's threads aren't native, you can't do I/O from several at a
>time. So one IO#read call blocking for a long time will block the other
>threads, too. You could loop a read with a time-out, I guess. But with
>a single thread running select, the whole process can stall completely
>while waiting for I/O. And it's more elegant. :slight_smile:
>

But if I'm not mistaken, isn't callcc implemented with threads in Ruby?

It is, but I don't think that several ``continuation threads'' run
concurrently. In any case, there is definitely only one thread doing
the I/O.

···

Mikael Brockman <mikael@phubuh.org> wrote:

"Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
news:87brdkzjd8.fsf@igloo.phubuh.org...

"Robert Klemme" <bob.news@gmx.net> writes:

> "Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
> news:87r7mgzo8x.fsf@igloo.phubuh.org...
> > "Robert Klemme" <bob.news@gmx.net> writes:
> >
> > > "Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
> > > news:87vfbszr5m.fsf@igloo.phubuh.org...
> > >
> > > > Blocking I/O is really easy to use. But when you use it to
> > > > write servers, you run into problems: you can't run two blocking
> > > > syscalls simultaneously. So if you're writing a huge file to
> > > > some guy, every other client is stalled, and no one new can
> > > > connect. Unacceptable, for many types of servers. They need
> > > > non-blocking I/O.
> > > >
> > > > Non-blocking I/O is a lot more annoying to use. Instead of
> > > > going
> > >
> > > <snip/>
> > >
> > > > It'll run in one thread. Multiplexer handles the select(3)

calls.

> > >
> > > What is the advantage over a solution with threads? IOW, why
> > > should I use multiplexer over individual threads per connection?
> >
> > Since Ruby's threads aren't native, you can't do I/O from several at
> > a time.
>
> That's not true.
>
> > So one IO#read call blocking for a long time will block the other
> > threads, too.
>
> Also wrong: execute this script
> [snip]

Sorry: faulty generalization. The problem is demonstrated in this
script:

> require 'socket'
>
> server = TCPServer.new 12345
>
> t_a = Thread.start do
> a = server.accept
> data = "foo" * 10000000
> a << data
> a.close
> end
>
> b = server.accept
> b << "b"
> b.close
>
> t_a.join

The second client isn't accepted until the huge batch of data is sent.
I guess you could solve the problem by splitting it into a bunch of
smaller batches.

Another appeal could be that by keeping it single-threaded, you have
fewer concurrency issues to worry about.

The typical pattern for TCPserver looks different: You create a thread per
accepted connection.

require 'socket'

server = TCPServer.new 12345

loop do
  Thread.new(server.accept) do |a|
    begin
      data = "foo" * 10000000
      a << data
    ensure
      a.close
    end
  end
end

Regards

    robert

"Robert Klemme" <bob.news@gmx.net> writes:

"Mikael Brockman" <mikael@phubuh.org> schrieb im Newsbeitrag
news:87brdkzjd8.fsf@igloo.phubuh.org...

The typical pattern for TCPserver looks different: You create a thread per
accepted connection.

require 'socket'

server = TCPServer.new 12345

loop do
  Thread.new(server.accept) do |a|
    begin
      data = "foo" * 10000000
      a << data
    ensure
      a.close
    end
  end
end

You're right, but I get the same results. Only one client at a time is
accepted.

"Robert Klemme" <bob.news@gmx.net> writes:

> server = TCPServer.new 12345
>
> loop do
> Thread.new(server.accept) do |a|
> begin
> data = "foo" * 10000000
> a << data
> ensure
> a.close
> end
> end
> end

You're right, but I get the same results. Only one client at a time is
accepted.

Strange... I'm surprised there are too many differences between
your select multiplexer using continuations and a threads solution,
since: a) ruby uses select() to multiplex behind the scenes when
multiple threads are doing I/O; and b) ruby continuations are
implemented using threads.

That said I haven't studied your continuations solution so maybe
my surprise is misplaced...

But I wrote this simplistic threaded socket IO performance test
last month http://bwk.homeip.net/ftp/ruby/tcptest.rb and I've
watched it accept and handle 100+ clients without delay. Each
client is pushing bytes as fast as it can (at the specified
chunk size), and each server thread handling a client is in
return replying with bytes as fast as it can.

. . . Just chiming in in case this is somehow helpful... if not,
please disregard. =D

Regards,

Bill

···

From: "Mikael Brockman" <mikael@phubuh.org>

Until one thread terminates a system call like IO-related task, other
threads will be blocked. Kernel does not know ruby's (userland)
threads, so if your application needs concurrency in massive IO tasks,
maybe you should implement a kind of thread scheduler by yourself. OK,
Kernel#fork will be an another choice.

If 'thread' is only sufficient for all, why did Sun adopt NIO in Java2 1.4?

Run following server example, and execute 'telnet localhost 31337' on
two each shell, no client connection will be blocked.

<code>
require 'multiplexer'

class Test2 < Multiplexer::Handler
  def initialize
    super()
  end

  def handle
    begin
      write_line "ooo" * 10000000
    rescue EOFError
      puts "My client closed its socket."
    end
  end
end

def test
  multiplexer = Multiplexer::Multiplexer.new 0.5
  multiplexer.client_data = []
  multiplexer.listen 31337, Test2
  puts "Listening on port 31337."
  multiplexer.run
end

if $0 == __FILE__
  test
end
</code>

IMHO, multiplexinig is a good choice for concurrent programming in
ruby, insofar ruby VM does not support native threads.

Best regards,

···

--
http://nohmad.sub-port.net

In article <01ec01c4d3e3$68209820$6442a8c0@musicbox>,
  "Bill Kelly" <billk@cts.com> writes:

Strange... I'm surprised there are too many differences between
your select multiplexer using continuations and a threads solution,
since: a) ruby uses select() to multiplex behind the scenes when
multiple threads are doing I/O; and b) ruby continuations are
implemented using threads.

Because write(2) system call blocks until whole data is written unless
nonblocking flag is set, even after select notify the fd is writable.

The multiplexer works well because it sets nonblocking flag.
The threaded server also works well if it sets nonblocking flag.

Try:

require 'socket'
require 'fcntl'

server = TCPServer.new 12345

t_a = Thread.start do
  a = server.accept
  a.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK)
  data = "foo" * 10000000
  a << data
  a.close
end

b = server.accept
b.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK)
b << "b"
b.close

t_a.join

However nonblocking flag with stdio buffering may cause data lost...

···

--
Tanaka Akira

Hi,

Because write(2) system call blocks until whole data is written unless
nonblocking flag is set, even after select notify the fd is writable.

I've always used send/recv for my socket I/O. I *think* I don't
have a problem with blocking, even without NONBLOCK set. (Excepting
rare circumstances as are possible with accept(), etc.)

Does that sound right? Or should I be seeing blocking problems
even using send/recv ?

Thanks,

Regards,

Bill

···

From: "Tanaka Akira" <akr@m17n.org>

"Gyoung-Yoon Noh" <nohmad@gmail.com> schrieb im Newsbeitrag news:ff903fda04112610215fe2dc03@mail.gmail.com...

Until one thread terminates a system call like IO-related task, other
threads will be blocked. Kernel does not know ruby's (userland)
threads, so if your application needs concurrency in massive IO tasks,
maybe you should implement a kind of thread scheduler by yourself. OK,
Kernel#fork will be an another choice.

If 'thread' is only sufficient for all, why did Sun adopt NIO in Java2 1.4?

Because threads have a certain overhead and Sun wanted to provide a means to write high performance (i.e. highly optimized) servers. Apart from that I think they wanted to make the IO architecture more flexible and modular.

That said, a solution with threads and blocking IO is still more elegant IMHO. It's just that for some circumstances this is not the right solution.

Kind regards

    robert

In article <024601c4d40a$674fc8e0$6442a8c0@musicbox>,
  "Bill Kelly" <billk@cts.com> writes:

I've always used send/recv for my socket I/O. I *think* I don't
have a problem with blocking, even without NONBLOCK set. (Excepting
rare circumstances as are possible with accept(), etc.)

Does that sound right? Or should I be seeing blocking problems
even using send/recv ?

You'll see blocking problem with send when you send huge data.

Linux man page of send(2) says:

      When the message does not fit into the send buffer of the socket, send
      normally blocks, unless the socket has been placed in non-blocking I/O
      mode. In non-blocking mode it would return EAGAIN in this case. The
      select(2) call may be used to determine when it is possible to send
      more data.

Try:

require 'socket'
require 'fcntl'

server = TCPServer.new 12345

t_a = Thread.start do
  a = server.accept
  data = "foo" * 10000000
  a.send data, 0
  a.close
end

b = server.accept
b.send "b", 0
b.close

t_a.join

···

--
Tanaka Akira

"Robert Klemme" <bob.news@gmx.net> writes:

That said, a solution with threads and blocking IO is still more
elegant IMHO. It's just that for some circumstances this is not the
right solution.

I agree. What I find inelegant is threads with non-blocking I/O and
kludges like splitting the data into several system calls.

This problem is pretty easy to work around in threaded servers.

Realistically, it's usually one thread passing the data to send into a thread (or thread pool) that just does the writing, in most production servers I work with at least. The server I'm building right now uses thread safe queues to handle this communication. It's trivial to add a data length check, bust up the data, and queue multiple prints. Just remember to do that inside a synchronized block, to ensure that all messages are queued back-to-back.

James Edward Gray II

···

On Nov 26, 2004, at 7:56 PM, Tanaka Akira wrote:

You'll see blocking problem with send when you send huge data.

Hi,

You'll see blocking problem with send when you send huge data.

Linux man page of send(2) says:

> When the message does not fit into the send buffer of the socket, send
> normally blocks, unless the socket has been placed in non-blocking I/O
> mode. In non-blocking mode it would return EAGAIN in this case. The
> select(2) call may be used to determine when it is possible to send
> more data.

Thank you for your patience. I've been wrong about that
for years.

Now I'm suddenly extremely interested in the non-blocking
extension for Win32 ... :slight_smile:

Regards,

Bill

···

From: "Tanaka Akira" <akr@m17n.org>