[ANN] async-http hits 100,000 req/s

I've been working on a stack of gems (async, async-io and async-http)
as a proof of concept that Ruby can do fast async networking similar
to Node - and perhaps even better in some cases.

I have finally got async-http to the point where, as a
proof-of-concept, I think it's validated at least part of the above
statement.



On my desktop, I can get a nominal throughput of between 30,000 req/s
and 100,000 req/s with 4 cores/8 processes. The first number is for
discrete connections while the second is for keep-alive connections,
so discrete connections manage only about a third of the throughput of
keep-alive connections. I'd like to see higher performance but I'm
still trying to figure out how to benchmark it. I broke RubyProf
trying to do this.

Fortunately, keep-alive is well supported by nginx which is what you'd
definitely want to use as a front-end anyway.

As well as performance, each individual request runs within a fiber.
This has both benefits and drawbacks. The drawback is that you don't
want to do computationally intensive work while generating a response,
as you'll increase the latency of all other requests being handled by
that server process. The benefits are that computationally intensive
work can easily be farmed off to a thread pool, another process, etc.,
and that I/O-blocking workloads (e.g. HTTP RPC) will work
transparently (provided you use async-capable IO instances) without
blocking the server.
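As a rough illustration of the model (a toy sketch, not how async is actually implemented): each request's fiber yields an IO object back to a select loop whenever a read would block, and the loop resumes it once the IO is readable.

```ruby
r, w = IO.pipe

received = +""

reader = Fiber.new do
  loop do
    chunk = r.read_nonblock(1024, exception: false)
    if chunk == :wait_readable
      Fiber.yield(r)          # park until the scheduler says r is readable
    elsif chunk.nil?
      break                   # EOF: writer closed its end
    else
      received << chunk
    end
  end
end

w.write("hello from the writer\n")
w.close

# Toy scheduler: resume the fiber; whenever it yields an IO, select on it.
while reader.alive?
  io = reader.resume
  IO.select([io]) if io && reader.alive?
end

p received
```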

Anyway, I thought others might be interested in this. There is a long
way to go to a 1.0 release but I think this is useful.

I've been thinking about what would make this kind of model faster -
in terms of how Ruby 3.0 might be changed to support this design. Here
is a complete brain dump of the various things I've been thinking
about:

- IO objects are inherently pretty heavy, calling read_nonblock and
write_nonblock is a mess. Internally, different systems do different
things (e.g. Net::HTTP manually implements timeouts by calling
read_nonblock and then wait_readable/wait_writable). The entire IO
system of Ruby is geared towards threads, but threads perform very
very poorly in MRI. By far the best design is multi-process for a ton
of different reasons.
- IO objects expose a lot of behaviour which is irrelevant to most
use-cases (io/console, io/nonblock which doesn't seem to work at all).
This makes it hard to provide a clean high-level interface.
- All IO operations should be non-blocking, with a super fast/simple API.
APIs which take complex lists of arguments in the hot path should be
avoided (exception: true, for example). A separate function for
blocking and non-blocking IO is a huge cop-out.
- TCPServer/TCPSocket/UDPSocket/UNIXServer/UNIXSocket are all broken
by design. This also includes, to some extent, the Addrinfo class. It's hard to
provide non-blocking behaviour because so many things will turn a
string hostname into an IP address, calling `getaddrinfo`.
- The `send` method for `IO` confusingly breaks `Object#send`. It
should just be `sendmsg` and `recvmsg` for datagrams and
`read`/`write` for streams. `UDPSocket#recv` and `Socket#recv` do
different things which is confusing.
- Fibers are fast, but I think they need to be *the* first class
concurrency construct in Ruby and made as fast as possible. I heard
that calling resume on a fiber does a syscall (if this is the case it
should be removed if possible).
- Threads as they are currently implemented should be removed from
Ruby 3.0 - they actually make for a very poor concurrency concept,
considering the GIL. They make all other operations more complex with
no real benefit given how they are currently implemented. Reasoning
about threads is bloody hard. It's even worse that the GIL hides a lot
of broken behaviour. What are threads useful for? IO concurrency? yes
but it's poor performing. Computational concurrency? not unless you
use JRuby, Rubinius, and even then, my experience with those platforms
has generally been sub-par.
- It's hard to reason about strings because the encoding may change at
any time. Currently, if you have a string with binary encoding and
append a UTF-8 string, its encoding will change and this breaks all
future operations (if you are assuming it's still a binary string).
Leading to hacks like
async-io/lib/async/io/binary_string.rb at main · socketry/async-io · GitHub
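For example, the silent re-tagging looks like this (assuming a UTF-8 source file):

```ruby
# Appending UTF-8 data containing non-ASCII characters to an ASCII-only
# binary string changes the receiver's encoding in place.
buf = "abc".b
p buf.encoding        # => #<Encoding:ASCII-8BIT>

buf << "café"         # UTF-8 literal
p buf.encoding        # => #<Encoding:UTF-8>

# Forcing the appended data to binary first keeps the buffer binary:
buf2 = "abc".b
buf2 << "café".b
p buf2.encoding       # => #<Encoding:ASCII-8BIT>
```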

I think that Ruby 3.0 should
- either remove the GIL, or remove Thread.
- simplify IO classes and allow permanent non-blocking mode (e.g.
io.nonblocking = true; io.read gives data or :wait_readable).
- ensure that Fiber is as fast as possible (creation, scheduling, etc).
- remove broken-by-design IO related classes (move to gem for
backwards compatibility?)
- read/write should be able to append to a byte string efficiently. A
byte buffer designed for fast append, index and fast slice! is a must
for most high-level protocols.
- perhaps support readv and writev under the hood - this would allow
you to write an array of buffers without needing to concatenate them
which saves on syscalls/memcpy.
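On the writev point, newer Rubies expose part of this already: IO#write accepts multiple arguments as of Ruby 2.5, and MRI can use writev(2) under the hood, so a sketch of the idea looks like:

```ruby
# Writing several buffers with one call, without concatenating them in Ruby
# (a manual `header + body` would allocate and memcpy a combined string).
header = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\n"
body   = "hello"

r, w = IO.pipe
w.write(header, body)   # one call, two buffers
w.close

combined = r.read
p combined              # the header followed by "hello"
```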

Kind regards,
Samuel

I've been working on a stack of gems (async, async-io and async-http)
as a proof of concept that Ruby can do fast async networking similar
to Node - and perhaps even better in some cases.

I have finally got async-http to the point where, as a
proof-of-concept, I think it's validated at least part of the above
statement.

GitHub - socketry/async: An awesome asynchronous event-driven reactor for Ruby.
GitHub - socketry/async-io: Concurrent wrappers for native Ruby IO & Sockets.
GitHub - socketry/async-http

On my desktop, I can get a nominal throughput of between 30,000 req/s
and 100,000 req/s with 4 cores/8 processes. The first number is for
discrete connections while the second is for keep-alive connections,
so discrete connections manage only about a third of the throughput of
keep-alive connections. I'd like to see higher performance but I'm
still trying to figure out how to benchmark it. I broke RubyProf
trying to do this.

Cool.

Fortunately, keep-alive is well supported by nginx which is what you'd
definitely want to use as a front-end anyway.

cmogstored could be a hair faster than nginx, even for small
responses(*). It was mainly designed to avoid pathological
cases in nginx when serving static files off multiple
filesystems/devices (as opposed to all-out speed); but I've
found it a decent benchmarking tool, too:

  https://bogomips.org/cmogstored/README
  https://bogomips.org/cmogstored/INSTALL

If you use FreeBSD, it's also in the ports tree (but not yet in
any GNU/Linux distros). It understands HTTP/1.1 and you can get
started without installing it, just building it and doing:

  ./cmogstored --docroot=/path/to/static/

That listens on all addresses on port 7500 by default, so to get
"FOO" in /path/to/static, you hit: http://127.0.0.1:7500/FOO
You can add "-W $NPROC" to use multiple worker processes in case FD
allocation in the kernel becomes a problem. "-W" is undocumented
for MogileFS users, since I don't want to break compatibility for
people using the original Perl mogstored, and I haven't found FD
allocation contention to be a problem in the
real world.

(*) I'm also completely slacking off by using Ragel to parse
    and snprintf(!) to generate response headers.

As well as performance, each individual request runs within a fiber.
This has both benefits and drawbacks. The drawback is that you don't
want to do computationally intensive work while generating a response,
as you'll increase the latency of all other requests being handled by
that server process. The benefits are that computationally intensive
work can easily be farmed off to a thread pool, another process, etc.,
and that I/O-blocking workloads (e.g. HTTP RPC) will work
transparently (provided you use async-capable IO instances) without
blocking the server.

Anyway, I thought others might be interested in this. There is a long
way to go to a 1.0 release but I think this is useful.

Cool. Thanks for sharing this; even if there's stuff below
I completely disagree with :)

I've been thinking about what would make this kind of model faster -
in terms of how Ruby 3.0 might be changed to support this design. Here
is a complete brain dump of the various things I've been thinking
about:

- IO objects are inherently pretty heavy, calling read_nonblock and
write_nonblock is a mess. Internally, different systems do different
things (e.g. Net::HTTP manually implements timeouts by calling
read_nonblock and then wait_readable/wait_writable). The entire IO
system of Ruby is geared towards threads, but threads perform very
very poorly in MRI. By far the best design is multi-process for a ton
of different reasons.

Yes, IO objects are annoyingly big :<

Threads actually perform great for high throughput situations;
but yes, they're too big for dealing with network latency.

- IO objects expose a lot of behaviour which is irrelevant to most
use-cases (io/console, io/nonblock which doesn't seem to work at all).
This makes it hard to provide a clean high-level interface.

I'm not sure what you mean by "doesn't seem to work at all"

- All IO operations should be non-blocking, with a super fast/simple API.
APIs which take complex lists of arguments in the hot path should be
avoided (exception: true, for example). A separate function for
blocking and non-blocking IO is a huge cop-out.

NAK. I find value in using blocking accept/accept4 syscalls
(not emulating blocking with green threads/fibers + epoll/kqueue;
not even with EPOLLEXCLUSIVE)

TL; DR: I have studied the Linux kernel a bit and know
how to take advantage of it ---

This is because some blocking syscalls can take advantage of
"wake one" behavior in the Linux kernel to avoid thundering
herds. EPOLLEXCLUSIVE was added a few years ago to Linux to
appease some epoll users; but it's still worse for load
distribution at high accept rates. I'd rather embrace the fact
that epoll (and kqueue) themselves are (and must be) MT-friendly.

Similarly to accept, UNIXSocket#recv_io has the same behavior
with blocking recvmsg when the receiving socket is shared
between multiple processes.

cmogstored takes advantage of this "wake one" behavior by having
dedicated accept threads, so for users using the undocumented
"-W" option, it gives nearly perfect load balancing of
connections between workers.
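A minimal Ruby sketch of that pattern (POSIX-only, since it forks): several workers block in accept(2) on one shared listener, and the kernel wakes exactly one per incoming connection.

```ruby
require 'socket'

server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

# Two forked workers share the listener; each blocks in accept(2).
pids = 2.times.map do
  fork do
    client = server.accept            # kernel picks exactly one worker
    client.write("handled by #{Process.pid}\n")
    client.close
    exit!(0)
  end
end

# Two connections; each is served by whichever worker the kernel woke.
replies = 2.times.map do
  TCPSocket.open("127.0.0.1", port, &:read)
end

pids.each { |pid| Process.wait(pid) }
server.close
p replies
```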

Furthermore, non-blocking I/O on regular files and directories
does not exist in any portable or complete way on *nix
platforms. Threads (and processes)[2] are the only reasonable
ways to handle regular files and directories; even on NFS and other
network filesystems.

[2] inside Linux, they're both "tasks" with different levels of
    sharing; the clone(2) manpage might be helpful to understand
    this.

- TCPServer/TCPSocket/UDPSocket/UNIXServer/UNIXSocket are all broken
by design. This also includes somewhat Addrinfo class. It's hard to
provide non-blocking behaviour because so many things will turn a
string hostname into an IP address, calling `getaddrinfo`.

Perhaps resolv-replace.rb in the stdlib can make auto-Fiber
more useful (see below). And ruby-core could probably use some
help maintaining it, it's largely forgotten since the 1.8 days when
it was useful with green Threads. But yeah, getaddrinfo(3)
(along with all the other standardized name resolution APIs
before it) in the C standard library is a disaster for scalability.

- The `send` method for `IO` confusingly breaks `Object#send`. It
should just be `sendmsg` and `recvmsg` for datagrams and
`read`/`write` for streams. `UDPSocket#recv` and `Socket#recv` do
different things which is confusing.

`send` for streams is useful if you want to specify flags like
MSG_MORE and/or MSG_DONTWAIT. These flags are superior to
changing socket state via fcntl and setsockopt since they
require fewer syscalls and avoid races if multiple threads
operate on the same socket.
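For example (MSG_DONTWAIT here; availability of the flag is platform-dependent, Linux in particular):

```ruby
require 'socket'

a, b = Socket.pair(:UNIX, :STREAM)

# Flags are per-call: this send is non-blocking even though the socket's
# O_NONBLOCK state is untouched, so no fcntl round-trips and no races
# with other threads using the same socket.
a.send("ping", Socket::MSG_DONTWAIT)

msg = b.recv(4)
p msg   # => "ping"
```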

Instead, we should be (and I think already have been) downplaying
Object#send and encouraging Object#__send__ instead.

Also, I wish MSG_DONTWAIT were available for all files:

  https://cr.yp.to/unix/nonblock.html

- Fibers are fast, but I think they need to be *the* first class
concurrency construct in Ruby and made as fast as possible. I heard
that calling resume on a fiber does a syscall (if this is the case it
should be removed if possible).

We're working on auto-scheduling Fibers for 2.5:

  https://bugs.ruby-lang.org/issues/13618
  (but API design is hard and not my department)

As far as syscalls, it should be possible to recycle Fiber stacks
(like we do with Thread stacks) to avoid mmap/mprotect/munmap.
Maybe ko1 is working on that, too...

We shouldn't need to save+restore signal masks since Ruby
doesn't change them at normal runtime; but I think that's being
done behind our backs by the *context library calls. We may
need to use setjmp/longjmp directly instead of
(make/get/swap)context.

- Threads as they are currently implemented should be removed from
Ruby 3.0 - they actually make for a very poor concurrency concept,
considering the GIL. They make all other operations more complex with
no real benefit given how they are currently implemented. Reasoning
about threads is bloody hard. It's even worse that the GIL hides a lot
of broken behaviour. What are threads useful for? IO concurrency? yes
but it's poor performing. Computational concurrency? not unless you
use JRuby, Rubinius, and even then, my experience with those platforms
has generally been sub-par.

Again, native threads are useful for filesystem I/O, despite the GVL.

I wish threads could be more useful by releasing GVL for readdir and
stat operations; but releasing+acquiring GVL is expensive :<
Short term, I might complete my attempts to make GVL faster for 2.5.

- It's hard to reason about strings because the encoding may change at
any time. Currently if you have a string with binary encoding, and
append a UTF-8 string, its encoding will change and this breaks all
future operations (if you are assuming it's still a binary string).
Leading to hacks like
async-io/lib/async/io/binary_string.rb at main · socketry/async-io · GitHub

*shrug* I make all my strings binary as soon as I get them.
But I'm just a simple *nix plumber; everything is a bunch of
bytes to me; even processes/threads/fibers.

I think that Ruby 3.0 should
- either remove the GIL, or remove Thread.

The former would be nice :) As has been mentioned by others;
doing it without hurting single-thread performance is the hard
part.

I'm still hopeful we can take advantage of liburcu and steal
more ideas from the Linux kernel (unfortunately, Ruby did not go
to GPL-2+ back in the day), but liburcu is LGPL-2.1+ and we
already use libgmp optionally.

- simplify IO classes and allow permanent non-blocking mode (e.g.
io.nonblocking = true; io.read gives data or :wait_readable).

That's backwards-incompatible and I'd rather we keep using
*_nonblock. In Ruby 2.5, *_nonblock will take advantage of
MSG_DONTWAIT and avoid unnecessary fcntl for sockets under
Linux: Feature #13362: [PATCH] socket: avoid fcntl for read/write_nonblock on Linux - Ruby master - Ruby Issue Tracking System

- ensure that Fiber is as fast as possible (creation, scheduling, etc).

AFAIK, ko1 is working on it for 2.5

- remove broken-by-design IO related classes (move to gem for
backwards compatibility?)

Make them skinnier, yes. I'm not sure how we can remove them
and dividing up core functionality makes it more difficult to
maintain.

- read/write should be able to append to a byte string efficiently. A
byte buffer designed for fast append, index and fast slice! is a must
for most high-level protocols.

Perhaps offsets to IO read/write operations can do this:

  Feature #11484: add output offset for readpartial/read_nonblock/etc - Ruby master - Ruby Issue Tracking System

For those familiar with Perl5, Perl has had sysread and syswrite
functions which are capable of taking offsets to avoid unnecessary
copying.
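For comparison, Ruby's existing (offset-less) version of this: most read methods already accept an output buffer, so a single String can be reused across reads. Contents are replaced rather than appended, which is exactly the gap the offset proposal targets.

```ruby
r, w = IO.pipe
w.write("abcdef")
w.close

# One pre-sized String reused for every read; no per-read allocation.
buf = String.new(capacity: 4096)

r.readpartial(3, buf)
first = buf.dup          # => "abc"
r.readpartial(3, buf)    # same String object; contents replaced, not appended
p [first, buf]           # => ["abc", "def"]
```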

I'm not sure how the API would be done in Ruby, though;
and kwargs is deficient in our current C API:

  Feature #13434: better method definition in C API - Ruby master - Ruby Issue Tracking System

(I consider 13434 higher priority)

- perhaps support readv and writev under the hood - this would allow
you to write an array of buffers without needing to concatenate them
which saves on syscalls/memcpy.

Agreed for writev. No idea how a readv API would even work for Ruby...

Anyways, thank you for telling us your concerns. ruby-core will
try to do our best to improve Ruby without breaking existing code.


Samuel Williams <space.ship.traveller@gmail.com> wrote:

Eric, thanks so much for the detailed reply and understanding the
intent of my original message so well.

As you've been so generous to me with your reply, I'm going to try to
do the same for you.

Cool. Thanks for sharing this; even if there's stuff below
I completely disagree with :)

It wouldn't be a good discussion if everyone agreed with each other :)

Threads actually perform great for high throughput situations;
but yes, they're too big for dealing with network latency.

I hear what you are saying. From my point of view, the problem with
the GIL/Threads is that you essentially get all the problems of
Threads with non of the benefits. It's simply impossible for two pure
ruby functions to execute at the same time in MRI. The only point is
for IO multiplexing and it's really not a great solution, with large
numbers of inflight requests being the main concern.

- IO objects expose a lot of behaviour which is irrelevant to most
use-cases (io/console, io/nonblock which doesn't seem to work at all).
This makes it hard to provide a clean high-level interface.

I'm not sure what you mean by "doesn't seem to work at all"

[1] pry(main)> require 'io/nonblock'
[2] pry(main)> i, o = IO.pipe
=> [#<IO:fd 11>, #<IO:fd 12>]
[3] pry(main)> i.nonblock?
=> false
[4] pry(main)> i.nonblock = true
=> true
[5] pry(main)> i.nonblock?
=> true
[6] pry(main)> i.read
asdf
^CInterrupt:
from (pry):6:in `read'
[7] pry(main)> i.read(1024)
^CInterrupt:
from (pry):7:in `read'
[8] pry(main)> i.read_nonblock(1024)
IO::EAGAINWaitReadable: Resource temporarily unavailable - read would block
from <internal:prelude>:77:in `__read_nonblock'

I would have assumed line 6 should behave the same as line 8, but
perhaps I just don't understand how that API works. The documentation
is very sparse.
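For reference, the keyword form (available since Ruby 2.3) sidesteps the exception dance entirely, which shows both current styles side by side:

```ruby
r, w = IO.pipe

# Default form: raises IO::EAGAINWaitReadable on an empty pipe.
begin
  r.read_nonblock(1024)
rescue IO::EAGAINWaitReadable
  status = :raised
end

# Keyword form: returns :wait_readable instead of raising.
result = r.read_nonblock(1024, exception: false)

p [status, result]   # => [:raised, :wait_readable]
```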

- All IO operations should be non-blocking, with a super fast/simple API.
APIs which take complex lists of arguments in the hot path should be
avoided (exception: true, for example). A separate function for
blocking and non-blocking IO is a huge cop-out.

NAK. I find value in using blocking accept/accept4 syscalls
(not emulating blocking with green threads/fibers + epoll/kqueue;
not even with EPOLLEXCLUSIVE)

TL; DR: I have studied the Linux kernel a bit and know
how to take advantage of it ---

This is because some blocking syscalls can take advantage of
"wake one" behavior in the Linux kernel to avoid thundering
herds. EPOLLEXCLUSIVE was added a few years ago to Linux to
appease some epoll users; but it's still worse for load
distribution at high accept rates. I'd rather embrace the fact
that epoll (and kqueue) themselves are (and must be) MT-friendly.

Similarly to accept, UNIXSocket#recv_io has the same behavior
with blocking recvmsg when the receiving socket is shared
between multiple processes.

Yes, I looked at this.

I'm not convinced it's the right way to write a high performance server.

Using SO_REUSEPORT, you can simply spin up as many processes as you
like, each listening on the same socket. The OS determines which
process the request goes to.

It's currently broken on macOS, but works beautifully and scales
magnificently on Linux. In theory it also works on BSD.
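A minimal sketch of that setup (Linux 3.9+, or other platforms with SO_REUSEPORT; the option must be set before bind):

```ruby
require 'socket'

# Each worker creates its own listener bound to the same address/port;
# the kernel load-balances incoming connections between them.
def reuseport_listener(port)
  sock = Socket.new(:INET, :STREAM)
  sock.setsockopt(:SOCKET, :REUSEPORT, true)
  sock.bind(Addrinfo.tcp("127.0.0.1", port))
  sock.listen(128)
  sock
end

first  = reuseport_listener(0)        # let the kernel pick a free port
port   = first.local_address.ip_port
second = reuseport_listener(port)     # same port: no EADDRINUSE raised

p port
[first, second].each(&:close)
```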

Furthermore, non-blocking I/O on regular files and directories
does not exist in any portable or complete way on *nix
platforms. Threads (and processes)[2] are the only reasonable
ways to handle regular files and directories; even on NFS and other
network filesystems.

[2] inside Linux, they're both "tasks" with different levels of
    sharing; the clone(2) manpage might be helpful to understand
    this.

Yes, it's an interesting conundrum - avoiding blocking may simply be
an impossible goal. Actually, with pre-emptive multi-tasking, that's
basically a given.

However, we can avoid it for most common operations, which is a good
start. In practice, thread pools (e.g. as used in libuv for blocking
operations like getaddrinfo) might solve the majority of problems.
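A toy sketch of such a pool (this is an illustration, not libuv's actual design): worker threads absorb the blocking getaddrinfo(3) calls so the caller never blocks on DNS.

```ruby
require 'socket'

jobs    = Queue.new
results = Queue.new

# Four worker threads perform the blocking lookups.
pool = 4.times.map do
  Thread.new do
    while (host = jobs.pop)
      results << [host, Addrinfo.ip(host).ip_address]
    end
  end
end

jobs << "localhost"
host, addr = results.pop
p host                    # => "localhost"

4.times { jobs << nil }   # poison pills: shut the pool down
pool.each(&:join)
```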

If your app is going to be slow due to resolving addresses, reading
directories, and so on - it doesn't matter if the operation is
blocking or not - latency is going to be affected. It's just that it
also affects multiplexed operations.

- TCPServer/TCPSocket/UDPSocket/UNIXServer/UNIXSocket are all broken
by design. This also includes, to some extent, the Addrinfo class. It's hard to
provide non-blocking behaviour because so many things will turn a
string hostname into an IP address, calling `getaddrinfo`.

Perhaps resolv-replace.rb in the stdlib can make auto-Fiber
more useful (see below). And ruby-core could probably use some
help maintaining it, it's largely forgotten since the 1.8 days when
it was useful with green Threads. But yeah, getaddrinfo(3)
(along with all the other standardized name resolution APIs
before it) in the C standard library is a disaster for scalability.

- The `send` method for `IO` confusingly breaks `Object#send`. It
should just be `sendmsg` and `recvmsg` for datagrams and
`read`/`write` for streams. `UDPSocket#recv` and `Socket#recv` do
different things which is confusing.

`send` for streams is useful if you want to specify flags like
MSG_MORE and/or MSG_DONTWAIT. These flags are superior to
changing socket state via fcntl and setsockopt since they
require fewer syscalls and avoid races if multiple threads
operate on the same socket.

Agreed.

Instead, we should be (and I think already have been) downplaying
Object#send and encouraging Object#__send__ instead.

It reminds me of Python :o

Also, I wish MSG_DONTWAIT were available for all files:

        https://cr.yp.to/unix/nonblock.html

- Fibers are fast, but I think they need to be *the* first class
concurrency construct in Ruby and made as fast as possible. I heard
that calling resume on a fiber does a syscall (if this is the case it
should be removed if possible).

We're working on auto-scheduling Fibers for 2.5:

        https://bugs.ruby-lang.org/issues/13618
        (but API design is hard and not my department)

As far as syscalls, it should be possible to recycle Fiber stacks
(like we do with Thread stacks) to avoid mmap/mprotect/munmap.
Maybe ko1 is working on that, too...

We shouldn't need to save+restore signal masks since Ruby
doesn't change them at normal runtime; but I think that's being
done behind our backs by the *context library calls. We may
need to use setjmp/longjmp directly instead of
(make/get/swap)context.

All good ideas if they improve performance.

Auto-scheduling Fibers seems like an interesting idea. Making core
Ruby heavy seems like a mistake though.

Why not just a gem, and provide the necessary hooks? Async does
exactly what is proposed in this issue but with no modifications to
core Ruby, building on well-established C libraries where possible.

- Threads as they are currently implemented should be removed from
Ruby 3.0 - they actually make for a very poor concurrency concept,
considering the GIL. They make all other operations more complex with
no real benefit given how they are currently implemented. Reasoning
about threads is bloody hard. It's even worse that the GIL hides a lot
of broken behaviour. What are threads useful for? IO concurrency? yes
but it's poor performing. Computational concurrency? not unless you
use JRuby, Rubinius, and even then, my experience with those platforms
has generally been sub-par.

Again, native threads are useful for filesystem I/O, despite the GVL.

I wish threads could be more useful by releasing GVL for readdir and
stat operations; but releasing+acquiring GVL is expensive :<
Short term, I might complete my attempts to make GVL faster for 2.5.

In my testing the GVL is a significant source of latency and
contention in threaded servers.

I should make a comparison for you with real numbers. 8 threads vs 8 processes.

- It's hard to reason about strings because the encoding may change at
any time. Currently if you have a string with binary encoding, and
append a UTF-8 string, its encoding will change and this breaks all
future operations (if you are assuming it's still a binary string).
Leading to hacks like
async-io/lib/async/io/binary_string.rb at main · socketry/async-io · GitHub

*shrug* I make all my strings binary as soon as I get them.
But I'm just a simple *nix plumber; everything is a bunch of
bytes to me; even processes/threads/fibers.

I agree.

Ruby String encoding sometimes causes odd and unexpected performance issues.

If the goal here is maximum throughput and utilisation of available
hardware, existing approaches simply aren't going to work.

I think that Ruby 3.0 should
- either remove the GIL, or remove Thread.

The former would be nice :) As has been mentioned by others;
doing it without hurting single-thread performance is the hard
part.

How does it hurt single threaded performance?

People have already made implementations of Ruby without the GVL.

I'm still hopeful we can take advantage of liburcu and steal
more ideas from the Linux kernel (unfortunately, Ruby did not go
to GPL-2+ back in the day), but liburcu is LGPL-2.1+ and we
already use libgmp optionally.

- simplify IO classes and allow permanent non-blocking mode (e.g.
io.nonblocking = true; io.read gives data or :wait_readable).

That's backwards-incompatible and I'd rather we keep using
*_nonblock. In Ruby 2.5, *_nonblock will take advantage of
MSG_DONTWAIT and avoid unnecessary fcntl for sockets under
Linux: Feature #13362: [PATCH] socket: avoid fcntl for read/write_nonblock on Linux - Ruby master - Ruby Issue Tracking System

That's a good idea.

The problem is that the behaviour of the underlying IO leaks out
through the function name, which I find ugly. It means that every
function has two versions. On top of that came the terrible idea of
using exceptions, compounded by the "fix" of a keyword argument on
every function call, which only works on some versions of Ruby - and
you end up with this:

It would be better if Ruby just implemented the core read/write and
nonblocking semantics as one might expect, and then let library
authors take care of the rest. Instead, I feel like the current IO
situation in Ruby is over-engineered and facing an identity crisis.
Even something as simple as reading into a string buffer has a huge
performance and cognitive overhead.

There is almost no case where one would want both blocking and
non-blocking semantics on the same socket.

- ensure that Fiber is as fast as possible (creation, scheduling, etc).

AFAIK, ko1 is working on it for 2.5

- remove broken-by-design IO related classes (move to gem for
backwards compatibility?)

Make them skinnier, yes. I'm not sure how we can remove them
and dividing up core functionality makes it more difficult to
maintain.

I'd argue that it makes it easier to maintain, since you can
independently break backwards compatibility of specific sub-systems,
and individual applications can depend on old versions, etc.

- read/write should be able to append to a byte string efficiently. A
byte buffer designed for fast append, index and fast slice! is a must
for most high-level protocols.

Perhaps offsets to IO read/write operations can do this:

  Feature #11484: add output offset for readpartial/read_nonblock/etc - Ruby master - Ruby Issue Tracking System

For those familiar with Perl5, Perl has had sysread and syswrite
functions which are capable of taking offsets to avoid unnecessary
copying.

I'm not sure how the API would be done in Ruby, though;
and kwargs is deficient in our current C API:

  Feature #13434: better method definition in C API - Ruby master - Ruby Issue Tracking System

(I consider 13434 higher priority)

Interesting, I didn't realise how messy the implementation of
`exceptions: true` and kwargs were in practice.

- perhaps support readv and writev under the hood - this would allow
you to write an array of buffers without needing to concatenate them
which saves on syscalls/memcpy.

Agreed for writev. No idea how a readv API would even work for Ruby...

You'd have to implement a custom buffering class which behaves like
String but maintains separate chunks of allocated memory. It's
actually a pretty good idea as it avoids generating a lot of garbage.
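A toy sketch of such a class (ChunkBuffer is a hypothetical name; multi-argument IO#write needs Ruby 2.5+): output is kept as an array of chunks and handed to IO#write in one call, so nothing is concatenated on the Ruby side.

```ruby
class ChunkBuffer
  def initialize
    @chunks = []
  end

  def <<(data)
    @chunks << data
    self
  end

  # Hand all chunks to IO#write in one call (writev-style under the hood),
  # avoiding a combined-string allocation and memcpy in Ruby.
  def write_to(io)
    written = io.write(*@chunks)
    @chunks.clear
    written
  end
end

r, w = IO.pipe
buf = ChunkBuffer.new
buf << "HTTP/1.1 200 OK\r\n" << "\r\n" << "hello"
buf.write_to(w)
w.close

payload = r.read
p payload   # => "HTTP/1.1 200 OK\r\n\r\nhello"
```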

Thanks for your time.

Eric, thanks so much for the detailed reply and understanding the
intent of my original message so well.

No problem :)

As you've been so generous to me with your reply, I'm going to try to
do the same for you.

> Cool. Thanks for sharing this; even if there's stuff below
> I completely disagree with :)

It wouldn't be a good discussion if everyone agreed with each other :)

> Threads actually perform great for high throughput situations;
> but yes, they're too big for dealing with network latency.

I hear what you are saying. From my point of view, the problem with
the GIL/Threads is that you essentially get all the problems of
Threads with none of the benefits. It's simply impossible for two pure
Ruby functions to execute at the same time in MRI. The only point is
for IO multiplexing and it's really not a great solution, with large
numbers of inflight requests being the main concern.

Of course Ruby is not only for C100K clients/servers.
Yes, I find Threads currently have useful cases (see below)

>> - IO objects expose a lot of behaviour which is irrelevant to most
>> use-cases (io/console, io/nonblock which doesn't seem to work at all).
>> This makes it hard to provide a clean high-level interface.
>
> I'm not sure what you mean by "doesn't seem to work at all"

[1] pry(main)> require 'io/nonblock'
[2] pry(main)> i, o = IO.pipe
=> [#<IO:fd 11>, #<IO:fd 12>]
[3] pry(main)> i.nonblock?
=> false
[4] pry(main)> i.nonblock = true
=> true
[5] pry(main)> i.nonblock?
=> true
[6] pry(main)> i.read
asdf
^CInterrupt:
from (pry):6:in `read'
[7] pry(main)> i.read(1024)
^CInterrupt:
from (pry):7:in `read'
[8] pry(main)> i.read_nonblock(1024)
IO::EAGAINWaitReadable: Resource temporarily unavailable - read would block
from <internal:prelude>:77:in `__read_nonblock'

I would have assumed line 6 should behave the same as line 8, but
perhaps I just don't understand how that API works. The documentation
is very sparse.

I suppose we can improve documentation (can you provide a patch? :))

I think this IO#read behavior was inherited from Ruby 1.8; where
all sockets/pipes were internally non-blocking for green
Threads. Anyways, I think exposing synchronous behavior by
default is easier for end users.

>> - All IO operations should be non-blocking, with a super fast/simple API.
>> APIs which take complex lists of arguments in the hot path should be
>> avoided (exception: true, for example). A separate function for
>> blocking and non-blocking IO is a huge cop-out.
>
> NAK. I find value in using blocking accept/accept4 syscalls
> (not emulating blocking with green threads/fibers + epoll/kqueue;
> not even with EPOLLEXCLUSIVE)
>
> TL; DR: I have studied the Linux kernel a bit and know
> how to take advantage of it ---
>
> This is because some blocking syscalls can take advantage of
> "wake one" behavior in the Linux kernel to avoid thundering
> herds. EPOLLEXCLUSIVE was added a few years ago to Linux to
> appease some epoll users; but it's still worse for load
> distribution at high accept rates. I'd rather embrace the fact
> that epoll (and kqueue) themselves are (and must be) MT-friendly.
>
> Similarly to accept, UNIXSocket#recv_io has the same behavior
> with blocking recvmsg when the receiving socket is shared
> between multiple processes.

Yes, I looked at this.

I'm not convinced it's the right way to write a high performance server.

Using SO_REUSEPORT, you can simply spin up as many processes as you
like, each listening on the same socket. The OS determines which
process the request goes to.

It's currently broken on macOS, but works beautifully and scales
magnificently on Linux. In theory it also works on BSD.

I still support Linux 2.6.18 and 2.6.32 in cmogstored,
and SO_REUSEPORT only exists in 3.9+

How does SO_REUSEPORT handle process shutdown these days?

I remember there were problems in earlier implementations losing
connections if a process closed/exited a listener which had a
socket queued up for it, but haven't followed up on that. I
think I saw a bit on haproxy being successful with it, though.

Implementation-wise, having a dedicated acceptor thread simplifies
the main event loop's epoll_ctl usage: EPOLL_CTL_ADD is called only
once per client, in the dedicated accept thread, so the main
worker threads never have to check anything and can always
call EPOLL_CTL_MOD without caring about EPOLL_CTL_ADD.

One thread per listener is negligible overhead when I have
dozens/hundreds of disks and need >=1 threads per disk.

> Furthermore, non-blocking I/O on regular files and directories
> does not exist in any portable or complete way on *nix
> platforms. Threads (and processes)[2] are the only reasonable
> ways to handle regular files and directories; even on NFS and other
> network filesystems.
>
> [2] inside Linux, they're both "tasks" with different levels of
> sharing; the clone(2) manpage might be helpful to understand
> this.
>

Yes, it's an interesting conundrum - avoiding blocking may simply be
an impossible goal. Actually, with pre-emptive multi-tasking, that's
basically a given.

However, we can avoid it for most common operations, which is a good
start. In practice, thread pools (e.g. as used in libuv for blocking
operations like getaddrinfo) might solve the majority of problems.

getaddrinfo in a thread pool is wasteful, and thread pools can
easily suffer from head-of-line blocking.

The same applies to AIO, which uses thread pools; see the footnote:
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/81643

If your app is going to be slow due to resolving addresses, reading
directories, and so on - it doesn't matter if the operation is
blocking or not - latency is going to be affected. It's just that it
also affects multiplexed operations.

Right. The bigger problem is head-of-line blocking for unrelated
events/clients accessing different resources.

In a webserver, some clients are accessing contended resources,
they will encounter latency. However, that latency should not
affect other clients accessing fast resources at the same time.

That's why I want Ruby to continue to have access to native
threads; it gives folks aware of these limitations the ability
to engineer solutions around them.

<snip>

>> - Fibers are fast, but I think they need to be *the* first class
>> concurrency construct in Ruby and made as fast as possible. I heard
>> that calling resume on a fiber does a syscall (if this is the case it
>> should be removed if possible).
>
> We're working on auto-scheduling Fibers for 2.5:
>
> https://bugs.ruby-lang.org/issues/13618
> (but API design is hard and not my department)

<snip>

Auto-scheduling Fibers seems like an interesting idea. Making core
Ruby heavy seems like a mistake though.

Why not just a gem, and provide the necessary hooks? Async does
exactly what is proposed in this issue but with no modifications to
core Ruby, building on well-established C libraries where possible.

See my response on that ticket
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/81643

>
>> - Threads as they are currently implemented should be removed from
>> Ruby 3.0 - they actually make for a very poor concurrency concept,
>> considering the GIL. They make all other operations more complex with
>> no real benefit given how they are currently implemented. Reasoning
>> about threads is bloody hard. It's even worse that the GIL hides a lot
>> of broken behaviour. What are threads useful for? IO concurrency? yes
>> but it's poor performing. Computational concurrency? not unless you
>> use JRuby, Rubinius, and even then, my experience with those platforms
>> has generally been sub-par.
>
> Again, native threads are useful for filesystem I/O, despite the GVL.
>
> I wish threads could be more useful by releasing GVL for readdir and
> stat operations; but releasing+acquiring GVL is expensive :<
> Short term, I might complete my attempts to make GVL faster for 2.5.

In my testing, the GVL is a significant source of latency and
contention in threaded servers.

I should make a comparison for you with real numbers. 8 threads vs 8 processes.

Of course, GVL hurts performance, even in single-thread cases.

Maybe work-in-progress patch for GVL using futex will help Linux
users with contention:

  https://80x24.org/spew/20170509062022.4413-1-e@80x24.org/raw

But I'm not satisfied with the single-core regression and
will try to fix it as time allows.

<snip>

>> I think that Ruby 3.0 should
>> - either remove the GIL, or remove Thread.
>
> The former would be nice :) As has been mentioned by others;
> doing it without hurting single-thread performance is the hard
> part.

How does it hurt single threaded performance?

AFAIK, a trial removal used fine-grained locks everywhere,
which meant memory-synchronization overhead; the same problem
you'll see when releasing+reacquiring the GVL for fast ops in a
single thread.

This is why the readdir, stat, and unlink wrappers in Ruby still
hold the GVL: for the hot cache case. Sadly, that means the entire
Ruby process will stall when NFS goes out to lunch,
instead of just a single thread stalling.

People have already made implementations of Ruby without the GVL.

I think it's possible without regressions, just takes time:

> I'm still hopeful we can take advantage of liburcu and steal
> more ideas from the Linux kernel (unfortunately, Ruby did not go
> to GPL-2+ back in the day), but liburcu is LGPL-2.1+ and we
> already use libgmp optionally.

But, neither matz nor ko1 are big on promoting the existing
Thread API, but want to develop new/better actor APIs which
would be safer. *shrug*

>> - simplify IO classes and allow permanent non-blocking mode (e.g.
>> io.nonblocking = true; io.read gives data or :wait_readable).
>
> That's backwards-incompatible and I'd rather we keep using
> *_nonblock. In Ruby 2.5, *_nonblock will take advantage of
> MSG_DONTWAIT and avoid unnecessary fcntl for sockets under
> Linux: https://bugs.ruby-lang.org/issues/13362

That's a good idea.

The problem is that the behaviour of the underlying IO leaks out
through the function name, which I find ugly. It means every
function has two versions; add to that the terrible idea of using
exceptions for flow control, compounded by the "fix" of a keyword
argument on every function call, which only works on some versions
of Ruby, and you end up with this:

https://github.com/socketry/async-io/blob/97d46edfbe849df608b79eefc81773548d24cb9d/lib/async/io/generic.rb

Sorry, I wasn't around when the exceptions were added in
1.8/1.9; and also for not getting "exception: false" added
sooner.

It would be better if Ruby just implemented the core read/write and
nonblocking semantics as one might expect, and then let library
authors take care of the rest. Instead, I feel like the current IO
situation in Ruby is over-engineered and facing an identity crisis.
Even something as simple as reading into a string buffer carries a
huge performance and cognitive overhead.

I try to stay away from API design; but I prefer
non-blocking/blocking semantics to be per-call rather than
stateful to the object.

It makes it easier to figure out what the caller expects when
reading someone else's code.

At least for Linux + Ruby 2.5, we can avoid fcntl syscalls, too.
And maybe one day the proposed API in
<https://cr.yp.to/unix/nonblock.html> can become available.

There is almost no case where one would want both blocking and
non-blocking semantics on the same socket.

I have :) http://mid.gmane.org/20150513023712.GA4206@dcvr.yhbt.net

···

Samuel Williams <space.ship.traveller@gmail.com> wrote:

I was just reminded about one issue that came up recently due to
_nonblock variants.

SSLSocket doesn't implement accept_nonblock. It actually calls the
underlying implementation's `accept` method.

So, the problem is, should SSLSocket implement _nonblock methods?

I think it's a leaky abstraction. SSLSocket doesn't care whether the
underlying implementation blocks or doesn't. Its logic is not
affected.

The problem is, it duplicates all the functions even at a higher level
where the difference is no longer relevant.

I think that nonblocking should be a property of the socket, not the
function call.

I was just reminded about one issue that came up recently due to
_nonblock variants.

SSLSocket doesn't implement accept_nonblock. It actually calls the
underlying implementation's `accept` method.

Huh? r23029

I guess you mean SSLServer doesn't implement accept_nonblock.
The SSLSocket#accept_nonblock API is a bit weird, but that's
because of OpenSSL(*)

Anyways, maybe SSLServer should have accept_nonblock...
(care to make the patch?)

(*) In OpenSSL, SSL_accept(3ssl) requires multiple calls with
    nonblocking sockets. This is because the TLS handshake
    requires extra roundtrips (using read/write in TCP) to
    negotiate. With TCP-only sockets, the kernel can finish
    negotiating the TCP handshake before userland even calls accept.

So, the problem is, should SSLSocket implement _nonblock methods?

Of course; and since it's been implemented for a few years,
already; it's there to stay.

I think it's a leaky abstraction. SSLSocket doesn't care whether the
underlying implementation blocks or doesn't. Its logic is not
affected.

The problem is, it duplicates all the functions even at a higher level
where the difference is no longer relevant.

*shrug*

I think that nonblocking should be a property of the socket, not the
function call.

Think of it another way:

Would you want an epoll_wait or kevent timeout to be associated
with the epoll or kevent file description?

Or would you like your code to be able to control how long
each call to kevent/epoll_wait can sleep for with every call?
(the current situation with kevent and epoll_wait)

In an ideal world, maybe all I/O syscalls could have a
timeout arg; instead of socket-specific SO_RCVTIMEO/SO_SNDTIMEO.
Using a zero timeout would be today's MSG_DONTWAIT.

Realistically, zero timeout and infinite timeout are the
cheapest to implement; so that's what we have with current
sockets and MSG_DONTWAIT, at least.

···

Samuel Williams <space.ship.traveller@gmail.com> wrote:

The issue of implementing accept_nonblock for SSLSocket came up in a
PR, so I didn't look into exactly why it was needed, but I did find
another example here:

https://github.com/puma/puma/blob/master/lib/puma/accept_nonblock.rb

Perhaps it's working around buggy behaviour?

I'll need to try it out and investigate further. I just know that it
has come up. There is a new PR being worked on, once I understand a
bit more why it's an issue I'll report back.

> Would you want an epoll_wait or kevent timeout to be associated with the epoll or kevent file description?

Nope, because clearly the timeout is an explicit part of the API -
wait for a given duration until some events happen. That's entirely
different from calling #read on a file descriptor. There is a clear
use case here - writing code but having it transparently multiplex
with other IO operations at the same time. `_nonblock` exposes the
implementation details and it's infecting all high level APIs in a
way that makes it very hard to reason about, and you get things like
Net::HTTP which explicitly call read_nonblock and handle timeouts by
calling wait_readable and wait_writable, while other APIs simply call
read/write.

My opinion of timeouts is that they shouldn't be contracts for
individual operations - e.g. you have 10s to make this connection, and
10s to read this data. I think that's an inherently faulty way of
controlling non-determinism in networking. What's the correct timeout
for any given operation? 10s? 100s? 1000s? Only the user can tell you
that. Therefore, I feel that libraries that implement timeouts are
fundamentally designed wrong. async exposes a high level API which is
valid for any blocking operation and provides guarantees on behaviour
- simply wrap any blocking API in timeout(x) { ... io.read ... } and
if x seconds pass, io.read will fail with TimeoutError. Libraries only
need to handle exceptions correctly, and they will then handle
timeouts correctly, and the timeout behaviour is a policy imposed by
the user of the library, not the library itself.

The only case for timeouts within a library, that I found, was
unreliable datagrams, and sending requests to multiple servers. In
this case, temporal relationship of IO is a part of the protocol
though so it's a bit more specific and I feel it's okay.

My feeling is that the API should be super simple - e.g. for stream
sockets something like #bind, #accept, #read, #write, #shutdown,
#close. We can reason about those operations. Whether read is
blocking or not is irrelevant except at the lowest level when
implementing the event reactor. So, it's unfortunate that those
_nonblock APIs have bubbled up the stack. It would have been much
better if, with `nonblock = true`, `#read` returned :wait_readable
IMHO.

The issue of implementing accept_nonblock for SSLSocket came up in a
PR, so I didn't look into exactly why it was needed, but I did find
another example here:

https://github.com/puma/puma/blob/master/lib/puma/accept_nonblock.rb

Perhaps it's working around buggy behaviour?

Nope, that's fine. Should support `exception: false` in Ruby 2.3+,
though.

I'll need to try it out and investigate further. I just know that it
has come up. There is a new PR being worked on, once I understand a
bit more why it's an issue I'll report back.

> Would you want an epoll_wait or kevent timeout to be associated with the epoll or kevent file description?

Nope, because clearly the timeout is an explicit part of the API -
wait for a given duration until some events happen. That's entirely
different from calling #read on a file descriptor. There is a clear
use case here - writing code but having it transparently multiplex
with other IO operations at the same time. `_nonblock` exposes the
implementation details and it's infecting all high level APIs in a
way that makes it very hard to reason about, and you get things like
Net::HTTP which explicitly call read_nonblock and handle timeouts by
calling wait_readable and wait_writable, while other APIs simply call
read/write.

But when dealing with sockets, SO_RCVTIMEO and SO_SNDTIMEO are
also parts of the socket API. They're just hidden into the
socket's internal state via {get,set}sockopt, identical to how
O_NONBLOCK is hidden in the file description's flags via fcntl.

IMHO, that makes things more confusing, and checking state
requires making an extra syscall.

My opinion of timeouts is that they shouldn't be contracts for
individual operations - e.g. you have 10s to make this connection, and
10s to read this data. I think that's an inherently faulty way of
controlling non-determinism in networking. What's the correct timeout
for any given operation? 10s? 100s? 1000s? Only the user can tell you
that. Therefore, I feel that libraries that implement timeouts are
fundamentally designed wrong. async exposes a high level API which is
valid for any blocking operation and provides guarantees on behaviour
- simply wrap any blocking API in timeout(x) { ... io.read ... } and
if x seconds pass, io.read will fail with TimeoutError. Libraries only
need to handle exceptions correctly, and they will then handle
timeouts correctly, and the timeout behaviour is a policy imposed by
the user of the library, not the library itself.

*shrug* I've been back and forth there, myself; and on whether to
work on getting the stdlib Timeout implemented in a more
efficient way.

And also having Timeout be auto-Fiber aware, if auto-Fibers get
accepted into core.

The only case for timeouts within a library, that I found, was
unreliable datagrams, and sending requests to multiple servers. In
this case, temporal relationship of IO is a part of the protocol
though so it's a bit more specific and I feel it's okay.

My feeling is that the API should be super simple - e.g. for stream
sockets something like #bind, #accept, #read, #write, #shutdown,
#close. We can reason about those operations. Whether read is
blocking or not is irrelevant except at the lowest level when
implementing the event reactor. So, it's unfortunate that those
_nonblock APIs have bubbled up the stack. It would have been much
better if, with `nonblock = true`, `#read` returned :wait_readable
IMHO.

*shrug* I guess we'll just have to agree to disagree; and it is
what it is right now.

I could never suggest breaking compatibility when going from Ruby
2 to 3. We should learn from Python's mistake. Perl mostly
did the right thing by not touching Perl 5; and I guess that's
also why nobody really cares for Perl 6 :o

···

Samuel Williams <space.ship.traveller@gmail.com> wrote:

SSLSocket has #accept_nonblock:

https://github.com/socketry/socketry/blob/master/lib/socketry/ssl/socket.rb#L104

However, SSLServer does not. Implementing that is a little tricky. See here:

https://github.com/socketry/socketry/blob/master/lib/socketry/ssl/server.rb

The problem is in the context of an SSLSocket, "accept" means "run the
server side of the SSL handshake", i.e. wait for ClientHello, send
ServerHello, various other messages until we hit Finished.

This is quite a bit different from the typical notion of "accept", because
we've already established a TCP session. This means the socket is in a sort
of "half open" state at this point, where the TCP portion is finished, but
the SSL portion is not.

So really we can't design a server API which exposes this behavior: it
requires multiplexing, and at that point you don't want "*_nonblock"
methods, you want something that abstracts over the reactor with
callbacks/promises (or in your case, fibers).

···

On Fri, Jun 9, 2017 at 9:33 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:

The issue of implementing accept_nonblock for SSLSocket came up in a
PR, so I didn't look into exactly why it was needed

--
Tony Arcieri

> The issue of implementing accept_nonblock for SSLSocket came up in a
> PR, so I didn't look into exactly why it was needed

SSLSocket has #accept_nonblock:

https://github.com/socketry/socketry/blob/master/lib/socketry/ssl/socket.rb#L104

However, SSLServer does not. Implementing that is a little tricky. See here:

https://github.com/socketry/socketry/blob/master/lib/socketry/ssl/server.rb

The problem is in the context of an SSLSocket, "accept" means "run the
server side of the SSL handshake", i.e. wait for ClientHello, send
ServerHello, various other messages until we hit Finished.

This is quite a bit different from the typical notion of "accept", because
we've already established a TCP session. This means the socket is in a sort
of "half open" state at this point, where the TCP portion is finished, but
the SSL portion is not.

Well said. Maybe it's not feasible to do SSLServer#accept_nonblock
in a portable way(*). I ended up doing something like this in
yahns:

  def initialize(sock, ssl_ctx)
    @need_accept = true
    @ssl = OpenSSL::SSL::SSLSocket.new(sock, ssl_ctx)
  end

  def tryread(len, buf)
    if @need_accept
      case rv = @ssl.accept_nonblock(exception: false)
      when :wait_readable, :wait_writable, nil
        return rv
      end
      @need_accept = false
    end
    @ssl.read_nonblock(len, buf, exception: false)
  end

  def trywrite(buf)
    # only for NNTPS and other protocols where server speaks first; not HTTPS
    if @need_accept
      case rv = @ssl.accept_nonblock(exception: false)
      when :wait_readable, :wait_writable, nil
        return rv
      end
      @need_accept = false
    end
    @ssl.write_nonblock(buf, exception: false)
  end

So really we can't design a server API which exposes this behavior: it
requires multiplexing, and at that point you don't want "*_nonblock"
methods, you want something that abstracts over the reactor with
callbacks/promises (or in your case, fibers).

(*) Non-portably: it could probably be done by rewriting
    SSLServer to use a dedicated epoll/kqueue FD. SSLServer#to_io
    would return the IO wrapper for that epoll/kqueue FD instead of
    the TCPServer socket. The TCPServer would be persistently
    watched by epoll/kqueue, along with any TCP sockets which
    are processing various steps of SSL_accept.

    Using a background thread + pipe + select might be
    a portable option, too, but probably too ugly...
    (libkqueue does something similar)

···

Tony Arcieri <bascule@gmail.com> wrote:

On Fri, Jun 9, 2017 at 9:33 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:

(*) Non-portably: it could probably be done by rewriting
    SSLServer to use a dedicated epoll/kqueue FD. SSLServer#to_io
    would return the IO wrapper for that epoll/kqueue FD instead of
    the TCPServer socket. The TCPServer would be persistently
    watched by epoll/kqueue, along with any TCP sockets which
    are processing various steps of SSL_accept.

    Using a background thread + pipe + select might be
    a portable option, too, but probably too ugly...
    (libkqueue does something similar)

Yes, totally doable portably; but I think it's worthless bloat.

If anybody wants to submit this to https://bugs.ruby-lang.org/
go ahead. I won't.

-----8<-----

Yuck. I guess it works.

···

Subject: [PATCH] openssl: implement OpenSSL::SSL::SSLServer#accept_nonblock
---
ext/openssl/lib/openssl/ssl.rb | 139 ++++++++++++++++++++++++++++++++++++++++-
test/openssl/test_pair.rb | 33 ++++++++++
2 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/ext/openssl/lib/openssl/ssl.rb b/ext/openssl/lib/openssl/ssl.rb
index f40a451439..9f35daad30 100644
--- a/ext/openssl/lib/openssl/ssl.rb
+++ b/ext/openssl/lib/openssl/ssl.rb
@@ -333,6 +333,25 @@ def session_get_cb
       end
     end

+ # We need this so the result of SSLServer#to_io can handle
+ # local_address and other *Socket-only methods when it is a
+ # pipe, due to the ugly accept_nonblock implementation below
+ class SSLServerReadyPipe < IO # :nodoc:
+ def self.ssls_pipe(svr) # :nodoc:
+ ret = pipe
+ ret[0].instance_variable_set(:@svr, svr)
+ ret
+ end
+
+ def method_missing(m, *args) # :nodoc:
+ @svr.__send__(m, *args)
+ end
+
+ def respond_to?(*args) # :nodoc:
+ super || @svr.respond_to?(*args)
+ end
+ end
+
     ##
     # SSLServer represents a TCP/IP server socket with Secure Sockets Layer.
     class SSLServer
@@ -353,11 +372,125 @@ def initialize(svr, ctx)
           @ctx.session_id_context = session_id
         end
         @start_immediately = true
+
+ # all to support accept_nonblock
+ @pid = @th = nil
+ @ready_pipe = []
+ @rset = {} # SSLSocket -> IO
+ @wset = {} # SSLSocket -> IO
       end

       # Returns the TCPServer passed to the SSLServer when initialized.
       def to_io
- @svr
+ @pid == $$ ? @ready_pipe[0] : @svr
+ end
+
+ # enqueue the ready socket (runs in background thread)
+ def accept_nb_ready(ssl)
+ @ready_list.enq(ssl)
+ @ready_pipe[1].write(-'s')
+ end
+
+ # can't raise in background thread...
+ def bg_err(e)
+ accept_nb_ready(e)
+ end
+
+ # begin the SSL_accept process (runs in background thread)
+ def accept_nb_start
+ sock, _ = @svr.accept_nonblock(exception: false)
+ return if sock == :wait_readable # spurious wakeup or lost race
+
+ ssl = OpenSSL::SSL::SSLSocket.new(sock, @ctx)
+ ssl.sync_close = true
+ if @start_immediately
+ case ssl.accept_nonblock(exception: false)
+ when :wait_readable
+ return @rset[ssl] = ssl.to_io
+ when :wait_writable
+ return @wset[ssl] = ssl.to_io
+ # else : fall through
+ end
+ end
+ accept_nb_ready(ssl)
+ rescue => e
+ sock&.close
+ bg_err(e)
+ end
+
+ def accept_nb_worker
+ Thread.new do
+ begin
+ r = @rset.keys
+ r << @svr
+ r = IO.select(r, @wset.keys) or next
+
+ r[1].each do |obj|
+ begin
+ case obj.accept_nonblock(exception: false)
+ when :wait_writable # noop, stay
+ when :wait_readable
+ @rset[obj] = @wset.delete(obj)
+ else
+ @wset.delete(obj)
+ accept_nb_ready(obj)
+ end
+ rescue => e
+ @wset.delete(obj).close
+ bg_err(e)
+ end
+ end
+
+ r[0].each do |obj|
+ if obj == @svr
+ accept_nb_start
+ else
+ begin
+ case obj.accept_nonblock(exception: false)
+ when :wait_readable # noop, stay
+ when :wait_writable
+ @wset[obj] = @rset.delete(obj)
+ else
+ @rset.delete(obj)
+ accept_nb_ready(obj)
+ end
+ rescue => e
+ @rset.delete(obj).close
+ bg_err(e)
+ end
+ end
+ end
+ rescue => e
+ bg_err(e)
+ end until @svr.closed?
+ end # Thread.new
+ end
+
+ def accept_nb_init
+ @pid = $$
+ @wset.each_value(&:close).clear
+ @rset.each_value(&:close).clear
+ @ready_pipe.each(&:close).replace(SSLServerReadyPipe.ssls_pipe(@svr))
+ @ready_list = Queue.new
+ @rbuf = ''.b
+ @th = accept_nb_worker
+ end
+
+ def accept_nonblock(exception: true)
+ raise IOError, -'closed' if @svr.closed?
+ accept_nb_init if @pid != $$
+
+ case @ready_pipe[0].read_nonblock(1, @rbuf, exception: false)
+ when :wait_readable
+ return :wait_readable unless exception
+ raise IO::EAGAINWaitReadable, -'Resource temporarily unavailable'
+ when nil
+ raise "BUG: unexpected EOF on ready_pipe"
+ else
+ ret = @ready_list.deq(true)
+ raise ret if Exception === ret
+ ret
+ end
       end

       # See TCPServer#listen for details.
@@ -393,6 +526,10 @@ def accept

       # See IO#close for details.
       def close
+ @th.exit if @th
+ @wset.each_value(&:close).clear
+ @rset.each_value(&:close).clear
+ @ready_pipe.each(&:close).clear
         @svr.close
       end
     end
diff --git a/test/openssl/test_pair.rb b/test/openssl/test_pair.rb
index 9a5205f81c..3ee4515489 100644
--- a/test/openssl/test_pair.rb
+++ b/test/openssl/test_pair.rb
@@ -454,6 +454,39 @@ def test_connect_accept_nonblock
     sock1.close if sock1 && !sock1.closed?
     sock2.close if sock2 && !sock2.closed?
   end
+
+ def test_server_accept_nonblock
+ ssls = server
+ port = ssls.to_io.local_address.ip_port
+ assert_equal :wait_readable, ssls.accept_nonblock(exception: false)
+ assert_raise(IO::WaitReadable) { ssls.accept_nonblock }
+ assert_raise(IO::WaitReadable) { ssls.accept_nonblock(exception: true) }
+ assert_nil IO.select([ssls], nil, nil, 0)
+ assert_equal port, ssls.to_io.local_address.ip_port,
+ 'socket methods still work after background thread started'
+ th = Thread.new { client(port) }
+ exp_rset = [[ssls], [], []]
+ assert_equal exp_rset, IO.select([ssls], nil, nil, 10)
+ cl = th.value
+ assert_kind_of OpenSSL::SSL::SSLSocket, cl
+ accepted = ssls.accept_nonblock
+ assert_kind_of OpenSSL::SSL::SSLSocket, accepted
+ accepted.close
+
+ plain = create_tcp_client('127.0.0.1', port)
+ plain.write('writing plain text directly to socket will cause error')
+ assert_equal exp_rset, IO.select([ssls], nil, nil, 10)
+ assert_raise(OpenSSL::SSL::SSLError) { ssls.accept_nonblock }
+
+ assert_equal :wait_readable, ssls.accept_nonblock(exception: false),
+ 'back to normal after exception'
+ ensure
+ accepted&.close
+ th.join if th
+ ssls&.close
+ cl&.close
+ plain&.close
+ end
end

class OpenSSL::TestEOF1 < OpenSSL::TestCase
--
eeew!

Almost... This breaks blocking OpenSSL::SSL::SSLServer#accept
if both are called on the same SSLServer object.

So I guess OpenSSL::SSL::SSLServer#accept needs to be modified
to do a blocking Queue#deq if it detects the background
thread running.

···

eeew! <e@80x24.org> wrote:

Subject: [PATCH] openssl: implement OpenSSL::SSL::SSLServer#accept_nonblock

Yuck. I guess it works.