Bug in socket code - high number of simul. conns causes segfault

Hi all,
after spending quite a long time figuring out how to write a program which
will serve about 500-1500 connections simultaneously, I found a few confusing
issues …
No matter how hard I try, I’m still getting a segfault/deadlock exception
at ~510 connections. I tried the code from the FAQ (threaded and nonthreaded), and
I even wrote a ‘limited number of sockets per thread’ model … everything
just crashes (no matter what kernel/ruby/glibc I’m using). :(

Because I don’t want to waste bandwidth, I put all my results here:

http://www.fi.muni.cz/~xsafran3/ruby-is-buggy.tgz

nothr.tsvr.rb – no threads
mytsvr.rb – limited # of sockets per thread model
thr.tsvr.rb – one thread per socket
tcli.rb – test client

Is there anybody (more experienced) who has solved this problem?

Please, speak up :)

Thanks in advance,
Wejn

···


Wejn <lists+rubytalk(at)box.cz>
(svamberk.net’s Linux section, fi.muni.cz student, linuxfan)

    Bored?  Want hours of entertainment?         <<<
      Just set the initdefault to 6!             <<<

With the default ulimits, I get:

nothr.tsvr.rb:
The server keeps going, but the client quits with Errno::EBADF in
‘open’. If I disable the sleeps, I see a hang at 480 or so
connections.

mytsvr.rb:
The server gets EBADF in ‘accept’, and the client gets EBADF in
‘open’. If I disable the sleeps, I see no hangs.

thr.tsvr.rb:
The server gets EBADF in ‘accept’, and the client gets EBADF in
‘open’. If I disable the sleeps, I see a hang every 80 connections
or so.

However, I can increase the number of fds available by running as root
and using ulimit -n 2000. In this case:

nothr.tsvr.rb:
The client segfaults after 522 connections.

mytsvr.rb:
The client segfaults after 558 connections.

thr.tsvr.rb:
The client gets Errno::EALREADY after 518 connections. The server
gets a fatal deadlock exception immediately following. If I disable
the sleeps, I see a hang at 240 and 480 connections.
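
For anyone reproducing these runs, the descriptor limit that ulimit -n changes can also be read from inside Ruby. This is only a minimal sketch, assuming a Ruby build that provides Process.getrlimit; the helper name fd_limits is made up for illustration and is not part of the test scripts.

```ruby
# Report the soft and hard limits on open file descriptors for this process.
# fd_limits is a hypothetical helper name, not taken from the original scripts.
def fd_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { soft: soft, hard: hard }
end

limits = fd_limits
puts "soft limit: #{limits[:soft]}, hard limit: #{limits[:hard]}"
```

Printing this at the top of both the server and the client makes it easier to tell an EBADF caused by the ulimit apart from a failure that happens with the limit already raised.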

One of my coworkers pointed out to me the other day the following
comment in /usr/include/linux/posix_types.h:

 * This allows for 1024 file descriptors: if NR_OPEN is ever grown
 * beyond that you’ll have to change this too. But 1024 fd’s seem to be
 * enough even for such “real” unices like OSF/1, so hopefully this is
 * one limit that doesn’t have to be changed [again].

Since Ruby uses user-level threads and calls select() for you with all
of the file descriptors that all of the threads are waiting on, there’s
not really any way you can get more than FD_SETSIZE file descriptors
with Ruby the way it is, no matter how many threads you use. It’s still
odd that the segfault occurs after 500 connections; it should happen
after 1000. It’s almost as if Ruby is using 2 file descriptors per
connection, though I’m not sure why it would do that.

One solution, if you really need this many file descriptors, is to
increase FD_SETSIZE. An alternative might be to modify the interpreter
to use poll() or /dev/poll instead of select(), on systems where they
are available. Both of these are capable of handling more than 1000
file descriptors.

Paul

Hi,

> Since Ruby uses user-level threads and calls select() for you with all
> of the file descriptors that all of the threads are waiting on, there’s
> not really any way you can get more than FD_SETSIZE file descriptors
> with Ruby the way it is, no matter how many threads you use. It’s still
> odd that the segfault occurs after 500 connections; it should happen
> after 1000. It’s almost as if Ruby is using 2 file descriptors per
> connection, though I’m not sure why it would do that.

A socket consumes two file descriptors per connection, to achieve a
bi-directional “stdio” connection.
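
As a rough check of the arithmetic, the two-descriptors-per-connection figure can be combined with the 1024 limit quoted earlier in the thread. The sketch below assumes FD_SETSIZE is 1024 and that only stdin, stdout and stderr are open beforehand; the constant and method names are illustrative only.

```ruby
# Assumed values: FD_SETSIZE is 1024 on the Linux systems discussed here,
# and each connection costs two descriptors, as described above.
FD_SETSIZE = 1024
FDS_PER_CONNECTION = 2
# stdin, stdout and stderr are already open; a listening socket would take more.
RESERVED_FDS = 3

# Estimate how many connections fit before descriptors reach fd_setsize.
def max_connections(fd_setsize, fds_per_conn, reserved)
  (fd_setsize - reserved) / fds_per_conn
end

puts max_connections(FD_SETSIZE, FDS_PER_CONNECTION, RESERVED_FDS)  # prints 510
```

That is in the same range as the ~510 connections reported in the first post, although the exact point of failure will depend on what else the process has open.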

> One solution, if you really need this many file descriptors, is to
> increase FD_SETSIZE. An alternative might be to modify the interpreter
> to use poll() or /dev/poll instead of select(), on systems where they
> are available. Both of these are capable of handling more than 1000
> file descriptors.

Interesting idea.

						matz.
···

In message “Re: Bug in socket code - high number of simul. conns causes segfault” on 02/07/11, Paul Brannan pbrannan@atdesk.com writes:

Hi,

> It’s still odd that the segfault occurs after 500 connections; it should
> happen after 1000.

uhh … correct me if I’m wrong … but there should be NO segfault at all …

W.

Perhaps Ruby should check if max > FD_SETSIZE before calling select().
I’m not an expert in this area, so I’m not sure if this is the correct
fix.
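
To make the idea concrete, here is a sketch of that kind of guard written in Ruby rather than inside the interpreter. It is not how Ruby implements select: ASSUMED_FD_SETSIZE and checked_select are hypothetical names, and 1024 is assumed because Ruby does not expose FD_SETSIZE. The comparison uses >= because valid descriptors for an fd_set run from 0 to FD_SETSIZE - 1.

```ruby
# FD_SETSIZE is assumed to be 1024 here; Ruby does not expose it as a constant.
ASSUMED_FD_SETSIZE = 1024

# Hypothetical wrapper: refuse to select on descriptors that would not fit
# in an fd_set of the assumed size, and raise an ordinary exception instead.
def checked_select(read_ios, timeout = nil, limit = ASSUMED_FD_SETSIZE)
  too_big = read_ios.find { |io| io.fileno >= limit }
  if too_big
    raise ArgumentError,
          "descriptor #{too_big.fileno} does not fit in an fd_set of size #{limit}"
  end
  IO.select(read_ios, nil, nil, timeout)
end

# Usage: a pipe with pending data is reported as readable.
reader, writer = IO.pipe
writer.write("x")
ready = checked_select([reader], 1)
puts ready[0].include?(reader)
```

With a check like this the script would get an exception it can rescue instead of a crash, which is closer to the behaviour asked for earlier in the thread.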

Paul

···

On Thu, Jul 11, 2002 at 04:23:32PM +0900, Wejn wrote:

> uhh … correct me if I’m wrong … but there should be NO segfault at all …