TCP Socket: Connection reset, but Select says it's valid

I'm running a server that sometimes encounters a
connection-reset-by-peer error when reading data from a client. I'm
calling IO.select to verify that there are characters to be read, but
occasionally the "rescue" below is activated when I do my gets. I
presume this means the socket was closed after select noticed there were
characters to be read but before I actually try to read them.

The client and server are in the same machine, so there's no cable to
pull out.

The messages of interest are sent every 30 seconds, and there might be 1
or 2 failures in a day. Running on Mac OS X 10.5.6.

The problem seems to occur when the client (below) connects to send a
message and another client connects before the first client's
transaction is complete. This transaction takes about 1 second. The
specified timeout is 120 seconds.

Is there anything obvious wrong with my code? I omitted quite a few
lines, I hope not too much.

Thanks-
Scott

######################### Server (client follows)

#########################

require 'gserver'

class ChatServer < GServer # Server class derived from GServer superclass
  def initialize(*args)
    super(*args)
    # Keep a record of the client IDs allocated
    # and the lines of chat
    @@client_id = 0
    @@chat = []
  end

  def serve(io) # Serve method handles connections
    # Increment the client ID so each client gets a unique ID
    @@client_id += 1
    my_client_id = @@client_id
    my_position = @@chat.size

    loop do
      # Check for data every n seconds
      n = 0.5

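      selection = IO.select([io], nil, [io], n)
      # If some event occurred, retrieve the data and process it...
      # (selection[0] is an Array even when empty, so test for emptiness.)
      if selection && !selection[0].empty? then
        # There was a read event
        begin
          line = io.gets
        rescue SystemCallError, IOError => e
          # e.g. Errno::ECONNRESET. Rescuing Exception would also swallow
          # Interrupt and SystemExit.
          stat = ''
          puts "\nerror reading 'line' from #{io}"
          puts "#{ e } (#{ e.class })!"
          # Close socket and stop serving this client: selecting on a
          # closed IO would raise IOError on the next pass of the loop
          io.close
          print("\nSelection is #{selection[0]}\n")
          break
        end
        if selection[2].size > 0 then
          puts "%%%%%%%% Select error array is #{selection[2]} " +
               Time.now.to_s
        end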
.
.
.
.
  end

# Use port 50000 if none supplied
portnum = ARGV[0] || 50000
max_connections = 100000
server = ChatServer.new(portnum, $my_ip, max_connections, $stdout, true)

server.start # Start the server

server.join

######################### client #########################

#!/usr/bin/env ruby

require "socket"
require 'timeout'

# TCP client. This script sends a message to the server and optionally
# gets a message back.
# If the message contains "check", the server relays the message to the
# other client, which then sends back "ok" if it is able to.
#
# A 'quit' command is sent when finished, so that the server disconnects
# and kills the thread we were on. This avoids the max-connections problem.

if ARGV.size < 3 then
  wait_time = 10
else
  wait_time = ARGV[2].to_i
end

DEFAULT_PORT = 50000

my_ip = Socket::getaddrinfo(Socket.gethostname, "echo", Socket::AF_INET)[0][3]

if ARGV.size > 2
  PORT = ARGV[0]
  message = ARGV[1]
else
  PORT = DEFAULT_PORT
  message = "Empty message"
  puts 'usage: ruby IX_chat_client.rb PORTNUM "message" timeout_sec'
  exit
end

chat_client = TCPSocket.new(my_ip, PORT)

sleep 1

# Send message to server
chat_client.puts message

# Don't take longer than wait_time seconds to get a response
begin
  Timeout::timeout(wait_time) do
    reply_line = chat_client.gets
    puts reply_line
  end
rescue Timeout::Error
  puts "Timed out."
  chat_client.puts "quit"
  exit
else
  chat_client.puts "quit"
  exit
ensure
  chat_client.close
end
--
Posted via http://www.ruby-forum.com/.

Scott Cole wrote:

I'm running a server that sometimes encounters a
connection-reset-by-peer error when reading data from a client. I'm
calling IO.select to verify that there are characters to be read, but
occasionally the "rescue" below is activated when I do my gets.

Your code prints the exception class and message; what do you see?

I presume this means the socket was closed after select noticed there were
characters to be read but before I actually try to read them.

select returns true for an EOF-condition too. From `man 2 select`:

       Three independent sets of file descriptors are watched. Those listed
       in readfds will be watched to see if characters become available for
       reading (more precisely, to see if a read will not block; in
       particular, a file descriptor is also ready on end-of-file)
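To see the end-of-file case from that man page excerpt in Ruby, here is a minimal sketch. It assumes only the standard socket library and runs entirely over 127.0.0.1 on a port chosen by the OS; none of it comes from the original server. The peer closes normally, IO.select still lists the server-side socket as readable, and gets then returns nil instead of a line.

```ruby
require "socket"

# Listen on localhost; port 0 lets the OS pick a free port (demo assumption).
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
conn   = server.accept

# The peer closes normally: the kernel sends a FIN, i.e. end-of-file.
client.close

# select nevertheless reports the server-side socket as readable...
readable, = IO.select([conn], nil, nil, 2)
eof_is_readable = !readable.nil? && readable.include?(conn)

# ...because a read will not block: it just hits EOF, so gets returns nil.
line = conn.gets

puts "readable: #{eof_is_readable}, gets returned: #{line.inspect}"

conn.close
server.close
```

So "readable" only means "a read won't block", not "there is a line waiting"; the loop still has to check gets for nil.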

Thanks for responding. I would expect to get some kind of EOF error on
an EOF condition. Instead I see something like

error reading 'line' from #<TCPSocket:0x40e3f4>
Connection reset by peer (Errno::ECONNRESET)!

Selection is #<TCPSocket:0x40e3f4>

Brian Candler wrote:

···

Scott Cole wrote:

Thanks for responding. I would expect to get some kind of EOF error on
an EOF condition. Instead I see something like

error reading 'line' from #<TCPSocket:0x40e3f4>
Connection reset by peer (Errno::ECONNRESET)!

Yep, it's not a simple EOF - the far end has sent a TCP RST instead of a
FIN - but AFAIK the socket is marked 'readable' for select.

More info at:
http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2007-08/msg00176.html
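The RST case can be reproduced locally as well. This is an illustrative sketch under stated assumptions, not Scott's setup: it forces an abortive close by enabling SO_LINGER with a 0-second timeout before close (via Socket::Option.linger, available since Ruby 1.9.2), which makes the kernel send an RST instead of a FIN. On the receiving side select should report the socket as readable, and gets should raise Errno::ECONNRESET, matching the error posted above.

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
conn   = server.accept

# SO_LINGER on with a 0-second timeout: close() aborts the connection and
# the kernel sends a TCP RST rather than the normal FIN.
client.setsockopt(Socket::Option.linger(true, 0))
client.close

# select reports the socket readable here too (a read will not block)...
readable, = IO.select([conn], nil, nil, 2)
rst_is_readable = !readable.nil? && readable.include?(conn)

# ...but the read fails instead of returning data or EOF.
error = nil
begin
  conn.gets
rescue Errno::ECONNRESET => e
  error = e
end

puts "readable: #{rst_is_readable}, error: #{error.class}"

conn.close
server.close
```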

Note also that this does not mean that the socket is marked
selectable for 'error' (selection[2]). In fact, you're almost certainly
never going to find the socket marked that way. See
http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2007-11/msg00187.html
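Putting this together, one possible shape for the read loop is to watch only the read set and treat both outcomes of a "readable" socket (nil from gets, or Errno::ECONNRESET) as "client disconnected". The helper name read_until_disconnect and its return values are hypothetical, not part of GServer or the code above. The demo drives it with one peer that closes normally and one that resets (again forced with SO_LINGER 0), and also checks that the reset socket shows up in the read set but not in the error set.

```ruby
require "socket"

# Hypothetical helper (not GServer API): read lines from +io+ until the peer
# goes away. Only the read set is given to IO.select; a reset is detected by
# the read failing, not by the error set.
def read_until_disconnect(io, poll_interval = 0.5)
  lines = []
  loop do
    ready, = IO.select([io], nil, nil, poll_interval)
    next unless ready                 # timed out: poll again

    begin
      line = io.gets
    rescue Errno::ECONNRESET, IOError => e
      return [lines, e.class]         # abortive close (RST) or closed IO
    end
    return [lines, :eof] if line.nil? # orderly close (FIN)

    lines << line.chomp
  end
end

server = TCPServer.new("127.0.0.1", 0)
port   = server.addr[1]

# Peer 1 sends two lines and then closes normally.
polite = TCPSocket.new("127.0.0.1", port)
conn1  = server.accept
polite.puts "hello"
polite.puts "quit"
polite.close
polite_result = read_until_disconnect(conn1)

# Peer 2 aborts the connection with an RST (SO_LINGER, 0 seconds).
rude  = TCPSocket.new("127.0.0.1", port)
conn2 = server.accept
rude.setsockopt(Socket::Option.linger(true, 0))
rude.close

# Also watching the error set: the reset socket appears as readable only.
readable, _, errored = IO.select([conn2], nil, [conn2], 2)
rude_result = read_until_disconnect(conn2)

p polite_result
p rude_result
puts "readable: #{readable.size}, errored: #{errored.size}"

[conn1, conn2, server].each(&:close)
```

The design choice is that readability is only a hint that a read won't block; the result of the read itself (a line, nil, or an exception) decides whether the client is still there.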
