TCP Sockets

Hi there,

another probably short question:

How can I tell how many bytes can be read from an IO object without
blocking?

Or how can I avoid blocking otherwise?

thanks!
Dominik

Hi,

···

At Fri, 9 May 2003 19:40:13 +0900, Dominik Werder wrote:

How can I tell how many bytes can be read from an IO object without
blocking?

IO#ready? of io/wait library in rough returns it.

cvs -d:pserver:anonymous@cvs.ruby-lang.org:/src co rough/ext/io
or
http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/rough/ext/io/io.zip?tarball=1


Nobu Nakada

It’s not possible to do exactly that, but you can set the socket to
non-blocking IO. Then, when you call socket.read and socket.write, they
won’t block, and will return how much they actually read and wrote.
You’ll need to use select so you only write when there’s space,
otherwise you’d have a loop.

You probably want to use threads instead. Internally, all the IO seems
to be asynchronous, but using non-blocking IO may not work well, and I
haven’t found out how to do it on Windows. What are you trying to do?
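
A minimal sketch of the non-blocking read described above, assuming a current Ruby where `IO#read_nonblock` accepts `exception: false`. The helper name `read_available` and the `UNIXSocket.pair` test harness are illustrative only and are not from this thread.

```ruby
require 'socket'

# Read whatever is currently buffered on io without blocking.
# Returns "" if nothing is ready yet, nil once the peer has closed.
def read_available(io, max = 4096)
  case (chunk = io.read_nonblock(max, exception: false))
  when :wait_readable then ""   # nothing buffered right now
  when nil            then nil  # end of stream
  else chunk
  end
end

a, b = UNIXSocket.pair          # stand-in for a real TCP connection
first = read_available(a)       # nothing has been sent yet

b.write("hello")
IO.select([a], nil, nil, 1)     # wait up to 1 second for data
second = read_available(a)

b.close
IO.select([a], nil, nil, 1)     # EOF also makes the socket readable
third = read_available(a)
```

The caller never learns the byte count in advance; it simply asks for up to `max` bytes and takes what is there.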

···

On Fri, 2003-05-09 at 05:40, Dominik Werder wrote:

Hi there,

another probably short question:

How can I tell how many bytes can be read from an IO object without
blocking?

Or how can I avoid blocking otherwise?

thanks!
Dominik


Tom Felker

The question of whether computers can think is just like the question
of whether submarines can swim.
– Edsger W. Dijkstra

How can I tell how many bytes can be read from an IO object without
blocking?
It’s not possible to do exactly that, but you can set the socket to
non-blocking IO. Then, when you call socket.read and socket.write, they
won’t block, and will return how much they actually read and wrote.
You’ll need to use select so you only write when there’s space,
otherwise you’d have a loop.

You probably want to use threads instead. Internally, all the IO seems
to be asynchronous, but using non-blocking IO may not work well, and I
haven’t found out how to do it on Windows. What are you trying to do?

Thanks! It would help my understanding to know why the IO operations
actually block in the first place.

I tried to make a simple debugging tool to simulate a simple HTTP request
and dump everything to my screen. I thought it would be a two-liner but then
the blocking issues appeared… :((

thanks!
Dominik

Dominik Werder wrote:

How can I tell how many bytes can be read from an IO object without
blocking?

IO#ready? of io/wait library in rough returns it.

cvs -d:pserver:anonymous@cvs.ruby-lang.org:/src co rough/ext/io
or
http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/rough/ext/io/io.zip?tarball=1

thank you for the hint!

Unfortunately this seems to be very buggy. Sometimes it works, but most of
the time the function returns nil although there is data available
(readable without blocking).

Question to all:
Why is something like this not standard and thoroughly tested? Are threads
the only practical solution to this problem?

thanks!
Dominik

Sending a request is easy enough. Dealing with the response, you can:

  • read the headers one line at a time (e.g. using gets)
  • end of headers is marked by a blank line
  • you then have a Content-Length: header which tells you how many
    bytes to read; or if chunking has been requested, each chunk is preceded
    by a length value. Or for ancient HTTP/0.9 I think you just read until
    the TCP connection closes.

This means you need to understand some of the internal operation of HTTP to
work out how big each read() needs to be. Without using these explicit
mechanisms you can’t know exactly when the end of the transaction is:
remember the server response could be split over many TCP segments, and the
segment boundaries are not visible to the application layer anyway. You
could use timeouts (e.g. no data received for 1 second = end of transaction)
but that won’t always work because there is no requirement for the HTTP
server to send all or part of its response within a particular time -
consider a slow CGI script which takes 30 seconds to complete, for example.
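
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it handles the `Content-Length:` case alone (no chunked encoding, no read-until-close), the helper name `read_http_response` is made up, and a `StringIO` stands in for the socket.

```ruby
require 'stringio'

# Read one HTTP response from io: the status line, the headers up to the
# blank line, then exactly Content-Length bytes of body.
def read_http_response(io)
  status = io.gets("\r\n").to_s.chomp
  headers = {}
  while (line = io.gets("\r\n"))
    line = line.chomp
    break if line.empty?                     # blank line ends the headers
    name, value = line.split(":", 2)
    headers[name.strip.downcase] = value.to_s.strip
  end
  length = headers["content-length"].to_i
  body = length > 0 ? io.read(length) : ""   # blocks until length bytes arrive
  [status, headers, body]
end

raw = "HTTP/1.1 200 OK\r\n" \
      "Content-Type: text/plain\r\n" \
      "Content-Length: 5\r\n" \
      "\r\n" \
      "helloEXTRA"
status, headers, body = read_http_response(StringIO.new(raw))
```

Because the body length comes from the header, the trailing `EXTRA` bytes are left unread for whatever comes next on the connection.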

Regards,

Brian.

···

On Fri, May 16, 2003 at 05:14:17PM +0900, Dominik Werder wrote:

I tried to make a simple debugging tool to simulate a simple HTTP request
and dump everything to my screen. I thought it would be a two-liner but then
the blocking issues appeared… :((

Hi,

Unfortunately this seems to be very buggy. Sometimes it works, but
most of the time the function returns nil although there is data
available (readable without blocking).

I admit it’s not been tested enough, do you have a test script or
something?

Question to all:
Why is something like this not standard and thoroughly tested? Are
threads the only practical solution to this problem?

One reason why it is not standard is that Ruby has threads, as you
mentioned; another is that it’s not available everywhere. And it
is in ROUGH in order to be tested before being included in the standard
library.

···

At Fri, 16 May 2003 18:51:48 +0900, Dominik Werder wrote:


Nobu Nakada

my problem is not the HTTP protocol itself (not at this time :) but the IO
blocking.
If I try to read all the lines the server must have sent to me, but the
data is not yet available on the socket, the process blocks.
I wondered if it’s possible to solve this without threads, for example by
knowing the number of bytes that can be read without blocking…

thanks!
Dominik

Unfortunately this seems to be very buggy. Sometimes it works, but
most of the time the function returns nil although there is data
available (readable without blocking).

I admit it’s not been tested enough, do you have test script or
something?

I have to apologize for my hasty judgment: it was not Ruby but the
loaded server that caused the problem :(
Sorry.
It seems to be working well, and I like it :)

One reason why it is not standard was Ruby had threads as you
mentioned, another was it’s not available everywhere. And it
is in ROUGH in order to be tested before included in standard.

Yes, this is a good reason.

bye!
Dominik

Maybe, but threads are really the “ruby way” to solve this problem.

In other words: if you care that the process blocks with a half-received
HTTP request, because you want to be doing something else at the same time,
then the ‘something else’ can be done in another thread.

Threads in Ruby are really just a convenient abstraction over select(). If a
thread is about to do an operation which would block, it adds its fd to
those waiting for data, and hands over control to another thread. When its
fd becomes ready again, at a convenient time it will get control handed back
to it.
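
A small sketch of that pattern: one thread blocks on the socket while the main thread carries on with other work. The scheduling described above is that of Ruby at the time of this thread; current Ruby versions use native threads, but the code looks the same. `UNIXSocket.pair` is just a stand-in for a real connection, and the variable names are illustrative.

```ruby
require 'socket'

reader_end, writer_end = UNIXSocket.pair
received = []

# The blocking gets call only suspends this thread, not the whole process.
reader = Thread.new do
  while (line = reader_end.gets)
    received << line.chomp
  end
end

# Meanwhile the main thread is free to do "something else".
3.times { |i| writer_end.puts("message #{i}") }
writer_end.close        # EOF lets the reader loop finish

reader.join
```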

Regards,

Brian.

···

On Fri, May 16, 2003 at 07:20:30PM +0900, Dominik Werder wrote:

my problem is not the HTTP protocol itself (not at this time :) but the IO
blocking.
If I try to read all the lines the server must have sent to me, but the
data is not yet available on the socket, the process blocks.
I wondered if it’s possible to solve this without threads, for example by
knowing the number of bytes that can be read without blocking…

Hi,

my problem is not the HTTP protocol itself (not at this time :) but the IO
blocking.
If I try to read all the lines the server must have sent to me, but the
data is not yet available on the socket, the process blocks.
I wondered if it’s possible to solve this without threads, for example by
knowing the number of bytes that can be read without blocking…

Sorry if this has already been covered, but, I’ve used select()
successfully to test whether data is available at a socket.
(I wrote a non-blocking multi-client server without threads this
way before learning how well and easily Ruby threads handle
socket I/O.)

Was select() not appropriate to your situation?

For what it’s worth, here’s a code example that’s working for me
from my non-threaded app:

  # read any available chars and append them to our local buffer
  def terminal_buffered_read
    begin
      if Kernel.select([self], nil, nil, 0)
        dat = self.recv(65536)
        if !dat || dat.empty?
          @term_eof = true
        else
          @term_read_buf << dat
        end
      end
    rescue IOError, SystemCallError
      @term_eof = true
    end
  end

(The above was part of a module whose methods were added to a
socket IO object via Object#extend.)
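
For readers who want to try this, here is one way the surrounding module might look. It is a guess at the missing pieces, not the original code: the module name `BufferedTerminal`, the accessors, and the lazy buffer initialisation are all assumptions, and `UNIXSocket.pair` replaces a real client connection.

```ruby
require 'socket'

module BufferedTerminal          # hypothetical module name
  attr_reader :term_read_buf, :term_eof

  # Poll with a zero timeout; append any available bytes to the buffer.
  def terminal_buffered_read
    @term_read_buf ||= String.new
    if IO.select([self], nil, nil, 0)
      dat = recv(65536)
      if dat.nil? || dat.empty?  # closed connection
        @term_eof = true
      else
        @term_read_buf << dat
      end
    end
  rescue IOError, SystemCallError
    @term_eof = true
  end
end

client, server = UNIXSocket.pair
client.extend(BufferedTerminal)  # add the methods to this one socket object

client.terminal_buffered_read    # nothing to read: returns immediately
server.write("abc")
IO.select([client], nil, nil, 1)
client.terminal_buffered_read    # picks up "abc"
server.close
IO.select([client], nil, nil, 1)
client.terminal_buffered_read    # sees the close and sets the EOF flag
```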

HTH,

Bill

···

From: “Dominik Werder” dwerder@gmx.net

Hi,

···

At Fri, 16 May 2003 19:43:07 +0900, Dominik Werder wrote:

Unfortunately this seems to be very buggy. Sometimes it works, but
most of the time the function returns nil although there is data
available (readable without blocking).

I admit it’s not been tested enough, do you have test script or
something?

I have to apologize for my hasty judgment: it was not Ruby but the
loaded server that caused the problem :(

It is there to be tested, so you don’t have to apologize for
it at all.

Rather, I would like to see the test case, whether it succeeds or
fails.


Nobu Nakada

Maybe, but threads are really the “ruby way” to solve this problem.

In other words: if you care that the process blocks with a half-received
HTTP request, because you want to be doing something else at the same time,
then the ‘something else’ can be done in another thread.

Threads in Ruby are really just a convenient abstraction over select(). If a
thread is about to do an operation which would block, it adds its fd to
those waiting for data, and hands over control to another thread. When its
fd becomes ready again, at a convenient time it will get control handed back
to it.

But what should I do if I have to read binary data from one or more
connections and want to line them up in packets to send them over a single
connection to another host? In this case I can’t wait for a newline or EOF
because the binary data may not contain CR/LF. Or if the sending server
hangs, the connection is established but there’s no data flowing… and so my
connection hangs too (forever?)

I agree that threads are a clean solution in networking. But it seems to me
that in some cases I have to be able to know how much data I can read. Maybe
I’m mistaken.

How do for example P2P clients do that? One thread per connection?

thanks!
Dominik

Thanks for posting,

I guess this should be working for me.
I’m only very confused why something like this is so hard to do.
But it’s probably because everybody uses threads…

bye!
Dominik

···

Sorry if this has already been covered, but, I’ve used select()
successfully to test whether data is available at a socket.
(I wrote a non-blocking multi-client server without threads this
way before learning how well and easily Ruby threads handle
socket I/O.)

Was select() not appropriate to your situation?

For what it’s worth, here’s a code example that’s working for me
from my non-threaded app:

  # read any available chars and append them to our local buffer
  def terminal_buffered_read
    begin
      if Kernel.select([self], nil, nil, 0)
        dat = self.recv(65536)
        if !dat || dat.empty?
          @term_eof = true
        else
          @term_read_buf << dat
        end
      end
    rescue IOError, SystemCallError
      @term_eof = true
    end
  end

(The above was part of a module whose methods were added to a
socket IO object via Object#extend.)

I did the following and it works great:

--- SNIP ---

require 'socket'
require 'io/wait'

f = File.new '< --- enter your filename here --- >', 'w'

req = "GET " << ARGV[0] << " HTTP/1.1\r\n"
# enter your hostname in the Host: line below
t = <<EOF
Host: www.myhost.com
Accept-Encoding: gzip

EOF

req << t

s = TCPSocket.new('localhost', 80)
s.send req, 0
run = true
while run
  sleep 0.5
  # ready? is the io/wait method from rough discussed above; here it
  # returns the number of readable bytes, or nil if nothing is buffered
  rec = s.ready?
  if rec.kind_of? Numeric
    t = s.read rec
    f.print t
  else
    run = false
  end
end
f.close
print "Exit\n\n\n"
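
A comparable loop can be written with core methods only, in case the io/wait `ready?` used above is not available or behaves differently in your Ruby. This is a sketch, not a replacement for the script: the helper name `drain`, the idle timeout value, and the throwaway local server are assumptions made so the example runs on its own.

```ruby
require 'socket'

# Collect everything the peer sends until it closes the connection or
# stays silent for idle_timeout seconds.
def drain(sock, idle_timeout: 0.5)
  data = String.new
  while IO.select([sock], nil, nil, idle_timeout)
    begin
      data << sock.readpartial(4096)   # returns as soon as any bytes exist
    rescue EOFError
      break                            # peer closed the connection
    end
  end
  data
end

# Tiny local server so the example does not depend on a real host.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
responder = Thread.new do
  conn = server.accept
  conn.gets                            # consume the request line
  conn.write("HTTP/1.0 200 OK\r\n\r\nhi")
  conn.close
end

sock = TCPSocket.new("127.0.0.1", port)
sock.write("GET / HTTP/1.0\r\n")
reply = drain(sock)

responder.join
sock.close
server.close
```

As with the sleep-based loop, an idle timeout is only a heuristic: a slow server can pause for longer than the timeout and still have more to send.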

But what should I do if I have to read binary data from one or more
connections and want to line them up in packets to send them over a single
connection to another host? In this case I can’t wait for a newline or EOF
because the binary data may not contain CR/LF.

That would mean mixing the binary streams in a non-deterministic way, which
doesn’t sound very useful to me. Half a message from stream A followed by a
quarter message from stream B, followed by…

I think you need to understand the message boundaries within a stream for
this sort of multiplexing to be useful.

Or if the sending server hangs, the connection is established but there’s
no data flowing… and so my connection hangs too (forever?)

That’s fine, your thread can hang as long as you like, and you can protect
it with a timeout: try

 require 'timeout'
 Timeout.timeout(5) do   # raises Timeout::Error after 5 seconds
   sleep 10
 end
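
Applied to a socket read, the same idea might look like the sketch below. The peer here is a `UNIXSocket.pair` end that never sends anything, chosen only to simulate a hung server; the 0.2 second limit is arbitrary.

```ruby
require 'socket'
require 'timeout'

silent_peer, ours = UNIXSocket.pair   # silent_peer never writes anything

result =
  begin
    # read(100) would wait forever for 100 bytes; the timeout interrupts it.
    Timeout.timeout(0.2) { ours.read(100) }
  rescue Timeout::Error
    :timed_out
  end
```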

How do for example P2P clients do that? One thread per connection?

I don’t know what you mean by ‘P2P’, but a server talking to multiple
clients would indeed normally have a separate thread per client. There’s an
example of a fake pop3 server at
http://www.rubygarden.org/ruby?SingletonTutorial
(at the end of section 2)

Regards,

Brian.

···

On Fri, May 16, 2003 at 07:53:39PM +0900, Dominik Werder wrote:

That would mean mixing the binary streams in a non-deterministic way,
which
I think you need to understand the message boundaries within a stream for
this sort of multiplexing to be useful.
Of course I’ll take some data from stream “1” and create a packet like
this: “Here we go with xx bytes from stream 1” and send this through my
“tunnel” connection.
The problem: my program does not know the underlying protocol of the
stream. So if I receive 99 bytes from that stream, and these 99 bytes are a
complete request in some protocol, and my task is only to forward this
packet through the tunnel, then I can hardly wait until, say, 100 bytes are
available: I did a read(100), and my process blocks until 100 bytes are
available :)
Sorry for my English, it was not my best subject at school :)
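
One way to avoid the read(100) problem in current Ruby is `IO#readpartial`, which returns as soon as any data is available instead of waiting for the full count. The sketch below pairs it with a made-up tunnel frame layout (2-byte stream id, 4-byte length, then the payload); that layout and the helper name `tunnel_frame` are assumptions for illustration, not anything specified in this thread.

```ruby
require 'socket'

# Wrap a chunk in a hypothetical tunnel frame:
# 2-byte stream id, 4-byte payload length (both big-endian), then the payload.
def tunnel_frame(stream_id, chunk)
  [stream_id, chunk.bytesize].pack("nN") + chunk
end

source, sink = UNIXSocket.pair
source.write("x" * 99)               # only 99 bytes ever arrive

chunk = sink.readpartial(100)        # returns what is there, up to 100 bytes
frame = tunnel_frame(1, chunk)
stream_id, length = frame.unpack("nN")
```

The forwarder never needs to know whether the 99 bytes form a complete request; it just labels and relays whatever arrived.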

That’s fine, your thread can hang as long as you like, and you can protect
it with a timeout: try

 require 'timeout'
 Timeout.timeout(5) do
   sleep 10
 end

Thanks, I’ll dig into this :)

How do for example P2P clients do that? One thread per connection?
I don’t know what you mean by ‘P2P’, but a server talking to multiple
clients would indeed normally have a separate thread per client. There’s
an
example of a fake pop3 server at
http://www.rubygarden.org/ruby?SingletonTutorial
(at the end of section 2)
I mean peer-to-peer, like file-sharing clients. They have many concurrent
connections…

thanks!
Dominik

It is my understanding that 1 thread/connection is considered to be less
scalable than an M:N model, with M threads taking care of N sockets
each. That applies when using native threads, so I don’t know to what
extent it can be mapped to Ruby’s green ones.

···

On Fri, May 16, 2003 at 08:45:17PM +0900, Brian Candler wrote:

How do for example P2P clients do that? One thread per connection?

I don’t know what you mean by ‘P2P’, but a server talking to multiple
clients would indeed normally have a separate thread per client. There’s an
example of a fake pop3 server at
http://www.rubygarden.org/ruby?SingletonTutorial
(at the end of section 2)


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

MS-DOS, you can’t live with it, you can live without it.
– from Lars Wirzenius’ .sig

That would mean mixing the binary streams in a non-deterministic way,
which
I think you need to understand the message boundaries within a stream for
this sort of multiplexing to be useful.
Of course I’ll take some data from stream “1” and create a packet like
this: “Here we go with xx bytes from stream 1” and send this through my
“tunnel” connection.
The problem: my program does not know the underlying protocol of the
stream. So if I receive 99 bytes from that stream, and these 99 bytes are a
complete request in some protocol, and my task is only to forward this
packet through the tunnel, then I can hardly wait until, say, 100 bytes are
available: I did a read(100), and my process blocks until 100 bytes are
available :)

Sure, using the method that Nobu proposes you might be able to tell that
there are exactly 99 bytes sitting in the socket buffer right now.

But how do you know that 99 bytes is a whole request? Maybe the whole
request is 138 bytes, but because the message is sent in TCP chunks, the
first read() returned 99 bytes, and the next read later returns 39 bytes. Or
maybe it’s two requests of 69 bytes each; the first read() returns the whole
first request plus 30 bytes of the second request, and the second read()
returns the remaining 39 bytes.
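
To make the point concrete, here is a sketch of how a receiver can recover message boundaries no matter how TCP splits the bytes, provided the sender marks them. It assumes a simple 4-byte big-endian length prefix in front of every message; that framing and the class name `FrameBuffer` are hypothetical choices for the example.

```ruby
# Accumulates arbitrary chunks and yields only complete messages.
class FrameBuffer                    # hypothetical helper
  def initialize
    @buf = String.new
  end

  # Append a chunk as it came off the socket; return every message that
  # is now complete (possibly none, possibly several).
  def feed(chunk)
    @buf << chunk
    messages = []
    while @buf.bytesize >= 4
      length = @buf.unpack1("N")               # 4-byte length prefix
      break if @buf.bytesize < 4 + length      # rest has not arrived yet
      messages << @buf.byteslice(4, length)
      @buf = @buf.byteslice(4 + length, @buf.bytesize)
    end
    messages
  end
end

wire = [5].pack("N") + "hello" + [6].pack("N") + "world!"
fb = FrameBuffer.new
part1 = fb.feed(wire.byteslice(0, 7))          # split in the middle of "hello"
part2 = fb.feed(wire.byteslice(7, wire.bytesize))
```

The first chunk yields nothing because the message is incomplete; the second yields both messages at once, which mirrors the 99/39 byte scenario above.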

How do for example P2P clients do that? One thread per connection?
I don’t know what you mean by ‘P2P’, but a server talking to multiple
clients would indeed normally have a separate thread per client. There’s
an
example of a fake pop3 server at
http://www.rubygarden.org/ruby?SingletonTutorial
(at the end of section 2)
I mean peer-to-peer, like file-sharing clients. They have many concurrent
connections…

They’ll have a separate TCP connection for each peer, and process messages
from each stream independently. Since you are parsing all the incoming
messages, you know where the message boundaries are.

Ruby threads do this easily; or you can simulate it using select(); or you
can fork off a separate process for each peer.
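
As a rough illustration of the select() variant, the loop below services several peers from a single thread. The three `UNIXSocket.pair` connections, the 1 second timeout, and the variable names are assumptions chosen to keep the example self-contained.

```ruby
require 'socket'

# Three simulated peers, each sending one message and then disconnecting.
pairs = Array.new(3) { UNIXSocket.pair }
readers = pairs.map(&:first)
writers = pairs.map(&:last)
writers.each_with_index do |w, i|
  w.write("peer #{i}")
  w.close
end

received = Hash.new { |h, k| h[k] = String.new }
open_socks = readers.dup

until open_socks.empty?
  ready, = IO.select(open_socks, nil, nil, 1)
  break unless ready                     # nobody had data within 1 second
  ready.each do |sock|
    begin
      received[readers.index(sock)] << sock.readpartial(4096)
    rescue EOFError
      open_socks.delete(sock)            # this peer is finished
      sock.close
    end
  end
end
```

Each peer keeps its own buffer, so the streams stay independent even though one loop reads them all.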

Regards,

Brian.

···

On Fri, May 16, 2003 at 10:26:59PM +0900, Dominik Werder wrote:

Sure, using the method that Nobu proposes you might be able to tell that
there are exactly 99 bytes sitting in the socket buffer right now.

But how do you know that 99 bytes is a whole request? Maybe the whole
request is 138 bytes, but because the message is sent in TCP chunks, the
first read() returned 99 bytes, and the next read later returns 39 bytes.
Or maybe it’s two requests of 69 bytes each; the first read() returns the
whole first request plus 30 bytes of the second request, and the second
read() returns the remaining 39 bytes.
This is not the problem: if the request is not complete in one packet, the
rest comes with the following packet.
But if the request never reaches me because 99 bytes are not enough to
satisfy my read, then I have a problem.
Think about my software as a TCP/IP router. A TCP/IP router doesn’t need
to know the protocol either.

They’ll have a separate TCP connection for each peer, and process
messages
from each stream independently. Since you are parsing all the incoming
messages, you know where the message boundaries are.
Do they also have one thread per TCP connection?

thanks!
Dominik