Read/write slow, and TCPSocket and sys{read,write}

OK, so I’m throwing things back and forth over the network. Marshalled
objects, to be exact. The problem is, this is slow. And the problem
is read/write, as far as I can tell.

For instance, I can marshal and send 1000 objects, and receive 1000
replies, in a fraction of a second. However, if I send them one-by-one,
it takes minutes! Same code on both ends for the test. What gets sent
is a 4-byte network-order length, followed by the data.
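
A minimal sketch of that framing, using an in-memory buffer to show the round trip (the frame/unframe helper names are illustrative, not the actual code):

```ruby
require 'stringio'

# Frame: 4-byte network-order (big-endian) length, then the marshalled data.
def frame(obj)
  data = Marshal.dump(obj)
  [data.bytesize].pack("N") + data
end

# Read one frame back from any IO-like object.
def unframe(io)
  len = io.read(4).unpack("N").first
  Marshal.load(io.read(len))
end

# Round-trip through an in-memory buffer:
msg = frame([1, "two", :three])
p unframe(StringIO.new(msg))   # => [1, "two", :three]
```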

So, at the suggestion from #ruby-lang, I tried sysread/syswrite. Except
TCPSocket doesn’t seem to want to let me use those… I get an IOError
about sysread on a buffered stream. (And no, I’m not using any buffered
functions myself, like #read, #write, #print, etc… only 4 calls, 2
sysreads, 2 syswrites.)

My main problem is the slowness. The sysread/syswrite bit was just a
hopeful solution. Anyone have any ideas?

Thanks.

···


Ryan Pavlik rpav@users.sf.net

“I’ve got a crude stabbing implement right here with
your name all over it.” - 8BT

> For instance, I can marshal and send 1000 objects, and receive 1000
> replies, in a fraction of a second. However, if I send them one-by-one,
> it takes minutes! Same code on both ends for the test. What gets sent
> is a 4-byte network-order length, followed by the data.

I hate replying to myself, but I should clarify. They are always sent
and handled one-by-one: send an object, the server handles that object
and sends the reply.

The slow part is when I send the object, get the reply, send the object,
get the reply, etc. If I send 1000 objects in a row and then read the
1000 replies, it’s fast. But they’re always sent and handled
individually. A little pseudocode to help:

Fast:

(1..1000).each {
  connection.send(Ping.new)
}

(1..1000).each {
  pong = connection.get
}

Slow:

(1..1000).each {
  connection.send(Ping.new)
  pong = connection.get
}

I hope that’s clearer. :-)

···

On Fri, 4 Apr 2003 05:48:53 +0900 Ryan Pavlik rpav@nwlink.com wrote:


Ryan Pavlik rpav@users.sf.net


How slow are you getting?

Using DRb (druby) you can typically get around 50 round-trips per second. I
guess you’re getting rather less than that.

If you don’t have druby installed, you might want to get it and have a look
at drb/drb.rb which does exactly what you’re doing - i.e. sends a bunch of
values which are 4-byte length followed by a marshalled object.

Also, I suggest you try setting
.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
at both ends to turn off the Nagle algorithm. Nagling makes the TCP stack
wait for a short period (typically 0.1s) before sending, in case any other
data comes along which can be accumulated in the same packet. If you don’t
have that set, then that will limit you to around 5 round-trips per second.
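
In Ruby that can be wrapped in a small helper (a sketch; the commented-out host and port are placeholders, not values from this thread):

```ruby
require 'socket'

# Turn off Nagle's algorithm so small writes go out immediately
# instead of waiting to be coalesced into a larger segment.
def no_delay!(sock)
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
  sock
end

# e.g. on the client side:
# sock = no_delay!(TCPSocket.new("localhost", 9000))
```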

Regards,

Brian.

···

On Fri, Apr 04, 2003 at 06:06:15AM +0900, Ryan Pavlik wrote:

The slow part is when I send the object, get the reply, send the object,
get the reply, etc. If I send 1000 objects in a row and then read the
1000 replies, it’s fast. But they’re always sent and handled
individually. A little pseudocode to help:

Fast:

(1..1000).each {
  connection.send(Ping.new)
}

(1..1000).each {
  pong = connection.get
}

Slow:

(1..1000).each {
  connection.send(Ping.new)
  pong = connection.get
}

I hope that’s clearer. :-)

[…]

The slow part is when I send the object, get the reply, send the object,
get the reply, etc.

Perhaps it is not the read/write speed that is killing you, but the
latency in the network.

For example, let’s suppose that sending an object is really fast
(say, 0.01 seconds), but getting a response takes a long time (let’s
say 1 second, to really exaggerate the difference).

If I send a message and wait for the response, each exchange takes
0.01 + 1 + 0.01 seconds, or 1.02 seconds. Sending 100 messages in
this manner will take 102 seconds.

If instead I send all 100 messages (1 second), wait for the replies
to start arriving (1 second), and then receive the 100 responses
(another 1 second), the whole thing takes about 3 seconds total.

If latency is really your problem, you might want to consider some kind
of buffering scheme where X number of messages can be sent before a
response is expected. Any book on communication protocols should cover
the basics.
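
One way to sketch such a scheme, reusing the connection.send / connection.get pseudo-API from earlier in the thread (the window size and the method names are assumptions, not the actual protocol code):

```ruby
# Send up to WINDOW messages before collecting their replies,
# amortizing the round-trip latency over the whole window.
WINDOW = 100

def pipelined_exchange(connection, messages)
  replies = []
  messages.each_slice(WINDOW) do |batch|
    batch.each { |m| connection.send(m) }           # fire off the whole window
    batch.size.times { replies << connection.get }  # then drain the replies
  end
  replies
end
```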

···

On Thu, 2003-04-03 at 16:06, Ryan Pavlik wrote:


– Jim Weirich jweirich@one.net http://w3.one.net/~jweirich

“Beware of bugs in the above code; I have only proved it correct,
not tried it.” – Donald Knuth (in a memo to Peter van Emde Boas)

I suspect you’re being killed by the latency in the TCP
protocol layer. If you write to a socket, most implementations
won’t send your packet immediately, because they guess you
will write more instead of reading next. After say 10ms, they
give up waiting for more and send what you wrote. This is
known as the Nagle algorithm; you should be able to disable it using
setsockopt with TCP_NODELAY (I haven’t done this in Ruby myself).

Clifford.

I should also probably have said that this was over loopback to localhost,
so that shouldn’t really be a problem, or at least not to this degree. ;-)

Thanks,

···

On Fri, 4 Apr 2003 09:29:57 +0900 Jim Weirich jweirich@one.net wrote:

On Thu, 2003-04-03 at 16:06, Ryan Pavlik wrote:
[…]

The slow part is when I send the object, get the reply, send the object,
get the reply, etc.

Perhaps it is not the read/write speed that is killing you, but the
latency in the network.

Ryan Pavlik rpav@users.sf.net


> How slow are you getting?

I haven’t benchmarked it yet, but using the setsockopt you mention
below speeds things up, probably by about 2x. It’s still way slower
than it could/should be.

Using DRb (druby) you can typically get around 50 round-trips per second. I
guess you’re getting rather less than that.

I’ll run a test to see, now, but still, 50/s over localhost is really
unacceptably slow… it should easily get a thousand or two a second,
and this is with plain ruby code. Perhaps I’ll try rewriting the IO
as a C extension, but this seems a bit extreme given the
circumstances.

If you don’t have druby installed, you might want to get it and have a look
at drb/drb.rb which does exactly what you’re doing - i.e. sends a bunch of
values which are 4-byte length followed by a marshalled object.

Also, I suggest you try setting
.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
at both ends to turn off the Nagle algorithm. Nagling makes the TCP stack
wait for a short period (typically 0.1s) before sending, in case any other
data comes along which can be accumulated in the same packet. If you don’t
have that set, then that will limit you to around 5 round-trips per second.

Interesting, and thanks for the tip; it did noticeably improve things.
I won’t be happy until I’m getting an order of magnitude or two more,
though. :-)

Maybe an OS thread that sits and adds to a buffer, or something.
For now I’ve optimized the frontend so that it only sends 1 or 2
messages, which is decent, but eventually this will be used more
interactively with more traffic on the front end, so I’d like to see
the bottleneck removed.

···

On Fri, 4 Apr 2003 06:52:58 +0900 Brian Candler B.Candler@pobox.com wrote:

Regards,

Brian.


Ryan Pavlik rpav@users.sf.net


That’s 50 complete RPC exchanges doing some non-trivial work.

If you are just timing bytes over a socket, then how about the attached pair
of programs: the client sends a 4 byte string, and the server simply
uppercases it and sends it back. On my P266MMX laptop, this runs at almost
exactly 1000 operations per second.
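
The attachments aren’t reproduced here, but a pair along these lines behaves as described (this is my reconstruction, not Brian’s actual cli.rb/serv.rb; function names and the "ping" payload are illustrative):

```ruby
require 'socket'

# serv.rb (sketch): accept one client, upcase each 4-byte request.
def run_upcase_server(server)
  sock = server.accept
  while (data = sock.read(4))
    sock.write(data.upcase)
  end
  sock.close
end

# cli.rb (sketch): time n 4-byte round trips and return the elapsed seconds.
def run_upcase_client(host, port, n)
  sock = TCPSocket.new(host, port)
  t0 = Time.now
  n.times do
    sock.write("ping")
    sock.read(4)
  end
  elapsed = Time.now - t0
  sock.close
  elapsed
end
```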

I think you should break down what you are doing in this way until you find
the bottleneck. One possible problem is if you are doing a lot of

s.write(x)
s.write(y)
s.write(z)

then you could try replacing it with

s.write(x+y+z)

since that will generate one TCP segment instead of three.
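
For this protocol specifically, that means writing the length header and the body in a single call (a sketch; `send_framed` is an illustrative name, not from the thread’s code):

```ruby
require 'stringio'

# One write for header + payload: one TCP segment instead of two.
def send_framed(io, data)
  io.write([data.bytesize].pack("N") + data)
end

buf = StringIO.new
send_framed(buf, "abc")
# buf.string is now "\x00\x00\x00\x03abc"
```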

Regards,

Brian.

cli.rb (161 Bytes)

serv.rb (176 Bytes)

···

On Fri, Apr 04, 2003 at 11:00:01AM +0900, Ryan Pavlik wrote:

I haven’t benchmarked it yet, but using the setsockopt you mention
below speeds things up, probably by about 2x. It’s still way slower
than it could/should be.

Using DRb (druby) you can typically get around 50 round-trips per second. I
guess you’re getting rather less than that.

I’ll run a test to see, now, but still, 50/s over localhost is really
unacceptably slow… it should easily get a thousand or two a second,