Net/http performance

(notes: I posted this to comp.lang.ruby, figuring it would filter
through to appropriate mailing lists, but it did not).

It seems that net/http's implementation is extremely inefficient when
it comes to dealing with large files.

I think this is something worth fixing in subsequent versions. It
shouldn't be as bad as it is. I would also appreciate any hints or
advice on working around the problem.

Specifically, I am interested in HTTP GETs (from net/http) and HTTP PUTs
(both on the net/http side and WEBrick receiving side) that have
adequate streaming performance. I would like to GET and PUT fairly large
files, and don't want to pay such a large network and CPU performance
overhead.

Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.

"Host: localhost, port: 12000, request_uri: /ten-meg.bin"
                user system total real
TCPSocket 0.030000 0.150000 0.180000 ( 0.468867)
net/http 10.620000 8.630000 19.250000 ( 21.787785)
LB net/http 10.870000 8.900000 19.770000 ( 22.259448)
open-uri 16.400000 11.900000 28.300000 ( 39.834555)

As you can see, a raw TCPSocket is orders of magnitude faster than
net/http and friends. However, I'm using read_body and receiving the
data in chunks, and I would have expected much better performance as a
result. We're talking 20MB/s for TCPSocket versus 400KB/s for net/http.

What's happening here? What can I do to fix it?

Any help appreciated.

Regards,

Luke.

#!/usr/bin/ruby

require 'net/http'
require 'open-uri'
require 'benchmark'
require 'webrick'
include WEBrick

uri = URI.parse("http://localhost:12000/ten-meg.bin")
sourceFolder = "/tmp/"

Kernel.system("dd if=/dev/random of=/tmp/ten-meg.bin bs=1024
count=10240")

port = 12000
server = HTTPServer.new(:Port => port, :DocumentRoot => sourceFolder)
# trap the signal for shutdown
trap("INT"){ server.shutdown }
pid = Kernel.fork {
  $stdout.reopen('/tmp/WEBrick.stdout')
  $stderr.reopen('/tmp/WEBrick.stderr')
  server.start
}

at_exit { Process.kill("INT", pid) }

Kernel.sleep 1

p "Host: #{uri.host}, port: #{uri.port}, request_uri:
#{uri.request_uri}"

Benchmark.bm(10) do |time|
  out = File.new("/tmp/tcp.tar.bz2", "w")
  time.report("TCPSocket") do
    s = TCPSocket.open uri.host, uri.port
    s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost:
#{uri.host}\r\n\r\n"
    temp = s.read.split("\r\n\r\n", 2).last
    s.close
    out.write(temp)
  end
  out.close

  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http") do
    Net::HTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close

  out = File.new("/tmp/luke.out", "w")
  time.report("LB net/http") do
    http = Net::HTTP.new(uri.host, uri.port)
    http.request_get(uri.path) { |response|
      response.read_body { |segment|
        out.write(segment)
      }
    }
  end
  out.close

  out = File.new("/tmp/uri.tar.bz2", "w")
  time.report("open-uri") do
    uri.open do |x|
      out.write(x.read)
    end
  end
  out.close
end


--
Posted via http://www.ruby-forum.com/.

Why don't you swap out WEBrick for Mongrel and run the same tests?

I suspect that the server is the bottleneck, not the client.

Gary Wright


On Jul 15, 2006, at 3:44 AM, Luke Burton wrote:

Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.

Better yet, don't use a Ruby web server at all, and use another tool you
trust (httperf or ab and curl) to determine a good baseline performance.
Once you've got what *could* be done with net/http then you can run
net/http and compare.

Also, I'm working on a faster alternative to net/http in the RFuzz http
client. Stay tuned for that, but you can play with it right now:

  http://www.zedshaw.com/projects/rfuzz/


On Sat, 2006-07-15 at 23:37 +0900, gwtmp01@mac.com wrote:

On Jul 15, 2006, at 3:44 AM, Luke Burton wrote:
> Below I have attached a test suite that illustrates the problem. I
> used
> WEBrick as the server.

Why don't you swap out WEBrick for Mongrel and run the same tests?

I suspect that the server is the bottleneck, not the client.

--
Zed A. Shaw

http://mongrel.rubyforge.org/
http://www.railsmachine.com/ -- Need Mongrel support?

Gary Wright wrote:

I suspect that the server is the bottleneck, not the client.

Zed Shaw wrote:

Better yet, don't use a Ruby web server at all, and use another tool you
trust (httperf or ab and curl) to determine a good baseline performance.
Once you've got what *could* be done with net/http then you can run
net/http and compare.

Hi Zed & Gary,

I neglected to mention in my post that I have already double checked
that WEBrick is not the culprit. Fetching from WEBrick using curl is as
fast as using TCPSocket:

$ time curl -O http://localhost:12000/ten-meg.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.0M  100 10.0M    0     0  21.9M      0 --:--:-- --:--:-- --:--:-- 29.4M

real 0m0.466s
user 0m0.013s
sys 0m0.093s

So I *know* Ruby can shift the bits around fast enough, it's just that
net/http isn't playing the game.

Also, I'm working on a faster alternative to net/http in the RFuzz http
client. Stay tuned for that, but you can play with it right now:

I'd definitely like to check that out!

I just think that net/http's abysmal speed is somewhat anomalous. I am
confident there is a simple explanation - maybe some tight loop in there
doing something not particularly clever - but I haven't had the time as
yet to dive right into net/http and find the reason. And I haven't had
much success with Ruby profilers either. More suggestions welcome here!

I have gone ahead and changed the critical section of my code to use
TCPSocket instead. That solved the HTTP GET problem, but I still
struggle with PUTs:

file = File.open(resultFile, "r")
http = Net::HTTP.new(@uri.host, @uri.port)
http.put("/put/" + URI.escape(File.basename(resultFile)), file.read)

Now that's not real pleasant because it relies on snarfing the whole
file into memory first. I would have liked to do something like:

http.put("/put/" + URI.escape(File.basename(resultFile))) do

datasocket>

    while file.eof? == false
        datasocket.write(file.read(4096))
    end
end

This is similar to what net/http offers in the case of HTTP GET, but of
course it's broken because of the aforementioned speed concerns:

http.request_get("/#{file}") { |response|
    response.read_body { |segment|
        # in here, 400 KB/s max and > 70% CPU utilisation ...
        outputFile.write(segment)
   }
}

I still call myself a Ruby newbie, so one of my concerns is that perhaps
I'm Just Not Getting It and that if I followed the Ruby Way my troubles
would vanish :-)



Luke Burton wrote:

I just think that net/http's abysmal speed is somewhat anomalous. I am confident there is a simple explanation - maybe some tight loop in there doing something not particularly clever - but I haven't had the time as yet to dive right into net/http and find the reason. And I haven't had much success with Ruby profilers either. More suggestions welcome here!

I think you're right. In the GET case (for both open-uri and net/http) I have found a way to speed up the download. The method BufferedIO#rbuf_fill in net/protocol.rb uses a fixed buffer size of 1024. I changed this to 8192:

    def rbuf_fill
      timeout(@read_timeout) {
        @rbuf << @io.sysread(8192)
      }
    end

This led to almost the same speed I could achieve with the use of direct sockets.
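For anyone who wants to try this without editing net/protocol.rb on disk, a minimal sketch that reopens Net::BufferedIO instead (the method body mirrors the 1.8-era implementation; rbuf_fill is a private implementation detail, so this is version-sensitive and purely illustrative):

```ruby
require 'net/protocol'
require 'timeout'

# Reopen Net::BufferedIO and redefine rbuf_fill with a larger read size.
# This mirrors the 1.8-era method body; later Ruby versions implement
# BufferedIO differently, so treat this as a historical illustration.
module Net
  class BufferedIO
    def rbuf_fill
      Timeout.timeout(@read_timeout) {
        @rbuf << @io.sysread(8192)
      }
    end
  end
end
```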

I have gone ahead and changed the critical section of my code to use TCPSocket instead. That solved the HTTP GET problem, but I still struggle with PUTs:

file = File.open(resultFile, "r")
http = Net::HTTP.new(@uri.host, @uri.port)
http.put("/put/" + URI.escape(File.basename(resultFile)), file.read)

Now that's not real pleasant because it relies on snarfing the whole file into memory first. I would have liked to do something like:

http.put("/put/" + URI.escape(File.basename(resultFile))) do

datasocket>

   while file.eof? == false
       datasocket.write(file.read(4096))
   end
end

You can do this by assigning your IO object to Put#body_stream=. The only problem is that the sending method also uses a very small buffer size:

    def send_request_with_body_stream(sock, ver, path, f)
      raise ArgumentError, "Content-Length not given and Transfer-Encoding is not `chunked'" unless content_length() or chunked?
      unless content_type()
        warn 'net/http: warning: Content-Type did not set; using application/x-www-form-urlencoded' if $VERBOSE
        set_content_type 'application/x-www-form-urlencoded'
      end
      write_header sock, ver, path
      if chunked?
        while s = f.read(1024)
          sock.write(sprintf("%x\r\n", s.length) << s << "\r\n")
        end
        sock.write "0\r\n\r\n"
      else
        while s = f.read(1024)
          sock.write s
        end
      end
    end

I think this is rather unfortunate. It would be better if those methods used larger buffer sizes and/or made them tweakable where necessary.
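A minimal sketch of the body_stream approach described above (the URL, filename, and helper name here are illustrative, not from the thread):

```ruby
require 'net/http'
require 'uri'
require 'stringio'

# Build a PUT request that streams its body from an IO object instead of
# slurping the whole file into memory first.
def build_streaming_put(uri, io, length)
  req = Net::HTTP::Put.new(uri.request_uri)
  req.body_stream = io                 # net/http reads the body from this IO
  req['Content-Length'] = length.to_s  # required unless Transfer-Encoding is chunked
  req.content_type = 'application/octet-stream'
  req
end

uri = URI.parse('http://localhost:12000/put/result.bin')
io  = StringIO.new('x' * 4096)        # stands in for File.open(resultFile, 'rb')
req = build_streaming_put(uri, io, 4096)
# Send with: Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
```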

In article <44B9A127.3020801@nixe.ping.de>,
  "Florian Frank" <flori@nixe.ping.de> writes:

I think you're right. In the GET case (for both open-uri and net/http) I
have found a possibility to speed up the download. The method
BufferedIO#rbuf_fill in net/protocol.rb uses a fixed buffer size of
1024. I changed this to 8192:

    def rbuf_fill
      timeout(@read_timeout) {
        @rbuf << @io.sysread(8192)
      }
    end

I guess the timeout() is slow.

Try:

    def rbuf_fill
      @rbuf << @io.sysread(1024)
    end

However, the above is not acceptable in general since
timeout is a feature.

It is possible to implement timeout without timeout() as:

    def rbuf_fill
      begin
        @rbuf << @io.read_nonblock(4096)
      rescue Errno::EWOULDBLOCK
        if IO.select([@io], nil, nil, @read_timeout)
          @rbuf << @io.read_nonblock(4096)
        else
          raise Timeout::Error
        end
      end
    end


--
Tanaka Akira

Tanaka Akira wrote:

I guess the timeout() is slow.

Try:

    def rbuf_fill
      @rbuf << @io.sysread(1024)
    end
  

The main problem seems to be that the read* methods of BufferedIO all suffer from the small value in rbuf_fill. If I download a big file, it's very likely that TCP packets are bigger than 1024 bytes (depending on my network infrastructure). For every received packet, lots of Ruby methods have to be called to handle it. At the same time this renders the operating system buffers, which are usually larger than 1024 bytes, useless. This is a big per-packet overhead, which reduces the maximum bandwidth that can be achieved.
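The per-read overhead described above can be made concrete by counting how many read calls each buffer size costs for a 10 MB transfer (an illustration written for this note, not from the thread):

```ruby
require 'stringio'

# Count how many read calls it takes to drain a stream at a given chunk
# size. Every call crosses into Ruby method dispatch and buffer handling,
# so fewer calls per byte means less per-packet overhead.
def reads_needed(total_bytes, chunk_size)
  io = StringIO.new('x' * total_bytes)
  calls = 0
  calls += 1 while io.read(chunk_size)
  calls
end

ten_mb = 10 * 1024 * 1024
reads_needed(ten_mb, 1024)   # => 10240 reads with the 1 KB buffer
reads_needed(ten_mb, 65536)  # => 160 reads with a 64 KB buffer
```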

However, the above is not acceptable in general since
timeout is a feature.

It is possible to implement timeout without timeout() as:

    def rbuf_fill
      begin
        @rbuf << @io.read_nonblock(4096)
      rescue Errno::EWOULDBLOCK
        if IO.select([@io], nil, nil, @read_timeout)
          @rbuf << @io.read_nonblock(4096)
        else
          raise Timeout::Error
        end
      end
    end
  

This would be even faster, I think, because timeout also adds a lot of overhead. For 1.8, those methods would have to be backported first. Hint, hint... ;-)


--
Florian Frank

RUBY_VERSION # => "1.8.5"
RUBY_RELEASE_DATE # => "2006-06-24"
IO.instance_methods.grep(/nonblock/) # => ["read_nonblock", "write_nonblock"]

(Yes, I must remove them from my 1.8 vs. 1.9 changelog summary)


On Sun, Jul 16, 2006 at 09:17:35PM +0900, Florian Frank wrote:

Tanaka Akira wrote:
>It is possible to implement timeout without timeout() as:
>
>    def rbuf_fill
>      begin
>        @rbuf << @io.read_nonblock(4096)
>      rescue Errno::EWOULDBLOCK
>        if IO.select([@io], nil, nil, @read_timeout)
>          @rbuf << @io.read_nonblock(4096)
>        else
>          raise Timeout::Error
>        end
>      end
>    end
>
This would be even faster, I think, because timeout also adds a lot
of overhead. For 1.8, those methods would have to be backported first.
Hint, hint... ;-)

--
Mauricio Fernandez - http://eigenclass.org - singular Ruby

Hi all,

Thanks to the many thoughtful suggestions here, I have implemented an
easy workaround to this problem that doesn't involve giving up the
net/http library completely.

If you go back to my benchmark testing code in the original post, I
have made the following changes. Basically I override the necessary
methods to tweak the buffer size:

class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    timeout(@read_timeout) {
      @rbuf << @socket.sysread(65536)
    }
  end
end

class NewHTTP < Net::HTTP
  def NewHTTP.socket_type
     OverrideInternetMessageIO
  end
end

Benchmark.bm(10) do |time|
  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http - bigbuffer") do
    NewHTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close
end

After making all those changes, we see the following new results for the
10 MB file transfer from WEBrick:

                            user     system      total        real
net/http - big buffer   0.360000   0.390000   0.750000 (  0.991848)

That's still twice as slow as a raw TCPSocket, but it's now definitely
in the realm of "usable for large file transfers".

I couldn't make any real recommendations on what the buffer size should
be. I imagine it's a trade off between the OS kernel's buffer size, TCP
packet size, and memory footprint of your application. Do HTTP clients
normally automatically negotiate a buffer? Do they pick one based on
content type? What are the common optimisations, and should net/http
follow them?

I tested a couple of values and found anything past 65536 bytes started
giving negligible returns, on my G5 running OS X 10.4.7 (ruby 1.8.2 -
i.e. the default OS X install).

Thanks again for all the pointers and commentary, and I hope to see a
more robust solution in a future version :-)

Regards,

Luke.



An addendum: testing with ruby 1.8.4 from a Locomotive bundle I had
handy reveals that my solution does not work on more recent versions of Ruby.

I suspect this might be due to increased strictness about shonky things,
such as overriding private methods, which is the basis of my hack.

For now I am satisfied (since I am targeting OS X with Ruby 1.8.2), but
if any Ruby gurus can suggest a 1.8.4 compatible hack, I'm all ears.

Regards,

Luke.



Luke Burton wrote:

necessary methods to tweak the buffer size:

class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    timeout(@read_timeout) {
      @rbuf << @socket.sysread(65536)
    }
  end
end


I couldn't make any real recommendations on what the buffer size should
be. I imagine it's a trade off between the OS kernel's buffer size, TCP
packet size, and memory footprint of your application. Do HTTP clients
normally automatically negotiate a buffer? Do they pick one based on
content type? What are the common optimisations, and should net/http
follow them?

I tested a couple of values and found anything past 65536 bytes started
giving negligible returns, on my G5 running OS X 10.4.7 (ruby 1.8.2 -
i.e. the default OS X install).

It would be really interesting to get a record of the actual data sizes
read out from each of the calls to sysread(65536) in your modified code.
If stdout is free, maybe you could do something like:
  timeout(@read_timeout) {
    a << @socket.sysread(65536)
    puts "sysread #{a.length} bytes"
    @rbuf << a
  }
just for a few trials, especially to compare the values when your large
file is coming in from a LAN, a WAN, and the Internet. The values you
see should give you a clue as to what the sysread buffer size ought to
be. TCP of course has the "sliding congestion window" mechanism in which
it adaptively increases the number of bytes that a peer may send to
another peer before it must wait for an acknowledgement. Most of the
time, this number may not be larger than 64K (because it's carried in a
16-bit field in the TCP packet header), which explains your observation
that 64K is a practical limit. With a large file transfer on a fast and
otherwise unloaded network, you should see this value quickly reach and
remain at 64K. (This is not something that HTTP clients have to do
themselves, to your other question- it's built into TCP.) If your
application runs on a LAN, I would expect a lot of benefit from 64K
sysreads. Across the Internet, I'd be pretty surprised if you get much
improvement from sysreads above 16K (which is also the typical
network-driver buffer size for Berkeley-derived kernels like OSX, unless
you've tweaked yours).

It's rather a surprise that the Ruby code which handles these raw reads
is so inefficient that cutting down the number of passes through it
makes such a difference. I'm usually pretty surprised when I/O
processing in an application is more than a negligible fraction of the
network transit time.



Luke Burton <luke@burton.echidna.id.au> writes:

After making all those changes, we see the following new results for the
10 MB file transfer from WEBrick:

                            user     system      total        real
net/http - big buffer   0.360000   0.390000   0.750000 (  0.991848)

That's still twice as slow as a raw TCPSocket, but it's now definitely
in the realm of "usable for large file transfers".

Does this improve noticeably if you use the read_nonblock suggestion
given earlier in the thread? Say:

class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    begin
      @rbuf << @io.read_nonblock(65536)
    rescue Errno::EWOULDBLOCK
      if IO.select([@io], nil, nil, @read_timeout)
        @rbuf << @io.read_nonblock(65536)
      else
        raise Timeout::Error
      end
    end
  end
end

As for the appropriate buffer size, for what it's worth apache uses
this structure to read into when it's acting as a proxy server and
reading someone else's output:

    char buffer[HUGE_STRING_LEN];

Where HUGE_STRING_LEN is defined in various apr (Apache Portable
Runtime) headers as 8192. (In Apache 1.3 it was in 'httpd.h')

I don't have time to track through the mozilla source to find out what
buffer size they use.

It would be really interesting to get a record of the actual data sizes
read out from each of the calls to sysread(65536) in your modified code.
If stdout is free, maybe you could do something like:
  timeout(@read_timeout) {
    a << @socket.sysread(65536)
    puts "sysread #{a.length} bytes"
    @rbuf << a
  }

Yecch, sorry, obviously the second line in my code snippet should be:
    a = @socket.sysread(65536)



Daniel Martin wrote:

Does this improve noticeably if you use the read_nonblock suggestion
given earlier in the thread? Say:

Hi Daniel,

I tested this earlier ... but my version of Ruby doesn't have
read_nonblock, unfortunately. I haven't had a chance to pull down 1.8.5
and re-test.

Additionally, if I were to do so, I'd need to find a new workaround
method. As mentioned above, my OverrideInternetMessageIO class does not
seem to function in Ruby versions newer than 1.8.2. So that makes a nice
catch-22.

Of course I could hand-edit the net/http classes, which would suffice for
an academic test. In the real world I can't run around patching people's
net/http for them :-(

Taking a step back for a moment - is there a Ruby bugzilla system or
equivalent, where such problems can be logged and prioritised? I would
be happy to do the grunt work of submitting the patch, now that the
solution is pretty clear.

Regards,

Luke.



In article <17b6c68ca7a0c35209ac51f136506734@ruby-forum.com>,
  Luke Burton <luke@burton.echidna.id.au> writes:

I tested this earlier ... but my version of Ruby doesn't have
read_nonblock, unfortunately. I haven't had a chance to pull down 1.8.5
and re-test.

This should work with 1.8.2.

    def rbuf_fill
      if IO.select([@io], nil, nil, @read_timeout)
        @rbuf << @io.sysread(16384)
      else
        raise Timeout::Error
      end
    end

Enlarging the buffer size should work well as long as the in-kernel
TCP buffer is large enough to store the data received between
successive rbuf_fill calls.

If the in-kernel buffer is not large enough, the per-call overhead
should be reduced instead. I think timeout() is the first candidate
for removal.


--
Tanaka Akira