Net/http performance question

I have the following code:

def fetch_into (uri, name)
  http = Net::HTTP.new(uri.host, uri.port)
  req = Net::HTTP::Get.new(uri.path)
  req.basic_auth(USERNAME, PASSWORD)
  start_time = Time.now.to_f
  File.open(name, "w") do |f|
    print " - fetching #{name}"
    http.request(req) do |result|
      f.write(result.body)
      f.close()
      elapsed = Time.new.to_f - start_time
      bps = (result.body.length / elapsed) / 1024
      printf ", at %7.2f kbps\n", bps
    end
  end
end

This is run in a very simple loop that doesn't do anything that
requires much CPU. The files downloaded are about 10 MB each, and since
the connection is not that fast (about 15 Mbit/s) I would expect this to
consume very little CPU, but in fact it *gobbles* up CPU: on a 2 GHz AMD
it eats 65% CPU on average (the job runs for hours on end).

Where are the cycles going? I assumed this would be a somewhat
suboptimal way of doing it, since there might be some buffer resizing
going on in there, but not *that* bad.

Anyone care to shed some light on this?

(I would assume there is a way of performing an HTTP request such that
you can read the response body a chunk at a time?)

-Bjørn

Hi,
there seems to be HTTPResponse#read_body, which can provide the chunks
as they come (not tested, copied and pasted from the docs):

# using iterator
  http.request_get('/index.html') {|res|
    res.read_body do |segment|
      print segment
    end
  }

BTW, you could move the File.open call inside the request block, which
saves the explicit f.close() call. You could also try fiddling with the
GC: calling GC.disable while receiving might help (or it might not);
just don't forget to re-enable it between requests.
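
As a small sketch of that idea (the with_gc_paused helper name is made
up for illustration), one way to guarantee the collector always comes
back on, even if the request raises:

# Hypothetical helper: run a block with the garbage collector paused,
# re-enabling it afterwards no matter what happens inside the block.
def with_gc_paused
  GC.disable
  yield
ensure
  GC.enable
end

# e.g. with_gc_paused { fetch_into(uri, name) }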

So, putting it together:

def fetch_into (uri, name)
  http = Net::HTTP.new(uri.host, uri.port)
  req = Net::HTTP::Get.new(uri.path)
  req.basic_auth(USERNAME, PASSWORD)
  start_time = Time.now.to_f
  print " - fetching #{name}"
  # GC.disable # optional
  http.request(req) do |result|
    # result.body is not populated when read_body is given a block,
    # so count the bytes ourselves as the segments arrive.
    bytes = 0
    File.open(name, "w") do |f|
      result.read_body do |segment|
        f.write(segment)
        bytes += segment.length
      end
    end
    elapsed = Time.now.to_f - start_time
    kbps = (bytes / elapsed) / 1024
    printf ", at %7.2f kbps\n", kbps
  end
  # GC.enable
end
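
For reference, a hypothetical call site (the URL and filename are made
up, and USERNAME/PASSWORD are assumed to be defined elsewhere, as in the
original post):

require 'net/http'
require 'uri'

uri = URI.parse("http://example.com/files/archive-001.bin")
fetch_into(uri, "archive-001.bin")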

···


["Jan Svitok" <jan.svitok@gmail.com>]

Hi,
there seems to be HTTPResponse#read_body, which can provide the chunks
as they come (not tested, copied and pasted from the docs):

# using iterator
  http.request_get('/index.html') {|res|
    res.read_body do |segment|
      print segment
    end
  }

Thanks!

Indeed, this helped a bit, but not too much. From the looks of it, the
standard library hard-codes the read buffer size to 1024 bytes (Ruby
1.8, net/protocol.rb), which results in at least twice the number of
system calls to read(2) for the same amount of data. I experimentally
upped the read buffer to 10k, and now it seems I get buffers roughly
equivalent to the MTU of the interface the data is read from.

When it is at 1024 bytes I consistently get one buffer of 1024 bytes,
then a buffer of approximately MTU - 1024, then 1024 bytes again, and
so on.
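
For anyone who wants to experiment without editing net/protocol.rb in
place, reopening Net::BufferedIO should work as well. This is an
untested sketch; the body of rbuf_fill mirrors the Ruby 1.8
implementation with only the read size changed, so check it against
your own Ruby version first:

require 'net/http'

module Net
  class BufferedIO
    # Assumed chunk size; adjust to whatever you want to experiment with.
    READ_SIZE = 16 * 1024

    # Same as the stock 1.8 rbuf_fill, but with the hard-coded 1024
    # replaced by READ_SIZE.
    def rbuf_fill
      timeout(@read_timeout) {
        @rbuf << @io.sysread(READ_SIZE)
      }
    end
  end
end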

Even after modifying the hard-coded buffer size to 10k it still eats an
obscene amount of CPU for what it is doing. I would have expected any
reasonable implementation to use at most 1% CPU (probably less) for
what is almost pure I/O. (It now consumes about 35% CPU on the 2 GHz
AMD.)

Anyway, a note to implementors: it might be an idea to pick a buffer
size larger than 1024 bytes if you are going to hard-code it; at the
very least, 4k or 8k would be more sensible. Preferably it should be
configurable (but with a sensible default) so the user can make an
informed decision to increase or decrease the size as needed.

-Bjørn