Open-uri speed

I just discovered this. Lots of people already know, I'm sure, but maybe some don't.

# net/http ran in: 128.734298 seconds.
# open-uri ran in: 268.869359 seconds.

require "net/http"
require "open-uri"

def timing(name)
   start_time = Time.now
   yield
   end_time = Time.now
   puts "#{name} ran in: #{end_time - start_time} seconds."
end

n = 1_000

timing("net/http") {Net::HTTP.start("www.pythonchallenge.com") do |http|
   n.times do |i|
     r = http.get("/pc/def/linkedlist.php")
   end
end}

timing("open-uri") {n.times do |i|
   open("http://www.pythonchallenge.com/pc/def/linkedlist.php") do |x|
     x.read
   end
end}

-- Elliot Temple

Interesting. Are you sure it's not caused by server side effects? I
mean, the URL you are retrieving seems to come from a PHP script - and
that can do anything it wants to waste time, including parsing client
identifier etc. I would have preferred to test with a static HTML
page.

Btw, something you might not know: there is a module, Benchmark, that can
be easily used for these kinds of things. It also does nice printing
and differentiation between kernel and user times, automatic ramp up
if needed etc. :-)
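
For anyone who hasn't used it, a minimal sketch of what that looks like. The workload here (parsing the URL from the original post) is only a stand-in so the example runs without a network; the variable names are made up for illustration, and the HTTP calls would go inside the report blocks instead.

```ruby
require "benchmark"
require "uri"

n = 10_000
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php"

# bmbm runs a rehearsal pass first, then the measured pass, and prints
# user, system, total and real times for each labelled block.
results = Benchmark.bmbm do |bm|
  bm.report("URI.parse") { n.times { URI.parse(url) } }
  bm.report("URI.split") { n.times { URI.split(url) } }
end

# Each element of the returned array is a Benchmark::Tms.
puts "URI.parse wall-clock: #{results[0].real} seconds"
```

Benchmark.bm does the same without the rehearsal pass.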

Kind regards

robert

···

2006/7/4, Elliot Temple <curi@curi.us>:

I just discovered this. Lots of people already know, I'm sure, but
maybe some don't.

# net/http ran in: 128.734298 seconds.
# open-uri ran in: 268.869359 seconds.


Well of course the second one takes longer. It parses the URI and then does the same thing as the first. There's at least one more method call involved with the second every time through the loop.

···

On Jul 4, 2006, at 12:51 AM, Elliot Temple wrote:

I just discovered this. Lots of people already know, I'm sure, but maybe some don't.

# net/http ran in: 128.734298 seconds.
# open-uri ran in: 268.869359 seconds.

require "net/http"
require "open-uri"

def timing(name)
  start_time = Time.now
  yield
  end_time = Time.now
  puts "#{name} ran in: #{end_time - start_time} seconds."
end

n = 1_000

timing("net/http") {Net::HTTP.start("www.pythonchallenge.com") do |http|
  n.times do |i|
    r = http.get("/pc/def/linkedlist.php")
  end
end}

timing("open-uri") {n.times do |i|
  open("http://www.pythonchallenge.com/pc/def/linkedlist.php") do |x|
    x.read
  end
end}

-- Elliot Temple

Elliot Temple wrote:

I just discovered this. Lots of people already know, I'm sure, but maybe some don't.

# net/http ran in: 128.734298 seconds.
# open-uri ran in: 268.869359 seconds.

require "net/http"
require "open-uri"

def timing(name)
  start_time = Time.now
  yield
  end_time = Time.now
  puts "#{name} ran in: #{end_time - start_time} seconds."
end

n = 1_000

timing("net/http") {Net::HTTP.start("www.pythonchallenge.com") do |http|
  n.times do |i|
    r = http.get("/pc/def/linkedlist.php")
  end
end}

Doesn't this version open the connection only once and then reuse it for all the requests?

timing("open-uri") {n.times do |i|
  open("http://www.pythonchallenge.com/pc/def/linkedlist.php") do |x|
    x.read
  end
end}

This, I'm sure, doesn't reuse the connection.
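
One way to check the reuse question without involving a remote server is to count TCP connections on a throwaway local server. This is only a rough sketch: the little counting server and all the variable names are invented for the illustration, and it compares the two ways of calling Net::HTTP rather than open-uri itself.

```ruby
require "net/http"
require "socket"

# Throwaway local server (illustrative fixture) that counts accepted connections.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
connections = 0

Thread.new do
  loop do
    client = server.accept
    connections += 1
    Thread.new(client) do |sock|
      begin
        # Answer requests on this socket until the client closes it.
        while sock.gets
          # Skip the remaining request headers up to the blank line.
          while (header = sock.gets) && header != "\r\n"; end
          sock.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
        end
      rescue IOError, SystemCallError
        # The client went away; nothing to do.
      ensure
        sock.close
      end
    end
  end
end

n = 5

# One start block, many requests.
before = connections
Net::HTTP.start("127.0.0.1", port) do |http|
  n.times { http.get("/") }
end
reused = connections - before

# A fresh start block for every request.
before = connections
n.times do
  Net::HTTP.start("127.0.0.1", port) { |http| http.get("/") }
end
fresh = connections - before

puts "one start block, #{n} requests: #{reused} connection(s)"
puts "start per request, #{n} requests: #{fresh} connection(s)"
```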

···

--

Your benchmark is not very illustrative of the problem; try this one:

$ cat timing.rb
require 'net/http'
require 'open-uri'

def timing(name, n)
   start_time = Time.now
   n.times do yield end
   end_time = Time.now
   puts "#{name} ran in: #{end_time - start_time} seconds."
end

def test(uri, n)
   timing 'raw socket', n do
     s = TCPSocket.open uri.host, uri.port
     s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
     s.read.split("\r\n\r\n", 2).last
     s.close
   end

   Net::HTTP.start uri.host do |http|
     timing 'net/http cheat', n do
       r = http.get uri.request_uri
     end
   end

   timing 'net/http', n do
     Net::HTTP.start uri.host do |http|
       r = http.get uri.request_uri
     end
   end

   timing 'open-uri', n do
     uri.open do |x|
       x.read
     end
   end
end

n = 100

uri = URI.parse 'http://localhost/manual/'

p uri.read.length

test uri, n

uri = URI.parse 'http://localhost/manual/mod/mod_rewrite.html'

p uri.read.length

test uri, n

$ ruby timing.rb
9187
raw socket ran in: 1.184571 seconds.
net/http cheat ran in: 1.809506 seconds.
net/http ran in: 2.137558 seconds.
open-uri ran in: 2.606976 seconds.
87071
raw socket ran in: 1.729406 seconds.
net/http cheat ran in: 7.434297 seconds.
net/http ran in: 7.740268 seconds.
open-uri ran in: 13.605024 seconds.

You shouldn't cheat and have Net::HTTP reuse its connection. (It seems socket setup/teardown costs 3ms over loopback on my machine.)

open-uri and Net::HTTP's performance both degrade significantly on larger files. I believe this is due to their implementation: they both read into a buffer rather than fetching the entire response in one go. open-uri buffers differently and provides progress callbacks, which is probably the reason it performs worse the larger the file.
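
To make the buffered reading visible, here is a small sketch that asks Net::HTTP to hand over the body piece by piece with read_body. The local server and the 200,000-byte body are made up for the illustration; it shows that the body arrives in segments, not how much time that costs.

```ruby
require "net/http"
require "socket"

# Illustrative fixture: a local server that returns one large body.
body = "x" * 200_000
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

Thread.new do
  loop do
    sock = server.accept
    # Read the request line and headers up to the blank line.
    while (header = sock.gets) && header != "\r\n"; end
    sock.write "HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n\r\n#{body}"
    sock.close
  end
end

chunks = 0
bytes = 0
Net::HTTP.start("127.0.0.1", port) do |http|
  # With a block, request_get yields the response before the body is read.
  http.request_get("/") do |response|
    response.read_body do |chunk|
      chunks += 1
      bytes += chunk.bytesize
    end
  end
end

puts "received #{bytes} bytes in #{chunks} chunk(s)"
```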

I tend to use open-uri because it has a simpler API. I don't have to worry about handling redirects because it all gets taken care of for me.
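
For comparison, a sketch of what handling redirects by hand looks like with Net::HTTP, next to the open-uri call that does it for you. The fetch helper, its limit parameter and the local "/old" to "/new" server are all invented for the example; URI.open assumes a reasonably recent Ruby.

```ruby
require "net/http"
require "open-uri"
require "socket"

# Follow redirects manually, giving up after `limit` hops (hypothetical helper).
def fetch(uri, limit = 5)
  raise ArgumentError, "too many redirects" if limit.zero?

  response = Net::HTTP.get_response(uri)
  case response
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URI.
    fetch(URI.join(uri.to_s, response["location"]), limit - 1)
  else
    response
  end
end

# Illustrative fixture: "/old" redirects to "/new", anything else returns a body.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
Thread.new do
  loop do
    sock = server.accept
    path = sock.gets.to_s.split[1]
    while (header = sock.gets) && header != "\r\n"; end
    if path == "/old"
      sock.write "HTTP/1.1 302 Found\r\nLocation: /new\r\n" \
                 "Content-Length: 0\r\nConnection: close\r\n\r\n"
    else
      sock.write "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n" \
                 "Connection: close\r\n\r\nhello"
    end
    sock.close
  end
end

url = "http://127.0.0.1:#{port}/old"

by_hand = fetch(URI(url))
via_open_uri = URI.open(url) { |io| io.read }

puts "net/http with manual redirects: #{by_hand.code} #{by_hand.body}"
puts "open-uri: #{via_open_uri}"
```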

···

On Jul 3, 2006, at 9:51 PM, Elliot Temple wrote:

I just discovered this. Lots of people already know, I'm sure, but maybe some don't.

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

Method call overhead doesn't account for 2 minutes when N was only set to 1000. At 100,000 times, URI parsing only takes a few seconds.

# parse curi.us ran in: 4.434573 seconds.
# parse follow the chain ran in: 8.940227 seconds.

timing('parse curi.us'){100_000.times do
   URI.parse("curi.us")
end}
timing('parse http://www.pythonchallenge.com/pc/def/linkedlist.php'){100_000.times do
   URI.parse("http://www.pythonchallenge.com/pc/def/linkedlist.php")
end}

-- Elliot Temple

···

On Jul 3, 2006, at 11:50 PM, Logan Capaldo wrote:

Well of course the second one takes longer. It parses the URI and then does the same thing as the first. There's at least one more method call involved with the second every time through the loop.

I don't know. But I started testing with n=1 and an outer loop, so the Net::HTTP.start is repeated, and net/http is still winning (by more than URI parsing accounts for).

I will run some longer tests later to get more accurate data (and intermix doing it each way to minimise the effect of random net traffic fluctuations). I will use a static page as Robert suggested.
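
Roughly what intermixing could look like, as a sketch: alternate the candidates inside each round so that a slow spell on the network hits both, and add up the time per candidate. The interleaved_timing helper and the sleep-based placeholder workloads are made up here; the two HTTP fetches would go in the lambdas.

```ruby
require "benchmark"

# Run every candidate once per round, in turn, and accumulate wall-clock
# time per candidate (hypothetical helper, not from any library).
def interleaved_timing(candidates, rounds)
  totals = Hash.new(0.0)
  rounds.times do
    candidates.each do |name, work|
      totals[name] += Benchmark.realtime { work.call }
    end
  end
  totals
end

# Placeholder workloads standing in for the net/http and open-uri fetches.
candidates = {
  "first"  => -> { sleep 0.001 },
  "second" => -> { sleep 0.002 },
}

totals = interleaved_timing(candidates, 20)
totals.each do |name, seconds|
  puts format("%-8s ran in: %.4f seconds.", name, seconds)
end
```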

-- Elliot Temple

···

On Jul 4, 2006, at 2:26 AM, Carlos wrote:

Elliot Temple wrote:

I just discovered this. Lots of people already know, I'm sure, but maybe some don't.
# net/http ran in: 128.734298 seconds.
# open-uri ran in: 268.869359 seconds.
require "net/http"
require "open-uri"
def timing(name)
  start_time = Time.now
  yield
  end_time = Time.now
  puts "#{name} ran in: #{end_time - start_time} seconds."
end
n = 1_000
timing("net/http") {Net::HTTP.start("www.pythonchallenge.com") do |http|
  n.times do |i|
    r = http.get("/pc/def/linkedlist.php")
  end
end}

Doesn't this version open the connection only once and then reuse it for all the requests?

Yeah, but the first one uses a single object. The second one creates a new object every time.

···

On Jul 4, 2006, at 3:16 PM, Elliot Temple wrote:

Method call overhead doesn't account for 2 minutes when N was only set to 1000. At 100,000 times, URI parsing only takes a few seconds.