cURL in ruby? Faster than Net::HTTP?

I've found a couple of packages that claim to integrate the curl library
into ruby. Which one is the standard library?

Also, the reason I am asking is that I ran some tests and found that
curl is quite a bit faster than the Net::HTTP library. Is this true?
Maybe my tests were distorted, but curl seemed to be quite a bit faster
at initializing the connection and downloading.

Would it be smart of me to switch from Net::HTTP to curl? Because a
tenth of a second is precious in my application.

Thanks for your help.

···

--
Posted via http://www.ruby-forum.com/.

I can't speak to the speed of any curl library, but I can cite my recent experience building a crawler-like app. I'm using non-blocking sockets, so I can't use Net::HTTP and am hand-coding HTTP directly. Under OS X I found a lot of latency (around 100ms) for both IPSocket.getaddress() and Socket.sockaddr_in(). Under Linux, packing the sockaddr seems to incur a negligible cost. Under OS X I pack the sockaddr manually (yeah, it's gross). To mitigate the host lookup cost I maintain a cache (a Hash) of host => IP. (At least under OS X, even resolving localhost takes 100ms, even on repeat calls.)
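A minimal sketch of the host => IP cache described above. The class name `HostCache` and its `lookup` method are hypothetical, made up for illustration; the only call taken from the post is IPSocket.getaddress. The lock is held during the lookup, which keeps the sketch simple but serializes concurrent lookups.

```ruby
require 'socket'

# Hypothetical helper: remembers hostname => IP so each host is resolved once.
class HostCache
  def initialize(&resolver)
    # Defaults to IPSocket.getaddress; a block can replace it (e.g. in tests).
    @resolver = resolver || ->(host) { IPSocket.getaddress(host) }
    @cache = {}
    @lock = Mutex.new
  end

  # Returns the cached IP, resolving the host only on the first call.
  def lookup(host)
    @lock.synchronize { @cache[host] ||= @resolver.call(host) }
  end
end
```

A caller would create one `HostCache` and pass `cache.lookup(host)` to whatever opens the socket, so repeat requests to the same host skip the resolver.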

The point is that I would assume Net::HTTP inherits the costs of these two calls, which would explain at least some of the connection slowness. As for download speed, I could only make guesses, and they'd be pretty uneducated. I would suspect C has better I/O performance than Ruby, so a native library would probably be faster.

Corey

···

Hey Ben-

I haven't used the libcurl bindings myself, so I can't comment on those. But you may want to look at Zed's rfuzz project[1]. It is for testing web apps, but he also says that it is a faster replacement for net/http. Since its HTTP parser is written in C, using the same parser that Mongrel does, it should be faster than net/http.

Cheers-
-Ezra

[1] http://www.zedshaw.com/projects/rfuzz/

···

The cURL library is indeed very fast, but it also suffers from a problem that
Net::HTTP suffers from: its DNS lookup is not asynchronous and will block your
process. To overcome that, you'll need c-ares[1], which will probably also need
to be wrapped as an extension.

In my experience, Net::HTTP actually performs much better when you use Ruby's
non-blocking DNS resolver:

  require 'resolv-replace'

I wrote a cURL extension and benchmarked it against Net::HTTP with
resolv-replace and wasn't completely impressed with the speed difference,
so I abandoned the extension.
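For reference, a minimal sketch of what using resolv-replace with Net::HTTP can look like. The helper name `fetch` and the example URL are hypothetical; the only part taken from the post is the `require 'resolv-replace'` line, which makes the socket classes resolve names through Ruby's own Resolv library.

```ruby
require 'resolv-replace' # socket classes now resolve names via Ruby's Resolv
require 'net/http'
require 'uri'

# Hypothetical helper: fetch a URL and return [status code, body].
def fetch(url)
  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri)
  [response.code, response.body]
end

# Example call (needs network access, so it is left commented out):
# code, body = fetch('http://www.ruby-lang.org/')
```

Nothing else in the calling code changes, which makes this cheap to benchmark against an unmodified Net::HTTP.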

_why

[1] http://daniel.haxx.se/projects/c-ares/

···

What do you mean by the DNS lookup is not asynchronous and will block my
process? If I were to call curl directly from the command line using
`curl` in ruby, wouldn't that be much faster? In that case it would
get its own process and take better advantage of a dual processor
system. Am I correct? What I planned on doing was just using
curl directly from the command line, unless there is a downside to this.

--
Posted via http://www.ruby-forum.com/.

slightly offtopic: Why, have you read through resolv-replace? What's the
magic to it? If all it does is to run DNS queries asynchronously, then don't
you need to be running threads in your program in order to get any benefit
from it?

I've found that DNS lookups in Ruby are a total performance killer and I've
been looking at writing a bind-cache or a new DNS client altogether just to
get around this problem.

···

why the lucky stiff wrote:

The cURL library is indeed very fast, but it also suffers from a problem that
Net::HTTP suffers from: its DNS lookup is not asynchronous and will block your
process.

libcurl offers an asynchronous API that does the name resolving
asynchronously if you've built libcurl to do so.

What do you mean by the DNS lookup is not asynchronous and will block my
process? If I were to call curl directly from the command line using
`curl` in ruby, wouldn't that be much faster? In that case it would
get its own process and take better advantage of a dual processor
system. Am I correct? What I planned on doing was just using
curl directly from the command line, unless there is a downside to this.

From my understanding dns lookups block in ruby, as in they stop the
whole program until the dns is resolved. I can't imagine that forking
another process would be more efficient than using net/http.

···

No, Kernel.`` doesn't give you a background process. It starts a subprocess, but it blocks the caller and waits for that subprocess to return. See Kernel.fork and Process.detach.
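A small sketch of the difference, assuming a Unix-like system (Kernel.fork is not available on Windows). The `echo` command and the 0.2-second sleep are placeholders for real work.

```ruby
# Backticks run the command in a subprocess, but the caller waits here
# until it exits, then receives the command's stdout as a String.
output = `echo hello`

# fork returns right away in the parent; the block runs in the child.
pid = fork { sleep 0.2 }

# Process.detach starts a thread that reaps the child when it exits,
# so the parent is free to do other work in the meantime.
reaper = Process.detach(pid)

# Calling value waits for the child and returns its Process::Status.
status = reaper.value
```

The parent only blocks at `reaper.value`; anything placed between the detach and that line runs while the child is still working.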

Also, there are some gems that could probably help you out. Ara T. Howard's slave[1] comes to mind.

Corey

1. http://codeforpeople.com/lib/ruby/slave/

···

Does it use the native getaddrinfo()? The problem I've had on FreeBSD
is that getaddrinfo() will block.

_why

···

On Thu, Aug 24, 2006 at 04:15:02PM +0900, daniel.haxx@gmail.com wrote:

libcurl offers an asynchronous API that does the name resolving
asynchronously if you've built libcurl to do so.

Ben Johnson wrote:

What do you mean by the DNS lookup is not asynchronous and will block my process? If I were to call curl directly from the command line using `curl` in ruby, wouldn't that be much faster? In that case it would get its own process and take better advantage of a dual processor system. Am I correct? What I planned on doing was just using curl directly from the command line, unless there is a downside to this.

Odds are the process startup would take up more time than you'd gain. IMO that's NOT a good way to leverage a dual-core processor. Doing hacks like this only makes sense in a CPU-intensive application (which curl hardly is), and you want to split the work between two (or maybe more) *threads* more or less equally. You also want these threads being managed in a thread pool to avoid OS thread initialisation time. For added hilarity, you need native threads for this, not green threads - the OS can't schedule those on different cores.

Technically, you could do this using processes instead of threads. Except, once again, the performance you gain by eliminating context switches has to outweigh the process initialisation time and the time it takes to transfer data between the processes. Which just might not be all that easy.
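A minimal sketch of the thread-pool idea, with a hypothetical helper `run_in_pool` and an arbitrary default pool size. Note the hedge that applies to current MRI: its threads are native, but a global lock keeps Ruby code itself from running on two cores at once, so a pool like this mainly helps overlap I/O waits.

```ruby
# Hypothetical helper: run each job (a callable) on a fixed pool of
# worker threads and return the results in the original job order.
def run_in_pool(jobs, size: 4)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  queue.close # once drained, pop returns nil and the workers stop

  results = Array.new(jobs.size)
  workers = Array.new(size) do
    Thread.new do
      while (item = queue.pop)
        job, index = item
        results[index] = job.call
      end
    end
  end
  workers.each(&:join)
  results
end
```

The workers are created once and reused for every job, which is the point of the pool: thread start-up cost is paid `size` times rather than once per request.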

David Vallner

snacktime wrote:

From my understanding dns lookups block in ruby, as in they stop the
whole program until the dns is resolved. I can't imagine that forking
another process would be more efficient than using net/http.

In my program each curl request would be in its own thread. I also think
that forking a new process by using `` would be quicker, mainly because I
am doing this on a dual processor server. Having everything run under
one process doesn't take advantage of that. Lastly, curl has a timeout
option, so if for some reason the request didn't respond it would
time out. I also noticed that running curl and Net::HTTP side by side,
curl wins hands down. There is even a hitch right before the request is
made in Net::HTTP, about .5 to 1 second.

Am I wrong here?

What I'm going to do is probably implement the curl functionality in my
program and post the speed differences for future reference. Unless
someone tells me I'm going about this all wrong.

Thanks a lot for everyone's help.

···

--
Posted via http://www.ruby-forum.com/.

Thanks for your response.

I just implemented curl using `curl`. I would say I have about 60 - 100
simultaneous requests going out. With the switch
from Net::HTTP to `curl` I noticed a speed increase of almost 3
times. Either Net::HTTP is slow or ruby is slow, but something in
Net::HTTP is slowing it down quite a bit.
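A rough sketch of that setup, not the actual code from the application. The helper names `curl_command` and `fetch_all` and the 5-second timeout are assumptions; `--silent` and `--max-time` are standard curl options. Open3.capture2 is swapped in for backticks here so the URL is passed as an argument and never goes through a shell.

```ruby
require 'open3'

# Hypothetical helper: build the argument list for one curl call.
# --max-time caps the whole transfer at the given number of seconds.
def curl_command(url, timeout: 5)
  ['curl', '--silent', '--max-time', timeout.to_s, url]
end

# Hypothetical helper: fetch every URL at once, one thread per URL.
# Returns the bodies in input order, with nil where curl failed.
def fetch_all(urls, timeout: 5)
  threads = urls.map do |url|
    Thread.new do
      body, status = Open3.capture2(*curl_command(url, timeout: timeout))
      status.success? ? body : nil
    end
  end
  threads.map(&:value)
end
```

With 60 to 100 URLs this starts that many curl processes at the same moment, so process start-up cost is part of whatever speed difference gets measured.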

···

--
Posted via http://www.ruby-forum.com/.

Does it matter whether it blocks or not? Ruby can't schedule its green
threads while you're inside a system-library call unless the call
knows about Ruby's scheduler (which it doesn't). Right?

···

search for http reverse dns.

-a
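The hint is terse. One plausible reading, which is an assumption and not confirmed anywhere in this thread, is that it points at the reverse DNS lookups Ruby sockets can perform, and at the global switch that turns them off:

```ruby
require 'socket'

# Assumption: the hint refers to Ruby's reverse-lookup switch.
# With the flag set, address queries such as peeraddr return numeric
# addresses instead of doing a reverse DNS lookup for a hostname.
BasicSocket.do_not_reverse_lookup = true
```

The flag only sets the initial value for sockets created afterwards, so it belongs near the top of the program, before any connections are opened.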

···

--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama

Francis Cianfrocca wrote:

> > libcurl offers an asynchronous API that does the name resolving
> > asynchronously if you've built libcurl to do so.
>
> Does it use the native getaddrinfo()? The problem I've had on FreeBSD
> is that getaddrinfo() will block.

Does it matter whether it blocks or not? Ruby can't schedule its green
threads while you're inside a system-library call unless the call
knows about Ruby's scheduler (which it doesn't). Right?

You _could_ read up on the libcurl details in the libcurl docs, but
then what fun would that be? Let's continue making assumptions like
this...

No, it is _not_ asynchronous inside a system call and it is _not_ using
the native getaddrinfo() for asynchronous name resolves.

unknown wrote:

search for http reverse dns.

-a

Can you be a little more specific? Also, what if I were to connect to the
server via the IP address and not the domain name? Would that speed
things up a bit?

--
Posted via http://www.ruby-forum.com/.

You may have misunderstood me. Even if libcurl or anything else
resolves names "asynchronously" (which can mean more than one thing),
then does that make it faster on a per-resolution basis, or just more
concurrent? If the former, then I'll read your code to see how you did
it. If the latter, then doesn't a Ruby program need to be written in a
special way in order to benefit from the concurrency?

···

Never mind, I figured it out. As I suspected, curl just wrote their
own protocol handler for DNS lookups. They fit it into their
event-driven architecture so name lookups can happen
simultaneously with other work. I didn't see any caching or anything
similar, but maybe I didn't look hard enough. As with other approaches,
there's no magic speedup: you still have to write your program in such
a way as to capture the concurrency.
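To make "capture the concurrency" concrete, here is a small sketch in plain Ruby threads rather than libcurl's event loop. The helper name `resolve_all` and its `resolver:` parameter are hypothetical. Each single lookup takes as long as before; only the waits overlap.

```ruby
require 'resolv'

# Hypothetical helper: resolve every host in its own thread so the
# waits overlap. Returns a Hash of host => address (nil if unresolved).
def resolve_all(hosts, resolver: ->(host) { Resolv.getaddress(host) })
  threads = hosts.map do |host|
    Thread.new do
      begin
        [host, resolver.call(host)]
      rescue Resolv::ResolvError
        [host, nil]
      end
    end
  end
  threads.map(&:value).to_h
end
```

A program that resolved the same hosts one after another in a loop would see no benefit from a non-blocking resolver, which is the point being made above.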

···
