Asynchronous HTTP request

Does anyone know how to do the following, but without threads, purely with asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("<content goes here>", website.value)

I'm not sure I understand EventMachine, but it doesn't seem like this code fits with the "event loop" model. Besides, I don't want to react to every chunk of data that comes in; I just want the result at the end.

Thanks.

Daniel DeLorme wrote:

Does anyone know how to do the following, but without threads, purely
with asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("<content goes here>", website.value)

Depends what you mean by "with asynchronous IO". Do you want to keep
calling select() and then only read data when it's available? Then you're
basically rewriting EventMachine or io-reactor.

Otherwise, you can do res.read_body with a block - it will be called for
each chunk. But read_body will still block until the body is complete.
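
Something like this, for reference (a rough sketch; request_get with a block
keeps Net::HTTP from reading the body eagerly, so read_body can stream the
chunks as they arrive):

  require 'net/http'
  require 'uri'

  uri = URI.parse(url)
  buffer = ''
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_get(uri.request_uri) do |res|
      res.read_body do |chunk|
        buffer << chunk   # called once per chunk, but still blocks until the body is done
      end
    end
  end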

I'm not sure I understand EventMachine, but it doesn't seem like this
code fits with the "event loop" model. Besides, I don't want to react to
every chunk of data that comes in; I just want the result at the end.

But if you don't want the code to block until the body has been read, but you
don't want the read to take place in another thread, then what do you
want?

What's the problem with threads anyway? Being able to do one thing while
you're waiting for something else to complete is exactly what they're
for.

···

--
Posted via http://www.ruby-forum.com/.

I'm not sure I understand EventMachine, but it doesn't seem like this
code fits with the "event loop" model. Besides, I don't want to react to
every chunk of data that comes in; I just want the result at the end.

It might fit well. Give it a shot.

http://eventmachine.rubyforge.org/EventMachine/Protocols/HttpClient.html
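
Roughly the shape of the example in those docs (with a placeholder host; inside
the reactor, the callback fires once the whole response is in):

  require 'eventmachine'

  EM.run do
    http = EM::Protocols::HttpClient.request(
      :host    => 'example.com',
      :port    => 80,
      :request => '/'
    )
    http.callback do |response|
      puts response[:content]   # the full body, once it has all arrived
      EM.stop
    end
  end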

-rp

···

--
Posted via http://www.ruby-forum.com/.

You could check out Typhoeus (https://github.com/pauldix/typhoeus), which runs HTTP requests in parallel while cleanly encapsulating handling logic.

It allows you to create parallel HTTP requests pretty easily.

···

On May 13, 2010, at 3:37 AM, Daniel DeLorme wrote:

Does anyone know how to do the following, but without threads, purely with asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("<content goes here>", website.value)

I'm not sure I understand EventMachine, but it doesn't seem like this code fits with the "event loop" model. Besides, I don't want to react to every chunk of data that comes in; I just want the result at the end.

Thanks.

--
Zach Moazeni

EventMachine is perfect for this kind of stuff. Whether it fits with the
rest of your web framework is more likely the thing that makes it an
unlikely selection (if you're using anything Rack-based, for example).

http://eventmachine.rubyforge.org/EventMachine/Protocols/HttpClient.html
or
http://eventmachine.rubyforge.org/EventMachine/Protocols/HttpClient2.html

The docs give an example of how this is used.

HTH
Daniel

···

On 13 May 2010 17:37, Daniel DeLorme <dan-ml@dan42.com> wrote:

Does anyone know how to do the following, but without threads, purely with
asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("<content goes here>", website.value)

I'm not sure I understand EventMachine, but it doesn't seem like this code
fits with the "event loop" model. Besides, I don't want to react to every
chunk of data that comes in; I just want the result at the end.

Thanks.

Looks like you want futures, which can be provided by any number of
frameworks. A pretty awesome one to consider is dataflow, which is based
off ideas from the Oz language:

https://github.com/larrytheliquid/dataflow

MenTaLguY's Omnibus library also provides futures, however I don't believe
it's presently maintained:

http://rubyforge.org/projects/concurrent
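
If it helps, the "future" idea itself is tiny. Here's a minimal sketch (not the
dataflow API, and it still backs the future with a thread, which is what these
libraries do under the hood anyway):

  # Minimal future: start the work immediately, block only when the value is asked for.
  class Future
    def initialize(&block)
      @thread = Thread.new(&block)
    end
    def value
      @thread.value
    end
  end

  website  = Future.new { Net::HTTP.get(URI.parse(url)) }
  template = compute_lots_of_stuff()
  puts template.sub("<content goes here>", website.value)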

···

On Thu, May 13, 2010 at 1:37 AM, Daniel DeLorme <dan-ml@dan42.com> wrote:

Does anyone know how to do the following, but without threads, purely with
asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("<content goes here>", website.value)

--
Tony Arcieri
Medioh! A Kudelski Brand

Brian Candler wrote:

Daniel DeLorme wrote:

Does anyone know how to do the following, but without threads, purely
with asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("<content goes here>", website.value)

Depends what you mean by "with asynchronous IO". Do you want to keep calling select() and then only read data when it's available? Then you're basically rewriting EventMachine or io-reactor.

I mean nonblocking. I don't want to keep calling select(), I just want to call it once, when I'm ready to process the data I asked for.

But if you don't want the code to block until the body has been read, but you don't want the read to take place in another thread, then what do you want?

I just want to issue the http request, do other stuff while the request goes on its merry way, let the response accumulate at the socket, and read the data when I'm ready to. If at that time the response has accumulated at the socket then I don't have to wait, otherwise block until the data has finished coming in.
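
Something with roughly this shape, in other words (a bare-bones sketch with a
plain TCP socket and HTTP/1.0, no error handling):

   require 'socket'
   require 'uri'

   uri  = URI.parse(url)
   sock = TCPSocket.new(uri.host, uri.port)
   # fire off the request now; the kernel buffers whatever comes back
   sock.write("GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n")

   template = compute_lots_of_stuff()

   # read when I'm ready; blocks only if the response hasn't finished arriving
   response = sock.read
   sock.close
   body = response.split("\r\n\r\n", 2).last
   puts template.sub("<content goes here>", body)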

What's the problem with threads anyway? Being able to do one thing while you're waiting for something else to complete is exactly what they're for.

I can't agree with that. Threads are meant to achieve *concurrency*, meaning the concurrent (or at least interleaved) execution of instructions. If the only thing the IO thread does is wait for the data and then exit, there's nothing concurrent happening; it's just a way to simulate nonblocking IO. And creating a thread just for that seems to me like the proverbial jackhammer to drive in a nail, especially since 1.9 threads are no longer green.

Given that nonblocking IO is a paradigm that's been around for ages I was kinda hoping there was a neat & tidy way of doing it (a gem maybe?) but I haven't found it.

There's a library I wrote specifically with external HTTP requests in mind.
It uses threads and blocks on first access (no callbacks).

  m = Muscle.new do |m|
    m.action(:users) do
      # get users from an external service
    end

    m.action(:slow_stuff, :timeout => 1.2) do
      # some unreliable action.
    end

    # Set up a special timeout handler for the second action;
    # by default timeouts are set to 5 seconds
    m.on_timeout(:slow_stuff) do
      "Sorry but :action timed out"
    end
  end

m[:users] # blocks when accessed until the :users action is completed;
          # the remaining actions continue in the background

Not sure if it helps your situation, but it's simple and works effectively.

···

On 18 May 2010 14:19, Tony Arcieri <tony.arcieri@medioh.com> wrote:

On Thu, May 13, 2010 at 1:37 AM, Daniel DeLorme <dan-ml@dan42.com> wrote:

> Does anyone know how to do the following, but without threads, purely
> with asynchronous IO?
>
> website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
> template = compute_lots_of_stuff()
> puts template.sub("<content goes here>", website.value)
>

Looks like you want futures, which can be provided by any number of
frameworks. A pretty awesome one to consider is dataflow, which is based
off ideas from the Oz language:

https://github.com/larrytheliquid/dataflow

MenTaLguY's Omnibus library also provides futures, however I don't believe
it's presently maintained:

http://rubyforge.org/projects/concurrent

--
Tony Arcieri
Medioh! A Kudelski Brand

Daniel N wrote:

···

On 13 May 2010 17:37, Daniel DeLorme <dan-ml@dan42.com> wrote:

Does anyone know how to do the following, but without threads, purely with
asynchronous IO?

website = Thread.new{ Net::HTTP.get(URI.parse(url)) }
template = compute_lots_of_stuff()
puts template.sub("<content goes here>", website.value)

EventMachine is perfect for this kind of stuff. Whether it fits with the
rest of your web framework is more likely the thing that makes it an
unlikely selection (if you're using anything Rack-based, for example).

If you could show me how to use EventMachine in this case I'd be grateful. I couldn't figure out how to run compute_lots_of_stuff() while the http requests are executing.

I think you are trying to save the computer's time at the expense of your
own. What you are asking to do is reimplement something that Thread
does very well, and without excessive resource usage. Even Ruby 1.8 can
run thousands of sleepy I/O threads without a problem. The reason you
can't find another library for doing what you want is that everyone uses
threads. If you really want a jackhammer, use EventMachine :-)

···

On Thu, 2010-05-13 at 22:29 +0900, Daniel DeLorme wrote:

> What's the problem with threads anyway? Being able to do one thing while
> you're waiting for something else to complete is exactly what they're
> for.

I can't agree with that. Threads are meant to achieve *concurrency*,
meaning the concurrent (or at least interleaved) execution of
instructions. If the only thing the IO thread does is wait for the data
and then exit, there's nothing concurrent happening; it's just a way to
simulate nonblocking IO. And creating a thread just for that seems to me
like the proverbial jackhammer to drive in a nail, especially since 1.9
threads are no longer green.

--
Matthew

Daniel DeLorme wrote:

I just want to issue the http request, do other stuff while the request
goes on its merry way, let the response accumulate at the socket, and
read the data when I'm ready to.

Hmm. Well you can delay reading the body like this:

  http = Net::HTTP.start(...)
  res = http.get(...)
  ... do some stuff
  answer = res.read_body

but it'll wait for the response headers before get() returns. So, you
should just pick the bits you need out of /usr/lib/ruby/1.8/net/http.rb
directly.

Note that get() just calls request(Get.new(...)), which takes you here:

    def request(req, body = nil, &block) # :yield: +response+
      unless started?
        start {
          req['connection'] ||= 'close'
          return request(req, body, &block)
        }
      end
      if proxy_user()
        unless use_ssl?
          req.proxy_basic_auth proxy_user(), proxy_pass()
        end
      end

      req.set_body_internal body
      begin_transport req
        req.exec @socket, @curr_http_version, edit_path(req.path)
        begin
          res = HTTPResponse.read_new(@socket)
        end while res.kind_of?(HTTPContinue)
        res.reading_body(@socket, req.response_body_permitted?) {
          yield res if block_given?
        }
      end_transport req, res

      res
    end

You can see there how to send the request (req.exec), and subsequently
how to read the response from the socket.
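
Untested, but pieced together from the method above, the split might look
something like this (1.8-era internals, so treat it strictly as a sketch: the
begin_transport/end_transport housekeeping is skipped and the Host header is
set by hand):

  require 'net/http'
  require 'uri'

  uri  = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.start
  socket = http.instance_variable_get(:@socket)

  req = Net::HTTP::Get.new(uri.request_uri)
  req['Host'] = "#{uri.host}:#{uri.port}"   # normally done in begin_transport
  req['Connection'] = 'close'
  req.exec(socket, '1.1', uri.request_uri)  # write the request bytes now

  template = compute_lots_of_stuff()        # response accumulates at the socket meanwhile

  res = Net::HTTPResponse.read_new(socket)  # blocks here if the headers aren't in yet
  res = Net::HTTPResponse.read_new(socket) while res.kind_of?(Net::HTTPContinue)
  res.reading_body(socket, true) { }        # reads the body
  http.finish

  puts template.sub("<content goes here>", res.body)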

···

--
Posted via http://www.ruby-forum.com/.

Daniel N wrote:

There's a library I wrote specifically with external HTTP requests in mind.
It uses threads and blocks on first access (no callbacks).

https://github.com/hassox/muscle (a simple parallel execution library)

Except that's pretty much the same thing as my original example. And
indeed it's simple and effective. It's just that I happen to like the
concept of asynchronous IO so I would like to do it that way if
possible.

···

On 18 May 2010 14:19, Tony Arcieri <tony.arcieri@medioh.com> wrote:

Looks like you want futures, which can be provided by any number of
frameworks. A pretty awesome one to consider is dataflow, which is based
off ideas from the Oz language:

https://github.com/larrytheliquid/dataflow

Oooh, that's a pretty nifty concept... but again it relies on threads
to take care of the concurrency, which I already knew how to do.

What context are you trying to do this in? Is it inside a rack request
(rails / merb / sinatra / pancake / other)? Or is this in a standalone
script?

Could you perhaps provide a bit of context for what you're trying to achieve?

Cheers
Daniel

···

On 18 May 2010 16:56, Daniel DeLorme <dan-ml@dan42.com> wrote:

If you could show me how to use EventMachine in this case I'd be grateful.
I couldn't figure out how to run compute_lots_of_stuff() while the http
requests are executing.

Brian Candler wrote:

Hmm. Well you can delay reading the body like this:

  http = Net::HTTP.start(...)
  res = http.get(...)
  ... do some stuff
  answer = res.read_body

but it'll wait for the response headers before get() returns. So, you should just pick the bits you need out of /usr/lib/ruby/1.8/net/http.rb directly.

Thanks for your answer. It was a bit more low-level than I would've liked, but it helped me get the creative juices flowing. In the end my solution involved wrapping the request's socket in a Fiber. Quite a monkeypatch perhaps, but it seems to work:

   class ASync
     class ASocket < BasicObject
       def initialize(socket)
         @socket = socket
       end
       def method_missing(name, *args, &block)
         # yield control back before any read; by then the request has been written
         ::Fiber.yield if name =~ /read/
         @socket.send(name, *args, &block)
       end
     end
     def initialize(uri, headers={})
       uri = URI.parse(uri) unless uri.is_a?(URI)
       @fiber = ::Fiber.new do
         Net::HTTP.start(uri.host, uri.port) do |http|
           # wrap Net::HTTP's internal socket so reads suspend the fiber
           http.instance_eval{ @socket = ASocket.new(@socket) }
           @response = http.get(uri.request_uri, headers)
         end
       end
       @fiber.resume # send the request
     end
     def method_missing(*args, &block)
       @fiber.resume until @response # finish reading the response on first access
       @response.send(*args, &block)
     end
   end
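
It's used the same way as the threaded version; method calls on the ASync
object just block until the response is in:

   website  = ASync.new(url)            # sends the request and returns right away
   template = compute_lots_of_stuff()   # the response accumulates at the socket meanwhile
   puts template.sub("<content goes here>", website.body)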

That's one way of looking at it. However, Erlang ultimately relies on
threads to take care of concurrency as well; its concurrency model just
does a much better job of managing those threads than you ever can.
The same goes with dataflow.

···

On Tue, May 18, 2010 at 1:33 AM, Daniel DeLorme <dan-ml@dan42.com> wrote:

https://github.com/larrytheliquid/dataflow

Oooh, that's a pretty nifty concept... but again it relies on threads
to take care of the concurrency, which I already knew how to do.

--
Tony Arcieri
Medioh! A Kudelski Brand

Daniel N wrote:

What context are you trying to do this in? Is it inside a rack request
(rails / merb / sinatra / pancake / other)? Or is this in a standalone
script?

Could you perhaps provide a bit of context for what you're trying to achieve?

Ok, here's the context. I didn't put this in my OP because I figured
it would just bore everybody to tears.

This is inside a rack request. The idea is that I'm assembling a web
page by doing a bunch of sub-requests for the various parts of the
page. So I'll have something like:

   action "index" do
     @news = subreq("http://news.server")
     @ad = subreq("http://ad.server")
     @blog = subreq("http://blog.server")
     @forum = subreq("http://forum.server")
   end

All these sub-requests are launched asynchronously and, while they are
executing, the app generates the layout within which the output of the
subrequests will be embedded. So I'll have something like:

   response = ['<html><body>',
     '<div>',@ad,'</div>',
     '<div>',@news,'</div>',
     '<div>',@blog,'</div>',
     '<div>',@forum,'</div>',
     '</body></html>']

And when rack finally outputs the response to the client it will block
on the various subrequests unless/until they have completed.

What I can't figure out with EventMachine is how to have the "main
thread" generate the layout while the subrequests are executing.

···

On 18 May 2010 16:56, Daniel DeLorme <dan-ml@dan42.com> wrote:

On 13 May 2010 17:37, Daniel DeLorme <dan-ml@dan42.com> wrote:

So you didn't want a Thread, but you'll happily use a Fiber...

···

--
Posted via http://www.ruby-forum.com/.

The problem here is inversion of control. EventMachine inverts control on
you, and it sucks. You can't just do subreq(...) and expect it to return a
value. In the best case, you have subreq call a block when it completes.
The familiar pattern of "call function, get value" no longer applies.
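
That is, instead of the straight-line version, you end up with something like
this (hypothetical callback form of subreq):

  subreq("http://news.server") do |body|
    @news = body   # only available inside the callback, some time later
  end
  # ...and anything that needs @news has to be chained off that callback too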

···

On Tue, May 18, 2010 at 7:56 PM, Daniel DeLorme <dan-ml@dan42.com> wrote:

What I can't figure out with EventMachine is how to have the "main
thread" generate the layout while the subrequests are executing.

--
Tony Arcieri
Medioh! A Kudelski Brand

Ok, now we're talking. So with Rack you can't do true async with a callback.
Rack is call-stack based, meaning that you have to return the value as the
response to the .call method on your application. This means that any
callback-based async actually needs to block in order for the rack
application you're in to return its result. You _could_ do it by returning
a custom object in the rack response that renders as much as possible while
it waits for the response, and then renders that when it can, but that
option may not be available depending on what framework you're using.
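
As a very rough sketch of that custom-object idea (illustrative only): Rack
just needs the body to respond to #each, so each piece can block on its own
when its data isn't ready yet.

  class LazyBody
    def initialize(*parts)
      @parts = parts                 # plain strings mixed with not-yet-ready parts
    end
    def each
      # Rack calls this when writing the response; to_s blocks per part as needed
      @parts.each { |part| yield part.to_s }
    end
  end

  # e.g. [200, {'Content-Type' => 'text/html'}, LazyBody.new('<div>', @ad, '</div>')]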

There are a couple of other things that could help you here that immediately
come to mind.

You can use Typhoeus (https://github.com/pauldix/typhoeus), which can fetch all
the resources in parallel, and then block until all the responses come in.
This is probably going to be relatively easy to implement, and means that
the total request time for the resources is only very slightly higher than
the longest single response.
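
From memory, the Hydra interface looks roughly like this (check the Typhoeus
README for the exact API; the URLs are the ones from your example):

  require 'typhoeus'

  hydra   = Typhoeus::Hydra.new
  results = {}
  { :news  => "http://news.server/",
    :ad    => "http://ad.server/",
    :blog  => "http://blog.server/",
    :forum => "http://forum.server/" }.each do |name, url|
    request = Typhoeus::Request.new(url)
    request.on_complete { |response| results[name] = response.body }
    hydra.queue(request)
  end
  hydra.run   # blocks until every queued request has completed
  # results[:news], results[:ad], etc. are all populated here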

You can use ESI. There's an ESI-for-Rack project on GitHub by Joshua Hull
which could be useful to you: https://github.com/joshbuddy/esi-for-rack. You
can also use ESI outside of the rack request in Apache or Nginx, by
responding first with a layout file containing ESI tags pointing to the
content to use. Nginx, Apache or the ESI rack project can then assemble the
page for you using the resources specified.

Alternatively, if you're not married too hard to Rack, you can take a look at
something like Cramp, or node.js, for a true async server environment.

HTH
Daniel

···

On 19 May 2010 11:56, Daniel DeLorme <dan-ml@dan42.com> wrote:

And when rack finally outputs the response to the client it will block
on the various subrequests unless/until they have completed.

What I can't figure out with EventMachine is how to have the "main
thread" generate the layout while the subrequests are executing.

Brian Candler wrote:

So you didn't want a Thread, but you'll happily use a Fiber...

Well, yes, a Fiber is just a coroutine, nothing like a thread.