Asynchronous HTTP request

Tony Arcieri wrote:

What I can't figure out with EventMachine is how to have the "main
thread" generate the layout while the subrequests are executing.

The problem here is inversion of control. EventMachine inverts control on
you, and it sucks. You can't just do subreq(...) and expect it to return a
value. In the best case, you have subreq call a block when it completes.
The familiar pattern of "call function, get value" no longer applies.
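
Roughly, the inverted style looks like this (subreq here is just the
hypothetical helper from Daniel's example, not a real API):

  subreq("http://news.server") do |response|
    @news = response   # runs later, whenever the reactor has the data
  end
  # execution continues here immediately; @news is almost certainly still nil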

Sorry if I'm being redundant, but I wanted to point out that EventMachine
*can* support non-inverted control semantics, with a Fiber-based wrapper
layer.

For example, the following is an excerpt from a test for a single-threaded
EventMachine application. Most of the methods below are being invoked on
remote server(s), but nothing ever blocks the thread. (Other Fibers
on the same EM thread will still be responding to UI events, etc.)

  def test_add_documents_to_catalog
    @app.reset_to_init_state

    catalog1_path = @app.testsv_uri + URI.encode("/@default-catalog-path/catalog1")
    catalogs = @app.catalog_manager
    cat = catalogs.open_catalog(catalog1_path, :delete_existing=>true)

    num_docs = cat.query_num_documents
    assert_equal(0, num_docs)

    catalogs.active_catalog = cat
    assert_equal( cat, catalogs.active_catalog )

    doc_paths = imageset_paths(1) + imageset_paths(2)
    per_dir_doc_paths = @app.partition_filelist_per_directory(doc_paths)
    records = @app.fetch_metadata_for_partitioned_filelist(per_dir_doc_paths)
    assert_equal( doc_paths.length, records.length )

    # run the 'store' test twice, to make sure the
    # "INSERT OR REPLACE" is working...
    2.times do
      cat.store_document_metadata(records)
      num_docs = cat.query_num_documents
      num_docs_expected = doc_paths.length
      assert_equal(num_docs_expected, num_docs)
    end

    # try some searches
    paths = cat.search("caption" => "World Series")
    assert_equal( 1, paths.length )
    assert_equal( imageset_paths(2)[0], paths[0] )

    # etc.
  end

Anyway, dunno if this adds anything to the topic. Apologies if not.

Regards,

Bill

···

On Tue, May 18, 2010 at 7:56 PM, Daniel DeLorme <dan-ml@dan42.com> wrote:

Daniel N wrote:

Ok, here's the context. I didn't put this in my OP because I figured
it would just bore everybody to tears.

This is inside a rack request. The idea is that I'm assembling a web
page by doing a bunch of sub-requests for the various parts of the
page. So I'll have something like:

action "index" do
   @news = subreq("http://news.server")
   @ad = subreq("http://ad.server")
   @blog = subreq("http://blog.server")
   @forum = subreq("http://forum.server")
end

All these sub-requests are launched asynchronously and, while they are
executing, the app generates the layout within which the output of the
subrequests will be embedded. So I'll have something like:

response = ['<html><body>',
   '<div>',@ad,'</div>',
   '<div>',@news,'</div>',
   '<div>',@blog,'</div>',
   '<div>',@forum,'</div>',
   '</body></html>']

And when rack finally outputs the response to the client, it will block on
any subrequests that have not yet completed.

What I can't figure out with EventMachine is how to have the "main
thread" generate the layout while the subrequests are executing.

Ok now we're talking. So with rack you can't do true async with a callback.

Ah, but I never really wanted callbacks; that would be evented IO. The
various approaches to asynchronous IO are not terribly well defined, but
what I meant by nonblocking was simply:
   resource.send_request #=> doesn't block waiting for response!
   resource.get_response
This is not possible with Net::HTTP because those two phases of the HTTP
request are bound into one monolithic get(url) operation.
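
Something like the following is the shape I mean -- a minimal sketch with a
plain TCP socket and HTTP/1.0 (hypothetical class, no error handling;
Net::HTTP itself doesn't expose this split):

  require 'socket'

  class SplitRequest
    def initialize(host, path = "/")
      @host, @path = host, path
    end

    # Phase 1: write the request and return. Connect/write may still block
    # briefly, but we don't wait for the response.
    def send_request
      @sock = TCPSocket.new(@host, 80)
      @sock.write("GET #{@path} HTTP/1.0\r\nHost: #{@host}\r\n\r\n")
      self
    end

    # Phase 2: only block when the response is actually needed.
    def get_response
      @response ||= begin
        data = @sock.read
        @sock.close
        data
      end
    end
  end

  news = SplitRequest.new("news.server").send_request   # fire and continue
  # ... build the layout here ...
  body = news.get_response                              # block only now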

callback-based async actually needs to block in order for the rack
application you're in to return its result. You _could_ do it by returning
a custom object in the rack response that renders as much as possible while
it waits for the response, and then renders the rest when it can, but that
option may not be available depending on what framework you're using.
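
One way to read that suggestion, sketched with illustrative names (pending
below stands in for whatever handle your async client returns, e.g. a Thread
whose #value blocks until its result is ready):

  class DeferredBody
    def initialize(*parts)
      @parts = parts   # strings and pending-response handles, in order
    end

    # Rack calls #each to emit the body: static strings stream out
    # immediately, and we block only on a slot whose result isn't ready yet.
    def each
      @parts.each do |part|
        yield(part.respond_to?(:value) ? part.value : part.to_s)
      end
    end
  end

  # app.call(env) would then return something like:
  #   [200, {"Content-Type" => "text/html"},
  #    DeferredBody.new("<html><body><div>", pending_news, "</div></body></html>")]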

I guess I was not clear enough, but this approach is exactly what I
tried to explain above.

You can use ESI. There's an esi-for-rack project on GitHub by Joshua Hull
which could be useful to you: https://github.com/joshbuddy/esi-for-rack. You
can also use ESI outside of the rack request in Apache or Nginx, by
responding first with a layout file containing ESI tags pointing to the
content to use. Nginx, Apache, or the esi-for-rack project can then assemble
the page for you using the resources specified.
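
For example, the skeleton the backend returns might look like this (a rough
config.ru sketch; the URLs are the placeholder servers from earlier in the
thread, and the ESI-aware layer in front fetches and inlines each
esi:include for you):

  SKELETON = <<-HTML
    <html><body>
      <div><esi:include src="http://ad.server/"    /></div>
      <div><esi:include src="http://news.server/"  /></div>
      <div><esi:include src="http://blog.server/"  /></div>
      <div><esi:include src="http://forum.server/" /></div>
    </body></html>
  HTML

  run lambda { |env| [200, {"Content-Type" => "text/html"}, [SKELETON]] }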

Oh wow this was *really* interesting. This sent me on an hours-long
exploration of Varnish+ESI and Nginx+SSI. It turns out that Nginx will
fetch the subrequests in parallel but Varnish (caching proxy) will not.
This is probably fine if most of the subrequests are already cached
(which admittedly is the point of a caching proxy) but if not... Nginx
is the winner.

This opens a lot of possibilities. For example I can imagine serving up
simple "skeleton" pages with heavy caching and then generating all the
user-specific parts through SSI.

···

On 19 May 2010 11:56, Daniel DeLorme <dan-ml@dan42.com> wrote:

Daniel DeLorme wrote:

Brian Candler wrote:

So you didn't want a Thread, but you'll happily use a Fiber...

Well, yes, a Fiber is just a coroutine, nothing like a thread.

Except that the semantics of Threads are well defined. You start them,
they do stuff, you join them.

Are you saying that a Fiber will return control to you when it blocks
due to lack of data on a socket, as well as when the Fiber explicitly
"yields"? What value does it return to you in the blocking case?

Testing suggests otherwise.

$ cat ert.rb
p1, p2 = IO.pipe

f = Fiber.new do
  puts "Starting fiber"
  p1.gets
  puts "Ending fiber"
end

sleep 0.5
puts "Point A"
f.resume
puts "Point B"

$ ruby19 ert.rb
Point A
Starting fiber

As far as I can see: the fiber starts processing when f.resume is
called, but blocks when p1.gets is called.

So AFAICS, your code which thinks it can do work while the HTTP
request is running, doesn't. Rather, the HTTP request is not sent at all
until Fiber#resume is called, and at that point it will block as
necessary until the whole response is received.

···

--
Posted via http://www.ruby-forum.com/.

And I'm pretty sure I was the first person to ever implement a Ruby
Fiber-based wrapper which provides normal flow control semantics on top of
an IoC-driven event-based framework with Revactor, for what it's worth.

Even so, there's been little success in actually applying that to an
asynchronous HTTP framework. Cramp and Rainbows! are all that come to mind,
although Revactor did support concurrent I/O alongside HTTP request
processing with Mongrel.

···

On Tue, May 18, 2010 at 10:26 PM, Bill Kelly <billk@cts.com> wrote:

Sorry if I'm being redundant, but I wanted to point out that EventMachine
*can* support non-inverted control semantics, with a Fiber-based wrapper
layer.

--
Tony Arcieri
Medioh! A Kudelski Brand

Brian Candler wrote:

Are you saying that a Fiber will return control to you when it blocks
due to lack of data on a socket, as well as when the Fiber explicitly
"yields"? What value does it return to you in the blocking case?

As an aside, Fibers can behave in that fashion when used within a
"never block" architecture like EventMachine.

There's also the neverblock library, which employs Fibers similarly:
http://www.espace.com.eg/neverblock

In my case, using a homegrown RPC library with Fibers, on top of
EventMachine, a method call on a 'remote' object suspends the Fiber
until the response is received:

  result = remote_object.fornstaff("dreelsprail")

So, a method call on remote_object explicitly yields (actually,
using Fiber#transfer) behind the scenes so that other fibers may run
in the interim.
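
A rough sketch of the core of that trick (illustrative names, not the actual
library; it uses Fiber.yield/resume rather than Fiber#transfer, but the shape
is the same, and em-synchrony packages a similar idea):

  require 'eventmachine'
  require 'fiber'

  def remote_call(connection, method, *args)
    fiber = Fiber.current
    connection.send_request(method, args) do |result|
      fiber.resume(result)        # fires later, inside the reactor loop
    end
    Fiber.yield                   # suspend this fiber until the callback resumes it
  end

  # Inside EM.run, within a Fiber:
  #   result = remote_call(conn, :fornstaff, "dreelsprail")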

It's interesting, so far, as a programming model. Since there's
only a single thread, there's no need for traditional concurrency
primitives like Mutexes.

On the other hand, reentrancy is still an issue. So if one is in
the process of modifying the state of an object, one does indeed
need to be aware when a method call might end up yielding the fiber.

...Highlighting the usefulness of approaches where "variables have
the property that they can only be bound/assigned to once" like
the dataflow library mentioned by Tony Arcieri elsewhere in this
thread.

Regards,

Bill

Brian Candler wrote:

Daniel DeLorme wrote:

Brian Candler wrote:

So you didn't want a Thread, but you'll happily use a Fiber...

Well, yes, a Fiber is just a coroutine, nothing like a thread.

Are you saying that a Fiber will return control to you when it blocks due to lack of data on a socket, as well as when the Fiber explicitly "yields"? What value does it return to you in the blocking case?

Given that I just said a Fiber is nothing like a thread, I'm not sure how
you got the idea that I'm saying Fibers behave like threads (yield control
on IO).

So AFAICS, your code which thinks it can do work while the HTTP request is running, doesn't. Rather, the HTTP request is not sent at all until Fiber#resume is called, and at that point it will block as necessary until the whole response is received.

I didn't post that code without testing it. If you look at it a bit more carefully maybe you'll understand how it works. The HTTP request is sent after the first Fiber#resume but the fiber yields before attempting to read the response.
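
A stripped-down reconstruction of that pattern (not the exact code from
earlier in the thread; plain HTTP/1.0 over a socket, no error handling):

  require 'socket'

  def async_get(host, path = "/")
    Fiber.new do
      sock = TCPSocket.new(host, 80)
      sock.write("GET #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n")
      Fiber.yield             # request is on the wire; don't read yet
      body = sock.read        # blocks only when we're finally resumed
      sock.close
      body
    end
  end

  # news = async_get("news.server"); news.resume   # sends the request
  # ... generate the layout ...
  # html = news.resume                             # collects the response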

Tony Arcieri wrote:

···

On Tue, May 18, 2010 at 10:26 PM, Bill Kelly <billk@cts.com> wrote:

Sorry if I'm being redundant, but I wanted to point out that EventMachine
*can* support non-inverted control semantics, with a Fiber-based wrapper
layer.

And I'm pretty sure I was the first person to ever implement a Ruby
Fiber-based wrapper which provides normal flow control semantics on top of
an IoC-driven event-based framework with Revactor, for what it's worth.

Yeah I was going to mention revactor as well.

There is "async sinatra" if that's any help.

--
Posted via http://www.ruby-forum.com/.

Daniel DeLorme wrote:

I didn't post that code without testing it. If you look at it a bit more
carefully maybe you'll understand how it works. The HTTP request is sent
after the first Fiber#resume but the fiber yields before attempting to
read the response.

Oh yes, sorry about that. I'd digested one of the method_missing
sections but not the other.

It still seems like unnecessary complexity when ruby threads are cheap,
but it achieves what you want.

···

--
Posted via http://www.ruby-forum.com/.