Asynchronous HTTP POST?

Hey everyone, I'm new to Ruby and to the mailing list, so go easy.
Basically, I have to POST to a certain url, then I wait for a response.
The catch is that I have to do this to two urls at once. Both of them
may respond to me almost instantly, or they may take up to 10 seconds to
respond. I need to have a post to both of these running at all times to
catch incoming events. I will also need to post to other urls at the
same time that these are running. So, I need to find a way to run these
two posts in the background constantly. From what I've read, ruby
threads will hang on a command like this, since the interpreter does not
have control. Can anyone help (or understand) me?

Thanks,
Ivan

--
Posted via http://www.ruby-forum.com/.

Ivan-

  This is a perfect job for eventmachine and em-http-request. You can run as many async HTTP requests as you want without blocking and handle the results with callback blocks.
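
A minimal sketch of that pattern (the URLs and the 'listen' body parameter are placeholders; this assumes em-http-request's callback/errback blocks on the request object):

require 'eventmachine'
require 'em-http'   # provided by the em-http-request gem

# Keep one POST outstanding against each URL at all times: as soon as a
# response (or an error) comes back, issue the next request.
def poll(url)
  http = EventMachine::HttpRequest.new(url).post :body => { 'listen' => '1' }
  http.callback do
    p [url, http.response_header.status, http.response]
    poll(url)                          # immediately start the next long poll
  end
  http.errback do
    p [url, 'request failed']
    EM.add_timer(1) { poll(url) }      # back off briefly before retrying
  end
end

EM.run do
  poll 'http://example.com/events/a'
  poll 'http://example.com/events/b'
end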

Cheers-

Ezra Zygmuntowicz
ez@engineyard.com

On Sep 9, 2009, at 10:15 PM, Ivan Shevanski wrote:

Hey everyone, I'm new to Ruby and to the mailing list, so go easy.
Basically, I have to POST to a certain url, then I wait for a response.
The catch is that I have to do this to two urls at once. Both of them
may respond to me almost instantly, or they may take up to 10 seconds to
respond. I need to have a post to both of these running at all times to
catch incoming events. I will also need to post to other urls at the
same time that these are running. So, I need to find a way to run these
two posts in the background constantly. From what I've read, ruby
threads will hang on a command like this, since the interpreter does not
have control. Can anyone help (or understand) me?

Thanks,
Ivan
--
Posted via http://www.ruby-forum.com/.

Ezra Zygmuntowicz wrote:

Hey everyone, I'm new to Ruby and to the mailing list, so go easy.
Basically, I have to POST to a certain url, then I wait for a response.
The catch is that I have to do this to two urls at once. Both of them
may respond to me almost instantly, or they may take up to 10 seconds to
respond. I need to have a post to both of these running at all times to
catch incoming events. I will also need to post to other urls at the
same time that these are running. So, I need to find a way to run these
two posts in the background constantly. From what I've read, ruby
threads will hang on a command like this, since the interpreter does not
have control. Can anyone help (or understand) me?

Thanks,
Ivan
--
Posted via http://www.ruby-forum.com/.

Ivan-

    This is a perfect job for eventmachine and em-http-request. You can run as many async http requests as you want without blocking and handle the results with callback blocks.

In small-scale cases (such as a simple client), is there any reason not to use threads? EM just seems like overkill for a fairly simple client.

On Sep 9, 2009, at 10:15 PM, Ivan Shevanski wrote:

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Ezra Zygmuntowicz wrote:

On Sep 9, 2009, at 10:15 PM, Ivan Shevanski wrote:

these
two posts in the background constantly. From what I've read, ruby
threads will hang on a command like this, since the interpreter does
not
have control. Can anyone help (or understand) me?

Thanks,
Ivan
--
Posted via http://www.ruby-forum.com/.

Ivan-

  This is a perfect job for eventmachine and em-http-request. You can
run as many async http requests as you want without blocking and
handle the results with callback blocks.

http://github.com/igrigorik/em-http-request (asynchronous HTTP client, EventMachine + Ruby)

Cheers-

Ezra Zygmuntowicz
ez@engineyard.com

I couldn't seem to get this running with threads, so I'm trying
eventmachine. I can get a single post to run fine with a callback, but
what do I have to do to get continuous posts running? I need to have a
post to the site going at all times, while handling the responses.
Documentation/examples seem very hard to find. A decent em-http-request
tutorial would be great.
--
Posted via http://www.ruby-forum.com/.

Joel VanderWerf wrote:

Ezra Zygmuntowicz wrote:

two posts in the background constantly. From what I've read, ruby

Ivan-

    This is a perfect job for eventmachine and em-http-request. You can
run as many async http requests as you want without blocking and handle
the results with callback blocks.

In small scale cases (such as a simple client) is there any reason not
to use threads? EM just seems like overkill for a fairly simple client.

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that's my
understanding.

--
Posted via http://www.ruby-forum.com/.

Ivan Shevanski wrote:

I couldn't seem to get this running with threads, so I'm trying eventmachine.

I think EM is overkill here. The following example uses PUT not POST,
but I'm sure you'll be able to adapt it.

require 'net/http'
require 'uri'

# Configuration variables:
THREAD_COUNT = 10
REQUESTS_PER_THREAD = 10
FILENAME = 'file_to_put'
URL = 'http://localhost/DropBox/file_to_put'

# Put a data string to the specified url:
def urlput(url, data)
    begin
        uri = URI.parse(url)
        response = nil
        value = nil
        Net::HTTP.start(uri.host, uri.port) { |http|
            response, value = http.put(uri.path, data, nil)
        }
        p response.message if (response.code.to_i >= 300)
    rescue => e
        p e
    end
    value
end

# Read the file to put:
data = File.new(FILENAME).read

start = Time.now
$threads = []
(1..THREAD_COUNT).each { |thread|
    $threads << Thread.new(thread) { |thread_no|
        (1..REQUESTS_PER_THREAD).each {
            urlput(URL, data)
        }
    }
}
$threads.each { |aThread| aThread.join }
puts "#{THREAD_COUNT*REQUESTS_PER_THREAD} requests completed in #{Time.now - start} seconds"

Clifford Heath.

A quick test seems to show that isn't the case. I wrote a simple WEBrick
servlet that accepts a POST request and delays for a specified amount of time
(taken from the request's delay parameter), and a client with two threads
that POST to it and keep track of when things start and end:

delay_servlet.rb:
require 'webrick'
require 'time'

class DelayServlet < WEBrick::HTTPServlet::AbstractServlet
  def do_POST(request, response)
    start_time = Time.now
    delay = 0
    if request.query["delay"]
      delay = request.query["delay"].to_i
    end

    sleep(delay)

    end_time = Time.now
    response.body = "delayed for #{delay}s, started at " +
      "#{start_time.iso8601}, ended at #{end_time.iso8601}\n"
  end
end

if __FILE__ == $0
  server = WEBrick::HTTPServer.new(:Port => 8000)
  server.mount("/", DelayServlet)

  trap("INT") {server.shutdown}
  server.start
end

delay_client.rb:
require 'net/http'
require 'time'

if __FILE__ == $0
  puts "Main thread start at #{Time.now.iso8601}"

  t1 = Thread.new do
    puts "Thread 1 start at #{Time.now.iso8601}"
    res = Net::HTTP.post_form(URI.parse('http://localhost:8000/'),
                              {'delay'=>'5'})
    puts "Response: " + res.body
    puts "Thread 1 end at #{Time.now.iso8601}"
  end

  t2 = Thread.new do
    puts "Thread 2 start at #{Time.now.iso8601}"
    res = Net::HTTP.post_form(URI.parse('http://localhost:8000/'),
                              {'delay'=>'7'})
    puts "Response: " + res.body
    puts "Thread 2 end at #{Time.now.iso8601}"
  end

  t1.join
  t2.join
  puts "Main thread end at #{Time.now.iso8601}"
end

Output:
Main thread start at 2009-09-10T16:46:17-04:00
Thread 1 start at 2009-09-10T16:46:17-04:00
Thread 2 start at 2009-09-10T16:46:17-04:00
Response: delayed for 5s, started at 2009-09-10T16:46:17-04:00, ended at
2009-09-10T16:46:22-04:00
Thread 1 end at 2009-09-10T16:46:22-04:00
Response: delayed for 7s, started at 2009-09-10T16:46:17-04:00, ended at
2009-09-10T16:46:24-04:00
Thread 2 end at 2009-09-10T16:46:24-04:00
Main thread end at 2009-09-10T16:46:24-04:00

So it sure looks like it isn't blocking all threads when waiting for an HTTP
response.

Ben

On Thursday 10 September 2009 15:48:56 Ivan Shevanski wrote:

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that's my
understanding.

In MRI you can do multiplexed I/O across threads; however, the code that
implements this (in eval.c) will make your eyes bleed.

On Thu, Sep 10, 2009 at 1:48 PM, Ivan Shevanski <ocelot117@gmail.com> wrote:

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that's my
understanding.

--
Tony Arcieri
Medioh/Nagravision

I disagree that EM is overkill here. EM is not a heavyweight library, and it does a *much* better job of this kind of async HTTP work than threads and net/http do; it really should be the preferred way of doing something like this.

require 'eventmachine'
require 'em-http'   # from the em-http-request gem

def make_request(site = 'http://www.website.com/', body = {})
   http = EventMachine::HttpRequest.new(site).post :body => body
   http.errback { p 'request failed' }
   http.callback {
     p http.response_header.status
     p http.response_header
     p http.response
   }
end

EM.run do
   # make a request every second
   EM.add_periodic_timer(1) do
     make_request "http://foo.com/", :param => 'hi', :param2 => 'there'
   end
end

# look ma, no threads, but I still get fully async, concurrent network IO.

Cheers-

Ezra Zygmuntowicz
ez@engineyard.com

On Sep 12, 2009, at 9:15 PM, Clifford Heath wrote:

Ivan Shevanski wrote:

I couldn't seem to get this running with threads, so I'm trying eventmachine.

I think EM is overkill here.

Ben Giddings wrote:

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that's my
understanding.

A quick test seems to show that isn't the case. [WEBrick delay servlet,
two-thread client, and timing output snipped; see Ben's message above]

So it sure looks like it isn't blocking all threads when waiting for an
HTTP response.

Ben

Sure looks like you're right. Here's where I got that idea in my head:

http://www.rubycentral.com/pickaxe/tut_threads.html

"""
Multithreading

Often the simplest way to do two things at once is by using Ruby
threads. These are totally in-process, implemented within the Ruby
interpreter. That makes the Ruby threads completely portable---there is
no reliance on the operating system---but you don't get certain benefits
from having native threads. You may experience thread starvation (that's
where a low-priority thread doesn't get a chance to run). If you manage
to get your threads deadlocked, the whole process may grind to a halt.
(!!!) And if some thread happens to make a call to the operating system
that takes a long time to complete, all threads will hang until the
interpreter gets control back. (!!!) However, don't let these
potential problems put you off---Ruby threads are a lightweight and
efficient way to achieve parallelism in your code.
"""

(Sorry, I'm unsure if I'm allowed to use HTML tags or anything here, but
I think this will do. Looks like the FAQ link is broken.) Is this a
blatant lie? Maybe someone can explain to me what is actually being
referred to?

Thanks,
Ivan

On Thursday 10 September 2009 15:48:56 Ivan Shevanski wrote:

--
Posted via http://www.ruby-forum.com/.

I'm not sure why Ruby doesn't provide a way to send the request
without reading the response, but it's fairly trivial to split the
Net::HTTP#request method into two halves to do so, as per below:

require 'net/http'

module Net
  class HTTP < Protocol
    # pasted first half of HTTP#request: writes the request to the server,
    # does not return an HTTPResponse and does not take a block
    def request_async(req, body = nil)
      if proxy_user()
        unless use_ssl?
          req.proxy_basic_auth proxy_user(), proxy_pass()
        end
      end

      req.set_body_internal body
      begin_transport req
      req.exec @socket, @curr_http_version, edit_path(req.path)
    end

    # second half of HTTP.request that yields or returns the response
    def read_response(req, body = nil, &block) # :yield: +response+
      begin
        res = HTTPResponse.read_new(@socket)
      end while res.kind_of?(HTTPContinue)
      res.reading_body(@socket, req.response_body_permitted?) {
        yield res if block_given?
      }
      end_transport req, res

      res
    end
  end
end

# Example usage for a non-blocking GET without following redirects:
http = Net::HTTP.new('www.google.com')
req = Net::HTTP::Get.new('/')
http.start
begin
  http.request_async(req)
  # do other stuff
  res = http.read_response(req)
ensure
  http.finish
end
res.value # raise if error
p res.body

--
Posted via http://www.ruby-forum.com/.

Ivan Shevanski wrote:

http://www.rubycentral.com/pickaxe/tut_threads.html

"""
Multithreading

Often the simplest way to do two things at once is by using Ruby threads. These are totally in-process, implemented within the Ruby interpreter. That makes the Ruby threads completely portable---there is no reliance on the operating system---but you don't get certain benefits from having native threads. You may experience thread starvation (that's where a low-priority thread doesn't get a chance to run). If you manage to get your threads deadlocked, the whole process may grind to a halt. (!!!) And if some thread happens to make a call to the operating system that takes a long time to complete, all threads will hang until the interpreter gets control back. (!!!) However, don't let these potential problems put you off---Ruby threads are a lightweight and efficient way to achieve parallelism in your code.
"""

Here's my relatively naive understanding of the situation (for MRI 1.8):

System calls will block all threads, except in a few cases. The exceptions include:

1. Waiting on IO. Ruby's threads are really an abstraction over a single native thread calling select() on all the file descriptors that the Ruby threads are waiting on. When an fd is ready for reading, say, the native thread starts executing the Ruby thread that was waiting for that fd.

2. Starting processes and waiting for them to finish. This is why

  Thread.new { system "long-running process" }

is a useful idiom (and it even works on Windows).

Still, if you expect a lot of threads, EM will probably be much more efficient.

But many other system calls (#flock without the nonblock flag, for example) will block all Ruby threads.
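
If you want to test that last claim, here's a rough sketch (the lock file path is arbitrary): run it while another process holds an exclusive flock on the same file, and if the blocking flock really does stall the interpreter, the ticker thread will stop printing until the lock is granted.

lockfile = File.open('/tmp/flock_demo.lock', 'w')

ticker = Thread.new do
  10.times { puts "tick #{Time.now}"; sleep 1 }
end

locker = Thread.new do
  lockfile.flock(File::LOCK_EX)   # blocking flock; compare with File::LOCK_NB
  puts "got the lock"
end

ticker.join
locker.join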

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Eric, it's great that you thought about this, as I'm currently stuck on
exactly this problem.

However, your solution won't work. The http.start call triggers
Net::HTTP#start, which can take quite a while to complete. In fact, it can
take much longer than the actual request in cases where the host hasn't
been queried for some time (and thus isn't cached).

--
Posted via http://www.ruby-forum.com/.

Yeah, obviously this doesn't parallelize the connect, just the request.
Unless you're doing SSL, the only blocking thing Net::HTTP#connect does
is the underlying TCPSocket.open. If that is your bottleneck and you've
already set open_timeout as low as you can go, then you'd have to patch
deeper to get Net::HTTP to use connect_nonblock, as per
http://www.ruby-doc.org/core/classes/Socket.html#M002091, instead of
TCPSocket.open.
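
For reference, the connect_nonblock idiom from that page looks roughly like this. This is a sketch only, not a patch to Net::HTTP; the helper name and timeout are made up, and note the name resolution in pack_sockaddr_in can itself still block:

require 'socket'

def connect_with_timeout(host, port, timeout = 5)
  addr = Socket.pack_sockaddr_in(port, host)      # DNS lookup still blocks here
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  begin
    sock.connect_nonblock(addr)
  rescue Errno::EINPROGRESS
    # The connect is underway; wait (up to timeout) for the socket to
    # become writable, which signals completion or failure.
    unless IO.select(nil, [sock], nil, timeout)
      sock.close
      raise "connect to #{host}:#{port} timed out"
    end
    begin
      sock.connect_nonblock(addr)                 # check the final status
    rescue Errno::EISCONN
      # connected successfully
    end
  end
  sock
end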

Jaap Haagmans wrote:

Eric, it's great that you thought about this as I'm currently stuck on
this.

However, your solution won't work. The http.start triggers the
Net::HTTP.start method which can take quite a while to complete. In
fact, it will take much longer than the actual request in cases where
the host wasn't queried for some time (and thus not cached).

--
Posted via http://www.ruby-forum.com/.