Potential bug in Net::HTTP, and tentative patch

Hi all,

I am building a nice little system using Ruby (on Rails), one part of
which uses Net::HTTP to retrieve some data over HTTP. Everything seems
to work fine, but on some requests, I get an EOFError.

As I found out, this problem has already been reported, but without any
answer. See:
http://rubyforge.org/forum/forum.php?thread_id=28826&forum_id=6052

I think I may have traced the problem back to a bug in Net::HTTP. Here
is a two-liner to reproduce the error:

$ irb
>> require 'net/http'
=> true
>> res = Net::HTTP.get_response(URI.parse('http://snapcasa.com/get.aspx?code=1000&size=m&url=www.google.com'))
EOFError: end of file reached
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2236:in `read_chunked'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2216:in `read_body_0'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2182:in `read_body'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2207:in `body'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2146:in `reading_body'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1061:in `request'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:957:in `request_get'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:380:in `get_response'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:547:in `start'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:379:in `get_response'
    from (irb):2
>>

My quick analysis:

In http.rb, line 2236 the "read_chunked" function calls
@socket.readline. This "readline" function, in protocol.rb line 126,
calls readuntil("\n"). This works fine if the data chunk is
"\n-terminated", but throws an EOFError if it is not.

I am not sure about the underlying standards of chunked HTTP. Maybe the
data chunk is supposed to always be \n-terminated, and this may be a
misbehaving server, but the fact is: I can get the example image fine
with any browser, but not with Net::HTTP.

As a tentative fix, I wrote a patch that catches the EOFError in
read_chunked. You will find the patch file attached. With the patch,
things work fine:

$ irb
>> require 'net/http'
=> true
>> res = Net::HTTP.get_response(URI.parse('http://snapcasa.com/get.aspx?code=1000&size=m&url=www.google.com'))
=> #<Net::HTTPOK 200 OK readbody=true>
>> File.open('test.jpg','w').write res.body
=> 2881
>>

With the patch, the above gives me a perfectly fine JPEG file. However,
I am afraid my current patch, with a big begin...rescue around most of
the body of the read_chunked function, catches the EOFError at a higher
level than necessary, which is not good practice...

Anyway, before continuing any further, could someone involved in the
development of Ruby take a look at this, confirm the existence of the
bug, and maybe even come up with a better fix?

PS: please let me know if I posted this in the wrong list, or if I
should open a bug report on some bug tracking system.

Thank you for your help.

read_chunked_eoferror_tentative_fix.patch (1.05 KB)


--
Yves-Eric

Hi,

At Mon, 13 Apr 2009 12:40:32 +0900,
Yves-Eric Martin wrote in [ruby-talk:333704]:

> In http.rb, line 2236 the "read_chunked" function calls
> @socket.readline. This "readline" function, in protocol.rb line 126,
> calls readuntil("\n"). This works fine if the data chunk is
> "\n-terminated", but throws an EOFError if it is not.

Not "\n-terminated".

According to RFC 2616 and RFC 2068, each chunk consists of a
chunk-size and chunk-data, and the chunked body is terminated
by a chunk of size "0".

That is, the response doesn't seem to follow the RFCs.
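
To make the framing described above concrete, here is a minimal sketch of a chunked-body decoder. It is not Net::HTTP's actual code: the helper name each_chunk and the sample bodies are made up for illustration, and chunk extensions and trailers are ignored. It only shows that a body without the final zero-size chunk runs into EOFError, while the chunks read before that point are still available.

```ruby
require 'stringio'

# Hypothetical helper (not part of Net::HTTP): decode a chunked body
# from +io+, yielding the data of each chunk to the block.
def each_chunk(io)
  loop do
    size_line = io.readline("\r\n")   # raises EOFError at end of input
    size = size_line.to_i(16)         # chunk-size is hexadecimal
    break if size.zero?               # the zero-size chunk ends the body
    data = io.read(size)
    raise EOFError, "truncated chunk" if data.nil? || data.bytesize < size
    yield data
    io.read(2)                        # skip the CRLF that follows the data
  end
end

complete  = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
truncated = "5\r\nhello\r\n6\r\n world\r\n" # final zero-size chunk missing

[complete, truncated].each do |raw|
  out = String.new
  begin
    each_chunk(StringIO.new(raw)) { |s| out << s }
    puts "complete body: #{out.inspect}"
  rescue EOFError
    puts "EOFError, partial body kept: #{out.inspect}"
  end
end
```

In both cases the collected data is "hello world"; the difference is only whether the decoder reaches the terminating chunk or hits the end of the input first.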


--
Nobu Nakada

Thank you for pointing me to the RFC. Indeed, the response does not
seem RFC-compliant...

Other than my quick and dirty patch, is there a way to tell Net::HTTP
to ignore the EOFError and accept non-compliant input? Again, the point
is that an image, which displays fine in Internet Explorer, Firefox and
Safari, cannot be downloaded with Net::HTTP. While I understand the
"not RFC-compliant" argument, for practical reasons, it does seem a bit
limiting...

Thank you,

PS: I will also contact the administrator of the problem site regarding this
RFC compliance issue.


--
Yves-Eric

Hi,

At Tue, 14 Apr 2009 12:33:01 +0900,
Yves-Eric Martin wrote in [ruby-talk:333820]:

> Other than my quick and dirty patch, is there a way to tell Net::HTTP
> to ignore the EOFError and accept non-compliant input? Again, the point
> is that an image, which displays fine in Internet Explorer, Firefox and
> Safari, cannot be downloaded with Net::HTTP. While I understand the
> "not RFC-compliant" argument, for practical reasons, it does seem a bit
> limiting...

See rdoc of Net::HTTPResponse#read_body and
Net::HTTP#request_get.

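  require 'net/http'

  uri = URI.parse('http://snapcasa.com/get.aspx?code=1000&size=m&url=www.google.com')
  out = "" # or open(destfile, "wb")
  begin
    Net::HTTP.get_response(uri) do |res|
      # Read the body piece by piece, so data received so far is kept.
      res.read_body {|s| out << s}
    end
  rescue EOFError
    # The chunked body ended without proper termination; keep what was read.
  end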


--
Nobu Nakada

Works like a charm!

Thank you Nobu for your great help. I owe you a beer.


--
Yves-Eric