Unsafe readline(), anything better?

Looking for a combination of readpartial() and readline() in order to
safely read a line of maxlen. IO.readline() appears to suffer from lacking
this maximum allowing DOS against servers reading HTTP headers, for
example, just by using sock.readline() alone, which is what
Net::HTTPResponse and others do:

   1974 class << HTTPResponse
   1975 def read_new(sock) #:nodoc: internal use only 1976 httpv,
   code, msg = read_status_line(sock) 1977 res =
   response_class(code).new(httpv, code, msg) 1978
   each_response_header(sock) do |k,v| 1979 res.add_field k, v
   1980 end
   1981 res
   1982 end
   1983
   1984 private
   1985
   1986 def read_status_line(sock) 1987 str = sock.readline
   1988 m =
   /\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
   1989 raise HTTPBadResponse, "wrong status line: #{str.dump}"
   1990 m.captures
   1991 end

I know this is fundamentally a problem with the popular readline C libs
out there. Wish they had an nreadline like the sprintf and snprintf
additional function. Sure I could readchar() or readbytes() watching each
read for a newline, but that is just unfun. Have I overlooked something
obvious in my search? Hoping to not have to write my own safe/buffered IO
layer like I've had to do with other langs.

If there is enough interest, maybe I'll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

    ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong exception
or just return what could be read of the line up to that point.

Thanks,

Rob

Rob Muhlestein wrote:

Looking for a combination of readpartial() and readline() in order to
safely read a line of maxlen. IO.readline() appears to suffer from lacking
this maximum allowing DOS against servers reading HTTP headers, for
example, just by using sock.readline() alone, which is what
Net::HTTPResponse and others do:

   1974 class << HTTPResponse
   1975 def read_new(sock) #:nodoc: internal use only 1976 httpv,
   code, msg = read_status_line(sock) 1977 res =
   response_class(code).new(httpv, code, msg) 1978
   each_response_header(sock) do |k,v| 1979 res.add_field k, v
   1980 end
   1981 res
   1982 end
   1983
   1984 private
   1985
   1986 def read_status_line(sock) 1987 str = sock.readline
   1988 m =
   /\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
   1989 raise HTTPBadResponse, "wrong status line: #{str.dump}"
   1990 m.captures
   1991 end

I know this is fundamentally a problem with the popular readline C libs
out there. Wish they had an nreadline like the sprintf and snprintf
additional function. Sure I could readchar() or readbytes() watching each
read for a newline, but that is just unfun. Have I overlooked something
obvious in my search? Hoping to not have to write my own safe/buffered IO
layer like I've had to do with other langs.

If there is enough interest, maybe I'll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

    ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong exception
or just return what could be read of the line up to that point.

Thanks,

Rob

class File
  def safe_readline(sep_string=$/,maxlen=nil)
    buf_size = 1024
    line = ""
    while !self.eof?
      s = read( [ buf_size, maxlen - line.size ].min )
      line << s
      if i = line.index( sep_string )
        line = line[0,i+sep_string.size]
        return [ line, true ]
      end
      return [ line[0,maxlen], false ] if maxlen &&
        line.size >= maxlen
    end
    [ line, true ]
  end
end

open('junk'){|h| p h.safe_readline("\n",9) }

Hi,

···

In message "Re: unsafe readline(), anything better?" on Fri, 29 Dec 2006 04:05:11 +0900, Rob Muhlestein <rmuhlestein@yahoo.com> writes:

If there is enough interest, maybe I'll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

   ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong exception
or just return what could be read of the line up to that point.

I have added the second optional argument to specify maximum length
limit for gets, readline, readlines, etc. I will commit the change
for HEAD soon. Webrick and others should be updated using the new
feature.

              matz.

Thanks for the attempt, this is very close to the reader-flavor of
classes I've had to write in other langs. But it doesn't play nice
dealing with non-seekable streams like sockets, which can cause some
nasty IO socket blocking (or the nonblocking socket read run around).
For example:

  Header1: something
  Header2: else, followed by blank line
  Content-Length: 55 (or whatever)

  Here is the data portion that could be binary or text.

If I called h.safe_readline("\r\n",1024) and was working with an IO
socket there wouldn't be anything left by the time I want to read
the data. That is, if I ever had a chance to try since the read would
block my proggy from doing anything.

One solution is to create a Reader class and maintain a buffer which
keeps the extra overrun available. I was hoping at the binary level
during the byte read that readline does that it could keep a counter of
the number of bytes read as it reads them and throw or whatever when
maxlen exceeded. Doing a readbyte from most languages at the script/lang
level is usually way to costly, it would probably be too costly to weigh
down read() with a count and check for every byte, but perhaps not in
another safe_readline() native extension.

I do love that Ruby let's me add to IO itself, which handles adding
@prev_read buffer to store the overrun in for next read and is what
I'll do and post for review.

Thanks again,

···

On Thu, 28 Dec 2006 12:10:04 -0800, William James wrote:

Rob Muhlestein wrote:

Looking for a combination of readpartial() and readline() in order to
safely read a line of maxlen. IO.readline() appears to suffer from
lacking this maximum allowing DOS against servers reading HTTP headers,
for example, just by using sock.readline() alone, which is what
Net::HTTPResponse and others do:

   1974 class << HTTPResponse
   1975 def read_new(sock) #:nodoc: internal use only 1976
   httpv, code, msg = read_status_line(sock) 1977 res =
   response_class(code).new(httpv, code, msg) 1978
   each_response_header(sock) do |k,v| 1979 res.add_field k, v
   1980 end
   1981 res
   1982 end
   1983
   1984 private
   1985
   1986 def read_status_line(sock) 1987 str =
   sock.readline 1988 m =
   /\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
   1989 raise HTTPBadResponse, "wrong status line:
   #{str.dump}" 1990 m.captures
   1991 end

I know this is fundamentally a problem with the popular readline C libs
out there. Wish they had an nreadline like the sprintf and snprintf
additional function. Sure I could readchar() or readbytes() watching
each read for a newline, but that is just unfun. Have I overlooked
something obvious in my search? Hoping to not have to write my own
safe/buffered IO layer like I've had to do with other langs.

If there is enough interest, maybe I'll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

    ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong
exception or just return what could be read of the line up to that
point.

Thanks,

Rob

class File
  def safe_readline(sep_string=$/,maxlen=nil)
    buf_size = 1024
    line = ""
    while !self.eof?
      s = read( [ buf_size, maxlen - line.size ].min ) line << s
      if i = line.index( sep_string )
        line = line[0,i+sep_string.size]
        return [ line, true ]
      end
      return [ line[0,maxlen], false ] if maxlen &&
        line.size >= maxlen
    end
    [ line, true ]
  end
end

open('junk'){|h| p h.safe_readline("\n",9) }

--
Rob Muhlestein
http://rob.muhlestein.net

I would also consider the alternative of creating a SafeLineReader that can be stacked on *any* IO object (or rather any object that implements #read with the proper protocol) as input stream much the same way Java streams can be stacked. You gain flexibility but of course you loose the easy "I just need to open a file and it works". I do not have a clear bias either way because I love the simplicity of Ruby's IO classes but on the other hand you can also stuff too much into a single class.

Yet another alternative is to put your code into a Module which you can mix in to any IO object. You can still decide to to that on a general basis for class IO and StringIO.

Just my 0.02EUR...

Kind regards

  robert

···

On 28.12.2006 22:09, Rob Muhlestein wrote:

I do love that Ruby let's me add to IO itself, which handles adding
@prev_read buffer to store the overrun in for next read and is what
I'll do and post for review.

Glad to read your post. Couldn't sleep last night thinking through
this. Then this morning found that vulnerability in WEBrick based
on unsafe readlines (actually gets, but same deal). [see WEBrick DOS
Security Flaw thread]

The more I got into adding to IO the more I kept coming back to stack
design that you mention and that I'm familiar with. Mostly there are
a lot of methods in the IO and StringIO classes that would all have to
have equivalents that support the prev_read buffer. Also we face the
default duplex nature of IO and this is only for reads. So I do really
think stacking (the reader) stream is the way to go--especially with
support for things like '<<' in Ruby, which I love.

Maybe we could call it SafeReader so we can have its readline calling
a read_until (like I've done in other langs). In my few remaining
vacation hours, I'll try to write up a spec, perhaps in Rspec and post
for comment.

I'm still somewhat new to this Ruby stuff, but think I can get a gem
started on RubyForge for it, it would be my first. Then if there is enough
demand, we can implement one in C. By the way, anyone know of any gem
naming conventions to distinguish a pure Ruby gem from C extension
version, like Perl's XS suffix convention?

Also on a related topic, is there a convention or other reliable way to
avoid gem library directory and file name collisions? I did notice, for
example, webrick follows the webrick.rb and webrick/ convention as does
rspec and others. Are these names somehow registered publicly to avoid gem
collisions with others? Perhaps Brian or Ryan would care to set us newbies
straight on that. [I did do quite a bit of homework reading
http://rubygems.org particularly section 8. Distributing Gems]

···

On Fri, 29 Dec 2006 10:19:52 +0100, Robert Klemme wrote:

On 28.12.2006 22:09, Rob Muhlestein wrote:

I do love that Ruby let's me add to IO itself, which handles adding
@prev_read buffer to store the overrun in for next read and is what I'll
do and post for review.

I would also consider the alternative of creating a SafeLineReader that
can be stacked on *any* IO object (or rather any object that implements
#read with the proper protocol) as input stream much the same way Java
streams can be stacked. You gain flexibility but of course you loose the
easy "I just need to open a file and it works". I do not have a clear
bias either way because I love the simplicity of Ruby's IO classes but on
the other hand you can also stuff too much into a single class.

--
Rob Muhlestein
http://rob.muhlestein.net

Humm, from this last recent ruby CVS commit just today makes me wonder if
matz is listening to this thread. If so, matz, thanks for getting me giddy
about programming again, and thanks for taking a shot at addressing this!

matz 2006-12-30 04:21:50 +0900 (Sat, 30 Dec 2006)

  New Revision: 11428

  Modified files:
    trunk/ChangeLog
    trunk/ext/stringio/stringio.c
    trunk/io.c
    trunk/version.h

  Log:
    * ext/stringio/stringio.c (strio_gets): accepts limit argument.
    
    * ext/stringio/stringio.c (strio_readline, strio_each,
      strio_readlines): ditto.
    
    * ext/stringio/stringio.c (strio_getline): add limit capability.
    
    * io.c (rb_io_gets_m): accepts limit argument. [ruby-talk:231563]
    
    * io.c (rb_io_readline, rb_io_readlines, rb_io_each_line,
    argf_getline):
      ditto.
    
    * io.c (appendline): add limit capability.
    
    * io.c (rb_io_getline_fast, rb_io_getline): ditto.
    
    * io.c (rb_io_getline): small refactoring for DRY.
    
    * io.c (rb_io_s_foreach, rb_io_s_readlines): small refactoring.

···

On Fri, 29 Dec 2006 10:49:41 -0500, Rob Muhlestein wrote:

Maybe we could call it SafeReader so we can have its readline calling a
read_until (like I've done in other langs). In my few remaining vacation
hours, I'll try to write up a spec, perhaps in Rspec and post for
comment.

--
Rob Muhlestein
http://rob.muhlestein.net

Hi,

Humm, from this last recent ruby CVS commit just today makes me wonder if
matz is listening to this thread. If so, matz, thanks for getting me giddy
about programming again, and thanks for taking a shot at addressing this!

I am listening. We still need update for Webrick. Here's the my
personal patch (not tested at all).

              matz.

--- a/lib/webrick/httprequest.rb
+++ b/lib/webrick/httprequest.rb
@@ -317,10 +317,10 @@ module WEBrick
       @remaining_size = 0
     end

- def _read_data(io, method, arg)
+ def _read_data(io, method, *arg)
       begin
         WEBrick::Utils.timeout(@config[:RequestTimeout]){
- return io.__send__(method, arg)
+ return io.__send(method, *arg)
         }
       rescue Errno::ECONNRESET
         return nil
@@ -330,7 +330,11 @@ module WEBrick
     end

     def read_line(io)
- _read_data(io, :gets, LF)
+ line = _read_data(io, :gets, [LF, 1024])
+ if line.size == 1024 and line[-1,1] != LF
+ raise HTTPStatus::RequestURITooLarge
+ end
+ line
     end

     def read_data(io, size)

···

In message "Re: unsafe readline(), anything better?" on Sat, 30 Dec 2006 04:55:04 +0900, Rob Muhlestein <rmuhlestein@yahoo.com> writes: