Recalculate HTTP Content-Length

Hi,

I'm trying to re-calculate the content length of HTTP packages, but it seems that I get wrong values (mostly too small).

This is the code --> https://gist.github.com/bararchy/00b50eca203f0999a79e

    def self.rewrite_content_length(data)
      new_size = data[:http_body].bytesize
      if data[:http_headers].match(/Content-Length: (.*?)\r?\n/i)
        data[:http_headers].gsub!(/Content-Length: (.*?)\r?\n/i, "Content-Length: #{new_size}\n")
      else
        data[:http_headers].strip!
        data[:http_headers] << "\nContent-Length: #{new_size}\n\n"
      end
      data
    end

Isn't .bytesize the way to go for this?



--
  Bar Hofesh, Safe-T.com
  Information Security Architect
  Support: (IL)1700700139, 927-9-8666110(ext 231)
  Haatzmaut 40 St, first floor.
  Beer-Sheva 84150, Israel
  [www.Safe-T.com](http://www.Safe-T.com)

It's right, at least it should be right... but why are you stripping out
the "\r" characters? They're required according to the HTTP specs.

On 11 May 2015 at 21:45, Bar Hofesh <bar.hofesh@safe-t.com> wrote:

> Isn't .bytesize the way to go for this?

--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

My other question is: if there's no Content-Length header, are you checking
for Transfer-Encoding? If you're receiving data, and it doesn't have a
specified length, there's a good chance the server is sending it in chunks.
In that case you could have only part of the final response inside
data[:http_body].

Check this Wikipedia article for a starting point: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
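
A minimal sketch of that check, assuming the headers arrive as one string like `data[:http_headers]` in the original post. The helper name `body_framing` and its return symbols are made up for illustration:

```ruby
# Hypothetical helper: decide how the body of an HTTP/1.1 message is framed
# by looking at the raw header block (a String, as in data[:http_headers]).
def body_framing(headers)
  # Transfer-Encoding overrides Content-Length when both are present.
  if headers =~ /^Transfer-Encoding:[^\r\n]*\bchunked\b/i
    :chunked
  elsif headers =~ /^Content-Length:[ \t]*\d+[ \t]*\r?$/i
    :content_length
  else
    # Neither header: a response body, if any, runs until the connection closes.
    :until_close
  end
end
```

Only in the `:content_length` case can the body buffered so far be assumed to be the complete message body.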


Well, TBH both \r\n and \n work the same, so I just prefer to stick to one thing throughout my code :)

It seems that the bytesize issue arises when UTF-8 chars or binary data are used.
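
A quick way to test that suspicion is to compare `length` and `bytesize` on a sample string. The sketch below uses made-up sample data; it only illustrates that `bytesize` does not depend on the encoding tag, while transcoding does change the byte count:

```ruby
# String#bytesize counts bytes; String#length counts characters in the
# string's current encoding. Content-Length needs the byte count.
utf8 = "h\u00e9llo w\u00f6rld"   # made-up sample: 11 characters, two of them 2 bytes each
puts utf8.length
puts utf8.bytesize

# Re-tagging the same bytes as binary changes length, but not bytesize.
binary = utf8.dup.force_encoding(Encoding::BINARY)
puts binary.length
puts binary.bytesize

# Transcoding changes the bytes themselves, so measure the string that is
# actually written to the socket.
latin1 = utf8.encode(Encoding::ISO_8859_1)
puts latin1.bytesize
```

If the body is transcoded or modified after `new_size` is computed, the header and the bytes on the wire will disagree even though `bytesize` itself is correct.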

On 05/11/2015 02:53 PM, Matthew Kerwin wrote:

> It's right, at least it should be right... but why are you stripping out
> the "\r" characters? They're required according to the HTTP specs.

Yeah, this is just one specific part of the code. If the header tells me
the data will be chunked, I just buffer it, remove the "chunked" headers
and recalculate the Content-Length of the whole thing.
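
For reference, the buffering step could look roughly like this. It is only a sketch: `dechunk` is a hypothetical helper, it assumes the complete chunked body is already in memory, and it ignores chunk extensions and drops trailer fields:

```ruby
require 'stringio'

# Hypothetical helper: decode a fully buffered chunked body into its payload.
def dechunk(raw)
  io = StringIO.new(raw.b)   # work on bytes, not characters
  body = String.new          # empty binary (ASCII-8BIT) accumulator
  loop do
    size_line = io.gets("\r\n") or raise ArgumentError, 'truncated chunked body'
    hex = size_line[/\A\h+/] or raise ArgumentError, 'bad chunk size line'
    size = Integer(hex, 16)  # chunk sizes are hexadecimal
    break if size.zero?      # a zero-size chunk ends the body
    chunk = io.read(size)
    raise ArgumentError, 'truncated chunk' if chunk.nil? || chunk.bytesize < size
    body << chunk
    io.read(2)               # skip the CRLF that terminates the chunk data
  end
  body
end
```

The new Content-Length would then be `dechunk(raw).bytesize`, with the Transfer-Encoding header removed from the header block.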

On 05/11/2015 02:56 PM, Matthew Kerwin wrote:

> My other question is: if there's no Content-Length header, are you checking
> for Transfer-Encoding?

If you're sticking to one thing, it should be the one in the spec.

On 11 May 2015 at 21:55, Bar Hofesh <bar.hofesh@safe-t.com> wrote:

> Well, TBH both \r\n and \n work the same, so I just prefer to stick to
> one thing throughout my code

"tolerance provision" in [Section 19.3](http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3) (note that it re-iterates the *correct* sequence):

    The line terminator for message-header fields is the sequence CRLF.
    However, we recommend that applications, when parsing such headers,
    recognize a single LF as a line terminator and ignore the leading CR.

But, I guess you are right, I'll change to \r\n
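
A sketch of what the CRLF version might look like. The `data` hash with `:http_headers` and `:http_body` comes from the original post; it is written here as a plain method rather than `def self.`, and it assumes `:http_headers` holds the whole header block including the terminating blank line:

```ruby
# Rewrite (or add) the Content-Length header using CRLF line endings.
def rewrite_content_length(data)
  new_size = data[:http_body].bytesize
  pattern = /^Content-Length:[^\r\n]*\r?\n/i
  if data[:http_headers] =~ pattern
    # Replace the existing header line, accepting LF or CRLF on input.
    data[:http_headers].sub!(pattern, "Content-Length: #{new_size}\r\n")
  else
    # Drop the trailing blank line, append the header, then close the block again.
    data[:http_headers].sub!(/(\r?\n)+\z/, '')
    data[:http_headers] << "\r\nContent-Length: #{new_size}\r\n\r\n"
  end
  data
end
```

Anchoring the pattern with `^` keeps it from matching a header whose name merely ends in "Content-Length".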

On 05/11/2015 02:56 PM, Matthew Kerwin wrote:

> If you're sticking to one thing, it should be the one in the spec.

Just for future reference, RFC 2616 has been replaced; the RFC 7230
family of specs now defines HTTP/1.1.

For example, RFC 7230 (Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing) now says:

    "Although the line terminator for the start-line and header fields is
     the sequence CRLF, a recipient MAY recognize a single LF as a line
     terminator and ignore any preceding CR."

The same thing, but with different emphasis.

Anyway, back on topic, I really don't know why String#bytesize would be
miscounting the number of bytes in a binary (or UTF-8) string. Are you sure
it's getting it wrong? Have you tried dumping the bytes and comparing them
with your replaced header? I use something like this:

    class String
      # Print the string's bytes as two-digit hex values, 16 per line.
      def hexdump
        puts each_byte.to_a.map { |b| '%02X' % b }.each_slice(16).map { |s| s.join(' ') }
      end
    end

... although it also relies on the 'byte' methods of String. You could use
string.unpack('C*') instead of each_byte.to_a if you don't trust it.
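
To make that comparison concrete, something along these lines could be run against the rewritten message. This is a sketch only: `content_length_check` is a hypothetical helper, and it assumes the raw message separates headers from body with CRLF CRLF:

```ruby
# Hypothetical check: split a raw HTTP message at the blank line and compare
# the declared Content-Length with the number of body bytes actually present.
def content_length_check(raw_message)
  headers, body = raw_message.b.split("\r\n\r\n", 2)
  declared = headers[/^Content-Length:[ \t]*(\d+)/i, 1]
  {
    declared: declared && declared.to_i,
    actual: body.to_s.unpack('C*').size   # count bytes without String#bytesize
  }
end
```

If `declared` and `actual` differ, the body was probably changed, truncated, or re-encoded somewhere between the measurement and the write.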

On 11 May 2015 at 22:00, Bar Hofesh <bar.hofesh@safe-t.com> wrote:

> But, I guess you are right, I'll change to \r\n