Bit Manipulation for a Pure Ruby SHA1 implementation

I'm trying to write a pure Ruby implementation of SHA1. In order to do
so, I need to be able to directly manipulate bits. I know about the
bitwise operators in Ruby, but I'm having issues with the way that
Ruby wraps things up in classes. Basically I've run into two problems.

PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
certain number of 0s. When appending numbers (e.g. 0b1) to the end of
by string, Ruby wraps each number in its Fixnum class making it 1 byte
long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
that shouldn't be there. The fix I've found is I to tack 0x80 on the
end, which concats 0b1000_0000. The downside is that I have to keep
track of my zeros in batches of bytes (which at this point my
implementation handles alright), but I'd like to know how to do it bit
by bit if possible.

I've seen the String.unpack and Array.pack used for binary data.
However, I can only get the bits out in one glob str.unpack("B*"), or
str.unpack("B8"*string.length), which still just gives me a bunch of
stringified 8bit binary values. Seems like there should be a better
way.

PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
bit long integer representing the length of the message being hashed.
It seem like Fixnum is 8 bytes long on my machine, so the question is
how do I add a 64bit binary representation of this number to the end
of a string?

Any help would be greatly appreciated. Thanks in advance.

Hi, if you just need a portable SHA1 implementation, there is already one in
the standard library.
http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html

As for your problems, I'm finding myself really confused by your
terminology. For example, when you say "adding a one-bit and k zeros" does
that mean num + 2**k, Or is num <<= 1 ; num += 1 ; num <<= k or something
else even?

It might be more helpful if you show "this is what I have, this is what I
want, this is how I tried to get there"

···

On Mon, Oct 11, 2010 at 11:35 PM, Jason Larsen <jason.larsen@gmail.com>wrote:

I'm trying to write a pure Ruby implementation of SHA1. In order to do
so, I need to be able to directly manipulate bits. I know about the
bitwise operators in Ruby, but I'm having issues with the way that
Ruby wraps things up in classes. Basically I've run into two problems.

PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
certain number of 0s. When appending numbers (e.g. 0b1) to the end of
by string, Ruby wraps each number in its Fixnum class making it 1 byte
long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
that shouldn't be there. The fix I've found is I to tack 0x80 on the
end, which concats 0b1000_0000. The downside is that I have to keep
track of my zeros in batches of bytes (which at this point my
implementation handles alright), but I'd like to know how to do it bit
by bit if possible.

I've seen the String.unpack and Array.pack used for binary data.
However, I can only get the bits out in one glob str.unpack("B*"), or
str.unpack("B8"*string.length), which still just gives me a bunch of
stringified 8bit binary values. Seems like there should be a better
way.

PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
bit long integer representing the length of the message being hashed.
It seem like Fixnum is 8 bytes long on my machine, so the question is
how do I add a 64bit binary representation of this number to the end
of a string?

Any help would be greatly appreciated. Thanks in advance.

I would create a class which is a thin wrapper around a String
instance and probably a length in bits. Then add methods that perform
bit manipulations that you need and use that in your implementation.

# untested
class BitStream
  NULL = "\000".force_encoding "BINARY"

  def initialize(s = "", l = 0)
     @s = s
     @l = l
     @s.force_encoding "BINARY"
  end

  def bit_length; @l; end

  def add_bit(b)
    b = case b
      when false, 0, nil
        0
      else
        1
    end

    self[@l] = b
  end

  def (index)
    raise "Index error" if index < 0 || index >= @l
    byte, off = index.divmod 8
    @s[byte].ord & (1 << off) == 0 ? 0 : 1
  end

  def =(index, bit)
    raise "Index error" if index < 0
    byte, off = index.divmod 8

    while @s.bytesize < byte
      @s << NULL
    end

    x = @s[byte].ord

    @s[byte] = (if bit == 1
      x | (1 << off)
    else
      x & (0xFF ^ (1 << off))
    end).chr
  end
end

Kind regards

robert

···

On Tue, Oct 12, 2010 at 6:35 AM, Jason Larsen <jason.larsen@gmail.com> wrote:

I'm trying to write a pure Ruby implementation of SHA1. In order to do
so, I need to be able to directly manipulate bits. I know about the
bitwise operators in Ruby, but I'm having issues with the way that
Ruby wraps things up in classes. Basically I've run into two problems.

PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
certain number of 0s. When appending numbers (e.g. 0b1) to the end of
by string, Ruby wraps each number in its Fixnum class making it 1 byte
long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
that shouldn't be there. The fix I've found is I to tack 0x80 on the
end, which concats 0b1000_0000. The downside is that I have to keep
track of my zeros in batches of bytes (which at this point my
implementation handles alright), but I'd like to know how to do it bit
by bit if possible.

I've seen the String.unpack and Array.pack used for binary data.
However, I can only get the bits out in one glob str.unpack("B*"), or
str.unpack("B8"*string.length), which still just gives me a bunch of
stringified 8bit binary values. Seems like there should be a better
way.

PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
bit long integer representing the length of the message being hashed.
It seem like Fixnum is 8 bytes long on my machine, so the question is
how do I add a 64bit binary representation of this number to the end
of a string?

Any help would be greatly appreciated. Thanks in advance.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Jason Larsen wrote in post #949355:

PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
certain number of 0s. When appending numbers (e.g. 0b1) to the end of
by string, Ruby wraps each number in its Fixnum class making it 1 byte
long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
that shouldn't be there. The fix I've found is I to tack 0x80 on the
end, which concats 0b1000_0000. The downside is that I have to keep
track of my zeros in batches of bytes (which at this point my
implementation handles alright), but I'd like to know how to do it bit
by bit if possible.

A String in ruby is a collection of bytes, not bits.

I think you'll find that as long as your input to your sha1 algorithm is
also a String, and thus a multiple of 8 bits, everything will work fine.

The way the sha1 algorithm is defined it can be implemented for an input
which is not a multiple of 8 bits, but as you've found, it's not easy to
represent that anyway :slight_smile: Therefore, if you want your implementation to
work with strange numbers of bits, you'll first need to choose an input
representation.

If I were you, I'd be happy to limit my implementation to inputs which
are whole bytes. (After all, even a file on the filesystem is a whole
number of bytes)

PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
bit long integer representing the length of the message being hashed.
It seem like Fixnum is 8 bytes long on my machine, so the question is
how do I add a 64bit binary representation of this number to the end
of a string?

Don't worry about Fixnum versus Bignum, that's internal to the
implementation and mostly hidden. Even on a 32-bit machine you can make
Integers which are 64-bits and above.

Unfortunately, Array#pack("Q") works differently on little-endian and
big-endian machines:

irb(main):001:0> [255].pack("Q")
=> "\377\000\000\000\000\000\000\000"

So for maximum portability I'd do two instances of "N", i.e.

n = 0x0123456789abcdef

=> 81985529216486895

[n >> 32, n].pack("NN")

=> "\001#Eg\211\253\315\357"

which you can check is correct like this:

[n >> 32, n].pack("NN").unpack("H*")

=> ["0123456789abcdef"]

Regards,

Brian.

···

--
Posted via http://www.ruby-forum.com/\.

Sorry, looking back on that I even confused myself. I am aware of the
SHA1 implementation in the standard library, I'm just trying to write
one myself using pure Ruby.

This is what I have so far for padding:

def SHA1 message
    k = 448-((message.bitsize % block_bitsize) + 1)
    m = message.unpack("B*")[0].to_i(2) << (k+65)
    padding = 1
    padding <<= k+64
    padding |= message.bitsize # then add last 64 bit block (message
length) which is the len
    m |= padding # add the padding to the message
    #... split into 512-bit blocks
    #... do SHA1 operations
end

Where bitsize is defined as:
    class String; def bitsize; bytesize * 8; end; end

The method so far seems to be giving me the correct value for m.
However, my problem is with the leading zeros. For example, if the
message is "abc, message.unpack("B*")[0] returns
"011000010110001001100011". When I call to_i(2) on that number, the
leading 0 bit will disappear - which makes sense because to 01 an 1
are both the same number.

However, when I begin to perform sha1 operations on those blocks I
will need that leading 0 back; in the example where message = "abc",
m.to_s(2) or will only give me a 511 digit long output, when I need a
512 bit block. So I need to figure out how to keep those leading zeros
in place so I have the correct block size. Hopefully this makes more
sense.

···

On Oct 11, 11:56 pm, Josh Cheek <josh.ch...@gmail.com> wrote:

[Note: parts of this message were removed to make it a legal post.]

On Mon, Oct 11, 2010 at 11:35 PM,JasonLarsen<jason.lar...@gmail.com>wrote:

> I'm trying to write a pure Ruby implementation of SHA1. In order to do
> so, I need to be able to directly manipulate bits. I know about the
> bitwise operators in Ruby, but I'm having issues with the way that
> Ruby wraps things up in classes. Basically I've run into two problems.

> PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
> certain number of 0s. When appending numbers (e.g. 0b1) to the end of
> by string, Ruby wraps each number in its Fixnum class making it 1 byte
> long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
> that shouldn't be there. The fix I've found is I to tack 0x80 on the
> end, which concats 0b1000_0000. The downside is that I have to keep
> track of my zeros in batches of bytes (which at this point my
> implementation handles alright), but I'd like to know how to do it bit
> by bit if possible.

> I've seen the String.unpack and Array.pack used for binary data.
> However, I can only get the bits out in one glob str.unpack("B*"), or
> str.unpack("B8"*string.length), which still just gives me a bunch of
> stringified 8bit binary values. Seems like there should be a better
> way.

> PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
> bit long integer representing the length of the message being hashed.
> It seem like Fixnum is 8 bytes long on my machine, so the question is
> how do I add a 64bit binary representation of this number to the end
> of a string?

> Any help would be greatly appreciated. Thanks in advance.

Hi, if you just need a portable SHA1 implementation, there is already one in
the standard library.http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html

As for your problems, I'm finding myself really confused by your
terminology. For example, when you say "adding a one-bit and k zeros" does
that mean num + 2**k, Or is num <<= 1 ; num += 1 ; num <<= k or something
else even?

It might be more helpful if you show "this is what I have, this is what I
want, this is how I tried to get there"

> [Note: parts of this message were removed to make it a legal post.]

> > I'm trying to write a pure Ruby implementation of SHA1. In order to do
> > so, I need to be able to directly manipulate bits. I know about the
> > bitwise operators in Ruby, but I'm having issues with the way that
> > Ruby wraps things up in classes. Basically I've run into two problems.

> > PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
> > certain number of 0s. When appending numbers (e.g. 0b1) to the end of
> > by string, Ruby wraps each number in its Fixnum class making it 1 byte
> > long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
> > that shouldn't be there. The fix I've found is I to tack 0x80 on the
> > end, which concats 0b1000_0000. The downside is that I have to keep
> > track of my zeros in batches of bytes (which at this point my
> > implementation handles alright), but I'd like to know how to do it bit
> > by bit if possible.

> > I've seen the String.unpack and Array.pack used for binary data.
> > However, I can only get the bits out in one glob str.unpack("B*"), or
> > str.unpack("B8"*string.length), which still just gives me a bunch of
> > stringified 8bit binary values. Seems like there should be a better
> > way.

> > PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
> > bit long integer representing the length of the message being hashed.
> > It seem like Fixnum is 8 bytes long on my machine, so the question is
> > how do I add a 64bit binary representation of this number to the end
> > of a string?

> > Any help would be greatly appreciated. Thanks in advance.

> Hi, if you just need a portable SHA1 implementation, there is already one in
> the standard library.http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html

> As for your problems, I'm finding myself really confused by your
> terminology. For example, when you say "adding a one-bit and k zeros" does
> that mean num + 2**k, Or is num <<= 1 ; num += 1 ; num <<= k or something
> else even?

> It might be more helpful if you show "this is what I have, this is what I
> want, this is how I tried to get there"

Sorry, looking back on that I even confused myself. I am aware of the
SHA1 implementation in the standard library, I'm just trying to write
one myself using pure Ruby.

This is what I have so far for padding:

def SHA1 message
k = 448-((message.bitsize % block_bitsize) + 1)

      # EDIT
      # I think I need this here actually for certain cases
      k+=512 if k < 0

···

On Oct 12, 1:38 pm, Jarsen <jason.lar...@gmail.com> wrote:

On Oct 11, 11:56 pm, Josh Cheek <josh.ch...@gmail.com> wrote:
> On Mon, Oct 11, 2010 at 11:35 PM,JasonLarsen<jason.lar...@gmail.com>wrote:

m = message\.unpack\(&quot;B\*&quot;\)\[0\]\.to\_i\(2\) &lt;&lt; \(k\+65\)
padding = 1
padding &lt;&lt;= k\+64
padding |= message\.bitsize \# then add last 64 bit block \(message

length) which is the len
m |= padding # add the padding to the message
#... split into 512-bit blocks
#... do SHA1 operations
end

Where bitsize is defined as:
class String; def bitsize; bytesize * 8; end; end

The method so far seems to be giving me the correct value for m.
However, my problem is with the leading zeros. For example, if the
message is "abc, message.unpack("B*")[0] returns
"011000010110001001100011". When I call to_i(2) on that number, the
leading 0 bit will disappear - which makes sense because to 01 an 1
are both the same number.

However, when I begin to perform sha1 operations on those blocks I
will need that leading 0 back; in the example where message = "abc",
m.to_s(2) or will only give me a 511 digit long output, when I need a
512 bit block. So I need to figure out how to keep those leading zeros
in place so I have the correct block size. Hopefully this makes more
sense.