Integer to byte string - Speed improvements

I'm writing code that needs to store an integer as a sequence of bytes.
Array#pack works for 1-, 2-, and 4-byte integers, but I want to be able
to support Bignums as well.

I have working code below. I'm happy with it. However, I thought I'd
pose the question to the list: how much faster can you make the below
code run?

# The natural log of 2 (for base-2 logarithms)
Math::LOG2 = Math.log( 2 )

class Integer
  # Converts the integer into a string, where the value of each
character
  # is a byte in the bit representation of the integer.
  # Low-order bytes appear first in the string.

···

#
  # If the number of characters is not specified, the fewest number of
  # characters that can represent the integer is used.
  #
  #
  def to_bytestring( num_chars=nil )
    unless num_chars
      bits_needed = Math.log( self ) / Math::LOG2
      num_chars = ( bits_needed / 8.0 ).ceil
    end
    if pack_code = { 1=>'C', 2=>'S', 4=>'L' }[ num_chars ]
      [ self ].pack( pack_code )
    else
      (0...(num_chars)).map{ |i|
        (( self >> i*8 ) & 0xFF ).chr
      }.join
    end
  end

  # Creates an integer from a byte string.
  # See Integer#to_bytestring for more information.
  def self.from_bytestring( str )
    val = 0
    index = 0
    str.each_byte{ |b|
      val += b << 8 * index
      index += 1
    }
    val
  end

end

[
  0x48,
  0x6948,
  0x646c726f57206f6c6c6548,
  0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i| puts i, i.to_bytestring, '' }

#=> 72
#=> H

#=> 26952
#=> Hi

#=> 121404708493354166158910792
#=> Hello World
#=>
3876042760240808297111079005855559646422331404814505579794973210389374306838850
#=> Bignum is a sweet aspect of Ruby!

I forgot to add - I'd also be interested in seeing a golf tournament on
the two methods. Of course whitespace and variable names could be
changed, but I'm interested in more terse, alternative methods of
calculating the same information.

s = Marshal.dump(0xCC8080808080808080808080808080808080808080808080808080808080AA)
bytes = s[5..-2]
bytes.each_byte { |b| print "%02x " % b }
aa 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 cc
(low byte first, as you mentioned)
or if you prefer:
rev_bytes = bytes.reverse
rev_bytes.each_byte { |b| print "%02x " % b }

As far as it being faster, dunno, probably. Here's the benchmark:
irb(main):041:0> Process.times { s=Marshal.dump(0xCC8080808080808080808080808080808080808080808080808080808080AA) }

=> #<struct Struct::Tms utime=1.02, stime=0.54, cutime=0.0, cstime=0.0>

Regards,
   JJ

···

On Thu, 31 Aug 2006 12:20:32 -0400, Phrogz <gavin@refinery.com> wrote:

I'm writing code that needs to store an integer as a sequence of bytes.
Array#pack works for 1-, 2-, and 4-byte integers, but I want to be able
to support Bignums as well.

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

So, a couple bugs:

Phrogz wrote:

  # Low-order bytes appear first in the string.

Not necessarily

    if pack_code = { 1=>'C', 2=>'S', 4=>'L' }[ num_chars ]

in Array.pack, 'S', and 'L', use system-endian order.
To use little endian order use 'v' and 'V', respectively.

Personally, I think it's cleaner to just leave out the clause
altogether.
[...]

    unless num_chars
      bits_needed = Math.log( self ) / Math::LOG2
      num_chars = ( bits_needed / 8.0 ).ceil
    end

You're ignoring a bunch of edge cases here.
0 and 1, for example.

For 0, Math.log(0) == -Infinity, which will throw an error when 'ceil'
is called.
For 1, Math.log(1) == 0, so num_chars == 0, which isn't quite right
either.

You'll run into the same issue at other powers of 256, as well.

What you should actually do, instead of taking the ceiling, is round
down and add one.

You're also ignoring negative integers. Math.log doesn't like negative
numbers either, so you'll need to take the absolute value. This
doesn't solve the problem when you're reading back in, of
differentiating between positive and negative integers that have the
same byte code.

Here's my code. I ignore the positive/negative reading back in
problem.

#!/usr/bin/env ruby -w
# The natural log of 256 (for base-256 logarithms)
Math::LOG256 = Math.log( 256 )

class Integer
  # Converts the integer into a string, where the value of each
  # character is a byte in the bit representation of the integer,
  # low order bytes first.

···

#
  # If the number of characters is not specified, the fewest number of
  # characters that can represent the integer is used.
  def to_bytestring( num_octets=nil )

    # find out how many octets are required to encode it
    num_octets ||=
      (self == 0) ? 1 : ( Math.log(self.abs) / Math::LOG256 ).to_i + 1

    # encode the low num_octets bytes of the integer.
    shifted = (self << 8)
    Array.new(num_octets).map do
      ((shifted >>= 8) & 0xFF).chr
    end.join
  end

  # Creates an integer from a byte string.
  # See Integer#to_bytestring for more information.
  # NOTE: reads in negative integers as positive
  def self.from_bytestring( str )
    str.reverse.split(//).inject(0) do |val, char|
      (val << 8) | char[0]
    end
  end
end

[
  -1,
  0x0,
  0x1,
  255,
  256,
  -255,
  -256,
  0x48,
  26952,
  0x6948,
  0x646c726f57206f6c6c6548,
  0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i|
  begin
    p [i, i.to_bytestring, Integer.from_bytestring(i.to_bytestring)]
  rescue StandardError => err
    p err
  end
}

I'm writing code that needs to store an integer as a sequence of bytes.
Array#pack works for 1-, 2-, and 4-byte integers, but I want to be able
to support Bignums as well.

I have working code below. I'm happy with it. However, I thought I'd
pose the question to the list: how much faster can you make the below
code run?

# The natural log of 2 (for base-2 logarithms)
Math::LOG2 = Math.log( 2 )

No need for these, I'm on a campaign to let integers be integers.

class Integer
  # Converts the integer into a string, where the value of each
character
  # is a byte in the bit representation of the integer.
  # Low-order bytes appear first in the string.
  #
  # If the number of characters is not specified, the fewest number of
  # characters that can represent the integer is used.
  #
  def to_bytestring( num_chars=nil )

replace these four lines:

    unless num_chars
      bits_needed = Math.log( self ) / Math::LOG2

By the way, this won't work for negative integers. Dealing with
negative numbers provides some interesting questions which I'll ignore
for now.

      num_chars = ( bits_needed / 8.0 ).ceil
    end

with

        raise RangeError.new("Can't convert negative integer to bytestring")
        num_chars ||= self.size

    if pack_code = { 1=>'C', 2=>'S', 4=>'L' }[ num_chars ]
      [ self ].pack( pack_code )
    else

I might try getting rid of the multiplication

      (0...(num_chars)).map{ |i|
        (( self >> i*8 ) & 0xFF ).chr
      }.join
    end

With something like:

        result = []
        i = self
        until i == 0
              result << (i & 0xFF).chr
               i >>= 8
       end

So following those ideas:
rick@frodo:/public/rubyscripts$ cat tobytes.rb
class Integer
        def to_bytestring(num_chars=nil)
                raise RangeError.new("Can't convert negative number to
bytestring") if self < 0
                num_chars ||= self.size
                mask = 0xFF << (8 * num_chars - 1)
                while (self & mask) == 0
                        mask >>= 8
                        num_chars -= 1
                end

                result = []
                bits_left = self
                until bits_left == 0
                        result << (bits_left & 0xFF).chr
                        bits_left >>= 8
                end
                result.join
        end

        # Creates an integer from a byte string.
        # See Integer#to_bytestring for more information.
        def self.from_bytestring( str )
                val = 0
                index = 0
                str.each_byte{ |b|
                        val += b << 8 * index
                        index += 1
                }
                val
        end

end

[
        0x48,
        0x6948,
        0x646c726f57206f6c6c6548,
        0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i| puts i, i.to_bytestring, '' }

rick@frodo:/public/rubyscripts$ ruby tobytes.rb
72
H

26952
Hi

121404708493354166158910792
Hello World

3876042760240808297111079005855559646422331404814505579794973210389374306838850
Bignum is a sweet aspect of Ruby!

···

On 8/31/06, Phrogz <gavin@refinery.com> wrote:

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Phrogz wrote

I forgot to add - I'd also be interested in seeing a golf tournament on
the two methods. Of course whitespace and variable names could be
changed, but I'm interested in more terse, alternative methods of
calculating the same information.

v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack('H*').reverse

terse enough? :slight_smile:

cheers

Simon

Noah Easterly wrote:

To use little endian order use 'v' and 'V', respectively.

Ah, thanks.

You're ignoring a bunch of edge cases here.
0 and 1, for example.

[...]

What you should actually do, instead of taking the ceiling, is round
down and add one.

Thanks for pointing that out.

You're also ignoring negative integers.

On purpose, actually. I happen to know in my domain that it would be an
error to hit that case. I prefer to keep my code lean and leave out
various validity checks that also kill the program (albeit in a
slightly cleaner way).

# The natural log of 256 (for base-256 logarithms)
Math::LOG256 = Math.log( 256 )

Ah, tricky! I'm unused to thinking of crazy-high bases. That's a nice
shortcut, thanks.

  def self.from_bytestring( str )
    str.reverse.split(//).inject(0) do |val, char|
      (val << 8) | char[0]
    end

No solution is complete without an inject. :wink:
I wonder (haven't tested yet): is splitting on "" any faster than
splitting on //?
Introducing a string.reverse also feels to me like it would slow things
down. Will need to run some benchmarks.

Thanks for the ideas and code!

"John Johnson" <johnatl@mac.com> writes:

As far as it being faster, dunno, probably. Here's the benchmark:
irb(main):041:0> Process.times {
s=Marshal.dump(0xCC8080808080808080808080808080808080808080808080808080808080AA)
}

=> #<struct Struct::Tms utime=1.02, stime=0.54, cutime=0.0, cstime=0.0>

I at first thought to have missed something, but Process.times doesn't
do what you think it does:

irb(main):005:0> Process.times { p "foo" }
=> #<struct Struct::Tms utime=0.18, stime=0.05, cutime=0.0, cstime=0.0>

The block doesn't even get called.

     Returns a +Tms+ structure (see +Struct::Tms+ on page 388) that
     contains user and system CPU times for this process.
                                        ^^^^^^^^^^^^^^^^

···

  JJ

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Simon Kröger wrote:

v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack('H*').reverse

terse enough? :slight_smile:

Wow, that's hot. Thanks :slight_smile:

Got any golf for the reverse? (Creating the number from the string.)

Phrogz wrote:

Simon Kröger wrote:

v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack('H*').reverse

terse enough? :slight_smile:

Wow, that's hot. Thanks :slight_smile:

Got any golf for the reverse? (Creating the number from the string.)

Well, just do the reverse:

s = 'Bignum is a sweet aspect of Ruby!'
puts s.reverse.unpack('H*').first.to_i(16)

cheers

Simon