RCR: unpack/pack Bignum

I’m sure this has been discussed before, and maybe there are good reasons
why it’s not in there already (or I’ve really missed something), but I’d
like to see a simple and fast way to unpack/pack bignums out to/into
"raw" string data.

OK, you can currently do something along the lines of

def pack_int(integer)
  ["%x" % integer].pack("H*")
end

def unpack_int(str)
  str.unpack("H*").first.hex
end

but it’s conceptually messy IMHO and slower than it could be. Also, since we
can now pack quads, i.e. numbers whose binary representation fits in
8 bytes, I see no reason why we shouldn’t go all the way.

Comments/Ideas/Critique?

Regards,

Robert Feldt

To be correct, it should actually be

def pack_int(integer)
  hex = "%x" % integer
  if (hex.length & 1) == 1
    ["0" + hex].pack("H*")
  else
    [hex].pack("H*")
  end
end
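
A minimal sketch of why the padding matters, assuming nonnegative integers only. The name pack_int_naive is introduced here purely for illustration; it is the first version from above. With an odd number of hex digits, "H*" fills the missing low nibble with zero, so the value comes back shifted by four bits.

```ruby
# First version from the thread: no padding of odd-length hex strings.
def pack_int_naive(integer)
  ["%x" % integer].pack("H*")
end

# Corrected version: prepend "0" when the hex string has odd length.
def pack_int(integer)
  hex = "%x" % integer
  hex = "0" + hex if hex.length.odd?
  [hex].pack("H*")
end

def unpack_int(str)
  str.unpack("H*").first.hex
end

puts unpack_int(pack_int_naive(0x123)).to_s(16)  # prints "1230" (wrong)
puts unpack_int(pack_int(0x123)).to_s(16)        # prints "123"

big = 2**200 + 12345
puts unpack_int(pack_int(big)) == big            # prints "true"
```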

/RF

···

On Tue, 27 May 2003, Robert Feldt wrote:

OK, you can currently do something along the lines of

def pack_int(integer)
  ["%x" % integer].pack("H*")
end

No one seems to be interested in this issue, so I’ll have to reply to
myself… ;)

No one has pointed out that a clean solution for this is currently
available, so I went ahead and implemented it. Below are the unit test I
used and two patches, one for bignum.c and one for pack.c. The patches are
taken against the latest nightly snapshot:

$ ruby -v
ruby 1.8.0 (2003-05-27) [i386-mingw32]

It should work on both big- and little-endian architectures, but I’ve only
tried it on little-endian. It would be great if someone could try it on a
big-endian machine.

This patch adds a ‘W’ template character to pack and unpack for
packing/unpacking an unsigned integer (Fixnum OR Bignum). The packing
is from MSB to LSB so that

[0xff00].pack("W") == "\377\000"

regardless of the endianness of the machine. Leading zeroes are trimmed
from the string (except for negative numbers; see below). If you pack a
negative number you lose information about the sign, i.e.

[-1].pack("W").unpack("W").first == 1

which is the same as for template ‘I’ but in contrast to template ‘Q’.
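
The behaviour described above can be approximated in plain Ruby without the patch. This is only a sketch: pack_w_sketch and unpack_w_sketch are hypothetical names, not part of the patch, and unlike the C code they do not pad negative numbers with leading zero bytes; the sign is simply dropped.

```ruby
# MSB-first byte string of the magnitude of n, leading zero bytes trimmed.
def pack_w_sketch(n)
  n = n.abs                    # the sign is lost, as in the proposal
  bytes = []
  loop do
    bytes.unshift(n & 0xff)    # least significant byte ends up last
    n >>= 8
    break if n.zero?
  end
  bytes.pack("C*")
end

# Interpret the whole string as one MSB-first unsigned integer.
def unpack_w_sketch(str)
  str.each_byte.inject(0) { |acc, b| (acc << 8) | b }
end

p pack_w_sketch(0xff00).bytes              # prints [255, 0]
p unpack_w_sketch(pack_w_sketch(-1))       # prints 1
```

Because the bytes are produced by shifting rather than by copying memory, the result does not depend on the endianness of the machine.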

I chose W as in “raW binary representation of number”, but it’s hard to
find a good template char since most are taken.

I didn’t implement ‘w’ for dumping negative numbers since I don’t see
the need. However, the implementation hints at one possible way to do
‘w’ (by only allowing negative numbers to have leading zeroes).

If someone finds this worthy/useful, it’s in the public domain, so use it
any way you want. I tried to stay close to the style of the Ruby source, but
I’m sure the code can be even cleaner/nicer/faster.

Regards,

Robert Feldt

Ps. This post is probably too long; I’m sorry… Maybe ruby-core list is
better for these things? Or just to matz? I’m not fully up-to-date with
community procedures.

----------utest_bignum_pack_unpack.rb------------------------------------
require 'test/unit'

class TestBignumPackAndUnpack < Test::Unit::TestCase
  def test_01_pack_W_one_byte
    (0...255).each do |i|
      assert_equal(i.chr, [i].pack("W"))
    end
  end

  def num_with_bytes(bytes)
    low_limit = 2**(8 * (bytes-1))
    low_limit + rand(-low_limit + 2**(8*bytes))
  end

  def assert_pack_W_sampled(numBytes, numSamples = 100)
    numSamples.times do
      num = num_with_bytes(numBytes)
      packed = [num].pack("W")
      assert_equal(numBytes, packed.length, "num = #{num}")
      lsb_first = packed.reverse
      numBytes.times do |i|
        assert_equal(lsb_first[i], num & 0xff)
        num >>= 8
      end
    end
  end

  def test_02_pack_W_sampled_positive_multi_bytes
    (2...10).each do |num_bytes|
      assert_pack_W_sampled(num_bytes, 25)
    end
  end

  def test_03_pack_W_large
    p1024 = [2**1024].pack("W")
    assert_equal(1.chr + (0.chr*(1024/8)), p1024)
    p1024_ones = [2**1024-1].pack("W")
    assert_equal(0xff.chr * (1024/8), p1024_ones)
    p2048 = [2**2048].pack("W")
    assert_equal(1.chr + (0.chr*(2048/8)), p2048)
    p2048_ones = [2**2048-1].pack("W")
    assert_equal(0xff.chr * (2048/8), p2048_ones)
  end

  # This might not be what one wants but I think main use is in
  # converting positive nums so lets leave it as is...
  # To do 'w' we could make sure that negative numbers always
  # start with leading 0. This way we could later unpack them without
  # losing the sign.
  def test_04_pack_W_negative_numbers
    assert_equal("\000\000\000\001", [-1].pack("W"))
    assert_equal("\000\000\000\002", [-2].pack("W"))
    assert_equal("\000\000\000\377", [-255].pack("W"))
    assert_equal("\000\000\377\377", [-2**16+1].pack("W"))
    assert_equal("\000\377\377\377", [-2**24+1].pack("W"))
    assert_equal("\377\377\377\377", [-2**32+1].pack("W"))
    assert_equal("\000\000\000\001\000\000\000\000", [-2**32].pack("W"))
  end

  def test_05_unpack_W_one_byte
    (0...255).each do |i|
      assert_equal(i, i.chr.unpack("W").first)
    end
  end

  def str_with_bytes(bytes)
    s = ""
    bytes.times {s << rand(256).chr}
    s
  end

  def test_06_unpack_W_sampled_positive_multi_bytes
    (2...10).each do |num_bytes|
      25.times do
        s = str_with_bytes(num_bytes)
        num = s.unpack("W").first
        lsb_first = s.reverse
        num_bytes.times do |i|
          assert_equal(lsb_first[i], num & 0xff,
                       "s = #{s.unpack('H*')}, num = #{num}")
          num >>= 8
        end
      end
    end
  end

  def test_07_unpack_W_large
    u1024 = (1.chr + (0.chr*(1024/8))).unpack("W").first
    assert_equal(2**1024, u1024)
    u1024_ones = (0xff.chr * (1024/8)).unpack("W").first
    assert_equal(2**1024-1, u1024_ones)
    u2048 = (1.chr + (0.chr*(2048/8))).unpack("W").first
    assert_equal(2**2048, u2048)
    u2048_ones = (0xff.chr * (2048/8)).unpack("W").first
    assert_equal(2**2048-1, u2048_ones)
  end

  def test_08_unpack_W_packed_negative_numbers
    assert_equal(1, [-1].pack("W").unpack("W").first)
    assert_equal(2, [-2].pack("W").unpack("W").first)
    assert_equal(255, [-255].pack("W").unpack("W").first)
    assert_equal(2**16-1, [-2**16+1].pack("W").unpack("W").first)
    assert_equal(2**24-1, [-2**24+1].pack("W").unpack("W").first)
    assert_equal(2**32-1, [-2**32+1].pack("W").unpack("W").first)
    assert_equal(2**32, [-2**32].pack("W").unpack("W").first)
  end

  def test_09_cycle_pack_then_unpack
    1000.times do
      num = rand(2**200)
      assert_equal(num, [num].pack("W").unpack("W").first)
    end
  end
end

---------upatch_bignum_c---------------------------------------------
--- bignum.c 2003-05-28 23:50:04.000000000 +0200
+++ bignum.c.old 2003-05-28 11:09:30.000000000 +0200
@@ -306,110 +306,6 @@

 #endif

-/* We should probably use endian in pack.c instead but I had problems
- * when linking so...
- */
-static int
-big_endian()
-{
-    static int init = 0;
-    static int big_endian_value;
-    char *p;
-
-    if (init) return big_endian_value;
-    init = 1;
-    p = (char*)&init;
-    return big_endian_value = (p[0]==1)?0:1;
-}
-
-/* Pack a nonnegative bignum as raw binary data/bitstring starting from
- * MSB to LSB.
- * Returned data will be multiple of SIZEOF_BDIGITS so there can be up to
- * SIZEOF_BDIGITS-1 leading zeroes.
- * Assumes that val is really a bignum ie. fixnums
- * needs to be converted prior to calling this.
- */
-void
-rb_nonneg_bignum_pack(buf, val)
-    char *buf;
-    VALUE val;
-{
-    long len, i, j, chars;
-    char *next_digit;
-
-    len = RBIGNUM(val)->len;
-    next_digit = RBIGNUM(val)->digits + (len * SIZEOF_BDIGITS);
-    if (big_endian()) {
-        for(i=0; i<len; i++) {
-            next_digit -= SIZEOF_BDIGITS;
-            for(j=0; j<SIZEOF_BDIGITS; j++) {
-                *buf++ = *(next_digit+j);
-            }
-        }
-    } else {
-        for(i=0; i<len; i++) {
-            next_digit -= SIZEOF_BDIGITS;
-            for(j=SIZEOF_BDIGITS-1; j>=0; j--) {
-                *buf++ = *(next_digit+j);
-            }
-        }
-    }
-}
-
-VALUE
-rb_bignum_unpack(buf, sign, len)
-    const char *buf;
-    int sign;
-    long len;
-{
-    VALUE big;
-    long num_digits, i, j;
-    char *next_digit;
-    char *extra_digit;
-    long num_full_digits = len / SIZEOF_BDIGITS;
-    int extra_bytes = len % SIZEOF_BDIGITS;
-
-    num_digits = num_full_digits + (extra_bytes>0 ? 1 : 0);
-    big = bignew(num_digits, 1);
-    extra_digit = next_digit =
-        (char*)RBIGNUM(big)->digits + num_full_digits * SIZEOF_BDIGITS;
-
-    if (big_endian()) {
-        if (extra_bytes > 0) {
-            for(i = 0; i < SIZEOF_BDIGITS - extra_bytes; i++) {
-                *extra_digit++ = 0;
-            }
-            for(i = 0; i < extra_bytes; i++) {
-                *extra_digit++ = *buf++;
-            }
-        }
-        for(i = 0; i < num_full_digits; i++) {
-            next_digit -= SIZEOF_BDIGITS;
-            for(j = 0; j < SIZEOF_BDIGITS; j++) {
-                *next_digit++ = *buf++;
-            }
-        }
-    } else {
-        if (extra_bytes > 0) {
-            for(i = extra_bytes - 1; i >= 0 ; i--) {
-                *(extra_digit+i) = *buf++;
-            }
-            extra_digit += extra_bytes;
-            for(i = 0; i < SIZEOF_BDIGITS - extra_bytes; i++) {
-                *extra_digit++ = 0;
-            }
-        }
-        for(i = 0; i < num_full_digits; i++) {
-            next_digit -= SIZEOF_BDIGITS;
-            for(j = SIZEOF_BDIGITS - 1; j >= 0; j--) {
-                *(next_digit+j) = *buf++;
-            }
-        }
-    }
-    return bignorm(big);
-}
-
 VALUE
 rb_cstr_to_inum(str, base, badcheck)
     const char *str;

--------upatch_pack_c-------------------------------------------------------
--- pack.c 2003-05-28 23:55:32.000000000 +0200
+++ pack.c.old 2003-05-28 11:19:21.000000000 +0200
@@ -376,21 +376,6 @@
 static int uv_to_utf8 _((char*,unsigned long));
 static unsigned long utf8_to_uv _((char*,long*));

-VALUE
-ensure_bignum(val)
-    VALUE val;
-{
-    if (NIL_P(val)) {
-        val = INT2FIX(0);
-    } else {
-        val = rb_to_int(val);
-    }
-    if (FIXNUM_P(val)) {
-        val = rb_int2big(FIX2LONG(val));
-    }
-    return val;
-}
-
 static VALUE
 pack_pack(ary, fmt)
     VALUE ary, fmt;
@@ -683,33 +668,6 @@
             }
             break;

-          case 'W':
-            while (len-- > 0) {
-                VALUE from;
-                long len;
-                long num_bytes_to_skip = 0;
-
-                from = ensure_bignum(NEXTFROM);
-                len = RBIGNUM(from)->len * SIZEOF_BDIGITS;
-                {
-                    char tmp[len];
-
-                    rb_nonneg_bignum_pack(tmp, from);
-                    // Skip leading zeroes if positive bignum. Extend
-                    // this "strategy" for 'w' so that only negative
-                    // bignums (and 0) can have leading zero?
-                    if (RBIGNUM(from)->sign) {
-                        while (num_bytes_to_skip < (len-1) &&
-                               tmp[num_bytes_to_skip] == 0x00) {
-                            num_bytes_to_skip++;
-                        }
-                    }
-                    rb_str_buf_cat(res, ((char*)&tmp) + num_bytes_to_skip,
-                                   len - num_bytes_to_skip);
-                }
-            }
-            break;
-
           case 'n':
             while (len-- > 0) {
                 unsigned short s;
@@ -1456,11 +1414,6 @@
             }
             break;

-          case 'W':
-            rb_ary_push(ary, rb_bignum_unpack(s, 1, send - s));
-            s = send;
-            break;
-
           case 'n':
             PACK_LENGTH_ADJUST(unsigned short,2);
             while (len-- > 0) {

No one seems to be interested in this issue, so I’ll have to reply to
myself… ;)

Actually, I am. I am packing 64-bit words for SNMP, and my solution is
clumsy at best.

No one has pointed out that a clean solution for this is currently
available, so I went ahead and implemented it. Below are the unit test I
used and two patches, one for bignum.c and one for pack.c. The patches are
taken against the latest nightly snapshot:

$ ruby -v
ruby 1.8.0 (2003-05-27) [i386-mingw32]

It should work on both big- and little-endian architectures, but I’ve only
tried it on little-endian. It would be great if someone could try it on a
big-endian machine.

I’ll try it on my Mac (it should be big-endian, shouldn’t it?). It may take
a few days before I get around to it, though.

Guillaume.

···

On Wednesday 28 May 2003 06:20 pm, you wrote:


Hi,

No one seems to be interested in this issue, so I’ll have to reply to
myself… ;)

I’ve thought about the same thing (and a Bignum constructor from
binary data), but haven’t found good template characters for them.

It should work on both big- and little-endian architectures, but I’ve only
tried it on little-endian. It would be great if someone could try it on a
big-endian machine.

This patch adds a ‘W’ template character to pack and unpack for
packing/unpacking an unsigned integer (Fixnum OR Bignum). The packing
is from MSB to LSB so that

[0xff00].pack("W") == "\377\000"

I guess Bignum (un)packing would be used for external data
exchange, so it’d be better to provide each signed/unsigned
and big/little-endian conversion, like ‘l’, ‘L’ and so on.
But ‘w’ is already used.

I chose W as in “raW binary representation of number”, but it’s hard to
find a good template char since most are taken.

Agreed.

A nitpick:

-            len = RBIGNUM(from)->len * SIZEOF_BDIGITS;
-            {
-                char tmp[len];
-                rb_nonneg_bignum_pack(tmp, from);
-                // Skip leading zeroes if positive bignum. Extend
-                // this "strategy" for 'w' so that only negative
-                // bignums (and 0) can have leading zero?

Variable-length arrays and C++-style comments are C99 features (or
compiler extensions).

···

At Thu, 29 May 2003 07:20:04 +0900, Robert Feldt wrote:


Nobu Nakada

It’s variable length, but without carrying a length indication? How would
you unpack such a thing when included in a string with other items; do you
have to add a length indication yourself?

Regards,

Brian.

···

On Thu, May 29, 2003 at 07:20:04AM +0900, Robert Feldt wrote:

This patch adds a ‘W’ template character to pack and unpack for
packing/unpacking an unsigned integer (Fixnum OR Bignum). The packing
is from MSB to LSB so that

[0xff00].pack("W") == "\377\000"

Hi,

Hi and thanks for your comments,

I guess Bignum (un)packing would be used for external data
exchange, so it’d be better to provide each signed/unsigned
and big/little-endian conversion, like ‘l’, ‘L’ and so on.
But ‘w’ is already used.

Hm, I’ve envisioned Bignum (un)packing for cryptographic applications
where nonnegative num ↔ MSB-first string is all there is, but maybe
you’re right. But there are very many combinations (signed/unsigned,
big/little endian for full string and for groups of 2/4 bytes etc). Which
ones should be supported?

What is ‘w’ used for? I had missed that one.

A nitpick:

-            len = RBIGNUM(from)->len * SIZEOF_BDIGITS;
-            {
-                char tmp[len];
-                rb_nonneg_bignum_pack(tmp, from);
-                // Skip leading zeroes if positive bignum. Extend
-                // this "strategy" for 'w' so that only negative
-                // bignums (and 0) can have leading zero?

Variable-length arrays and C++-style comments are C99 features (or
compiler extensions).

Yeah, that’s right, sorry. Easy to fix if there is interest in this patch. I
also made a late-night mistake of generating the patches the wrong way
round… :)

Regards,

Robert

···

On Thu, 29 May 2003 nobu.nokada@softhome.net wrote:

Great. Yes, the PowerPC is supposed to be big-endian, even though it
supports byte-reversed load and store and little-endian processing,
according to
http://developer.apple.com/techpubs/hardware/DeviceManagers/pci_srvcs/pci_cards_drivers/PCI_BOOK.24e.html

/Robert

···

On Thu, 29 May 2003, Guillaume Marcais wrote:

It should work on both big- and little-endian architectures, but I’ve only
tried it on little-endian. It would be great if someone could try it on a
big-endian machine.

I’ll try it on my Mac (it should be big-endian, shouldn’t it?). It may take
a few days before I get around to it, though.

In general yes, but it’s not currently implemented; instead unpack("W")
eats all chars. IMHO, pack/unpack includes two major types of templates: one
group has fixed length and can be used in sequence with each other, while
the other group eats as many chars as it can while unpacking and is thus
typically used on its own. “m” is an example of the latter, and I
propose we have one for bignums as well. Although, we could do something
like for “A” and have the count be the width.

Maybe I’m mixing different things here; maybe it’s better to have this as a
method on Numeric? Numeric#to_raw_string and Numeric.from_raw_string(str)?
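
Until something like that exists, the length question can be handled by hand. The following is only a sketch under stated assumptions: int_to_raw, raw_to_int, pack_framed and unpack_framed are hypothetical helper names, the 4-byte big-endian length prefix ("N") is an arbitrary choice, and only nonnegative integers are handled.

```ruby
# Nonnegative integer -> MSB-first byte string (via the hex trick).
def int_to_raw(n)
  raise ArgumentError, "negative numbers not supported" if n < 0
  hex = n.to_s(16)
  hex = "0" + hex if hex.length.odd?
  [hex].pack("H*")
end

# MSB-first byte string -> integer.
def raw_to_int(str)
  str.unpack("H*").first.hex
end

# Prefix the raw bytes with their length so other fields can follow.
def pack_framed(n)
  raw = int_to_raw(n)
  [raw.bytesize].pack("N") + raw
end

# Returns [value, offset of the first byte after this field].
def unpack_framed(data, offset = 0)
  len = data[offset, 4].unpack("N").first
  [raw_to_int(data[offset + 4, len]), offset + 4 + len]
end

blob = pack_framed(2**100 + 5) + [42].pack("n")
value, rest = unpack_framed(blob)
p value == 2**100 + 5                # prints true
p blob[rest, 2].unpack("n").first    # prints 42
```

With the prefix in place, the integer field behaves like a fixed-length template from the reader's point of view and no longer has to be last in the string.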

Regards,

Robert

···

On Thu, 29 May 2003, Brian Candler wrote:

On Thu, May 29, 2003 at 07:20:04AM +0900, Robert Feldt wrote:

This patch adds a ‘W’ template character to pack and unpack for
packing/unpacking an unsigned integer (Fixnum OR Bignum). The packing
is from MSB to LSB so that

[0xff00].pack("W") == "\377\000"

It’s variable length, but without carrying a length indication? How would
you unpack such a thing when included in a string with other items; do you
have to add a length indication yourself?

Hi,

I guess Bignum (un)packing would be used for external data
exchange, so it’d be better to provide each signed/unsigned
and big/little-endian conversion, like ‘l’, ‘L’ and so on.
But ‘w’ is already used.

Hm, I’ve envisioned Bignum (un)packing for cryptographic applications
where nonnegative num ↔ MSB-first string is all there is, but maybe
you’re right. But there are very many combinations (signed/unsigned,
big/little endian for full string and for groups of 2/4 bytes etc). Which
ones should be supported?

Sorry, there’s no combination of signedness and endianness;
‘N’, ‘n’, ‘V’, ‘v’ are always unsigned.

Rather, what about a size specifier that extends the generic integer
conversions?

[10].pack("i(8)") => "\0\0\0\0\0\0\0\012" # big endian
                  => "\012\0\0\0\0\0\0\0" # little endian
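
The "i(8)" syntax is only a proposal here, but its effect can be sketched in plain Ruby. pack_fixed is a hypothetical helper name; raising on overflow instead of truncating is an assumption made for this sketch, not something the proposal specifies.

```ruby
# Pack a nonnegative integer into exactly `width` bytes.
def pack_fixed(n, width, big_endian: true)
  raise ArgumentError, "negative numbers not supported" if n < 0
  raise RangeError, "#{n} does not fit in #{width} bytes" if n >= (1 << (8 * width))
  bytes = Array.new(width) { |i| (n >> (8 * i)) & 0xff }  # LSB first
  bytes.reverse! if big_endian
  bytes.pack("C*")
end

p pack_fixed(10, 8).bytes                     # prints [0, 0, 0, 0, 0, 0, 0, 10]
p pack_fixed(10, 8, big_endian: false).bytes  # prints [10, 0, 0, 0, 0, 0, 0, 0]
```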

What is ‘w’ used for? I had missed that one.

BER (Basic Encoding Rules) compression.

···

At Thu, 29 May 2003 14:52:59 +0900, Robert Feldt wrote:

At Thu, 29 May 2003 18:01:41 +0900, Robert Feldt wrote:

Maybe I’m mixing different things here; maybe it’s better to have this as a
method on Numeric? Numeric#to_raw_string and Numeric.from_raw_string(str)?

It would be nice, but perhaps it needs a better name.


Nobu Nakada

Sorry, there’s no combination of signedness and endianness;
‘N’, ‘n’, ‘V’, ‘v’ are always unsigned.

Rather, what about a size specifier that extends the generic integer
conversions?

[10].pack("i(8)") => "\0\0\0\0\0\0\0\012" # big endian
                  => "\012\0\0\0\0\0\0\0" # little endian

Sounds like a nice idea. However, for consistency, should we also change
‘A10’ to ‘A(10)’? That might not be adopted since it breaks old code.

Maybe I’m mixing different things here; maybe it’s better to have this as a
method on Numeric? Numeric#to_raw_string and Numeric.from_raw_string(str)?

It would be nice, but perhaps it needs a better name.

Yeah, I’m not sure what to call it really. It should be on Integer and not
Numeric though, sorry.

/Robert

···

On Thu, 29 May 2003 nobu.nokada@softhome.net wrote:

Is it documented anywhere, what this ‘w’ template is useful for?

It appears to be an implementation of the encoding used for individual
integers in an OBJECT IDENTIFIER, from this (Japanese) post:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-list/22463

This seems like an enormously BER specific type of encoding to be taking
up a letter in pack! It’s not even used anywhere else in BER, just OIDs.
You can’t use it for INTEGERs, for example.

Just curious, since I’ve written a BER/SMIME toolkit over the last year.

Thanks,
Sam

Quoting nobu.nokada@softhome.net, on Thu, May 29, 2003 at 07:02:06PM +0900:

···

What is ‘w’ used for? I had missed that one.

BER (Basic Encoding Rules) compression.

Hi,

···

In message “What is BER compression? (was RCR: unpack/pack Bignum)” on 03/05/31, Sam Roberts sroberts@uniserve.com writes:

Is it documented anywhere, what this ‘w’ template is useful for?

Ask Perl people, it’s there only for Perl pack compatibility.

						matz.

Hi,

···

At Thu, 29 May 2003 19:23:44 +0900, Robert Feldt wrote:

Rather, what about a size specifier that extends the generic integer
conversions?

[10].pack("i(8)") => "\0\0\0\0\0\0\0\012" # big endian
                  => "\012\0\0\0\0\0\0\0" # little endian

Sounds like a nice idea. However, for consistency, should we also change
‘A10’ to ‘A(10)’? That might not be adopted since it breaks old code.

Now ‘A10’ means a 10-byte string. Even if a size specifier is
provided, that should not change; however, ‘A(10)’ would be
legal and a succeeding count could also be allowed: ‘A(10)2’ for
2 strings, each consisting of 10 bytes.


Nobu Nakada

s/pack/back/

···

On Mon, 2 Jun 2003, Yukihiro Matsumoto wrote:

Ask Perl people, it’s there only for Perl pack compatibility.

It’s an old format for storing indefinite precision integers. I have
a vague memory of this being supported in hardware for a few unusual
machines, and I think that it might’ve been used as a storage format
by some variant of COBOL or Fortran, but I couldn’t get you details.
Google says it’s used as part of the ASN1 stuff as well.

···

At 2:34 AM +0900 6/2/03, Yukihiro Matsumoto wrote:

Hi,

In message “What is BER compression? (was RCR: unpack/pack Bignum)” on 03/05/31, Sam Roberts sroberts@uniserve.com writes:

Is it documented anywhere, what this ‘w’ template is useful for?

Ask Perl people, it’s there only for Perl pack compatibility.


Dan

--------------------------------------“it’s like this”-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

Quoting matz@ruby-lang.org, on Mon, Jun 02, 2003 at 02:34:41AM +0900:

Is it documented anywhere, what this ‘w’ template is useful for?

Ask Perl people, it’s there only for Perl pack compatibility.

Well, I can’t find any reference to its usefulness in Perl, and I don’t
think Larry would have the time to explain it to me.

I can’t find it documented in the Pickaxe, or at rubycentral.com/ref/.

When it is documented, I suggest NOT copying and pasting the Perl
documentation; it’s wrong. And don’t call it “compressed”, it’s
anti-compressed… How about:

‘w’ A variable-length binary encoding of an unsigned integer of
any size. Its format is a sequence of one or more bytes, each of
which provides seven bits of the total value, with the most
significant first. Bit eight of each byte is set, except for the last
byte, in which bit eight is clear.
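
The format described above can be written out in a few lines of Ruby and compared with the existing ‘w’ template. This is a sketch for illustration; ber_encode and ber_decode are hypothetical helper names, and only nonnegative integers are handled.

```ruby
# Seven bits per byte, most significant group first; the high bit is set
# on every byte except the last one.
def ber_encode(n)
  raise ArgumentError, "negative numbers not supported" if n < 0
  groups = [n & 0x7f]                      # last byte: high bit clear
  n >>= 7
  while n > 0
    groups.unshift((n & 0x7f) | 0x80)      # earlier bytes: high bit set
    n >>= 7
  end
  groups.pack("C*")
end

def ber_decode(str)
  str.each_byte.inject(0) { |acc, b| (acc << 7) | (b & 0x7f) }
end

p ber_encode(300).bytes                  # prints [130, 44]
p ber_encode(300) == [300].pack("w")     # prints true
p ber_decode(ber_encode(2**100))         # prints 1267650600228229401496703205376
```

Values below 128 take one byte, so the encoding is only compact for small numbers; a 32-bit value needs five bytes.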

In case anybody is interested, the format is similar to that defined in
X.690 (ASN.1’s BER encoding), section 8.20 “Encoding of a relative
object identifier value” (I can’t attach the page in pdf, the mailing
list bounced it back to me, but you can google it in seconds).

I didn’t know Perl compatibility was important enough to add stuff like
this! I’m mystified. It sounds like ‘w’ could be used to actually do
something useful, instead of this!

Cheers,
Sam

OK, so ‘A10’ and ‘A(10)’ mean the same thing. Sounds good. So with this
scheme

[1].pack("N(5)") # => "\0\0\0\0\001"

but what about

[0x010203].pack("N(2)")

? Should it truncate? From the MSB or the LSB end?

How can this be used when I have a Bignum and I don’t know how large it
will be? ‘N(*)’?

/Robert

···

On Thu, 29 May 2003 nobu.nokada@softhome.net wrote:

Sounds like a nice idea. However, for consistency, should we also change
‘A10’ to ‘A(10)’? That might not be adopted since it breaks old code.

Now ‘A10’ means a 10-byte string. Even if a size specifier is
provided, that should not change; however, ‘A(10)’ would be
legal and a succeeding count could also be allowed: ‘A(10)2’ for
2 strings, each consisting of 10 bytes.

Hi,

···

In message “Re: What is BER compression? (was RCR: unpack/pack Bignum)” on 03/06/02, wbh W.B.Hill@uea.ac.uk writes:

On Mon, 2 Jun 2003, Yukihiro Matsumoto wrote:

Ask Perl people, it’s there only for Perl pack compatibility.

s/pack/back/

Did I have to say “compatibility with Perl pack function”?

						matz.

Dan Sugalski wrote:

It’s an old format for storing indefinite precision integers.
Google says it’s used as part of the ASN1 stuff as well.

BER = Basic Encoding Rules. The term comes from ISO networking, all
the protocols of which are defined in terms of protocol data units
(messages) defined in ASN.1. ASN.1 then requires choice of one of
the sets of encoding rules (BER, DER, XER, PER, etc) to turn a
possibly large, composite and unknown-length data value into an
unambiguous bitstream. Sort-of like what the funky XMLschema folk
have re-invented, but better - at least it doesn’t cause contortions
like those required for XML signatures :)

The BER format described by Sam isn’t BER for an integer, but
is rather an encoding that BER uses internally to encode positive
integers used in OIDs and tag numbers.

BER for an integer includes a tag (0x02), a length, and the data
bytes. The length is either a length byte < 0x80, or the length
of a multi-byte length field (or-ed with 0x80) and that number
of bytes. The signed twos-complement byte-extended integer then
follows. Unsigned integers need a leading zero, so to represent
128 takes two data bytes.
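
As a sketch of the INTEGER layout just described (tag 0x02, length, minimal two’s-complement content), assuming nothing beyond plain Ruby; ber_integer is a hypothetical helper name, not part of any library.

```ruby
# Encode an Integer as a BER/DER INTEGER: tag, length, content.
def ber_integer(n)
  # Smallest number of bytes whose signed two's-complement range holds n.
  len = 1
  len += 1 until n >= -(1 << (8 * len - 1)) && n < (1 << (8 * len - 1))
  # Content bytes, most significant first; >> on a negative Integer is an
  # arithmetic shift, so masking yields two's-complement bytes.
  content = Array.new(len) { |i| (n >> (8 * (len - 1 - i))) & 0xff }
  length_bytes =
    if len < 0x80
      [len]                                # short form: a single length byte
    else
      lb = []
      l = len
      while l > 0
        lb.unshift(l & 0xff)
        l >>= 8
      end
      [0x80 | lb.size] + lb                # long form: count, then the length
    end
  ([0x02] + length_bytes + content).pack("C*")
end

p ber_integer(127).bytes   # prints [2, 1, 127]
p ber_integer(128).bytes   # prints [2, 2, 0, 128]
p ber_integer(-1).bytes    # prints [2, 1, 255]
```

The 128 case shows the leading zero byte mentioned above: without it, 0x80 would read as -128.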

BTW, because BER allows more than one way to encode some things,
which is not good for cryptographic signatures, DER (Distinguished
Encoding Rules) limits things to exactly one alternative in each
case. No contortions required…

PER (Packed Encoding Rules) specifies a minimum-possible encoding
length for a given ASN.1, and XER (XML Encoding Rules) specifies
a maximum-length encoding :-))). Or close :P

ASN.1 is not dead. It’s perhaps nobbled though by being too powerful,
because of the preponderance of incompetent, poorly designed and
overpriced tools available for it.

Clifford Heath.