Efficient processing of binary data streams in Ruby?

I'm writing a Ruby program that has to process binary data from files
and sockets. Data items are in bytes, 16-bit words, or 32-bit words,
and I cannot predict in advance whether the data will be msb-first or
lsb-first, so I end up writing things like this:

    # Relies on Ruby 1.8 String semantics, where x[0] yields a byte value.
    def unpack_16(x)
        @msb_first ? ((x[0]<<8)|x[1]) : ((x[1]<<8)|x[0])
    end

    def pack_16(x)
        y = "xx"
        if (@msb_first)
            y[0] = x>>8
            y[1] = x&255
        else
            y[0] = x&255
            y[1] = x>>8
        end
        y
    end

I expect, however, that this will be painfully slow, and I can't
imagine that this hasn't been thought of before. Is there a better way
to do this that will result in much better performance?


def unpack_16( str )
  # unpack returns an Array, so take the first element
  @msb_first ? str.unpack('n').first : str.unpack('S').first
end

def pack_16( num )
  @msb_first ? [num].pack('n') : [num].pack('S')
end

That will work on little-endian processors (Intel), because 'S' uses
the host's native byte order, but not on big-endian processors
(PowerPC, Sparc). For these methods to work on the latter you'll have
to do something like this ...

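def unpack_16( str )
  # 'n' is always big-endian, so swap the bytes first for lsb-first data
  str = str.reverse unless @msb_first
  str.unpack('n').first
end

def pack_16( num )
  str = [num].pack('n')
  # return the string in both cases (a trailing "unless" would return nil)
  @msb_first ? str : str.reverse
end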

Just define the desired method based on the processor type -- which
can be figured out by doing this ...

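# true when the host stores the least significant byte first
LITTLE_ENDIAN = [42].pack('I').unpack('C').first == 42

if LITTLE_ENDIAN
  # define little endian methods here
else
  # define big endian methods here
end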
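
As a possible alternative to branching on the host processor, the pack
directives 'n' and 'N' are always big-endian and 'v' and 'V' are always
little-endian, whatever the host is. The sketch below is not from this
thread: the module name FixedOrder and its method names are made up for
illustration, and it assumes unsigned 16-bit and 32-bit values.

```ruby
# Hypothetical helper module (illustrative names, not from the thread).
module FixedOrder
  # bits => [big-endian directive, little-endian directive]
  FORMATS = { 16 => %w[n v], 32 => %w[N V] }.freeze

  # Pick the directive for the given width and byte order.
  def self.format_for(bits, msb_first)
    big, little = FORMATS.fetch(bits)
    msb_first ? big : little
  end

  # Decode one unsigned integer from the start of str.
  def self.unpack(str, bits, msb_first)
    str.unpack(format_for(bits, msb_first)).first
  end

  # Encode one unsigned integer as a binary string.
  def self.pack(num, bits, msb_first)
    [num].pack(format_for(bits, msb_first))
  end
end

msb = FixedOrder.pack(0x1234, 16, true)    # bytes 0x12, 0x34
lsb = FixedOrder.pack(0x1234, 16, false)   # bytes 0x34, 0x12
```

Because these directives carry their own byte order, the same code
should behave identically on Intel, PowerPC and Sparc.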

Hope that helps



On 3/8/07, theosib@gmail.com <theosib@gmail.com> wrote:

this will be __extremely__ fast for even huge buffers of data

harp:~ > ruby a.rb
huge(100000) LSB(8) in 0.00117683410644531s
huge(100000) LSB(16) in 0.00181722640991211s
huge(100000) LSB(32) in 0.00884389877319336s
huge(100000) MSB(8) in 0.00245118141174316s
huge(100000) MSB(16) in 0.0045168399810791s
huge(100000) MSB(32) in 0.0078279972076416s

harp:~ > cat a.rb
require 'rubygems'
require 'narray'

module Intification
   LSB = :LSB
   MSB = :MSB
   HOST = [42].pack('i').unpack('c').first == 42 ? LSB : MSB

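   def ints bits = 8, order = LSB
     words = bits / 8

     # map the word size onto the matching NArray integer type
     type =
       case bits.to_i
         when 8
           NArray::BYTE
         when 16
           NArray::SINT
         when 32
           NArray::INT
         else
           raise ArgumentError, bits.inspect
       end

     na = NArray.to_na to_s, type, size/words
     # swap bytes only when the data order differs from the host order
     order == HOST ? na : na.swap_byte
   end
end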

class String
   include Intification
end

def bm label
   a = Time.now
   yield
   b = Time.now
   puts "#{ label } in #{ b.to_f - a.to_f }s"
end

n = 100_000

huge = { :LSB => {}, :MSB => {} }

# 'v*' and 'V*' are always little-endian, 'n*' and 'N*' always big-endian
huge[:LSB][8] = [39,40,41,42].pack('c*') * n
huge[:LSB][16] = [39,40,41,42].pack('v*') * n
huge[:LSB][32] = [39,40,41,42].pack('V*') * n

huge[:MSB][8] = [39,40,41,42].pack('c*') * n
huge[:MSB][16] = [39,40,41,42].pack('n*') * n
huge[:MSB][32] = [39,40,41,42].pack('N*') * n

[:LSB, :MSB].each do |order|
   [8,16,32].each do |bits|
     bm "huge(#{ n }) #{ order.to_s }(#{ bits })" do
       string = huge[order][bits]
       ints = string.ints(bits, order)
       last = ints[-4..-1]
       # compare with ==, a single = would assign and never raise
       raise unless last[0] == 39
       raise unless last[1] == 40
       raise unless last[2] == 41
       raise unless last[3] == 42
     end
   end
end
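
If narray is not available, a similar whole-buffer conversion can be
sketched with plain String#unpack and the '*' count, which decodes
every word in one call. This is an assumption-laden sketch rather than
the code above: bulk_ints is a made-up name, the values come back as
unsigned integers in an ordinary Array rather than an NArray, and no
timing claim is made for it.

```ruby
# Hypothetical helper (name and signature are illustrative only).
# Decodes a whole binary string into an Array of unsigned integers.
def bulk_ints(data, bits, msb_first)
  directive =
    case bits
    when 8  then 'C'                      # single bytes, no byte order
    when 16 then msb_first ? 'n' : 'v'    # 16-bit big / little endian
    when 32 then msb_first ? 'N' : 'V'    # 32-bit big / little endian
    else raise ArgumentError, bits.inspect
    end
  data.unpack("#{directive}*")
end

buffer = [39, 40, 41, 42].pack('n*') * 3
values = bulk_ints(buffer, 16, true)
raise unless values.last(4) == [39, 40, 41, 42]
```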


if you're on windows i have an narray install



On Fri, 9 Mar 2007, theosib@gmail.com wrote:

be kind whenever possible... it is always possible.
- the dalai lama