How can I parse binary files?

I need to parse a binary file with the following structure.
How can I accomplish this in Ruby?

Header (36 bytes):
- Version (4 byte unsigned integer) currently 1
- UIDValidity (4 byte unsigned integer)
- UIDNext (4 byte unsigned integer)
- Last Write Counter (4 byte unsigned integer)
- the rest unused

Message data (36 bytes per message):
- Filename (23 bytes including terminating NUL character)
- Flags (1 byte bitmask)
- UID (4 byte unsigned integer)
- Message size (4 byte unsigned integer)
- Date (4 byte time_t value)

Flags mask is 1:Recent, 2:Draft, 4:Deleted, 8:Flagged, 16:Answered,
32:Seen.

--
Posted via http://www.ruby-forum.com/.

String#unpack.
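
To make that one-word answer concrete, here is a minimal sketch of reading the 36-byte header with String#unpack. It assumes the integers are stored in the machine's native byte order (the 'L' directive); the sample values are the ones that show up in output later in this thread, and the header string is synthetic so the snippet runs without a real file.

```ruby
# Build a synthetic 36-byte header so the example runs without a real file.
# 'L4' packs four native-endian 32-bit unsigned integers; 'x20' appends
# 20 NUL bytes for the unused remainder of the header.
header = [1, 1106138982, 5825, 9872].pack('L4x20')

# Unpack only the four integers; the trailing unused bytes are ignored.
version, uid_validity, uid_next, last_write_counter = header.unpack('L4')

puts "version=#{version} uid_validity=#{uid_validity} " \
     "uid_next=#{uid_next} last_write_counter=#{last_write_counter}"
```

With a real file, the header string would come from something like File.open(path, 'rb') { |f| f.read(36) }.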

On 17/07/06, Fabio Vitale <fabio@sferaconsulting.it> wrote:

I've the need to parse a binary file with the following structure:
How can I accomplish this in Ruby?

Fabio Vitale <fabio@sferaconsulting.it> writes:

I've the need to parse a binary file with the following structure:
How can I accomplish this in Ruby?

In addition to parsing this yourself using ruby's String#unpack
method, you should also look at the BitStruct extension available at
http://redshift.sourceforge.net/bit-struct/

(And found via http://raa.ruby-lang.org/ by doing a search on
"binary")

Am I the only one who thinks that ruby-forum.com should include in a
prominent place pointers to standard ruby documentation, and to the
Ruby Application Archive? I don't object to people posting to the
list via the web form at ruby-forum.com, but I think that a prominent
display of common sources of information would help everyone.

Daniel Martin wrote:

Fabio Vitale <fabio@sferaconsulting.it> writes:

I've the need to parse a binary file with the following structure:
How can I accomplish this in Ruby?

In addition to parsing this yourself using ruby's String#unpack
method, you should also look at the BitStruct extension available at
BitStruct

I've found bit-struct very interesting, but I cannot figure out how to
load a binary file into a newly created bit-structure.
Any help appreciated.

Say I've an imap.mrk binary file,
I've defined class MRK as follows:

require 'bit-struct'

  class MRK < BitStruct
    unsigned :version, 4, "Version"
    unsigned :uid_Validity, 4, "UIDValidity"
    unsigned :uid_next, 4, "UIDNext"
    unsigned :last_write_counter, 4, "LastWriteCounter"
    rest :unused, "Unused"
  end

  mrk = MRK.new

And now: how to populate the mrk instance just created from the imap.mrk
binary file?

Thank you

without even looking at the docs i'd guess you could do

   data = IO.read 'your.data'

   mrk = MRK.new data

and, indeed, this seems to work:

   harp:~ > cat a.rb
   require 'bit-struct'

   class C < BitStruct
     unsigned :a, 16
     unsigned :b, 16
     unsigned :c, 16
   end

   c = C.new 'a' => 42

   p c

   buf = c.to_s

   p buf

   c = C.new buf

   p c.a

   harp:~ > ruby a.rb
   #<C a=42, b=0, c=0>
   "\000*\000\000\000\000"
   42

incidentally, you are probably going to want

   class MRK < BitStruct
     unsigned :version, 32, "Version"
     unsigned :uid_Validity, 32, "UIDValidity"
     unsigned :uid_next, 32, "UIDNext"
     unsigned :last_write_counter, 32, "LastWriteCounter"
     rest :unused, "Unused"
   end

the field size declares the number of __bits__ not __bytes__.
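
A quick way to see why the bits/bytes distinction matters, using only core Ruby (no bit-struct). This is just an illustrative sketch; the sample value is the UIDValidity that appears in output elsewhere in this thread.

```ruby
uid_validity = 1106138982   # sample value taken from output in this thread

# Number of bits needed to represent the value.
puts uid_validity.bit_length

# Largest value an unsigned 4-bit field could hold.
puts((2**4) - 1)

# A 4-byte unsigned integer occupies 32 bits when packed.
puts [uid_validity].pack('L').bytesize * 8
```

A 4-bit field tops out at 15, so declaring the fields with a width of 4 cannot hold these values; 32 is the width that matches a 4-byte integer.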

   bit-struct

regards.

-a

On Tue, 18 Jul 2006, Fabio Vitale wrote:

Daniel Martin wrote:

Fabio Vitale <fabio@sferaconsulting.it> writes:

I've the need to parse a binary file with the following structure:
How can I accomplish this in Ruby?

In addition to parsing this yourself using ruby's String#unpack
method, you should also look at the BitStruct extension available at
BitStruct

I've found bit-struct very interesting, but I cannot figure out how to
load a binary file into a newly created bit-structure.
Any help appreciated.

Say I've an imap.mrk binary file,
I've defined class MRK as follows:

require 'bit-struct'

class MRK < BitStruct
   unsigned :version, 4, "Version"
   unsigned :uid_Validity, 4, "UIDValidity"
   unsigned :uid_next, 4, "UIDNext"
   unsigned :last_write_counter, 4, "LastWriteCounter"
   rest :unused, "Unused"
end

mrk = MRK.new

And now: how to populate the mrk instance just created from the imap.mrk
binary file?

--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dali lama

Fabio Vitale <fabio@sferaconsulting.it> writes:

And now: how to populate the mrk instance just created from the imap.mrk
binary file?

First off, the other message's advice about your field sizes should be
taken (you want to use "32", not "4"). Also, you almost certainly
want to add :endian => :native to your structure. Finally, you'll
want to adjust the bit_length method of your MRKHeader class since it
won't construct the appropriate length just from the field info.

  class MRKHeader < BitStruct
    unsigned :version, 32, "Version", :endian => :native
    unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
    unsigned :uid_next, 32, "UIDNext", :endian => :native
    unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
    rest :unused, "Unused"
    def MRKHeader.bit_length
      super
      36*8
    end
  end

Okay, now let's assume that you also define the per-message structure
using BitStruct as MRKMessage. (For the message code, you don't need
to redefine bit_length since it can be computed straight from the
fields. Do however use the endianness option on all the integers)

Then:

File.open("imap.mrk") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string)
  end
}

This looks like a nice way.
I just wanted to show that in such a simple case unpack isn't that ugly, too.

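File.open('file.bin', 'rb') do |f|
  # header: four native-endian 32-bit unsigned integers, rest unused
  version, uidValid, uidNext, lwCounter = f.read(36).unpack('IIII')
  # one message record: NUL-terminated name, flags byte, three integers
  name, flags, uid, size, date = f.read(36).unpack('Z23CIII')

  #do something
end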

This is of course untested because i don't have such a file, but i hope
the idea is clear.

cheers

Simon
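
For anyone who wants to try the unpack approach without an imap.mrk at hand, here is a sketch that extends the idea to a loop over all message records. It reads from a StringIO filled with synthetic data; the first record uses values that appear in output elsewhere in this thread, the second record is made up for illustration. Native byte order ('L') is an assumption.

```ruby
require 'stringio'

HEADER_SIZE = 36
RECORD_SIZE = 36

# Synthetic data standing in for the real file: a header plus two records.
data = [1, 1106138982, 5825, 9872].pack('L4x20')
data << ['md50000004286.msg', 48, 4150, 20732, 1135012715].pack('Z23CL3')
data << ['md50000004287.msg', 32, 4151, 1024, 1135012800].pack('Z23CL3') # made up

io = StringIO.new(data) # with a real file: File.open('imap.mrk', 'rb')

version, uid_validity, uid_next, last_write = io.read(HEADER_SIZE).unpack('L4')

messages = []
while (record = io.read(RECORD_SIZE))
  break if record.bytesize < RECORD_SIZE # ignore a truncated trailing record
  # Z23: NUL-terminated name in 23 bytes, C: flags byte, L3: uid, size, date
  name, flags, uid, size, date = record.unpack('Z23CL3')
  messages << { name: name, flags: flags, uid: uid, size: size,
                date: Time.at(date).utc }
end

puts "version=#{version} uid_next=#{uid_next}"
messages.each { |m| puts m.inspect }
```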

ara.t.howard@noaa.gov wrote:

On Tue, 18 Jul 2006, Fabio Vitale wrote:

Daniel Martin wrote:

Fabio Vitale <fabio@sferaconsulting.it> writes:

I've the need to parse a binary file with the following structure:
How can I accomplish this in Ruby?

require 'bit-struct'

class MRK < BitStruct
   unsigned :version, 4, "Version"
   unsigned :uid_Validity, 4, "UIDValidity"
   unsigned :uid_next, 4, "UIDNext"
   unsigned :last_write_counter, 4, "LastWriteCounter"
   rest :unused, "Unused"
end

mrk = MRK.new

And now: how to populate the mrk instance just created from the imap.mrk
binary file?

without even looking at the docs i'd guess you could do

  data = IO.read 'your.data'

  mrk = MRK.new data

and, indeed, this seems to work:

[snip]

Daniel Martin <martind@martinhouse.internal> writes:

Then:

File.open("imap.mrk") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string)
  end
}

I forgot to open the file in binary mode, and forgot an inspect call.
I should have said:

File.open("imap.mrk", "rb") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string).inspect
  end
}

Daniel Martin wrote:

Daniel Martin <martind@martinhouse.internal> writes:

require 'bit-struct'

class MRKHeader < BitStruct
    unsigned :version, 32, "Version", :endian => :native
    unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
    unsigned :uid_next, 32, "UIDNext", :endian => :native
    unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
    rest :unused, "Unused"
    def MRKHeader.bit_length
      super
      36*8
    end
end

File.open("imap.mrk", "rb") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string).inspect
  end
}

Now an error is raised:

ruby b.rb

#<MRKHeader version=1, uid_Validity=1106138982, uid_next=5825,
last_write_counter=9872, unused="">
b.rb:19: uninitialized constant MRKMessage (NameError)
  from b.rb:14

Exit code: 1

The remaining problem is that I still need to process the Message data
structure: how can I accomplish this?
Thank you all very much for the help!

Fabio Vitale wrote:

This is the structure of class MRKMessage:

Message data (36 bytes per message):
- Filename (23 bytes including terminating NUL character)
- Flags (1 byte bitmask)
- UID (4 byte unsigned integer)
- Message size (4 byte unsigned integer)
- Date (4 byte time_t value)

Flags mask is 1:Recent, 2:Draft, 4:Deleted, 8:Flagged, 16:Answered,
32:Seen.

Now 3 major questions:

Q 1: what type must I declare for Filename in the class MRKMessage?

Q 2: what type must I declare for Flags in the class MRKMessage?

Q 3: what type must I declare for Date in the class MRKMessage?

...and 2 minor ones :-))

Q 4: How to decode Flags?

Q 5: How to decode Date?

BIG BIG THANKS TO ALL!

------------
require 'bit-struct'
class MRKHeader < BitStruct
    unsigned :version, 32, "Version", :endian => :native
    unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
    unsigned :uid_next, 32, "UIDNext", :endian => :native
    unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
    rest :unused, "Unused"
    def MRKHeader.bit_length
      super
      36*8
    end
end

class MRKMessage < BitStruct
    char :filename, 184, "FileName", :endian => :native
    unsigned :flags, 8, "Flags", :endian => :native
    unsigned :uid, 32, "UID", :endian => :native
    unsigned :msg_size, 32, "MsgSize", :endian => :native
    unsigned :date, 32, "Date", :endian => :native
    def MRKMessage.bit_length
      super
      36*8
    end
end

File.open("imap.mrk", "rb") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string).inspect
  end
}

This now generates:

ruby b.rb

#<MRKHeader version=1, uid_Validity=1106138982, uid_next=5825,
last_write_counter=9872, unused="">
#<MRKMessage
filename="\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\r\nmd5",
flags=48, uid=808464432, msg_size=942814256, date=1936535094>
#<MRKMessage
filename="g\000\000\000\000\000\00006\020\000\000\374P\000\000k\353\246Cmd5",
flags=48, uid=808464432, msg_size=858993712, date=1936535091>
#<MRKMessage filename="g\000\000\000\000\000\000
e\020\000\000\334\226\003\000X\373\253Cmd5", flags=48, uid=808464432,
msg_size=858993712, date=1936535092>

Fabio Vitale <fabio@sferaconsulting.it> writes:

Now 3 major questions:

Q 1: what type must I declare for Filename in the class MRKMessage?

Okay, first off I apologize, but I led you astray. Apparently it's
not enough to override bit_length in your subclass. When you read the
file, you're not getting the stuff lined up properly. Therefore I've
decided to make up for it by finishing the rest of your code for you.

Note that now I override round_byte_length instead, and we get:

require 'bit-struct'
class MRKHeader < BitStruct
    unsigned :version, 32, "Version", :endian => :native
    unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
    unsigned :uid_next, 32, "UIDNext", :endian => :native
    unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
    rest :unused, "Unused"
    # Override so that it gets padded properly
    def MRKHeader.round_byte_length
      super
      36
    end
end

# Ideally, I'd construct some sort of "flags" bit-struct field
# Or define a boolean field type and make this a series of boolean
# fields.

# However, for now we can deal with a series of 0s and 1s

class MRKMessageFlags < BitStruct
  unsigned :flagUnused, 2, "Unused"
  unsigned :flagSeen, 1, "Seen"
  unsigned :flagAnswered, 1, "Answered"
  unsigned :flagFlagged, 1, "Flagged"
  unsigned :flagDeleted, 1, "Deleted"
  unsigned :flagDraft, 1, "Draft"
  unsigned :flagRecent, 1, "Recent"
end

class MRKMessage < BitStruct
  # Note "text" for nul-terminated strings
  text :filename, 23*8, "FileName", :endian => :native
  nest :flags, MRKMessageFlags, "Flags"
  unsigned :uid, 32, "UID", :endian => :native
  unsigned :msg_size, 32, "MsgSize", :endian => :native
  unsigned :date, 32, "Date", :endian => :native

  # Now we futz with the way that date is set and gotten.
  # we rename the existing date field to __date, and
  # then we supply our own meaning for "date" that does
  # translation into and out of seconds-since-1970

  # Again, the ideal solution would be to define a new bit-struct
  # field type that did this stuff itself.

  alias_method :__date=, :date=
  alias_method :__date, :date
  def date=(time)
    self.__date= time.to_i
  end
  def date
    Time.at(self.__date)
  end
  # we don't need to override the length computation here
end

File.open("imap.mrk", "rb") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string).inspect
  end
}

__END__

This produces (on the first bit from your file):

#<MRKHeader version=1, uid_Validity=1106138982, uid_next=5825,
last_write_counter=9872,
unused="\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\r\n">
#<MRKMessage filename="md50000004286.msg", flags=#<MRKMessageFlags
flagUnused=0, flagSeen=1, flagAnswered=1, flagFlagged=0,
flagDeleted=0, flagDraft=0, flagRecent=0>, uid=4150, msg_size=20732,
date=Mon Dec 19 12:18:35 Eastern Standard Time 2005>

This is more what you expected, right?
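
For comparison, the Flags and Date questions (Q4 and Q5) can also be handled without BitStruct. The sketch below is plain Ruby: it tests each bit of the mask from the original post and converts the time_t value with Time.at. The inputs (flags byte 48, date 1135012715) are the raw values behind the first message shown above; the constant and method names are made up for illustration.

```ruby
# Bit values from the original post's flags mask.
FLAG_BITS = {
  recent: 1, draft: 2, deleted: 4, flagged: 8, answered: 16, seen: 32
}.freeze

# Returns the names of all flags whose bit is set in the given byte.
def decode_flags(byte)
  FLAG_BITS.select { |_name, bit| (byte & bit) != 0 }.keys
end

flags = decode_flags(48)
# time_t is seconds since 1970-01-01 UTC; .utc avoids local-zone surprises.
date = Time.at(1135012715).utc

puts flags.inspect
puts date
```

The date comes out as 17:18:35 UTC on 19 December 2005, which matches the 12:18:35 Eastern Standard Time shown in the output above.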

Fabio Vitale wrote:
...

This now generates:

>ruby b.rb
#<MRKHeader version=1, uid_Validity=1106138982, uid_next=5825,
last_write_counter=9872, unused="">
#<MRKMessage
filename="\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\r\nmd5",
flags=48, uid=808464432, msg_size=942814256, date=1936535094>
#<MRKMessage
filename="g\000\000\000\000\000\00006\020\000\000\374P\000\000k\353\246Cmd5",
flags=48, uid=808464432, msg_size=858993712, date=1936535091>
#<MRKMessage filename="g\000\000\000\000\000\000
e\020\000\000\334\226\003\000X\373\253Cmd5", flags=48, uid=808464432,
msg_size=858993712, date=1936535092>

...

Looks like you need to investigate #unpack.

In any case, Ruby Facets has a BinaryReader mixin
(http://facets.rubyforge.org/api/more/classes/BinaryReader.html) that
does the reading and unpacking for you. Just mix it into File (or a
subclass of File) and you should be good to go...

Cheers
Chris

Daniel Martin wrote:
Martin, thank you very much: that solved my problem like a charm!

"ChrisH" <chris.hulan@gmail.com> writes:

Looks like you need to investigate #unpack.

In any case, Ruby Facets has a BinaryReader mixin
(http://facets.rubyforge.org/api/more/classes/BinaryReader.html) that
does the reading and unpacking for you. Just mix it into File (or a
subclass of File) and you should be good to go...

That's fine if you want to pull out each field in succession yourself,
but BitStruct provides much more than that, by providing a DSL for
packed-bit structures. Also, if you see my reply, you'll notice that
he was in fact very close to getting what he wanted.

Actually, going through this exercise has pointed up some features
that I would like to add to BitStruct, since in many cases it almost
but not quite completely was exactly what the poster wanted. It would
be nice to have an easy, obvious, and supported way to define extra
padding in a structure (as we needed to here). It would be nice to
have an easier, supported syntax for reading a structure from a binary
file. Finally, it would be nice to allow an easy way to define data
wrappers, as was done with the date property.

Hi, all. Sorry to respond so late to this thread. Thanks to Daniel for his excellent and thorough responses!

Just one comment below on Daniel's solution...

Daniel Martin wrote:

require 'bit-struct'
class MRKHeader < BitStruct
    unsigned :version, 32, "Version", :endian => :native
    unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
    unsigned :uid_next, 32, "UIDNext", :endian => :native
    unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
    rest :unused, "Unused"
    # Override so that it gets padded properly
    def MRKHeader.round_byte_length
      super
      36
    end
end

I'd suggest defining a fixed-length "unused" field instead of a "rest" field. The rest construct is better for variable length data, such as a payload at the end of a packet.

So instead of

       rest :unused, "Unused"

you can consume the bytes with

       char :unused, (36*8-4*32), "Unused"

Then you don't have to worry about whether overriding #round_byte_length is the right thing to do or not.

The disadvantage to doing it this way is that inspect will print out stuff you don't care about. Solving this problem gets to the suggestions Daniel made in another post, so I'll respond separately.
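
The same "consume the padding explicitly" idea can be sketched with plain pack/unpack, which makes the arithmetic visible: (36*8 - 4*32) bits is 20 bytes. This is only an illustration using core Ruby, not bit-struct; the constant names are made up, and native byte order is assumed.

```ruby
HEADER_BYTES = 36
INT_FIELDS   = 4
# 36 bytes total minus four 4-byte integers leaves 20 bytes of padding,
# the same amount as (36*8 - 4*32) bits.
PAD_BYTES     = HEADER_BYTES - INT_FIELDS * 4
HEADER_FORMAT = "L#{INT_FIELDS}a#{PAD_BYTES}"

# 'a20' pads the empty string with NUL bytes up to 20 bytes.
raw = [1, 1106138982, 5825, 9872, ''].pack(HEADER_FORMAT)

# The fixed-length padding is consumed as an ordinary trailing field.
*ints, unused = raw.unpack(HEADER_FORMAT)

puts raw.bytesize
puts ints.inspect
puts unused.bytesize
```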

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Daniel Martin wrote:
...

> In any case, Ruby Facets has a BinaryReader mixin
> (http://facets.rubyforge.org/api/more/classes/BinaryReader.html) that
> does the reading and unpacking for you. Just mix it into File (or a
> subclass of File) and you should be good to go...

That's fine if you want to pull out each field in succession yourself,
but BitStruct provides much more than that, by providing a DSL for
packed-bit structures. Also, if you see my reply, you'll notice that
he was in fact very close to getting what he wanted.

Actually, going through this exercise has pointed up some features
that I would like to add to BitStruct, since in many cases it almost
but not quite completely was exactly what the poster wanted. It would
be nice to have an easy, obvious, and supported way to define extra
padding in a structure (as we needed to here). It would be nice to
have an easier, supported syntax for reading a structure from a binary
file. Finally, it would be nice to allow an easy way to define data
wrappers, as was done with the date property.

My ref to BinaryReader wasn't a slight against BitStruct; I haven't
used either, so I can't comment. I just found it and figured I'd throw
it into the mix.

Just occurred to me that combining BitStruct with BinaryReader and
maybe StringIO could produce a nice BinaryIO class/module?

BTW, I noticed that all the fields in the BitStruct had endianness
specified. Is there a way to set the endianness for the whole structure?
Would you ever have a structure with mixed endianness?

Nice work
Chris

Daniel Martin wrote:

Actually, going through this exercise has pointed up some features
that I would like to add to BitStruct, since in many cases it almost
but not quite completely was exactly what the poster wanted. It would
be nice to have an easy, obvious, and supported way to define extra
padding in a structure (as we needed to here). It would be nice to
have an easier, supported syntax for reading a structure from a binary
file. Finally, it would be nice to allow an easy way to define data
wrappers, as was done with the date property.

1. Padding.

You can use "char" fields, but that puts extra junk in the inspect output. I'll add an "ignore" or "pad" field type in the next release, which will behave like char but will not define accessors and will not show up in inspect. (Pad fields will have to show up in #to_s output, in order to preserve alignment, of course.)

2. Reading BitStructs from a file.

I'm not sure there's an easier way to do it than in your code, Daniel, but I'm open to suggestions. All of ruby's IO (including sockets) uses Strings, so we always have to read a String and then construct a BitStruct from that string using BitStruct.new.

3. Defining data wrappers.

Hm.... I've needed that too. I'll think about it.

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

This time I'm trying to write a binary file.

Q1: why does the MRKMessageFlags structure not get the appropriate
values?

Q2: how do I convert a date to seconds-since-1970?

require 'bit-struct'
class MRKHeader < BitStruct
    unsigned :version, 32, "Version", :endian => :native
    unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
    unsigned :uid_next, 32, "UIDNext", :endian => :native
    unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
    rest :unused, "Unused"
    # Override so that it gets padded properly
    def MRKHeader.round_byte_length
      super
      36
    end
end

# Ideally, I'd construct some sort of "flags" bit-struct field
# Or define a boolean field type and make this a series of boolean
# fields.

# However, for now we can deal with a series of 0s and 1s

class MRKMessageFlags < BitStruct
  unsigned :flagUnused, 2, "Unused"
  unsigned :flagSeen, 1, "Seen"
  unsigned :flagAnswered, 1, "Answered"
  unsigned :flagFlagged, 1, "Flagged"
  unsigned :flagDeleted, 1, "Deleted"
  unsigned :flagDraft, 1, "Draft"
  unsigned :flagRecent, 1, "Recent"
end

class MRKMessage < BitStruct
  # Note "text" for nul-terminated strings
  text :filename, 23*8, "FileName", :endian => :native
  nest :flags, MRKMessageFlags, "Flags"
  unsigned :uid, 32, "UID", :endian => :native
  unsigned :msg_size, 32, "MsgSize", :endian => :native
  unsigned :date, 32, "Date", :endian => :native

  # Now we futz with the way that date is set and gotten.
  # we rename the existing date field to __date, and
  # then we supply our own meaning for "date" that does
  # translation into and out of seconds-since-1970

  # Again, the ideal solution would be to define a new bit-struct
  # field type that did this stuff itself.

  alias_method :__date=, :date=
  alias_method :__date, :date
  def date=(time)
    self.__date= time.to_i
  end
  def date
    Time.at(self.__date)
  end
  # we don't need to override the length computation here
end

File.open("imap2.mrk", "wb") {|f|
  # #<MRKHeader version=1, uid_Validity=1106138982, uid_next=5887,
  # last_write_counter=9962,
  # unused="\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\r\n">

  mrk_header = MRKHeader.new()
  mrk_header.version = 1
  mrk_header.uid_Validity = 1106138982
  mrk_header.uid_next = 5887
  mrk_header.last_write_counter = 9962
  mrk_header.unused =
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\r\n"
  puts mrk_header.inspect

  msg = MRKMessage.new()
  msg.filename = "md50000006021.msg"

  msg.flags.flagSeen = 1
  msg.flags.flagAnswered = 0
  msg.flags.flagFlagged = 1
  msg.flags.flagDeleted = 0
  msg.flags.flagDraft = 0
  msg.flags.flagRecent = 0
  puts msg.flags.inspect

  msg.uid = 5885
  msg.msg_size = 4184
  msg.date = "Mon Jul 24 12:34:04 2006"

  puts MRKMessage.new(msg).inspect
}
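
On Q2, one possible approach with only the standard library: parse the string into a Time and call to_i, which gives seconds since 1970. The sketch below also packs one 36-byte message record and writes it, which the code above never does with f. It is plain pack rather than bit-struct; the flag constants are made-up names, native byte order is assumed, and a GMT zone is added to the date string so the result does not depend on the local time zone (without a zone, Time.parse assumes local time).

```ruby
require 'time'
require 'stringio'

FLAG_SEEN    = 32 # hypothetical constant names; values from the flags mask
FLAG_FLAGGED = 8

# Time.parse returns a Time; to_i gives seconds since 1970-01-01 UTC.
date = Time.parse('Mon Jul 24 12:34:04 GMT 2006')
seconds = date.to_i

# Z23: name padded with NULs to 23 bytes, C: flags byte, L3: uid, size, date
record = ['md50000006021.msg', FLAG_SEEN | FLAG_FLAGGED, 5885, 4184, seconds]
         .pack('Z23CL3')

io = StringIO.new(String.new) # with a real file: File.open('imap2.mrk', 'wb')
io.write(record)

# Read the bytes back to check the round trip.
name, flags, uid, size, stored = io.string.unpack('Z23CL3')
puts [name, flags, uid, size, Time.at(stored).utc].inspect
```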

ChrisH wrote:

BTW, I noticed that all the fields in the BitStruct had endianness
specified. Is there a way to set the endianness for the whole structure?
Would you ever have a structure with mixed endianness?

Good point about setting the default endianness for all fields in a BitStruct. That's probably the more useful case than field-by-field settings. I'll do that in the next release...
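
As background on what the endianness setting changes, here is a small sketch with core Array#pack (not bit-struct): the same integer laid out in little-endian, big-endian, and native order, plus a made-up two-field layout that mixes byte orders.

```ruby
value = 1

little = [value].pack('V') # 32-bit unsigned, little-endian
big    = [value].pack('N') # 32-bit unsigned, big-endian
native = [value].pack('L') # 32-bit unsigned, this machine's byte order

puts little.bytes.inspect # [1, 0, 0, 0]
puts big.bytes.inspect    # [0, 0, 0, 1]
puts "native order here is #{native == little ? 'little' : 'big'}-endian"

# Nothing stops a format from mixing byte orders field by field:
# 'v' is 16-bit little-endian, 'n' is 16-bit big-endian.
mixed = [0x1234, 0x1234].pack('vn')
puts mixed.bytes.map { |b| format('%02x', b) }.join(' ')
```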

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Joel VanderWerf <vjoel@path.berkeley.edu> writes:

2. Reading BitStructs from a file.

I'm not sure there's an easier way to do it than in your code, Daniel,
but I'm open to suggestions. All of ruby's IO (including sockets) uses
Strings, so we always have to read a String and then construct a
BitStruct from that string using BitStruct.new.

Well, the main issue is that I thought it was a bit clumsy to have to
both ask for the byte length and do the reading. Now, I'm not sure
this can be done for a structure with a "rest" parameter, but for
anything else it should be possible for the user of bit-struct to do
the read and structure init in one step, say:
  mrk_head = MRKHeader.read(f)
or
  mrk_head = MRKHeader.new
  mrk_head << f
Actually, those two steps should be combinable into something like:
  mrk_head = MRKHeader.new << f

The point is that the user of bit-struct should be able to avoid
knowing the actual byte length if at all possible.

This might even be possible to do with something that has a rest
parameter, so long as you have a maximum size limitation on "rest",
and then just use however many bytes you get back in one read call
(reasonable behavior for sockets).
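
To illustrate the kind of one-step read being proposed, here is a sketch in plain Ruby. MrkHeader below is a hypothetical stand-in class, and its read method is not part of the bit-struct API; it only shows how the byte length can stay hidden inside the class. Native byte order is assumed.

```ruby
require 'stringio'

# Hypothetical stand-in for a BitStruct subclass; not the bit-struct API.
class MrkHeader
  FORMAT      = 'L4' # four native-endian 32-bit unsigned integers
  BYTE_LENGTH = 36   # the caller never needs to know this

  attr_reader :version, :uid_validity, :uid_next, :last_write_counter

  def initialize(bytes)
    @version, @uid_validity, @uid_next, @last_write_counter =
      bytes.unpack(FORMAT)
  end

  # Reads exactly one header from io; returns nil at end of input.
  def self.read(io)
    bytes = io.read(BYTE_LENGTH)
    return nil if bytes.nil?
    raise EOFError, 'truncated header' if bytes.bytesize < BYTE_LENGTH

    new(bytes)
  end
end

io = StringIO.new([1, 1106138982, 5825, 9872].pack('L4x20'))
head = MrkHeader.read(io)
puts head.uid_next
```

A structure with a variable-length "rest" field would need a different rule for how many bytes to ask for, as noted above.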

3. Defining data wrappers.

Hm.... I've needed that too. I'll think about it.

I can think of at least two ways:

one is for most built-in field types to have overrideable data_in and
data_out procedures that take a single argument and by default just
return their argument but someone who needs, say, a date stored as
seconds-since-1970 could then just subclass BitStruct::UnsignedField
and override data_in and data_out.

The other is to add an extra data_in and data_out option to all fields
whose values should be Procs.

Actually, it's probably feasible to implement both.
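
The second option (per-field Procs) could look roughly like the sketch below. This is plain Ruby, not bit-struct code: Field, data_in, data_out, encode and decode are hypothetical names, and I am assuming data_in converts a Ruby value to its stored form and data_out converts it back.

```ruby
# Hypothetical field description: a pack directive plus two converter Procs.
Field = Struct.new(:name, :directive, :data_in, :data_out)

IDENTITY = ->(v) { v } # default: pass the value through unchanged

FIELDS = [
  Field.new(:uid,  'L', IDENTITY, IDENTITY),
  # Dates are stored as seconds since 1970 but exposed as Time objects.
  Field.new(:date, 'L', ->(t) { t.to_i }, ->(i) { Time.at(i).utc })
].freeze

# Apply each field's data_in, then pack the resulting integers.
def encode(values)
  raw = FIELDS.map { |f| f.data_in.call(values.fetch(f.name)) }
  raw.pack(FIELDS.map(&:directive).join)
end

# Unpack the integers, then apply each field's data_out.
def decode(bytes)
  raw = bytes.unpack(FIELDS.map(&:directive).join)
  FIELDS.zip(raw).map { |f, v| [f.name, f.data_out.call(v)] }.to_h
end

bytes = encode(uid: 4150, date: Time.utc(2005, 12, 19, 17, 18, 35))
puts decode(bytes).inspect
```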