DANGER ! Ruby-Newbie ahead: How to access binary files

Hi,

There is a file on my HD, which was written by a C program. The C
program wrote the contents of an array of structures (each array
element was made from the same structure) to that file.

Since accessing that file looks like a very low-level and
"procedure-based" thing to me, I would be very interested in how this job
can be done in the most Ruby-like, object-oriented way.

Thank you very much for any help in advance!
Don't worry, use Ruby!
mcc

If you're using Windows, make sure you open the file in binary mode:

  File.open(filename, "rb") ...

Otherwise, look up String#unpack (the counterpart of Array#pack). There
are examples of this in the ImageInfo library that is on the RAA; I have
a custom copy of it in PDF::Writer.
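
A minimal sketch of that advice, not taken from ImageInfo or PDF::Writer. It assumes a made-up layout in which each record is a native C int followed by a native C float with no padding; `RECORD_FORMAT`, `read_records`, and the sample file are hypothetical names used only for illustration:

```ruby
require "tempfile"

# Hypothetical layout: native int followed by native float, no padding.
RECORD_FORMAT = "if"
RECORD_SIZE   = [0, 0.0].pack(RECORD_FORMAT).bytesize

# Reads fixed-size records from a binary file and returns [int, float] pairs.
def read_records(path)
  records = []
  # "rb" disables newline translation on Windows and is harmless elsewhere.
  File.open(path, "rb") do |f|
    while (chunk = f.read(RECORD_SIZE))
      break if chunk.bytesize < RECORD_SIZE # ignore a truncated trailing record
      records << chunk.unpack(RECORD_FORMAT)
    end
  end
  records
end

# Write two sample records so the example is self-contained.
sample = Tempfile.new("records")
sample.binmode
sample.write([40, 40.0].pack(RECORD_FORMAT) + [2, 2.0].pack(RECORD_FORMAT))
sample.close

p read_records(sample.path) # prints [[40, 40.0], [2, 2.0]]
```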

-austin

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

You need to know the format of the structure and its size.

You can then use IO#read to read bytes from the file into a string and String#unpack to extract the individual fields from the structure.

Of course, all of this should be encapsulated into a class :)
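
One way that encapsulation might look, as a sketch only. The `Record` and `RecordReader` classes are hypothetical, and the format string assumes a struct of one int and one float with no compiler padding:

```ruby
require "stringio"

# Hypothetical mirror of `struct { int foo; float bar; }`, assuming no padding.
class Record
  FORMAT = "if"
  SIZE   = [0, 0.0].pack(FORMAT).bytesize

  attr_reader :foo, :bar

  def initialize(foo, bar)
    @foo, @bar = foo, bar
  end

  # Builds a Record from exactly SIZE raw bytes.
  def self.from_bytes(bytes)
    new(*bytes.unpack(FORMAT))
  end

  # Serializes the record back to its binary form.
  def to_bytes
    [foo, bar].pack(FORMAT)
  end
end

# Wraps any IO-like object and yields one Record per fixed-size chunk.
class RecordReader
  include Enumerable

  def initialize(io)
    @io = io
  end

  def each
    return enum_for(:each) unless block_given?
    while (bytes = @io.read(Record::SIZE)) && bytes.bytesize == Record::SIZE
      yield Record.from_bytes(bytes)
    end
  end
end

# A StringIO stands in for File.open(path, "rb") to keep the sketch standalone.
io = StringIO.new(Record.new(40, 40.0).to_bytes + Record.new(2, 2.0).to_bytes)
RecordReader.new(io).each { |r| puts "foo=#{r.foo} bar=#{r.bar}" }
```

Because `RecordReader` includes Enumerable, callers get `map`, `select`, and friends without touching `read` or `unpack` themselves.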

this is a perfect use case to abstract the error-prone method of
reading/seeking/writing that one would typically do with binary data. i use
mmap a lot for these types of tasks at work, here is a little (silly) example:

first we build a c program to output an array of struct. note that we output
the sizeof(struct) as the first part of the file - this is because we can't
know how the compiler will pad structs so we make sure the correct size is
encoded into the file:

     harp:~ > cat a.c
     #include <stdlib.h>
     #include <stdio.h>

     struct foobar { int foo; float bar; };

     int main (void)
     {
       /* an array of two structs, matching the output shown below */
       struct foobar a[] = { {40, 40.0}, {2, 2.0} };
       int size = sizeof(struct foobar);
       fwrite (&size, sizeof(int), 1, stdout);
       fwrite (a, sizeof(a), 1, stdout);
       return 0;
     }

     harp:~ > gcc a.c

     harp:~ > a.out > a

next we write a ruby class to access the data. the access will be via mmap, so
any changes we make to the data can be transparently written to disk with no
explicit io on our part - we simply use the objects as normal:

     harp:~ > cat a.rb
     #! /usr/bin/env ruby
     require "mmap" # ftp://moulon.inra.fr/pub/ruby/

     class Integer
       SIZEOF = [42].pack("i").size
     end
     class Float
       SIZEOF = [42.0].pack("f").size
     end
     module Foobar
       class Struct
         def initialize mmap, offset
           @mmap, @offset = mmap, offset
         end
         def foo
           @mmap[@offset, Integer::SIZEOF].unpack("i").first
         end
         def foo= i
           @mmap[@offset, Integer::SIZEOF] = [Integer(i)].pack("i")
         end
         def bar
           @mmap[@offset + Integer::SIZEOF, Float::SIZEOF].unpack("f").first
         end
         def bar= f
           @mmap[@offset + Integer::SIZEOF, Float::SIZEOF] = [Float(f)].pack("f")
         end
         def inspect
           { "foo" => foo, "bar" => bar }.inspect
         end
       end
       class List < ::Array
         def initialize mmap
           @mmap = mmap
           @sizeof = mmap[0, Integer::SIZEOF].unpack("i").first
           offset = Integer::SIZEOF
           while((offset + @sizeof) <= mmap.size)
             struct = Struct::new @mmap, offset
             self << struct
             offset += @sizeof
           end
         end
       end
       class File
         attr "path"
         attr "list"
         attr "mmap"
         def initialize path
           @path = path
           open(@path, "r+"){|f| @mmap = Mmap::new f, "rw", Mmap::MAP_SHARED}
           @list = List::new @mmap
         end
         def self::new *a, &b
           ff = super
           mmap = ff.mmap
           ::ObjectSpace::define_finalizer(ff){ mmap.msync; mmap.munmap }
           ff
         end
       end
     end

     ff = Foobar::File::new ARGV.shift
     fl = ff.list

     p fl

     fl.each{|f| f.foo = 42; f.bar = 42.0} # automatically written!

the first time we run the program we see the data the c program wrote:

     harp:~ > a.rb a
     [{"foo"=>40, "bar"=>40.0}, {"foo"=>2, "bar"=>2.0}]

but next time we see the data automatically written by the ruby program:

     harp:~ > a.rb a
     [{"foo"=>42, "bar"=>42.0}, {"foo"=>42, "bar"=>42.0}]

this is just a silly example, but it shows how objectification of something
like this might be done in a way that really makes working with the actual data
easier.
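
For comparison, a read-only sketch of the same file layout (an int holding sizeof(struct), followed by the records) using plain reads instead of mmap. The helper `read_sized_records` is hypothetical, and it assumes foo and bar sit at the start of each record with no padding between them; stepping by the stored size skips any trailing padding:

```ruby
require "stringio"

INT_SIZE        = [0].pack("i").bytesize
MIN_RECORD_SIZE = [0, 0.0].pack("if").bytesize

# Reads the int size header, then returns [foo, bar] pairs, one per record.
def read_sized_records(io)
  header = io.read(INT_SIZE)
  unless header && header.bytesize == INT_SIZE
    raise ArgumentError, "missing size header"
  end
  record_size = header.unpack("i").first
  raise ArgumentError, "record size too small" if record_size < MIN_RECORD_SIZE

  records = []
  while (bytes = io.read(record_size)) && bytes.bytesize == record_size
    # Only the leading int and float are decoded; extra bytes are ignored.
    records << bytes.unpack("if")
  end
  records
end

# Simulate a file whose records carry 4 bytes of trailing padding.
padded_size = [0, 0.0].pack("ifx4").bytesize
data = [padded_size].pack("i") + [40, 40.0].pack("ifx4") + [2, 2.0].pack("ifx4")
p read_sized_records(StringIO.new(data)) # prints [[40, 40.0], [2, 2.0]]
```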

kind regards.

-a

--
strong and healthy, who thinks of sickness until it strikes like lightning?
preoccupied with the world, who thinks of death, until it arrives like
thunder? -- milarepa

Doesn't this arithmetic assume that the C compiler is packing the fields
of the struct? What if fields are aligned on 8 byte boundaries, for
instance? I vaguely remember having some issues like this when porting
from x86 to sparc. I guess you could add __attribute__((__packed__)) to
the struct to be sure.
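
To illustrate the concern with a hypothetical `struct { char tag; int value; }`: on common ABIs the compiler inserts three padding bytes after `tag` so that `value` is 4-byte aligned, and the unpack format has to skip them with `x`. The byte string below is built by hand to imitate that layout; it is an assumption, not output from a real C program:

```ruby
# Bytes laid out as: 1 byte tag, 3 padding bytes, 4-byte native int.
PADDED = [65, 1234].pack("cx3l")

# Naive format that assumes the fields are stored back to back.
naive_tag, naive_value = PADDED.unpack("cl")

# Format that skips the three padding bytes before the int.
tag, value = PADDED.unpack("cx3l")

puts "record size: #{PADDED.bytesize}"
puts "naive:   tag=#{naive_tag} value=#{naive_value}"
puts "aligned: tag=#{tag} value=#{value}"
```

The naive format reads the int from the wrong offset, so its value is wrong (how wrong depends on the byte order of the machine), while the padded format recovers 1234.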

ara.t.howard@noaa.gov wrote:

        def bar
          @mmap[@offset + Integer::SIZEOF, Float::SIZEOF].unpack("f").first
        end
        def bar= f
          @mmap[@offset + Integer::SIZEOF, Float::SIZEOF] = [Float(f)].pack("f")
        end

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

absolutely. i figured it was beyond the scope of the post to get into that -
but really the file format would need to export the shape of the struct in
some sort of header. to do this one would need to crawl the struct with a
'void *' and compute offsets from the address of the struct.

of course, this would be about the point where one should pull out xdr or some
such. in practice, however, one often needs to read binary data written by a
program beyond one's control and the unpack approach will work most of the
time - wouldn't launch rockets with it though!
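
a sketch of the idea behind xdr-style formats, not an xdr library: if writer and reader agree on fixed widths and big-endian byte order, the layout no longer depends on the compiler or cpu. the format string and the `encode_record`/`decode_record` helpers are hypothetical; `l>` (32-bit signed, big-endian) and `g` (single-precision float, big-endian) are standard pack directives:

```ruby
# Hypothetical portable layout: 32-bit signed big-endian int followed by a
# big-endian single-precision float. Neither directive depends on the host.
PORTABLE_FORMAT = "l>g"

# Serializes one record into 8 bytes with a fixed byte order.
def encode_record(foo, bar)
  [foo, bar].pack(PORTABLE_FORMAT)
end

# Decodes 8 bytes back into [foo, bar].
def decode_record(bytes)
  bytes.unpack(PORTABLE_FORMAT)
end

bytes = encode_record(42, 42.0)
p bytes.bytesize       # prints 8
p bytes.unpack("C*")   # prints [0, 0, 0, 42, 66, 40, 0, 0]
p decode_record(bytes) # prints [42, 42.0]
```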

cheers.

-a
