Reading and writing integers in binary to a file

Hello,

I have some big files with lots of "unsigned int" (4-byte) numbers, and I
want to read and write these files.

Currently, I use this to write:

myfile << [mynum].pack("i")

and to read:

mynum = myfile.read(4).unpack("i").first

I wonder if there is something faster or simpler that avoids wrapping
the number in an array and converting it into a string just to
serialize it.

Thank you.


--
Posted via http://www.ruby-forum.com/.
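A side note on the directive, since the values are unsigned: in Array#pack and String#unpack, 'i' is a signed native int (typically 32 bits), so reading back a value of 2**31 or more with 'i' yields a negative number. The sketch below assumes the file holds 32-bit little-endian unsigned values and uses 'V' (unsigned 32-bit, little-endian) instead; StringIO stands in for the real file.

```ruby
require 'stringio'

# StringIO stands in for the real data file in this sketch.
io = StringIO.new(''.b)

value = 3_000_000_000              # fits in an unsigned 32-bit int, not a signed one
io.write([value].pack('V'))        # 'V' = unsigned 32-bit, little-endian

io.rewind
bytes    = io.read(4)
unsigned = bytes.unpack1('V')      # read back as unsigned
signed   = bytes.unpack1('l<')     # the same 4 bytes read as a signed 32-bit int

puts unsigned, signed
```

Reading the same four bytes as signed gives 3_000_000_000 - 2**32, which is why the unsigned directive matters for this data.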

Hi,


----- Original Message -----
From: "Vianney Lecroart" <acemtp@gmail.com>
Newsgroups: comp.lang.ruby
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Thursday, October 25, 2007 11:36 PM
Subject: read write integer in binary into a file

[original question quoted; omitted]

How about Marshal?

myfile << Marshal.dump(mynum)

and

mynum = Marshal.load(myfile)

Regards,

Park Heesob
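Marshal.load accepts an IO directly and consumes exactly one object from it, so several dumps written back to back can be read one at a time. Passing the result of read instead would slurp the rest of the file but return only the first object. A minimal sketch, with StringIO standing in for the file:

```ruby
require 'stringio'

# StringIO stands in for the real file in this sketch.
io = StringIO.new(''.b)
[7, 56_515, 720_850].each { |num| io.write(Marshal.dump(num)) }

io.rewind
nums = []
nums << Marshal.load(io) until io.eof?   # each call consumes exactly one dumped object

p nums
```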

I wrote a function to do this which seems slightly faster, but could
perhaps stand some optimization:

def pack_int32(n)
  str = '    '   # 4 placeholder bytes; in Ruby 1.8, []= with an Integer sets one byte
  str[3] = n >> 24
  str[2] = n >> 16
  str[1] = n >> 8
  str[0] = n
  str
end

Here are the benchmark results vs the other methods mentioned:

                    user     system      total        real
.pack(i):       6.234000   0.235000   6.469000 (  6.500000)
pack_int32:     5.719000   0.015000   5.734000 (  5.734000)
Marshal.dump:   6.594000   0.219000   6.813000 (  6.813000)

I included Marshal.dump for completeness, but agree that it doesn't
appear to be meant for this sort of thing. Here's the source to run
the benchmark:

require 'benchmark'

# pack_int32 (defined above) must be loaded before running this.
number = 2_000_000
n = 1_000_000
Benchmark.bm(13) do |x|
  x.report('.pack(i):')     { n.times { [number].pack('i') } }
  x.report('pack_int32:')   { n.times { pack_int32(number) } }
  x.report('Marshal.dump:') { n.times { Marshal.dump(number) } }
end

Adam
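Assigning an Integer through String#[]= (as pack_int32 above does) only works as byte assignment in Ruby 1.8; from 1.9 on it raises a TypeError. A rough equivalent for current Ruby, using String#setbyte on a 4-byte binary string, is sketched below. It builds the same little-endian layout; the name pack_int32_le is hypothetical, and no claim is made about how it benchmarks.

```ruby
# Hypothetical name: builds the 4 little-endian bytes of n by hand.
def pack_int32_le(n)
  str = ("\0" * 4).b                 # 4-byte binary (ASCII-8BIT) buffer
  str.setbyte(0, n & 0xff)           # least significant byte first
  str.setbyte(1, (n >> 8) & 0xff)
  str.setbyte(2, (n >> 16) & 0xff)
  str.setbyte(3, (n >> 24) & 0xff)
  str
end

p pack_int32_le(720_850)   # same bytes as [720_850].pack('V')
```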


On Oct 25, 10:36 am, Vianney Lecroart <ace...@gmail.com> wrote:

[original question quoted; omitted]

Do you have to deal with each number individually? Maybe you could
build up an array of numbers and then pack them all at once:

arr = []
while work_to_do do               # work_to_do / generate_next_number are placeholders
  mynum = generate_next_number
  arr << mynum
end
myfile.write arr.pack('i*')       # one pack call for the whole array

That way you aren't creating a new array for each number.

Similarly, for reading the file:

data = myfile.read
num_array = data.unpack('i*')

The '*' in (un)pack means to process the rest of the data in the same
way.
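Since the files are big, a single read of the whole file may use a lot of memory. A middle ground, sketched here under the assumption that the file holds nothing but consecutive little-endian unsigned 32-bit values, is to read a fixed number of bytes at a time and unpack each chunk with 'V*'. The helper name each_uint32 and the chunk size are illustrative choices, not part of any library.

```ruby
require 'stringio'

# Hypothetical helper: yields every unsigned 32-bit little-endian value in io,
# reading chunk_count values (4 bytes each) per read call.
def each_uint32(io, chunk_count = 65_536)
  while (chunk = io.read(4 * chunk_count))   # read returns nil at end of input
    # A short final chunk is fine as long as its length is a multiple of 4.
    raise "truncated value at end of input" unless (chunk.bytesize % 4).zero?
    chunk.unpack('V*').each { |num| yield num }
  end
end

# Usage, with StringIO standing in for a large file:
io = StringIO.new([1, 2, 70_000, 4_000_000_000].pack('V*'))
each_uint32(io, 3) { |num| puts num }
```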


On Oct 25, 9:36 am, Vianney Lecroart <ace...@gmail.com> wrote:

[original question quoted; omitted]

Vianney Lecroart wrote:

[original question quoted; omitted]

irb(main):001:0> f=open('test','w')
=> #<File:test>
irb(main):002:0> f<<[65535].pack('i')
=> #<File:test>
irb(main):003:0> f.tell
=> 4
irb(main):004:0> f<<[720850].pack('i')
=> #<File:test>
irb(main):005:0> f.tell
=> 9
The integer 720850 takes 5 bytes in my file, but it should take only 4!
How can I fix this? Thanks!


How about Marshal?

Files are filled by an external C application that does something like:
fwrite(&myint, 4, 1, fp);

So I have to use the same file format.
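fwrite copies the int's in-memory bytes unchanged, so on the same machine the closest pack counterpart is a native-order directive: 'L' is exactly 32 bits in native byte order ('I' would follow the native unsigned int size). If the files may move between machines with different byte orders, pinning the order explicitly ('V' little-endian, 'N' big-endian) is safer. A small comparison, assuming a 4-byte unsigned int on the C side:

```ruby
value = 56_515

native = [value].pack('L')   # exactly 32 bits, native byte order (as fwrite writes locally)
little = [value].pack('V')   # 32 bits, always little-endian
big    = [value].pack('N')   # 32 bits, always big-endian (network order)

# On a little-endian host (e.g. x86) the native layout equals 'V'.
little_endian_host = [1].pack('L') == [1].pack('V')
puts "native layout matches #{little_endian_host ? "'V'" : "'N'"}"

p native.unpack1('L'), little.unpack1('V'), big.unpack1('N')
```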


Using only the number 2_000_000 seems to skew the results. I get the
same numbers as you with your test, but if I change it slightly to use
a variety of integers, the results are more balanced:

  require 'benchmark'
  MAX = 2**30
  n = 1_000_000
  nums = (0..n).map{ (rand*MAX).to_i }

  Benchmark.bmbm do |x|
    x.report('pack(i):') { nums.each{ |num| [num].pack('i') } }
    x.report('pack32:') { nums.each{ |num| pack_int32(num) } }
    x.report('Dump:') { nums.each{ |num| Marshal.dump(num) } }
  end

Rehearsal --------------------------------------------
pack(i):  5.813000   0.109000   5.922000 (  5.984000)
pack32:   5.234000   0.000000   5.234000 (  5.281000)
Dump:     5.906000   0.125000   6.031000 (  6.063000)
---------------------------------- total: 17.187000sec

              user     system      total        real
pack(i):  5.687000   0.125000   5.812000 (  5.875000)
pack32:   5.141000   0.016000   5.157000 (  5.188000)
Dump:     6.000000   0.078000   6.078000 (  6.141000)


On Oct 25, 10:09 am, Adam Preble <pre...@gmail.com> wrote:

[quoted message omitted]

Wu Junchen wrote:

[quoted irb session omitted]

irb

irb(main):001:0> x = [720850].pack('i')
=> "\322\377\n\000"
irb(main):002:0> x.length
=> 4

So the integer 720850 is clearly packed into 4 bytes, as requested. Why
does it occupy 5 bytes in the file? Look at the "\n" at index 2: the
third byte is a newline character (0x0A), and on Windows, when a file is
opened in text mode, Ruby translates each newline into CRLF, which is 2
bytes. Since your file holds binary data, you don't want text mode, so
open the file with the "b" flag in addition to "w":

f = open("test", "wb")


Vianney Lecroart wrote:

How about Marshal?

Files are filled by an external C application that does something like:
fwrite(&myint, 4, 1, fp);

So I have to use the same file format.

What file format? I don't see any problem with using Marshal; it doesn't
need a specified file format, it's simply a Marshal dump.


It seems that marshaling a number doesn't produce 4 bytes:

irb(main):036:0> mynum
=> 56515
irb(main):037:0> [mynum].pack("i")
=> "\303\334\000\000"
irb(main):038:0> Marshal.dump(mynum)
=> "\004\bi\002\303\334"
