Hello,
I have some big files with a lot of "unsigned int" (4-byte) numbers, and I
want to read from and write to these files.
Currently, I found this to write:
myfile << [mynum].pack("i")
and to read:
mynum = myfile.read(4).unpack("i").first
I wonder if there is something faster/simpler that avoids converting the
number into an array, then into a string, just to serialize it.
Thank you.
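A side note on the directive itself: 'i' packs a signed, native-endian int, while the values here are unsigned. The fixed-width unsigned directives 'V' (32-bit little-endian) and 'N' (32-bit big-endian) also pin down the byte order, which keeps the files portable. A minimal sketch of the same write/read using 'V':
myfile << [mynum].pack('V')                # write mynum as an unsigned 32-bit little-endian value
mynum = myfile.read(4).unpack('V').first   # read it back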
I wrote a function to do this which seems slightly faster, but could
perhaps stand some optimization:
def pack_int32(n)
  str = ' ' * 4                 # 4-byte buffer (Ruby 1.8 String#[]= accepts an integer byte value)
  str[3] = (n >> 24) & 0xff     # most significant byte last: little-endian
  str[2] = (n >> 16) & 0xff
  str[1] = (n >> 8) & 0xff
  str[0] = n & 0xff
  str
end
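For reading those values back, a matching helper (a sketch, not from the original post, assuming the same little-endian byte order as pack_int32) could be:
def unpack_int32(str)
  b0, b1, b2, b3 = str.unpack('C4')   # 'C4' = four unsigned 8-bit bytes
  b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
end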
Here are the benchmark results vs the other methods mentioned:
                   user     system      total        real
.pack(i):      6.234000   0.235000   6.469000 (  6.500000)
pack_int32:    5.719000   0.015000   5.734000 (  5.734000)
Marshal.dump:  6.594000   0.219000   6.813000 (  6.813000)
I included Marshal.dump for completeness, but agree that it doesn't
appear to be meant for this sort of thing. Here's the source to run
the benchmark:
require 'benchmark'

number = 2_000_000
n = 1_000_000

Benchmark.bm(12) do |x|
  x.report('.pack(i):')     { n.times do; [number].pack('i');   end }
  x.report('pack_int32:')   { n.times do; pack_int32(number);   end }
  x.report('Marshal.dump:') { n.times do; Marshal.dump(number); end }
end
Adam
···
Do you have to deal with each number individually? Maybe you could
build up an array of numbers and then pack them all at once:
arr = []
while work_to_do do
  mynum = generate_next_number
  arr << mynum
end
myfile.write arr.pack('i*')
That way you aren't creating a new array for each number.
Similarly, for reading the file:
data = file.read
num_array = data.unpack('i*')
The '*' in (un)pack means to process the rest of the data in the same
way.
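Since the files are big, slurping everything with file.read can use a lot of memory. One possible variation (a sketch, not from the post; the chunk size and filename are arbitrary) is to read and unpack in fixed-size chunks:
File.open('example.bin', 'rb') do |f|
  # 64 KB chunks; 65536 is a multiple of 4, so each chunk holds whole integers
  while chunk = f.read(65536)
    chunk.unpack('i*').each do |num|
      # process num here
    end
  end
end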
···
irb(main):001:0> f=open('test','w')
=> #<File:test>
irb(main):002:0> f<<[65535].pack('i')
=> #<File:test>
irb(main):003:0> f.tell
=> 4
irb(main):004:0> f<<[720850].pack('i')
=> #<File:test>
irb(main):005:0> f.tell
=> 9
The integer 720850 takes 5 bytes in my file, but it should take only 4
bytes! How can I fix this? Thanks!
Using only the number 2_000_000 seems to skew the results. I see your
results with your test, but if I change it slightly to use a variety
of integers, I get more balanced results:
require 'benchmark'
MAX = 2**30
n = 1_000_000
nums = (0..n).map{ (rand*MAX).to_i }
              user     system      total        real
pack(i):  5.687000   0.125000   5.812000 (  5.875000)
pack32:   5.141000   0.016000   5.157000 (  5.188000)
Dump:     6.000000   0.078000   6.078000 (  6.141000)
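The report block itself isn't shown above; presumably it loops over nums instead of packing one constant, along these lines (the exact loop is an assumption, labels taken from the results above):
Benchmark.bm(12) do |x|
  # assumed shape of the modified benchmark: pack each of the varied integers
  x.report('pack(i):') { nums.each { |num| [num].pack('i') } }
  x.report('pack32:')  { nums.each { |num| pack_int32(num) } }
  x.report('Dump:')    { nums.each { |num| Marshal.dump(num) } }
end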
···
irb
irb(main):001:0> x = [720850].pack('i')
=> "\322\377\n\000"
irb(main):002:0> x.length
=> 4
So the integer 720850 is clearly packed into 4 bytes as requested. Why does
it occupy 5 bytes in the file? Look at the "\n" at position 2: the third
byte is a newline character, and on Windows, in text-mode files, Ruby turns
each newline into CRLF, i.e. 2 bytes. Since your file holds binary data, you
don't want a text-mode file, so you must open the file with the "b" flag in
addition to "w":
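For example (a sketch; 'test' is the same throwaway filename used above):
f = open('test', 'wb')     # "b" = binary mode, so no CRLF translation
f << [720850].pack('i')
f.tell                     # => 4, as expected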