Read efficiency?

Does anyone know much about Ruby's efficiency with IO#read?
Specifically, I'm looking for libraries I might use to speed up disk
reads.

To see what I mean, here's some test code that reads through an
11-megabyte file. All it does is call IO#read repeatedly with a byte
count (set on the command line) until the whole file is consumed, and
time the run.

#!/usr/bin/env ruby
# readspeed.rb

buf_size = ARGV[0].to_i
fd = File.open("some.txt")

start = Time.now
while (fd.read(buf_size))
end
stop = Time.now

puts (stop - start).to_s + " seconds"

#--- EOF

Running this on my system yields:

$ ruby readspeed.rb 4096
0.014 seconds

$ ruby readspeed.rb 1
7.547 seconds

Obviously a big difference! This is a simplified version of the test I
was actually running, which tried to account for the extra call
overhead of reading 1 byte at a time. Even then, there's still an
order-of-magnitude difference between the two... reading one byte at a
time is *slow*, slow enough to bog down an entire program. (7.5 seconds
over ~11.5 million calls works out to roughly 0.65 microseconds per
read.)

I know this is supposed to be the case with unbuffered input, such as
the raw POSIX read() system call, but isn't IO#read supposed to be
buffered? What's causing this slowdown? I'm writing a class that will
hopefully speed up small reads from binary files by explicitly caching
data in memory, but I'm wondering if there are any pre-built (i.e.,
tested) solutions that Ruby programmers might be using.
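
For concreteness, here's roughly the sort of thing I have in mind (the
class name, chunk size, and API are just a sketch, not an existing
library):

# caching_reader.rb -- rough sketch, not a tested library.
# Serves small reads from an in-memory string and refills from the
# underlying IO only when the cache runs dry.
class CachingReader
  def initialize(io, chunk_size = 64 * 1024)
    @io = io
    @chunk_size = chunk_size
    @buf = ""
  end

  def read(n)
    # Top up the cache until it can satisfy the request (or we hit EOF).
    while @buf.bytesize < n
      chunk = @io.read(@chunk_size)
      break if chunk.nil?
      @buf << chunk
    end
    return nil if @buf.empty?
    @buf.slice!(0, n)  # cheap string surgery instead of a disk hit
  end
end

# Usage sketch:
#   File.open("some.bin", "rb") do |f|
#     r = CachingReader.new(f)
#     while r.read(1); end  # each 1-byte read is usually just a slice
#   end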

···


I hate it when I put my foot in my mouth. After further testing, I'm
almost entirely sure this is just due to overhead, not a problem with
disk access.

Who would have thought looping 11*2**20 times (once per byte of the
11 MB file) would incur a performance hit?
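
A quick, ad-hoc way to see it (numbers will vary by machine): time the
bare loop with no I/O at all next to the 1-byte-read version.

require 'benchmark'

N = 11 * 2**20  # one iteration per byte of an 11 MB file

# Bare loop: pure iteration/block-call overhead, no disk access.
puts Benchmark.measure { N.times { } }

# Same iteration count, each pass doing a 1-byte IO#read.
puts Benchmark.measure {
  File.open("some.txt", "rb") { |f| while f.read(1); end }
}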

···


09:40:20 Temp$ ./bmx.rb
Rehearsal -----------------------------------------------------------------
f-100.bin simple           0.000000   0.000000   0.000000 (  0.004000)
f-100.bin smart            0.016000   0.000000   0.016000 (  0.005000)
f-100.bin complete         0.000000   0.000000   0.000000 (  0.008000)
f-1048576.bin simple       0.094000   0.046000   0.140000 (  0.133000)
f-1048576.bin smart        0.078000   0.016000   0.094000 (  0.100000)
f-1048576.bin complete     0.015000   0.031000   0.046000 (  0.054000)
f-104857600.bin simple     9.031000   3.875000  12.906000 ( 12.913000)
f-104857600.bin smart      5.766000   4.047000   9.813000 (  9.820000)
f-104857600.bin complete   0.609000   3.156000   3.765000 (  3.807000)
------------------------------------------------------- total: 26.780000sec

                               user     system      total        real
f-100.bin simple           0.016000   0.000000   0.016000 (  0.007000)
f-100.bin smart            0.000000   0.000000   0.000000 (  0.008000)
f-100.bin complete         0.000000   0.000000   0.000000 (  0.007000)
f-1048576.bin simple       0.093000   0.109000   0.202000 (  0.207000)
f-1048576.bin smart        0.078000   0.016000   0.094000 (  0.103000)
f-1048576.bin complete     0.000000   0.046000   0.046000 (  0.037000)
f-104857600.bin simple     8.594000   4.125000  12.719000 ( 12.760000)
f-104857600.bin smart      6.219000   3.500000   9.719000 (  9.721000)
f-104857600.bin complete   0.593000   3.125000   3.718000 (  3.778000)
09:42:46 Temp$

Code is here: "File reading benchmark" on GitHub
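
Roughly, the three variants boil down to something like this (a
from-memory sketch -- see the linked code for the exact version; the
buffer size here is an assumption):

BUF = 1024  # assumed chunk size

# "simple": allocate a fresh string for every chunk.
def read_simple(name)
  File.open(name, "rb") do |io|
    while io.read(BUF); end
  end
end

# "smart": reuse one buffer via IO#read's second argument.
def read_smart(name)
  File.open(name, "rb") do |io|
    buf = ""
    while io.read(BUF, buf); end
  end
end

# "complete": slurp the whole file in a single call.
def read_complete(name)
  File.binread(name)
end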

Cheers

robert

···

2010/2/21 Robert Klemme <shortcutter@googlemail.com>:

Note: Ruby is not C but sometimes there is room for improvement. :)

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

> I hate it when I put my foot in my mouth. After further testing, I'm
> almost entirely sure this is just due to overhead, not a problem with
> disk access.
>
> Who would have thought looping 11*2**20 times (once per byte of the
> 11 MB file) would incur a performance hit?

I believe that Rubinius has less of that overhead "hit time" -- as long
as you define everything as methods so it can JIT them.
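
Something like this, i.e. keep the hot loop inside a method so the JIT
can compile it (file name borrowed from the earlier example):

# Hoisting the loop into a method gives Rubinius's JIT something to
# compile instead of interpreting top-level code.
def drain(io, buf_size)
  while io.read(buf_size); end
end

File.open("some.txt", "rb") { |f| drain(f, 1) }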
GL!
-rp
