Why is ruby (on windows) so much slower at reading lines in a file (as
compared to perl, python or java) ? I’ve installed 1.8 preview 2 and it’s
much faster but still not nearly as fast as the other languages mentioned.
Thanks!
Greg B.
Can you post the exact programs that you are timing in each language?
Regards,
Brian.
On Wed, Apr 02, 2003 at 07:28:20AM +0900, Greg Brondo wrote:
Why is ruby (on windows) so much slower at reading lines in a file (as
compared to perl, python or java) ? I’ve installed 1.8 preview 2 and it’s
much faster but still not nearly as fast as the other languages mentioned.
I would like to point out that removing the …
    lc += 1
    if lc % 1000 == 0
      puts "Read #{lc} lines"
    end
in the ruby code made a significant difference.
You are not only testing file reading but a whole lot more, such as the
maths and screen I/O, as profiling shows.
With line counting and reporting:
Elapsed = 11.28818595
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 67.34     7.05      7.05        1  7050.00 10470.00  IO#each_line
 13.47     8.46      1.41    45425     0.03     0.03  Fixnum#==
  9.65     9.47      1.01    45425     0.02     0.02  Fixnum#%
  9.36    10.45      0.98    45425     0.02     0.02  Fixnum#+
  0.19    10.47      0.02       46     0.43     0.43  Kernel.puts
  0.00    10.47      0.00        1     0.00     0.00  Float#-
  0.00    10.47      0.00       92     0.00     0.00  IO#write
  0.00    10.47      0.00        1     0.00 10470.00  #toplevel
  0.00    10.47      0.00       45     0.00     0.00  Fixnum#to_s
  0.00    10.47      0.00        1     0.00     0.00  Array#[]
  0.00    10.47      0.00        1     0.00     0.00  File#open
  0.00    10.47      0.00        2     0.00     0.00  Time#to_f
  0.00    10.47      0.00        1     0.00     0.00  String#+
  0.00    10.47      0.00        2     0.00     0.00  Time#now
  0.00    10.47      0.00        1     0.00     0.00  Float#to_s
Without line counting and reporting:
Elapsed = 0.0706551075
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
100.00     0.07      0.07        1    70.00    70.00  IO#each_line
  0.00     0.07      0.00        1     0.00     0.00  Kernel.puts
  0.00     0.07      0.00        1     0.00    70.00  #toplevel
  0.00     0.07      0.00        1     0.00     0.00  Array#[]
  0.00     0.07      0.00        2     0.00     0.00  IO#write
  0.00     0.07      0.00        2     0.00     0.00  Time#now
  0.00     0.07      0.00        1     0.00     0.00  Float#to_s
  0.00     0.07      0.00        1     0.00     0.00  Float#-
  0.00     0.07      0.00        1     0.00     0.00  File#open
  0.00     0.07      0.00        2     0.00     0.00  Time#to_f
  0.00     0.07      0.00        1     0.00     0.00  String#+
IO from printing the counter on screen seems to be quite a significant
drain on the process and just doing a Fixnum#== seems to be quite a load.
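One way to check this without the profiler's own per-call overhead is to time the bare read loop and the counting loop separately. What follows is a minimal sketch, not code from the thread: `time_block`, `read_only` and `read_and_count` are hypothetical helper names, and the `out` parameter is an assumption added so that the progress messages can be sent somewhere other than the console.

```ruby
# Sketch only: helper names are illustrative, not from the original posts.

# Run the given block and return [block result, elapsed seconds].
def time_block
  start = Time.now
  result = yield
  [result, Time.now - start]
end

# Read every line of the file and discard it.
def read_only(path)
  File.open(path) { |f| f.each_line { |_line| } }
  nil
end

# Read every line, keep a counter, and report every `step` lines to `out`.
# Returns the number of lines read.
def read_and_count(path, step = 1000, out = $stdout)
  lc = 0
  File.open(path) do |f|
    f.each_line do
      lc += 1
      out.puts "Read #{lc} lines" if lc % step == 0
    end
  end
  lc
end

# Only runs when a file name is given on the command line.
if __FILE__ == $0 && ARGV[0]
  _, bare = time_block { read_only(ARGV[0]) }
  lines, counted = time_block { read_and_count(ARGV[0]) }
  puts "bare read:       #{bare} s"
  puts "read + counting: #{counted} s (#{lines} lines)"
end
```

Passing a `StringIO` as `out` keeps the counting and formatting work but takes console speed out of the measurement, which helps tell the two costs apart.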
Here they are:
------- RUBY ------
filename = ARGV[0]
lc = 0
starttime = Time.now.to_f
File.open(filename).each_line do
  lc += 1
  if lc % 1000 == 0
    puts "Read #{lc} lines"
  end
end
stoptime = Time.now.to_f
puts "Elapsed = " + (stoptime - starttime).to_s
------ PYTHON ------
import sys, time
filename = sys.argv[1]
fh = open(filename,'r')
lc = 0
starttime = time.time()
for line in fh:
    lc += 1
    if lc % 1000 == 0:
        print "Read %d lines" % lc
------ JAVA ------
import java.util.*;
import java.io.*;
The stats for reading a text file with 1,572,000 lines in a CMD.EXE console
(on a PIII 866 with 512MB):
RUBY (1.8p2):   13.5 sec
PYTHON (2.3b2):  4 sec
JAVA (1.4.1):    7 sec
Thanks for the help!!!
Greg B.
“Brian Candler” B.Candler@pobox.com wrote in message
news:20030402110858.C19790@linnet.org…
On Wed, Apr 02, 2003 at 07:28:20AM +0900, Greg Brondo wrote:
Why is ruby (on windows) so much slower at reading lines in a file (as
compared to perl, python or java) ? I’ve installed 1.8 preview 2 and it’s
much faster but still not nearly as fast as the other languages mentioned.
Can you post the exact programs that you are timing in each language?
Regards,
Brian.
I wonder why Fixnum#== is slower than % or + ? I think that’s just an
artefact of the profiler.
$ time ruby -e '1_000_000.times { 1 == 1 }'
real 0m2.366s
user 0m2.292s
sys 0m0.001s
$ time ruby -e '1_000_000.times { 1 + 1 }'
real 0m2.366s
user 0m2.302s
sys 0m0.000s
and repeat to be sure:
$ time ruby -e '1_000_000.times { 1 == 1 }'
real 0m2.440s
user 0m2.291s
sys 0m0.000s
$ time ruby -e '1_000_000.times { 1 + 1 }'
real 0m2.426s
user 0m2.306s
sys 0m0.000s
The source for fix_equal could hardly be simpler.
Cheers,
Brian.
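The same comparison can also be made inside a single Ruby process, which leaves interpreter start-up out of the figures. This is a rough sketch assuming a hypothetical helper `time_loop`; timings depend on the machine and Ruby version, so no expected output is shown.

```ruby
# Sketch only: `time_loop` is an illustrative helper, not from the thread.

# Call the given block `n` times and return the elapsed seconds.
def time_loop(n)
  start = Time.now
  n.times { yield }
  Time.now - start
end

# Only runs when an iteration count is given, e.g. `ruby sketch.rb 1000000`.
if __FILE__ == $0 && ARGV[0]
  n = Integer(ARGV[0])
  puts "1 == 1 : #{time_loop(n) { 1 == 1 }} s"
  puts "1 + 1  : #{time_loop(n) { 1 + 1 }} s"
  puts "1 % 1  : #{time_loop(n) { 1 % 1 }} s"
end
```

If the three figures come out close together, as they did with the `time` runs, that would support the view that the profiler's per-call bookkeeping, rather than `Fixnum#==` itself, explains the difference in the profile.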
On Thu, Apr 03, 2003 at 06:47:49PM +0900, Peter Hickman wrote:
13.47 8.46 1.41 45425 0.03 0.03 Fixnum#==
9.65 9.47 1.01 45425 0.02 0.02 Fixnum#%
9.36 10.45 0.98 45425 0.02 0.02 Fixnum#+
Cheers. I’ve run them here under FreeBSD-4.7 running on a Sony Vaio laptop,
Pentium 266MMX. For the data file, I’m using:
$ wc /usr/share/dict/words
235881 235881 2493066 /usr/share/dict/words
(with an average line length of only 10.6 bytes it’s perhaps not the most
realistic example, but it will do)
Running the code you gave, I get the following times averaged over three
runs:
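Ruby (1.6.8)            6.81 secs
Ruby (1.8.0p2)          6.57 secs
Python (2.2.1)          6.72 secs *
Java (jdk-1.3.1p6_4)   16.76 secs **
*  it printed “Elapsed = 6”, so the time given is using the ‘time’ command;
   this is slightly unfair to python as it includes the interpreter start-up
   time which the Ruby time doesn’t.
** printed “Elapsed = 16”, ditto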
Note however that I am running under X, and it seems quite a lot of time is
spent just writing the progress messages to the xterm. If I add ‘>/dev/null’
to the end of the command line and use unix ‘time’ then I get:
Ruby (1.6.8) 3.8 s
Ruby (1.8.0p2) 3.4 s
Python (2.2.1) 3.7 s
Java (jdk-1.3.1p6_4) 8.6 s
So on this basis Ruby stacks up pretty well against the languages you
mention, running under Unix. From your figures it seems that Ruby is slower
under Windows, and I would guess that this is somehow related to the way
Ruby performs I/O, or perhaps to the threading support (e.g. making extra
calls to select() before I/O operations). Maybe a Windows developer could
compare the Python and Ruby I/O source code and see if there are significant
differences in the way they talk to Windows.
Out of interest, I also ported your code to Perl (attached), and also
modified the Ruby code to use a more iterative style to be closer to Perl. I
get:
                   (to screen)   (to /dev/null)
Perl (5.005_03)     3.9 secs      2.3 secs
Ruby (1.6.8)        6.5 secs      3.6 secs
So it seems that Perl is significantly faster in this simple test, but also
the block/yield approach does not by itself add a great deal of overhead
(fortunately!)
Regards,
Brian.
x.pl (237 Bytes)
x2.rb (226 Bytes)
On Thu, Apr 03, 2003 at 12:51:28AM +0900, Greg Brondo wrote:
Here they are:
Hi Greg,
out of curiosity, could you test the attached Java class and tell us
whether it behaves differently than the one you provided? Thanks!
Regards
robert
Test.java (1.14 KB)
Hi,
At Thu, 3 Apr 2003 07:35:25 +0900, Brian Candler wrote:
$ wc /usr/share/dict/words
235881 235881 2493066 /usr/share/dict/words
I suspect it is related to GC. Note that Java is also slower
than Python in [ruby-talk:68500], and Greg uses much larger data
as test input. Both Ruby and Java use mark-and-sweep, whereas
Python uses reference counting, and the latter might have an advantage
in this case, where many small objects are created and disposed of soon after.
–
Nobu Nakada
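This hypothesis could be probed by running the same read loop once with the garbage collector on and once with it switched off. A sketch under that assumption: `timed_read` is a hypothetical name, while `GC.start`, `GC.disable` and `GC.enable` are standard Ruby calls. With the collector off, memory grows for the whole read, so this is only sensible as an experiment on a file that fits comfortably in RAM.

```ruby
# Sketch only: `timed_read` is an illustrative helper, not from the thread.

# Read the file line by line and return [line count, elapsed seconds].
# When gc_enabled is false, the collector is switched off during the read.
def timed_read(path, gc_enabled = true)
  GC.start                       # begin each run from a freshly collected heap
  GC.disable unless gc_enabled
  lc = 0
  start = Time.now
  File.open(path) do |f|
    lc += 1 while f.gets
  end
  [lc, Time.now - start]
ensure
  GC.enable                      # always restore the collector afterwards
end

# Only runs when a file name is given on the command line.
if __FILE__ == $0 && ARGV[0]
  lines, with_gc = timed_read(ARGV[0], true)
  _, without_gc  = timed_read(ARGV[0], false)
  puts "#{lines} lines"
  puts "GC enabled:  #{with_gc} s"
  puts "GC disabled: #{without_gc} s"
end
```

A large gap between the two figures would point towards collection cost; a small one would suggest looking at the I/O path instead.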
Be aware that perl cheats unmercifully at IO, to the extent of
peeking beneath the C IO library hood and acting directly on the
underlying buffer structure. While somewhat evil (okay, it’s very
evil) it does cut out at least one level of function calls for IO,
and speeds things up some.
At 7:35 AM +0900 4/3/03, Brian Candler wrote:
Out of interest, I also ported your code to Perl (attached), and also
modified the Ruby code to use a more iterative style to be closer to Perl. I
get:
                   (to screen)   (to /dev/null)
Perl (5.005_03)     3.9 secs      2.3 secs
Ruby (1.6.8)        6.5 secs      3.6 secs
So it seems that Perl is significantly faster in this simple test, but also
the block/yield approach does not by itself add a great deal of overhead
(fortunately!)
–
Dan
--------------------------------------“it’s like this”-------------------
Dan Sugalski even samurai
dan@sidhe.org have teddy bears and even
teddy bears get drunk
Brian Candler wrote:
On Thu, Apr 03, 2003 at 12:51:28AM +0900, Greg Brondo wrote:
Here they are:
Cheers. I’ve run them here under FreeBSD-4.7 running on a Sony Vaio laptop,
Pentium 266MMX. For the data file, I’m using:
$ wc /usr/share/dict/words
235881 235881 2493066 /usr/share/dict/words
(with an average line length of only 10.6 bytes it’s perhaps not the most
realistic example, but it will do)
Running the code you gave, I get the following times averaged over three
runs:
Ruby (1.6.8)            6.81 secs
Ruby (1.8.0p2)          6.57 secs
Python (2.2.1)          6.72 secs *
Java (jdk-1.3.1p6_4)   16.76 secs **
*  it printed “Elapsed = 6”, so the time given is using the ‘time’ command;
   this is slightly unfair to python as it includes the interpreter start-up
   time which the Ruby time doesn’t.
** printed “Elapsed = 16”, ditto
Note however that I am running under X, and it seems quite a lot of time is
spent just writing the progress messages to the xterm. If I add ‘>/dev/null’
to the end of the command line and use unix ‘time’ then I get:
Ruby (1.6.8)            3.8 s
Ruby (1.8.0p2)          3.4 s
Python (2.2.1)          3.7 s
Java (jdk-1.3.1p6_4)    8.6 s
So on this basis Ruby stacks up pretty well against the languages you
mention, running under Unix. From your figures it seems that Ruby is slower
under Windows, and I would guess that this is somehow related to the way
Ruby performs I/O, or perhaps to the threading support (e.g. making extra
calls to select() before I/O operations). Maybe a Windows developer could
compare the Python and Ruby I/O source code and see if there are significant
differences in the way they talk to Windows.
Out of interest, I also ported your code to Perl (attached), and also
modified the Ruby code to use a more iterative style to be closer to Perl. I
get:
                   (to screen)   (to /dev/null)
Perl (5.005_03)     3.9 secs      2.3 secs
Ruby (1.6.8)        6.5 secs      3.6 secs
So it seems that Perl is significantly faster in this simple test, but also
the block/yield approach does not by itself add a great deal of overhead
(fortunately!)
Regards,
Brian.
filename=ARGV[0]
lc = 0
starttime = Time.now.to_f
f = File.open(filename)
while f.gets
  lc += 1
  if lc % 1000 == 0
    puts "Read #{lc} lines"
  end
end
stoptime = Time.now.to_f
puts "Elapsed = " + (stoptime - starttime).to_s
Hmm… I tried it under Ruby 1.6.8 compiled with GCC on Solaris 2.8 and
it was still slow. I’ll try it again on Linux (Debian Unstable) and see
what I get.
Robert Klemme wrote:
Hi Greg,
out of curiosity, could you test the attached Java class and tell us
whether it behaves differently than the one you provided? Thanks!
Regards
robert
package io;
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
/**
 * Read performance test.
 * @author robert.klemme
 * @created 03.04.2003
 * @version $Id$
 */
public class Test {
    public static void main( String[] args ) {
        try {
            BufferedReader in =
                new BufferedReader(
                    new InputStreamReader(
                        new BufferedInputStream(
                            new FileInputStream( args[0] ), 4096 ) ) );
            int lc = 0;
            String line;
            long starttime = System.currentTimeMillis() / 1000;
            while ( ( line = in.readLine() ) != null ) {
                lc += 1;
                if ( ( lc % 1000 ) == 0 ) {
                    System.out.println( "Read " + lc + " lines" );
                }
            }
            in.close();
            long stoptime = System.currentTimeMillis() / 1000;
            System.out.println( "Elapsed = " + ( stoptime - starttime ) );
        }
        catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}
Robert, tried the new Java class. Speed up read by about 4 seconds.
BTW – thanks all for answering my question. Here’s my reason for
asking in the first place: I really like the promise and application of
Ruby so I’m trying to begin using it more and more in my daily work.
I’m a Python hack (and a Java Hack as well) but I really like the
cleanness and builtin regex that Ruby provides.
Anyway, I’m going to run some more tests now with the same dataset
without writing to the screen to see what I get. I’ll post the results
soon.
Thanks again!
Greg Brondo
Dan Sugalski wrote:
> Be aware that perl cheats unmercifully at IO, to the extent of
> peeking beneath the C IO library hood and acting directly on the
> underlying buffer structure. While somewhat evil (okay, it's very
> evil) it does cut out at least one level of function calls for IO,
> and speeds things up some.
> --
> Dan
This must be the vaunted Perl IO layer that I’ve seen mentioned from
time to time. Any reason Ruby couldn’t “cheat” as well? Anything Ruby
can borrow here?
Dan
--
a = [74, 117, 115, 116, 32, 65, 110, 111, 116, 104, 101, 114, 32, 82]
a.push(117,98, 121, 32, 72, 97, 99, 107, 101, 114)
puts a.pack("C*")
“Greg Brondo” greg@brondo.com wrote in message
news:weZia.3906$2x2.1766565@dca1-nnrp1.news.algx.net…
Robert Klemme wrote:
Hi Greg,
out of curiosity, could you test the attached Java class and tell us
whether it behaves differently than the one you provided? Thanks!
Robert, tried the new Java class. Speed up read by about 4 seconds.
So this is an improvement from 8.6 secs or from 16.76 secs?
BTW – thanks all for answering my question. Here’s my reason for
asking in the first place: I really like the promise and application of
Ruby so I’m trying to begin using it more and more in my daily work.
I’m a Python hack (and a Java Hack as well) but I really like the
cleanness and builtin regex that Ruby provides.
Yes, that’s true.
Anyway, I’m going to run some more tests now with the same dataset
without writing to the screen to see what I get. I’ll post the results
soon.
… including the original or the modified Java class?
robert
Interesting. In what way is it so evil, given that Perl successfully
runs on many platforms?
And why is such evilness even necessary, given that the C IO libraries
should already be as fast as possible? If it’s just “cutting out at
least one level of function call”, then that function call must be in
a pretty tight loop to make any difference, suggesting that the C IO
libraries are not in fact optimised.
And, as an experienced Perl internals hacker, do you have any
suggestion for speeding up Ruby’s IO?
Cheers,
Gavin
On Friday, April 4, 2003, 12:10:00 AM, Dan wrote:
So it seems that Perl is significantly faster in this simple test, but also
the block/yield approach does not by itself add a great deal of overhead
(fortunately!)
Be aware that perl cheats unmercifully at IO, to the extent of
peeking beneath the C IO library hood and acting directly on the
underlying buffer structure. While somewhat evil (okay, it’s very
evil) it does cut out at least one level of function calls for IO,
and speeds things up some.
Dan Sugalski wrote:
> Be aware that perl cheats unmercifully at IO, to the extent of
> peeking beneath the C IO library hood and acting directly on the
> underlying buffer structure. While somewhat evil (okay, it's very
> evil) it does cut out at least one level of function calls for IO,
> and speeds things up some.
> --
> Dan
This must be the vaunted Perl IO layer that I’ve seen mentioned from
time to time.
Nope. That’s more a Tcl/SysV style streams thing, and is relatively
new. This is old code, and has been in perl for years.
Any reason Ruby couldn’t “cheat” as well? Anything Ruby
can borrow here?
Sure, license willing. (And I’m pretty sure it is, but I’d
double-check with Larry and Matz first) The code’s in sv.c, in the
Perl_sv_gets function, though there’s a fair amount of gook and macro
expansion you may have to do to actually make sense of it. The code’s
in there, though. Evil, definitely evil, but it is there.
Dan
And why is such evilness even necessary, given that the C IO libraries
should already be as fast as possible?
If you work directly with the buffer associated with the IO, this can be
faster than calls to the IO libraries.
Guy Decoux
Robert Klemme wrote:
“Greg Brondo” greg@brondo.com wrote in message
news:weZia.3906$2x2.1766565@dca1-nnrp1.news.algx.net…
Robert Klemme wrote:
Hi Greg,
out of curiosity, could you test the attached Java class and tell us
whether it behaves differently than the one you provided? Thanks!
Robert, tried the new Java class. Speed up read by about 4 seconds.
So this is an improvement from 8.6 secs or from 16.76 secs?
BTW – thanks all for answering my question. Here’s my reason for
asking in the first place: I really like the promise and application of
Ruby so I’m trying to begin using it more and more in my daily work.
I’m a Python hack (and a Java Hack as well) but I really like the
cleanness and builtin regex that Ruby provides.
Yes, that’s true.
Anyway, I’m going to run some more tests now with the same dataset
without writing to the screen to see what I get. I’ll post the results
soon.
… including the original or the modified Java class?
robert
Ok, latest results. This time I tried 1.6.8-cygwin on the same machine,
with a slightly larger BIG file (avg line length = 46):
Elapsed = 8.83099997
Line count = 1638391
Seems the IO using Cygwin1.dll is much faster than the stock 1.6.8 mswin
compiled version.
So it seems that Perl is significantly faster in this simple test, but also
the block/yield approach does not by itself add a great deal of overhead
(fortunately!)
Be aware that perl cheats unmercifully at IO, to the extent of
peeking beneath the C IO library hood and acting directly on the
underlying buffer structure. While somewhat evil (okay, it’s very
evil) it does cut out at least one level of function calls for IO,
and speeds things up some.
Interesting. In what way is it so evil, given that Perl successfully
runs on many platforms?
It peeks into the undocumented internals of the C stdio library. It
only works because most C RTLs all descend from the same code base,
presumably dating back to the old K&R days. If a platform doesn’t use
the scheme we know about, then we can’t use the hack. And if for some
reason the internals of the C RTL changed the hack would break.
And why is such evilness even necessary, given that the C IO libraries
should already be as fast as possible? If it’s just “cutting out at
least one level of function call”, then that function call must be in
a pretty tight loop to make any difference, suggesting that the C IO
libraries are not in fact optimised.
Or it suggests that function calls have more overhead than you think,
or that a generalized interface is slower than a specific one, or
that error checking is expensive, or…
Perl essentially finds the C RTL’s underlying buffers and memcpy’s
the data out of it, twiddling the internal pointers and counts as it
goes. Can’t get much faster than that, short of having the system do
a DMA read right into a buffer that doesn’t have to move.
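Ruby code cannot reach into the C library's buffer the way described here, but the underlying idea (fewer calls per line, more work per buffer) can be approximated by reading large chunks and scanning them for newlines. This is an analogy only, not Perl's technique; `count_lines_chunked` and `CHUNK_SIZE` are names made up for the sketch, and the 64 KB chunk size is an arbitrary assumption.

```ruby
# Sketch only: names and chunk size are illustrative assumptions.

CHUNK_SIZE = 64 * 1024

# Count lines by reading fixed-size chunks and counting newline bytes,
# instead of asking the IO layer for one line at a time.
def count_lines_chunked(path, chunk_size = CHUNK_SIZE)
  count = 0
  last = nil
  File.open(path, 'rb') do |f|
    # IO#read with a length returns nil at end of file.
    while (chunk = f.read(chunk_size))
      count += chunk.count("\n")
      last = chunk[-1, 1]
    end
  end
  # A final line without a trailing newline still counts as a line.
  count += 1 if last && last != "\n"
  count
end

# Only runs when a file name is given on the command line.
if __FILE__ == $0 && ARGV[0]
  start = Time.now
  lines = count_lines_chunked(ARGV[0])
  puts "Line count = #{lines}"
  puts "Elapsed = #{Time.now - start}"
end
```

This only counts lines; a program that needs each line as a separate string pays for creating those strings again, so any gain seen here is an upper bound on what chunking alone can offer.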
And, as an experienced Perl internals hacker, do you have any
suggestion for speeding up Ruby’s IO?
I’ve not looked into how Ruby does its I/O, so I’m not comfortable
making any meaningful suggestions there.
At 6:13 PM +0900 4/4/03, Gavin Sinclair wrote:
On Friday, April 4, 2003, 12:10:00 AM, Dan wrote:
–
Dan
Dan Sugalski wrote:
Dan Sugalski wrote:
Be aware that perl cheats unmercifully at IO, to the extent of
peeking beneath the C IO library hood and acting directly on the
underlying buffer structure. While somewhat evil (okay, it’s very
evil) it does cut out at least one level of function calls for IO,
and speeds things up some.
Dan
This must be the vaunted Perl IO layer that I’ve seen mentioned from
time to time.
Nope. That’s more a Tcl/SysV style streams thing, and is relatively new.
This is old code, and has been in perl for years.
Any reason Ruby couldn’t “cheat” as well? Anything Ruby
can borrow here?
Sure, license willing. (And I’m pretty sure it is, but I’d double-check
with Larry and Matz first) The code’s in sv.c, in the Perl_sv_gets
function, though there’s a fair amount of gook and macro expansion you
may have to do to actually make sense of it. The code’s in there,
though. Evil, definitely evil, but it is there.
Ok. New results:
Both files contain 1,572,682 lines.
BIG line file avg line length = 46
SMALL line file avg line length = 11
All tests without writing to screen (only elapsed time, in seconds, written
at end of run):
py22 BIG = 5
py22 SMALL = 3
ruby168 BIG = 319
ruby168 SMALL = 95
ruby18p2 BIG = 12
ruby18p2 SMALL = 12
Java BIG = 7
Java SMALL = 3
FYI –
Greg Brondo
At 1:04 AM +0900 4/4/03, Daniel Berger wrote:
But surely, for the operations that matter to Ruby (say “read” and
“write”), the overhead of calling C’s IO library must be - what -
1E-9 seconds?
If the libraries are fast enough for every other C program, why are
they not fast enough for Perl - just another C program?
Or is it more than just “read” and “write” that’s at stake here?
Gavin
On Friday, April 4, 2003, 7:37:49 PM, ts wrote:
And why is such evilness even necessary, given that the C IO libraries
should already be as fast as possible?
If you work directly with the buffer associated with the IO, this can be
faster than calls to the IO libraries.