Hi, Rubyists.
I’ve been playing with some cross-language comparisons in preparation for a
talk I will be giving at work on Ruby. I was playing with a simple word count
program and made the following (see below). It performs pretty well compared
with Java, which kind of pleasantly surprised me ;-).
My question is this: 30% of the time is spent in Array#each. Is there any
way to speed this up and still keep some semblance of readability?
I was also interested to find that the line count was one out. I used RFC1000
as input and the last line has ^L-EOF. I ignore these chars in the C and Java
versions.
Regards,
-mark.
------------------------------
#! /usr/bin/ruby
···
if ARGV.size != 1
  puts "usage: #{$0} "
  exit(1)
end
f = File.readlines(ARGV[0])
nl = f.length
nw = nc = 0
f.each { |line|
  nw += line.split.size
  nc += line.length
}
puts " #{nl} #{nw} #{nc} #{ARGV[0]}"
-------------------------
$ time ruby wc.rb …/rfc.txt
8642 40156 315315 …/rfc.txt
real 0m0.450s user 0m0.030s sys 0m0.010s
$ time wc1.exe …/rfc.txt
8641 40305 315315 …/rfc.txt
real 0m0.141s user 0m0.020s sys 0m0.020s
$ time java wc …/rfc.txt
8641 40156 323956 …/rfc.txt
real 0m0.420s user 0m0.010s sys 0m0.040s
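A likely explanation for the line count being one out: File.readlines returns every chunk of text as an array element, including a final chunk that has no trailing newline, whereas wc-style counting only counts newline characters. The sketch below uses a made-up sample string (not the RFC itself) that ends in a form feed with no newline after it, as described in the post above.

```ruby
# Hypothetical sample: two newline-terminated lines followed by a form feed
# (^L) and then end of input, with no final newline.
sample = "first line\nsecond line\n\f"

# Splitting into lines keeps the unterminated "\f" as its own element,
# which is how File.readlines would treat the end of such a file.
lines_as_array = sample.lines.size

# Counting newline characters ignores the unterminated tail.
newline_count = sample.count("\n")

puts "lines as array: #{lines_as_array}"
puts "newline count:  #{newline_count}"
```

If matching wc matters, counting "\n" characters instead of array elements should remove the difference.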
“Mark Probert” probertm@nortelnetworks.com wrote in message
news:5.1.0.14.2.20021204171209.020808e8@zcard04k.ca.nortel.com…
What are you measuring Ruby’s speed for?
Are you seriously intending to use Ruby for such tasks as processing images
in real time?
I see Ruby as a perfect language for building sophisticated programs out of
simple but fast blocks of native code.
Mark Probert probertm@nortelnetworks.com writes:
My question is this: 30% of the time is spent in Array#each. Is there any
way to speed this up and still keep some semblance of readability?
Don’t use Array#each.
#! /usr/bin/ruby
···
if ARGV.size != 1
  puts "usage: #{$0} "
  exit(1)
end
f = File.open(ARGV[0]){|fh| fh.read}
nl = f.count("\n")
nw = f.tr_s("^\t\n\v\f\r ", "x").count("x")
nc = f.size
puts " #{nl} #{nw} #{nc} #{ARGV[0]}"
--
eban
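To make the tr_s trick above easier to follow: the leading ^ negates the set, so every run of non-whitespace characters is translated to "x" and squeezed down to a single "x", leaving one "x" per word. A small sketch on a made-up string follows; the helper name count_words_tr and the sample text are only for illustration.

```ruby
# Hypothetical helper wrapping the tr_s approach from the post above.
# "^\t\n\v\f\r " means "every character that is NOT one of these whitespace
# characters"; tr_s replaces each run of such characters with a single "x".
def count_words_tr(text)
  text.tr_s("^\t\n\v\f\r ", "x").count("x")
end

sample = "the quick  brown\tfox\njumps over\n"

words_by_tr    = count_words_tr(sample)
words_by_split = sample.split.size   # the per-line approach, applied to the whole string

puts "tr_s count:  #{words_by_tr}"
puts "split count: #{words_by_split}"
```

Both counts should agree on ordinary text; the tr_s version avoids building an array of words.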
Mark Probert probertm@nortelnetworks.com writes:
f = File.readlines(ARGV[0])
I think if you use #sysread, it will improve the time too.
jenny:~> ruby tmp.rb rfc2571.txt
                      user     system      total        real
using readlines   5.460000   0.140000   5.600000 (  5.720842)
using sysread     4.720000   0.180000   4.900000 (  5.006272)
The time difference is small since most of the time is spent in
manipulating Strings and Arrays. However, with larger files, the
difference will become obvious, especially in simple operations like
file copying, etc.
YS.
tmp.rb (855 Bytes)
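The attached tmp.rb is not reproduced in the thread, so the following is only a guess at what such a comparison might look like, not the original script. It assumes a hypothetical helper read_with_sysread, a made-up chunk size, and a generated temporary file in place of the RFC text.

```ruby
require "benchmark"
require "tempfile"

# Hypothetical helper: read a whole file through unbuffered IO#sysread calls.
# The 64 KiB chunk size is an arbitrary assumption.
def read_with_sysread(path, chunk_size = 64 * 1024)
  data = String.new
  File.open(path, "rb") do |fh|
    begin
      loop { data << fh.sysread(chunk_size) }
    rescue EOFError
      # sysread signals end of file by raising EOFError
    end
  end
  data
end

Tempfile.create("wc_sample") do |tmp|
  # Made-up input; substitute a real file path to measure something meaningful.
  tmp.write("some words on a line\n" * 10_000)
  tmp.flush

  Benchmark.bm(16) do |bm|
    bm.report("using readlines") do
      File.readlines(tmp.path).each { |line| line.split.size }
    end
    bm.report("using sysread") do
      read_with_sysread(tmp.path).each_line { |line| line.split.size }
    end
  end
end
```

Timings will vary by machine and Ruby version, so treat any numbers it prints as relative rather than comparable to the figures quoted above.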