Hello all,
I am running:
ruby 1.8.0 (2003-08-04) [i386-linux-gnu]
I am just writing to ask about and describe a really terrible
performance issue. I ran across this writing some code to
process mail folders. I wanted a method to gather up the
entire body of a mail message into one object, and initially
I chose to use a string as follows:
line = nil
buf = ""
while line = @file.gets
if line =~ /^From /
@pushback = line
break
end
buf += line
end
Processing a folder with 2355 mail messages with this took
about 386 seconds on my 1Ghz pentium machine.
Then, I decided to use an array of lines rather than a really
big string, as follows:
line = nil
buf = Array.new
while line = @file.gets
if line =~ /^From /
@pushback = line
break
end
buf.push line
end
This code processes the same folder in 9.4 seconds.
(about 40 times faster than the above).
We first noticed this problem when another (and entirely
different) ruby application began consuming huge amounts
of CPU resources, and before we were able to really
understand what was going on …
I hang my head in shame when I admit it …
we threw our hands up and recoded it in Perl.
Tom
···
–
Tom Trebisky
MMT Observatory
University of Arizona – Tucson
tom@mmto.org
Tom Trebisky wrote:
buf = ""
…
buf += line
and
buf = Array.new
…
buf.push line
are not comparable constructs.
The first one is the same as
buf = buf + line
which constructs a new string each time. The second one modifies the
existing array in place, which is why you saw better performance.
What you probably want is
buf << line
This just concatenates line on the end of the buf string, modifying buf
in place. An advantage of this is that the same code works whether buf
is a String or an Array (Array#<< is the same as Array#push).
[snip]
Then, I decided to use an array of lines rather than a really big string
ruby a.rb
% cumulative self self total
time seconds seconds calls ms/call ms/call name
47.05 2.74 2.74 6 457.03 971.35 Integer#times
32.17 4.62 1.88 10000 0.19 0.19 String#+
10.19 5.21 0.59 10000 0.06 0.06 String#<<
7.51 5.65 0.44 5000 0.09 0.09 Array#+
3.08 5.83 0.18 5000 0.04 0.04 Array#<<
0.13 5.84 0.01 1 7.81 7.81 Profiler__.start_profile
0.00 5.84 0.00 1 0.00 679.69 Object#t_string_concat3
0.00 5.84 0.00 1 0.00 835.94 Object#t_string_append3
0.00 5.84 0.00 1 0.00 828.12 Object#t_array_append20
0.00 5.84 0.00 1 0.00 664.06 Object#t_array_concat20
0.00 5.84 0.00 6 0.00 0.00 Module#method_added
0.00 5.84 0.00 1 0.00 2132.81 Object#t_string_append20
0.00 5.84 0.00 1 0.00 5828.12 #toplevel
0.00 5.84 0.00 1 0.00 687.50 Object#t_string_concat20
expand -t2 a.rb
require ‘profile’
def t_string_concat3
str = “”
5000.times{|n| str << “abc”}
end
def t_string_concat20
str = “”
5000.times{|n| str << “abcabcabcabcabcabcab”}
end
def t_string_append3
str = “”
5000.times{|n| str += “abc”}
end
def t_string_append20
str = “”
5000.times{|n| str += “abcabcabcabcabcabcab”}
end
def t_array_concat20
ary =
5000.times{|n| ary << “abcabcabcabcabcabcab”}
end
def t_array_append20
ary =
5000.times{|n| ary += [“abcabcabcabcabcabcab”] }
end
t_string_concat3
t_string_concat20
t_string_append3
t_string_append20
t_array_concat20
t_array_append20
The fastest is Array.<<(element).
The slowest is String.+=(other_string).
···
Tom Trebisky tom@mmto.org wrote:
–
Simon Strandgaard