Nasty string growth performance

Hello all,

I am running:

ruby 1.8.0 (2003-08-04) [i386-linux-gnu]

I am just writing to ask about and describe a really terrible
performance issue. I ran across this writing some code to
process mail folders. I wanted a method to gather up the
entire body of a mail message into one object, and initially
I chose to use a string as follows:

    line = nil
    buf = ""
        while line = @file.gets
            if line =~ /^From /
                @pushback = line
                break
            end
            buf += line
        end

Processing a folder with 2355 mail messages with this took
about 386 seconds on my 1Ghz pentium machine.

Then, I decided to use an array of lines rather than a really
big string, as follows:

    line = nil
    buf = Array.new
        while line = @file.gets
            if line =~ /^From /
                @pushback = line
                break
            end
            buf.push line
        end

This code processes the same folder in 9.4 seconds.
(about 40 times faster than the above).

We first noticed this problem when another (and entirely
different) ruby application began consuming huge amounts
of CPU resources, and before we were able to really
understand what was going on …
I hang my head in shame when I admit it …
we threw our hands up and recoded it in Perl.

    Tom
···


Tom Trebisky
MMT Observatory
University of Arizona – Tucson
tom@mmto.org

Tom Trebisky wrote:

  buf = ""

            buf += line

and

  buf = Array.new

            buf.push line

are not comparable constructs.

The first one is the same as

buf = buf + line

which constructs a new string each time. The second one modifies the
existing array in place, which is why you saw better performance.

What you probably want is

buf << line

This just concatenates line on the end of the buf string, modifying buf
in place. An advantage of this is that the same code works whether buf
is a String or an Array (Array#<< is the same as Array#push).

[snip]

Then, I decided to use an array of lines rather than a really big string

ruby a.rb
% cumulative self self total
time seconds seconds calls ms/call ms/call name
47.05 2.74 2.74 6 457.03 971.35 Integer#times
32.17 4.62 1.88 10000 0.19 0.19 String#+
10.19 5.21 0.59 10000 0.06 0.06 String#<<
7.51 5.65 0.44 5000 0.09 0.09 Array#+
3.08 5.83 0.18 5000 0.04 0.04 Array#<<
0.13 5.84 0.01 1 7.81 7.81 Profiler__.start_profile
0.00 5.84 0.00 1 0.00 679.69 Object#t_string_concat3
0.00 5.84 0.00 1 0.00 835.94 Object#t_string_append3
0.00 5.84 0.00 1 0.00 828.12 Object#t_array_append20
0.00 5.84 0.00 1 0.00 664.06 Object#t_array_concat20
0.00 5.84 0.00 6 0.00 0.00 Module#method_added
0.00 5.84 0.00 1 0.00 2132.81 Object#t_string_append20
0.00 5.84 0.00 1 0.00 5828.12 #toplevel
0.00 5.84 0.00 1 0.00 687.50 Object#t_string_concat20
expand -t2 a.rb
require ‘profile’
def t_string_concat3
str = “”
5000.times{|n| str << “abc”}
end
def t_string_concat20
str = “”
5000.times{|n| str << “abcabcabcabcabcabcab”}
end
def t_string_append3
str = “”
5000.times{|n| str += “abc”}
end
def t_string_append20
str = “”
5000.times{|n| str += “abcabcabcabcabcabcab”}
end
def t_array_concat20
ary =
5000.times{|n| ary << “abcabcabcabcabcabcab”}
end
def t_array_append20
ary =
5000.times{|n| ary += [“abcabcabcabcabcabcab”] }
end

t_string_concat3
t_string_concat20
t_string_append3
t_string_append20
t_array_concat20
t_array_append20

The fastest is Array.<<(element).

The slowest is String.+=(other_string).

···

Tom Trebisky tom@mmto.org wrote:


Simon Strandgaard