String#+ operator broken?

Hi:

I have some code sitting around that used the notation
(Yes, it was written long ago before I knew better)

s = ""
s += "whatever"

I tried to use it today with 1.8.0, but my program
went wild on memory usage.
(Now, remember, this code worked fine under 1.6.7 and
possibly 1.7.3.)

Simply changing it to

s = ""
s << "whatever"

fixes the problem. So, I did a little test:

s = ""
100000.times { |i|
  s << "fred\n"   # this completes in less than 1 second
  #s += "fred\n"  # this blows up real fast
}

Is there a problem with String#+= or is it something
that we should avoid?

···


Jim Freeze

To be more clear, ‘blows up’ refers to time and not necessarily memory.

···

On Wednesday, 19 February 2003 at 9:47:53 +0900, Jim Freeze wrote:

100000.times { |i|
  s << "fred\n"   # this completes in less than 1 second
  #s += "fred\n"  # this blows up real fast


Jim Freeze

If you push the “extra ice” button on the soft drink vending machine,
you won’t get any ice. If you push the “no ice” button, you’ll get
ice, but no cup.

Hi,

···

In message “String#+ operatorebroken?” on 03/02/19, Jim Freeze jim@freeze.org writes:

I have some code sitting around that used the notation
(Yes, it was written long ago before I knew better)

s = ""
s += "whatever"

Remember, s += "whatever" is syntactic sugar for

s = s + "whatever"

so it allocates a new string object every time, whereas

s << "whatever"

appends "whatever" to the existing string object.

						matz.
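
A minimal sketch of the distinction described above, checking object identity with equal?. The two method names are hypothetical, chosen only for this illustration; String.new is used so the starting string is always mutable.

```ruby
# Returns true if the variable still refers to the same object after +=.
def same_object_after_plus_equals?
  s = String.new
  original = s
  s += "whatever"      # expands to s = s + "whatever", which builds a new String
  s.equal?(original)
end

# Returns true if the variable still refers to the same object after <<.
def same_object_after_append?
  s = String.new
  original = s
  s << "whatever"      # mutates the receiver in place
  s.equal?(original)
end

puts same_object_after_plus_equals?  # false
puts same_object_after_append?       # true
```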

Jim Freeze jim@freeze.org writes:

I have some code sitting around that used the notation
(Yes, it was written long ago before I knew better)

s = ""
s += "whatever"

I tried to use it today with 1.8.0, but my program went wild on
memory usage. (Now, remember, this code worked fine under 1.6.7 and
possibly 1.7.3.)

What version do you mean by 1.8.0? I.e. what is printed with:

ruby -e "p RUBY_RELEASE_DATE"

The current CVS version of ruby does not “blow up real fast” with the
“+=” case of your test program below. Perhaps you are encountering a
bug that has been since fixed?

···

s = ""
100000.times { |i|
  s << "fred\n"   # this completes in less than 1 second
  #s += "fred\n"  # this blows up real fast
}


matt

Remember, s += "whatever" is syntactic sugar for

s = s + "whatever"

so it allocates a new string object every time, whereas

s << "whatever"

appends "whatever" to the existing string object.

  					matz.

Hmm. Would it make sense to ‘special-case’ that? They accomplish the same thing logically
(right?), and I would expect a language to evolve out these hidden ‘performance-bombs’. In fact,
anytime there’s an operator= method, maybe it should never allocate a new object?

Jason

Jim Freeze jim@freeze.org writes:

···

On Wednesday, 19 February 2003 at 9:47:53 +0900, Jim Freeze wrote:

100000.times { |i|
  s << "fred\n"   # this completes in less than 1 second
  #s += "fred\n"  # this blows up real fast

To be more clear, ‘blows up’ refers to time and not necessarily memory.

The “+=” version runs really slow in both ruby 1.6.8 and current CVS
(1.8.0). The “<<” version runs much faster in both.


matt
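
For anyone who wants to reproduce the timing difference, here is a rough sketch using the standard Benchmark library. The method names and the iteration count N are assumptions made for this example. N is kept smaller than the 100000 used in the thread so the "+=" case finishes quickly, and actual timings will depend on the Ruby version and machine.

```ruby
require "benchmark"

N = 10_000  # hypothetical iteration count, smaller than the thread's 100000

# Builds the string with +=, creating a new String on every iteration.
def build_with_plus_equals(n)
  s = String.new
  n.times { s += "fred\n" }
  s
end

# Builds the string with <<, appending to the same String each time.
def build_with_append(n)
  s = String.new
  n.times { s << "fred\n" }
  s
end

Benchmark.bm(4) do |bm|
  bm.report("+=") { build_with_plus_equals(N) }
  bm.report("<<") { build_with_append(N) }
end
```

Both methods produce the same string; only the number of intermediate objects differs.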

ruby -e "p RUBY_RELEASE_DATE"
"2003-02-12"

Ok, I have boiled down the code that is causing the problem.
To the best of my recollection, the problem did not occur
on 1.6.7. Here is the code:

#!/usr/bin/env ruby

class App
  def initialize
    @structs = []
    line = "fred "*20
    # Comment out the line below and memory will not blow up.
    100.times { @structs << line.split }
  end # initialize

  def run
    s = ""
    s200 = "a"*200
    9080.times { s += s200 }
    puts s.size # should be 1816000
  end # run
end # class App

App.new.run

This app grew to 380MB in 1 minute on my computer. A few tweaks
and it can grow to over 1GB or hover around 17MB.

Changing ‘line = "fred "*20’ to ‘line = "fred "*10’ caused
the app to hover around 17MB.
Changing 100.times to 75.times caused the memory to grow much
more slowly.

I know that += shouldn’t be used here, but this smells like
a possible bug since @structs is not related to the loop
in App#run.
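
As a back-of-the-envelope check on how much allocation the loop in App#run generates, the sketch below adds up the size of every intermediate string. It assumes one byte per character and ignores per-object overhead. The constant names are made up for this example. It counts total bytes allocated over the whole run, not memory in use at any one moment, which depends on when the collector reclaims the old strings.

```ruby
CHUNK = 200        # bytes appended per iteration (the size of s200)
ITERATIONS = 9080  # loop count from App#run

# Size of the final string.
FINAL_SIZE = CHUNK * ITERATIONS

# Iteration k of `s += s200` allocates a fresh string of k * CHUNK bytes,
# so the total is CHUNK * (1 + 2 + ... + ITERATIONS).
TOTAL_ALLOCATED = (1..ITERATIONS).sum { |k| k * CHUNK }

puts FINAL_SIZE       # 1816000
puts TOTAL_ALLOCATED  # 8245548000
```

So building a string of about 1.8MB this way allocates roughly 8.2GB of short-lived string data in total, about 4500 times the final size.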

···

On Wednesday, 19 February 2003 at 15:02:10 +0900, Matt Armstrong wrote:

Jim Freeze jim@freeze.org writes:

I have some code sitting around that used the notation
(Yes, it was written long ago before I knew better)

s = ""
s += "whatever"

I tried to use it today with 1.8.0, but my program went wild on
memory usage. (Now, remember, this code worked fine under 1.6.7 and
possibly 1.7.3.)

What version do you mean by 1.8.0? I.e. what is printed with:

ruby -e "p RUBY_RELEASE_DATE"


Jim Freeze

“I’d love to go out with you, but I did my own thing and now I’ve got
to undo it.”

Hi,

Remember, s += "whatever" is syntactic sugar for

s = s + "whatever"

Hmm. Would it make sense to ‘special-case’ that? They accomplish the same thing logically
(right?), and I would expect a language to evolve out these hidden ‘performance-bombs’. In fact,
anytime there’s an operator= method, maybe it should never allocate a new object?

They are not the same:

s = p = ""
s += "foo"
p s # "foo"
p p # ""

s = p = ""
s << "foo"
p s # "foo"
p p # "foo"

						matz.
···

In message “Re: String#+ operatorebroken?” on 03/02/19, Jason Persampieri ruby@persampieri.net writes:

This was discussed fairly extensively on -talk not so long ago. Check
the archives of the last few months.

Basically, programmers have to take some responsibility for the
methods they use. It’s something of a Ruby idiom to use << for
appending, and if you want to program in Ruby, you need to learn some
Ruby idioms.

Also consider the fact that the “+” method can be redefined.

Gavin

···

On Wednesday, February 19, 2003, 3:25:49 PM, Jason wrote:

Remember, s += "whatever" is syntactic sugar for

s = s + "whatever"

so it allocates a new string object every time, whereas

s << "whatever"

appends "whatever" to the existing string object.

                                                  matz.

Hmm. Would it make sense to ‘special-case’ that? They accomplish the
same thing logically (right?), and I would expect a language to
evolve out these hidden ‘performance-bombs’. In fact, anytime
there’s an operator= method, maybe it should never allocate a new
object?

Ok, I have boiled down the code that is causing the problem.
To the best of my recollection, the problem did not occur
on 1.6.7. Here is the code:

It's the new algorithm for the GC

Guy Decoux

So can <<. The important thing here is that + and << can be redefined
independently of each other, whereas += is just syntax sugar involving +,
and can’t be defined independently of it. Making += into syntax sugar for <<
instead would confuse people and is very likely to break old programs.

Tim Bates
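
To illustrate the point about independent redefinition, here is a small sketch with a hypothetical Tally class (not from any library) that records which operator method was actually called. It defines + and << separately and defines no "+=" method, because Ruby has none to define.

```ruby
# Hypothetical class, used only to show how the operators dispatch.
class Tally
  attr_reader :items, :calls

  def initialize(items = [], calls = [])
    @items = items
    @calls = calls
  end

  # Non-destructive: returns a new Tally and leaves the receiver alone.
  def +(other)
    Tally.new(@items + [other], @calls + [:+])
  end

  # Destructive: mutates the receiver and returns it.
  def <<(other)
    @items << other
    @calls << :<<
    self
  end
end

def plus_equals_demo
  t = Tally.new
  original = t
  t += 1                       # parsed as t = t + 1
  [t.calls, t.equal?(original)]
end

def append_demo
  t = Tally.new
  original = t
  t << 1
  [t.calls, t.equal?(original)]
end

p plus_equals_demo  # [[:+], false]
p append_demo       # [[:<<], true]
```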

···

On Wed, 19 Feb 2003 03:15 pm, Gavin Sinclair wrote:

Basically, programmers have to take some responsibility for the
methods they use. It’s something of a Ruby idiom to use << for
appending, and if you want to program in Ruby, you need to learn some
Ruby idioms.

Also consider the fact that the “+” method can be redefined.


tim@bates.id.au

Well I think we found a weak spot in the new GC.

···

On Friday, 21 February 2003 at 1:36:00 +0900, ts wrote:

Ok, I have boiled down the code that is causing the problem.
To the best of my recollection, the problem did not occur
on 1.6.7. Here is the code:

It’s the new algorithm for the GC


Jim Freeze

“All snakes who wish to remain in Ireland will please raise their right
hands.”
– Saint Patrick

Hi,

···

In message “Re: String#+ operatorebroken?” on 03/02/21, Jim Freeze jim@freeze.org writes:

It’s the new algorithm for the GC

Well I think we found a weak spot in the new GC.

Probably. Could you see what happens if you change the “1.8” around line 307 in gc.c

heap_slots *= 1.8;

to something like 1.4 or 1.2?

						matz.

It still grows with 1.2, but not as fast.
After 1 minute, it had reached 200MB.

I’m no memory expert, so could you explain the
coupling between s and @structs with respect to
memory.

···

On Friday, 21 February 2003 at 6:15:41 +0900, Yukihiro Matsumoto wrote:

Hi,

In message “Re: String#+ operatorebroken?” on 03/02/21, Jim Freeze jim@freeze.org writes:

It’s the new algorithm for the GC

Well I think we found a weak spot in the new GC.

Probably. Could you see what happens if you change the “1.8” around line 307 in gc.c

heap_slots *= 1.8;

to something like 1.4 or 1.2?

  					matz.


Jim Freeze

Never count your chickens before they rip your lips off

Hi,

···

In message “Re: String#+ operatorebroken?” on 03/02/21, Jim Freeze jim@freeze.org writes:

Well I think we found a weak spot in the new GC.
Probably. Could you see what happens if you change the “1.8” around line 307 in gc.c

heap_slots *= 1.8;

to something like 1.4 or 1.2?

It still grows with 1.2, but not as fast.
After 1 minute, it had reached 200MB.

Sorry, I was looking at the wrong place. The current GC increases
malloc_limit too radically. I think I have fixed the problem; it now
consumes less than 30MB. But I have no idea why it’s still slower than
1.6.8 after the fix.

						matz.

less than 30MB. But I have no idea why it's still slower than 1.6.8
after the fix.

I may be wrong, but the difference seems to be in str_new(). Apparently the GC
is not the important factor in this case: ruby spends all its time in
str_new() (or rb_str_new2() for 1.6.8).

The difference: str_new() can call memset().

If you replace malloc() plus memset() with calloc(), 1.8 is as fast as 1.6.8.
Can someone verify this with their copy of ruby?

Guy Decoux

Hi,

···

In message “Re: String#+ operatorebroken?” on 03/02/23, ts decoux@moulon.inra.fr writes:

I may be wrong, but the difference seems to be in str_new(). Apparently the GC
is not the important factor in this case: ruby spends all its time in
str_new() (or rb_str_new2() for 1.6.8).

The difference: str_new() can call memset().

If you replace malloc() plus memset() with calloc(), 1.8 is as fast as 1.6.8.
Can someone verify this with their copy of ruby?

Verified. And I confirmed that memset() is not needed at all. Thank you!

Since 1.8 does 1/3 as much GC, it should be faster, so there must still
be something left. I will continue examining.

						matz.