String copy-on-write question

Hello group,

Ruby implements copy-on-write for strings, so you can do stuff like
this very cheaply:

   str = 0.chr * (2**24) # 16MiB allocated
   str[100..-1] # this costs only a small amount of memory

How come this optimization does not apply in this case?:

  str[100..-2] # this costs around 16MiB of memory
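
One way to observe the difference is sketched below. It assumes the objspace extension from the standard library is available and uses a 1MiB string instead of 16MiB to keep it quick. The numbers depend on the Ruby version and implementation, so no expected output is given.

```ruby
require 'objspace'

str    = 0.chr * (2**20)  # 1MiB of NUL bytes (smaller than in the post)
tail   = str[100..-1]     # runs to the very end of str
middle = str[100..-2]     # stops one byte before the end of str

# memsize_of reports the memory attributed to each object, in bytes.
# What counts as "attributed" for a shared buffer is up to the interpreter.
puts "tail:   #{ObjectSpace.memsize_of(tail)} bytes"
puts "middle: #{ObjectSpace.memsize_of(middle)} bytes"
```

If the tail substring shares the original buffer, it should report far less than the middle one.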

As a side effect, if using regexps on a large string, the pre-match
and post-match variables behave differently:

  s = 0.chr * (2**23) + "Hello" + 0.chr * (2**23) # About 16MiB allocated (after GC)
  s.scan(/Hello/) { |m| p m } # This is free
  p $'.size # This is free
  p $`.size # This costs another 8MiB.

Any insights?

Lars

Interesting. Do you also happen to know why an additional field that stores the length is not used? Is the reason maybe the use of C library string functions that work on zero-terminated strings?

Cheers

  robert

···

On 05.05.2008 18:07, ts wrote:

Lars Christensen wrote:

Well, it's best if you look at rb_str_substr() in string.c

   str[100..-1] # this costs only a small amount of memory

Ruby just needs to adjust the pointer and the length in the new
object.

  str[100..-2] # this costs around 16MiB of memory

One character is missing compared with the previous substring. If it did the
same thing as before, it would have to
  * adjust the pointer
  * adjust the length
  * add \0 at the end

This means it would inevitably modify the original string, which is why it
duplicates the data.

  p $'.size # This is free
  p $`.size # This costs another 8MiB.

Same reason here: the post-match runs to the end of the string, the pre-match does not.
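
The rule described above can be modelled in a few lines. This is a toy sketch, not Ruby's actual implementation: the method name `shareable_tail?` is hypothetical and only encodes the idea that a substring can reuse the buffer when the terminating \0 is already in place, i.e. when it ends where the original ends.

```ruby
# Toy model only: shareable_tail? is a hypothetical helper, not a Ruby API.
# A substring can reuse the parent's buffer without writing to it only if
# it runs to the very end of the parent, where the \0 already sits.
def shareable_tail?(parent_len, start, len)
  start + len == parent_len
end

str_len = 2**24
puts shareable_tail?(str_len, 100, str_len - 100)  # str[100..-1] => true
puts shareable_tail?(str_len, 100, str_len - 101)  # str[100..-2] => false

# s = 8MiB of NULs + "Hello" + 8MiB of NULs
s_len = 2**23 + 5 + 2**23
puts shareable_tail?(s_len, 2**23 + 5, 2**23)      # post-match $' => true
puts shareable_tail?(s_len, 0, 2**23)              # pre-match  $` => false
```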

Robert Klemme wrote:

Interesting. Do you also happen to know why an additional field that stores the length is not used?

I don't understand: it does have a field which gives the length of
the string, for example with

Ah, ok. This happens when one is too lazy to look into the source. :-) Somehow I had assumed that the length was not stored because you made the point that the \0 could not be inserted without altering the original. I concluded that there is no length field. :-)

  str = '0' * 200
  str[100 .. -1]

The first object (in str) will have 200 for its length;
the length field in the new object will have the value 100.
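
At the Ruby level the two lengths can be checked directly. A minimal sketch; the variable name `sub` is only for illustration:

```ruby
str = '0' * 200
sub = str[100..-1]  # substring from index 100 to the end

puts str.length  # 200
puts sub.length  # 100
```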
  

Is the reason maybe the use of C library string functions that work on zero-terminated strings?

Only matz knows this :-)

Well, maybe he'll stop by and enlighten us.

Kind regards

  robert

···

On 05.05.2008 18:33, ts wrote: