Implementation of String or Array #slice and similar ops?

As some may recall, I'm fiddling with some string manipulation in a recent
series of articles. I'm wondering about the specific implementation of
operations like slice, on String (and on Array).

Specifically, does slicing a String copy the bytes, or point into the old
string, and similarly for Array? I see that if I modify the slice, the original
is not modified, and vice versa, but have the bytes already been copied when the
slice is created? Or not until later?

Similar questions for Array, all around, and for operations like <<.

Bottom line, are all the methods of String and Array directly allocating
storage, or are any of them "virtual"?

My guess is that the bytes are copied. If so, is anyone aware of a class or
classes that address substrings and such as virtual copies?

Thanks,

···

--
Ron Jeffries
www.XProgramming.com
I'm giving the best advice I have. You get to decide if it's true for you.

[...]

Specifically, does slicing a String copy the bytes, or point into the old
string, and similarly for Array? I see that if I modify the slice, the original
is not modified, and vice versa, but have the bytes already been copied when the
slice is created? Or not until later?

Similar questions for Array, all around, and for operations like <<.

Bottom line, are all the methods of String and Array directly allocating
storage, or are any of them "virtual"?

My guess is that the bytes are copied. If so, is anyone aware of a class or
classes that address substrings and such as virtual copies?

Ruby does use COW for arrays and strings, so internal pointers are
reused and the data they point to duplicated when actually needed.

somearray[start, size] will always reuse the internal array.
If I'm reading rb_str_substr correctly, somestring[start, rsize] will
copy the bytes unless start'+rsize == somestring.size (start' being
the adjusted offset), amongst other conditions.

···

On Fri, Dec 02, 2005 at 07:27:32PM +0900, Ron Jeffries wrote:

--
Mauricio Fernandez

What does "COW" mean?

A serious problem occurs when you go modify the original (likely
large) array after taking a slice. It allocates a new area for the
original array and leaves the slice (probably small) pointing the the
original area with the entire array. This can cause serious
performance issues (memory and run-time). Here is a simple example
that shows the problem:

time ruby -e 'n=2**13; a=(1..n).to_a; n.times { |i| a.push(a[i,1]) };
IO.readlines("/proc/#$$/status").grep(/VmSize/).display'
VmSize: 593020 kB
50.44user 0.65system 1:00.24elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1957major+121801minor)pagefaults 0swaps

You'll find that both memory and runtime are O(n**2). vs. the
equivalent without element sharing (O(n)):

time ruby -e 'n=2**13; a=(1..n).to_a; n.times { |i| a.push([a[i]]) };
IO.readlines("/proc/#$$/status").grep(/VmSize/).display'
VmSize: 3384 kB
0.01user 0.00system 0:00.22elapsed 7%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+558minor)pagefaults 0swaps

Back in September, I provided a patch to 1.9 for arrays. It addressed
this issue plus made operations near the beginning of the array (i.e.
shift/unshift) just as fast as operations near the end (i.e.
pop/push). Here was my post:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/5861

This patch hasn't been accepted to the ruby head (yet). The guys
doing the Sydney fork of ruby (1.8 derivative) do plan on integrating
it. I'm not sure where they are on it.

I haven't had time to do the same for strings.

···

On 12/2/05, Mauricio Fernández <mfp@acm.org> wrote:

On Fri, Dec 02, 2005 at 07:27:32PM +0900, Ron Jeffries wrote:
Ruby does use COW for arrays and strings, so internal pointers are
reused and the data they point to duplicated when actually needed.

somearray[start, size] will always reuse the internal array.
If I'm reading rb_str_substr correctly, somestring[start, rsize] will
copy the bytes unless start'+rsize == somestring.size (start' being
the adjusted offset), amongst other conditions.

Eric Mahurin wrote:

What does "COW" mean?

Copy On Write.

copy on write

Gary Wright

···

On Dec 2, 2005, at 9:28 AM, Eric Mahurin wrote:

What does "COW" mean?

Thanks to all for the info on this. It sounds like with strings I won't get any
advantage from COW at all on slicing, and that even with arrays I'll not be much
better off.

For my current purposes, I don't see how to use arrays anyway, since I'm
interested primarily in large chunks of contiguous memory.

Any pointers relating to what I'm doing in my Extended Set Theory articles will
be gratefully received. A ping via email will help, as I don't get back here as
frequently as I probably should.

Thanks,

···

--
Ron Jeffries
www.XProgramming.com
I'm giving the best advice I have. You get to decide if it's true for you.

Thanks. It looks like the normal COW usage from this wiki is when the
shared resource starts and ends at the same memory locations for all
uses. I think the conventional COW as described above would apply to
Array#replace and Array#clone, but not Array#slice. The way ruby does
COW is clearly a win for Array#replace and Array#clone, but not for
Array#slice (and many others) as demonstrated in my example.

···

On 12/2/05, gwtmp01@mac.com <gwtmp01@mac.com> wrote:

On Dec 2, 2005, at 9:28 AM, Eric Mahurin wrote:
> What does "COW" mean?

copy on write

Copy-on-write - Wikipedia

can you use narray? it stores the elements as continuous memory. i've got a
patch that allows it's use with mmap to - so a file can be mapped, manipulated
as an narray, and the bytes are persisted to file automagically.

-a

···

On Sun, 4 Dec 2005, Ron Jeffries wrote:

Thanks to all for the info on this. It sounds like with strings I won't get
any advantage from COW at all on slicing, and that even with arrays I'll not
be much better off.

For my current purposes, I don't see how to use arrays anyway, since I'm
interested primarily in large chunks of contiguous memory.

--

ara [dot] t [dot] howard [at] noaa [dot] gov
all happiness comes from the desire for others to be happy. all misery
comes from the desire for oneself to be happy.
-- bodhicaryavatara

===============================================================================

Well, I wouldn't rule it out. Seems like it might have a lot of stuff that I
wouldn't need, being numerically oriented. Its base functionality might be
interesting.

Are there docs for it anywhere handy? I didn't find any with a quick goog.

Thanks,

···

On Sat, 3 Dec 2005 19:12:20 -0700, ara.t.howard@noaa.gov wrote:

can you use narray? it stores the elements as continuous memory. i've got a
patch that allows it's use with mmap to - so a file can be mapped, manipulated
as an narray, and the bytes are persisted to file automagically.

--
Ron Jeffries
www.XProgramming.com
I'm giving the best advice I have. You get to decide if it's true for you.