Memory behavior of String#dup

Hi all,

Does String#dup also copy the byte sequence of the string, or does it only
copy a reference and do a copy on write?

robert

Hi,

···

In message “Memory behavior of String#dup” on 03/06/30, “Robert Klemme” bob.news@gmx.net writes:

does String#dup also copy the byte sequence of the string or does it only
copy a reference and does a copy on write?

When memory is already shared between strings, it does copy-on-write;
otherwise it copies. From my observation, many duped strings are
modified right after the dup, so I felt it was wise to avoid
creating new internal copy-on-write entries when duping.
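
Whether the byte buffer is shared after a dup is an interpreter detail that plain Ruby code cannot see directly. What can be checked is the visible guarantee: whichever strategy is used, changing the dup never changes the original. A minimal sketch (variable names are illustrative; String.new is used only to get a mutable string regardless of frozen-literal settings):

```ruby
# Illustrative only: the internal buffer may or may not be shared after dup,
# but from the outside both strings behave as independent objects.
s1 = String.new("foo")
s2 = s1.dup

same_object_before = s1.equal?(s2)   # dup always returns a new String object
s2 << "bar"                          # a shared buffer would have to be copied here

puts s1                  # foo
puts s2                  # foobar
puts same_object_before  # false
```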

						matz.

“Yukihiro Matsumoto” matz@ruby-lang.org schrieb im Newsbeitrag
news:1056962193.833961.26048.nullmailer@picachu.netlab.jp…

does String#dup also copy the byte sequence of the string or does it only
copy a reference and does a copy on write?

When memory is already shared between strings, it does copy-on-write,
otherwise it copies. From my observation, many of duped strings are
modified right after the dup, so that I felt it is wise to avoid
making new internal copy-on-write entries for duping.

s1 = "foo"
s2 = s1.dup

So, if I understand you correctly, s1 and s2 don’t share the same byte
sequence, since s1 is the only string referring to the sequence “foo” when
the dup occurs (i.e. the sequence is not shared). Is that correct?

The reason I’m asking is that for hashes where an entry shares the
key (either directly, because it is the same string as in h[s1]=s1, or
indirectly, because the value is an instance that refers to the key)
there would be enormous memory consumption if all those dup’ed hash key
strings also contained a copy of the byte sequence. The problem I have
with this duping is that I can’t prevent it. So there is at least the
overhead of a newly created String instance, because apparently (v
1.7.3) the Hash doesn’t honor the frozen state of the string.

If that change has not been incorporated I suggest doing the dup only if a
string is not frozen. Otherwise the user has no chance to avoid the dup
for strings.

Regards

robert

h = Hash.new

s1 = "key 1"
s2 = "key 2"
s2.freeze

h[s1] = s1
h[s2] = s2

h.each do |k, v|
  puts "#{k}=>#{v}"
  # object_id replaces the Object#id call used in the 1.7.3 original
  puts "#{k.object_id}=>#{v.object_id}"
  case k
  when s1
    p [k.equal?(s1), v.equal?(s1)]
  when s2
    p [k.equal?(s2), v.equal?(s2)]
  end
end

yields

key 1=>key 1
22381332=>22394808
[false, true]
key 2=>key 2
22376868=>22390356
[false, true]


Hi,

When memory is already shared between strings, it does copy-on-write,
otherwise it copies. From my observation, many of duped strings are
modified right after the dup, so that I felt it is wise to avoid
making new internal copy-on-write entries for duping.

s1 = "foo"
s2 = s1.dup

So, if I understand you correctly, s1 and s2 don’t share the same byte
sequence, since s1 is the only string referring to the sequence “foo” when
the dup occurs (i.e. the sequence is not shared). Is that correct?

In this case, fortunately, memory is shared, since all literal strings
have their copy-on-write entries internally.

The reason I’m asking is that for hashes where an entry shares the
key (either directly, because it is the same string as in h[s1]=s1, or
indirectly, because the value is an instance that refers to the key)
there would be enormous memory consumption if all those dup’ed hash key
strings also contained a copy of the byte sequence. The problem I have
with this duping is that I can’t prevent it. So there is at least the
overhead of a newly created String instance, because apparently (v
1.7.3) the Hash doesn’t honor the frozen state of the string.

String hash keys are duped and frozen with their memory shared. Is
this what you want to hear?

						matz.
···

In message “Re: Memory behavior of String#dup” on 03/06/30, “Robert Klemme” bob.news@gmx.net writes:

“Yukihiro Matsumoto” matz@ruby-lang.org schrieb im Newsbeitrag
news:1056966350.233426.26926.nullmailer@picachu.netlab.jp…

Hi,

When memory is already shared between strings, it does copy-on-write,
otherwise it copies. From my observation, many of duped strings are
modified right after the dup, so that I felt it is wise to avoid
making new internal copy-on-write entries for duping.

s1 = "foo"
s2 = s1.dup

So, if I understand you correctly, s1 and s2 don’t share the same byte
sequence, since s1 is the only string referring to the sequence “foo” when
the dup occurs (i.e. the sequence is not shared). Is that correct?

In this case, fortunately, memory is shared, since all literal strings
have their copy-on-write entries internally.

Ok, then I possibly didn’t understand you correctly.

The reason I’m asking is that for hashes where an entry shares the
key (either directly, because it is the same string as in h[s1]=s1, or
indirectly, because the value is an instance that refers to the key)
there would be enormous memory consumption if all those dup’ed hash key
strings also contained a copy of the byte sequence. The problem I have
with this duping is that I can’t prevent it. So there is at least the
overhead of a newly created String instance, because apparently (v
1.7.3) the Hash doesn’t honor the frozen state of the string.

String hash keys are duped and frozen with their memory shared. Is
this what you want to hear?

:-)) Yeah, that sounds good.

Though I still worry about the overhead of one more Ruby instance (there
must be some bookkeeping done etc.). Is this negligible?

robert

Hi,

···

In message “Re: Memory behavior of String#dup” on 03/06/30, “Robert Klemme” bob.news@gmx.net writes:

Though I still worry about the overhead of one more Ruby instance (there
must be some bookkeeping done etc.). Is this negligible?

I guess so. It’s only 20 bytes per object on a 32-bit CPU.

						matz.

“Yukihiro Matsumoto” matz@ruby-lang.org schrieb im Newsbeitrag
news:1056982404.612203.27747.nullmailer@picachu.netlab.jp…

Hi,

Though I still worry about the overhead of one more Ruby instance (there
must be some bookkeeping done etc.). Is this negligible?

I guess so. It’s only 20 bytes per object on a 32-bit CPU.

Hm, that amounts to 2 million bytes for 100000 instances, which is not too
much IMHO. Plus, there will be some overhead for object lookups, I guess.
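
The estimate above is plain multiplication of the 20-byte figure quoted in this thread. A small sketch of the arithmetic follows; the variable names are illustrative, and the final line uses ObjectSpace.memsize_of from the objspace extension to inspect the size on whatever interpreter runs it, which will generally differ from the 2003 32-bit figure:

```ruby
require "objspace"

bytes_per_object = 20      # figure quoted in this thread for a 32-bit CPU
instances        = 100_000

overhead = bytes_per_object * instances
puts overhead   # 2000000

# Size of one small String on the running interpreter. The value depends on
# the Ruby version and platform, so no expected output is shown.
puts ObjectSpace.memsize_of(String.new("key"))
```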

I’d like to propose the change to not dup frozen strings as Hash keys.
Should I enter an RCR? Do we discuss this here?

Regards

robert

Hi,

···

In message “Re: Memory behavior of String#dup” on 03/07/02, “Robert Klemme” bob.news@gmx.net writes:

Hm, that amounts to 2 million bytes for 100000 instances, which is not too
much IMHO. Plus, there will be some overhead for object lookups, I guess.

I’d like to propose the change to not dup frozen strings as Hash keys.
Should I enter an RCR? Do we discuss this here?

Early optimization is the source of all evil. ;-)

Joking aside, frozen key strings are very useful for finding
bugs. So I think the optimization should be done differently.

						matz.

Hi,

···

In message “Re: Memory behavior of String#dup” on 03/07/02, Yukihiro Matsumoto matz@ruby-lang.org writes:

Putting joke aside, frozen key string is very useful for finding
bugs. So I think optimization should be done differently.

Your suggestion inspired a new dup-freeze optimization. It’ll be
available in CVS soon. Thank you.

						matz.

“Yukihiro Matsumoto” matz@ruby-lang.org schrieb im Newsbeitrag
news:1057156227.243350.7287.nullmailer@picachu.netlab.jp…

Hi,

Putting joke aside, frozen key string is very useful for finding
bugs. So I think optimization should be done differently.

You lost me here. Maybe I wasn’t clear enough and we have a
misunderstanding. I meant - quite informally:

class Hash
  def []=(key, val)
    if key.kind_of?(String) && !key.frozen?
      key = key.dup
      key.freeze
    end

    # now insert key and value
  end
end

Your suggestion inspired me a new dup-freeze optimization. It’ll be
available soon on the CVS. Thank you.

You’re welcome! Do you mean a specialized dup method that returns self if
frozen like

class Object
  def dupFreeze
    frozen? ? self : dup
  end
end
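
The idea can be tried out without reopening Object. The sketch below uses a hypothetical top-level helper named dup_freeze (not part of Ruby's core API); it returns the receiver unchanged when it is already frozen and otherwise returns a frozen copy:

```ruby
# Hypothetical helper for illustration; not a core Ruby method.
def dup_freeze(str)
  str.frozen? ? str : str.dup.freeze
end

frozen_key = String.new("key 2").freeze
plain_key  = String.new("key 1")

a = dup_freeze(frozen_key)
b = dup_freeze(plain_key)

puts a.equal?(frozen_key)   # true  (no new object for a frozen string)
puts b.equal?(plain_key)    # false (a copy was made)
puts b.frozen?              # true

plain_key << "!"            # the original stays mutable
puts b                      # key 1
```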

Kind regards

robert

Hi,

···

In message “Re: Memory behavior of String#dup” on 03/07/03, “Robert Klemme” bob.news@gmx.net writes:

Your suggestion inspired me a new dup-freeze optimization. It’ll be
available soon on the CVS. Thank you.

You’re welcome! Do you mean a specialized dup method that returns self if
frozen like

Yes. This specialized dup also returns the hidden shared string, without
making a copy, if one is available.

						matz.
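
For readers on a current Ruby: Hash#[]= stores an unfrozen String key as a frozen copy and stores an already frozen String key as-is. A short check, assuming a modern MRI (this is not the 1.7.3 behavior shown earlier in the thread, where both keys were duped):

```ruby
h = {}
s1 = String.new("key 1")          # unfrozen key
s2 = String.new("key 2").freeze   # frozen key

h[s1] = s1
h[s2] = s2

k1, k2 = h.keys
puts k1.frozen?      # true
puts k1.equal?(s1)   # false (the hash holds a frozen copy)
puts k2.equal?(s2)   # true  (the frozen key is stored without a dup)
```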

“Yukihiro Matsumoto” matz@ruby-lang.org schrieb im Newsbeitrag
news:1057166023.426462.7561.nullmailer@picachu.netlab.jp…

Hi,

Your suggestion inspired me a new dup-freeze optimization. It’ll be
available soon on the CVS. Thank you.

You’re welcome! Do you mean a specialized dup method that returns self if
frozen like

Yes. Also this specialized dup returns hidden shared string without
making copy if it is available.

Sounds great! Thanks a lot!

robert