Deep cloning, how?

Robert_K1 · 17 October 2009 15:15

That's what I'd guess, too. Basically the documentation should state for #dup and #clone something like this: "It is not normally necessary to override this method in subclasses. Customization of copying is done via method #initialize_copy."

Kind regards

robert

On 17.10.2009 16:19, Rick DeNatale wrote:

On Fri, Oct 16, 2009 at 12:30 PM, Caleb Clausen <vikkous@gmail.com> wrote:

Object#dup does not call new; I think it's more like:
self.class.allocate.initialize_copy(self). See what happens here:

irb(main):001:0> class K
irb(main):002:1> def initialize
irb(main):003:2> p :initialize
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> k=K.new
:initialize
=> #<K:0xb7ce8ee0>
irb(main):008:0> k2=k.dup
=> #<K:0xb7ce0f38>

And clone doesn't call initialize EITHER:

class A
  def initialize(iv)
    @iv = iv
    puts "initialize called"
  end

  def initialize_copy(arg)
    puts "initialize copy called, my iv is #{@iv}"

  end
end

puts "Creating original"
a = A.new(42)
puts "calling dup"
a1 = a.dup
puts "calling clone"
a2 = a.clone

outputs

Creating original
initialize called
calling dup
initialize copy called, my iv is 42
calling clone
initialize copy called, my iv is 42

If you look at the source code in object.c, it becomes apparent that
Object#dup and Object#clone do pretty much the same thing except for
propagating the frozen bit and singleton classes:

VALUE
rb_obj_clone(obj)
    VALUE obj;
{
    VALUE clone;

    if (rb_special_const_p(obj)) {
        rb_raise(rb_eTypeError, "can't clone %s", rb_obj_classname(obj));
    }
    clone = rb_obj_alloc(rb_obj_class(obj));
    RBASIC(clone)->klass = rb_singleton_class_clone(obj);
    RBASIC(clone)->flags = (RBASIC(obj)->flags | FL_TEST(clone, FL_TAINT)) & ~(FL_FREEZE|FL_FINALIZE);
    init_copy(clone, obj);
    RBASIC(clone)->flags |= RBASIC(obj)->flags & FL_FREEZE;

    return clone;
}

VALUE
rb_obj_dup(obj)
    VALUE obj;
{
    VALUE dup;

    if (rb_special_const_p(obj)) {
        rb_raise(rb_eTypeError, "can't dup %s", rb_obj_classname(obj));
    }
    dup = rb_obj_alloc(rb_obj_class(obj));
    init_copy(dup, obj);

    return dup;
}

static void
init_copy(dest, obj)
    VALUE dest, obj;
{
    if (OBJ_FROZEN(dest)) {
        rb_raise(rb_eTypeError, "[bug] frozen object (%s) allocated", rb_obj_classname(dest));
    }
    RBASIC(dest)->flags &= ~(T_MASK|FL_EXIVAR);
    RBASIC(dest)->flags |= RBASIC(obj)->flags & (T_MASK|FL_EXIVAR|FL_TAINT);
    if (FL_TEST(obj, FL_EXIVAR)) {
        rb_copy_generic_ivar(dest, obj);
    }
    rb_gc_copy_finalizer(dest, obj);
    switch (TYPE(obj)) {
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (ROBJECT(dest)->iv_tbl) {
            st_free_table(ROBJECT(dest)->iv_tbl);
            ROBJECT(dest)->iv_tbl = 0;
        }
        if (ROBJECT(obj)->iv_tbl) {
            ROBJECT(dest)->iv_tbl = st_copy(ROBJECT(obj)->iv_tbl);
        }
    }
    rb_funcall(dest, id_init_copy, 1, obj);
}

This code is from 1.8.6 just cuz that's what I happened to grab.

In both cases the object is allocated with rb_obj_alloc and then
handed to the same subroutine, init_copy, which copies the instance
variables "under the table" and then invokes initialize_copy. No
initialize method is ever called on the resulting object.
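The C steps above can be approximated at the Ruby level. This is only a rough sketch: `manual_dup` and `Point` are hypothetical names, and plain Ruby cannot reproduce everything the C code does (internal flags, generic instance variables, finalizers). It does show the order of events: allocate, copy the instance variables, then call the private `initialize_copy` hook, with `initialize` never involved.

```ruby
# Rough Ruby-level approximation of rb_obj_dup (illustration only).
# manual_dup is a hypothetical helper; the real Object#dup is written in C
# and also copies internal flags that are not reachable from Ruby.
def manual_dup(obj)
  copy = obj.class.allocate                # allocation without #initialize
  obj.instance_variables.each do |name|    # shallow copy of every ivar
    copy.instance_variable_set(name, obj.instance_variable_get(name))
  end
  copy.send(:initialize_copy, obj)         # private hook, hence #send
  copy
end

# Hypothetical class that records which hooks ran.
class Point
  attr_reader :x, :log

  def initialize(x)
    @x = x
    @log = [:initialize]
  end

  def initialize_copy(source)
    super                                  # Object's default check
    @log = @log + [:initialize_copy]       # new array, source.log untouched
  end
end

p1 = Point.new(1)
p2 = manual_dup(p1)
p p1.log   # => [:initialize]
p p2.log   # => [:initialize, :initialize_copy]
p p2.x     # => 1
```

Note that `@log` in the copy initially refers to the very same array as in the original; replacing it inside `initialize_copy` is what keeps the two objects independent.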

Which makes me think that the whole "+dup+ typically uses the class
of the descendent object to create the new instance" is meaningless,
or untrue. Probably this is a vestige of an older implementation.
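The two differences Rick points out in the C code (the frozen bit and the singleton class) can be observed from Ruby as well. A small sketch, using a hypothetical `Counter` class that counts how often `initialize` really runs:

```ruby
# Compare what Object#dup and Object#clone carry over to the copy.
# Counter is a hypothetical class invented for this illustration.
class Counter
  @initialize_calls = 0

  class << self
    attr_accessor :initialize_calls      # class-level count of #initialize runs
  end

  attr_reader :value, :copied

  def initialize(value)
    Counter.initialize_calls += 1
    @value = value
    @copied = false
  end

  def initialize_copy(source)
    super
    @copied = true                       # runs for both dup and clone
  end
end

original = Counter.new(42)
def original.greet                       # singleton method on this object only
  "hello"
end
original.freeze

d = original.dup
c = original.clone

p Counter.initialize_calls    # => 1  (only Counter.new ran #initialize)
p [d.value, c.value]          # => [42, 42]
p [d.copied, c.copied]        # => [true, true]
p [d.frozen?, c.frozen?]      # => [false, true]
p [d.respond_to?(:greet), c.respond_to?(:greet)]  # => [false, true]
```

The clone can still set `@copied` even though the original is frozen, because the frozen bit is only applied to the clone after `initialize_copy` has returned.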

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Dev_Guy · 17 October 2009 21:05

Rick DeNatale wrote:

Object#dup does not call new; I think it's more like:
self.class.allocate.initialize_copy(self). See what happens here:

irb(main):001:0> class K
irb(main):002:1> def initialize
irb(main):003:2> p :initialize
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> k=K.new
:initialize
=> #<K:0xb7ce8ee0>
irb(main):008:0> k2=k.dup
=> #<K:0xb7ce0f38>

And clone doesn't call initialize EITHER:

I had made this assumption; otherwise it would have made more sense to
overload initialize to accept the source object that gets passed to initialize_copy... The code would be ugly, as you would need to do type checking at runtime (if iv.class == A, using your sample) to execute the correct code.

My C++ and copy constructor concept got in the way earlier. Ruby doesn't quite do what a C++ developer would expect; initialize_copy is a cleaner way to do this in a language lacking static type checking.

Thanks for the sample code to validate this point.

--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely

Brian_Candler · 19 October 2009 08:32

Robert Klemme wrote:

But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

Brian, I disagree. The proper way is to implement #initialize_copy.
That way you can make sure you do not get aliasing effects even if
source and copy are frozen because in #initialize_copy frozen state is
not applied.

I don't understand what you mean by that. If #dup calls self.class.new
then you obviously get a new and hence unfrozen object.

It is certainly true that the *default* implementation of both #dup and
#clone (defined in Object) calls initialize_copy. A generic #dup must
behave this way; it doesn't know what the new() method arguments are in
any particular subclass of Object. I don't think this should be taken as
necessarily implying that you are expected to leave #dup alone in your
own classes, and only override #initialize_copy instead.

The way I read the documentation implies to me that #dup in user defined
classes *should* call new. Silly example:

  class NewsReader
    def initialize(url, state_filename)
      @url = url
      @http_client = HTTPClient.new(@url)
      @state_filename = state_filename
      @state_file = File.open(@state_filename)
    end
    def dup
      self.class.new(@url, @state_filename.dup)
    end
  end

Here the logic of how to build a NewsReader, including building all the
associated helper objects, is built into the #initialize method. I don't
think you would want to duplicate all this logic in #initialize_copy.
Furthermore, I think I would expect #clone only to copy the top object,
and leave all the instance variables aliased.

Obviously there are no hard-and-fast rules here, and with Ruby there are
many ways to achieve the same goal.

I'd certainly agree this is an area where Ruby's documentation falls
short.

Taking another example: I don't think you'll disagree that 99% of the
time you are expected to leave Object.new alone and instead define
#initialize in your own classes. But you wouldn't find that out from the
documentation:

$ ri Object.new
------------------------------------------------------------ Object::new
Object::new()
------------------------------------------------------------------------
Not documented

$ ri Object#initialize
Nothing known about Object#initialize

--
Posted via http://www.ruby-forum.com/.

Robert_K1 · 16 October 2009 20:05

Robert Klemme wrote:

But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

Brian, I disagree. The proper way is to implement #initialize_copy.
That way you can make sure you do not get aliasing effects even if
source and copy are frozen because in #initialize_copy frozen state is
not applied.

I'd understand the description in such a way that users should
override neither #dup nor #clone but instead create an #initialize_copy
method to implement anything class-specific (including a non-shallow copy).

Also for shallow copy in order to avoid aliasing! IMHO a proper setup
looks like this:

Robert, I like this setup; thanks for the sample code to look over. I just discovered why adding 'super' is important, which was missing from my notes and an oversight on my part.

Is it sufficient to call 'super' and not 'super source' if you are passing stuff up the hierarchy construction chain?

You seem to be mixing two things: super in #initialize and super in #initialize_copy. In #initialize_copy you can simply write "super" (without brackets) because that will make sure the argument list is propagated. You can do this because #initialize_copy will always have exactly one argument, the object that was duped / cloned.

In the constructor I explicitly wrote "super()" because the superclass #initialize does not have arguments and "super" will break as soon as you add parameters to the subclass constructor. Of course, if you change both classes in parallel you can stick with "super".

I am going to conjecture 'super' ends up becoming 'super self', which makes sense because the parent constructor doesn't care about subclass data members. Does that make any sense to you?

No. Neither for #initialize nor for #initialize_copy do you want self as the argument to super.
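Robert's original sample setup is not preserved in this archive, so the following is only a sketch of the pattern described in the replies above, using hypothetical classes `Base` and `Tagged`: `super()` with explicit empty parentheses in a constructor whose superclass takes no arguments, bare `super` in `#initialize_copy` so the single source argument is forwarded unchanged, and duplicating mutable members so original and copy do not alias them.

```ruby
# Sketch of the super() / super pattern discussed above (hypothetical classes).
class Base
  attr_reader :history

  def initialize                 # takes no arguments
    @history = []
  end

  def initialize_copy(source)
    super                        # forwards source to Object#initialize_copy
    @history = @history.dup      # break aliasing of the mutable array
  end
end

class Tagged < Base
  attr_reader :tags

  def initialize(*tags)
    super()                      # explicit (): Base#initialize takes no args
    @tags = tags
  end

  def initialize_copy(source)
    super                        # bare super passes source up the chain
    @tags = @tags.dup
  end
end

t1 = Tagged.new("a", "b")
t2 = t1.dup
t2.tags << "c"
t2.history << :edited

p t1.tags      # => ["a", "b"]
p t2.tags      # => ["a", "b", "c"]
p t1.history   # => []
```

Writing bare `super` in `Tagged#initialize` instead would forward `"a", "b"` to `Base#initialize` and raise ArgumentError, which is the breakage Robert describes.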

Kind regards

robert

On 10/16/2009 09:45 PM, Rajinder Yadav wrote:

2009/10/16 lith <minilith@gmail.com>:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert_K1 · 19 October 2009 15:10

Robert Klemme wrote:

But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

Brian, I disagree. The proper way is to implement #initialize_copy.
That way you can make sure you do not get aliasing effects even if
source and copy are frozen because in #initialize_copy frozen state is
not applied.

I don't understand what you mean by that. If #dup calls self.class.new then you obviously get a new and hence unfrozen object.

It is certainly true that the *default* implementation of both #dup and #clone (defined in Object) calls initialize_copy. A generic #dup must behave this way; it doesn't know what the new() method arguments are in any particular subclass of Object. I don't think this should be taken as necessarily implying that you are expected to leave #dup alone in your own classes, and only override #initialize_copy instead.

The way I read the documentation implies to me that #dup in user defined classes *should* call new. Silly example:

  class NewsReader
    def initialize(url, state_filename)
      @url = url
      @http_client = HTTPClient.new(@url)
      @state_filename = state_filename
      @state_file = File.open(@state_filename)
    end
    def dup
      self.class.new(@url, @state_filename.dup)
    end
  end

Here the logic of how to build a NewsReader, including building all the associated helper objects, is built into the #initialize method.

Brian, the approach shown above does not work well with subclasses. The code attempts to be safe with regard to inheritance (by doing self.class.new instead of NewsReader.new) but it will fail miserably as soon as a subclass constructor has a different argument list (which is not too uncommon).
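Robert's objection can be reproduced with a small sketch. The class names are hypothetical, and the HTTP client and state file from Brian's NewsReader are left out so that the code runs on its own:

```ruby
# Hypothetical classes contrasting a #dup built on self.class.new with
# one that relies on #initialize_copy.
class NewBasedReader
  def initialize(url)
    @url = url
  end

  def dup
    self.class.new(@url)         # assumes every subclass takes just (url)
  end
end

class TimedNewBasedReader < NewBasedReader
  def initialize(url, timeout)   # different argument list
    super(url)
    @timeout = timeout
  end
end

class CopyBasedReader
  attr_reader :url

  def initialize(url)
    @url = url
  end

  def initialize_copy(source)
    super
    @url = @url.dup              # avoid sharing the mutable String
  end
end

class TimedCopyBasedReader < CopyBasedReader
  attr_reader :timeout

  def initialize(url, timeout)
    super(url)
    @timeout = timeout
  end
end

begin
  TimedNewBasedReader.new("http://example.com/", 5).dup
rescue ArgumentError => e
  puts "self.class.new approach failed: #{e.class}"
end

copy = TimedCopyBasedReader.new("http://example.com/", 5).dup
p [copy.class, copy.url, copy.timeout]
# => [TimedCopyBasedReader, "http://example.com/", 5]
```

The subclass using `initialize_copy` needs no copy code of its own: `@timeout` is copied by the default machinery, and the superclass hook still runs via `super`.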

I completely agree with Rick here: the comment in Object#dup is probably outdated. The most reasonable way to customize object cloning *and* dupping is to implement #initialize_copy in a way to at least ensure no aliasing of unfrozen members takes place.

I don't think you would want to duplicate all this logic in #initialize_copy.

You would not duplicate the logic from #initialize in #initialize_copy because #initialize_copy does a completely different job: it copies state of an instance which is known to be consistent and just needs to ensure that aliasing of object references does not break your class invariants later accidentally. This is the reason why in #initialize_copy different logic should be applied - even for shallow copies! Method #initialize OTOH needs to work with its arguments which were provided from the outside (outside of this class that is) and may not meet expectations or valid ranges.

Furthermore, I think I would expect #clone only to copy the top object, and leave all the instance variables aliased.

As far as I can see both #clone and #dup are meant to do shallow copies, but I may be wrong here. At least this is what the contract of Object promises and I tend to be cautious about changing such things. Even if you redefine the semantics to be a deep copy for certain classes, implementing it in #initialize_copy is superior to other approaches IMHO.

Obviously there are no hard-and-fast rules here, and with Ruby there are many ways to achieve the same goal.

That's true. But I would say at least when considering inheritance some ways are better than others. In fact I have been doing self.class.new most of the time in #dup because I completely forgot about #initialize_copy. But I will certainly change that habit from now on.

I'd certainly agree this is an area where Ruby's documentation falls short.

Right.

Taking another example: I don't think you'll disagree that 99% of the time you are expected to leave Object.new alone and instead define #initialize in your own classes. But you wouldn't find that out from the documentation:

$ ri Object.new
------------------------------------------------------------ Object::new
Object::new()
------------------------------------------------------------------------
Not documented

$ ri Object#initialize
Nothing known about Object#initialize

Funny that you mention it: #new and #initialize on one side and #dup / #clone and #initialize_copy on the other side have one thing in common: object allocation is separated from initialization. I believe this was a wise decision because that way allocation policies can be implemented more easily than in languages like C++ and Java where both are inseparable.
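The separation is visible from Ruby code: Class#new behaves roughly like calling #allocate and then the private #initialize on the result. The sketch below uses that to plug in one possible allocation policy, a per-name instance cache. `Color` and the cache are hypothetical, and the sketch ignores subclasses and blocks passed to `new`.

```ruby
# Sketch: an allocation policy built on allocate + initialize.
# Color is a hypothetical class; the cache is one example policy.
class Color
  attr_reader :name

  def initialize(name)
    @name = name
  end

  @cache = {}                         # class-level instance cache

  # Hand out one shared instance per name instead of a fresh object.
  def self.new(name)
    @cache[name] ||= begin
      obj = allocate                  # raw, uninitialized instance
      obj.send(:initialize, name)     # initialize is private, hence send
      obj
    end
  end
end

red1 = Color.new("red")
red2 = Color.new("red")
blue = Color.new("blue")
p red1.equal?(red2)   # => true
p red1.equal?(blue)   # => false
p blue.name           # => "blue"
```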

For example, you can add your own #deep_dup to the language:

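class Object
  def deep_dup
    # note: classes without an allocator (Integer, Symbol, NilClass, ...)
    # make allocate raise TypeError
    cp = self.class.allocate

    # copy instance variables by reference first...
    instance_variables.each do |var|
      cp.instance_variable_set(var, instance_variable_get(var))
    end

    # ...then let the class deepen the copy where needed
    cp.initialize_deep_copy(self)

    cp
  end

  def initialize_deep_copy(source)
    # nothing to do here
  end
end

class String
  def initialize_deep_copy(source)
    replace source
  end
end

# note this implementation is not robust against
# circles in the object graph!
class Array
  def initialize_deep_copy(source)
    source.each do |y|
      self << y.deep_dup
    end
  end
end

a = %w{foo bar baz}
b = a.dup
b[2].replace "CHANGED"

p a, b   # both show "CHANGED": dup copied the array, not the strings

a = %w{foo bar baz}
b = a.deep_dup
b[2].replace "CHANGED"

p a, b   # a keeps "baz": deep_dup copied the strings as well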
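The comment in the code warns that #deep_dup is not robust against cycles: for an array that contains itself, Array#initialize_deep_copy would call #deep_dup on the same array again and again until the stack overflows. One common remedy, not the approach from the post, is to keep an identity-keyed memo table of objects already copied. A minimal sketch, assuming only Arrays, Hashes and Strings need copying; `deep_copy` is a hypothetical helper, and `Hash#compare_by_identity` needs Ruby 1.9.2 or later:

```ruby
# Cycle-safe deep copy using an identity-keyed memo table (sketch).
# deep_copy is a hypothetical helper; only Array, Hash and String are
# copied, every other object is shared between original and copy.
def deep_copy(obj, memo = {}.compare_by_identity)
  return memo[obj] if memo.key?(obj)     # already copied: reuse that copy

  case obj
  when Array
    copy = memo[obj] = []                # register before recursing
    obj.each { |elem| copy << deep_copy(elem, memo) }
    copy
  when Hash
    copy = memo[obj] = {}
    obj.each { |k, v| copy[deep_copy(k, memo)] = deep_copy(v, memo) }
    copy
  when String
    memo[obj] = obj.dup
  else
    obj                                  # Integers, Symbols, nil, ... shared
  end
end

a = ["foo", ["bar"]]
a << a                                   # the array now contains itself
b = deep_copy(a)
b[0] << "!"
b[1][0] << "?"

p a[0]              # => "foo"
p a[1]              # => ["bar"]
p b[2].equal?(b)    # => true
```

Registering the new container in the memo before recursing into its elements is what lets the cycle be reproduced in the copy instead of being followed forever. Hash default values, frozen state and instance variables are not carried over by this sketch.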

Kind regards

robert

On 10/19/2009 10:32 AM, Brian Candler wrote:

2009/10/16 lith <minilith@gmail.com>:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Dev_Guy · 19 October 2009 16:14

Robert Klemme wrote:

Taking another example: I don't think you'll disagree that 99% of the time
you are expected to leave Object.new alone and instead define #initialize in
your own classes. But you wouldn't find that out from the documentation:

$ ri Object.new
------------------------------------------------------------ Object::new
Object::new()
------------------------------------------------------------------------
Not documented

$ ri Object#initialize
Nothing known about Object#initialize

Funny that you mention it: #new and #initialize on one side and #dup /
#clone and #initialize_copy on the other side have one thing in common:
object allocation is separated from initialization. I believe this was a
wise decision because that way allocation policies can be implemented easier
than in languages like C++ and Java where both are inseparable.

I wonder, as I mentioned already, whether this design has more to do with
the fact that Ruby does not perform static type checking as C++ /
Java do at compile time. In C++ you just declare a copy constructor
(initialize/constructor); if you have other (overloaded) constructor
code, then static type checking ensures the correct code logic is
executed, thus allowing you to write a cleaner clone method. In Ruby,
if initialize were called during cloning, you would need to add the
logic to perform the dynamic type check using Object#class.
Who would want to write this boilerplate code over and over? So Ruby's
way around this was to use initialize_copy, as I am going to assume
here.

I think cloning in the initializer code would be a better design if
Ruby did static type checking. The fact that Ruby still does (dynamic) type
checking at runtime means Ruby code gets penalized for performance.

It seems the way Ruby does dup/clone/initialize/initialize_copy *throw
in subclassing* is a source of confusion for many and not really
intuitive, barring good or bad design. The length of this thread and
its replies would seem to indicate this is a weakness in Ruby's design, or am I
simply biased by my C++ background? Definitely, better and updated
documentation would help to ensure the correct policy is followed in
Ruby.

On Mon, Oct 19, 2009 at 11:10 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

On 10/19/2009 10:32 AM, Brian Candler wrote:

2009/10/16 lith <minilith@gmail.com>:

Kind regards
   robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

--
Kind Regards,
Rajinder Yadav

http://devmentor.org
Do Good! ~ Share Freely

Robert_K1 · 20 October 2009 19:50

Robert Klemme wrote:
Taking another example: I don't think you'll disagree that 99% of the time
you are expected to leave Object.new alone and instead define #initialize in
your own classes. But you wouldn't find that out from the documentation:

$ ri Object.new
------------------------------------------------------------ Object::new
Object::new()
------------------------------------------------------------------------
Not documented

$ ri Object#initialize
Nothing known about Object#initialize

Funny that you mention it: #new and #initialize on one side and #dup /
#clone and #initialize_copy on the other side have one thing in common:
object allocation is separated from initialization. I believe this was a
wise decision because that way allocation policies can be implemented easier
than in languages like C++ and Java where both are inseparable.

I wonder, as I mentioned already, whether this design has more to do with
the fact that Ruby does not perform static type checking as C++ /
Java do at compile time. In C++ you just declare a copy constructor
(initialize/constructor); if you have other (overloaded) constructor
code, then static type checking ensures the correct code logic is
executed, thus allowing you to write a cleaner clone method.

C++'s copy constructor is not a "clone method". For example, it will happily "clone" any subclass instance. Cloning typically ensures at least the class of the new instance is the same as for the original.
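In Ruby terms, the difference can be sketched with hypothetical `Shape` and `Circle` classes: a copy-constructor style factory on the base class builds a base-class object even when handed a subclass instance, whereas #dup and #clone keep the class of the original without the caller having to know it.

```ruby
# Hypothetical classes: a "copy constructor" style factory on the base
# class versus Ruby's own dup/clone.
class Shape
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Copy-constructor style: always builds a Shape, whatever it is given.
  def self.copy_of(other)
    new(other.name)
  end
end

class Circle < Shape
  def radius
    1
  end
end

circle = Circle.new("c")
p Shape.copy_of(circle).class                 # => Shape  (subclass lost)
p Shape.copy_of(circle).respond_to?(:radius)  # => false
p circle.dup.class                            # => Circle
p circle.clone.class                          # => Circle
```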

Static typing is just one reason why Ruby and C++ differ here: another important reason is the memory model of both languages. In Ruby you only have object references which can only be copied by value. In C++ on the other hand you have a whole toolbox of options (value objects, pointers, references - plus constant variants). You can see that when looking at Java: it has static typing but just one way to access objects - via references. This is the same model as in Ruby and, alas, Java also has a method clone() which behaves similarly (although the programming model is different), i.e. it creates a new instance of the same class with all members set to the same references as the original.

Side note: I find Java's cloning is broken in several ways. If you want to make a class Cloneable you can only use "final" for primitive value members because otherwise you cannot prevent aliasing between old and new instance. Then, interface Cloneable does not contain method clone(), so the compiler cannot catch a missing public method clone(). Lastly, I would have preferred the return type to be generic; although I do have to admit that I did not think this through completely. I guess Sun's engineers had good reasons not to change this.

In Ruby,
if initialize were called during cloning, you would need to add the
logic to perform the dynamic type check using Object#class.

In other words: you would have to manually implement method overloading.
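To make the "manual overloading" concrete, here is a hypothetical sketch (`Temperature` is an invented class) of an #initialize that doubles as a copy constructor by checking its argument's class at runtime. Every class wanting this would need its own dispatch of this kind, which is the boilerplate both posters would rather avoid; Ruby's separate #initialize_copy hook keeps the copy path out of #initialize altogether.

```ruby
# Hypothetical sketch of hand-written "overloading" in #initialize.
# This is NOT how Ruby's dup/clone work; it only shows the boilerplate.
class Temperature
  attr_reader :degrees

  def initialize(arg)
    case arg
    when Temperature             # "copy constructor" branch
      @degrees = arg.degrees
    when Numeric                 # "normal constructor" branch
      @degrees = arg
    else
      raise TypeError, "can't build Temperature from #{arg.class}"
    end
  end
end

t1 = Temperature.new(21.5)
t2 = Temperature.new(t1)
p t2.degrees      # => 21.5
```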

Who would want to write this boilerplate code over and over? So Ruby's
way around this was to use initialize_copy, as I am going to assume
here.

You make it sound like a workaround but it isn't. For a language like Ruby this is a good solution - and compared to Java's it's almost perfect. It just lacks the public recognition.

I think cloning in the initializer code would be a better design if
Ruby did static type checking. The fact that Ruby still does (dynamic) type
checking at runtime means Ruby code gets penalized for performance.

I don't follow you here. If you want a language with static type checking you'll have to look elsewhere. We don't have static type checking in Ruby - in fact it's one of the core assets of the language. Ruby with static typing would not be Ruby. Reasoning about which approach would be best if Ruby had static typing is pretty useless.

It seems the way Ruby does dup/clone/initialize/initialize_copy *throw
in subclassing* is a source of confusion for many and not really
intuitive, barring good or bad design. The length of this thread and
its replies would seem to indicate this is a weakness in Ruby's design, or am I
simply biased by my C++ background? Definitely, better and updated
documentation would help to ensure the correct policy is followed in
Ruby.

I would attribute this confusion to the documentation and to the fact
that this is a rare topic to come up. I cannot remember a "how to properly clone objects" thread in the last years that would have covered the topic as thoroughly as we did here.

I don't think we are facing a weakness in Ruby's design here. C++ cannot be a role model for Ruby (regardless of whether you consider C++'s approach good or bad) because both languages are very different as I have tried to show above. It may be that your "C++ background" clouds your view on Ruby.

Thanks for the interesting discussion!

Kind regards

robert

On 10/19/2009 06:14 PM, Rajinder Yadav wrote:

On Mon, Oct 19, 2009 at 11:10 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

On 10/19/2009 10:32 AM, Brian Candler wrote:

2009/10/16 lith <minilith@gmail.com>:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

lim · 20 October 2009 20:53

Side note: I find Java's cloning is broken in several ways.

You're not alone:
http://www.artima.com/intv/bloch13.html

Robert_K1 · 20 October 2009 21:55

He must have copied it from me.

Seriously, although I agree with almost everything he says, I would like to add that cloning (done properly, for example as done in Ruby) does have advantages over copy construction as well (just to name the most prominent one: you do not need to know the class of the object to clone). In fact, they are two different concepts and sometimes one is more appropriate and sometimes the other one.

Cheers

robert

On 20.10.2009 22:53, lith wrote:

Side note: I find Java's cloning is broken in several ways.

You're not alone:
artima - Josh Bloch on Design

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
