How to make a deep copy of an object (Searching for Idiom)

Hello Group,

I sometimes which to make a deep copy of an object. I know I could use Marshal,
but thats slow so I want to write a routine #deep_copy. (Or should I overwrite
#dup ?)

Now the question is, how do you write those. I could use this:

class A
  ...
  def deep_copy
    result = self.dup
    result.field1 = self.field1.deep_copy
    ...
  end
end

or

class A
  def initialize(field1 = 'default value', ...)
    @field1 = field
    ...
  end

  def deep_copy
    self.new(@field1.deep_copy)
  end
end

Which allows me to use instance variables.

Is there something more elegant. What do you prefer? Am I on the right track?

Best regards,

Brian

···

--
Brian Schröder
http://www.brian-schroeder.de/

If your object doesn't have singleton methods you can use this construct:

Marshal.load(Marshal.dump(a))

Cheers,
Kent.

···

On Dec 10, 2004, at 2:59 PM, Brian Schröder wrote:

Hello Group,

I sometimes which to make a deep copy of an object. I know I could use Marshal,
but thats slow so I want to write a routine #deep_copy. (Or should I overwrite
#dup ?)

Now the question is, how do you write those. I could use this:

class A
  ...
  def deep_copy
    result = self.dup
    result.field1 = self.field1.deep_copy
    ...
  end
end

or

class A
  def initialize(field1 = 'default value', ...)
    @field1 = field
    ...
  end

  def deep_copy
    self.new(@field1.deep_copy)
  end
end

Which allows me to use instance variables.

Is there something more elegant. What do you prefer? Am I on the right track?

Best regards,

Brian

--
Brian Schröder
http://www.brian-schroeder.de/

Brian Schröder wrote:

Hello Group,

I sometimes which to make a deep copy of an object. I know I could use
Marshal, but thats slow so I want to write a routine #deep_copy. (Or
should I overwrite
#dup ?)

Brian, the classes in my new RVG library must have deep_copy methods. I
asked this question on c.l.r a few weeks ago but didn't get any
suggestions. I did some searching around and found some an old thread on
ruby_talk which helped, but with ruby-talk down I can't find it right now.

The general form looks like this:

            def deep_copy
                copy = self.class.new
                ivs = instance_variables
                ivs.each do |iv|
                    itv = instance_variable_get(iv)
                    otv = case
                        when itv.nil?
                            nil
                        when itv.respond_to?(:deep_copy)
                            itv.deep_copy
                        when itv.respond_to?(:dup)
                            itv.dup
                        else
                            itv
                        end
                    copy.instance_variable_set(iv, otv)
                end
                return copy
            end

The idea is that the instance variables can refer 1) to other objects that
have a deep_copy method, 2) to "normal" Ruby objects that can be duped, and
3) to immediate objects like Fixnum which don't need to be duped, just
assigned. I also have a special case for nil since it responds to :dup but
can't actually be duped.

If #initialize takes arguments, then you'll need a slightly different
version.

For testing purposes I also implemented a deep_equal method with the same
general form.

P.S. I'd appreciate any hearing any criticisms.

Have you considered the somewhat contrived

class A; attr_accessor :a; end;
a = A.new
a.a = a

or more realistically

class A; attr_accessor :children; def add(name); (@children ||= ) << C.new(name, self) end; end
class C; def initialize(name, parent); @name, @parent = name, parent end end
A.new.add "foo"

?

···

On Sat, Dec 11, 2004 at 04:59:10AM +0900, Brian Schröder wrote:

Hello Group,

I sometimes which to make a deep copy of an object. I know I could use Marshal,
but thats slow so I want to write a routine #deep_copy. (Or should I overwrite
#dup ?)

Now the question is, how do you write those. I could use this:

class A
  ...
  def deep_copy
    result = self.dup
    result.field1 = self.field1.deep_copy
    ...
  end
end

or

class A
  def initialize(field1 = 'default value', ...)
    @field1 = field
    ...
  end

  def deep_copy
    self.new(@field1.deep_copy)
  end
end

Which allows me to use instance variables.

Is there something more elegant. What do you prefer? Am I on the right track?

--
Hassle-free packages for Ruby?
RPA is available from http://www.rubyarchive.org/

Thanks, but that was what I wanted to avoid.

Regards,

Brian

···

On Sat, 11 Dec 2004 07:56:42 +0900 Kent Sibilev <ksibilev@bellsouth.net> wrote:

If your object doesn't have singleton methods you can use this
construct:

Marshal.load(Marshal.dump(a))

--
Brian Schröder
http://www.brian-schroeder.de/

Interesting. But this won't work for instance variables that point to arrays of
deep_copy-able objects. Right?

Maybe this is connected to the "object-state" thread floating around somewehere
else.

Regards,

Brian

···

On Sat, 11 Dec 2004 08:27:25 +0900 Tim Hunter <cyclists@nc.rr.com> wrote:

Brian Schröder wrote:

> Hello Group,
>
> I sometimes which to make a deep copy of an object. I know I could use
> Marshal, but thats slow so I want to write a routine #deep_copy. (Or
> should I overwrite
> #dup ?)

Brian, the classes in my new RVG library must have deep_copy methods. I
asked this question on c.l.r a few weeks ago but didn't get any
suggestions. I did some searching around and found some an old thread on
ruby_talk which helped, but with ruby-talk down I can't find it right now.

The general form looks like this:

            def deep_copy
                copy = self.class.new
                ivs = instance_variables
                ivs.each do |iv|
                    itv = instance_variable_get(iv)
                    otv = case
                        when itv.nil?
                            nil
                        when itv.respond_to?(:deep_copy)
                            itv.deep_copy
                        when itv.respond_to?(:dup)
                            itv.dup
                        else
                            itv
                        end
                    copy.instance_variable_set(iv, otv)
                end
                return copy
            end

The idea is that the instance variables can refer 1) to other objects that
have a deep_copy method, 2) to "normal" Ruby objects that can be duped, and
3) to immediate objects like Fixnum which don't need to be duped, just
assigned. I also have a special case for nil since it responds to :dup but
can't actually be duped.

If #initialize takes arguments, then you'll need a slightly different
version.

For testing purposes I also implemented a deep_equal method with the same
general form.

P.S. I'd appreciate any hearing any criticisms.

--
Brian Schröder
http://ruby.brian-schroeder.de/

Yes, my structures are not cyclic.

Regards,

Brian

···

On Sat, 11 Dec 2004 19:03:22 +0900 Mauricio Fernández <batsman.geo@yahoo.com> wrote:

On Sat, Dec 11, 2004 at 04:59:10AM +0900, Brian Schröder wrote:
> Hello Group,
>
> I sometimes which to make a deep copy of an object. I know I could use
> Marshal, but thats slow so I want to write a routine #deep_copy. (Or should
> I overwrite#dup ?)
>
> Now the question is, how do you write those. I could use this:
>
> class A
> ...
> def deep_copy
> result = self.dup
> result.field1 = self.field1.deep_copy
> ...
> end
> end
>
> or
>
> class A
> def initialize(field1 = 'default value', ...)
> @field1 = field
> ...
> end
>
> def deep_copy
> self.new(@field1.deep_copy)
> end
> end
>
> Which allows me to use instance variables.
>
> Is there something more elegant. What do you prefer? Am I on the right
> track?

Have you considered the somewhat contrived

class A; attr_accessor :a; end;
a = A.new
a.a = a

or more realistically

class A; attr_accessor :children; def add(name); (@children ||= ) <<
C.new(name, self) end; end class C; def initialize(name, parent); @name,
@parent = name, parent end end A.new.add "foo"

?

--
Brian Schröder
http://www.brian-schroeder.de/

Brian Schröder wrote:

Interesting. But this won't work for instance variables that point to
arrays of deep_copy-able objects. Right?

I knew you'd spot that :slight_smile: In a couple of cases I had to replace Arrays with
something like this:

        class Content < Array
            def deep_copy
                copy = self.class.new
                each do |c|
                    copy << case
                        when c.nil?
                            nil
                        when c.respond_to?(:deep_copy)
                            c.deep_copy
                        when c.respond_to?(:dup)
                            c.dup
                        else
                            c
                        end
                end
                return copy
            end
        end

And, yes, I have to be careful not to use methods that return Array objects.

I've also been thinking about deep_copy, but for a different reason. I
want to keep a mutation log for each object instance. So if this
occurs:

foo = Obj.new(:blah)
foo.warp_blah!
foo.twist_blah = :different_blah

Then somewhere, preferably abstracted inside each object instance, I'd
have a log like:

puts foo.mutation_log => ["wrap_blah!", ["twist_blah=",
:different_blah]]

When the method Obj.checkpoint is called, the log is cleared and the
current state returned as a Marshall-ing, or some other store-savable
format.

I understand this is not precisely the same thing as deep_copy, but I
wonder if anyone else has done this? Obviously or not, I'm looking to
implement an orthogonal persistence object system in ruby.
Cheers,
Chuckl

Maybe it would make sense to extend the base classes Object, Array, Hash with a
deep-copy functionality. That would be something for the extensions project.

The problem here is, that we have object state that is not contained in
"visible slots" i.e. instance variables. So this would be one case, where the
proposal for a

#each_state_slot
#assign_to_state_slot(key, value),

extension of all ruby objects would make sense. Then a deep copy would be dead
simple. (Except for recursive structures, ...)

Note, that in my view no objects with infinite state slots exist, because they
would have to be generated from more fundamental state. E.g.

class Infinite
  def each_state_slot
    i = 0
    loop do yield(i+=1, :state) end
  end
end

would simply be a wrong implementation. Infinite in this example has no own
state.

Regards,

Brian

···

On Sat, 11 Dec 2004 09:22:24 +0900 Tim Hunter <cyclists@nc.rr.com> wrote:

Brian Schröder wrote:

> Interesting. But this won't work for instance variables that point to
> arrays of deep_copy-able objects. Right?

I knew you'd spot that :slight_smile: In a couple of cases I had to replace Arrays with
something like this:

        class Content < Array
            def deep_copy
                copy = self.class.new
                each do |c|
                    copy << case
                        when c.nil?
                            nil
                        when c.respond_to?(:deep_copy)
                            c.deep_copy
                        when c.respond_to?(:dup)
                            c.dup
                        else
                            c
                        end
                end
                return copy
            end
        end

And, yes, I have to be careful not to use methods that return Array objects.

--
Brian Schröder
http://www.brian-schroeder.de/

It's not quite the same as what you're saying (it's difficult to do a
true mutation log if your code doesn't write to a mutation log by
defaultt), but look at Transaction::Simple. It does live object
transactions and checkpointing.

-austin

···

On Sat, 11 Dec 2004 22:42:25 +0900, Chuckl <Killian2422@yahoo.com> wrote:

I've also been thinking about deep_copy, but for a different reason. I
want to keep a mutation log for each object instance. So if this
occurs:

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

"Chuckl" <Killian2422@yahoo.com> wrote in message

I've also been thinking about deep_copy, but for a different reason. I
want to keep a mutation log for each object instance. So if this
occurs:

foo = Obj.new(:blah)
foo.warp_blah!
foo.twist_blah = :different_blah

Then somewhere, preferably abstracted inside each object instance, I'd
have a log like:

puts foo.mutation_log => ["wrap_blah!", ["twist_blah=",
:different_blah]]

When the method Obj.checkpoint is called, the log is cleared and the
current state returned as a Marshall-ing, or some other store-savable
format.

Madeline does something quite similar http://madeleine.sourceforge.net/ ...
may be worth a look.

"Brian Schröder" <ruby@brian-schroeder.de> schrieb im Newsbeitrag news:20041211105957.76dae75d@black.wg...

Brian Schröder wrote:

> Interesting. But this won't work for instance variables that point to
> arrays of deep_copy-able objects. Right?

I knew you'd spot that :slight_smile: In a couple of cases I had to replace Arrays with
something like this:

        class Content < Array
            def deep_copy
                copy = self.class.new
                each do |c|
                    copy << case
                        when c.nil?
                            nil
                        when c.respond_to?(:deep_copy)
                            c.deep_copy
                        when c.respond_to?(:dup)
                            c.dup
                        else
                            c
                        end
                end
                return copy
            end
        end

And, yes, I have to be careful not to use methods that return Array objects.

Maybe it would make sense to extend the base classes Object, Array, Hash with a
deep-copy functionality. That would be something for the extensions project.

IMHO not. Reason beeing that the semantics of deep copy are class dependend. You might not want to copy all instances in an object graph for deep copy and that might totally depend on the class at hand and / or (even worse) application. IMHO there is no real general solution to thid. Of course you could define a method in Object like

def deep_copy
  Marshal.load( Marshal.dump( self ) )
end

but you don't gain much that way. And it won't even work in the general case (consider Singletons etc.).

The problem here is, that we have object state that is not contained in
"visible slots" i.e. instance variables. So this would be one case, where the
proposal for a

There's a much more serious problem with the proposed implementation: it does not cope with graphs of objects that contain cycles. Do do that you need to keep track of objects copied already. Marshal does this - and it's efficient. If you want to do that yourself, you'll likely need a hash[old oid -> copy] to manage this. I doubt though that it's more efficient than Marshal.

I had to discover that there's more to deep copying than just traversing the object graph and copying each instance in turn a while ago myself. I help you can benefit from my earlier errors... :slight_smile:

Kind regards

    robert

···

On Sat, 11 Dec 2004 09:22:24 +0900 > Tim Hunter <cyclists@nc.rr.com> wrote:

Thanks -- Madeline looks terrific.

Ok, then back to the original question. If I don't want a general solution, but
want to deep-copy my special class, that contains some instance variables, with
some multi-dimensional arrays in it. What is a nice way to go about it?

Regards,

Brian

···

On Sat, 11 Dec 2004 19:42:23 +0900 "Robert Klemme" <bob.news@gmx.net> wrote:

"Brian Schröder" <ruby@brian-schroeder.de> schrieb im Newsbeitrag
news:20041211105957.76dae75d@black.wg...
> On Sat, 11 Dec 2004 09:22:24 +0900 > > Tim Hunter <cyclists@nc.rr.com> wrote:
>
>> Brian Schröder wrote:
>>
>> > Interesting. But this won't work for instance variables that point to
>> > arrays of deep_copy-able objects. Right?
>>
>> I knew you'd spot that :slight_smile: In a couple of cases I had to replace Arrays
>> with
>> something like this:
>>
>> class Content < Array
>> def deep_copy
>> copy = self.class.new
>> each do |c|
>> copy << case
>> when c.nil?
>> nil
>> when c.respond_to?(:deep_copy)
>> c.deep_copy
>> when c.respond_to?(:dup)
>> c.dup
>> else
>> c
>> end
>> end
>> return copy
>> end
>> end
>>
>> And, yes, I have to be careful not to use methods that return Array
>> objects.
>>
>
> Maybe it would make sense to extend the base classes Object, Array, Hash
> with a
> deep-copy functionality. That would be something for the extensions
> project.

IMHO not. Reason beeing that the semantics of deep copy are class
dependend. You might not want to copy all instances in an object graph for
deep copy and that might totally depend on the class at hand and / or (even
worse) application. IMHO there is no real general solution to thid. Of
course you could define a method in Object like

def deep_copy
  Marshal.load( Marshal.dump( self ) )
end

but you don't gain much that way. And it won't even work in the general
case (consider Singletons etc.).

> The problem here is, that we have object state that is not contained in
> "visible slots" i.e. instance variables. So this would be one case, where
> the
> proposal for a

There's a much more serious problem with the proposed implementation: it
does not cope with graphs of objects that contain cycles. Do do that you
need to keep track of objects copied already. Marshal does this - and it's
efficient. If you want to do that yourself, you'll likely need a hash[old
oid -> copy] to manage this. I doubt though that it's more efficient than
Marshal.

I had to discover that there's more to deep copying than just traversing the
object graph and copying each instance in turn a while ago myself. I help
you can benefit from my earlier errors... :slight_smile:

Kind regards

    robert

--
Brian Schröder
http://www.brian-schroeder.de/

If your deep copy corresponds to a notion of object state deeper than its
immediate instance variables, and you need to do other state-related
traversals of your object graph, this might help.

"Robert Klemme" <bob.news@gmx.net> wrote in message
news:32010qF3gk6feU1@individual.net...

There's a much more serious problem with the proposed implementation: it
does not cope with graphs of objects that contain cycles.

Often a graph with cycles can be naturally decomposed into:
(a) a tree(or several trees), plus
(b) a bunch of non-tree edges

If your objects have a 'containment' structure, or clean boundaries of
object state, or a natural XML representation of element nesting vs. idrefs,
you likely have such a tree structure. You can iterate separately over the
two subgraphs: the tree portion (knowing it will have no cycles), and the
cross-tree edges (knowing all nodes to be copied have already been copied).

class MyNode
    #each_tree_edge #each_cross_tree_edge are application dependent
    # parent_node, source_node params not needed for deep_copy
    def each_tree_edge # {|parent_node, child_attr, child_node| .. }
        # class-specific iterator
    end
    def each_cross_tree_edge # {|source, cross_attr, related_node| .. }
        # class-specific iterator
    end
    def rec_each_tree_edge &block
        each_tree_edge { |p, a, n|
            yield p, a, n
            n.rec_each_tree_edge &block
        }
    end
    def rec_each_cross_tree_edge &block
        each_cross_tree_edge {|s, a, n| yield s, a, n}
        each_tree_edge {|s, a, n| n.rec_each_cross_tree_edge &block}
    end
    def deep_copy (h = {})
        cp = self.class.new
        h[self.id] = cp
        each_tree_edge { |p, iv, n|
            cp.instance_variable_set iv, n.deep_copy(h)
        }
        rec_each_cross_tree_edge { |s, iv, n|
            cp.instance_variable_set iv, h[n.id]
            # assumes no sharing between copy and original
            # alternately: cp.instance_variable_set iv, (h[n.id] || n)
        }
        cp
    end
end

# example
class House
    attr_accessor :roof, :garage, :subdivision
    def each_tree_edge
        yield self, "@roof", roof
        yield self, "@garage", garage
    end
    def each_cross_tree_edge
        yield self, "@subdivision", subdivision
    end
end

class Garage
    attr_accessor :door, :house
    def each_tree_edge
        yield self, "@door", door
    end
    def each_cross_tree_edge
        yield self, "@house", house
    end
end

If all state is in instance variables you could implement Object#deep_copy
once. Otherwise you need special case handling for Arrays, Hashes, and other
built-ins; and may need to further distinguish arrays used for multiple tree
edges vs. for multiple cross-tree edges. A few judicious macros can hide the
fact that the array itself is purely a Ruby implementation artifact for a
many-relationship, whether tree or cross_tree, and generate the iterators,
so all you would need to say is:
class House
    contains_one :roof, :garage # tree
    has_one :subdivision # cross tree
    contains_many :rooms # tree
    has_many :neighbors # cross tree
end
class Garage
    contains_one :door
    has_one :house # smarter macro would do inverses
end

You could get a consistent family of methods -- deep_copy, deep_equal,
deep_freeze, marshal_dump, marshal_load, etc. -- automatically from such
macros (except that #copy and #equal can be controlled at the level of
individual instance variables or accessors, while the built-in #freeze
cannot).

If you need finer-grained control than overriding the iterators at the class
level (similar to the earlier discussion of #freeze and object state), you
could parameterize on which accessors need to be traversed from a given node
i.e. pass around some kind of 'metagraph' parameter: a metanode could be as
simple as a hash of accessor symbols, each mapping to another corresponding
metanode. A metagraph, applied to some graph of objects, exposes (projects)
some portion of that graph via #each.... But then again, at some point you
might really be better off with a good macro library that can manipulate
relations (nested, not flat) and the kind of projection operation the
metagraph is trying to do :slight_smile:

> Maybe it would make sense to extend the base classes Object, Array, Hash
> with a
> deep-copy functionality. That would be something for the extensions
> project.

IMHO not. Reason beeing that the semantics of deep copy are class
dependend. You might not want to copy all instances in an object graph

for

deep copy and that might totally depend on the class at hand and / or

(even

worse) application. IMHO there is no real general solution to thid. Of
course you could define a method in Object like

True, they are (at least) class dependent. The above is one approach to
providing that
customization by deferring the tree- and nontree-traversal control, and
providing a generic Object#deep_copy based on those methods. This can then
be used for other traversals besides deep_copy.

But does it work ? and is it worth it ? :slight_smile:

"Brian Schröder" <ruby@brian-schroeder.de> schrieb im Newsbeitrag news:20041211115044.674cfc6a@black.wg...

"Brian Schröder" <ruby@brian-schroeder.de> schrieb im Newsbeitrag
news:20041211105957.76dae75d@black.wg...
>
>> Brian Schröder wrote:
>>
>> > Interesting. But this won't work for instance variables that point >> > to
>> > arrays of deep_copy-able objects. Right?
>>
>> I knew you'd spot that :slight_smile: In a couple of cases I had to replace >> Arrays
>> with
>> something like this:
>>
>> class Content < Array
>> def deep_copy
>> copy = self.class.new
>> each do |c|
>> copy << case
>> when c.nil?
>> nil
>> when c.respond_to?(:deep_copy)
>> c.deep_copy
>> when c.respond_to?(:dup)
>> c.dup
>> else
>> c
>> end
>> end
>> return copy
>> end
>> end
>>
>> And, yes, I have to be careful not to use methods that return Array
>> objects.
>>
>
> Maybe it would make sense to extend the base classes Object, Array, > Hash
> with a
> deep-copy functionality. That would be something for the extensions
> project.

IMHO not. Reason beeing that the semantics of deep copy are class
dependend. You might not want to copy all instances in an object graph for
deep copy and that might totally depend on the class at hand and / or (even
worse) application. IMHO there is no real general solution to thid. Of
course you could define a method in Object like

def deep_copy
  Marshal.load( Marshal.dump( self ) )
end

but you don't gain much that way. And it won't even work in the general
case (consider Singletons etc.).

> The problem here is, that we have object state that is not contained in
> "visible slots" i.e. instance variables. So this would be one case, > where
> the
> proposal for a

There's a much more serious problem with the proposed implementation: it
does not cope with graphs of objects that contain cycles. Do do that you
need to keep track of objects copied already. Marshal does this - and it's
efficient. If you want to do that yourself, you'll likely need a hash[old
oid -> copy] to manage this. I doubt though that it's more efficient than
Marshal.

I had to discover that there's more to deep copying than just traversing the
object graph and copying each instance in turn a while ago myself. I help
you can benefit from my earlier errors... :slight_smile:

Kind regards

    robert

Ok, then back to the original question. If I don't want a general solution, but
want to deep-copy my special class, that contains some instance variables, with
some multi-dimensional arrays in it. What is a nice way to go about it?

This seems to work fairly good:

class Object
  def deep_copy(h = {})
    ident = self.id
    cpy = h[ident]

    unless cpy
      cpy = case self
        when String
          frozen? ? self : dup
        when Array
          map {|o| o.deep_copy(h)}
        when Hash
          # this looses some state (i.e. default etc.)
          inject({}) {|c, (k,v)| c[k.deep_copy(h)] = v.deep_copy(h); c}
        when Class, Module, Symbol, FalseClass, TrueClass, NilClass, Fixnum, Bignum
          self
        # probably more special cases like Struct
        else
          cpy = self.class.allocate

          instance_variables.each do |var|
            cpy.instance_variable_set(var, instance_variable_get(var).deep_copy(h))
          end

          cpy
      end

      cpy.freeze if frozen?
      h[ident] = cpy
    end

    cpy
  end
end

Kind regards

    robert

···

On Sat, 11 Dec 2004 19:42:23 +0900 > "Robert Klemme" <bob.news@gmx.net> wrote:

> On Sat, 11 Dec 2004 09:22:24 +0900 >> > Tim Hunter <cyclists@nc.rr.com> wrote:

"itsme213" <itsme213@hotmail.com> schrieb im Newsbeitrag
news:XCtvd.2096$jf5.492@fe1.texas.rr.com...

But does it work ? and is it worth it ? :slight_smile:

A good question. A very good question. :-))

Kind regards

    robert