Efficient Object Reconstruction?

I have a problem, and it's been bugging me for a while, so I thought
that I'd ask people that know a lot more about the internals of Ruby
than I do.

In PDF::Writer, I have a complex object graph. Its complexity is
mostly caused by the fact that all objects must know their parent
object. Maybe something like:

  foo#35
    bar#36 (@parent = foo#35, ...)
    baz#37 (@parent = foo#35, ...)

When this graph gets roundtripped through Marshal, I might have:

  foo#38
    bar#39 (@parent = foo#35, ...)
    baz#40 (@parent = foo#35, ...)

This means that there are now "odd" copies of at least the parent
(foo#35) around. This is something that *does* happen when
Transaction::Simple is called. Now, if I then add something else:

  foo#38
    bar#39 (@parent = foo#35, ...)
    baz#40 (@parent = foo#35, ...)
    bom#41 (@parent = foo#38, ...)

Obviously, this not only gets silly after a while, but it gets pretty
inefficient.

I want to fix Transaction::Simple so that the object graph is
*properly* reconstructed. That is, if I Marshal.dump the foo with
object ID 35, I don't mind if it's reconstituted as #38, but I want
"old" references to #35 reconstructed as references to #38.

How can I do that? Preferably, how can I do that without crawling the
object graph -- which would be extremely inefficient?

-austin

···

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

Austin Ziegler wrote:

I have a problem, and it's been bugging me for a while, so I thought
that I'd ask people that know a lot more about the internals of Ruby
than I do.

In PDF::Writer, I have a complex object graph. Its complexity is
mostly caused by the fact that all objects must know their parent
object. Maybe something like:

  foo#35
    bar#36 (@parent = foo#35, ...)
    baz#37 (@parent = foo#35, ...)

When this graph gets roundtripped through Marshal, I might have:

  foo#38
    bar#39 (@parent = foo#35, ...)
    baz#40 (@parent = foo#35, ...)

I don't think so. Since the whole graph is copied foo#35 is copied and
references are adjusted so you end up with parent = foo#38 - regardless
whether you copy the root object or any other object (Marshal cannot know
what the root / parent object is anyway):

09:33:39 [ruby]: ruby mar-ref.rb
134688280 134688268
134688148 134688100
false false
134687860 134687896
false false
09:34:41 [ruby]: cat mar-ref.rb
T = Struct.new(:ref)
a =
a << T.new(a)
print a.object_id, " ", a[0].object_id, "\n"
b = Marshal.load(Marshal.dump(a))
print b.object_id, " ", b[0].object_id, "\n"
print b.object_id == a.object_id, " ", b[0].object_id == a[0].object_id,
"\n"
c = Marshal.load(Marshal.dump(a[0]))
print c.ref.object_id, " ", c.object_id, "\n"
print c.ref.object_id == a.object_id, " ", c.object_id == a[0].object_id,
"\n"

This means that there are now "odd" copies of at least the parent
(foo#35) around. This is something that *does* happen when
Transaction::Simple is called. Now, if I then add something else:

  foo#38
    bar#39 (@parent = foo#35, ...)
    baz#40 (@parent = foo#35, ...)
    bom#41 (@parent = foo#38, ...)

Obviously, this not only gets silly after a while, but it gets pretty
inefficient.

This might be a problem of Transaction::Simple (disclaimer, I never used
it). You probably have to copy the complete graph every time and discard
the old one.

I want to fix Transaction::Simple so that the object graph is
*properly* reconstructed. That is, if I Marshal.dump the foo with
object ID 35, I don't mind if it's reconstituted as #38, but I want
"old" references to #35 reconstructed as references to #38.

How can I do that? Preferably, how can I do that without crawling the
object graph -- which would be extremely inefficient?

I don't think there is an easy solution that allows to write an instance
twice to the stream and adjusting references on checkout other than
writing it twice in one Marshal.dump step. The easiest might be to create
a mapping on your own, store this mapping and adjust it after
deserializing.

Kind regards

    robert

Maybe try storing the parent<->child relationships in an external table?

The attached program demonstrates this behaviour using Transaction::Simple.

Any ideas, anyone?

-austin

ref.rb (1.14 KB)

···

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

I don't think that I meaningfully can, because:

(1) we're talking *hundreds* of objects,
(2) we're not talking about a database

That said, it *might* theoretically be possible, but I think that we'd
be adding so much overhead that it wouldn't be funny. (It is possible
for a transaction to be started for any number of reasons, and those
reasons will only be increasing as the layout engine improves.)

-austin

···

On 9/6/05, Trans <transfire@gmail.com> wrote:

Maybe try storing the parent<->child relationships in an external table?

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

Austin Ziegler schrieb:

The attached program demonstrates this behaviour using Transaction::Simple.

Any ideas, anyone?

This seems to be a problem with the implementation of Transaction::Simple.

The problem occurs when you have two objects that hold references to each other, say o1 and o2. When you start a transaction on o1, you store the current state as a Marshal dump of the object graph:

   o1 <-----> o2

When you rewind the transaction, you restore the Marshal dump. The resulting object graph contains new objects o1' and o2', holding references to each other.

   o1' <-----> o2'

Since you don't want to return the new object o1' but reuse the original object o1, you replace all the instance variables of o1 by those of o1'. The resulting object graph then looks like this:

   o1' <-----> o2' <----- o1 <----- o2

I fear that a fix for this problem wouldn't fit into Transaction::*Simple* :frowning:

Regards,
Pit

Austin Ziegler schrieb:

The attached program demonstrates this behaviour using
Transaction::Simple. Any ideas, anyone?

This seems to be a problem with the implementation of
Transaction::Simple.

Well, yes. This is part of the reason that I'm trying to figure out how
to fix it. It's not a *huge* problem, but it *is* an annoying one.

The problem occurs when you have two objects that hold references to
each other, say o1 and o2. When you start a transaction on o1, you
store the current state as a Marshal dump of the object graph:

   o1 <-----> o2

When you rewind the transaction, you restore the Marshal dump. The
resulting object graph contains new objects o1' and o2', holding
references to each other.

   o1' <-----> o2'

Since you don't want to return the new object o1' but reuse the
original object o1, you replace all the instance variables of o1 by
those of o1'. The resulting object graph then looks like this:

   o1' <-----> o2' <----- o1 <----- o2

I fear that a fix for this problem wouldn't fit into
Transaction::*Simple* :frowning:

That's, um, more or less what's going on. What's interesting is that it
only matters when attributes *aside* from containment change during the
transaction.

Got any recommendations on how to fix it ... preferably without walking
the object graph?

The long term goal is to do something like this:

  pdf.start_transaction # stores pdf as pdf'
    # make modifications
  pdf.rewind_transaction # replace pdf with pdf'
    # make modifications
  pdf.commit_transaction # keep pdf

It's the *replace* that I'm having a problem with, and I suspect that
this isn't something I can do without, say, evil.rb. Does anyone have
any suggestions on how might this might be done? I noticed this a while
back (and implemented an API change to gloss over it) when I was trying
to make it so that one could choose to compress a PDF document on save
rather than when it's created.

I suspect (strongly) that I could also save on PDF::Writer memory use if
I could get this working (and working quickly).

It might be acceptable to require a migration to Transaction::Full or something
as yet unwritten, but I *want* it to be pure Ruby, if possible.
(Otherwise, it'll
be something that I make a run-time option.)

-austin

···

On 9/6/05, Pit Capitain <pit@capitain.de> wrote:
--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

Austin Ziegler wrote:

Got any recommendations on how to fix it ... preferably without
walking
the object graph?

As I said, use another level of indirection and store references in a hash
and use that for resolution. You might be able to do that behind the
scenes with a changed attr_accessor.

Kind regards

    robert