Customized serialization unreliable?

Hi,

I have a strange problem with customized serialization.
I have a large object that is built from several classes, and each class
has its own marshal_dump and marshal_load, like:

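class YYY
  attr_reader :data, :version

  def initialize
    @data = "different objects"
    @version = 1
  end

  # Return the state to serialize; Marshal dumps this value
  # in place of the object's instance variables.
  def marshal_dump
    [@version, @data]
  end

  # Restore the state from whatever marshal_dump returned.
  def marshal_load(var)
    @version = var[0]
    case @version
    when 1
      @data = var[1]
    else
      # do something else
    end
  end
end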

The problem is that the serialization works randomly: sometimes the
object is serialized correctly and sometimes not (after deserializing,
some of the attributes of the objects are nil). I also found that the
size of the serialized file is different each time. When I keep the
serialized data in memory instead, the length of the string is also
different each time, like this:

                  dumpStr = Marshal.dump(mainResults)
                  puts dumpStr.length.to_s()
                  mainResults2 = Marshal.load(dumpStr)

Does anyone know anything about customized serialization? There is
not a lot of documentation about this.

Thank you very much

Sayoyo


--
Posted via http://www.ruby-forum.com/.

sayoyo Sayoyo wrote:

Hi,

I have a strange problem with the customized serialization...
I have a large object which is built by several classes and each class
has its own marshal_dump and marshal_load, like:

Another thing to try: call marshal_dump on your top-level object. You should get a nested structure of arrays and so on. Does this look ok?

Now, feed that data structure back into your top-level marshal_load method. Does this cause the problem too?
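A minimal sketch of this check, assuming the YYY class from the original post (the variable names `original`, `state` and `copy` are only illustrative). It calls marshal_dump by hand, prints the result, and feeds it to marshal_load on a blank instance, so Marshal itself is not involved at all:

```ruby
class YYY
  attr_reader :data, :version

  def initialize
    @data = "different objects"
    @version = 1
  end

  def marshal_dump
    [@version, @data]
  end

  def marshal_load(var)
    @version = var[0]
    @data = var[1] if @version == 1
  end
end

original = YYY.new
state = original.marshal_dump   # a plain nested structure
p state

# allocate builds an instance without running initialize, which is
# roughly what Marshal.load does before it calls marshal_load.
copy = YYY.allocate
copy.marshal_load(state)
p copy.version
p copy.data
```

If an attribute is already nil at this stage, the problem is in the marshal_dump/marshal_load pair and not in Marshal.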


--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

sayoyo Sayoyo wrote:

Hi,

I have a strange problem with the customized serialization...
I have a large object which is built by several classes and each class
has its own marshal_dump and marshal_load, like:

class YYY
   attr_reader :data, :version

   def initialize
     @data = "different objects"
     @version = 1
   end

   def marshal_dump()
      return [@version,@data]
   end

Your problems most likely are caused by you returning an array instead
of a String. I'm surprised Marshal doesn't complain.

class Foo
  def marshal_dump
    Marshal.dump([@version, @data])
  end
  def marshal_load(data)
    @version, @data = *Marshal.load(data)
  end
  ...
end

This works here quite well without any random outtakes. At least I
haven't yet hit one.

Regards
Stefan


   def marshal_dump()
      return [@version,@data]
   end

If there are actually classes built on instance methods marshal_dump and
marshal_load, then Marshal would be completely ignoring those methods.
Assuming you meant "_dump" which is the instance method to overload for
customized serialization, and either "self._load" or klassname._load
which is the class method for the same, then this would be happening:

Array.to_s will be called in Marshal.dump because it expects a string.
For example, with the YYY class:
@version = 1
@data = ["e", "i"]
_dump returns [1, ["e", "i"]]
The Marshaling process will form "1ei"

Your Loading process with "1ei" will give
@version = 1
@data = "e" # instead of ["e", "i"]

If for whatever reason you are Marshaling the results of your
marshal_dump methods -- Marshal.dump(my_yyy.marshal_dump) --, then you
have neglected to properly handle object graph cycles, and shared
references.

Further, from the ruby-doc Marshal class:
    "Some objects cannot be dumped: if the objects to be dumped include
bindings, procedure or method objects, instances of class IO, or
singleton objects, a TypeError will be raised."
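The restriction quoted above can be checked with a short script. This is only a sketch: the variable names are made up, and the exact error messages differ between Ruby versions, so only the exception class is inspected.

```ruby
# A Proc cannot be dumped.
proc_error = nil
begin
  Marshal.dump(proc { 1 })
rescue TypeError => e
  proc_error = e
end

# Neither can an object that carries singleton methods.
obj = Object.new
def obj.greet
  "hi"
end

singleton_error = nil
begin
  Marshal.dump(obj)
rescue TypeError => e
  singleton_error = e
end

puts proc_error.class
puts singleton_error.class
```

Note that these cases raise an exception; they do not silently produce nil attributes.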


Hi,

First, thank you very much for helping me. Yes, I have tried it, and
the serialization still works randomly. No idea why...

I suspect there is a problem with memory allocation somewhere when the
data is written to the "file". Since it is a random effect, I can
hardly put my finger on it.

Do you know who is responsible for this part of Ruby?

Sayoyo


sayoyo Sayoyo wrote:

Hi,

I have a strange problem with the customized serialization...
I have a large object which is built by several classes and each class
has its own marshal_dump and marshal_load, like:

class YYY
   attr_reader :data, :version

   def initialize
     @data = "different objects"
     @version = 1
   end

   def marshal_dump()
      return [@version,@data]
   end

Your problems most likely are caused by you returning an array instead of a String. I'm surprised Marshal doesn't complain.

I don't think a String must be returned? Apparently the standard lib also does not think so:

irb(main):009:0> require 'ostruct'
=> true
irb(main):010:0> o=OpenStruct.new
=> #<OpenStruct>
irb(main):011:0> o.foo=123
=> 123
irb(main):012:0> o.marshal_dump
=> {:foo=>123}
irb(main):013:0> o.marshal_dump.class
=> Hash
irb(main):014:0>
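The same holds for a user-defined class. The following sketch uses a hypothetical `Settings` class (not from the thread) whose marshal_dump returns a Hash, and runs it through a full Marshal round trip:

```ruby
# Hypothetical class, used only to illustrate the point.
class Settings
  attr_reader :options

  def initialize(options = {})
    @options = options
  end

  # Any dumpable object may be returned here, not only a String.
  def marshal_dump
    { :options => @options }
  end

  def marshal_load(hash)
    @options = hash[:options]
  end
end

settings = Settings.new(:foo => 123)
copy = Marshal.load(Marshal.dump(settings))
p copy.options
```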

class Foo
  def marshal_dump
    Marshal.dump([@version, @data])
  end
  def marshal_load(data)
    @version, @data = *Marshal.load(data)
  end
  ...
end

This works here quite well without any random outtakes. At least I haven't yet hit one.

I'll show you one - but it's not random. :-) Your approach with the String works well only for simple cases. The downside is that it does not handle loops in object graphs properly:

robert@fussel /cygdrive/c/Temp
$ cat marsh.rb
F = Struct.new :x, :y
a = F.new
b = F.new
c = F.new a,b
a.x = c
b.x = c

t1 = Marshal.load(Marshal.dump(c))
p t1.equal?(t1.x.x)

class F
   def marshal_dump
     [x,y]
   end

   def marshal_load(dat)
     self.x, self.y = dat
   end
end

t2 = Marshal.load(Marshal.dump(c))
p t2.equal?(t2.x.x)

class F
   def marshal_dump
     Marshal.dump([x,y])
   end

   def marshal_load(dat)
     self.x, self.y = Marshal.load(dat)
   end
end

t3 = Marshal.load(Marshal.dump(c))
p t3.equal?(t3.x.x)

robert@fussel /cygdrive/c/Temp
$ ruby marsh.rb
true
marsh.rb:26:in `marshal_dump': stack level too deep (SystemStackError)
         from marsh.rb:26:in `dump'
         from marsh.rb:26:in `marshal_dump'
         from marsh.rb:26:in `dump'
         from marsh.rb:26:in `marshal_dump'
         from marsh.rb:26:in `dump'
         from marsh.rb:26:in `marshal_dump'
         from marsh.rb:26:in `dump'
         from marsh.rb:26:in `marshal_dump'
          ... 15600 levels...
         from marsh.rb:26:in `dump'
         from marsh.rb:26:in `marshal_dump'
         from marsh.rb:34:in `dump'
         from marsh.rb:34

robert@fussel /cygdrive/c/Temp
$
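The nested Marshal.dump approach also loses shared references when there is no loop at all. The sketch below uses two hypothetical classes, `ArrayNode` and `StringNode`, that differ only in what marshal_dump returns, and checks whether two nodes still point to the same child after a round trip:

```ruby
# Hypothetical classes for illustration; only marshal_dump/marshal_load differ.
class ArrayNode
  attr_reader :child

  def initialize(child)
    @child = child
  end

  def marshal_dump
    [@child]                      # stays inside the outer dump
  end

  def marshal_load(ary)
    @child = ary[0]
  end
end

class StringNode
  attr_reader :child

  def initialize(child)
    @child = child
  end

  def marshal_dump
    Marshal.dump([@child])        # a separate, independent dump
  end

  def marshal_load(str)
    @child = Marshal.load(str)[0]
  end
end

shared = "leaf".dup

a = Marshal.load(Marshal.dump([ArrayNode.new(shared), ArrayNode.new(shared)]))
s = Marshal.load(Marshal.dump([StringNode.new(shared), StringNode.new(shared)]))

array_shared  = a[0].child.equal?(a[1].child)
string_shared = s[0].child.equal?(s[1].child)
p array_shared    # both nodes still reference one object
p string_shared   # each node got its own copy
```

With the Array variant Marshal sees the child twice in one dump and records a link; with the String variant each nested dump starts from scratch, so the identity is gone.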

Kind regards

  robert


On 23.05.2008 02:29, Stefan Rusterholz wrote:

   def marshal_dump()
      return [@version,@data]
   end

If there are actually classes built on instance methods marshal_dump and marshal_load, then Marshal would be completely ignoring those methods.

I don't think so, at least not in 1.8.6:

robert@fussel /cygdrive/c/Temp
$ irb
irb(main):001:0> class F
irb(main):002:1> def marshal_dump
irb(main):003:2> puts "dump"
irb(main):004:2> "123"
irb(main):005:2> end
irb(main):006:1> def marshal_load(x)
irb(main):007:2> puts "load #{x}"
irb(main):008:2> end
irb(main):009:1> end
=> nil
irb(main):010:0> Marshal.load(Marshal.dump(F.new))
dump
load 123
=> #<F:0x7ff7be4c>
irb(main):011:0>

Assuming you meant "_dump" which is the instance method to overload for customized serialization, and either "self._load" or klassname._load which is the class method for the same, then this would be happening:

I believe this is yet another mechanism to control custom marshalling.
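For comparison, here is a sketch of that other mechanism, using a hypothetical `Temperature` class. With `_dump` the instance method has to return a String, and the class method `_load` has to build the object from that String:

```ruby
# Hypothetical class for illustration.
class Temperature
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  # Instance method: must return a String.
  # The argument is the depth limit passed down by Marshal.dump.
  def _dump(_level)
    @degrees.to_s
  end

  # Class method: receives that String and returns a new object.
  def self._load(str)
    new(Float(str))
  end
end

t = Marshal.load(Marshal.dump(Temperature.new(21.5)))
p t.degrees
```

Unlike marshal_dump, the String returned by `_dump` is opaque to Marshal, so any objects encoded inside it do not take part in the reference tracking of the outer dump.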

Array.to_s will be called in Marshal.dump because it expects a string.

I don't think so. The mechanism is different. It would be too fragile to depend on #to_s to return something that can be used to deserialize.

For example, with the YYY class:
@version = 1
@data = ["e", "i"]
_dump returns [1, ["e", "i"]]
The Marshaling process will form "1ei"

Your Loading process with "1ei" will give
@version = 1
@data = "e" # instead of ["e", "i"]

If for whatever reason you are Marshaling the results of your marshal_dump methods -- Marshal.dump(my_yyy.marshal_dump) --, then you have neglected to properly handle object graph cycles, and shared references.

Correct. See my other posting for a nice example. :-)

All in all, I believe that if one wants to exclude some fields from serialization (like "transient" in Java), the best way is to implement #marshal_dump so that it returns just an array of the fields that need to be serialized and deserialized, and to implement #marshal_load(ar) accordingly. That way Marshal can properly handle loops in object graphs etc.
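A minimal sketch of this pattern, assuming a hypothetical `Report` class with one field (`@cache`) that should not be serialized:

```ruby
# Hypothetical class for illustration.
class Report
  attr_reader :title, :rows, :cache

  def initialize(title, rows)
    @title = title
    @rows = rows
    @cache = {}            # derived data, treated as "transient"
  end

  # Only the fields that should survive a round trip.
  def marshal_dump
    [@title, @rows]
  end

  def marshal_load(ary)
    @title, @rows = ary
    @cache = {}            # rebuild the transient field
  end
end

report = Report.new("totals", [1, 2, 3])
report.cache[:sum] = 6

copy = Marshal.load(Marshal.dump(report))
p copy.title
p copy.rows
p copy.cache
```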

Kind regards

  robert


On 23.05.2008 10:33, Andrew Mitchell wrote:

First, thank you very much for helping me. Yes, I have tried it, and the serialization still works randomly. No idea why...

I suspect there is a problem with memory allocation somewhere when the data is written to the "file". Since it is a random effect, I can hardly put my finger on it.

My gut feeling rather points to an effect caused by different ordering of objects in a Hash. Or you have an issue caused by a loop in your object graph. As far as I can see, custom serialization works ok, at least for non-complex structures:

irb(main):016:0> F = Struct.new :a, :b do
irb(main):017:1* def marshal_dump
irb(main):018:2> [a,b]
irb(main):019:2> end
irb(main):020:1>
irb(main):021:1* def marshal_load(x)
irb(main):022:2> self.a = x[0]
irb(main):023:2> self.b = x[1]
irb(main):024:2> end
irb(main):025:1> end
=> F
irb(main):026:0> x = F.new 1,2
=> #<struct F a=1, b=2>
irb(main):027:0> s = Marshal.load(Marshal.dump(x))
=> #<struct F a=1, b=2>
irb(main):028:0>

I am not sure why you need custom serialization. But here is an alternative approach: create a method that returns a data structure, which you then serialize, and add a class method that constructs your in-memory structure from that state. E.g.

class Foo
   attr_accessor :name, :size

   def to_serial
     [name, size]
   end

   def self.from_serial(obj)
     f = new
     f.name = obj[0]
     f.size = obj[1]
     f
   end
end

Of course, this only works if you know what you are deserializing.
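A possible way to use these two methods together with Marshal is sketched below. The Foo class is repeated in compact form so the snippet runs on its own; the values "log" and 42 are arbitrary.

```ruby
class Foo
  attr_accessor :name, :size

  def to_serial
    [name, size]
  end

  def self.from_serial(obj)
    f = new
    f.name, f.size = obj
    f
  end
end

foo = Foo.new
foo.name = "log"
foo.size = 42

# Only a plain Array of a String and an Integer reaches Marshal.
bytes = Marshal.dump(foo.to_serial)

# The caller has to know that these bytes describe a Foo.
restored = Foo.from_serial(Marshal.load(bytes))
p restored.name
p restored.size
```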

Another alternative is to separate "configuration state" (which is serializable, e.g. file name) from "operation state" (which is not serializable, e.g. file descriptor) and serialize only the configuration state. This is probably the cleanest approach.
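One way this separation could look, using a hypothetical `LogReader` class: the path is configuration state and gets serialized, while the open file handle is operation state and is recreated on demand. `File::NULL` is used as the path only so that the sketch runs anywhere.

```ruby
# Hypothetical class for illustration.
class LogReader
  attr_reader :path

  def initialize(path)
    @path = path     # configuration state: serializable
    @io = nil        # operation state: never serialized
  end

  # Open the file lazily, so a freshly loaded copy can reopen it.
  def io
    @io ||= File.open(@path)
  end

  def close
    @io.close if @io
    @io = nil
  end

  def marshal_dump
    [@path]
  end

  def marshal_load(ary)
    @path = ary[0]
    @io = nil
  end
end

reader = LogReader.new(File::NULL)
reader.io                                   # a File is now held by reader

copy = Marshal.load(Marshal.dump(reader))   # works despite the open File
copy_has_io = !copy.instance_variable_get(:@io).nil?
p copy.path == reader.path
p copy_has_io

reader.close
```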

Do you know who is responsible for this part of Ruby?

You probably can find out by looking at the sources.

Kind regards

  robert


On 22.05.2008 18:48, sayoyo Sayoyo wrote: