[SUMMARY] SerializableProc (#38)

The solutions this time show some interesting differences in approach, so I want
to walk through a handful of them below. The very first solution was from Robin
Stocker and that's a fine place to start. Here's the class:

  class SerializableProc

    def initialize( block )
      @block = block
      # Test if block is valid.
      to_proc
    end

    def to_proc
      # Raises exception if block isn't valid, e.g. SyntaxError.
      eval "Proc.new{ #{@block} }"
    end

    def method_missing( *args )
      to_proc.send( *args )
    end

  end

It can't get much simpler than that. The main idea here, and in all the
solutions, is that we need to capture the source of the Proc. The source is
just a String so we can serialize that with ease and we can always create a new
Proc if we have the source. In other words, Robin's main idea is to go
(syntactically) from this:

  Proc.new {
    puts "Hello world!"
  }

To this:

  SerializableProc.new %q{
    puts "Hello world!"
  }

In the first pure Ruby version we're building a Proc with the block of code to
define the body. In the second SerializableProc version, we're just passing a
String to the constructor that can be used to build a block. Christian
Neukirchen had something very interesting to say about the change:

  Obvious problems of this approach are the lack of closures and editor
  support (depending on the inverse quality of your editor :P)...

We'll get back to the lack of closures issue later, but I found the "inverse
quality of your editor" claim interesting. The meaning is that a poor editor
may not consider %q{...} equivalent to '...'. If it doesn't realize a String is
being entered, it may continue to syntax highlight the code inside. Of course,
you could always remove the %q whenever you want to see the code highlighting,
but that's tedious.

Getting back to Robin's class, initialize() just stores the String and creates a
Proc from it, so an Exception will be thrown at construction time if it is fed
invalid code. The method to_proc() is what builds the Proc object by wrapping
the String in "Proc.new { ... }" and calling eval(). Finally, method_missing()
makes SerializableProc behave close to a Proc. Anytime it sees a method call
that isn't initialize() or to_proc(), it creates a Proc object and forwards the
message.

We don't see anything specific to Serialization in Robin's code, because both
Marshal (PStore uses Marshal) and YAML can handle a custom class with String
instance data. Like magic, it all just works.
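
A quick way to see this is to round-trip Robin's class through Marshal. The
sketch below repeats the class so it runs on its own; the variable names and
sample source strings are illustrative assumptions of mine, and YAML is left out
because its API differs between Ruby versions.

```ruby
# Robin's class, repeated so the example is self-contained.
class SerializableProc
  def initialize( block )
    @block = block
    to_proc                      # fail early on invalid source
  end

  def to_proc
    eval "Proc.new{ #{@block} }"
  end

  def method_missing( *args )
    to_proc.send( *args )
  end
end

# Only @block (a String) is stored, so Marshal needs no special help.
original = SerializableProc.new %q{ |x| x * 2 }
restored = Marshal.load(Marshal.dump(original))
doubled  = restored.call(21)     # forwarded through method_missing()

# Invalid source is rejected when the object is constructed.
rejected = begin
  SerializableProc.new "puts("
  false
rescue SyntaxError
  true
end
```

Note that SyntaxError is not a StandardError, so a bare rescue would not catch
it; it has to be named explicitly.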

Robin had a complaint though:

  I imagine my solution is not very fast, as each time a method on the
  SerializableProc is called, a new Proc object is created.
  
  The object could be saved in an instance variable @proc so that speed is
  only low on the first execution. But that would require the definition of
  custom dump methods for each Dumper so that it would not attempt to dump
  @proc.

My own solution (and others) caches the Proc and defines some custom dump
methods. Let's have a look at how something like that comes out:

  class SerializableProc
    def self._load( proc_string )
      new(proc_string)
    end

    def initialize( proc_string )
      @code = proc_string
      @proc = nil
    end

    def _dump( depth )
      @code
    end

    def method_missing( method, *args )
      if to_proc.respond_to? method
        @proc.send(method, *args)
      else
        super
      end
    end

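    def to_proc( )
      return @proc unless @proc.nil?

      # The /m flag lets .* span newlines, so multi-line source still matches.
      if @code =~ /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\Z/m
        @proc = eval @code                   # already a complete lambda/proc
      elsif @code =~ /\A\s*(?:\{|do).*(?:\}|end)\s*\Z/m
        @proc = eval "lambda #{@code}"       # a bare block: prepend lambda
      else
        @proc = eval "lambda { #{@code} }"   # just a body: wrap it fully
      end
    end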

    def to_yaml( )
      @proc = nil
      super
    end
  end

My initialize() is the same, save that I create a variable to hold the Proc
object and I wasn't clever enough to trigger the early Exception when the code
is bad. My to_proc() looks scary but I just try to accept a wider range of
Strings, wrapping them in only what they need. The end result is the same.
Note that any Proc created is cached. My method_missing() is also very similar.
If the Proc object responds to the method, it is forwarded. The first line of
method_missing() calls to_proc() to ensure we've created one. After that, it
can safely use the @proc variable.
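
To make the "wider range of Strings" concrete, here is a rough sketch of the
same wrapping logic pulled out into a standalone helper. The name build_proc is
hypothetical, and the /m flag on the patterns is an assumption of this sketch so
that multi-line source is also recognized.

```ruby
# Hypothetical helper: wrap a source String in only what it needs.
def build_proc(code)
  case code
  when /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\z/m
    eval code                    # already "lambda { ... }" or "proc do ... end"
  when /\A\s*(?:\{|do).*(?:\}|end)\s*\z/m
    eval "lambda #{code}"        # a bare block, so prepend lambda
  else
    eval "lambda { #{code} }"    # just a body, so wrap it completely
  end
end

full_form  = build_proc("lambda { |x| x + 1 }")
block_form = build_proc("{ |x| x + 1 }")
body_form  = build_proc("|x| x + 1")

results = [full_form, block_form, body_form].map { |code| code.call(1) }
```

All three spellings end up as equivalent lambdas, which is the whole point of
the scary looking conditional.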

The _load() class method and _dump() instance method are what it takes to
support Marshal. First, _dump() is expected to return a String that could be
used to rebuild the instance. Then, _load() is passed that String on reload and
expected to return the recreated instance. The String choice is simple in this
case, since we're using the source.
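
Here is a minimal sketch of that handshake, trimmed down to the Marshal hooks
and the cache. CachedProc and the sample source are hypothetical names chosen
for the example, not part of any submission.

```ruby
class CachedProc
  # Marshal hands back the String that _dump() produced.
  def self._load( source )
    new(source)
  end

  def initialize( source )
    @code = source
    @proc = nil
  end

  # Only the source is written out; the cached Proc never reaches Marshal.
  def _dump( depth )
    @code
  end

  def to_proc
    @proc ||= eval("lambda { #{@code} }")
  end
end

adder = CachedProc.new("|a, b| a + b")
adder.to_proc                                 # populate the cache first

restored    = Marshal.load(Marshal.dump(adder))
cache_empty = restored.instance_variable_get(:@proc).nil?
sum         = restored.to_proc.call(1, 2)     # rebuilt from source on demand
```

The restored object starts with an empty cache and rebuilds the Proc the first
time it is needed.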

There are multiple ways to support YAML serialization, but I opted for the super
simple cheat. YAML can't serialize a Proc, but it's just a cache that can
always be restored. I just override to_yaml() and clear the cache before
handing serialization back to the default method. My code is unaffected by the
Proc's absence and it will recreate it when needed.
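
The same cheat can be sketched against the YAML library that ships with current
Ruby. That is an assumption on my part, since the code above targeted the 2005
library. YamlFriendlyProc is a hypothetical name, and the unsafe_load check is
only there because newer versions refuse to load custom classes with plain
YAML.load.

```ruby
require "yaml"

class YamlFriendlyProc
  def initialize( code )
    @code = code
    @proc = nil
  end

  def to_proc
    @proc ||= eval("lambda { #{@code} }")
  end

  # Drop the cache, then let the default serialization do the work.
  def to_yaml( *args )
    @proc = nil
    super
  end
end

incrementer = YamlFriendlyProc.new("|n| n + 1")
incrementer.to_proc.call(1)                   # populate the cache

yaml_text = incrementer.to_yaml

# Newer YAML versions need unsafe_load to rebuild arbitrary classes.
loader   = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
restored = YAML.public_send(loader, yaml_text)
answer   = restored.to_proc.call(41)
```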

Taking one more step, Dominik Bathon builds the Proc in the constructor and
never has to recreate it:

  require "delegate"
  require "yaml"

  class SProc < DelegateClass(Proc)

      attr_reader :proc_src

      def initialize(proc_src)
          super(eval("Proc.new { #{proc_src} }"))
          @proc_src = proc_src
      end

      def ==(other)
          @proc_src == other.proc_src rescue false
      end

      def inspect
          "#<SProc: #{@proc_src.inspect}>"
      end
      alias :to_s :inspect

      def marshal_dump
          @proc_src
      end

      def marshal_load(proc_src)
          initialize(proc_src)
      end

      def to_yaml(opts = {})
          YAML::quick_emit(self.object_id, opts) { |out|
              out.map("!rubyquiz.com,2005/SProc" ) { |map|
                  map.add("proc_src", @proc_src)
              }
          }
      end

  end

  YAML.add_domain_type("rubyquiz.com,2005", "SProc") { |type, val|
      SProc.new(val["proc_src"])
  }

Dominik uses the delegate library instead of the method_missing() trick.
That's a two step process. You can see the first step when SProc is defined to
inherit from DelegateClass(Proc), which sets a type for the object so delegate
knows which messages to forward. The second step is the first line of the
constructor, which passes the delegate object to the DelegateClass. That's the
instance that will receive forwarded messages. Dominik also defined a custom
==(), "because that doesn't really work with method_missing/delegate."
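
The two steps can be tried in isolation with a sketch like the one below.
SourceProc, its source reader, and the sample string are hypothetical names of
mine, not Dominik's. The explicit ::Kernel.eval receiver is also my choice, so
the call does not depend on how Delegator forwards Kernel methods.

```ruby
require "delegate"

# Step one: inherit from DelegateClass(Proc) to get Proc's methods forwarded.
class SourceProc < DelegateClass(Proc)
  attr_reader :source

  def initialize(source)
    # Step two: hand the real Proc to the delegate machinery.
    super(::Kernel.eval("Proc.new { #{source} }"))
    @source = source
  end

  # Compare by source, since two separately built Procs are never equal.
  def ==(other)
    other.respond_to?(:source) && @source == other.source
  end

  def marshal_dump
    @source
  end

  def marshal_load(source)
    initialize(source)
  end
end

square = SourceProc.new("|n| n * n")
copy   = Marshal.load(Marshal.dump(square))

result = copy.call(7)        # call() is forwarded to the wrapped Proc
same   = (square == copy)    # true, thanks to the custom ==()
```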

Dominik's code uses a different interface to support Marshal, but does the same
thing I did, as you can see. The YAML support is different. SProc#to_yaml()
spits out a new YAML type that basically just emits the source. The code
outside of the class adds the YAML support to read this type back in whenever
it is encountered. Here's what an instance looks like when it's resting in a
YAML file:

  !rubyquiz.com,2005/SProc
  proc_src: |2-
     |*args|
            puts "Hello world"
            print "Args: "
            p args

The advantage here is that the YAML export procedure never touches the Proc so
it doesn't need to be hidden or removed and rebuilt.

Florian's solution is also worth mentioning, though it takes a completely
different road to solving the problem. Time and space don't allow me to
recreate and annotate the code here, but Florian described the premise well in
the submission message:

  I wrote this a while ago and it works by extracting a proc's origin file
  name and line number from its .inspect string and using the source code
  (which usually does not have to be read from disc) -- it works with
  procs generated in IRB, eval() calls and regular files. It does not work
  from ruby -e and stuff like "foo".instance_eval "lambda {}".source
  probably doesn't work either.
  
  Usage:
  
     code = lambda { puts "Hello World" }
     puts code.source
     Marshal.load(Marshal.dump(code)).call
     YAML.load(code.to_yaml).call

The code itself is a fascinating read. It uses the relatively unknown
SCRIPT_LINES__ Hash, has great tricks like overriding eval() to capture that
source, and even implements a partial Ruby parser with standard libraries. I'm
telling you, that code reads like a good mystery novel for programmers. Don't
miss it!

One last point. I said in the quiz all this is just a hack, no matter how
useful it is. Dave Burt sent a message to Ruby talk along these lines:

  Proc's documentation tells us that "Proc objects are blocks of code that
  have been bound to a set of local variables." (That is, they are "closures"
  with "bindings".) Do any of the proposed solutions so far store local
  variables?
  
  # That is, can the following Proc be serialized?
    local_var = 42
    code = proc { local_var += 1 } # <= what should that look like in YAML?
    code.call #=> 43

An excellent point. These toys we're creating have serious limitations to be
sure. I assume this is the very reason Ruby's Procs cannot be serialized.
Using binding() might make it possible to work around this problem in some
instances, but there are clearly some Procs that cannot be cleanly serialized.
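
Dave's example is easy to reproduce. In the sketch below, rebuild() is a
hypothetical stand-in for what the source-based solutions do: it evaluates the
String inside a method, where the caller's local variables are out of reach.
Passing a binding to eval() restores access, but only while that binding is
still alive in the same process.

```ruby
# Hypothetical stand-in for a source-based SerializableProc.
def rebuild(source)
  eval "Proc.new { #{source} }"    # evaluated in this method's scope
end

local_var = 42

live        = proc { local_var += 1 }
live_result = live.call            # the real closure sees local_var

# The rebuilt Proc has no local_var to work with, so calling it fails.
rebuilt = rebuild("local_var += 1")
error   = begin
  rebuilt.call
  nil
rescue NameError => e              # NoMethodError is a NameError subclass
  e
end

# With an explicit binding the source can reach local_var again.
bound        = eval("Proc.new { local_var += 1 }", binding)
bound_result = bound.call
```

The binding itself cannot be written to disk, which is why this only helps "in
some instances."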

My thanks to all who committed such wonderful code and discussion to this week's
quiz. I know I learned multiple new things and I hope others did too.

Tomorrow we have a quiz to sample some algorithmic fun...

Ruby Quiz wrote:

My thanks to all who committed such wonderful code and discussion to this week's
quiz. I know I learned multiple new things and I hope others did too.

Good stuff, JEGII, Robin, Chris2, Dave.

I can also really sympathize with Chris' disgust over the YAML.add_ruby_type methods... It is undergoing deprecation in favor of:

  class SerializableProc
     yaml_type "tag:rubyquiz.org,2005:SerializableProc"
  end

_why

Has anybody thought about serialized closures? I was thinking of a way to use closures across multiple Apache requests, and came to the conclusion that it was too much trouble. In this case I just use a standard Proc object that gets re-initialized on each request and don't serialize it, but I always thought it would be nice to maintain some sort of persistent state across requests.

Wouldn't it be possible to write a C extension for serializable closures?

-Jeff

----- Original Message ----- From: "Ruby Quiz" <james@grayproductions.net>
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Thursday, July 14, 2005 6:51 AM
Subject: [SUMMARY] SerializableProc (#38)

why the lucky stiff <ruby-talk@whytheluckystiff.net> writes:

Ruby Quiz wrote:

My thanks to all who committed such wonderful code and discussion to this week's
quiz. I know I learned multiple new things and I hope others did too.

Good stuff, JEGII, Robin, Chris2, Dave.

I can also really sympathize with Chris' disgust over the
YAML.add_ruby_type methods... It is undergoing deprecation in favor
of:

  class SerializableProc
     yaml_type "tag:rubyquiz.org,2005:SerializableProc"
  end

And then #yaml_dump and #yaml_load? That would rule.

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Jeffrey Moss wrote:

Wouldn't it be possible to write a C extension for serializable closures?

I think NodeWrap does this. See http://rubystuff.org/nodewrap/

It's pretty cool stuff.

Christian Neukirchen wrote:

And then #yaml_dump and #yaml_load? That would rule.

Class.yaml_new or Object.yaml_initialize. And Object.to_yaml.

If folks prefer the Marshal setup, though, I'll change it. It's only been like this for a handful of minor releases.

_why

why the lucky stiff <ruby-talk@whytheluckystiff.net> writes:

Christian Neukirchen wrote:

And then #yaml_dump and #yaml_load? That would rule.

Class.yaml_new or Object.yaml_initialize. And Object.to_yaml.

If folks prefer the Marshal setup, though, I'll change it. It's only
been like this for a handful of minor releases.

Very good too, I'm looking forward to that.

Does this get into 1.8.3 (if that version will ever appear)?

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org