The solutions this time show some interesting differences in approach, so I want
to walk through a handful of them below. The very first solution was from Robin
Stocker and that's a fine place to start. Here's the class:

    class SerializableProc
      def initialize( block )
        @block = block
        # Test if block is valid.
        to_proc
      end

      def to_proc
        # Raises exception if block isn't valid, e.g. SyntaxError.
        eval "Proc.new{ #{@block} }"
      end

      def method_missing( *args )
        to_proc.send( *args )
      end
    end

It can't get much simpler than that. The main idea here, and in all the
solutions, is that we need to capture the source of the Proc. The source is
just a String so we can serialize that with ease and we can always create a new
Proc if we have the source. In other words, Robin's main idea is to go
(syntactically) from this:

    Proc.new {
      puts "Hello world!"
    }

To this:

    SerializableProc.new %q{
      puts "Hello world!"
    }

In the first pure Ruby version we're building a Proc with the block of code to
define the body. In the second SerializableProc version, we're just passing a
String to the constructor that can be used to build a block. Christian
Neukirchen had something very interesting to say about the change:

    Obvious problems of this approach are the lack of closures and editor
    support (depending on the inverse quality of your editor :P)...

We'll get back to the lack of closures issue later, but I found the "inverse
quality of your editor" claim interesting. The meaning is that a poor editor
may not consider %q{...} equivalent to '...'. If it doesn't realize a String is
being entered, it may continue to syntax highlight the code inside. Of course,
you could always remove the %q whenever you want to see the code highlighting,
but that's tedious.
Getting back to Robin's class, initialize() just stores the String and creates a
Proc from it so an Exception will be thrown at construction time if fed invalid
code. The method to_proc() is what builds the Proc object by wrapping the
String in "Proc.new { ... }" and calling eval(). Finally, method_missing() makes
SerializableProc behave close to a Proc. Anytime it sees a method call that
isn't initialize() or to_proc(), it creates a Proc object and forwards the
message.
We don't see anything specific to serialization in Robin's code, because both
Marshal (PStore uses Marshal) and YAML can handle a custom class with String
instance data. Like magic, it all just works.
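
To see that claim in action, here's a small round trip through Marshal. The class body just repeats Robin's code so the sketch runs on its own; the doubling block and the variable names are my own illustration, not part of the submission.

```ruby
# Robin's class, repeated so this sketch is self-contained.
class SerializableProc
  def initialize( block )
    @block = block
    to_proc  # raises right away if the source doesn't parse
  end

  def to_proc
    eval "Proc.new{ #{@block} }"
  end

  def method_missing( *args )
    to_proc.send( *args )
  end
end

# Illustrative source String (an assumption, not from the quiz).
doubler = SerializableProc.new %q{ |n| n * 2 }

# Only the String instance variable gets dumped, so Marshal is happy.
restored = Marshal.load( Marshal.dump(doubler) )
result   = restored.call(21)  # forwarded through method_missing()

# Invalid source is rejected at construction time.
rejected = begin
  SerializableProc.new "puts("
  false
rescue SyntaxError
  true
end
```

Note that the rescue names SyntaxError explicitly. It isn't a StandardError, so a bare rescue would let it sail right past.
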
Robin had a complaint though:

    I imagine my solution is not very fast, as each time a method on the
    SerializableProc is called, a new Proc object is created.

    The object could be saved in an instance variable @proc so that speed is
    only low on the first execution. But that would require the definition of
    custom dump methods for each Dumper so that it would not attempt to dump
    @proc.

My own solution, like some others, does cache the Proc and defines some custom dump
methods. Let's have a look at how something like that comes out:

    class SerializableProc
      def self._load( proc_string )
        new(proc_string)
      end

      def initialize( proc_string )
        @code = proc_string
        @proc = nil
      end

      def _dump( depth )
        @code
      end

      def method_missing( method, *args )
        if to_proc.respond_to? method
          @proc.send(method, *args)
        else
          super
        end
      end

      def to_proc( )
        return @proc unless @proc.nil?

        if @code =~ /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\Z/
          @proc = eval @code
        elsif @code =~ /\A\s*(?:\{|do).*(?:\}|end)\s*\Z/
          @proc = eval "lambda #{@code}"
        else
          @proc = eval "lambda { #{@code} }"
        end
      end

      def to_yaml( )
        @proc = nil
        super
      end
    end

My initialize() is the same, save that I create a variable to hold the Proc
object and I wasn't clever enough to trigger the early Exception when the code
is bad. My to_proc() looks scary but I just try to accept a wider range of
Strings, wrapping them in only what they need. The end result is the same.
Note that any Proc created is cached. My method_missing() is also very similar.
If the Proc object responds to the method, it is forwarded. The first line of
method_missing() calls to_proc() to ensure we've created one. After that, it
can safely use the @proc variable.
The _load() class method and _dump() instance method are what it takes to support
Marshal. First, _dump() is expected to return a String that could be used to
rebuild the instance. Then, _load() is passed that String on reload and
expected to return the recreated instance. The String choice is simple in this
case, since we're using the source.
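
Here's a minimal sketch of just those two hooks at work. CachedSource is a hypothetical stand-in trimmed down from the class above, with a plain call() in place of method_missing(); it shows that Marshal refuses a bare Proc but will happily dump a wrapper that has one cached inside, because _dump() hands back only the source.

```ruby
# Hypothetical stand-in for the class above, trimmed to the Marshal hooks.
class CachedSource
  def self._load( source )  # called by Marshal.load with the dumped String
    new(source)
  end

  def initialize( source )
    @source = source
    @proc   = nil
  end

  def _dump( depth )  # called by Marshal.dump; the cached Proc is left out
    @source
  end

  def to_proc
    @proc ||= eval("lambda { #{@source} }")
  end

  def call( *args )
    to_proc.call(*args)
  end
end

adder = CachedSource.new("|a, b| a + b")
adder.call(1, 2)  # builds and caches the Proc

# A bare Proc can't be dumped...
proc_error = begin
  Marshal.dump(adder.to_proc)
  nil
rescue TypeError => error
  error
end

# ...but the wrapper can, even with a Proc cached inside it.
restored = Marshal.load( Marshal.dump(adder) )
sum      = restored.call(20, 22)
```

The restored object starts with an empty cache and rebuilds its Proc on the first call().
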
There are multiple ways to support YAML serialization, but I opted for the super
simple cheat. YAML can't serialize a Proc, but it's just a cache that can
always be restored. I just override to_yaml() and clear the cache before
handing serialization back to the default method. My code is unaffected by the
Proc's absence and it will recreate it when needed.
Taking one more step, Dominik Bathon builds the Proc in the constructor and
never has to recreate it:

    require "delegate"
    require "yaml"

    class SProc < DelegateClass(Proc)
      attr_reader :proc_src

      def initialize(proc_src)
        super(eval("Proc.new { #{proc_src} }"))
        @proc_src = proc_src
      end

      def ==(other)
        @proc_src == other.proc_src rescue false
      end

      def inspect
        "#<SProc: #{@proc_src.inspect}>"
      end
      alias :to_s :inspect

      def marshal_dump
        @proc_src
      end

      def marshal_load(proc_src)
        initialize(proc_src)
      end

      def to_yaml(opts = {})
        YAML::quick_emit(self.object_id, opts) { |out|
          out.map("!rubyquiz.com,2005/SProc" ) { |map|
            map.add("proc_src", @proc_src)
          }
        }
      end
    end

    YAML.add_domain_type("rubyquiz.com,2005", "SProc") { |type, val|
      SProc.new(val["proc_src"])
    }

Dominik uses the delegate library, instead of the method_missing() trick.
That's a two-step process. You can see the first step when SProc is defined to
inherit from DelegateClass(Proc), which sets a type for the object so delegate
knows which messages to forward. The second step is the first line of the
constructor, which passes the delegate object to the DelegateClass. That's the
instance that will receive forwarded messages. Dominik also defined a custom
==(), "because that doesn't really work with method_missing/delegate."
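
Stripped of the serialization code, the two steps look something like the sketch below. SourceProc and its source attribute are hypothetical names I'm using for illustration; only DelegateClass() and __getobj__() come from the delegate library itself.

```ruby
require "delegate"

# Hypothetical miniature of the delegation setup, minus serialization.
class SourceProc < DelegateClass(Proc)     # step one: name the class to mirror
  attr_reader :source

  def initialize( source )
    super(eval("Proc.new { #{source} }"))  # step two: hand over the real Proc
    @source = source
  end
end

adder = SourceProc.new("|a, b| a + b")
sum   = adder.call(1, 2)   # forwarded to the wrapped Proc
arity = adder.arity        # so are the other public Proc methods
inner = adder.__getobj__   # the wrapped Proc itself, if you need it
```

Because the Proc is built once in the constructor, there's no cache to manage and nothing to rebuild on later calls.
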
Dominik's code uses a different interface to support Marshal, but does the same
thing I did, as you can see. The YAML support is different. SProc.to_yaml()
spits out a new YAML type, that basically just emits the source. The code
outside of the class adds the YAML support to read this type back in, whenever
it is encountered. Here's what the class looks like when it's resting in a YAML
file:

    !rubyquiz.com,2005/SProc
    proc_src: |2-
       |*args|
          puts "Hello world"
          print "Args: "
          p args

The advantage here is that the YAML export procedure never touches the Proc so
it doesn't need to be hidden or removed and rebuilt.
Florian's solution is also worth mentioning, though it takes a completely different
road to solving the problem. Time and space don't allow me to recreate and
annotate the code here, but Florian described the premise well in the submission
message:

    I wrote this a while ago and it works by extracting a proc's origin file
    name and line number from its .inspect string and using the source code
    (which usually does not have to be read from disc) -- it works with
    procs generated in IRB, eval() calls and regular files. It does not work
    from ruby -e and stuff like "foo".instance_eval "lambda {}".source
    probably doesn't work either.

    Usage:

      code = lambda { puts "Hello World" }
      puts code.source
      Marshal.load(Marshal.dump(code)).call
      YAML.load(code.to_yaml).call

The code itself is a fascinating read. It uses the relatively unknown
SCRIPT_LINES__ Hash, has great tricks like overriding eval() to capture that
source, and even implements a partial Ruby parser with standard libraries. I'm
telling you, that code reads like a good mystery novel for programmers. Don't
miss it!
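
To give a feel for the premise only, here's a rough sketch of the "file name and line number" lookup. This is not Florian's code: I'm swapping in Proc#source_location for the .inspect parsing, handling only the regular file case, and the defining_line() helper, the $greeting global, and the temporary file are all made up for the demonstration.

```ruby
require "tempfile"

# Hypothetical helper: fetch the source line a Proc was defined on.
def defining_line( code )
  path, line = code.source_location  # file name and line number
  File.readlines(path)[line - 1]
end

recovered = nil
Tempfile.create(["greeting", ".rb"]) do |file|
  # Stand-in for a "regular file" that defines a lambda (illustrative only).
  file.write(%Q{$greeting = lambda { puts "Hello World" }\n})
  file.flush
  load file.path

  recovered = defining_line($greeting)
end
```

Finding the line is the easy half. Cutting exactly the block's text out of that line, or out of several lines, is where the partial parser mentioned above has to earn its keep.
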
One last point. I said in the quiz all this is just a hack, no matter how
useful it is. Dave Burt sent a message to Ruby Talk along these lines:

    Proc's documentation tells us that "Proc objects are blocks of code that
    have been bound to a set of local variables." (That is, they are "closures"
    with "bindings".) Do any of the proposed solutions so far store local
    variables?

      # That is, can the following Proc be serialized?
      local_var = 42
      code = proc { local_var += 1 } # <= what should that look like in YAML?
      code.call #=> 43

An excellent point. These toys we're creating have serious limitations to be
sure. I assume this is the very reason Ruby's Procs cannot be serialized.
Using binding() might make it possible to work around this problem in some
instances, but there are clearly some Procs that cannot be cleanly serialized.
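
Dave's example makes a nice test case. In the sketch below, rebuild() is a hypothetical helper standing in for what the solutions do with eval(); it shows the source-only Proc losing sight of local_var, a Binding bringing it back, and then the Binding itself refusing to be dumped.

```ruby
# Hypothetical helper: rebuild a Proc from source, optionally in a Binding.
def rebuild( source, context = nil )
  code = "proc { #{source} }"
  context ? eval(code, context) : eval(code)
end

local_var = 42
source    = "local_var += 1"

# Rebuilt from the String alone, the Proc has no local_var to close over.
orphan = rebuild(source)
orphan_error = begin
  orphan.call
  nil
rescue NoMethodError => error  # local_var is a fresh nil, and nil has no +
  error
end

# Rebuilt inside the right Binding, it sees the variable again...
bound = rebuild(source, binding)
bound.call  # local_var is now 43

# ...but the Binding itself can't be dumped, so the problem just moves.
binding_error = begin
  Marshal.dump(binding)
  nil
rescue TypeError => error
  error
end
```

So binding() helps only while the original scope is still alive in the same process, which is exactly the situation where we didn't need serialization in the first place.
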
My thanks to all who contributed such wonderful code and discussion to this week's
quiz. I know I learned multiple new things and I hope others did too.
Tomorrow we have a quiz to sample some algorithmic fun...