Memoization, files and Marshal?

Hi all,

Inspired by a recent blog entry by Mauricio Fernandez, I decided to try
to add a method to the memoize package that would allow users to
memoize using a file based approach rather than memory, either to
conserve memory or for persistance.

However, I'm not sure it's possible. If it is, I'd love to know how.
If it's not, well, pstore it is.

Here's what I tried:

def memoize_to_file(name, file)
   meth = method(name)
   cache = Hash.new.update(Marshal.load(File.read(file))) rescue nil
   cache ||= {}
   (class << self; self; end).class_eval do
      define_method(name) do |*args|
         if cache.has_key?(args)
            cache[args]
         else
            cache[args] ||= meth.call(*args)
            File.open(file, "wb+"){ |f| Marshal.dump(cache, f) }
         end
      end
   end
   cache
end

# Code snippet
include Memoize
def fib(n)
   return n if n < 2
   fib(n-1) + fib(n-2)
end
memoize_to_file(:fib, "temp.txt")
fib(10)

However, running that gives me this error: in `fib': undefined method
`+' for #<File:temp.txt (closed)> (NoMethodError)

Any ideas?

Thanks,

Dan

harp:~ > cat a.rb
   def memoize_to_file(name, file)
     cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
     (class << self; self; end).class_eval do
        define_method(name) do |*args|
           unless cache.has_key?(args)
             cache[args] = method(name).call(*args)
             File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
           end
           cache[args]
        end
     end
   end

   def fib(n)
    return n if n < 2
    fib(n-1) + fib(n-2)
   end

   memoize_to_file "fib", "fib.cache"
   n = fib 10
   p n

   harp:~ > ruby a.rb
   55

regards.

-a

···

On Sun, 1 Jan 2006, Daniel Berger wrote:

Hi all,

Inspired by a recent blog entry by Mauricio Fernandez, I decided to try
to add a method to the memoize package that would allow users to
memoize using a file based approach rather than memory, either to
conserve memory or for persistance.

However, I'm not sure it's possible. If it is, I'd love to know how.
If it's not, well, pstore it is.

Here's what I tried:

def memoize_to_file(name, file)
  meth = method(name)
  cache = Hash.new.update(Marshal.load(File.read(file))) rescue nil
  cache ||= {}
  (class << self; self; end).class_eval do
     define_method(name) do |*args|
        if cache.has_key?(args)
           cache[args]
        else
           cache[args] ||= meth.call(*args)
           File.open(file, "wb+"){ |f| Marshal.dump(cache, f) }
        end
     end
  end
  cache
end

# Code snippet
include Memoize
def fib(n)
  return n if n < 2
  fib(n-1) + fib(n-2)
end
memoize_to_file(:fib, "temp.txt")
fib(10)

However, running that gives me this error: in `fib': undefined method
`+' for #<File:temp.txt (closed)> (NoMethodError)

Any ideas?

Thanks,

--

ara [dot] t [dot] howard [at] noaa [dot] gov
all happiness comes from the desire for others to be happy. all misery
comes from the desire for oneself to be happy.
-- bodhicaryavatara

===============================================================================

<snip>

   harp:~ > cat a.rb
   def memoize_to_file(name, file)
     cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
     (class << self; self; end).class_eval do
        define_method(name) do |*args|
           unless cache.has_key?(args)
             cache[args] = method(name).call(*args)
             File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
           end
           cache[args]
        end
     end
   end

   def fib(n)
    return n if n < 2
    fib(n-1) + fib(n-2)
   end

   memoize_to_file "fib", "fib.cache"
   n = fib 10
   p n

As is, I get a stack level too deep error on Windows. If I change the
"cache[args] =" to "cache[args] ||=", I always get back nil.

I'm probably being thick here, but any help is appreciated.

Thanks,

Dan

···

ara.t.howard@noaa.gov wrote:

On Sun, 1 Jan 2006, Daniel Berger wrote:

sorry, don't know how that's running on linux in the first place... you need:

   --- a.rb 2005-12-31 20:30:59.000000000 -0700
   +++ b.rb 2005-12-31 20:31:10.000000000 -0700
   @@ -1,9 +1,10 @@
    def memoize_to_file(name, file)
   + meth = method(name)
      cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
      (class << self; self; end).class_eval do
         define_method(name) do |*args|
            unless cache.has_key?(args)
   - cache[args] = method(name).call(*args)
   + cache[args] = meth.call(*args)
              File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
            end
            cache[args]

with that it works for me on linux and windows.

cheers.

-a

···

On Sun, 1 Jan 2006, Daniel Berger wrote:

ara.t.howard@noaa.gov wrote:

On Sun, 1 Jan 2006, Daniel Berger wrote:

<snip>

   harp:~ > cat a.rb
   def memoize_to_file(name, file)
     cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
     (class << self; self; end).class_eval do
        define_method(name) do |*args|
           unless cache.has_key?(args)
             cache[args] = method(name).call(*args)
             File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
           end
           cache[args]
        end
     end
   end

   def fib(n)
    return n if n < 2
    fib(n-1) + fib(n-2)
   end

   memoize_to_file "fib", "fib.cache"
   n = fib 10
   p n

As is, I get a stack level too deep error on Windows. If I change the
"cache[args] =" to "cache[args] ||=", I always get back nil.

I'm probably being thick here, but any help is appreciated.

--

ara [dot] t [dot] howard [at] noaa [dot] gov
all happiness comes from the desire for others to be happy. all misery
comes from the desire for oneself to be happy.
-- bodhicaryavatara

===============================================================================

Thanks, that did the trick nicely. It's now part of memoize 1.2.0. :slight_smile:

Regards,

Dan

···

ara.t.howard@noaa.gov wrote:

On Sun, 1 Jan 2006, Daniel Berger wrote:

> ara.t.howard@noaa.gov wrote:
>> On Sun, 1 Jan 2006, Daniel Berger wrote:
>
> <snip>
>
>> harp:~ > cat a.rb
>> def memoize_to_file(name, file)
>> cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
>> (class << self; self; end).class_eval do
>> define_method(name) do |*args|
>> unless cache.has_key?(args)
>> cache[args] = method(name).call(*args)
>> File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
>> end
>> cache[args]
>> end
>> end
>> end
>>
>> def fib(n)
>> return n if n < 2
>> fib(n-1) + fib(n-2)
>> end
>>
>> memoize_to_file "fib", "fib.cache"
>> n = fib 10
>> p n
>
> As is, I get a stack level too deep error on Windows. If I change the
> "cache[args] =" to "cache[args] ||=", I always get back nil.
>
> I'm probably being thick here, but any help is appreciated.

sorry, don't know how that's running on linux in the first place... you need:

   --- a.rb 2005-12-31 20:30:59.000000000 -0700
   +++ b.rb 2005-12-31 20:31:10.000000000 -0700
   @@ -1,9 +1,10 @@
    def memoize_to_file(name, file)
   + meth = method(name)
      cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
      (class << self; self; end).class_eval do
         define_method(name) do |*args|
            unless cache.has_key?(args)
   - cache[args] = method(name).call(*args)
   + cache[args] = meth.call(*args)
              File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
            end
            cache[args]

with that it works for me on linux and windows.

cheers.

-a
--

> sorry, don't know how that's running on linux in the first place... you need:
>
> --- a.rb 2005-12-31 20:30:59.000000000 -0700
> +++ b.rb 2005-12-31 20:31:10.000000000 -0700
> @@ -1,9 +1,10 @@
> def memoize_to_file(name, file)
> + meth = method(name)
> cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
> (class << self; self; end).class_eval do
> define_method(name) do |*args|
> unless cache.has_key?(args)
> - cache[args] = method(name).call(*args)
> + cache[args] = meth.call(*args)
> File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
> end
> cache[args]

[...]

Thanks, that did the trick nicely. It's now part of memoize 1.2.0. :slight_smile:

Dumping the cache on every miss seems expensive.
I changed the code to save it once every N misses, and made the cache update
atomic to prevent problems with:
* concurrent executions of the memoized method (processes and threads)
* no HD space being left (the previous cache state is preserved)

I also applied those changes to memoize.rb 1.2.0 and generalized it to work
with instance methods, see the patch after memo.rb.

batsman@tux-chan:/tmp$ cat memo.rb
def memoize_to_file(name, file = nil, period = -1)
  meth = method(name)
  cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
  save_cache_proc = lambda do
    tmpfile = "#{file}.#{Process.pid}"
    begin
      File.open(tmpfile, "wb+"){|f| Marshal.dump(cache, f) }
      File.rename tmpfile, file
    rescue Exception
    end
  end
  at_exit { save_cache_proc.call } if file
  calls = period
  (class << self; self; end).class_eval do
     define_method(name) do |*args|
        unless cache.has_key?(args)
          cache[args] = meth.call(*args)
          if file && period != -1 && (calls -= 1) <= 0
            calls = period
            save_cache_proc.call
          end
        end
        cache[args]
     end
  end
end

def fib(n)
return n if n < 2
fib(n-1) + fib(n-2)
end

def fib2(n)
return n if n < 2
fib2(n-1) + fib2(n-2)
end

def fib3(n)
return n if n < 2
fib3(n-1) + fib3(n-2)
end

memoize_to_file "fib", "fib.cache", 1
memoize_to_file "fib2", "fib2.cache"
memoize_to_file "fib3", "fib3.cache", 10
require 'benchmark'
Benchmark.bmbm(10) do |bm|
  bm.report("period 1") { fib 1000 }
  bm.report("period 10"){ fib3 1000 }
  bm.report("at_exit") { fib2 1000 }
end
batsman@tux-chan:/tmp$ ruby memo.rb
Rehearsal ---------------------------------------------
period 1 4.030000 0.510000 4.540000 ( 5.507344)
period 10 0.430000 0.060000 0.490000 ( 0.570224)
at_exit 0.030000 0.000000 0.030000 ( 0.039652)
------------------------------------ total: 5.060000sec

                user system total real
period 1 0.000000 0.000000 0.000000 ( 0.000035)
period 10 0.000000 0.000000 0.000000 ( 0.000033)
at_exit 0.000000 0.000000 0.000000 ( 0.000035)

The patch to memoize.rb:

--- memoize.rb.orig 2006-01-02 11:02:54.000000000 +0100
+++ memoize.rb 2006-01-02 12:31:46.000000000 +0100
@@ -1,39 +1,106 @@
module Memoize
    MEMOIZE_VERSION = "1.2.0"

memoize.rb (2.43 KB)

···

On Sun, Jan 01, 2006 at 09:32:56PM +0900, Daniel Berger wrote:
-
+
+ def self.memoize(recv, name, file=nil, period=-1)
+ class << recv; self end.instance_eval{ instance_memoize(name, file, period) }
+ end
+
+ def memoize(name, file=nil, period=-1)
+ Memoize.memoize(self, name, file, period)
+ end
+end
+
+class Module
    # Memoize the method +name+. If +file+ is provided, then the method results
- # are stored on disk rather than in memory. This consumes virtually no
- # memory and is persistant.
- def memoize(name, file=nil)
- meth = method(name)
-
+ # are stored on disk in addition to the in-memory cache.
+ def instance_memoize(name, file=nil, period=-1)
+ meth = instance_method(name)
+
       if file
          cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
       else
          cache = {}
       end
-
- if file
- (class << self; self; end).class_eval do
- define_method(name) do |*args|
- unless cache.has_key?(args)
- cache[args] = meth.call(*args)
- File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
- end
- cache[args]
- end
+
+ save_cache_proc = lambda do
+ tmpfile = "#{file}.#{Process.pid}"
+ begin
+ File.open(tmpfile, "wb+"){|f| Marshal.dump(cache, f) }
+ File.rename tmpfile, file
+ rescue Exception
          end
- else
- (class << self; self; end).class_eval do
- define_method(name) do |*args|
- if cache.has_key?(args)
- cache[args]
- else
- cache[args] ||= meth.call(*args)
- end
+ end
+
+ at_exit { save_cache_proc.call } if file
+
+ calls = period
+ define_method(name) do |*args|
+ unless cache.has_key?(args)
+ cache[args] = meth.bind(self).call(*args)
+ if file && period != -1 && (calls -= 1) <= 0
+ calls = period
+ save_cache_proc.call
             end
          end
+ cache[args]
       end
+
       cache
    end
-end
\ No newline at end of file
+end
+
+if __FILE__ == $0
+ def fib(n)
+ return n if n < 2
+ fib(n-1) + fib(n-2)
+ end
+
+ def fib2(n)
+ return n if n < 2
+ fib2(n-1) + fib2(n-2)
+ end
+
+ def fib3(n)
+ return n if n < 2
+ fib3(n-1) + fib3(n-2)
+ end
+
+ include Memoize
+
+ module DumbFib
+ def fib(n)
+ return n if n < 2
+ fib(n-1) + fib(n-2)
+ end
+ end
+
+ DumbFib2 = DumbFib.clone
+
+ memoize :fib, "xfib.memoize.cache", 1
+ memoize :fib2, "xfib2.memoize.cache"
+ memoize :fib3, "xfib3.memoize.cache", 10
+
+ class Dummy1
+ include DumbFib
+ instance_memoize :fib, "xfib.dumb1.cache"
+ end
+ class Dummy2
+ include DumbFib2
+ end
+ module DumbFib2
+ instance_memoize :fib, "xfib.dumb2.cache"
+ end
+
+ require 'benchmark'
+ runs = 1
+ Benchmark.bmbm(10) do |bm|
+ dummy1 = Dummy1.new
+ dummy2 = Dummy2.new
+ bm.report("period 1") { runs.times{ fib 500 } }
+ bm.report("period 10"){ runs.times{ fib3 500 } }
+ bm.report("at_exit") { runs.times{ fib2 500 } }
+ bm.report("at_exit''") { runs.times{ dummy1.fib 500 } }
+ bm.report("at_exit'''") { runs.times{ dummy2.fib 500 }; runs = 100000}
+ end
+
+end

batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ----------------------------------------------
period 1 0.870000 0.160000 1.030000 ( 1.247975)
period 10 0.100000 0.010000 0.110000 ( 0.151390)
at_exit 0.010000 0.000000 0.010000 ( 0.034512)
at_exit'' 0.020000 0.000000 0.020000 ( 0.030174)
at_exit''' 0.010000 0.000000 0.010000 ( 0.030094)
------------------------------------- total: 1.180000sec

                 user system total real
period 1 0.490000 0.000000 0.490000 ( 0.548635)
period 10 0.500000 0.000000 0.500000 ( 0.539279)
at_exit 0.500000 0.000000 0.500000 ( 0.562339)
at_exit'' 0.490000 0.000000 0.490000 ( 0.534087)
at_exit''' 0.490000 0.000000 0.490000 ( 0.541301)
batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ----------------------------------------------
period 1 0.000000 0.000000 0.000000 ( 0.000048)
period 10 0.000000 0.000000 0.000000 ( 0.000019)
at_exit 0.000000 0.000000 0.000000 ( 0.000017)
at_exit'' 0.000000 0.000000 0.000000 ( 0.000019)
at_exit''' 0.000000 0.000000 0.000000 ( 0.000018)
------------------------------------- total: 0.000000sec

                 user system total real
period 1 0.480000 0.000000 0.480000 ( 0.533512)
period 10 0.490000 0.000000 0.490000 ( 0.542617)
at_exit 0.500000 0.000000 0.500000 ( 0.544705)
at_exit'' 0.500000 0.000000 0.500000 ( 0.551834)
at_exit''' 0.490000 0.000000 0.490000 ( 0.538747)

I also attached the full sources for your convenience.

--
Mauricio Fernandez

nice idea.

seems like the perfect time to introduce a Memoize::Cache class wrapping
pstore though doesn't it?

cheers.

-a

memoize.rb (2.43 KB)

···

On Mon, 2 Jan 2006, Mauricio Fernandez wrote:

Dumping the cache on every miss seems expensive.
I changed the code to save it once every N misses, and made the cache update
atomic to prevent problems with:
* concurrent executions of the memoized method (processes and threads)
* no HD space being left (the previous cache state is preserved)

I also applied those changes to memoize.rb 1.2.0 and generalized it to work
with instance methods, see the patch after memo.rb.

--

ara [dot] t [dot] howard [at] noaa [dot] gov
all happiness comes from the desire for others to be happy. all misery
comes from the desire for oneself to be happy.
-- bodhicaryavatara

===============================================================================

This message is in MIME format. The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

>
> Dumping the cache on every miss seems expensive.
> I changed the code to save it once every N misses, and made the cache update
> atomic to prevent problems with:
> * concurrent executions of the memoized method (processes and threads)
> * no HD space being left (the previous cache state is preserved)
>
> I also applied those changes to memoize.rb 1.2.0 and generalized it to work
> with instance methods, see the patch after memo.rb.

nice idea.

Looks interesting but I get this with windows xp using his sample code:

ruby memoize.rb

Rehearsal ----------------------------------------------
period 1 memoize.rb:39:in `has_key?': stack level too deep
(SystemStackError)
        from memoize.rb:39:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
         ... 539 levels...
        from c:/ruby184/lib/ruby/1.8/benchmark.rb:293:in `measure'
        from c:/ruby184/lib/ruby/1.8/benchmark.rb:261:in `bmbm'
        from c:/ruby184/lib/ruby/1.8/benchmark.rb:259:in `bmbm'
        from memoize.rb:97

Regards,

Dan

···

ara.t.howard@noaa.gov wrote:

On Mon, 2 Jan 2006, Mauricio Fernandez wrote:

Your stack is too small; fib(500) goes 1000 levels deep. Change
those examples to fib(100) etc.:

batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ---------------------------------------------
period 1 memoize.rb:54:in `fib': let's see how deep we get (RuntimeError)
        from memoize.rb:39:in `fib'
        from memoize.rb:55:in `fib'
        from memoize.rb:39:in `fib'
        from memoize.rb:55:in `fib'
        from memoize.rb:39:in `fib'
        from memoize.rb:55:in `fib'
        from memoize.rb:39:in `fib'
        from memoize.rb:55:in `fib'
         ... 993 levels...
        from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
        from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:261:in `bmbm'
        from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:259:in `bmbm'
        from memoize.rb:96

given

batsman@tux-chan:/tmp$ diff -u memoize.rb.mod memoize.rb
--- memoize.rb.mod 2006-01-02 21:38:18.000000000 +0100
+++ memoize.rb 2006-01-02 21:46:10.000000000 +0100
@@ -51,7 +51,7 @@

if __FILE__ == $0
    def fib(n)
- return n if n < 2
+ raise "let's see how deep we get" if n < 2
       fib(n-1) + fib(n-2)
    end

@@ -97,10 +97,10 @@
       dummy1 = Dummy1.new
       dummy2 = Dummy2.new
       bm.report("period 1") { runs.times{ fib 500 } }
- bm.report("period 10"){ runs.times{ fib3 500 } }
- bm.report("at_exit") { runs.times{ fib2 500 } }
- bm.report("at_exit''") { runs.times{ dummy1.fib 500 } }
- bm.report("at_exit'''") { runs.times{ dummy2.fib 500 }; runs = 100000}
+ #bm.report("period 10"){ runs.times{ fib3 500 } }
+ #bm.report("at_exit") { runs.times{ fib2 500 } }
+ #bm.report("at_exit''") { runs.times{ dummy1.fib 500 } }
+ #bm.report("at_exit'''") { runs.times{ dummy2.fib 500 }; runs = 100000}
    end

end

···

On Tue, Jan 03, 2006 at 04:32:57AM +0900, Daniel Berger wrote:

> > I also applied those changes to memoize.rb 1.2.0 and generalized it to work
> > with instance methods, see the patch after memo.rb.

Looks interesting but I get this with windows xp using his sample code:

>ruby memoize.rb
Rehearsal ----------------------------------------------
period 1 memoize.rb:39:in `has_key?': stack level too deep
(SystemStackError)
        from memoize.rb:39:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
         ... 539 levels...

--
Mauricio Fernandez