Memoize and GC

I have a quick question. I just re-read:

  http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

  def foo
    @foo ||= something
  end

will mean that @foo will be GC'd when the object is GC'd, but if I do

  FOO = []

  def foo
    FOO[object_id] ||= something
  end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.
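The concern about example 2 can be demonstrated with a short sketch (the class name and memoized value are made up for illustration): nothing ever removes the entry from the top-level container, so it outlives the object that created it.

```ruby
# Sketch of the leak in example 2: FOO holds the memoized entry keyed by
# object_id, and nothing ever deletes it, so it survives the object.
FOO = {}

class Widget  # hypothetical class for illustration
  def foo
    FOO[object_id] ||= "expensive result"
  end
end

w = Widget.new
w.foo                 # memoizes into FOO
id = w.object_id
w = nil               # drop the only reference to the Widget
GC.start              # the Widget itself can now be collected...
FOO.key?(id)          # ...but its cache entry is still present
```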

One way could be using a closure that redefines the method:

irb(main):019:0> class A
irb(main):020:1> def a
irb(main):021:2> puts "expensive calculation of a..."
irb(main):022:2> a = "complex value"
irb(main):023:2> self.class.send(:define_method, :a) do
irb(main):024:3* a
irb(main):025:3> end
irb(main):026:2> a
irb(main):027:2> end
irb(main):028:1> end
=> nil
irb(main):029:0> obj = A.new
=> #<A:0xb7c9ef20>
irb(main):030:0> obj.a
expensive calculation of a...
=> "complex value"
irb(main):031:0> obj.a
=> "complex value"

Jesus.

···

On Tue, Apr 13, 2010 at 6:26 PM, Intransition <transfire@gmail.com> wrote:

I have a quick question. I just re-read:

http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

def foo
  @foo ||= something
end

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.

I would live with polluting the instance var space. Btw, I usually do not use memoize but rather an explicit Hash with default_proc. That way I have control over where I place the Hash and hence how long it lives.
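That approach might look something like this sketch (class and method names are invented for illustration): the Hash's default_proc computes and stores a value on first access, and because the Hash lives in an instance variable it is garbage collected together with its owner.

```ruby
# Memoizing with an explicit Hash and a default_proc: the block runs once
# per missing key, stores the result, and later lookups hit the cache.
class Squares  # hypothetical example class
  def initialize
    @cache = Hash.new { |h, n| h[n] = slow_square(n) }
  end

  def square(n)
    @cache[n]  # first call per n computes; later calls are cached
  end

  private

  def slow_square(n)
    n * n  # stand-in for an expensive computation
  end
end
```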

Of course you can hack something together using #object_id and an external Hash, but then you also need to create a finalizer and soon things get overly complicated. If you have issues with polluting the instance variable namespace you can a) put all memoize containers into a single instance variable (typically a Hash) and/or b) generate instance variable names which are likely to be conflict-free.

Kind regards

  robert

···

On 04/13/2010 06:26 PM, Intransition wrote:

I have a quick question. I just re-read:

  http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

  def foo
    @foo ||= something
  end

will mean that @foo will be GC'd when the object is GC'd, but if I do

  FOO = []

  def foo
    FOO[object_id] ||= something
  end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

  FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

···

On 4/13/10, Intransition <transfire@gmail.com> wrote:

will mean that @foo will be GC'd when the object is GC'd, but if I do

  FOO = []

  def foo
    FOO[object_id] ||= something
  end

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?
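For reference, a minimal LRU sketch (hypothetical code, not the lrucache gem) that exploits the insertion order of Ruby 1.9+ hashes:

```ruby
# Minimal LRU cache: Ruby hashes preserve insertion order, so re-inserting
# a key on access moves it to the back, and Hash#shift always evicts the
# least recently used entry from the front.
class TinyLRU  # hypothetical name
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  def [](key)
    return unless @store.key?(key)
    @store[key] = @store.delete(key)  # move to back (most recently used)
  end

  def []=(key, value)
    @store.delete(key)                       # refresh position if present
    @store[key] = value
    @store.shift if @store.size > @max_size  # evict LRU from the front
  end
end
```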

···

On Apr 13, 1:28 pm, Caleb Clausen <vikk...@gmail.com> wrote:

On 4/13/10, Intransition <transf...@gmail.com> wrote:

> will mean that @foo will be GC'd when the object is GC'd, but if I do

> FOO = []

> def foo
> FOO[object_id] ||= something
> end

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

I like the closure idea, but I guess the memoizing got lost a little bit ;). Thus I would write

def a
  _a = nil
  self.class.module_eval do
    # note: once :a is redefined here, every instance shares this _a closure
    define_method(:a) { _a ||= "expensive calculation" }
    define_method(:_invalidate_a) { _a = nil }
  end
  a
end
Cheers
R.

···

2010/4/13 Jesús Gabriel y Galán <jgabrielygalan@gmail.com>:

On Tue, Apr 13, 2010 at 6:26 PM, Intransition <transfire@gmail.com> wrote:

I have a quick question. I just re-read:

http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

def foo
  @foo ||= something
end

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.

One way could be using a closure that redefines the method:

irb(main):019:0> class A
irb(main):020:1> def a
irb(main):021:2> puts "expensive calculation of a..."
irb(main):022:2> a = "complex value"
irb(main):023:2> self.class.send(:define_method, :a) do

--
“Learning without thought is labor lost; thought without learning is perilous.”
--- Confucius

Undocumented and without a test framework, I offer you a class I wrote
a while ago that's still live and in use a couple of years later. It's
not exactly what you're asking for, but close. (It returns string-based
keys for accessing the objects because my usage has it running in a
separate process from the main app.) I reasonably hate the decision I
made to return values versus arrays based on number of arguments, and
it's not as DRY as it could be, but it's working :)

# Example usage
# class SituationServer < Phrogz::Librarian
# ...
# end

···

On Apr 13, 11:28 am, Caleb Clausen <vikk...@gmail.com> wrote:

But rather than either an array or hash, what if you could write:

FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

#
# CLEANING_SCHEDULE = 60*60 # Clean up once an hour
# MAX_STALENESS = 60*60     # Remove items not used in the last hour
#
# @server = SituationServer.new
# Thread.new{
#   loop {
#     sleep CLEANING_SCHEDULE
#     removed = @server.remove_stale( MAX_STALENESS )
#     puts "Just removed #{removed} stale item#{:s unless removed==1}..."
#   }
# }

module Phrogz; end
class Phrogz::Librarian
  def initialize( key_size=4 )
    @key_size = key_size
    @max_keys = 16 ** key_size
    @serializer = "%0#{key_size}x"
    @library = {}
    @stored = {}
    @access = {}
  end

  def store( *objects )
    keys = objects.map{ |object|
      key,obj = @library.find{ |k,o| o==object }
      unless key
        # FIXME: bail if @library.size >= @max_keys, or else this will infinite loop
        # TODO: come up with a better strategy for finding a free key
        begin
          key = @serializer % rand( @max_keys )
        end while @library.has_key?( key )
        @library[ key ] = object
        @stored[ key ] = Time.now
      end
      key
    }
    keys.length == 1 ? keys.first : keys
  end
  alias_method :<<, :store

  def fetch( *keys )
    now = Time.now # So that multiple accesses are exactly synchronized
    objects = keys.map{ |key|
      if @library.has_key?( key )
        @access[ key ] = now
        @library[ key ]
      end
    }
    objects.length == 1 ? objects.first : objects
  end

  def size
    @library.size
  end

  def storage_time( *keys )
    results = @stored.values_at( *keys )
    results.length == 1 ? results.first : results
  end

  def age( *keys )
    results = @stored.values_at( *keys ).map{ |t| Time.now - t }
    results.length == 1 ? results.first : results
  end

  def last_access( *keys )
    results = @access.values_at( *keys )
    results.length == 1 ? results.first : results
  end

  def staleness( *keys )
    results = @access.values_at( *keys ).map{ |t| Time.now - t }
    results.length == 1 ? results.first : results
  end

  def discard( *keys )
    results = keys.map{ |key|
      @stored.delete(key)
      @access.delete(key)
      @library.delete(key)
    }
    results.length == 1 ? results.first : results
  end

  def remove_stale( max_staleness )
    now = Time.now
    stale_keys = @access.select{ |key,time| (now-time) > max_staleness }.keys
    stale_keys.each{ |key|
      @stored.delete( key )
      @access.delete( key )
      object = @library.delete( key )
      yield object if block_given?
    }
    stale_keys.length
  end

  def remove_old( max_age )
    now = Time.now
    stale_keys = @stored.select{ |key,time| (now-time) > max_age }.keys
    stale_keys.each{ |key|
      @stored.delete( key )
      @access.delete( key )
      object = @library.delete( key )
      yield object if block_given?
    }
    stale_keys.length
  end

end

What else do you want to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Kind regards

robert

···

2010/4/13 Intransition <transfire@gmail.com>:

On Apr 13, 1:28 pm, Caleb Clausen <vikk...@gmail.com> wrote:

On 4/13/10, Intransition <transf...@gmail.com> wrote:

> will mean that @foo will be GC'd when the object is GC'd, but if I do

> FOO = []

> def foo
> FOO[object_id] ||= something
> end

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Last time I went looking for a cache class, I found several written in
ruby, but I really needed something that would be super-fast, so it
had to be written in c. I did write about half of a ruby hash class in
c before giving up.... For most use cases, tho, presumably including
this one, a ruby implementation would be fine. Having a customizable
age-out policy is the only other nice-to-have I can think of.

···

On 4/13/10, Intransition <transfire@gmail.com> wrote:

On Apr 13, 1:28 pm, Caleb Clausen <vikk...@gmail.com> wrote:

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads to me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?

Fair enough, I just thought it would be nice to have the *option* of a
time-based cache.

···

On Apr 14, 7:02 am, Robert Klemme <shortcut...@googlemail.com> wrote:

What else do you want to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Definitely! But this would be a different beast for me. One might
be able to implement this by using an LRUHash or by inheriting from it.
But I would not combine that in a single class. Another option could
be to have a modular approach where the container gets a
DeletionPolicy that is responsible for deciding when or what to
delete. The interface could look like

interface DeletionPolicy
  def insert(key,val)
  def remove(key, val)
  def access(key, val)
  def next_delete # yields key
end

Just a quick hack, though.
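A concrete Ruby sketch of that modular idea (all names invented; a FIFO size policy is just one possible implementation of the interface):

```ruby
# Hypothetical sketch of the DeletionPolicy idea: the container delegates
# eviction decisions to a pluggable policy object.
class SizePolicy
  def initialize(max_size)
    @max_size = max_size
    @order = []  # insertion order; first entry is evicted first (FIFO)
  end

  def insert(key, _val)
    @order << key
  end

  def remove(key, _val)
    @order.delete(key)
  end

  def access(key, _val); end  # a FIFO policy ignores accesses

  # Returns the key to evict next, or nil if the cache is within bounds
  def next_delete
    @order.first if @order.size > @max_size
  end
end

class PolicyCache
  def initialize(policy)
    @policy = policy
    @store = {}
  end

  def []=(key, value)
    @store[key] = value
    @policy.insert(key, value)
    while (victim = @policy.next_delete)  # evict until the policy is satisfied
      @policy.remove(victim, @store[victim])
      @store.delete(victim)
    end
  end

  def [](key)
    @policy.access(key, @store[key]) if @store.key?(key)
    @store[key]
  end

  def size
    @store.size
  end
end
```

A time-based policy would implement the same four methods but track timestamps instead of insertion order, leaving the container untouched.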

Kind regards

robert

···

2010/4/14 Intransition <transfire@gmail.com>:

On Apr 14, 7:02 am, Robert Klemme <shortcut...@googlemail.com> wrote:

What else do you want to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Fair enough, I just thought it would be nice to have the *option* of a
time-based cache.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I kind of like that idea.

···

On Apr 15, 3:16 am, Robert Klemme <shortcut...@googlemail.com> wrote:

2010/4/14 Intransition <transf...@gmail.com>:

> On Apr 14, 7:02 am, Robert Klemme <shortcut...@googlemail.com> wrote:

>> What else do you want to handle? My implementation [1] also only
>> considers size, but I have also seen implementations that consider
>> insertion or access time. Personally I dislike those solutions
>> because they might throw away stuff too early and make the
>> implementation more complex. The main point (resource control) is
>> already achieved by limiting the size of an LRU map.

> Fair enough, I just thought it would be nice to have the *option* of a
> time-based cache.

Definitely! But this would be a different beast for me. One might
be able to implement this by using an LRUHash or by inheriting from it.
But I would not combine that in a single class. Another option could
be to have a modular approach where the container gets a
DeletionPolicy that is responsible for deciding when or what to
delete. The interface could look like

interface DeletionPolicy
  def insert(key,val)
  def remove(key, val)
  def access(key, val)
  def next_delete # yields key
end

Just a quick hack, though.