TupleSpace performance (TupleBag really)

We've been struggling with this problem for months. We use TupleSpace to
implement a distributed processing framework and periodically if the number
of objects
in the TupleBag gets large (not ridiculous but 50,000 or so) the TupleSpace
begins to take 100pct cpu and is effectively neutered and must be restarted.
We've been trying to come up
with a better implementation of TupleBag but have not had much luck so far.
Has anyone else done this?

Mark

I've heard a few people talk of it (mainly, TupleSpace with a persistence (database)), but to my knowledge nobody's done it.

···

On Feb 7, 2007, at 09:21, Mark Alexander Friedgan wrote:

We've been struggling with this problem for months. We use TupleSpace to
implement a distributed processing framework and periodically if the number
of objects
in the TupleBag gets large (not ridiculous but 50,000 or so) the TupleSpace
begins to take 100pct cpu and is effectively neutered and must be restarted.
We've been trying to come up
with a better implementation of TupleBag but have not had much luck so far.
Has anyone else done this?

What's happening is that you've populated a hash enough that you've saturated the bins. You're now hitting a lot of hash collisions. You can either improve the hash method on the tuples (possibly... really depends on what you're storing), or switch the underlying data structure (maybe a balanced tree will suit you better at large tuple populations).

Or, if you're using patterns ala Gelernter, you might be able to pre-partition your spaces into multiple bags based on activities or specialists (or job or task type). Sounds like you've probably already tried that tho.

···

On Feb 7, 2007, at 9:21 AM, Mark Alexander Friedgan wrote:

We've been struggling with this problem for months. We use TupleSpace to
implement a distributed processing framework and periodically if the number
of objects
in the TupleBag gets large (not ridiculous but 50,000 or so) the TupleSpace
begins to take 100pct cpu and is effectively neutered and must be restarted.
We've been trying to come up
with a better implementation of TupleBag but have not had much luck so far.
Has anyone else done this?

This is probably as good a time as any to mention to the wider Ruby community that I've been working on a new implementation of a tuple space in Ruby, completely independent of the standard implementation. My implementation currently searches a list to do template matching, but it's on my to-do list to research and implement more scalable data structures and algorithms.

The most general form of template matching is actually quite challenging to implement, and to achieve scalability, I'll have to impose some restrictions. The current DRb-based Rinda is extremely flexible in what values can be in tuples, but in my implementation, I'll most likely require that all tuple components be scalars--numbers, strings, etc. You can use YAML to store more complex structures (opaquely to the tuple space), but arbitrary user objects, regular expressions, and class names (e.g., take [String, 5, Integer]) won't be allowed directly as tuple components. This is a trade-off, but one that I think most applications using a tuple space can live with. This restriction makes the problem more amendable to techniques developed in the fields of databases, data analysis (e.g., clustering algorithms to partition the space), machine learning, etc.

I have a working, though unreleased, distributed tuple-space implementation right now and written in Ruby. It doesn't use DRb at all; it uses a custom wire protocol over SSL (with validation of client and server certificates). It supports a good many features beyond the standard tuple space; namely, multiple tuple space regions, local and global scopes (two-level hierarchy of tuple spaces), private one-to-one and group communication, region privileges, and a "monitor" operation (the read_all operation returns all available matching tuples; the monitor operation returns all available and future matching tuples). Additionally, my implementation allows processes to pass file descriptors over the local tuple space (which, by definition, is confined to a given host, and so communication takes place over a Unix domain socket).

More information on my implementation is available in my talk slides at

   CAIDA Resource Catalog

The design of the tuple space is being driven by the needs of CAIDA's next-generation distributed network measurement infrastructure named Archipelago.

I hope to release the code under the GPL, with the client library under the LGPL. There's no clear time table for code release. It's also going to be months down the line before I can work on implementing scalable matching in earnest, since other work has higher priority.

  --Young

···

On Feb 7, 2007, at 9:21 AM, Mark Alexander Friedgan wrote:

We've been trying to come up with a better implementation of TupleBag
but have not had much luck so far. Has anyone else done this?

Ryan Davis wrote:

We've been struggling with this problem for months. We use TupleSpace to
implement a distributed processing framework and periodically if the number
of objects
in the TupleBag gets large (not ridiculous but 50,000 or so) the TupleSpace
begins to take 100pct cpu and is effectively neutered and must be restarted.
We've been trying to come up
with a better implementation of TupleBag but have not had much luck so far.
Has anyone else done this?

What's happening is that you've populated a hash enough that you've saturated the bins. You're now hitting a lot of hash collisions.

Do Ruby's hashes not extend themselves once they hit a certain saturation? Is there a hard limit at play here? Or is it simply a matter of the hash function not distributing keys well?

···

On Feb 7, 2007, at 9:21 AM, Mark Alexander Friedgan wrote:

--
Alex

Are you sure? In Ruby, the number of bins is increased (exponentially) when
there are more than 5 entries per bin...

Actually, in a TupleBag tuples are indexed by their size and stored in an
array. Finding a tuple matching a given template in that array is O(n)

···

On Fri, Feb 09, 2007 at 09:50:34AM +0900, Ryan Davis wrote:

On Feb 7, 2007, at 9:21 AM, Mark Alexander Friedgan wrote:

>We've been struggling with this problem for months. We use TupleSpace to
>implement a distributed processing framework and periodically if the
>number of objects in the TupleBag gets large (not ridiculous but 50,000 or
>so) the TupleSpace begins to take 100pct cpu and is effectively neutered
>and must be restarted. We've been trying to come up with a better
>implementation of TupleBag but have not had much luck so far. Has anyone
>else done this?

What's happening is that you've populated a hash enough that you've
saturated the bins. You're now hitting a lot of hash collisions. You
can either improve the hash method on the tuples (possibly... really
depends on what you're storing), or switch the underlying data
structure (maybe a balanced tree will suit you better at large tuple
populations).

    ##
    # Finds a live tuple that matches +template+.

    def find(template)
      @hash.fetch(template.size, ).find do |tuple|
        tuple.alive? && template.match(tuple)
      end
    end

If you want to make TupleSpace perform better, you'll have to make that
operation (finding a live tuple matching an arbitrary template) faster.

[Of course, the TupleSpace behaves much better when it gets the read request
_before_ the tuple, since it just has to scan the list of templates waiting
for a tuple.]

--
Mauricio Fernandez - http://eigenclass.org - singular Ruby
** Latest postings **
What's new in Ruby 1.9, Feb. 07 update
  http://eigenclass.org/hiki.rb?Changes-in-Ruby-1.9-update-6
Firebrigade: automated, sandboxed testing of RubyGems packages by other devels
  http://eigenclass.org/hiki.rb?firebrigade-launched

i have. if people like it, i'll release next week.

1) it's a hack

2) it's persistent

   jib:~ > cat a.rb
   require 'sqlitespace'

   rss = Rinda::SQLiteSpace.new
   any_pair = [nil, nil]
   rss.read_all(any_pair).sort.each{|t| p t} # note this prints FIRST!
   rss.write [Time.now, rand]

   jib:~ > ruby a.rb

   jib:~ > ruby a.rb
   [Fri Feb 09 16:53:37 MST 2007, 0.388129945611581]

   jib:~ > ruby a.rb
   [Fri Feb 09 16:53:37 MST 2007, 0.388129945611581]
   [Fri Feb 09 16:53:52 MST 2007, 0.250816953601316]

   jib:~ > ruby a.rb
   [Fri Feb 09 16:53:37 MST 2007, 0.388129945611581]
   [Fri Feb 09 16:53:52 MST 2007, 0.250816953601316]
   [Fri Feb 09 16:53:54 MST 2007, 0.400566845666617]

3) it's pretty fast (and uses little cpu)

   jib:~ > cat b.rb
   require 'sqlitespace'

   rss = Rinda::SQLiteSpace.new

   rss.transaction{ 42424.times{ write [Time.now, rand] } }

   jib:~ > time ruby b.rb

   real 0m19.063s
   user 0m18.110s
   sys 0m0.560s

4) it's a hack

5) obviously it requires sqlite to be installed, but that's it

6) it's a hack

here's the code:

harp:~ > cat sqlitespace.rb

require 'monitor'
require 'thread'
require 'drb/drb'
require 'rinda/rinda'
require 'rinda/tuplespace'
require 'fileutils'
require 'base64'

begin
   require 'rubygems'
rescue LoadError
   42
ensure
   require 'sqlite'
end

module Rinda
   class SQLiteSpace < ::Rinda::TupleSpace
     def initialize(timeout=60, prefix=home_dir)
       super()
     ensure
       setup prefix
     end

     %w[ prefix dir bag read_waiter take_waiter notify_waiter ].each{|a| attr a}

     def setup(prefix=home_dir)
       @prefix = prefix
       FileUtils.mkdir_p @prefix
       @dir = File.join @prefix, 'sqlitespace'
       FileUtils.mkdir_p @dir

       @bag = SQLiteBag.new File.join(@dir, 'bag')
       @read_waiter = SQLiteBag.new File.join(@dir, 'read_waiter')
       @take_waiter = SQLiteBag.new File.join(@dir, 'take_watier')
       @notify_waiter = SQLiteBag.new File.join(@dir, 'notify_watier')
     end

     def transaction *a, &b
       @bag.transaction{
         @read_waiter.transaction{
           @take_waiter.transaction{
             @notify_waiter.transaction{
               instance_eval &b
             }
           }
         }
       }
     end

     def home_dir
       ENV['HOME'] || ENV['USERPROFILE'] # *nix || winblows
     end
   end

   class SQLiteBag
     class DB
       SCHEMA = <<-SQL
         create table bag(
           id integer primary key,
           size integer,
           blobsize integer,
           blob blob
         )
       SQL

       %w[ path db in_transaction ].each{|a| attr a}

       def initialize path
         @path = path
         @db = ::SQLite::Database.new(path) rescue ::SQLite::Database.new(path,0)
         @in_transaction = false
         setup
       end

       def setup
         begin
           @db.execute(SCHEMA){}
         rescue ::SQLite::SQLException => e
           42
         end
       end

···

On Thu, 8 Feb 2007, Eric Hodel wrote:

I've heard a few people talk of it (mainly, TupleSpace with a persistence (database)), but to my knowledge nobody's done it.

     #
     # for older rubys
     #
       begin
         Base64
         def encode64 blob
           Base64.encode64 blob
         end

         def decode64 blob
           Base64.decode64 blob
         end
       rescue NameError
       end

     #
     # handles DRbUndumped feature of Rinda::TupleEntry class
     #
       def encode obj
         data = [obj.class]
         obj.instance_variables.each do |ivar|
           value = obj.instance_variable_get ivar
           data << [ivar, value]
         end
         blob = encode64(Marshal.dump(data))
       end
     #
     # handles DRbUndumped feature of Rinda::TupleEntry class
     #
       def decode blob
         data = Marshal.load(decode64(blob))
         klass = data.shift
         obj = klass.allocate
         data.each{|ivar, value| obj.instance_variable_set ivar, value}
         obj
       end

       def execute *a, &b
         @db.execute *a, &b
       end

       def transaction *a, &b
         raise 'nested transaction' if @in_transaction
         begin
           @db.execute 'begin transaction'
           @in_transaction = true
           yield
         ensure
           @db.execute 'end transaction'
           @in_transaction = false
         end
       end

       def template_entry(ary)
         if ::Rinda::TemplateEntry === ary
           ary
         else
           ::Rinda::TemplateEntry.new ary
         end
       end

       def push(ary)
         size = ary.size
         blob = encode ary
         blobsize = blob.size
         @db.execute "insert into bag values(null, #{ size }, #{ blob.size }, '#{ blob }')"
       end

       def delete(ary)
         size = ary.size
         blob = encode ary
         blobsize = blob.size
         @db.execute "delete from bag where size=#{ size } and blobsize=#{ blobsize } and blob='#{ blob }'"
       end

       def find_all(template, limit=nil, &block)
         template= Rinda::TemplateEntry.new(Array.new(template)) if Numeric === template
         size = template.size
         all =
         @db.execute "select blob from bag where size=#{ size } #{ limit ? ('limit=%d' % limit) : '' }" do |tuple|
           blob = tuple['blob']
           t = decode blob
           if t.alive? && template.match(t)
             block ? block[t] : (all << t)
           end
         end
         block ? nil : all
       end

       def find(template, limit=1, &block)
         find_all(template, limit).first
       end

       def find_all_template(tuple, limit=nil, &block)
         size = tuple.size
         all =
         @db.execute "select blob from bag where size=#{ size } #{ limit ? ('limit=%d' % limit) : '' }" do |tuple|
           blob = tuple['blob']
           t = decode blob
           if t.alive? && t.match(tuple)
             block ? block[t] : (all << t)
           end
         end
         block ? nil : all
       end

       def delete_unless_alive
         deleted =
         @db.transaction do
           find_all do |tuple|
             unless tuple.alive?
               delete tuple
               deleted << tuple
             end
           end
         end
         deleted
       end
     end

     %w[ path db ].each{|a| attr a}

     def initialize path
       @path = path
       @db = DB.new path
       @hash = {}
     end

     def transaction *a, &b
       @db.transaction *a, &b
     end

     %w( push delete find_all find find_all_template delete ).each do |m|
       module_eval <<-code
         def #{ m } *a, &b
           @db.#{ m } *a, &b
         end
       code
     end
   end
end

enjoy.

-a
--
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
- the dalai lama

-a
--
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
- the dalai lama

Ryan Davis wrote:
>What's happening is that you've populated a hash enough that you've
>saturated the bins. You're now hitting a lot of hash collisions.
Do Ruby's hashes not extend themselves once they hit a certain
saturation? Is there a hard limit at play here?

The number of bins is doubled when there are more than 5 entries per bin
(avg.).

Or is it simply a matter of the hash function not distributing keys well?

It's a matter of the tuples being indexed by their sizes, and performing O(n)
matching over all the tuples of a given size to find one matching a given
template (see my other message).

···

On Fri, Feb 09, 2007 at 06:29:48PM +0900, Alex Young wrote:

--
Mauricio Fernandez - http://eigenclass.org - singular Ruby
                        ** Latest postings **
What's new in Ruby 1.9, Feb. 07 update
  http://eigenclass.org/hiki.rb?Changes-in-Ruby-1.9-update-6
Firebrigade: automated, sandboxed testing of RubyGems packages by other devels
  http://eigenclass.org/hiki.rb?firebrigade-launched

What version/implementation of sqlite are you using?

Cheers,
Bob

···

On 9-Feb-07, at 7:04 PM, ara.t.howard@noaa.gov wrote:

On Thu, 8 Feb 2007, Eric Hodel wrote:

I've heard a few people talk of it (mainly, TupleSpace with a persistence (database)), but to my knowledge nobody's done it.

i have. if people like it, i'll release next week.

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/&gt;
Recursive Design Inc. -- <http://www.recursive.ca/&gt;
Raconteur -- <http://www.raconteur.info/&gt;
xampl for Ruby -- <http://rubyforge.org/projects/xampl/&gt;

>
>I've heard a few people talk of it (mainly, TupleSpace with a persistence
>(database)), but to my knowledge nobody's done it.
>
i have. if people like it, i'll release next week.

1) it's a hack
2) it's persistent

[...]

3) it's pretty fast (and uses little cpu)

  jib:~ > cat b.rb
  require 'sqlitespace'

  rss = Rinda::SQLiteSpace.new

  rss.transaction{ 42424.times{ write [Time.now, rand] } }

  jib:~ > time ruby b.rb

  real 0m19.063s
  user 0m18.110s
  sys 0m0.560s

It's pretty fast considering it's persistent, but it is of course slower than
the std. TupleSpace...

$ cat tuplespace.rb
#!/usr/bin/env ruby

require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

require 'benchmark'
p Benchmark.measure {
  42424.times{ ts.write [Time.now, rand] }
}

#require 'rubygems'
#gem "sqlite-ruby"
require 'sqlitespace'
$VERBOSE = false
puts "Using SQLite #{SQLite::Version::STRING}"
ts2 = Rinda::SQLiteSpace.new

p Benchmark.measure {
  ts2.transaction{ 42424.times{ write [Time.now, rand] } }
}

$ time ruby tuplespace.rb
#<Benchmark::Tms:0xa795b878 @total=4.62, @cutime=0.0, @label="", @stime=0.24, @real=4.64543700218201, @utime=4.38, @cstime=0.0>
/home/batsman/usr//lib/ruby/site_ruby/1.8/sqlite/database.rb:243: warning: `*' interpreted as argument prefix
/home/batsman/usr//lib/ruby/site_ruby/1.8/sqlite/statement.rb:103: warning: `*' interpreted as argument prefix
(eval):2: warning: `*' interpreted as argument prefix
(eval):2: warning: `*' interpreted as argument prefix
(eval):2: warning: `*' interpreted as argument prefix
(eval):2: warning: `*' interpreted as argument prefix
(eval):2: warning: `*' interpreted as argument prefix
(eval):2: warning: `*' interpreted as argument prefix
(eval):1: warning: method redefined; discarding old delete
Using SQLite 2.2.2
#<Benchmark::Tms:0xa7717d8c @total=74.8, @cutime=0.0, @label="", @stime=1.79, @real=76.8895101547241, @utime=73.01, @cstime=0.0>

real 1m21.836s
user 1m17.625s
sys 0m2.060s

(it seems slower w/o .transaction)

... and it won't scale any better, so it wouldn't solve the OP's problem:

      def find_all(template, limit=nil, &block)
        template= Rinda::TemplateEntry.new(Array.new(template)) if Numeric
        === template
        size = template.size
        all =
        @db.execute "select blob from bag where size=#{ size } #{ limit ?
        ('limit=%d' % limit) : '' }" do |tuple|
          blob = tuple['blob']
          t = decode blob
          if t.alive? && template.match(t)
            block ? block[t] : (all << t)
          end
        end
        block ? nil : all
      end

[...]

      def find_all_template(tuple, limit=nil, &block)
        size = tuple.size
        all =
        @db.execute "select blob from bag where size=#{ size } #{ limit ?
        ('limit=%d' % limit) : '' }" do |tuple|
          blob = tuple['blob']
          t = decode blob
          if t.alive? && t.match(tuple)
            block ? block[t] : (all << t)
          end
        end
        block ? nil : all
      end

PS: some warnings removed...

--- sqlitespace.rb.orig 2007-02-10 09:33:20.000000000 +0100
+++ sqlitespace.rb 2007-02-10 10:09:50.000000000 +0100
@@ -43,7 +43,7 @@
          @read_waiter.transaction{
            @take_waiter.transaction{
              @notify_waiter.transaction{
- instance_eval &b
+ instance_eval(&b)
              }
            }
          }
@@ -79,7 +79,7 @@
        def setup
          begin
            @db.execute(SCHEMA){}
- rescue ::SQLite::SQLException => e
+ rescue (SQLite::SQLException rescue ::SQLite::Exceptions::SQLException) # tested with sqlite-ruby 2.2.3
            42
          end
        end
@@ -88,7 +88,7 @@
      # for older rubys

···

On Sat, Feb 10, 2007 at 09:04:51AM +0900, ara.t.howard@noaa.gov wrote:

On Thu, 8 Feb 2007, Eric Hodel wrote:

      #
        begin
- Base64
+ x = Base64
          def encode64 blob
            Base64.encode64 blob
          end
@@ -122,7 +122,7 @@
        end

        def execute *a, &b
- @db.execute *a, &b
+ @db.execute(*a, &b)
        end

        def transaction *a, &b
@@ -213,7 +213,7 @@
      end

      def transaction *a, &b
- @db.transaction *a, &b
+ @db.transaction(*a, &b)
      end

      %w( push delete find_all find find_all_template delete ).each do |m|

--
Mauricio Fernandez - http://eigenclass.org - singular Ruby
                        ** Latest postings **
What's new in Ruby 1.9, Feb. 07 update
  http://eigenclass.org/hiki.rb?Changes-in-Ruby-1.9-update-6
Firebrigade: automated, sandboxed testing of RubyGems packages by other devels
  http://eigenclass.org/hiki.rb?firebrigade-launched

1.3.1 - oops.

but, i'll tweak so it works with both the older and newer (2.0.1) versions...

sorry bout that.

what are you running?

-a

···

On Sat, 10 Feb 2007, Bob Hutchison wrote:

On 9-Feb-07, at 7:04 PM, ara.t.howard@noaa.gov wrote:

On Thu, 8 Feb 2007, Eric Hodel wrote:

I've heard a few people talk of it (mainly, TupleSpace with a persistence (database)), but to my knowledge nobody's done it.

i have. if people like it, i'll release next week.

What version/implementation of sqlite are you using?

--
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
- the dalai lama

give this one a whirl...

-a

sqlitespace.rb (5.57 KB)

···

On Sat, 10 Feb 2007, Bob Hutchison wrote:

On 9-Feb-07, at 7:04 PM, ara.t.howard@noaa.gov wrote:

On Thu, 8 Feb 2007, Eric Hodel wrote:

I've heard a few people talk of it (mainly, TupleSpace with a persistence (database)), but to my knowledge nobody's done it.

i have. if people like it, i'll release next week.

What version/implementation of sqlite are you using?

--
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
- the dalai lama

It's pretty fast considering it's persistent, but it is of course slower
than the std. TupleSpace...

<snip bench>

yes. i think it could be made quite a bit faster though. it's also broken
wrst to callbacks now, so i'm working on that first! :wink:

... and it won't scale any better, so it wouldn't solve the OP's problem:

not currently, but i think it easily could - it's just a matter of having a
smarter schema in the db.

PS: some warnings removed...

thanks - in next version...

cheers.

-a

···

On Sat, 10 Feb 2007, Mauricio Fernandez wrote:
--
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
- the dalai lama

  This message is in MIME format. The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

I've heard a few people talk of it (mainly, TupleSpace with a persistence (database)), but to my knowledge nobody's done it.

i have. if people like it, i'll release next week.

What version/implementation of sqlite are you using?

give this one a whirl...

Thanks.

Okay I admit that I never quite worked out what was going on with the sqlite packages in ruby. There are two it seems in the same project: sqlite and sqlite3. And I think there is another project out there someplace.

Trying to install sqlite as a gem installs version 2.0.1 but the most recent version, if I understand rubyforge correctly, is 2.2.3; and 2.0.1 is marked a beta. Even though gem install tries to install 2.0.1 there is a gem on rubyforge for 2.2.3.

Worse, 2.0.1 won't compile on my machine. I havn't looked into it yet but it reeks of missing libraries.

Sqlite3 does appear to install. I know I've had it working before so this doesn't surprise me too much (this is also the reason I never worked out what the different projects are).

Oh well. Something to do on a Saturday morning. And the delayed gratification in seeing your 'hack' isn't all it's cracked up to be, I think I prefer instant gratification -- but gratification there will be :slight_smile:

Cheers,
Bob

···

On 9-Feb-07, at 10:01 PM, ara.t.howard@noaa.gov wrote:

On Sat, 10 Feb 2007, Bob Hutchison wrote:

On 9-Feb-07, at 7:04 PM, ara.t.howard@noaa.gov wrote:

On Thu, 8 Feb 2007, Eric Hodel wrote:

-a
--
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
- the dalai lama
<sqlitespace.rb>

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/&gt;
Recursive Design Inc. -- <http://www.recursive.ca/&gt;
Raconteur -- <http://www.raconteur.info/&gt;
xampl for Ruby -- <http://rubyforge.org/projects/xampl/&gt;