SWIG Ruby Memory Management

Hi everyone,

I've recently gotten up to speed with SWIG as I try to create Ruby
bindings for the GDAL open source project. SWIG's Ruby support seems
pretty good with one exception - memory management. I have run into
three issues:

* There is no support for telling SWIG that ownership of an object has
changed

* There is no built-in infrastructure for making it easy to implement a
mark function

* Multiple Ruby objects can wrap the same C++ object / C struct.

I describe these issues in more detail in these two posts:

http://mailman.cs.uchicago.edu/pipermail/swig-dev/2005-August/014991.html
http://mailman.cs.uchicago.edu/pipermail/swig-dev/2005-August/014992.html

Since these seem to be problems that many people run into when writing
Ruby extensions with SWIG, I have spent some time implementing generic
solutions for them and would like to run them by the list for
comment/feedback/suggestions.

API
---------
The basic API is:

/* Add a mapping from a C/C++ struct to a Ruby object */
static void SWIG_RubyAddMapping(void* ptr, VALUE object);

/* Get the Ruby object that owns the specified C/C++ struct */
static VALUE SWIG_RubyInstanceFor(void* ptr);

/* Remove a mapping from a C/C++ struct to a Ruby object */
static void SWIG_RubyRemoveMapping(void* ptr);

The SWIG wrapper code takes care of calling SWIG_RubyAddMapping and
SWIG_RubyRemoveMapping when Ruby objects in extensions are
created/deleted. Thus the extension programmer generally only has to
call SWIG_RubyInstanceFor when implementing a mark function (although in
some cases you may have to call SWIG_RubyRemoveMapping if you are
manually destroying objects on the C++ side).
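
To make the intended use concrete, here is a minimal sketch of a mark
function built on this API. It is a self-contained simulation, not SWIG
or Ruby code: VALUE is a stand-in typedef so it compiles without ruby.h,
the toy table (add_mapping, instance_for, remove_mapping) stands in for
the three calls above, Zoo and Animal are hypothetical structs, and
record_mark stands in for the rb_gc_mark call a real extension would
make.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without ruby.h. In a real extension
   VALUE and Qnil come from ruby.h and marking is done with rb_gc_mark(). */
typedef unsigned long VALUE;
#define NO_OBJECT ((VALUE) 0)

/* Toy mapping table (fixed size, linear search), standing in for the
   three API calls described above. */
#define MAX_MAPPINGS 16
static struct { void *ptr; VALUE object; } mappings[MAX_MAPPINGS];

static void add_mapping(void *ptr, VALUE object) {
  size_t i;
  for (i = 0; i < MAX_MAPPINGS; i++) {
    if (mappings[i].ptr == NULL) {
      mappings[i].ptr = ptr;
      mappings[i].object = object;
      return;
    }
  }
  assert(!"mapping table is full");
}

static VALUE instance_for(void *ptr) {
  size_t i;
  for (i = 0; i < MAX_MAPPINGS; i++) {
    if (mappings[i].ptr == ptr) return mappings[i].object;
  }
  return NO_OBJECT;
}

static void remove_mapping(void *ptr) {
  size_t i;
  for (i = 0; i < MAX_MAPPINGS; i++) {
    if (mappings[i].ptr == ptr) {
      mappings[i].ptr = NULL;
      mappings[i].object = NO_OBJECT;
      return;
    }
  }
}

/* Hypothetical wrapped structs: a Zoo that holds pointers to Animals. */
typedef struct { const char *name; } Animal;
typedef struct { Animal *animals[4]; size_t count; } Zoo;

/* Records what would be passed to rb_gc_mark() in a real extension. */
static VALUE marked[MAX_MAPPINGS];
static size_t marked_count;

static void record_mark(VALUE object) {
  if (marked_count < MAX_MAPPINGS) marked[marked_count++] = object;
}

/* Mark function for a Zoo: look up the wrapper of every Animal the zoo
   points to and mark it, skipping animals that have no wrapper. */
static void zoo_mark(void *ptr) {
  Zoo *zoo = (Zoo *) ptr;
  size_t i;
  for (i = 0; i < zoo->count; i++) {
    VALUE wrapper = instance_for(zoo->animals[i]);
    if (wrapper != NO_OBJECT) record_mark(wrapper);
  }
}
```

The point of the sketch is that the mark function never needs its own
bookkeeping: it only asks the mapping table which Ruby object, if any,
currently wraps each C pointer.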

Implementation
----------------
Solving issues #2 and #3 requires keeping track of mappings between
C/C++ structs and Ruby objects. Reading through various newsgroups, I
have seen two general approaches - keep a hash table that stores these
mappings, or add a field to each C/C++ struct to store a reference to a
Ruby object.

As can be seen in the code below, I have gone down the hash table route
with a slight twist - I use a Ruby hash table from C to store these
mappings. I did this instead of using a C++ map since I wanted the
code to be just C, and it was easier than implementing a hash table from
scratch in C.
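
For comparison, the other approach (a field in each struct) might look
like the sketch below. Animal, ruby_owner and the two helper names are
hypothetical, and VALUE is a stand-in typedef so the snippet compiles
without ruby.h. It needs no lookup table at all, but it only works for
structs whose definition you can change, which is generally not an
option when wrapping a third-party library.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long VALUE;    /* stand-in for the VALUE type in ruby.h */
#define NO_OWNER ((VALUE) 0)

/* Hypothetical struct that carries a back-reference to its wrapper. */
typedef struct {
  const char *name;
  VALUE ruby_owner;   /* wrapping Ruby object, or NO_OWNER if unwrapped */
} Animal;

/* Record which Ruby object wraps this struct. */
static void animal_set_owner(Animal *animal, VALUE owner) {
  assert(animal != NULL);
  animal->ruby_owner = owner;
}

/* Return the wrapping Ruby object, or NO_OWNER if there is none. */
static VALUE animal_owner(const Animal *animal) {
  return animal ? animal->ruby_owner : NO_OWNER;
}
```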

Performance
---------------
To make sure using a Ruby hash table performed adequately, I ran some
performance tests.

Test - Create 100,000 Animal objects (Animal is implemented in an
extension, taken from the SWIG Ruby manual). The test machine was a
Pentium M Thinkpad T41 laptop.

Test                        Time
No mappings                 9.3 to 9.6 seconds
Mappings using Ruby hash    9.5 to 9.9 seconds
Mappings using std::map     30 seconds

You can see there is a slight performance hit for tracking mappings
between C++ objects and Ruby objects. Note that the C++ map was so slow
because deletions took a lot of time (and there were 100,000 deletions
by the end of the test run).

Summary
---------
If people think it would be a good idea to add this functionality to
SWIG (whether using this implementation or some other one) then I'd be
glad to take the lead on this.

Thanks,

Charlie

Code
---------------------
This is a new file called rubymappings.swg:

/* Global Ruby hash table to store mappings from C/C++
    structs to Ruby Objects. */
static VALUE swig_ruby_mappings;

/* Get a Ruby number to reference a pointer */
static VALUE SWIG_RubyPtrToReference(void* ptr) {
  /* We cast the pointer to an unsigned long
   and then store a reference to it using
   a Ruby number object. */

  /* Convert the pointer to a Ruby number. The value is unsigned,
   so use the unsigned conversion macro. */
  unsigned long value = (unsigned long) ptr;
  return ULONG2NUM(value);
}

/* Get a Ruby number to reference an object */
static VALUE SWIG_RubyObjectToReference(VALUE object) {
  /* We cast the object to an unsigned long
   and then store a reference to it using
   a Ruby number object. */

  /* Convert the Object to a Ruby number */
  unsigned long value = (unsigned long) object;
  return ULONG2NUM(value);
}

/* Get a Ruby object from a previously stored reference */
static VALUE SWIG_RubyReferenceToObject(VALUE reference) {
  /* The provided Ruby number object is a reference
  to the Ruby object we want.*/

  /* First convert the Ruby number to a C number. This must
  mirror the unsigned conversion used when storing it. */
  unsigned long value = NUM2ULONG(reference);
  return (VALUE) value;
}

/* Add a mapping from a C/C++ struct to a Ruby object */
static void SWIG_RubyAddMapping(void* ptr, VALUE object) {
  /* In a Ruby hash table we store the pointer and
  the associated Ruby object. The trick here is
  that we cannot store the Ruby object directly - if
  we do then it cannot be garbage collected. So
  instead we typecast it as an unsigned long and
  convert it to a Ruby number object.*/

  /* Get a reference to the pointer as a Ruby number */
  VALUE key = SWIG_RubyPtrToReference(ptr);

  /* Get a reference to the Ruby object as a Ruby number */
  VALUE value = SWIG_RubyObjectToReference(object);

  rb_hash_aset(swig_ruby_mappings, key, value);
}

/* Get the Ruby object that owns the specified C/C++ struct */
static VALUE SWIG_RubyInstanceFor(void* ptr) {
  /* Get a reference to the pointer as a Ruby number */
  VALUE key = SWIG_RubyPtrToReference(ptr);

  /* Now lookup the value stored in the Ruby hash table */
  VALUE value = rb_hash_aref(swig_ruby_mappings, key);
  
  if (value == Qnil) {
    return Qnil;
  }
  else {
    /* Convert this value to Ruby object */
    return SWIG_RubyReferenceToObject(value);
  }
}

/* Remove a mapping from a C/C++ struct to a Ruby object */
static void SWIG_RubyRemoveMapping(void* ptr) {
  /* Get a reference to the pointer as a Ruby number */
  VALUE key = SWIG_RubyPtrToReference(ptr);
  VALUE object = SWIG_RubyInstanceFor(ptr);

  /* Method ID for Hash#delete. rb_intern returns an ID, not a VALUE,
  and a static variable cannot be initialized with a function call
  in C, so this is a plain local declared with the other variables. */
  ID delete_function = rb_intern("delete");

  /* Reset the C/C++ data struct associated with the Object.
  This is needed in case a Ruby object exists longer than
  its underlying C++ object. By clearing the data pointer,
  code in SWIG_Ruby_ConvertPtr can detect this problem
  and return an error message, as opposed to causing a
  segmentation fault.*/

  if (object != Qnil) {
    DATA_PTR(object) = 0;
  }

  /* Now delete the object from the hash table. To
  do this we need to call the Hash.delete method
  in Ruby. */
  rb_funcall(swig_ruby_mappings, delete_function, 1, key);
}

/* Setup a Ruby hash table to store mappings */
void SWIG_RubyInitializeMappings() {
  /* Create a Ruby hash table to store mappings from C++
  objects to Ruby objects. Also make sure to tell
  the garbage collector about the hash table */
  swig_ruby_mappings = rb_hash_new();
  rb_gc_register_address(&swig_ruby_mappings);
}
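
One portability note on the pointer-to-number helpers in this listing:
they cast a pointer to unsigned long, which is lossless only where
unsigned long is at least as wide as a pointer. That holds on common
32-bit and LP64 Unix platforms, but not on LLP64 platforms such as
64-bit Windows, where unsigned long is 32 bits. A minimal sketch of a
width-safe round trip, assuming a C99 compiler that provides uintptr_t
in <stdint.h>, is below. The function names are hypothetical, and
converting the key to a Ruby number is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a pointer to an integer key that is wide enough to hold it. */
static uintptr_t ptr_to_key(void *ptr) {
  return (uintptr_t) ptr;
}

/* Convert a key produced by ptr_to_key back to the original pointer. */
static void *key_to_ptr(uintptr_t key) {
  return (void *) key;
}

/* Returns 1 when an unsigned long can hold a pointer on this platform,
   i.e. when the unsigned long casts in the listing lose no bits. */
static int ulong_holds_pointer(void) {
  return sizeof(unsigned long) >= sizeof(void *);
}

/* Round-trip a pointer through its key and check nothing was lost. */
static void *round_trip(void *ptr) {
  void *back = key_to_ptr(ptr_to_key(ptr));
  assert(back == ptr);
  return back;
}
```

A build could check ulong_holds_pointer() once at initialization and
refuse to load if it returns 0, rather than silently truncating keys.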

Patches to Hook Into SWIG
----------------------

Index: ruby.cxx

RCS file: /cvsroot/swig/SWIG/Source/Modules/ruby.cxx,v
retrieving revision 1.78
diff -u -i -r1.78 ruby.cxx
--- ruby.cxx 17 Aug 2005 20:35:53 -0000 1.78
+++ ruby.cxx 18 Aug 2005 18:41:22 -0000
@@ -465,6 +465,9 @@
       NIL);
      Printf(f_init,"\n");

+ // Initialize mapping code
+ Printf(f_init,"SWIG_RubyInitializeMappings();\n");
+
      Language::top(n);

      /* Finish off our init function */
@@ -1098,7 +1107,8 @@
        if (current == CONSTRUCTOR_INITIALIZE) {
    String *action = Getattr(n,"wrap:action");
    if (action) {
- Append(action,"DATA_PTR(self) = result;");
+ Append(action,"DATA_PTR(self) = result;\n");
+ Append(action,"SWIG_RubyAddMapping(result, self);\n");
    }
        }
        emit_action(n,f);
@@ -1850,6 +1860,8 @@
      Printf(freebody, "free((char*) %s);\n", Swig_cparm_name(0,0));
        }
      }
+ Printf(freebody, tab4, NIL);
+ Printf(freebody, "SWIG_RubyRemoveMapping(%s);\n", Swig_cparm_name(0,0));
      Printv(freebody, "}\n", NIL);

      Printv(f_wrappers, freebody, NIL);

Index: ruby.swg

RCS file: /cvsroot/swig/SWIG/Lib/ruby/ruby.swg,v
retrieving revision 1.29
diff -u -i -r1.29 ruby.swg
--- ruby.swg 1 Feb 2005 00:08:19 -0000 1.29
+++ ruby.swg 18 Aug 2005 18:44:19 -0000
@@ -6,6 +6,7 @@

  %runtime "rubyhead.swg"
  %runtime "swigrun.swg" // Common C API type-checking code
+%runtime "rubymappings.swg" /* Stores mappings from C/C++ struct to Ruby objects */
  %runtime "rubydef.swg"

  %insert(initbeforefunc) "swiginit.swg"

Index: rubydef.swg

RCS file: /cvsroot/swig/SWIG/Lib/ruby/rubydef.swg,v
retrieving revision 1.17
diff -u -i -r1.17 rubydef.swg
--- rubydef.swg 1 Feb 2005 00:08:20 -0000 1.17
+++ rubydef.swg 18 Aug 2005 18:43:53 -0000
@@ -71,6 +71,13 @@

      if (!ptr)
    return Qnil;
+
+ /* Have we already wrapped this pointer? */
+ obj = SWIG_RubyInstanceFor(ptr);
+
+ if (obj != Qnil) {
+ return obj;
+ }

      if (type->clientdata) {
        sklass = (swig_class *) type->clientdata;

Charlie Savage wrote:

...
As can be seen in the code below, I have gone down the hashtable route
with a slight twist - I use a Ruby hash table from C to store these
mappings. I did this instead of using a C++ map since I wanted to the
code to be just C and it was easier then implementing a hash table from
scratch in C.
...

I can't say anything about your code, because I've never used SWIG, but it looks like you've done very nice work. I just wanted to mention a hash table implementation in C that is used in the Ruby source code: just look at st.[hc].

Regards, Pit