Using Data_Wrap_Struct for nested structs

Hi,

I'm in the process of writing Ruby bindings for a C library. I plan to wrap its structs using Data_Wrap_Struct and friends, but I'm running into a problem with nested structs. Here's an example:

typedef struct pmUnits {
     unsigned int pad : 8;
     signed int scaleCount : 4;
     unsigned int scaleTime : 4;
     unsigned int scaleSpace : 4;
     signed int dimCount : 4;
     signed int dimTime : 4;
     signed int dimSpace : 4;
} pmUnits;

typedef struct pmDesc {
     pmID pmid; /* unique identifier */
     int type; /* base data type (see below) */
     pmInDom indom; /* instance domain */
     int sem; /* semantics of value (see below) */
     pmUnits units; /* dimension and units */
} pmDesc;

I want to wrap these in two separate classes, but the pmUnits field in the pmDesc struct isn't a pointer, so it can't easily be wrapped. At the moment I am passing the address of pmDesc.units and creating a pmUnits object from that. This seems dangerous, though, as that object does not own the memory, which could be free()d when the parent object goes out of scope and gets garbage collected.
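
To make the hazard concrete, here is a minimal plain-C sketch with no Ruby API involved. The names units_t, desc_t, units_of and points_inside are hypothetical stand-ins invented for illustration, not part of the library. It only shows that the address of an embedded field lies inside the parent's storage, so anything holding that address is valid only for as long as the parent is.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for pmUnits and pmDesc. */
typedef struct units_t {
    int scale;
    int dim;
} units_t;

typedef struct desc_t {
    int type;
    units_t units; /* embedded by value, not a pointer */
} desc_t;

/* Returns the address of the embedded field; no new memory is created. */
static units_t *units_of(desc_t *d)
{
    return &d->units;
}

/* Returns 1 if p points into the len bytes starting at base, else 0. */
static int points_inside(const void *base, size_t len, const void *p)
{
    uintptr_t b = (uintptr_t)base;
    uintptr_t q = (uintptr_t)p;
    return q >= b && q < b + len;
}
```

Since units_of(&d) is always &d plus offsetof(desc_t, units), a wrapper built around it aliases the parent: writes through one are visible through the other, and releasing the parent leaves the wrapper dangling.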

It's also worth noting that pmUnits is used in other parts of the library independently of pmDesc; it's not always nested in pmDesc.

I initially started by storing the nested structs as instance variables on the class, set via rb_iv_set(), and memcpy()ing the data out of the struct, but this became a problem when I needed to get the underlying C struct back from the Ruby object.

I'm wondering what others do in this situation?

See this gist for more details: https://gist.github.com/ryandoyle/21ce3cc661a36f22f10f

Cheers,
Ryan

As usual with Ruby, there are several ways to manage data across
the Ruby-C divide. Personally, when a class has to exist in the C
world, I define a structure named after the class. For example, for
a class called Framereader:

typedef struct
{
...
...
} framereader_stc;

This, of course, can include nested structures. Why not? Then, in the Init function I have:

  cls_framereader=rb_define_class("Framereader",rb_cObject);
  rb_define_singleton_method(cls_framereader,"new",new_framereader,2);

and the first two lines of the C implementation of "new" are as follows:

VALUE new_framereader(VALUE self,VALUE v_x,VALUE v_y)
{
  framereader_stc *s;
  VALUE sdata=Data_Make_Struct(cls_framereader,framereader_stc,NULL,free_framereader,s);
...
...

The return value of "new" is:

...
...
  return sdata;
}

This allocated data I make sure to free inside the free method
(free_framereader in the example):

static void free_framereader(void *p)
{
  framereader_stc *s=(framereader_stc *)p;
...
...
  free(s);
}

If I make sure that I free here whatever may have been allocated during
the lifetime of the object instance, there are no leaks. Often, my C
structures include pointers that are set to NULL at initialization
and may be allocated at some point during runtime. In the free function
I have to check whether this has happened and, if so, free whatever
memory is involved, as well as stop threads, close I/O units, and so on.
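
As a rough illustration of that cleanup order, here is a plain-C sketch without any Ruby headers. The struct reader_stc, its buffer member and the three helper functions are hypothetical names made up for this example. A real free callback registered with Ruby returns void; this one returns a count only so the behaviour can be checked.

```c
#include <stdlib.h>

/* Hypothetical struct: "buffer" starts out NULL and may be allocated later. */
typedef struct reader_stc {
    int width;
    int height;
    unsigned char *buffer; /* optional, allocated on demand */
} reader_stc;

/* Allocate the struct and set every optional pointer to NULL explicitly. */
static reader_stc *reader_create(int width, int height)
{
    reader_stc *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->width = width;
    s->height = height;
    s->buffer = NULL;
    return s;
}

/* Allocate the optional buffer on first use; returns 0 on success. */
static int reader_ensure_buffer(reader_stc *s)
{
    if (s->buffer != NULL)
        return 0;
    s->buffer = malloc((size_t)s->width * (size_t)s->height);
    return s->buffer != NULL ? 0 : -1;
}

/* Free function: release optional members first, then the struct itself.
 * Returns the number of blocks released (for checking only). */
static int reader_free(void *p)
{
    reader_stc *s = p;
    int released = 0;

    if (s == NULL)
        return 0;
    if (s->buffer != NULL) {
        free(s->buffer);
        released++;
    }
    free(s);
    return released + 1;
}
```

Initializing the optional pointer to NULL in the constructor is what makes the check in the free function reliable.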

Then, in all C-defined methods, I get a pointer to my structure. For
example, if I had declared a method called 'unit' like this:

...
...
  rb_define_method(cls_framereader,"unit",framereader_unit,0);
...
...

the C code for the method will begin as follows:

VALUE framereader_unit(VALUE self)
{
  framereader_stc *s;
  Data_Get_Struct(self,framereader_stc,s);
...
...

This way, I am sure that the memory I use for the C side is properly
managed by Ruby's garbage collector. All the C memory space I use
either belongs to a class instance (and so I take care to free it in
the appropriate free method code), or I make sure to allocate and free
it within the same ruby-called method: generally, I write the 'malloc'
and 'free' lines one after the other, and then I insert whatever code
uses that memory in the middle.

When I need to pass something from C code that will be read from some
other C code, and that data will need to do part of the trip in
Ruby-land, I transform it into a Ruby String with rb_str_new. That
string is then part of the Ruby world, and thus managed by the Ruby
GC. But this does not happen too often. From experience I can say
that, if I come across problems of memory chunks being used in various
unrelated C code sections, if I find out that I'd need to juggle
around too many of these opaque data strings, the code is in need of a
bit of refactoring.

When I write a C part, that needs to be motivated (either for
performance reasons or because I need to interface external
libraries). If two C parts need to exchange data, either they have to
do it via Ruby (often, with translation from C structures to variables
that have a meaning for Ruby - that may have a cost in CPU cycles, but
I may find I can afford that), or I may find it possible and
beneficial to merge the two C parts into one - not just glue the two
codebases together, but merge the concepts represented by the classes
that the two C parts belong to.

This, of course, varies a lot from case to case.

HTH

Carlo

···

--
  * Se la Strada e la sua Virtu' non fossero state messe da parte,
* K * Carlo E. Prelz - fluido@fluido.as che bisogno ci sarebbe
  * di parlare tanto di amore e di rettitudine? (Chuang-Tzu)

> Hi,
>
> I'm in the process of writing Ruby bindings for a C library. I've
> planned to wrap these structs using Data_Wrap_Struct etc.. but the
> problem that I'm facing is to do with nested structs. Here's an
> example below:
>
> typedef struct pmUnits {

<snip>

> } pmUnits;
>
> typedef struct pmDesc {

<snip>

>     pmUnits units; /* dimension and units */
> } pmDesc;
>
> I am wanting to wrap these in two separate classes but the pmUnits
> field in the pmDesc struct isn't a pointer so it can't easily be
> wrapped. At the moment I am passing the reference of pmDesc.unit and
> creating a pmUnits class from that. This seems dangerous though as
> this class then does not own the memory which could be free()ed when
> the parent object becomes out of scope and gets garbage collected.

Yep. You need to prevent the parent from being GC-ed as long as
the child unit object is alive (and vice-versa)

> It's also worth noting that pmUnits is used in other parts of the
> library independently of pmDesc - its not always nested in pmDesc.
>
> I initially started by having nested structs created as instance
> variables on the class done so via rb_iv_set() any memcpy()ing the
> data out of the struct but this started to become a problem when I
> needed to get the underlying C struct back from the Ruby class.

It's also tricky to have copied data and attempting to synchronize
multiple sources of truth.

However, you may use rb_iv_set to stash a reference to the parent object
to prevent it from going out-of-scope AND use rb_iv_set again to stash a
reference to the child so they reference each other.

> I'm wondering what others do in this situation?

If I'm understanding correctly, I would do the following:

static VALUE rb_pmapi_pmdesc_pmunits(VALUE self) {
    pmUnits *pm_units = &rb_pmapi_pmdesc_ptr(self)->units;

    /* check for existing objects wrapped, first */
    VALUE units = rb_iv_get(self, "@units");

    if (NIL_P(units)) {
        /* note: no free callback registered for embedded units: */
        units = Data_Wrap_Struct(pcp_pmapi_pmunits_class, 0, 0, pm_units);

        /*
         * This circular relationship prevents either from being GC-ed
         * before the other; but they will still be GC-ed together.
         * Note: someone could still wreck your day by reaching into
         * the objects to remove the references.
         */
        rb_iv_set(units, "@desc", self);
        rb_iv_set(self, "@units", units);
    }

    return units;
}

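When you allocate a standalone pmUnits struct, you will need to wrap it
with an actual free callback. RUBY_DEFAULT_FREE (which is -1 cast to a
function pointer) means use the default free function, so the struct
should come from Ruby's allocator (ALLOC or xmalloc):

    Data_Wrap_Struct(pcp_pmapi_pmunits_class, 0, RUBY_DEFAULT_FREE, pm_units);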

However, it looks like Ruby 2.3 will deprecate Data_*_Struct for the
safer TypedData macros instead. You can use two separate rb_data_type_t
structs, but they can share the same klass VALUE:

static const rb_data_type_t pm_units_embedded = {
    "pmUnits_embedded",
    { 0, 0, 0, },
    0, 0, 0
};

static const rb_data_type_t pm_units_standalone = {
    "pmUnits_standalone",
    { 0, RUBY_TYPED_DEFAULT_FREE, 0, },
    0, 0, 0
};

And use them as:

TypedData_Wrap_Struct(pcp_pmapi_pmunits_class, &pm_units_standalone, pm_units);
/* or */
TypedData_Wrap_Struct(pcp_pmapi_pmunits_class, &pm_units_embedded, pm_units);
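
The ownership split between the two descriptors can be modelled in plain C, without ruby.h. This is only a toy analogue: toy_type, toy_wrapper, toy_release and units_t are hypothetical names, and toy_release merely imitates what a collector would do when a wrapper dies. It is not how Ruby implements TypedData.

```c
#include <stdlib.h>

/* Hypothetical stand-in for pmUnits. */
typedef struct units_t {
    int scale;
    int dim;
} units_t;

/* Toy analogue of a data-type descriptor: a name plus an optional free hook. */
typedef struct toy_type {
    const char *name;
    void (*dfree)(void *);
} toy_type;

/* Toy analogue of a wrapper object: a descriptor and the wrapped pointer. */
typedef struct toy_wrapper {
    const toy_type *type;
    void *data;
} toy_wrapper;

static int units_freed = 0; /* counts free-hook calls, for checking only */

static void units_free(void *p)
{
    free(p);
    units_freed++;
}

/* Embedded: the parent owns the memory, so there is no free hook. */
static const toy_type units_embedded = { "units_embedded", NULL };
/* Standalone: the wrapper owns the memory and must release it. */
static const toy_type units_standalone = { "units_standalone", units_free };

/* Imitates collection of a wrapper: run the free hook if there is one. */
static void toy_release(toy_wrapper *w)
{
    if (w->type->dfree != NULL)
        w->type->dfree(w->data);
    w->data = NULL;
}
```

The same wrapped type gets two descriptors that differ only in whether releasing the wrapper also releases the struct, which is the distinction the two rb_data_type_t definitions above encode.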

···

Ryan Doyle <ryan@doylenet.net> wrote:

Hi Carlo,

Thanks for the feedback. What I've settled on for the moment is to make the pmUnits Ruby object immutable and create it by memcpy()ing the pmUnits struct out of pmDesc. It now looks something like this, with only read-only accessor methods for the wrapped pmUnits struct.

static VALUE rb_pmapi_pmunits_alloc(VALUE klass) {
     pmUnits *units_to_wrap = ALLOC(pmUnits);

     return Data_Wrap_Struct(klass, 0, rb_pmapi_pmunits_free, units_to_wrap);
}

VALUE rb_pmapi_pmunits_new(pmUnits pm_units) {
     VALUE instance;
     pmUnits *units_from_instance;

     instance = rb_pmapi_pmunits_alloc(pcp_pmapi_pmunits_class);
     Data_Get_Struct(instance, pmUnits, units_from_instance);
     memcpy(units_from_instance, &pm_units, sizeof(pmUnits));

     return instance;
}

pmUnits rb_pmapi_pmunits_get(VALUE self) {
     pmUnits pm_units;

     memcpy(&pm_units, rb_pmapi_pmunits_ptr(self), sizeof(pmUnits));

     return pm_units;
}

rb_pmapi_pmunits_new() and rb_pmapi_pmunits_get() are then used when creating a pmDesc, as follows:

static VALUE rb_pmapi_pmdesc_initialize(VALUE self, VALUE pmid, VALUE type, VALUE indom, VALUE sem, VALUE units) {
     pmDesc *pm_desc = rb_pmapi_pmdesc_ptr(self);

     pm_desc->units = rb_pmapi_pmunits_get(units);
     pm_desc->indom = NUM2UINT(indom);
     pm_desc->pmid = NUM2UINT(pmid);
     pm_desc->sem = NUM2INT(sem);
     pm_desc->type = NUM2INT(type);

     return self;
}

static VALUE rb_pmapi_pmdesc_units(VALUE self) {
     return rb_pmapi_pmunits_new(rb_pmapi_pmdesc_ptr(self)->units);
}

void init_rb_pmapi_pmdesc(VALUE pmapi_class) {

     pcp_pmapi_pmdesc_class = rb_define_class_under(pmapi_class, "PmDesc", rb_cObject);
     rb_define_method(pcp_pmapi_pmdesc_class, "initialize", rb_pmapi_pmdesc_initialize, 5);
     ...
     rb_define_method(pcp_pmapi_pmdesc_class, "units", rb_pmapi_pmdesc_units, 0);

     rb_define_alloc_func(pcp_pmapi_pmdesc_class, rb_pmapi_pmdesc_alloc);

}

This does mean each call to #units creates a new Ruby object instead of returning the same one, but it seems to be a pattern I've seen elsewhere, so perhaps that's okay?
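
For what it's worth, the copy-in/copy-out behaviour can be checked in plain C without the Ruby API. In this sketch units_t, desc_t and the three helper functions are hypothetical stand-ins, not the real pmUnits/pmDesc definitions. It shows that a by-value copy is an independent snapshot, and that plain struct assignment does the same job as memcpy() even with bit-fields.

```c
#include <string.h>

/* Hypothetical stand-ins for pmUnits and pmDesc, bit-fields included. */
typedef struct units_t {
    unsigned int pad : 8;
    signed int scale : 4;
    signed int dim : 4;
} units_t;

typedef struct desc_t {
    int type;
    units_t units;
} desc_t;

/* Copy out by value: the caller gets an independent snapshot. */
static units_t desc_units_snapshot(const desc_t *d)
{
    return d->units; /* struct assignment copies the bit-fields too */
}

/* The same copy written with memcpy, as in the code above. */
static units_t desc_units_snapshot_memcpy(const desc_t *d)
{
    units_t u;
    memcpy(&u, &d->units, sizeof u);
    return u;
}

/* Copy in by value, mirroring what the initialize step does. */
static void desc_set_units(desc_t *d, units_t u)
{
    d->units = u;
}
```

Because each snapshot is independent, a later change to the parent's units is not reflected in objects handed out earlier, which is consistent with keeping the Ruby-side object read-only.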

I could just set an instance variable and not worry about rb_pmapi_pmunits_new()? Maybe just:

static VALUE rb_pmapi_pmdesc_initialize(VALUE self, VALUE pmid, VALUE type, VALUE indom, VALUE sem, VALUE units) {
     pmDesc *pm_desc = rb_pmapi_pmdesc_ptr(self);

     pm_desc->units = rb_pmapi_pmunits_get(units);
     pm_desc->indom = NUM2UINT(indom);
     pm_desc->pmid = NUM2UINT(pmid);
     pm_desc->sem = NUM2INT(sem);
     pm_desc->type = NUM2INT(type);

     rb_iv_set(self, "@units", units);

     return self;
}

This will still have the correct C struct data, but calls to #units could then just be

rb_define_attr(pcp_pmapi_pmdesc_class, "units", 1, 0);

Thanks again and any feedback on this approach would be appreciated.

Ryan

(re-adding ruby-talk to Cc:)

Ryan Doyle <ryan@doylenet.net> wrote:
> This looks like a nice solution. I assume Ruby's GC can handle this
> circular reference? From the reading I've done it seems to be the
> case.

Yes, the GC can handle circular references all day long.

Hi Eric,

Eric Wong <normalperson@yhbt.net> writes:
>    * This circular relationship prevents either from being GC-ed
>    * before the other; but they will still be GC-ed together
>    * Note: someone could still wreck your day by reaching into
>    * the objects to remove the references.
>    */
>   rb_iv_set(units, "@desc", self);
>   rb_iv_set(self, "@units", units);

It’s been a while since I last wrote Ruby C extensions, but I still
remember having read that it is possible to call rb_iv_set() with
instance variable names that do not start with the @ sign, e.g.

  rb_iv_set(units, "desc", self);

If I remember correctly, the effect of such a statement was that the
GC still recognises the reference, and the value of the variable is
still reachable from the C side by means of rb_iv_get(). However, the
instance variable is inaccessible from the Ruby side, even by means of
metaprogramming. Has this changed, or is my memory of the issue simply
incorrect? Because if this still holds true, @-less C-side instance
variables would be a nice fit for preventing the GC from collecting
objects, and they’d still be protected from being removed from within
Ruby land.

Greetings
Marvin

···

--
#!/sbin/quintus
Blog: http://www.guelkerdev.de

GnuPG key: F1D8799FBCC8BC4F

Quoting Ryan Doyle (ryan@doylenet.net):

> Thanks for the feedback. What I've compromised on for the moment is to make
> the pmUnits Ruby object immutable and is created from memcpy()ing the
> pmUnits struct out of pmDesc. It now looks something like this and only has
> read-only accessor methods for the wrapped pmUnits struct.

It is difficult for me to evaluate. I have settled on allocating
memory that has to be managed by Ruby's GC only from the "new" method,
and this simplifies things a lot. I have never needed to separate
creating the memory area from assigning it, and I have very rarely
needed to manage from C any Ruby objects other than booleans, numbers,
strings, arrays and hashes.

Maybe others have approaches that better cover your specific usage. I
personally prefer to keep the two worlds quite separate. It is enough
for me to be able to refer to one specific area of memory associated
to each object instance, and use it in a pure C way from within C
code. Let's say that I am not attracted by complex tricks,
generally. Occam's razor is my friend.

Carlo

···

Subject: Re: Using Data_Wrap_Struct for nested structs
  Date: Sun 20 Dec 15 06:42:34PM +1100

> Eric Wong <normalperson@yhbt.net> writes:
> > * Note: someone could still wreck your day by reaching into
> > * the objects to remove the references.
> > */
> > rb_iv_set(units, "@desc", self);
> > rb_iv_set(self, "@units", units);
>
> It’s been a while since I last wrote Ruby C extensions, but I still
> remember having read that it is possible to call rb_iv_set() with
> instance variable names that do not start with the @ sign, e.g.
>
>   rb_iv_set(units, "desc", self);

Yes, it's possible. I thought it was a weird behavior, possibly
unofficially supported and subject to change.

However it's in doc/extension.rdoc (the new README.EXT) in trunk:

  VALUE rb_iv_get(VALUE obj, const char *name) ::

    Retrieve the value of the instance variable. If the name is not
    prefixed by `@', that variable shall be inaccessible from Ruby.

So I guess it's safe to use in your own extensions.

> If I remember correctly, the effect of such a statement was that the
> GC still recognises the reference, and the value from the variable is
> still reachable from the C side by means of rb_iv_get(). However, the
> instance variable is inaccessible from the Ruby side even by means of
> metaprogramming. Has this changed, or is my memory of the issue simply
> incorrect? Because if this still holds true, @-less C-side instance
> variables would make a nice fit for preventing the GC from collecting
> objects, and they’d still be protected from being removed from within
> Ruby land.

Yep, all true. They're used extensively inside C Ruby this way.

···

Quintus <quintus@quintilianus.eu> wrote:

Hi Eric,

Eric Wong <normalperson@yhbt.net> writes:
> Yes, it's possible. I thought it was a weird behavior, possibly
> unofficially supported and subject to change.
>
> However it's in doc/extension.rdoc (the new README.EXT) in trunk:

Good to know that this is now officially documented. Thank you for both
investigation and confirmation.

Happy coding!
Marvin

Does anyone know how I can remove myself from receiving these emails?
Thanks.

Melissa Fares <melissa.fares@gmail.com> writes:

> Does anyone know how I can remove myself from receiving these emails?
> Thanks.

Go to:

  Mailing Lists

Select "Ruby Talk" and "Unsubscribe", enter your email address, and
submit the form.

Why is this so excessive on Ruby-Talk? I am a member of a dozen
mailing lists, and not a single one receives as many "how can I
unsubscribe" emails as Ruby-Talk.

Greetings
Marvin

Perhaps Ruby attracts users who are unfamiliar with mailing lists and
don't know to check the List-Unsubscribe header.

It might help to put the unsubscribe info in the signature of
every unsigned/non-DKIM message as some lists do. Something like:

···

Quintus <quintus@quintilianus.eu> wrote:

> Why is this so excessive on Ruby-Talk? I am member of a dozen
> mailinglists, and not a single one receives so many "how can I
> unsubscribe" emails like Ruby-Talk.

Hello,

···

2015-12-23 5:55 GMT+09:00 Eric Wong <normalperson@yhbt.net>:

> It might help to put the unsubscribe info in the signature of
> every unsigned/non-DKIM message as some lists do. Something like:

Thanks for your suggestion.
I've configured the mailing list manager.

--
Shugo Maeda

Shugo Maeda <shugo@ruby-lang.org> writes:

> Thanks for your suggestion.
> I've configured the mailing list manager.

Thank you. Let’s hope that this finally resolves this issue.

Greetings
Marvin
