Data_Make_Struct and ALLOC Considered Harmful?

Billtj · 20 August 2002 15:30

Hi,

This is a follow-up of the thread with title “Data_Make_Struct
Considered Dangerous?”. Rather than making the thread larger and
larger, I try to start discussing the issue with a fresh start.

All this discussion is using ruby -v: ruby 1.6.7 (2002-03-01)
[i686-linux].

Data_Make_Struct is defined in ruby.h as

#define Data_Make_Struct(klass,type,mark,free,sval) (
sval = ALLOC(type),
memset(sval, 0, sizeof(type)),
Data_Wrap_Struct(klass,mark,free,sval)
)

As we can see, the sequence is first ALLOC() is called, and then the
memory is set to zero by using memset(). ALLOC() is just an alias of
xmalloc(), which is just an alias of ruby_xmalloc(). ruby_xmalloc()
is defined in gc.c as

void *
ruby_xmalloc(size)
long size;
{
void *mem;

if (size < 0) {
rb_raise(rb_eNoMemError, "negative allocation size (or too big)");
}
if (size == 0) size = 1;
malloc_memories += size;

if (malloc_memories > GC_MALLOC_LIMIT) {
rb_gc();
}
RUBY_CRITICAL(mem = malloc(size));
if (!mem) {
rb_gc();
RUBY_CRITICAL(mem = malloc(size));
if (!mem) {
    if (size >= 10 * 1024 * 1024) {
	rb_raise(rb_eNoMemError, "tried to allocate too big memory");
    }
    mem_error("failed to allocate memory");
}
}

return mem;

}

As we can see, rb_gc() may be called before the memory is returned.
rb_gc(), in turn, calls all the rb_gc_mark() functions. Therefore,
one particular issue is:

“Unless the C programmer is very careful, when using ALLOC it is very
possible that the marking function is called before a memory space is
properly initialized”.

Suppose we assume that the marking functions are really proper, i.e.,
they always check the state of each memory before rb_gc_mark is called
such as:

if (ptr) rb_gc_mark (...);

Suppose also we have a struct such as

typedef struct
{
    struct_A *data1;
    struct_B *data2;
} struct_C;

which is involved in some rb_gc_mark() because struct_C is part of
(but not direct) internal data of some class. Now, the following code
may fail:

....
ptr = ALLOC (struct_C);
ptr->data1 = ALLOC (struct_A);    /* may fail here because of

rb_gc() */
ptr->data2 = ALLOC (struct_B);

but the following code will never fail:

....
ptr = SAFE_MALLOC (struct_C);
ptr->data1 = SAFE_MALLOC (struct_A);
ptr->data2 = SAFE_MALLOC (struct_B);

where SAFE_MALLOC is just a wrapper of malloc() that deals with
out-of-memory. The only way to solve the ALLOC problem is to do
"double-initialization":

....
ptr = ALLOC (struct_C);
ptr->data1 = NULL;
ptr->data2 = NULL;    /* or just use one memset when it is proper

*/
ptr->data1 = ALLOC (struct_A);
…

We see that the initialization to zero by memset in Data_Make_Struct
does not really help here. It only helps when struct_C is a direct
data part of some class, in which case it was created using

obj = Data_Make_Struct (cClass, struct_C, ..., ..., ptr);

But when ptr is some other level deeper inside obj, the above problem
occurs.

Therefore, rather than dealing with this extra issue, why don’t we
simply not use ALLOC()? Yes, when used properly, nothing is harmful.
But just like the GO TO statement, it creates a potential pitfall for
the unaware C programmer. This stems from the fact that structured
programming was created to minimize the “jumping around” problem with,
for example, the GO TO statement. But ALLOC(), with its rb_gc(), with
its rb_gc_mark(), follows some “jumping around” pattern too.

Not using ALLOC also has this benefit: Let Ruby manage its own memory
and let the C part manage its own memory. There is no need for the C
part to be involved in Ruby memory management. (If really necessary,
the C part can call rb_gc() explicitly.) There was a comment in the
previous thread that by not using ALLOC, the Ruby heuristic of when to
start the next GC might be way off; but isn’t that this problem also
occurs when the C part is not there? (For example, with
GC_MALLOC_LIMIT of 8000000, isn’t that there is always a potential of
Ruby of wasting 7999999 bytes of memory, whether it is pure Ruby or a
mix of Ruby and C? If the pure Ruby already has some mechanism of
getting rid the wasted 7999999 bytes, then there is no need for the C
part to interfere.)

Finally, any Ruby function that allocates some memory (such as
rb_str_new2) has the potential of invoking rb_gc_mark(). I don’t see
any other way around this problem. However, when we don’t use
ALLOC(), we reduce the possibility of error. I guess it is the same
as the GO TO statement. When used properly, it is not dangerous; and
even when it is removed, a language like C is still "dangerous"
because of pointers. The removal (or disuse) of the GO TO statement
does not make a language 100% safe, but it seems that now people do
not use GO TO very-very often anymore because people have understood
the issue.

Therefore, to summarize:

1) Use malloc() (or some SAFE_MALLOC()) to allocate memory; don't

use ALLOC
2) Because of 1), use Data_Wrap_Struct; don’t use Data_Make_Struct
3) Be aware that before calling any Ruby function that may
allocate memory (such as Data_Wrap_Struct() and rb_str_new2()), all
memory should be proper (double-initialization may be required)
4) In the absence of 3), using malloc() a C programmer can proceed
as usual: allocate memory and then initialize it (no need for
double-initialization)

Regards,

Bill

A particular note to Guy: If you have not understood the issue : don’t
response.
We have no need for a technical-less “it is stupid” comment.

···

============================================================================

Data_Make_Struct() is not dangerous. The goal of Data_Make_Struct()
is
to call ALLOC(), initialize all members of the struct to zero then
call
Data_Wrap_Struct()

If you have not understood this : don’t use it.

Guy Decoux

ts1 · 20 August 2002 15:52

sval = ALLOC(type),\
memset(sval, 0, sizeof(type)),\

[...]

    typedef struct
    {
        struct_A *data1;
        struct_B *data2;
    } struct_C;

[...]

out-of-memory. The only way to solve the ALLOC problem is to do
"double-initialization":

    ....
    ptr = ALLOC (struct_C);
    ptr-> data1 = NULL;
    ptr-> data2 = NULL; /* or just use one memset when it is proper

pigeon% cat aa.c
#include "ruby.h"

typedef struct
{
int *x;
} struct_A;

typedef struct
{
int *x;
} struct_B;

typedef struct
{
struct_A *data1;
struct_B *data2;
} struct_C;

void Init_aa()
{
    struct_C *c;
    VALUE res = Data_Make_Struct(rb_cObject, struct_C, 0, free, c);
    if (c->data1 == NULL) {
        rb_warn("c->data1 is NULL");
    }
    if (c->data2 == NULL) {
        rb_warn("c->data2 is NULL");
    }
}
pigeon%

pigeon% ruby -raa -e 1
warning: c->data1 is NULL
warning: c->data2 is NULL
pigeon%

Guy Decoux

Mauricio_Fernndez · 20 August 2002 18:57

Suppose we assume that the marking functions are really proper, i.e.,
they always check the state of each memory before rb_gc_mark is called
such as:
if (ptr) rb_gc_mark (...);
Suppose also we have a struct such as
typedef struct
{
    struct_A *data1;
    struct_B *data2;
} struct_C;
which is involved in some rb_gc_mark() because struct_C is part of
(but not direct) internal data of some class.

If I understand it correctly, you’ve got a pointer to a struct_C value
inside the struct that is actually Data_Wrap_Struct()'ed.

It seems to me that either

the struct_C value is ALLOCated or malloc()ed and there’s no need
to mark any VALUE whatsoever (as it isn’t wrapped), so it won’t attempt
to mark data1 and data2 before these parts of the struct are properly
initialized
you use Data_Make_Struct and the struct is memset to 0, so nothing
bad happens if the mark function is called by rb_gc_mark().

Now, the following code
may fail:
....
ptr = ALLOC (struct_C);
ptr->data1 = ALLOC (struct_A);    /* may fail here because of
rb_gc() */
ptr->data2 = ALLOC (struct_B);

Why? There’s no marking involved if you only use ALLOC, AFAIK.
No way ptr->data1 and ptr->data2 can be used before initialized.

where SAFE_MALLOC is just a wrapper of malloc() that deals with
out-of-memory. The only way to solve the ALLOC problem is to do
“double-initialization”:
....
ptr = ALLOC (struct_C);
ptr->data1 = NULL;
ptr->data2 = NULL;    /* or just use one memset when it is proper
*/
ptr->data1 = ALLOC (struct_A);
…

I can’t see the point of this, but wouldn’t SAFE_ALLOC (a wrapper to
ALLOC which memsets to 0) be a better solution?

I don’t really see the problem in the code above; maybe if you gave a
little more context. I think it’s a non-issue as no marking function is
involved to mark the fields of struct_C, AFAIK.

···

On Wed, Aug 21, 2002 at 12:30:24AM +0900, Bill Tj wrote:

–
_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |__ \ | | | | | (| | | | |
.__/ _,|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Linux: the operating system with a CLUE… Command Line User Environment.
– seen in a posting in comp.software.testing

Mauricio_Fernndez · 20 August 2002 19:00

For some reason he wants to use ALLOC, and not Data_Make_Struct. Ergo,
no VALUE is constructed, and there’s no marking afterwards, so… no
problem?

···

On Wed, Aug 21, 2002 at 12:52:43AM +0900, ts wrote:

VALUE res = Data_Make_Struct(rb_cObject, struct_C, 0, free, c);

–
_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |__ \ | | | | | (| | | | |
.__/ _,|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Win95 is not a virus; a virus does something.
– unknown source

William_Djaja_Tjokro · 20 August 2002 19:10

Hi Guy,

It took me a while to understand what is being proven by the code
below. I guess that it shows that double initialization is
unnecessary. Yes, it is true, but in my original post the issue is when
struct_C is not the direct data pointer in the object, but one or more
levels deeper. We need some other struct_D as the direct data pointer in
the object:

....
typedef struct
{
    struct_C* data3;
} struct_D;

struct_D* c;

VALUE res = Data_Make_Struct(rb_cObject, struct_D, 0, free, c);
c->data3 = ALLOC (struct_C);
c->data3->data1 = ALLOC (struct_A);    /* may rb_gc_mark() here

while data2 is still undefined */
…

One good advice from Paul Brannan is instead of double initialization, we
can reverse the order of creation of data (inner first, then outer).

Regards,

Bill

···

===========================================================================
ts decoux@moulon.inra.fr wrote:

pigeon% cat aa.c
#include “ruby.h”

typedef struct
{
int *x;
} struct_A;

typedef struct
{
int *x;
} struct_B;

typedef struct
{
struct_A *data1;
struct_B *data2;
} struct_C;

void Init_aa()
{
struct_C *c;
VALUE res = Data_Make_Struct(rb_cObject, struct_C, 0, free, c);
if (c->data1 == NULL) {
rb_warn(“c->data1 is NULL”);
}
if (c->data2 == NULL) {
rb_warn(“c->data2 is NULL”);
}
}
pigeon%

pigeon% ruby -raa -e 1
warning: c->data1 is NULL
warning: c->data2 is NULL
pigeon%

Guy Decoux

William_Djaja_Tjokro · 20 August 2002 19:30

Hi,

To give it more context, suppose struct_B contains a VALUE (a pure VALUE,
which is not a wrapper):

typedef struct
{
    VALUE data4;
} struct_B;

and the marking function for the top level wrapper object is as follows:

....
if (ptr->data1) rb_gc_mark (ptr->data1...);
if (ptr->data2) rb_gc_mark (ptr->data2->data4);
....

Therefore, when “ptr->data1 = ALLOC (struct_A);” is executed, ptr->data2
may not be NULL.

I also like your suggestion of SAFE_ALLOC(), although for philosophical
reason, I choose SAFE_MALLOC().

Regards,

Bill

···

==================================================================
Mauricio Fern?ndez batsman.geo@yahoo.com wrote:

which is involved in some rb_gc_mark() because struct_C is part of
(but not direct) internal data of some class.

If I understand it correctly, you’ve got a pointer to a struct_C value
inside the struct that is actually Data_Wrap_Struct()'ed.

It seems to me that either

the struct_C value is ALLOCated or malloc()ed and there’s no need
to mark any VALUE whatsoever (as it isn’t wrapped), so it won’t attempt
to mark data1 and data2 before these parts of the struct are properly
initialized

you use Data_Make_Struct and the struct is memset to 0, so nothing
bad happens if the mark function is called by rb_gc_mark().

Now, the following code
may fail:
....
ptr = ALLOC (struct_C);
ptr->data1 = ALLOC (struct_A);    /* may fail here because of
rb_gc() */
ptr->data2 = ALLOC (struct_B);

Why? There’s no marking involved if you only use ALLOC, AFAIK.
No way ptr->data1 and ptr->data2 can be used before initialized.

I can’t see the point of this, but wouldn’t SAFE_ALLOC (a wrapper to
ALLOC which memsets to 0) be a better solution?

I don’t really see the problem in the code above; maybe if you gave a
little more context. I think it’s a non-issue as no marking function is
involved to mark the fields of struct_C, AFAIK.

Paul_Brannan · 20 August 2002 19:52

Or perhaps CALLOC or RUBY_CALLOC, to be consistent with C naming convention.

Paul

···

On Wed, Aug 21, 2002 at 04:30:47AM +0900, William Djaja Tjokroaminata wrote:

Mauricio Fern?ndez batsman.geo@yahoo.com wrote:

I can’t see the point of this, but wouldn’t SAFE_ALLOC (a wrapper to
ALLOC which memsets to 0) be a better solution?
I also like your suggestion of SAFE_ALLOC(), although for philosophical
reason, I choose SAFE_MALLOC().

Mauricio_Fernndez · 20 August 2002 23:09

In fact I wanted more detail about the top level object (the one that
is constructed with Data_Make_Struct), but I got it from you last reply
to Guy.

(copied here for reference)

typedef struct {
struct_C* data3;
} struct_D;
struct_D* c;

VALUE res = Data_Make_Struct(rb_cObject, struct_D, 0, free, c);
c->data3 = ALLOC(struct_C);

It get it now. I just couldn’t see why you’d like to do
ptr = ALLOC(struct_C);
instead of using Data_Make_Struct(). But well, I guess it’s for
performance; since you’re already making res, yet another VALUE if
pointless.

BTW, I do believe this thread wouldn’t have been so loong if you had
explained before that the top level object manages (marks) things inside
struct_C instead of letting it do this itself. It turns out Guy’s
last post reflects that he (amongst several others) thought there were
only two layers involved. As for me, I couldn’t see why you were so
pressed to use ALLOC(struct_C) instead of Data_Make_Struct and letting it
manage its fields.
In fact, I now remember that’s how this discussion started (performance when
wrapping C structs), but it was so long ago I think we all forgot it

I consider we finally did Thread.current.kill ;->

···

On Wed, Aug 21, 2002 at 04:30:47AM +0900, William Djaja Tjokroaminata wrote:

Hi,

To give it more context, suppose struct_B contains a VALUE (a pure VALUE,
which is not a wrapper):
typedef struct
{
    VALUE data4;
} struct_B;
and the marking function for the top level wrapper object is as follows:
....
if (ptr->data1) rb_gc_mark (ptr->data1...);
if (ptr->data2) rb_gc_mark (ptr->data2->data4);
....

–
_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |__ \ | | | | | (| | | | |
.__/ _,|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

“You, sir, are nothing but a pathetically lame salesdroid!
I fart in your general direction!”
– Randseed on #Linux

William_Djaja_Tjokr1 · 21 August 2002 13:32

Just the last details. I want to use

ptr = ALLOC(struct_C);

because the data is really a pure C data (no Ruby object involved). Using
Data_Make_Struct to do the allocation would be overkill, don’t you think?

I guess my original post was so long that some details were really lost,
but really I have tried to explain that this problem is only an issue
when the struct is one or more levels deeper inside the wrapping object.

Yes, the discussion became difused somewhat because really the current
difference between ALLOC() and alloc() involves two aspects: the memory
problem when ALLOC() is not used properly and the execution performance
difference between the two. I myself sometimes confused as I am learning
to program in this Ruby/C world.

Thread.current.kill(-9) ?

Regards,

Bill

···

============================================================================
Mauricio Fern?ndez batsman.geo@yahoo.com wrote:

It get it now. I just couldn’t see why you’d like to do
ptr = ALLOC(struct_C);
instead of using Data_Make_Struct(). But well, I guess it’s for
performance; since you’re already making res, yet another VALUE if
pointless.

BTW, I do believe this thread wouldn’t have been so loong if you had
explained before that the top level object manages (marks) things inside
struct_C instead of letting it do this itself. It turns out Guy’s
last post reflects that he (amongst several others) thought there were
only two layers involved. As for me, I couldn’t see why you were so
pressed to use ALLOC(struct_C) instead of Data_Make_Struct and letting it
manage its fields.
In fact, I now remember that’s how this discussion started (performance when
wrapping C structs), but it was so long ago I think we all forgot it

I consider we finally did Thread.current.kill ;->

Topic		Replies	Views
Data_Make_Struct Considered Dangerous? ruby-talk	38	288	22 August 2002
Multi-threaded C++, new, and xmalloc ruby-talk	4	110	24 November 2006
Questions about GC, memory management and C extensions ruby-talk	5	121	8 July 2005
In an extension, doing a Data_Make_Struct and getting value back ruby-talk	4	123	5 November 2005
A couple of Ruby API questions ruby-talk	1	127	9 February 2005

Data_Make_Struct and ALLOC Considered Harmful?

Related topics