Data_Make_Struct Considered Dangerous?

If you ignore the out-of-memory situation, then there is no need to use
ALLOC. The point of ALLOC is that a) you may want to free memory
periodically and b) if you are out of memory, malloc will fail and ALLOC
will not. If this is not a problem for you, or if you need to do your
own memory allocation that does not involve malloc, then do not use
ALLOC.

Some of us do have to worry about out-of-memory situations. For these
cases, ALLOC makes perfect sense.

I have other code that uses operator new to allocate memory for objects
in Ruby extensions. In C++, this makes perfect sense.

Use whichever method of allocation makes sense in your application.

Paul

···

On Fri, Aug 16, 2002 at 05:55:16AM +0900, William Djaja Tjokroaminata wrote:

  Second, the “danger” in Data_Make_Struct comes from ALLOC and even
    from the Data_Wrap_Struct itself, which may invoke the garbage
    collector. However, I still do not get the answer why I ever need to
    use ALLOC instead of malloc and its derivatives. For example,
    ignoring the run-out-of-memory problem, a C statement such as

Hi Paul,

Thanks for the information. It is clear to me now. Yes, when I
mentioned “malloc and its derivatives”, I meant some other functions that
may provide extra protection against out-of-memory (basically
some wrapper around malloc), but which have nothing to do with the
garbage collector. Because my current objective is to be able to run with
GC.disable, indeed I don’t need ALLOC.

To me, writing Ruby C extension has been a very rewarding
experience. Yes, Ruby is very elegant, and C is very powerful. When we
combine these two, wow, it becomes a much more powerful combination,
because there are things that we cannot do easily in Ruby alone (such as
having a pointer, as in my previous question), and there are things
that we cannot do easily with standard C library alone (such as
dynamic creation of user-defined objects).

The critical juncture for me, however, is Data_Make_Struct/Data_Wrap_Struct
and understanding the consequences of having a garbage collector in
C programming, which is really something new. Once we master this, I
guess everything else is just minor detail.

Regards,

Bill

···

==========================================================================
Paul Brannan pbrannan@atdesk.com wrote:

If you ignore the out-of-memory situation, then there is no need to use
ALLOC. The point of ALLOC is that a) you may want to free memory
periodically and b) if you are out of memory, malloc will fail and ALLOC
will not. If this is not a problem for you, or if you need to do your
own memory allocation that does not involve malloc, then do not use
ALLOC.

Some of us do have to worry about out-of-memory situations. For these
cases, ALLOC makes perfect sense.

I have other code that uses operator new to allocate memory for objects
in Ruby extensions. In C++, this makes perfect sense.

Use whichever method of allocation makes sense in your application.

Paul

OK, one final thought/question.

It seems that, based on the responses so far, the usefulness of ALLOC is:
1) to free memory periodically
2) to be safe when out of memory

Suppose we have a version of malloc that can also exit gracefully when
out-of-memory, and therefore the only advantage of ALLOC is to free memory
periodically (by invoking the garbage collector).

Suppose in this Ruby/C world we divide the memory into “Ruby memory” and
“C memory”. Ruby memory is defined to be memory consumed by Ruby because
of the user’s Ruby script, and C memory is defined to be memory that is
allocated/deallocated by the C programmer who is writing Ruby/C. To make
the discussion simpler, assume that the total memory used is stable during
the life of the process.

To make things interesting, we assume that the C memory is dynamic, i.e.,
memory is continually being consumed (using ALLOC() or malloc()) and being
released (using free()). Then we have two possibilities regarding the
Ruby memory: either static or dynamic. Ruby memory can become static
because in the user’s Ruby script, once the script is completely parsed,
there are no more new Ruby objects being created. Then,

  1. When Ruby memory is dynamic, we don’t need to use ALLOC to free memory
    periodically, because the garbage collector is already being called
    often by Ruby internals. If we use ALLOC, then the garbage collector is
    called even more often, which may result in execution performance
    degradation.

  2. When Ruby memory is static, there is no real need to invoke the garbage
    collector. If we use ALLOC, then the garbage collecting is only an
    execution waste, because no new memory will actually be released.

So, it seems that we should always use malloc (or its safe
version) instead of ALLOC. What is the flaw in the above
argument? (Sorry, a plain response such as “it is stupid” is not
acceptable.)

If the above argument is accepted, then because Data_Make_Struct calls
ALLOC explicitly, it also follows that Data_Wrap_Struct should be
preferred to Data_Make_Struct.

Regards,

Bill

William Djaja Tjokroaminata wrote:

  1. When Ruby memory is dynamic, we don’t need to use ALLOC to free memory
    periodically, because the garbage collector is already being called
    often by Ruby internals. If we use ALLOC, then the garbage collector is
    called even more often, which may result in execution performance
    degradation.

Suppose your ruby code produces a large amount of garbage, but not
enough to trigger GC, and then your C code takes over for a while and
starts allocating. Unless you call back into ruby code or manually
invoke ALLOC or GC, that amount of garbage is, in effect, deducted from
the space your C code has to play with.

[…]

  1. When Ruby memory is dynamic, we don’t need to use ALLOC to free memory
    periodically, because the garbage collector is already being called
    often by Ruby internals. If we use ALLOC, then the garbage collector is
    called even more often, which may result in execution performance
    degradation.

This is inaccurate. Ruby only knows about the memory allocated via
ALLOC(), not of memory allocated via malloc(). So, Ruby’s heuristic
as to when to start the next GC might be way off, and you might
use more physical memory than necessary. In the extreme case, Ruby
would not start garbage collecting before you run out of swap space.

  2. When Ruby memory is static, there is no real need to invoke the garbage
    collector. If we use ALLOC, then the garbage collecting is only an
    execution waste, because no new memory will actually be released.

Once memory has become static, there will be no more calls to the
garbage collector; hence no CPU overhead from the garbage collector,
either.

So, it seems that we should always use malloc (or its safe
version) instead of ALLOC. What is the flaw in the above
argument? (Sorry, a plain response such as “it is stupid” is not
acceptable.)

A big flaw in your argument is that you assume that you can always take
over the duties of freeing memory yourself. That works if the allocated
memory is referenced only by a single object. If the reference gets
passed around, you’ll essentially find that you’re implementing memory
management yourself again, trying to find out if the malloc()ed memory
is still accessible.

If the above argument is accepted, then because Data_Make_Struct calls
ALLOC explicitly, it also follows that Data_Wrap_Struct should be
preferred to Data_Make_Struct.

Note that Data_Wrap_Struct() can initiate garbage collection as well,
namely in rb_newobj(). It doesn’t really make a difference.

		Reimer Behrends
···

William Djaja Tjokroaminata (billtj@y.glue.umd.edu) wrote:

Now we can start talking about the real numbers. In my Ruby 1.6.7 on
Linux, gc.c defines

#ifndef GC_MALLOC_LIMIT
#if defined(MSDOS) || defined(human68k)
#define GC_MALLOC_LIMIT 200000
#else
#define GC_MALLOC_LIMIT 8000000
#endif
#endif

I don’t know about the units, but it is likely that they are in
bytes. Therefore, if we are not using DOS and human68k(?), the maximum
amount of garbage that we may retain is 8 megabytes. For me, memory is
relatively cheap these days, and as I will likely have only one such
process in a single computer, 8 megabytes (more likely less than this, as
I will have real Ruby objects that I need to keep) is a small price to pay
for the increase in the execution efficiency. Furthermore, we can always
redefine GC_MALLOC_LIMIT to reduce the maximum possible wasted memory.

In fact, in my application I hope that once the Ruby code is parsed, my C
code takes over until the process is finished, so that I will only have “C
speed” and not “Ruby speed” at all. Ruby is used only for system
specifications and configurations. Based on your response alone, it seems
that I am on the right track for my (specific) application.

Regards,

Bill

···

===========================================================================
Joel VanderWerf vjoel@path.berkeley.edu wrote:

William Djaja Tjokroaminata wrote:

  1. When Ruby memory is dynamic, we don’t need to use ALLOC to free memory
    periodically, because the garbage collector is already being called
    often by Ruby internals. If we use ALLOC, then the garbage collector is
    called even more often, which may result in execution performance
    degradation.

Suppose your ruby code produces a large amount of garbage, but not
enough to trigger GC, and then your C code takes over for a while and
starts allocating. Unless you call back into ruby code or manually
invoke ALLOC or GC, that amount of garbage is, in effect, deducted from
the space your C code has to play with.

Hi,

···

In message “Re: Data_Make_Struct Considered Dangerous?” on 02/08/19, William Djaja Tjokroaminata billtj@y.glue.umd.edu writes:

In fact, in my application I hope that once the Ruby code is parsed, my C
code takes over until the process is finished, so that I will only have “C
speed” and not “Ruby speed” at all. Ruby is used only for system
specifications and configurations. Based on your response alone, it seems
that I am on the right track for my (specific) application.

I think so too. If you don’t have memory interaction with the Ruby
interpreter, you don’t have to use ALLOC().

						matz.

This is inaccurate. Ruby only knows about the memory allocated via
ALLOC(), not of memory allocated via malloc(). So, Ruby’s heuristic
as to when to start the next GC might be way off, and you might
use more physical memory than necessary. In the extreme case, Ruby
would not start garbage collecting before you run out of swap space.


Please see my response to Joel. Isn’t it that the maximum possible wasted
memory is determined by GC_MALLOC_LIMIT, which has a default value of 8
megabytes? As an example, suppose I have a total of 128 Mbytes of
memory. My Ruby memory is 16 Mbytes and my C memory is 64
Mbytes. Isn’t it that the total used memory will be at most (16 + 8) + 64 = 88 Mbytes?

A problem occurs if, say, you have an Image class that allocates memory
for the pixel data using malloc():

for i in 1..1000
  img = Image.load(directory + i.to_s + ".png")
end

Assuming that each image occupies, say, 1 MB in memory, the program
allocates roughly 1 GB of memory. Yet as far as Ruby knows, only a few
kilobytes have been requested, so no garbage collection is initiated
and the program starts chewing through all the available swap space,
even though there is plenty of garbage to collect.

That may not be a problem with your specific application, but your
overall argument that ALLOC() can generally be replaced by malloc() does
not work, because memory may not be freed in a timely fashion. It sort
of works if the malloced parts are small relative to object size,
essentially increasing the amount of garbage that can exist in the heap
by a proportional factor.

In a nutshell, use of malloc() instead of ALLOC() can break the garbage
collection heuristic.

[…]


A big flaw in your argument is that you assume that you can always take
over the duties of freeing memory yourself. That works if the allocated
memory is referenced only by a single object. If the reference gets
passed around, you’ll essentially find that you’re implementing memory
management yourself again, trying to find out if the malloc()ed memory
is still accessible.


Oh yes, because I am writing in C, memory management is part of everyday
life; it is not like that in Ruby or Java. About the references to Ruby
objects, aren’t they taken care of by the mark functions?

I am not talking about references to Ruby objects. I am talking about
several Ruby objects sharing the same structure (or part of one) on the
C side. Consider objects with copy-on-write semantics, for instance:
cloning such an object will not create a duplicate of the internal
structure, but the first attempt to modify it will.


Note that Data_Wrap_Struct() can initiate garbage collection as well,
namely in rb_newobj(). It doesn’t really make a difference.


Agreed. The only difference is that we “reduce” the frequency with which the
garbage collector is invoked. With Data_Make_Struct, the allocated memory
is the sum of that requested in ALLOC plus that in rb_newobj(); with
Data_Wrap_Struct, it is only rb_newobj() alone, which I hope is not too
large.

Your original argument was that Data_Make_Struct() is inherently unsafe;
but since Data_Wrap_Struct() can trigger a garbage collection as well,
there should be no difference in kind, just in degree. Efficiency is
another matter (note also that you may be paying the price of wasted
memory for fewer collection cycles).

		Reimer Behrends
···

William Djaja Tjokroaminata (billtj@y.glue.umd.edu) wrote:

Reimer Behrends behrends@cse.msu.edu wrote:

[Loading 1 MB images via malloc.]

The 1 GB memory is a valid memory, isn’t it? So we should not have a
problem with it, unless, of course, our computer does not have that much
of memory. I just doubt your last sentence above, “even though there is
plenty of garbage to collect”. With GC_MALLOC_LIMIT of 8000000 bytes,
isn’t it that the maximum garbage to collect is 8 megabytes?

No, 99.8% of the allocated memory is garbage, since at most 2 out of the
1000 images are accessible at any time. Ruby only counts memory
allocated via ALLOC(). It does not even know about malloc(), so its
internal counter will not be affected by malloc() calls and still be at
a few K at best (for the ruby objects you have).

That’s why the GC heuristic won’t be worth much if you use malloc()
predominantly.

[…]

		Reimer Behrends
···

William Djaja Tjokroaminata (billtj@y.glue.umd.edu) wrote:

Reimer Behrends behrends@cse.msu.edu wrote:

void *
ruby_xmalloc(size)
long size;
{
void *mem;

if (size < 0) {
    rb_raise(rb_eNoMemError, "negative allocation size (or too big)");
}
if (size == 0) size = 1;
malloc_memories += size;

if (malloc_memories > GC_MALLOC_LIMIT) {
    rb_gc();
}

This means the garbage collector will automatically be called after
GC_MALLOC_LIMIT (8MB on sane platforms) bytes have been ALLOCated
(malloc_memories is reset afterwards).
But if the data you ALLOCated points to malloc()'ed data, there will
indeed be a lot of garbage, and Ruby doesn’t know about it.

···

On Wed, Aug 21, 2002 at 03:50:40AM +0900, William Djaja Tjokroaminata wrote:

Reimer Behrends behrends@cse.msu.edu wrote:

A problem occurs if, say, you have an Image class that allocates memory
for the pixel data using malloc():

for i in 1..1000
  img = Image.load(directory + i.to_s + ".png")
end

Assuming that each image occupies, say, 1 MB in memory, the program
allocates roughly 1 GB of memory. Yet as far as Ruby knows, only a few
kilobytes have been requested, so no garbage collection is initiated
and the program starts chewing through all the available swap space,
even though there is plenty of garbage to collect.
=================================================================

The 1 GB memory is a valid memory, isn’t it? So we should not have a
problem with it, unless, of course, our computer does not have that much
of memory. I just doubt your last sentence above, “even though there is
plenty of garbage to collect”. With GC_MALLOC_LIMIT of 8000000 bytes,
isn’t it that the maximum garbage to collect is 8 megabytes?


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

How should I know if it works? That’s what beta testers are for. I
only coded it.
– Attributed to Linus Torvalds, somewhere in a posting

Hi,

I am sorry that I am having a difficulty in following your reasoning. For
clarity, I reposted your original Ruby code:

for i in 1..1000
  img = Image.load(directory + i.to_s + ".png")
end

First, are we assuming that the “Image.load” has been implemented in C? I
will assume yes.

Second, why are at most 2 out of the 1000 images accessible at any
time? What is the intent of the Ruby code above? Does it really have to
load all the images at the same time, or display one at a time
(basically alloc and free each image in sequence)? I will assume the
latter.

Hmmmm, based on these assumptions, I think you are correct. The problem
would be that my free functions will not be called if they are tied to
Data_Wrap_Struct (…, …, free_func, …) and if the Ruby objects
themselves are indeed small. Very interesting. Thanks for the
counterexample. Fortunately, I don’t have this kind of peculiarity in my
application. I will have to note this particular example. Again, thanks.

Regards,

Bill

···

=========================================================================
Reimer Behrends behrends@cse.msu.edu wrote:

William Djaja Tjokroaminata (billtj@y.glue.umd.edu) wrote:

Reimer Behrends behrends@cse.msu.edu wrote:
[Loading 1 MB images via malloc.]
The 1 GB memory is a valid memory, isn’t it? So we should not have a
problem with it, unless, of course, our computer does not have that much
of memory. I just doubt your last sentence above, “even though there is
plenty of garbage to collect”. With GC_MALLOC_LIMIT of 8000000 bytes,
isn’t it that the maximum garbage to collect is 8 megabytes?

No, 99.8% of the allocated memory is garbage, since at most 2 out of the
1000 images are accessible at any time. Ruby only counts memory
allocated via ALLOC(). It does not even know about malloc(), so its
internal counter will not be affected by malloc() calls and still be at
a few K at best (for the ruby objects you have).

That’s why the GC heuristic won’t be worth much if you use malloc()
predominantly.

[…]

  	Reimer Behrends

BTW, I’ve been reading gc.c and the only heuristic seems to be the
GC_MALLOC_LIMIT thing. This constant is #define’d, but the cost of
garbage collection per recovered byte could be controlled if
GC_MALLOC_LIMIT were dynamically adjusted.

If you’re using a lot of memory (say, in chunks of 10MB), the GC will
get called lots of times, which could be a severe speed hit. It would
make sense for GC_MALLOC_LIMIT to grow as the total used memory grows
(the rationale being that if you’re using 1GB RAM you don’t really mind
using 10MB more or less, but you don’t want the performance to plummet).
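The growth policy suggested here can be sketched in a few lines of C. The function name and the 50% ratio are made up for illustration; gc.c itself uses a fixed GC_MALLOC_LIMIT:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a GC trigger threshold that scales with the amount of
 * live data, rather than staying at the fixed 8 MB default.
 * gc_limit() and the 50%-overhead ratio are hypothetical. */
static size_t gc_limit(size_t live_bytes) {
    size_t base = 8UL * 1024 * 1024;   /* the classic 8 MB default */
    size_t scaled = live_bytes / 2;    /* allow ~50% garbage overhead */
    return scaled > base ? scaled : base;
}
```

With such a rule, a process holding 1 GB of live data would only collect after another ~500 MB of allocation, rather than every 8 MB, trading a bounded amount of extra garbage for far fewer collection runs.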

···

On Wed, Aug 21, 2002 at 05:50:56AM +0900, Reimer Behrends wrote:

That’s why the GC heuristic won’t be worth much if you use malloc()
predominantly.


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Linux: Where Don’t We Want To Go Today?
– Submitted by Pancrazio De Mauro, paraphrasing some well-known sales talk

Thanks for your confirmation that “Ruby gc heuristic” is in practice
really only a simple invocation of gc when GC_MALLOC_LIMIT has been
ALLOCated.

I have been thinking about this problem last night, especially the good
(counter) example given by Reimer. Because of my philosophical “Ruby-C
separation” (let Ruby and C manage their own memories) for execution
performance reason, I came up with this solution. In my malloc wrapper
function, I will also do simple counting on the memory allocated on the
C side so far:

safe_malloc (int size)
{
    total += size;
    if (total > threshold)
    {
        rb_gc ();
        threshold += GC_MALLOC_LIMIT;
    }
    if (ptr = malloc (size))
        ....
    else
        .....

safe_free (void *ptr)
{
    total -= .....
    free (....)

I think with this malloc wrapper function, the total garbage can be
limited to 2 * GC_MALLOC_LIMIT = 16 Mbytes. Also, when the C memory
fluctuates, say, between 7 and 9 Mbytes, the Ruby gc will not be called
unnecessarily. This solution will solve the problem, won’t it? Probably
there are other counter-examples that will defeat this solution?
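A complete, self-contained version of the wrapper sketched above might look as follows. The rb_gc() call is stubbed out as a counter so the sketch can stand alone; in a real extension it would be Ruby's rb_gc(), and a real safe_free would recover the size from a header rather than take it as a parameter:

```c
#include <assert.h>
#include <stdlib.h>

#define GC_MALLOC_LIMIT 8000000L

static long total;                      /* bytes currently live on the C side */
static long threshold = GC_MALLOC_LIMIT;
static int  gc_runs;                    /* stands in for rb_gc() calls here */

static void trigger_gc(void) { gc_runs++; }  /* real code: rb_gc(); */

/* malloc that nudges the collector once the C side has grown by
 * another GC_MALLOC_LIMIT bytes since the last nudge. */
static void *safe_malloc(size_t size) {
    void *ptr;
    total += (long)size;
    if (total > threshold) {
        trigger_gc();
        threshold += GC_MALLOC_LIMIT;
    }
    ptr = malloc(size);
    return ptr;   /* real code would raise NoMemError on NULL */
}

/* free that keeps the counter honest (size passed in for the sketch). */
static void safe_free(void *ptr, size_t size) {
    total -= (long)size;
    free(ptr);
}
```

Note that, as Mauricio observes below, threshold only ever grows in this sketch, so it behaves like a dynamically raised GC_MALLOC_LIMIT.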

My particular concern with rb_gc() is that it is linear (O(N)) in the
number of Ruby objects. Therefore my philosophy is not to invoke
rb_gc() unnecessarily.

The idea of dynamic GC_MALLOC_LIMIT is very interesting. I don’t know
whether there has been some research on the algorithms for mark-and-sweep
gc. For now, I think using a reasonable constant GC_MALLOC_LIMIT value is
adequate.

Regards,

Bill

···

============================================================================
Mauricio Fern?ndez batsman.geo@yahoo.com wrote:

Reimer Behrends behrends@cse.msu.edu wrote:

A problem occurs if, say, you have an Image class that allocates memory
for the pixel data using malloc():

for i in 1..1000
  img = Image.load(directory + i.to_s + ".png")
end

Assuming that each image occupies, say, 1 MB in memory, the program
allocates roughly 1 GB of memory. Yet as far as Ruby knows, only a few
kilobytes have been requested, so no garbage collection is initiated
and the program starts chewing through all the available swap space,
even though there is plenty of garbage to collect.
=================================================================

This means the garbage collector will automatically be called after
GC_MALLOC_LIMIT (8MB on sane platforms) bytes have been ALLOCated
(malloc_memories is reset afterwards).
But if the data you ALLOCated points to malloc()'ed data, there will
indeed be a lot of garbage, and Ruby doesn’t know about it.

I have been thinking about this problem last night, especially the good
(counter) example given by Reimer. Because of my philosophical “Ruby-C
separation” (let Ruby and C manage their own memories) for execution
performance reason, I came up with this solution. In my malloc wrapper
function, I will also do simple counting on the memory allocated on the
C side so far:

safe_malloc (int size)
{
    total += size;
    if (total > threshold)
    {
        rb_gc ();
        threshold += GC_MALLOC_LIMIT;
    }
    if (ptr = malloc (size))
        ....
    else
        .....

safe_free (void *ptr)
{
    total -= .....
    free (....)

I think with this malloc wrapper function, the total garbage can be
limited to 2 * GC_MALLOC_LIMIT = 16 Mbytes.

Even without safe_malloc, you may have more than 8MB garbage at a given
time, if you have allocated n MB and they do suddenly become garbage.

Anyway, using your solution, the total garbage will be < 16MB
if your objects are created and destroyed more or less sequentially (i.e.
there aren’t a lot of them alive at the same time), which I think should
happen in the network simulation you’re doing.

The code above seems to indicate that the threshold will grow when
needed, but then again this is OK IMO as that’s what I think Ruby should
be doing :) You’re in fact implementing the dynamic GC_MALLOC_LIMIT
thing…

Example:

  • do some safe_malloc() until total > (GC_MALLOC_LIMIT = 8MB);
    rb_gc is called, but imagine there’s no garbage yet
  • then threshold is increased and is now 16 MB
  • allocate with safe_malloc, say, 5MB more (13MB+ are allocated)
  • safe_free() all the data
  • allocate with safe_malloc(); rb_gc won’t be called until you’ve
    total > 16MB this time
  • those total bytes may become garbage at a later time

Also, when the C memory fluctuates, say, between 7 and 9 Mbytes, the
Ruby gc will not be called unnecessarily. This solution will solve the
problem, won’t it? Probably there are other counter-examples that will
defeat this solution?

I think it should do the trick.

The idea of dynamic GC_MALLOC_LIMIT is very interesting. I don’t know
whether there has been some research on the algorithms for mark-and-sweep
gc. For now, I think using a reasonable constant GC_MALLOC_LIMIT value is
adequate.

Suppose the amount of reachable data is R, and H is the total heap size
(ie, R + garbage). In one run of the GC, you can free up to (H - R),
that is, the garbage at that point.

The cost of garbage collection involves 2 things:

  • marking: O(R) => more or less O(number of live objects)
  • sweeping: O(H)

We can thus write it as

    c * R + d * H

The amount of garbage thrown away is H - R; thus, the cost per amount of
garbage reclaimed is

    c * R + d * H
    -------------
        H - R

···

On Wed, Aug 21, 2002 at 10:52:10PM +0900, William Djaja Tjokroaminata wrote:

The cost is fairly high if H is close to R (this is the case when the
gc is triggered when there’s little garbage compared to the total data).

To lower it, given some R, we’ll want H to be greater (and so will be
H-R), that is to have more garbage collected per run.

This is why IMO the gc should be less and less frequent (where “time”
corresponds to ALLOCation) as the heap size grows.

You can also change the current mem/speed tradeoff by setting
GC_MALLOC_LIMIT to a higher value (this way the amortized cost of gc
will be lower but there will be more garbage and each run will be
slower).
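The cost-per-reclaimed-unit formula above can be made concrete with a toy calculation, using the book's example constants c = 10 and d = 3 (the function name is invented for illustration):

```c
#include <assert.h>

/* Cost of one GC run per unit of garbage reclaimed:
 * marking costs c*R (live data), sweeping costs d*H (whole heap),
 * and the run recovers H - R units of garbage. */
static double gc_cost_per_unit(double c, double d, double R, double H) {
    return (c * R + d * H) / (H - R);
}
```

For R = 100 and H = 200 the cost is (1000 + 600) / 100 = 16 per unit, but for H = 110 (collecting when there is little garbage) it jumps to (1000 + 330) / 10 = 133, which is why triggering the GC when H is close to R is so expensive.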


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

<|ryan|> I don’t use deb
u poor man
netgod: heh
apt-get install task-p0rn

The sweep must check all the objects to find which ones haven’t been
marked. H is the set of all objects (both alive and dead == garbage),
whereas the objects in R are only those that are live.
Thus one part of the cost is proportional to the number of live objects
(marking) and the other to the total number of objects (sweeping). It
seems I didn’t state it clearly and you took R for H. The way Ruby is
implemented, the memory in use by the process (heap) is that of the total
number of objects, but not more, ie, there isn’t an arena of free memory.
(see below for some notes on Ruby’s GC implementation)

d could be made smaller than c (which involves recursion or pointer
reversal) if you didn’t care about memory fragmentation (it would only
involve updating a linked list), but as Ruby uses malloc instead of
managing its own arena, freeing the memory will probably be costlier
on average. In fact, in the book I mostly took this from, they gave
c = 10 instructions and d = 3 for “some computer”, but without taking
care of memory fragmentation. (R and H given in words.)
Anyway, it isn’t very fair to count the cost of free() against the GC,
as you’d have to pay it anyway if you stuck with malloc()/free().

This is another area where being able to define Ruby’s behavior would be
great for some tasks, going even further in the mem/speed tradeoff than
malloc…

Ruby’s GC

···

On Thu, Aug 22, 2002 at 05:33:34AM +0900, William Djaja Tjokroaminata wrote:

The idea of amortized cost of garbage collection is very interesting, but
probably our metric of interest for execution speed is only the total cost
itself:

 cost = c * R + d * H    = O(R)

(One question: why is sweeping O(H) and not O(R)? Is the assumption that
checking the mark bit is much cheaper than freeing the memory itself?)

=========
(I’ve spent some time reading gc.c to write this)

Ruby’s heap is in fact an array (named heaps) of RVALUEs (well, actually
an array of RVALUE arrays :) on which all the GC is performed. There’s
a list of free positions in heaps, and it grows as needed exponentially
(increment *= 1.8 each time it’s full).
Each RVALUE takes about 20 bytes.

The RVALUEs point to the “real” data of the object, which is allocated
with malloc and free()'d when done. (talking about objects made with
Data_*_Struct)

There are thus two arenas. The heaps array is taken care of by Ruby: it
can hold more slots than there are objects, and there’s a list of free
RVALUEs. The objects’ real data lives in what would normally be called
the heap: it is allocated with malloc only as much as needed to hold it,
there is no free memory held in reserve there, and it must be allocated
with malloc each time there’s a new object.

RVALUE* heaps

heaps[1]                points to
 1  RVALUE              --> data for heaps[1][0]
 2  RVALUE              --> data for heaps[1][1]
 3  RVALUE              --> this one is free, so there's no memory associated
    ...
heaps[2]
 1  RVALUE              --> data for heaps[2][0]
 2  RVALUE              --> FREE
 3  RVALUE              --> FREE

Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

/*
 * Buddy system. Hairy. You really aren't expected to understand this
 */
– From /usr/src/linux/mm/page_alloc.c

Thanks a lot, Mauricio, for the comprehensive treatment of the
subject. It is very good to know what is going on in the background and
it can be used to influence the design decision, especially when execution
performance is the objective. It just surprised me that at the most basic
level Ruby calls malloc() and free() directly instead of managing its own
memory arena.

Regards,

Bill

···

=========================================================================
Mauricio Fern?ndez batsman.geo@yahoo.com wrote:

(deleted)

d could be made smaller than c (which involves recursion or pointer
reversal) if you didn’t care about memory fragmentation (it would only
involve updating a linked list), but as Ruby uses malloc instead of
managing its own arena, freeing the memory will probably be costlier
on average. In fact, in the book I mostly took this from, they gave
c = 10 instructions and d = 3 for “some computer”, but without taking
care of memory fragmentation. (R and H given in words.)
Anyway, it isn’t very fair to count the cost of free() against the GC,
as you’d have to pay it anyway if you stuck with malloc()/free().

This is another area where being able to define Ruby’s behavior would be
great for some tasks, going even further in the mem/speed tradeoff than
malloc…

(deleted)