On Wed, Aug 21, 2002 at 10:52:10PM +0900, William Djaja Tjokroaminata wrote:

I was thinking about this problem last night, especially the good
counterexample given by Reimer. Because of my philosophical "Ruby-C
separation" (let Ruby and C manage their own memory), adopted for
execution performance reasons, I came up with this solution. In my malloc
wrapper function, I also do simple accounting of the memory allocated so
far on the C side:
#include <stdlib.h>
#include "ruby.h"                 /* for rb_gc() */
#define GC_MALLOC_LIMIT 8000000   /* 8 MB, matching gc.c's default */

static size_t total;                       /* bytes currently live on the C side */
static size_t threshold = GC_MALLOC_LIMIT; /* run the Ruby gc past this point */

void *safe_malloc (size_t size)
{
  size_t *ptr;
  total += size;
  if (total > threshold)
    {
      rb_gc ();
      threshold += GC_MALLOC_LIMIT;
    }
  /* keep the size in a header word so safe_free can subtract it later */
  if ((ptr = malloc (size + sizeof (size_t))) != NULL)
    {
      *ptr = size;
      return ptr + 1;
    }
  return NULL;    /* or raise NoMemoryError */
}

void safe_free (void *ptr)
{
  size_t *real = (size_t *) ptr - 1;
  total -= *real;
  free (real);
}
I think with this malloc wrapper function, the total garbage can be
limited to 2 * GC_MALLOC_LIMIT = 16 MB.
Even without safe_malloc you may have more than 8 MB of garbage at a given
time, if you have allocated n MB and it all suddenly becomes garbage.
Anyway, using your solution the total garbage will be < 16 MB
if your objects are created and destroyed more or less sequentially (i.e.
there aren't a lot of them alive at the same time), which I think should
be the case in the network simulation you're doing.
The code above seems to indicate that the threshold will grow when
needed; then again this is OK IMO, as that's what I think Ruby itself
should be doing. You're in fact implementing the dynamic GC_MALLOC_LIMIT
thing…
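For what it's worth, here is a minimal sketch of what the fully dynamic
version could look like; this is my own guess at the idea, not anything
Ruby's gc.c does. It relies on rb_gc() running the free() hooks of dead
Data objects before returning (if I read gc.c right), so right after it
returns `total' approximates the live C data:

#include <stddef.h>
#include "ruby.h"                 /* for rb_gc(), as before */

#define GC_MALLOC_LIMIT 8000000

static size_t total;              /* as in the wrapper above */
static size_t threshold = GC_MALLOC_LIMIT;

/* alternative trigger logic: call in place of the `if (total > threshold)'
   block in safe_malloc */
static void maybe_gc_dynamic (void)
{
  if (total > threshold)
    {
      rb_gc ();
      /* the gc has just freed dead objects through safe_free, so `total'
         now approximates the live C data; rebase the threshold on that
         instead of bumping it by a fixed step */
      threshold = total + GC_MALLOC_LIMIT;
    }
}

That way the gc fires after GC_MALLOC_LIMIT bytes of new allocation beyond
what actually survived, rather than on an ever-growing absolute figure.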
Example (a runnable trace follows the list):
- do some safe_malloc() until total > (GC_MALLOC_LIMIT = 8 MB);
  rb_gc is called, but imagine there's no garbage yet
- threshold is then increased and is now 16 MB
- allocate, say, 5 MB more with safe_malloc() (13+ MB are now allocated)
- safe_free() all the data
- allocate with safe_malloc(); this time rb_gc won't be called until
  total > 16 MB
- all those bytes may become garbage at a later time
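A minimal standalone sketch tracing that sequence, with only the
accounting kept and malloc/rb_gc stubbed out (the 9 MB / 5 MB figures are
made up for the demonstration):

#include <stdio.h>
#include <stddef.h>

#define GC_MALLOC_LIMIT 8000000

static size_t total, threshold = GC_MALLOC_LIMIT;

static void alloc_bytes (size_t n)     /* the counting part of safe_malloc */
{
  total += n;
  if (total > threshold)
    {
      printf ("rb_gc() fires at total = %lu\n", (unsigned long) total);
      threshold += GC_MALLOC_LIMIT;
    }
}

static void free_bytes (size_t n) { total -= n; }   /* ditto for safe_free */

int main (void)
{
  alloc_bytes (9000000);    /* past 8 MB: gc fires, threshold -> 16 MB */
  alloc_bytes (5000000);    /* ~14 MB allocated, no gc */
  free_bytes (14000000);    /* safe_free() all the data */
  alloc_bytes (9000000);    /* 9 MB < 16 MB: still no gc */
  printf ("total = %lu, threshold = %lu\n",
          (unsigned long) total, (unsigned long) threshold);
  return 0;
}

The last call also shows the fluctuation point below: once the threshold
has grown to 16 MB, C memory bouncing between 7 and 9 MB never re-triggers
the gc.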
Also, when the C memory fluctuates, say, between 7 and 9 MB, the
Ruby gc will not be called unnecessarily. This solution will solve the
problem, won't it? Or are there other counterexamples that would defeat
it?
I think it should do the trick.
The idea of dynamic GC_MALLOC_LIMIT is very interesting. I don’t know
whether there has been some research on the algorithms for mark-and-sweep
gc. For now, I think using a reasonable constant GC_MALLOC_LIMIT value is
adequate.
Suppose the amount of reachable data is R, and H is the total heap size
(i.e., R + garbage). In one run of the GC you can free up to (H - R),
that is, the garbage present at that point.
The cost of garbage collection involves 2 things:
- marking: O(R) => more or less O(number of objects)
- sweeping: O(H)
We can thus write the total cost as

  c * R + d * H

for some constants c and d.
The amount of garbage thrown away is H - R, thus, the cost per amount of
garbage reclaimed is
  c * R + d * H
  -------------
      H - R
The cost is fairly high when H is close to R (this is the case when the
gc is triggered while there's little garbage compared to the total data).
To lower it, for a given R we want H to be larger (and so will be
H - R), that is, to have more garbage collected per run.
This is why IMO the gc should run less and less frequently (where "time"
corresponds to ALLOCation) as the heap size grows.
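To put rough numbers on that (c = d = 1 and R = 7 MB are arbitrary
choices, just to show the shape of the curve):

#include <stdio.h>

int main (void)
{
  double c = 1.0, d = 1.0;   /* per-MB marking and sweeping costs */
  double R = 7.0;            /* MB of reachable data */
  double H;

  /* amortized cost per MB of garbage reclaimed: (c*R + d*H) / (H - R) */
  for (H = 8.0; H <= 32.0; H *= 2)
    printf ("H = %4.0f MB  ->  cost/MB of garbage = %5.2f\n",
            H, (c * R + d * H) / (H - R));
  return 0;
}

The cost per reclaimed MB drops from 15 at H = 8 to about 1.6 at H = 32,
which is the whole argument for letting collections become rarer as the
heap grows.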
You can also change the current mem/speed tradeoff by setting
GC_MALLOC_LIMIT to a higher value (this way the amortized cost of gc
will be lower but there will be more garbage and each run will be
slower).
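For reference, in the Ruby sources of this era the limit is just a
compile-time constant in gc.c (quoting from memory, so verify against
your tree), which means changing it requires rebuilding the interpreter:

/* gc.c: bytes ALLOC'd between collections before rb_gc is forced */
#define GC_MALLOC_LIMIT 8000000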
--
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com