More on memory and possible leaks ... (longish)

Hi ..

This is getting tough to track down!

The basics are that my script is multi-threaded (potentially 20-30 threads),
running remote data collection using telnet, for up to 300 nodes. Some state
information (basic info about the remote node) is stored using hashes within
the class though all of the "real" data is spooled to disk for later
processing.

Here are some facts, not necessarily all directly related to the problems I am
seeing:

1. I am using -lots- of memory under Solaris, without any exceptions being
thrown (approx 200Mb per-thread reported for the full application). When I
trim everything down to do just login with no state saving, I am using about
12Mb per thread. This seems very high, though I am not really sure whether
that amount of usage is normal. Perhaps it is in keeping with Ruby's thread
model.

2. When I use a fork() model, the amount of memory used per-process is around
20Mb. Again, this seems to be quite heavy, however, memory usage doubles to
40Mb when I capture data in the hashes! Now, I am -not- gathering a lot of
data here:

      ### process card data
      if line =~ /^\s+([0-9]+)\s+(.*)$/ then
        @cards[$1] = $2
      end

as an example. There are at a maximum 14 entries like this. 20Mb+ seems a
lot of memory to use!

3. I don't have, yet, a copy of Purify to check for leaks under Solaris, so,
I am using valgrind under FreeBSD. valgrind is crashing a lot due to the
threads and sockets, but that is another issue :-(

4. Before valgrind dies, it is reporting the following (many of these errors,
I am just picking out the unique ones):

11:55 (kant)$ vgm --run-libc-freeres=no ruby test.rb
==26384== Memcheck, a memory error detector for x86-linux.
==26384== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward.
==26384== Using valgrind-2.1.2.CVS, a program supervision framework for x86-linux.
==26384== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward.
==26384== For more details, rerun with: -v
==26384==
==26384== Conditional jump or move depends on uninitialised value(s)
==26384== at 0x806EB35: is_pointer_to_heap (gc.c:605)
==26384== by 0x806EAFB: mark_locations_array (gc.c:623)
==26384== by 0x806FD2A: garbage_collect (gc.c:1352)
==26384== by 0x806E7BF: rb_newobj (gc.c:381)
==26384==
==26384== Conditional jump or move depends on uninitialised value(s)
==26384== at 0x806EB7D: is_pointer_to_heap (gc.c:610)
==26384== by 0x806EAFB: mark_locations_array (gc.c:623)
==26384== by 0x806FD2A: garbage_collect (gc.c:1352)
==26384== by 0x806E7BF: rb_newobj (gc.c:381)
==26384==
==26384== Conditional jump or move depends on uninitialised value(s)
==26384== at 0x806EDBE: rb_special_const_p (ruby.h:666)
==26384== by 0x806ED12: gc_mark (gc.c:712)
==26384== by 0x806EB11: mark_locations_array (gc.c:624)
==26384== by 0x806EBFD: rb_gc_mark_locations (gc.c:637)
==26384==
==26384== Use of uninitialised value of size 4
==26384== at 0x806ED22: gc_mark (gc.c:713)
==26384== by 0x806EB11: mark_locations_array (gc.c:624)
==26384== by 0x806EBFD: rb_gc_mark_locations (gc.c:637)
==26384== by 0x806FD3E: garbage_collect (gc.c:1354)
==26384==
==26384== Use of uninitialised value of size 4
==26384== at 0x806ED2F: gc_mark (gc.c:714)
==26384== by 0x806EB11: mark_locations_array (gc.c:624)
==26384== by 0x806EBFD: rb_gc_mark_locations (gc.c:637)
==26384== by 0x806FD3E: garbage_collect (gc.c:1354)
==26384==
==26384== Use of uninitialised value of size 4
==26384== at 0x806ED40: gc_mark (gc.c:715)
==26384== by 0x806EB11: mark_locations_array (gc.c:624)
==26384== by 0x806EBFD: rb_gc_mark_locations (gc.c:637)
==26384== by 0x806FD3E: garbage_collect (gc.c:1354)
==26384==
==26384== Conditional jump or move depends on uninitialised value(s)
==26384== at 0x806EE4F: gc_mark_children (gc.c:756)
==26384== by 0x806EDA9: gc_mark (gc.c:729)
==26384== by 0x806EB11: mark_locations_array (gc.c:624)
==26384== by 0x806EBFD: rb_gc_mark_locations (gc.c:637)
==26384==

These errors appear to point at the gc reading uninitialised memory, which
may or may not be an issue. Perhaps someone with knowledge of that part of
ruby would care to comment.

5. There doesn't appear to be an easy way of tracking memory usage /
allocations within ruby. Are there any plans on adding something that might
help? Or is it all too system dependent?
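
One rough way to watch allocations from inside Ruby is to count live objects with ObjectSpace, which is part of the core library. The sketch below is only an illustration: `object_census` and `census_growth` are made-up helper names, the list of tracked classes is an assumption, and the array of strings merely stands in for the state a collector might keep.

```ruby
# Classes to watch. This list is an assumption; extend it with your own classes.
TRACKED_CLASSES = [String, Array, Hash]

# Count live instances of each tracked class.
# ObjectSpace.each_object returns the number of objects it visited.
def object_census(classes = TRACKED_CLASSES)
  counts = {}
  classes.each { |klass| counts[klass] = ObjectSpace.each_object(klass) { } }
  counts
end

# Report the classes whose live-object count grew between two censuses.
def census_growth(before, after)
  growth = {}
  after.each do |klass, count|
    delta = count - before.fetch(klass, 0)
    growth[klass] = delta if delta > 0
  end
  growth
end

GC.start                      # start from a collected heap
before = object_census
retained = Array.new(1000) { |i| "card #{i}" }  # stand-in for collected state
GC.start
after = object_census

census_growth(before, after).each do |klass, delta|
  puts "#{klass}: +#{delta}"
end
```

This counts objects, not bytes, so it can show which classes keep growing but not how much memory the process holds at the operating-system level.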

So, the bottom line is that I haven't yet determined whether there are any
memory leaks happening. However, I do seem to be using a lot of memory that,
maybe, shouldn't be used. My problem happens when I multiply these small
numbers, like 20Mb, by 300. Then I start to run into real problems.

Any thoughts, fellow rubyists?

Regards,

--
-mark. (probertm at acm dot org)

Software Verification will be providing a Memory inspection tool for
Ruby. It will initially be available on Windows. Initial research work
has been completed. The software tool will show allocation sites,
references, garbage collections, loiterers, cycles etc - all the stuff
you need to identify problem areas in Ruby code. Implementation will be
later this year. I can't give you a date at this time because I don't
know one.

        http://www.softwareverify.com

Stephen

In message <200502141233.34120.probertm@acm.org>, Mark Probert <probertm@acm.org> writes

5. There doesn't appear to be an easy way of tracking memory usage /
allocations within ruby. Are there any plans on adding something that might
help? Or is it all too system dependent?

--
Stephen Kellett
Object Media Limited http://www.objmedia.demon.co.uk
RSI Information: http://www.objmedia.demon.co.uk/rsi.html

Mark Probert <probertm@acm.org> writes:

This is getting tough to track down!

The basics are that my script is multi-threaded (potentially 20-30 threads),
running remote data collection using telnet, for up to 300 nodes. Some state
information (basic info about the remote node) is stored using hashes within
the class though all of the "real" data is spooled to disk for later
processing.

Here are some facts, not necessarily all directly related to the problems I am
seeing:

1. I am using -lots- of memory under Solaris, without any exceptions being
thrown (approx 200Mb per-thread reported for the full application).
When I trim everything down to do just login with no state saving, I
am using about 12Mb per thread. This seems very high, though
I am not really sure whether that amount of usage is normal.
Perhaps it is in keeping with Ruby's thread model.

I once had a server program that grew continuously to the point of
memory exhaustion until I wrote my own tempfile.rb. Somehow
tempfile's use of delegate.rb caused the script to use gobs of memory.
I never tracked that down, but at the time I remember something about
some internal Ruby data structures never being collected (symbols come
to mind).

2. When I use a fork() model, the amount of memory used per-process is around
20Mb. Again, this seems to be quite heavy, however, memory usage doubles to
40Mb when I capture data in the hashes! Now, I am -not- gathering a lot of
data here:

      ### process card data
      if line =~ /^\s+([0-9]+)\s+(.*)$/ then
        @cards[$1] = $2
      end

as an example. There are at a maximum 14 entries like this. 20Mb+ seems a
lot of memory to use!

Very odd. Can you isolate this expensive use of memory into a stand
alone script?

3. I don't have, yet, a copy of Purify to check for leaks under Solaris, so,
I am using valgrind under FreeBSD. valgrind is crashing a lot due
to the threads and sockets, but that is another issue :-(

[...]

4. Before valgrind dies, it is reporting the following (many of these errors,
I am just picking out the unique ones):

[...]

Purify/Valgrind is not so useful for Ruby, since Ruby's GC confuses them.

5. There doesn't appear to be an easy way of tracking memory usage /
allocations within ruby. Are there any plans on adding something that might
help? Or is it all too system dependent?

[...]

Telling more about your ruby version, and isolating the problem down
to the smallest script that exhibits the problem are both useful.
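
As a starting point for such a stand-alone script, something along these lines might do. It is a sketch under assumptions: the sample lines are invented (the real node output is not shown in this thread), `parse_cards` and `rss_kb` are hypothetical helper names, and the resident-size figure relies on `ps -o rss=` being available on the system.

```ruby
# Synthetic stand-in for the telnet output: 14 card lines, invented for illustration.
SAMPLE_OUTPUT = (1..14).map { |slot| format("   %d   card-type-%02d", slot, slot) }

# The pattern from the original post, wrapped in a hypothetical helper.
def parse_cards(lines)
  cards = {}
  lines.each do |line|
    cards[$1] = $2 if line =~ /^\s+([0-9]+)\s+(.*)$/
  end
  cards
end

# Resident set size in kB as reported by ps(1); nil where that is unavailable.
def rss_kb
  out = `ps -o rss= -p #{Process.pid} 2>/dev/null`
  out.strip.empty? ? nil : out.to_i
rescue SystemCallError
  nil
end

before = rss_kb
cards = nil
1000.times { cards = parse_cards(SAMPLE_OUTPUT) }  # repeat to amplify any growth
after = rss_kb

puts "entries: #{cards.size}"
puts "rss before/after (kB): #{before.inspect} / #{after.inspect}"
```

If the resident size stays flat here but grows in the real application, the hash itself is probably not the culprit and the next thing to add to the script would be the telnet handling.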

--
matt

Mark Probert <probertm@acm.org> writes:

1. I am using -lots- of memory under Solaris, without any exceptions being
thrown (approx 200Mb per-thread reported for the full application). When I
trim everything down to do just login with no state saving, I am using about
12Mb per thread. This seems very high, though I am not really sure whether
that amount of usage is normal. Perhaps it is in keeping with Ruby's thread
model.

    Do you use explicit "GC.start"s in your code? I've had a similar memory
    consumption problem in the past with telnet as well, and memory was freed
    when I put GC.start in the data collecting loop. However, I didn't use
    threads, so my advice may be worthless for you.
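
A minimal sketch of that idea, assuming a simple sequential loop: `collect_all`, the node names, and the block body are all hypothetical stand-ins for the real telnet handling, and `GC.count` is only used to show that collections happened (it is not available on 1.8.x).

```ruby
# Run the given block for every node and force a collection every
# `gc_every` nodes. The block stands in for the per-node telnet session.
def collect_all(nodes, gc_every = 10)
  results = {}
  nodes.each_with_index do |node, i|
    results[node] = yield(node)
    GC.start if (i + 1) % gc_every == 0
  end
  results
end

nodes = (1..30).map { |n| "node-#{n}" }
runs_before = GC.count
data = collect_all(nodes) { |node| "output from #{node}" * 100 }
puts "collected #{data.size} nodes, GC ran #{GC.count - runs_before} times"
```

Forcing a collection costs time on every call, so the interval is a trade-off; whether it helps at all with threads is exactly the open question above.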

- Ville

Hi ..

Mark Probert <probertm@acm.org> writes:

I once had a server program that grew continuously to the point of
memory exhaustion until I wrote my own tempfile.rb. Somehow
tempfile's use of delegate.rb caused the script to use gobs of memory.
I never tracked that down, but at the time I remember something about
some internal Ruby data structures never being collected (symbols come
to mind).

Ok, thanks.

> 2. When I use a fork() model, the amount of memory used per-process is
> around 20Mb. Again, this seems to be quite heavy, however, memory usage
> doubles to 40Mb when I capture data in the hashes!

Very odd. Can you isolate this expensive use of memory into a stand
alone script?

Not really. The script operates in a fairly complex environment. Its job is
to telnet to network elements, login, check to see if it is okay to run by
running a few commands and examining their output, then gather some other
basic data from the nodes. I don't seem to be able to replicate it that
easily by using standard socket or telnet. I'll try and build a simulator in
the next few days and see if that will help replication.
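
For what it is worth, a simulator does not have to be elaborate. The sketch below serves a few canned lines over a loopback TCP socket and runs the card-parsing pattern from the original post against them. Everything specific in it is an assumption: the canned lines are invented, `start_fake_node` and `collect_cards` are hypothetical names, and a plain TCPSocket is used in place of the telnet library.

```ruby
require 'socket'

# Canned output for one fake node, invented for illustration.
CANNED_SESSION = [
  "login: ok",
  "   1   controller",
  "   2   line-card",
  "done"
]

# Serve the given lines to a single client on the loopback interface.
# Port 0 asks the OS for any free port. Returns [server, thread].
def start_fake_node(lines)
  server = TCPServer.new('127.0.0.1', 0)
  thread = Thread.new do
    client = server.accept
    lines.each { |l| client.puts(l) }
    client.close
  end
  [server, thread]
end

# Connect to the fake node and apply the card pattern to every line received.
def collect_cards(port)
  cards = {}
  TCPSocket.open('127.0.0.1', port) do |sock|
    while (line = sock.gets)
      cards[$1] = $2 if line =~ /^\s+([0-9]+)\s+(.*)$/
    end
  end
  cards
end

server, thread = start_fake_node(CANNED_SESSION)
cards = collect_cards(server.addr[1])
thread.join
server.close
puts cards.inspect
```

Starting many of these fake nodes and pointing the real collector at them would be one way to see whether the memory growth follows the number of sessions or the amount of data per session.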

Purify/Valgrind is not so useful for Ruby, since Ruby's GC confuses them.

Ok. Do you know of a way that memory can be tracked? Surely there is
something that works for a GC language like Smalltalk or C# that could be
adapted for Ruby?

Telling more about your ruby version, and isolating the problem down
to the smallest script that exhibits the problem are both useful.

The ruby version is 1.8.x, and, unfortunately, I have the smallest possible
script running. The BSN class is not much use to anybody who doesn't have
access to a whole bunch of the network elements in question, though.

On Monday 14 February 2005 15:58, Matt Armstrong wrote:

--
-mark. (probertm at acm dot org)