[ANN] mwrap - LD_PRELOAD malloc wrapper + line stats for Ruby

mwrap is designed to answer the question:

   Which lines of Ruby are hitting malloc the most?

mwrap wraps all malloc, calloc, and realloc calls to trace the Ruby
source location of such calls and bytes allocated at each callsite.
This functionality may be expanded in the future.

It does not track allocation lifetimes, or frees, however. It works
best for allocations under GVL, but tries to track numeric caller
addresses for allocations made without GVL so you can get an idea of how
much memory usage certain extensions and native libraries use.

It requires the concurrent lock-free hash table from the
Userspace RCU project: https://liburcu.org/

It does not require recompiling or rebuilding Ruby, but only supports
Ruby trunk (2.6.0dev+) on a few platforms:

* GNU/Linux
* FreeBSD (tested 11.1)

It may work on NetBSD, OpenBSD and DragonFly BSD.

== Install

  # FreeBSD: pkg install liburcu

  # Debian-based systems: apt-get liburcu-dev

  # Install mwrap via RubyGems.org
  gem install mwrap

== Usage

mwrap works as an LD_PRELOAD and supplies a mwrap RubyGem executable to
improve ease-of-use. You can set dump_path: in the MWRAP environment
variable to append the results to a log file:

  MWRAP=dump_path:/path/to/log mwrap RUBY_COMMAND

  # And to display the locations with the most allocations:
  sort -k1,1rn </path/to/log | $PAGER

You may also `require "mwrap"' in your Ruby code and use
Mwrap.dump, Mwrap.clear, Mwrap.reset, Mwrap.each, etc.

However, mwrap MUST be loaded via LD_PRELOAD to have any
effect in tracking malloc use.

The output of the mwrap dump is a text file with 3 columns:

  total_bytes call_count location

Where location is a Ruby source location (if made under GVL)
or an address retrieved by backtrace_symbols(3). It is
recommended to use the sort(1) command on either of the
first two columns to find the hottest malloc locations.

== Known problems

* 32-bit machines are prone to overflow (WONTFIX)

== Mail archives and list:

  https://80x24.org/mwrap-public/
  nntp://80x24.org/inbox.comp.lang.ruby.mwrap

No subscription will ever be required to post, but HTML mail
will be rejected:

    mwrap-public@80x24.org

== Hacking

  git clone https://80x24.org/mwrap.git

Send all patches and pull requests (use "git request-pull" to format) to
the mailing list. We do not use centralized or proprietary messaging
systems.

== License

GPL-2.0+ <https://www.gnu.org/licenses/gpl-2.0.txt>

···

--
https://80x24.org/mwrap/README.html
https://80x24.org/mwrap/Mwrap.html

Awesome Eric!

I just ran this on a simple "Discourse boot" and got:

https://gist.githubusercontent.com/SamSaffron/220910a8fb7abd1226c3e9eb0c447ef4/raw/53af7bb83c4ebe58e51b0cee862048b2e47995ad/sorted.txt

It is fascinating to see absolute complete accounting with lines like:

301056 147 /usr/lib/x86_64-linux-gnu/libpq.so.5(+0x1324a) [0x7efd84d3824a]

This is information that is hidden from
`ObjectSpace#trace_object_allocations`. Complete accounting opens up a
whole bunch of possibilities, we could for example compare with Ruby
accounting and find memory that is not being reported to Ruby
properly.

Is there any chance we can have a mode with "free" tracking as well?
That way we can enable this on a long running process to detect leaks?

It also makes c level leaks way easier to debug, cause we can listen
on a signal and dump a log on it. Then comparing 2 logs can give us a
leak very very easily. (we could add mwrap --compare dump1 dump2 to
ease this usage)

Also per:

       344809221 1070570
/home/sam/.rbenv/versions/master/lib/ruby/gems/2.6.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:283
        39139224 26553
/home/sam/.rbenv/versions/master/lib/ruby/gems/2.6.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:277

I wonder if there is any way we can get slightly better reporting
here, I am guessing this is just around loading of code, but I wonder
if we could break that down in a cleaner way here?

Awesome Eric!

I just ran this on a simple "Discourse boot" and got:

https://gist.githubusercontent.com/SamSaffron/220910a8fb7abd1226c3e9eb0c447ef4/raw/53af7bb83c4ebe58e51b0cee862048b2e47995ad/sorted.txt

It is fascinating to see absolute complete accounting with lines like:

301056 147 /usr/lib/x86_64-linux-gnu/libpq.so.5(+0x1324a) [0x7efd84d3824a]

This is information that is hidden from
`ObjectSpace#trace_object_allocations`. Complete accounting opens up a
whole bunch of possibilities, we could for example compare with Ruby
accounting and find memory that is not being reported to Ruby
properly.

Glad you've found that useful! Took me a while to figure out
that part out and I'm hoping to make it more informational
(steal from addr2line.c in ruby)

Is there any chance we can have a mode with "free" tracking as well?
That way we can enable this on a long running process to detect leaks?

Maybe, but it might get a lot more expensive to be useful...
Merely counting frees and associating them with source lines
like mwrap currently does with *allocs would not be useful,
since GC can call free from just about anywhere.

So we'd have to try per-allocation accounting to know for sure
and it could get time consuming.

Right now, allocations from the same location just increment
counters and there's no per-allocation overhead after a callsite
is hit the first time. So it's not excessive overhead at the
moment... Noticeable overhead, but probably usable in
production.

If it ends up being as expensive as the leak checking done by
valgrind, I'm afraid it would lose usefulness in production
environments.

Neglecting "free" also simplifies the code a lot, as I cheat
a little by ignoring most allocations made by liburcu and mwrap
itself :slight_smile: Wrapping free/cfree means we'd have to add extra
accounting info for all those allocations, too; in addition
to having to wrap all the memalign functions and ensure proper
alignment.

It also makes c level leaks way easier to debug, cause we can listen
on a signal and dump a log on it. Then comparing 2 logs can give us a
leak very very easily. (we could add mwrap --compare dump1 dump2 to
ease this usage)

Yes, if we add per-allocation tracking, storing the
rb_gc_count() result for each malloc count could be useful to
track the relative age of each malloc-ed region.

For now, probably calling Mwrap.dump periodically is prudent,
(it releases GVL, so you can probably dedicate a thread to it),
and sorting on the location column + "diff -u" seems to be enough.

Also per:

       344809221 1070570
/home/sam/.rbenv/versions/master/lib/ruby/gems/2.6.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:283
        39139224 26553
/home/sam/.rbenv/versions/master/lib/ruby/gems/2.6.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:277

I wonder if there is any way we can get slightly better reporting
here, I am guessing this is just around loading of code, but I wonder
if we could break that down in a cleaner way here?

Yes, I noticed a problems with rubygems/core_ext/kernel_require.rb,
too. It seems to be a limitation of rb_source_location_cstr and
the result of VM optimizations. The usual trade-off between
diagnostic details and performance.

I tried using the "caller_locations" method via rb_funcall, but
that creates new objects and even calling rb_gc_disable() inside
the malloc() wrapper caused GC failures.

···

Sam Saffron <sam.saffron@gmail.com> wrote:

It does not require recompiling or rebuilding Ruby, but only supports
Ruby trunk (2.6.0dev+)

Btw, I fixed a rare crash at thread creation in Ruby trunk r63836:
https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=63836

r63837 also solves the same problem, but may affect other
potential callers:
https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=63837

> Is there any chance we can have a mode with "free" tracking as well?
> That way we can enable this on a long running process to detect leaks?

Maybe, but it might get a lot more expensive to be useful...
Merely counting frees and associating them with source lines
like mwrap currently does with *allocs would not be useful,
since GC can call free from just about anywhere.

Definitely more expensive, but it's still usable:

   [PATCH 0/19] the heavy version of mwrap - Eric Wong

TL;DR: live demo of the new features running inside a Rack app:

  https://80x24.org/MWRAP/each/2000

The following changes since commit 834de3bc0da4af53535d5c9d4975e546df9fb186:

  bin/mwrap: support LISTEN_FDS env from systemd (2018-07-16 19:33:12 +0000)

are available in the Git repository at:

  LD_PRELOAD malloc wrapper + line stats for Ruby heavy

for you to fetch changes up to c432e3ad30aa247dbac8575af87b0c594365d3fd:

Cloning from that git URL runs it through the same process which
is running /MWRAP/. (I will eventually replace cgit on
80x24.org with repobrowse <https://80x24.org/repobrowse/README&gt;\)

So we'd have to try per-allocation accounting to know for sure
and it could get time consuming.

So depending on the application, it's up to 50% more memory
intensive and 50% slower. Though likely cheaper than most leak
checkers (definitely cheaper than valgrind's leakchecker).

memalign functions (used by CoW-friendly Ruby GC) ends up
causing massive fragmentation as a result, because we can't reuse
the the overallocated part like a full malloc implementation
can. And I even disabled tracking allocations made by them by
default, since it can be noisy and misleading when hit in cold
paths.

Yes, if we add per-allocation tracking, storing the
rb_gc_count() result for each malloc count could be useful to
track the relative age of each malloc-ed region.

Yes, that's done, and the /at/$LOCATION endpoint lets us see
the size and rb_gc_count value at the time of allocation
for each live allocation:

https://80x24.org/MWRAP/at//home/rb/lib/ruby/gems/2.6.0/gems/rack-2.0.5/lib/rack/deflater.rb:77

(note the double slash :<)

···

Eric Wong <e@80x24.org> wrote:

Sam Saffron <sam.saffron@gmail.com> wrote: