Hopefully some of you are trying and enjoying Ruby 2.5 by now. I figured I’d write about some changes to C Ruby over the years which make it easier to reduce memory use.
String objects are often to blame for high memory usage in Ruby applications. High memory usage limits scalability and hurts performance by increasing memory traffic (GC overhead and general access times).
Frozen string literals have been proposed as the default in Ruby 3, but I remain against them for compatibility reasons. Meanwhile, Ruby has gained some transparent optimizations along with some syntactic improvements to help programmers reduce overheads further.
The String#-@ method was introduced back in Ruby 2.3 as syntactic sugar for making frozen strings more succinctly than String#freeze:
# https://bugs.ruby-lang.org/issues/11782
-"this string is frozen" # became equivalent to:
"this string is frozen".freeze # from Ruby 2.2 and earlier
Starting with Ruby 2.5, the same String#-@ method will deduplicate non-frozen strings:
# https://bugs.ruby-lang.org/issues/13077
original = -"this string is frozen"
dynamic = -%w(this string is frozen).join(' ')
# original.object_id == dynamic.object_id
Furthermore, writing -"literal" avoids allocation in the first place in 2.5, just like "literal".freeze since Ruby 2.1. So, if your code only needs to support Ruby 2.3+, you can start using String#-@ and your 2.5 users can benefit from more optimizations without relying on more fragile file-wide (or process-wide) frozen string literals.
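As a rough illustration of where this pays off, consider field values extracted from input at runtime, which tend to repeat. The sketch below uses made-up log lines and hypothetical variable names (lines, verbs), and assumes Ruby 2.5 or later; on 2.3 and 2.4 the strings would still come back frozen, but each one would remain a separate object.

```ruby
# Hypothetical input: the same few verbs show up on every line.
lines = ["GET /index", "GET /about", "POST /login", "GET /index"]

# String#split allocates a fresh "GET" or "POST" for every line;
# String#-@ swaps each one for a shared frozen copy (Ruby 2.5+).
verbs = lines.map { |line| -line.split(' ', 2).first }

# Count how many distinct String objects back the four values.
distinct_objects = verbs.map(&:object_id).uniq.size
puts "#{verbs.size} verbs backed by #{distinct_objects} String objects"
```

With deduplication in effect, the four values should be backed by only two objects, one per distinct verb.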
However, there are several places where you do not need to worry about allocations because the VM does it for you!
Hash keys
When given a non-frozen String as a hash key, Ruby transparently duplicates and freezes the key to avoid data corruption in case the original string is mutated [ruby-core:35410].
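A minimal sketch of that behavior, using hypothetical variable names and an explicitly mutable string so it does not depend on how string literals are treated in a given Ruby version:

```ruby
key = "name".dup          # explicitly mutable String
h = {}
h[key] = 1

stored = h.keys.first     # the key object the Hash actually holds
copied = !stored.equal?(key)   # the Hash kept its own copy
frozen = stored.frozen?        # and that copy is frozen

key << "_changed"         # mutate the original after insertion
still_found = h["name"]   # lookup by the old content still works
puts "copied=#{copied} frozen=#{frozen} still_found=#{still_found.inspect}"
```

Because the Hash holds its own frozen copy, mutating the original string afterwards cannot corrupt the table.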
In the old days, frozen constants were used in some code bases (e.g. mongrel) to reduce overhead from common hash keys. This practice lives on in some places, but is no longer necessary for the majority of cases. In fact, unnecessarily referencing constants adds some memory overhead in the bytecode for inline caching.
Since Ruby 2.1, using a string literal with Hash#[] or Hash#[]=, or as a key in a hash literal, does not allocate new memory for the key.
In other words, there’s no benefit in writing any of the following:
foo = { "key".freeze => nil } # unnecessary freeze
foo["a".freeze] = true # unnecessary freeze
foo["b".freeze] # unnecessary freeze
They are equivalent to the following, in all versions of Ruby:
foo = { "key" => nil }
foo["a"] = true
foo["b"]
Note: this optimization does not apply to Hash subclasses.
Furthermore, starting with Ruby 2.5, all untainted Strings used as Hash keys are transparently deduplicated to the frozen copy as long as there’s an identical reference to it in the source code.
Unfortunately, this does not yet help with tainted strings, which are what most parsers produce. But since hardly anybody cares about tainting in keys or at all, I’ve proposed to have it removed in 2.6: https://bugs.ruby-lang.org/issues/14225
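The key deduplication described above can be observed with a small sketch. It assumes Ruby 2.5 or later and untainted strings; the variable names are hypothetical, and the -"user_id" line is only there to make sure a frozen copy is referenced in the source.

```ruby
interned = -"user_id"   # a frozen copy referenced in the source code

first  = {}
second = {}
# Both keys are built at runtime, so each starts life as a separate object.
first[%w(user id).join('_')]  = 1
second[%w(user id).join('_')] = 2

key_a = first.keys.first
key_b = second.keys.first
shared_between_hashes = key_a.equal?(key_b)     # same object in both hashes?
shared_with_literal   = key_a.equal?(interned)  # same object as the literal?
puts "hashes share key: #{shared_between_hashes}, literal shared: #{shared_with_literal}"
```

If the optimization applies, both hashes end up holding the very same frozen key object instead of two private copies.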
case/when statements
String literals in case/when clauses have been transparently frozen since Ruby 1.9.3, and deduplicated since Ruby 2.1: https://bugs.ruby-lang.org/issues/5000
Semi-automatic memory management
(Perhaps a controversial topic)
String#clear has existed since Ruby 1.9.1 and immediately releases memory allocated from malloc(3). I use this to reduce memory pressure and improve locality when working with large buffers.
In the C source code of Ruby, you will also find many uses of rb_str_resize(str, 0) to clear buffers.
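One way to watch the effect of String#clear from Ruby itself is ObjectSpace.memsize_of from the objspace extension. This is only a sketch: memsize_of reports an implementation-specific estimate on C Ruby, and the buffer size is arbitrary.

```ruby
require 'objspace'

buf = "x" * (1 << 20)                  # roughly 1 MiB, heap-allocated
before = ObjectSpace.memsize_of(buf)

buf.clear                              # drop the buffer now, not at the next GC
after = ObjectSpace.memsize_of(buf)

puts "before: #{before} bytes, after: #{after} bytes"
```

The String object itself stays alive and reusable; only its large backing buffer is given up early.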
I don’t know if this can be improved for out-of-the-box Ruby users; and I don’t know how some Rubyists feel about uglifying code to reduce resource usage.
That’s all I can think of for now, thanks for reading.