Making Ruby faster

Here's a patch that makes Ruby interpretter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
* http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details;
* http://t-a-w.blogspot.com/2007/02/making-ruby-faster.html

···

--
Tomasz Wegrzanowski [ http://t-a-w.blogspot.com/ ]

I think you need to post that to ruby-lang, actually. Sounds very
interesting though.

···

On 2/20/07, Tomasz Wegrzanowski <tomasz.wegrzanowski@gmail.com> wrote:

Here's a patch that makes Ruby interpretter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
* http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details;
* taw's blog: Making Ruby faster

--
Tomasz Wegrzanowski [ http://t-a-w.blogspot.com/ ]

--
Giles Bowkett
http://www.gilesgoatboy.org

http://gilesgoatboy.blogspot.com

Great article. The optimization of 'switch' is cute.
And the optimization of 'fix_equal' is awesome.

···

On 2/21/07, Tomasz Wegrzanowski <tomasz.wegrzanowski@gmail.com> wrote:

Here's a patch that makes Ruby interpretter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
* http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details;
* taw's blog: Making Ruby faster

--
Simon Strandgaard
http://opcoders.com/

Definitely a great read, I cannot judge the technical quality in depth
but for what I understand this is reasonable stuff.
I got a stupid question too, did you try the debugger after using -O3?

However you should post this to ruby-core if you hope for your patches
to be accepted.
It might not be worth to do so however in the advent of Ruby2 and the
small improvement.
But I am an ignorant so just try your luck there :wink:

Again this is good stuff even if it might not be adopted.

Cheers
Robert

···

On 2/21/07, Tomasz Wegrzanowski <tomasz.wegrzanowski@gmail.com> wrote:

Here's a patch that makes Ruby interpretter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
* http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details;
* taw's blog: Making Ruby faster

--
Tomasz Wegrzanowski [ http://t-a-w.blogspot.com/ ]

--
We have not succeeded in answering all of our questions.
In fact, in some ways, we are more confused than ever.
But we feel we are confused on a higher level and about more important things.
-Anonymous

3% is a modest improvment. Let's see what can be
achieved by switching to a language that's designed
to be fast.

            Ruby Lua LuaJIT

···

On Feb 20, 11:32 pm, "Tomasz Wegrzanowski" <tomasz.wegrzanow...@gmail.com> wrote:

Here's a patch that makes Ruby interpretter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
*http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details;
*taw's blog: Making Ruby faster

--
Tomasz Wegrzanowski [http://t-a-w.blogspot.com/\]

------------------------------
tarai 0% 408% 1918%
fib 0% 497% 2874%
mandelbrot 0% 920% 1347%

The source:

local function tarai( x, y, z )
  if x <= y then
    return y
  else
    return tarai( tarai(x-1, y, z),
                  tarai(y-1, z, x),
                  tarai(z-1, x, y))
  end
end

tarai(12, 6, 0)

#####

local function fib( n )
  if n < 3 then
    return 1
  else
    return fib(n-1) + fib(n-2)
  end
end

#####

local sqrt = math.sqrt
local insert = table.insert

local function c_mul( c1, c2 )
  local a,b = unpack(c1)
  local c,d = unpack(c2)
  return { a*c - b*d, a*d + c*b }
end

local function c_abs( c )
  local a, b = unpack( c )
  return sqrt( a*a + b*b )
end

local function is_mandelbrot( z )
  for i = 1, 100 do
    z = c_mul( z, z )
    if c_abs( z ) > 2 then
      return false
    end
  end
  return true
end

ary = {}

for xx = 0, 100 do
  for yy = 0, 100 do
    local x, y = xx / 50.0, yy / 50.0
    local c = {x, y}
    if is_mandelbrot( c ) then
      insert( ary, c )
    end
  end
end

Looks cool. Couple of comments...

1) if-then-else-switch... Not clear to me why it would be faster than
the switch. Also, most of that stuff is gone in ruby1.9. Seems you
were using ruby1.8, right? You should probably redo this patch
against yarv (see vm.c, for example).

2) Fixnum improvements. Kind of a good idea, but maybe a problematic
implementation. FIX2LONG() and similar are macros that allow hiding
the implementation details. This could allow in the future to have
Ruby treat Fixnums as true real objects (like Python), without
changing the source code. Currently, yes, they are pretty inefficient
as they do a shift each time, even when unneeded.
Removing the use of the macro may not be such a good idea, as
fix_equal like you have written it could break as soon as Fixnums are
no longer represented as longs, but as a class. You might work around
it by creating your own pseudo FIX2LONG_SANS_SHIFT() macro. That way,
if stuff changes, it will become obvious, and changing the macro to be
== to FIX2LONG() is all it should take.
Also, if I read your code correctly, fix_equal() as you presented it
is slightly different in that it does not check that the pointers are
the same when it is not a fixnum (this means something like a == a may
perform slower -- albeit I'll admit that is a weird case). I would
probably ask in the dev list why that check was needed in the first
place.

3) Judy is in general not faster than a hash lookup. It is just more
memory efficient.
   Ruby might benefit from a better hash, thou, like google's
dense_hash_map (albeit it is C++ code, which Matz is
   against).

4) You should probably make a separate patch for each improvement you
made, instead of all of them together. That will make it easier to
merge (and evaluate) them in case one of them is rejected but others
accepted.

Other than that, pretty cool. You should keep working on it and join
the ruby dev list to start merging your changes.
Here's also the docs for sending patches:
http://www.ruby-lang.org/en/community/ruby-core/

···

On 21 feb, 02:32, "Tomasz Wegrzanowski" <tomasz.wegrzanow...@gmail.com> wrote:

Here's a patch that makes Ruby interpretter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
*http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details;
*taw's blog: Making Ruby faster

--
Tomasz Wegrzanowski [http://t-a-w.blogspot.com/\]

I never use C debugger for single-stepping. If I want to know what's
happening inside
a program I usually printf or strace or efence or ltrace or fenris or objdump or
LD_PRELOAD or something like that.
The only use I have for C debugger is backtrace printing with segfaults.
It works fine with -O3, you just need to pass -g option instead of
-fomit-frame-pointer
(on some platforms -O* causes -fomit-frame-pointer, but not on x86).
In any case, there's not much difference between -O and -O3 when it comes
to debugging - any optimizations make single-stepping difficult.

···

On 21/02/07, Robert Dober <robert.dober@gmail.com> wrote:

I got a stupid question too, did you try the debugger after using -O3?

--
Tomasz Wegrzanowski [ http://t-a-w.blogspot.com/ ]

Hi,

···

In message "Re: Making Ruby faster" on Thu, 22 Feb 2007 11:25:10 +0900, "William James" <w_a_x_man@yahoo.com> writes:

3% is a modest improvment. Let's see what can be
achieved by switching to a language that's designed
to be fast.

           Ruby Lua LuaJIT
------------------------------
tarai 0% 408% 1918%
fib 0% 497% 2874%
mandelbrot 0% 920% 1347%

Here's my numbers, meaningful or not:

            Ruby YARV
-----------------------
tarai 0% 547%
fib 0% 540%
mandelbrot 0% 172%
total 0% 472%
-----------------------

def tarai( x, y, z )
  if x <= y then
    return y
  else
    return tarai( tarai(x-1, y, z),
                  tarai(y-1, z, x),
                  tarai(z-1, x, y))
  end
end

p tarai(12, 6, 0)

#####

def fib( n )
  if n < 3 then
    return 1
  else
    return fib(n-1) + fib(n-2)
  end
end

p fib(12)

#####

include Math

def c_mul( c1, c2 )
  a,b = c1
  c,d = c2
  return a*c - b*d, a*d + c*b
end

def c_abs(c)
  a, b = c
  return sqrt( a*a + b*b )
end

def is_mandelbrot( z )
  for i in 1..100 do
    z = c_mul( z, z )
    if c_abs( z ) > 2 then
      return false
    end
  end
  return true
end

ary =

for xx in 0..100 do
  for yy in 0..100 do
    c = [xx / 50.0, yy / 50.0]
    if is_mandelbrot( c ) then
      ary.push(c)
    end
  end
end

-1, Troll.

David Vallner

···

On Thu, 22 Feb 2007 03:25:10 +0100, William James <w_a_x_man@yahoo.com> wrote:

3% is a modest improvment. Let's see what can be
achieved by switching to a language that's designed
to be fast.

3) Judy is in general not faster than a hash lookup. It is just more
memory efficient.

I cannot fully agree with that characterization, even if it's correct
considering
only the operation count.

It's not something you see on typical benchmarks [1], as they tend to access all
memory with the same frequency, but real programs' memory access patterns
more or less follow the power law - very few data structures are
accessed all the time,
and gradually bigger groups of data structures are accessed less frequently,
ending with most of the program memory being accessed very rarely.

For typical programs computers are pretty good at having the more commonly
accessed things in faster caches. This means even modest decrease in
data structure sizes will pull frequently accessed data structures into faster
caches. In most benchmarks all items are equally probably, so the items pulled
into freed caches tends to be uninteresting.

I'd like to see Judy vs hash tables performance comparison on real programs,
or at least on more realistic benchmarks.

One more thing - Ruby uses numeric hash tables for symbol lookup.
Symbols ids are highly non-random:

ids = ObjectSpace.enum_for(:each_object,

Module).inject(){|ms,md| ms + md.instance_methods}.uniq.map{|x|
x.to_sym.object_id}.sort; [ids.min, ids.max, ids.size]
=> [378, 269218, 1061]

That's exactly the situation Judy is optimized for.

[1] - A performance comparison of Judy to hash tables

···

On 22/02/07, gga <GGarramuno@aol.com> wrote:

--
Tomasz Wegrzanowski [ http://t-a-w.blogspot.com/ ]