Ruby is too slow

I have been writing some image processing algorithms that run on incoming
video. At 30fps, there is only 33ms to do all processing.
The following Ruby code is typical of the kind of loops being run.
A lookup table is created that maps video input into the desired output.
The image is then iterated over, applying the lookup table to each element.

The following script takes 1.6 secs to execute on my PIII 866Mhz.
The equivalent C++ code takes 8 ms to execute.
A factor of 200 difference.
The equivalent Java code takes 15 ms to execute.
I actually increased the image size 10 fold on the Java run and saw 150ms.
10ms timing resolution really bites.

I like Ruby, but it needs to be significantly faster.
Anybody know the main reason why Ruby is so much slower in this example?
Is it the handling of numbers?
Is there overhead calling yield for every pixel?
Is the array lookup not O(1)?

Ruby


size = 640*480*2
image = []
image.fill 0, 0...size
lookup = []
lookup.fill 0, 0...1024
t0 = Time.now
size.times { |i|
  image[i] = lookup[image[i]]
}
td = Time.at(Time.now - t0)
puts "etime: #{td.sec}.#{td.usec}"
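
A side note on the timing itself: interpolating `td.usec` does not zero-pad the microseconds, so 1.05 s would print as "1.50000". Below is a minimal sketch of one way to time the same loop with the monotonic clock and average over a few passes, which also helps when the timer resolution is coarse. The helper name `time_lookup`, the run count of 3, and the 640*480*2 frame size are assumptions made for illustration.

```ruby
# Hypothetical helper: times the lookup loop with the monotonic clock and
# averages over several passes to smooth out coarse timer resolution.
def time_lookup(size, runs)
  image  = Array.new(size, 0)
  lookup = Array.new(1024, 0)

  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times do
    size.times { |i| image[i] = lookup[image[i]] }
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

  elapsed / runs   # average seconds per pass
end

# Assumed frame size: 640x480 with two values per pixel.
per_pass = time_lookup(640 * 480 * 2, 3)
puts format("etime: %.6f s per pass", per_pass)
```

The absolute numbers will differ a lot from the 2002 figures in this thread; only the ratio between variants measured on the same machine means much.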

C++

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <windows.h>
#include <winbase.h>

static LARGE_INTEGER m_countsPerSec;
static LARGE_INTEGER m_start;
static LARGE_INTEGER m_end;

void elapsedTimeInit()
{
    QueryPerformanceFrequency(&m_countsPerSec);
}

void elapsedTimeMarkStart()
{
    QueryPerformanceCounter(&m_start);
}

/**
 * returns the elapsed time in microseconds.
 */
unsigned long elapsedTime()
{
    LONGLONG delta;
    QueryPerformanceCounter(&m_end);
    delta = m_end.QuadPart - m_start.QuadPart;
    if (delta < 0)
        delta = m_countsPerSec.QuadPart + delta;
    delta = (LONGLONG)(1000000.0 * delta / m_countsPerSec.QuadPart + 0.5);
    return (unsigned long)delta;
}

int main(int argc, char *argv[])
{
    const int size = 640*480*2;
    short int *image = (short int *)malloc(size*sizeof(short int));
    memset(image, 0, size*sizeof(short int));
    short int lookup[1024] = { 0 };

    short int *imageIt = image;
    short int *last = image + size;
    elapsedTimeInit();
    elapsedTimeMarkStart();
    while (imageIt != last)
    {
        *imageIt = lookup[*imageIt];
        imageIt++;
    }
    printf("etime: %lu\n", elapsedTime());
    free(image);
    return 0;
}

Java

import java.util.*;

public class Test
{
    public static void main(String[] args)
    {
        int size = 640*480*2;
        short[] image = new short[size];
        for (int i=0; i < size; ++i)
            image[i] = 0;

        short[] lookup = new short[1024];
        for (int i=0; i < 1024; ++i)
            lookup[i] = 0;

        Date t0 = new Date();
        for (int i=0; i < size; ++i)
        {
            image[i] = lookup[image[i]];
        }
        Date t1 = new Date();
        System.out.println("etime: " + (t1.getTime() - t0.getTime()) );
    }
}

Hello MetalOne,

Thursday, November 21, 2002, 9:14:27 AM, you wrote:

I like Ruby, but it needs to be significantly faster. Anybody know
the main reason why Ruby is so much slower in this example? Is it
the handling of numbers? Is there overhead calling yield for every
pixel? Is the array lookup not O(1)?

It is because Ruby is an interpreted language, while Java and C++ are
compiled to machine code. This sort of task is definitely not for Ruby.



Best regards,
Bulat mailto:bulatz@integ.ru

“MetalOne” jcb@iteris.com wrote in message
news:92c59a2c.0211202158.4ed9aff9@posting.google.com

The following script takes 1.6 secs to execute on my PIII 866Mhz.
The equivalent C++ code takes 8 ms to execute.
A factor of 200 difference.
The equivalent Java code takes 15 ms to execute.
I actually increased the image size 10 fold on the Java run and saw 150ms.
10ms timing resolution really bites.

The Java code seems to be JIT-compiled.

I like Ruby, but it needs to be significantly faster.
Anybody know the main reason why Ruby is so much slower in this example?
Is it the handling of numbers?
Is there overhead calling yield for every pixel?
Is the array lookup not O(1)?

Ruby

size = 640*480*2
image = []
image.fill 0, 0...size
lookup = []
lookup.fill 0, 0...1024
t0 = Time.now
size.times { |i|
  image[i] = lookup[image[i]]
}
td = Time.at(Time.now - t0)
puts "etime: #{td.sec}.#{td.usec}"

As for me a tool should be used just for what it was designated. (is it
English?).

Time critical routines should be written with another language.
In this case I would write the piece

size.times{|i| image[i] = lookup[image[i]]}

in assembler and achieve much lower times than your Java or C++, even on an
i80386 processor.
Moreover, the assembler code for this task is shorter and easier to write.

Ruby has a perfect mechanism for binding native machine code. It’s very easy
and I prefer using it for such tasks.

MetalOne wrote:

I like Ruby, but it needs to be significantly faster.
Anybody know the main reason why Ruby is so much slower in this example?
Is it the handling of numbers?
Is there overhead calling yield for every pixel?
Is the array lookup not O(1)?

Try to run your code with the ruby profiler (using “ruby -rprofile
yourcode.rb”). That should find where most time is spent, and you can
then either try to optimize that part, or implement it as a C library
for ruby.

It may well be that it is still too slow for your purposes, since C++
and java in general are faster than ruby. But at least you’ll have more
information to make the right decision.
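
Short of a full profile, timing each phase separately gives much the same information for a script this small. A minimal sketch of that idea follows; `timed` is a made-up helper, not part of Ruby, and the 640*480*2 frame size is assumed.

```ruby
# Hypothetical helper: runs the block and returns [result, elapsed seconds].
def timed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0]
end

size = 640 * 480 * 2   # assumed frame size

image,  t_alloc = timed { Array.new(size, 0) }
lookup, t_table = timed { Array.new(1024, 0) }
_,      t_loop  = timed { size.times { |i| image[i] = lookup[image[i]] } }

timings = {
  "allocate image" => t_alloc,
  "build table"    => t_table,
  "lookup loop"    => t_loop
}
timings.each { |label, secs| puts format("%-15s %.6f s", label, secs) }
```

If nearly all of the time lands in the lookup loop, that loop is the part worth moving to C.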

/Anders


A n d e r s B e n g t s s o n | ndrsbngtssn@yahoo.se
Stockholm, Sweden |


Free e-mail for the rest of your life at www.yahoo.se/mail
Dead simple!

jcb@iteris.com (MetalOne) writes:

I have been writing some image processing algorithms that run on incoming
video. At 30fps, there is only 33ms to do all processing.
The following Ruby code is typical of the kind of loops being run.
A lookup table is created that maps video input into the desired output.
The image is then iterated over, applying the lookup table to each element.

The following script takes 1.6 secs to execute on my PIII 866Mhz.
The equivalent C++ code takes 8 ms to execute.
A factor of 200 difference.

Find out the difference and write it in C, preferably using
Ruby::Inline for your convenience. This is the 10/90 rules again: 10%
of the code is executed 90% of the time. Find that 10% and optimise
away.

YS.

while (imageIt != last)
{
*imageIt = lookup[*imageIt];
imageIt++;
}

Simple loop and 3 pointer dereferences.

for (int i=0; i < size; ++i)
{
image[i] = lookup[image[i]];
}

Simple loop and 3 array lookups.

size.times { |i|
image[i] = lookup[image[i]]
}

One function call per pixel for the loop and 3 function calls for the
lookups.

I believe that accounts for the difference you are seeing.
You have to remember that .times is implemented using a block and then
yield()ing it that many times.
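
To make the count concrete, here is a small sketch that writes the same loop twice: once with the usual sugar and once with the method calls spelled out. The 8-element image and the non-zero table are made up so that the result is visible; they are not from the original post.

```ruby
image  = Array.new(8, 0)
lookup = Array.new(1024) { |v| (v + 1) % 1024 }  # made-up table: maps v to v + 1

# Sugar form, as in the original loop.
sugar = image.dup
sugar.size.times { |i| sugar[i] = lookup[sugar[i]] }

# The same work with the calls written explicitly: one block invocation
# per element, plus Array#[] twice and Array#[]= once.
explicit = image.dup
explicit.size.times do |i|
  explicit.[]=(i, lookup.[](explicit.[](i)))
end

p sugar == explicit   # prints true
```

Both forms dispatch the same methods; the bracket syntax only hides them.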

My advice would be to break this code out into a module.

I wrote an app that did ebook processing, and was using stuff like .each
for each line of the book and then .each for each character on the line,
and the function call overhead was insane. Once I broke the per
character stuff into a C module and called it on each line as a ruby
function, the speed improved by a factor of 50.

Tom.

* MetalOne (jcb@iteris.com) wrote:

    .^.    .-------------------------------------------------------.
    /V\    | Tom Gilbert, London, England | http://linuxbrit.co.uk |
   /( )\   | Open Source/UNIX consultant  | tom@linuxbrit.co.uk    |
   ^^-^^   `-------------------------------------------------------'

Do I have the Smalltalk version right?

size := 640*480*2.
image := ByteArray new: size withAll: 0.
lookup := ByteArray new: size withAll: 0.

Time millisecondsToRun: [
    1 to: size do: [ :i |
        image at: i put: (lookup at: (image at: i ) + 1)
    ]
]

"159"



…tom
remove dashes in email for replies
http://isectd.sourceforge.net

Hello Aleksei,

Thursday, November 21, 2002, 9:34:31 AM, you wrote:

Ruby has a perfect mechanism for binding native machine code. It’s
very easy and I prefer using it for such tasks.

give us url and example, pliz :)



Best regards,
Bulat mailto:bulatz@integ.ru

Bulat Ziganshin wrote:

It is because Ruby is an interpreted language, while Java and C++ are
compiled to machine code. This sort of task is definitely not for Ruby.

Java compiles to byte code that is then interpreted. Unless you have a
very specific Java to Native compiler (which the poster does not
mention) then he is running within the Java VM.

Compared to Ruby the Java VM has had a great deal of effort put into JIT
and other improvements and so we should hope that it is indeed faster.

Ruby will get there I’m sure but in the meantime JAVA IS NOT COMPILED TO
MACHINE CODE!

I’m sure that a programmer of your stature knew that and it just slipped
your mind for the moment.

Hello Anders,

Thursday, November 21, 2002, 3:14:39 PM, you wrote:

Anybody know the main reason why Ruby is so much slower in this

Try to run your code with the ruby profiler (using "ruby -rprofile

It is the joke of the week! :) He runs a simple loop 640*480*2 times.



Best regards,
Bulat mailto:bulatz@integ.ru

size.times { |i|
image[i] = lookup[image[i]]
}

One function call per pixel for the loop and 3 function calls for the
lookups.

hmmm.
Actually, I tried replacing the iterator with a while loop.
i = 0
while i < size
image[i] = lookup[image[i]]
i += 1
end

The while loop version was slower yet.
A function call to index an array? Yikes! Well, that might explain it.

Thomas Gagné tgagne@wide-open-west.com wrote in message news:3DDCFB97.1040505@wide-open-west.com

Do I have the Smalltalk version right?

size := 640*480*2.
image := ByteArray new: size withAll: 0.
lookup := ByteArray new: size withAll: 0.

Time millisecondsToRun: [
    1 to: size do: [ :i |
        image at: i put: (lookup at: (image at: i ) + 1)
    ]
]

"159"

I don’t know smalltalk, but it looks close enough.
I don’t see the reason for the + 1.
The lookup table was actually only 1024 elements, but it doesn’t really matter.
What kind of processor did you run the test on?

Thomas Gagné tgagne@wide-open-west.com wrote in message news:3DDCFB97.1040505@wide-open-west.com

Do I have the Smalltalk version right?

size := 640*480*2.
image := ByteArray new: size withAll: 0.
lookup := ByteArray new: size withAll: 0.

Time millisecondsToRun: [
    1 to: size do: [ :i |
        image at: i put: (lookup at: (image at: i ) + 1)
    ]
]

"159"

I don’t know smalltalk, but it looks close enough.
I don’t see the reason for the + 1.

The index of the first element of an array in Smalltalk is 1.
He could have omitted it and simply initialized the lookup table with
another value, but it’d have been unfair :)

BTW, in Squeak, on a 1700+ Athlon XP, that code snippet runs in 260ms.


On Fri, Nov 22, 2002 at 06:56:05AM +0900, MetalOne wrote:

The lookup table was actually only 1024 elements, but it doesn’t really matter.
What kind of processor did you run the test on?


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Turn right here. No! NO! The OTHER right!

“Bulat Ziganshin” bulatz@integ.ru wrote in message
news:112263769600.20021121093740@integ.ru

Hello Aleksei,

Thursday, November 21, 2002, 9:34:31 AM, you wrote:

Ruby has a perfect mechanism for binding native machine code. It’s
very easy and I prefer using it for such tasks.

give us url and example, pliz :)

What URL and what example?

The process of extending Ruby is described in the readme file shipped with
Ruby itself.

It’s not a good idea to post assembler code to a Ruby conference. Moreover,
some Ruby users do not understand assembler.



Best regards,
Bulat mailto:bulatz@integ.ru

Hello Peter,

Thursday, November 21, 2002, 12:31:04 PM, you wrote:

It is because Ruby is an interpreted language, while Java and C++ are
compiled to machine code. This sort of task is definitely not for
Ruby.

Java compiles to byte code that is then interpreted.

… or compiled using a JIT. Based on the timings in the original post, I
think his code is compiled.



Best regards,
Bulat mailto:bulatz@integ.ru

On Thu, 2002-11-21 at 13:36, Bulat Ziganshin wrote:

Hello Anders,

Thursday, November 21, 2002, 3:14:39 PM, you wrote:

Anybody know the main reason why Ruby is so much slower in this

Try to run your code with the ruby profiler (using "ruby -rprofile

It is the joke of the week! :) He runs a simple loop 640*480*2 times.

So? I still figure that a quick measurement of the code in question
gives valuable information. At least more valuable than the idle
speculation on Java’s performance that most replies to his question
revolved around. ;)

/Anders


A n d e r s B e n g t s s o n | ndrsbngtssn@yahoo.se
Stockholm, Sweden |



The while loop version was slower yet.
A function call to index an array? Yikes! Well, that might explain it.

:) There you go. That’s the price you pay for convenience: no memory issues
for out-of-bounds indexing; ability to index from the end; ability to take a
slice of the array.

In short:

   a = [ "a", "b", "c", "d", "e" ]
   a[2] +  a[0] + a[1]   #=> "cab"
   a[6]                  #=> nil
   a[1, 2]               #=> ["b", "c"]
   a[1..3]               #=> ["b", "c", "d"]
   a[4..7]               #=> ["e"]
   a[6..10]              #=> nil
   a[-3, 3]              #=> ["c", "d", "e"]

Gavin


From: “MetalOne” jcb@iteris.com

MetalOne wrote:

Thomas Gagné tgagne@wide-open-west.com wrote in message news:3DDCFB97.1040505@wide-open-west.com

Do I have the Smalltalk version right?

size := 640*480*2.
image := ByteArray new: size withAll: 0.
lookup := ByteArray new: size withAll: 0.

Time millisecondsToRun: [
    1 to: size do: [ :i |
        image at: i put: (lookup at: (image at: i ) + 1)
    ]
]

"159"

I don’t know smalltalk, but it looks close enough.
I don’t see the reason for the + 1.

Smalltalk numbers its indexes from 1 instead of 0. Since
the value of “image at: i” was zero I added 1 to it.

The lookup table was actually only 1024 elements, but it doesn’t really matter.
What kind of processor did you run the test on?

My Dell 5000’s 650–it was plugged in and I had the dimmer
switch on all the way :-).



…tom
remove dashes in email for replies
http://isectd.sourceforge.net

hmmm.
Actually, I tried replacing the iterator with a while loop.
i = 0
while i < size
image[i] = lookup[image[i]]
i += 1
end

The while loop version was slower yet.
A function call to index an array? Yikes! Well, that might explain it.

I’ve read that using the Array’s at function is faster than operator []
because it doesn’t support ranges.

image.at(i) = …

Isak
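
For anyone wanting to try this, here is a rough sketch comparing the two forms. Note that `at` only reads, so the store still has to go through `[]=`. The `seconds` helper and the 640*480*2 frame size are assumptions for illustration, and whether `at` actually comes out faster depends on the interpreter version, so it is worth measuring rather than taking on faith.

```ruby
# Hypothetical helper: wall-clock seconds taken by a block.
def seconds
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

size   = 640 * 480 * 2   # assumed frame size
image  = Array.new(size, 0)
lookup = Array.new(1024, 0)

# Array#[] also accepts ranges and (start, length) pairs.
with_brackets = seconds do
  size.times { |i| image[i] = lookup[image[i]] }
end

# Array#at accepts a single integer index only. It is read-only,
# so the assignment still uses []=.
with_at = seconds do
  size.times { |i| image[i] = lookup.at(image.at(i)) }
end

puts format("[]: %.6f s", with_brackets)
puts format("at: %.6f s", with_at)
```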

Mauricio Fernández batsman.geo@yahoo.com writes:

BTW, in Squeak, on a 1700+ Athlon XP, that code snippet runs in
260ms.

As a reference, it takes 128ms on a 1.6G P4 using GNU Smalltalk.



Josh Huber