For performance, write it in C

Whenever the question of performance comes up with scripting languages such as Ruby, Perl or Python there will be people whose response can be summarised as "Write it in C". I am one such person. Some people take offence at this and label us trolls or heretics of the true programming language (take your pick).

I am assuming here that when people talk about performance they really mean speed. Some will disagree but this is what I am talking about.

In this post I want to clear some things up and provide benchmarks as to why you should take "Write it in C" seriously. Personally I like to write my projects in Ruby or Perl or Python and then convert them to C to get a performance boost. How much of a boost? Well here I will give you some hard data and source code so that you can see just how much of a boost C can give you.

The mini project in question is to generate all the valid Latin squares. A Latin square is a grid of numbers (lets say a 9 x 9 grid) that has the numbers 1 to 9 appear only once in each row and column. Sudoku grids are a subset of Latin squares.

The approach taken is to create a list of all the permutations and then build up a grid row by row checking that the newly added row does not conflict with any of the previous rows. If the final row can be added without problem the solution is printed and the search for the next one starts. It is in essence depth first search. The first version of the program that I wrote in Perl took 473 minutes to generate all the valid 5 x 5 Latin squares, version four of the program took 12 minutes and 51 seconds. The C version of the program took 5.5 seconds to produce identical results. All run on the same hardware.

[Latin]$ time ./Latin1.pl 5 > x5

real 473m45.370s
user 248m59.752s
sys 2m54.598s

[Latin]$ time ./Latin4.pl 5 > x5

real 12m51.178s
user 12m14.066s
sys 0m7.101s

[Latin]$ time ./c_version.sh 5

real 0m5.519s
user 0m4.585s
sys 0m0.691s

This is what I mean when I say that coding in C will improve the performance of your program. The improvement goes beyond percentages, it is in orders of magnitude. I think that the effort is worth it. If a 5 x 5 grid with 120 permutations took 12 minutes in Perl, how long would a 6 * 6 grid with 720 permutations take? What unit of measure would you be using for a 9 x 9 grid?

Size Permutations
==== ============
   1 1
   2 2
   3 6
   4 24
   5 120
   6 720
   7 5040
   8 40320
   9 362880

Now lets look at first version of the code:

     1 #!/usr/bin/perl -w

     2 use strict;
     3 use warnings;

     4 use Algorithm::Permute;

     5 my $width_of_board = shift;

     6 my @permutations;

     7 my $p = new Algorithm::Permute( [ 1 .. $width_of_board ] );

     8 while ( my @res = $p->next ) {
     9 push @permutations, [@res];
    10 }
    11 my $number_of_permutations = scalar(@permutations);

    12 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
    13 add_a_line($x);
    14 }

Lines 1 to 11 just build up a list of all the permutations using the handy Algorithm::Permute module from CPAN. Lines 12 to 14 starts on the first row of the solution by trying out all possible permutations for the first row.

    15 sub add_a_line {
    16 my @lines = @_;

    17 my $size = scalar(@lines);

    18 my $ok = 1;
    19 for ( my $x = 0 ; $x < $size ; $x++ ) {
    20 for ( my $y = 0 ; $y < $size ; $y++ ) {
    21 if ( $x != $y ) {
    22 $ok = 0 unless compare( $lines[$x], $lines[$y] );
    23 }
    24 }
    25 }

    26 if ($ok) {
    27 if ( $size == $width_of_board ) {
    28 print join(':', map { p($_) } @lines) . "\n";
    29 }
    30 else {
    31 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
    32 add_a_line( @lines, $x );
    33 }
    34 }
    35 }
    36 }

The add_a_line() function first checks that none of the lines so far conflict (have the same digit in the same column), if it passes and the number of lines equals the size of the board then the result is printed and another solution is looked for. Failing that another line is added and add_a_line() is called.

Here is the function that tells if two lines conflict.

    37 sub compare {
    38 my ( $a, $b ) = @_;

    39 my $ok = 1;

    40 my @aa = @{ $permutations[$a] };
    41 my @bb = @{ $permutations[$b] };

    42 for ( my $x = 0 ; $x < $width_of_board ; $x++ ) {
    43 $ok = 0 if $aa[$x] == $bb[$x];
    44 }

    45 return $ok == 1;
    46 }

The p() function is a little utility to convert a list into a string for display.

    47 sub p {
    48 my ($x) = @_;

    49 my @a = @{ $permutations[$x] };
    50 my $y = join( '', @a );

    51 return $y;
    52 }

Well I have just exposed some pretty crap code to eternal ridicule on the internet, but there you have it. The code is crap, even non Perl programmers will be able to point out the deficenties with this code. It works, even though a 5 x 5 grid took 473 minutes to run. Lets try and salvage some pride and show version four and see how we managed to speed things up.

     1 #!/usr/bin/perl -w

     2 use strict;
     3 use warnings;

     4 use Algorithm::Permute;

     5 my $width_of_board = shift;

     6 my @permutations;
     7 my @output;
     8 my %compared;

     9 my $p = new Algorithm::Permute( [ 1 .. $width_of_board ] );

    10 while ( my @res = $p->next ) {
    11 push @permutations, [@res];
    12 push @output, join( '', @res );
    13 }
    14 my $number_of_permutations = scalar(@permutations);

Lines 1 to 14 are doing pretty much what version one was doing except that a new list, @output, is being built up to precalculate the output strings and remove the need for the p() function. A minor speed up but useful.

    15 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
    16 for ( my $y = 0 ; $y < $number_of_permutations ; $y++ ) {
    17 my $ok = 1;

    18 my @aa = @{ $permutations[$x] };
    19 my @bb = @{ $permutations[$y] };

    20 for ( my $z = 0 ; $z < $width_of_board ; $z++ ) {
    21 if ( $aa[$z] == $bb[$z] ) {
    22 $ok = 0;
    23 last;
    24 }
    25 }

    26 if ( $ok == 1 ) {
    27 $compared{"$x:$y"} = 1;
    28 }
    29 }
    30 }

Lines 15 to 30 introduces new code to precalculate the comparisons and feed the results into a hash. Lines 31 to 33 start the work in the same way as version one.

    31 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
    32 add_a_line($x);
    33 }

And now to the improved add_a_line() function. The code has been improved to only check that the newly added line does not conflict with any of the existsing lines rather than repeatedly comparing the existing (valid) lines.

    34 sub add_a_line {
    35 my @lines = @_;

    36 my $size = scalar(@lines);

    37 my $ok = 1;

    38 if ( $size > 1 ) {
    39 for ( my $x = 0 ; $x < ( $size - 1 ) ; $x++ ) {
    40 unless ( defined $compared{ $lines[$x] .':'. $lines[-1] } ) {
    41 $ok = 0;
    42 last;
    43 }
    44 }
    45 }

    46 if ($ok) {
    47 if ( $size == $width_of_board ) {
    48 print join( ':', map { $output[$_] } @lines ) . "\n";
    49 }
    50 else {
    51 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
    52 add_a_line( @lines, $x );
    53 }
    54 }
    55 }
    56 }

These changes took us down from 473 minutes to just 12. The elimination of unnessessary comparisons in add_a_line() helped as did the precalculation of those comparisons. There are lessons to be learnt here, write decent code and cache repetetive comparisons. There are no great tricks, just that bad code can cost you dearly and simple things can bring big improvements. So with such a massive improvement how could we make our code any faster?

Write it in C.

Having learnt the lessons developing the code in Perl I am not going to start the whole thing over in C. Using version four I used the precalculation phase of the Perl scripts to write out a C header file with data structures that would be useful for the C program.

     1 #define WIDTH_OF_BOARD 5
     2 #define NUMBER_OF_PERMUTATIONS 120
     3 char *output_strings[] = {
     4 "54321",

   123 "12345",
   124 };
   125 bool compared[NUMBER_OF_PERMUTATIONS][NUMBER_OF_PERMUTATIONS] = {
   126 {false, false, ...

   245 {false, false, ...
   246 };
   247 int work[WIDTH_OF_BOARD];

This then leaves the C code itself. Lines 1 to 7 includes a load of useful stuff, infact it is also probably including some quite unneccessary stuff too, I just cut and paste it from another project.

     1 #include <stdio.h>
     2 #include <stdlib.h>
     3 #include <stdbool.h>
     4 #include <err.h>
     5 #include <string.h>
     6 #include <unistd.h>
     7 #include <sys/types.h>

Line 8 is the header file that Perl precalculated.

     8 #include "Latin.h"

Now the meat. The code is pretty much the same as version four just adapted to C. No special C tricks, no weird pointer stuff, an almost line for line translation of the Perl code.

     9 void
    10 add_a_row(int row)
    11 {
    12 bool is_ok;
    13 int x,y;

    14 if (row == WIDTH_OF_BOARD) {
    15 for (x = 0; x < WIDTH_OF_BOARD; x++) {
    16 if (x == 0) {
    17 printf("%s", output_strings[work[x]]);
    18 } else {
    19 printf(":%s", output_strings[work[x]]);
    20 }
    21 }
    22 puts("");
    23 } else {
    24 for (x = 0; x < NUMBER_OF_PERMUTATIONS; x++) {
    25 work[row] = x;

    26 is_ok = true;
    27 if (row != 0) {
    28 for( y = 0; y < row; y++ ) {
    29 if(compared[work[row]][work[y]] == false) {
    30 is_ok = false;
    31 break;
    32 }
    33 }
    34 }
    35 if (is_ok == true) {
    36 add_a_row(row + 1);
    37 }
    38 }
    39 }
    40 }

    41 int
    42 main(int argc, char *argv[])
    43 {
    44 add_a_row(0);
    45 }

And the C version ran in 5.5 seconds. In fact the 5.5 seconds includes the Perl program that does all the precalculation to write the Latin.h header file, the compiling of the C source and finally the running of the program itself. So we have not cheated by doing some work outside the timings.

Just think of it, 12 minutes down to 5.5 seconds without having to write any incomprehensible C code. Because we all know that C code is completely incomprehensible with it doing all that weird pointer stuff all the time.

Now the Perl code could be improved, there are tricks that could be pulled out of the bag to trim something off the 12 minutes. Perhaps another language would be faster? But how far could you close the gap from 12 minutes to 5.5 seconds?

Just to up the ante I added -fast -mcpu=7450 to the compiler (gcc optimized for speed on an G4 Macintosh) and ran it again.

[Latin]$ time ./c_version.sh 5 > x5

real 0m3.986s
user 0m2.810s
sys 0m0.540s

Another 30% performance improvement without changing any code.

Lets review the languages we have used and their advantages. C is very fast without any stupid tricks. C will give you better control over the amount of memory you use (the Perl code eats up massive amounts of memory in comparison, the 9 x 9 grid threw an out of memory error on my 1Gb machine).

It is much easier to develop in Perl. Any error message you get is likely to at least give you a clue as to what the problem might be. C programmers have to put up with the likes of 'Bus error' or 'Segmentation fault', which is why C programmers are grouches. Perl also allows you to significantly improve your code without major rewrites. There is a module called Memoize that can wrap a function and cache the calls all by adding two extra lines to your code, the same is true for most scripting languages.

So what am I recommending here, write all your programs in C? No. Write all your programs in Perl? No. Write them in your favourite scripting language to refine the code and then translate it into C if the performance falls short of your requirements. Even if you intend to write it in C all along hacking the code in Perl first allows you to play with the algorithm without having to worry about memory allocation and other such C style house keeping. Good code is good code in any language.

If you really really want that performance boost then take the following advice very seriously - "Write it in C".

Do you have any preferred tutorials on wrapping Ruby around C libraries?

···

--
Posted via http://www.ruby-forum.com/.

Peter Hickman schrieb:

(Example of Perl and C Code)

Peter, is there any chance you could test your program with Ruby Inline?

   http://rubyforge.org/projects/rubyinline

I'm on Windows, so I can't use Ruby Inline (+1 for MinGW btw :slight_smile:

Regards,
Pit

Peter Hickman gave a very good article about prototyping in a scripting
language, and then re-coding in c:

*snip*

If you really really want that performance boost then take the following
advice very seriously - "Write it in C".

I totally agree that with the current state of the art, this is the
right approach.

Maybe it doesn't need saying, but I'm going to... in the vasy majority
of applications, almost all of their run time is a tiny wee small
fraction of the code. This is the part that I would write in c (or c++).
The vast majority of the code (and it's not there just for fun, it's
still completely important to the application) will use a vanishingly
small fraction of processor time. This is the bit that I would probably
leave in Ruby.

People talk about the 80:20 principle, but in my experience it's much
more like 99:1 for applications. 99% of the code uses 1% of the run
time. 1% of the code consumes 99% of the run time. That could be the
signal processing and graphics heavy applications that I have
experienced though.

Thanks for the comparison, it was great. And thanks for the very nice
pre-generation of look up tables in perl idea. Nice.

Cheers,
  Benjohn

This is a great post, and should at least be posted to a blog somewhere so
the masses who don't know about USENET can still find it on Google!

Jay

···

On Wed, 26 Jul 2006 17:47:13 +0900, Peter Hickman wrote:

In this post I want to clear some things up and provide benchmarks as to
why you should take "Write it in C" seriously.

One important thing about Ruby performance is that Ruby gets very
inefficient very quickly, as the size of the working set grows. This
manifests as an observed performance problem, but the tip-off is that
performance of a particular Ruby program is quite acceptable for small
data-set sizes (such as in your test fixtures), but falls off a cliff with
production data-sets. Most of the times that I have found myself rewriting
slow Ruby code in C have been to get a program that runs in far less memory,
by manually laying out the data structures. This improves performance not
only for the obvious reasons (less thrashing and less contention with other
processes on a busy server), but also because you can "help" the processor's
cache manager to get more hits by creating localities of reference and by
stuffing more of your working set into a smaller number of cache lines. This
matters all the more on multiprocessors, which are severely penalized by
cache misses, to the point that some naively-written programs can run
noticeably faster on a uniprocessor.

Hi Peter!

Whenever the question of performance comes up with scripting
languages
such as Ruby, Perl or Python there will be people whose
response can be
summarised as "Write it in C". I am one such person. Some people take
offence at this and label us trolls or heretics of the true
programming
language (take your pick).

The last (and only) time I called someone a troll for saying
'Write it C' it was in response to a rails related question.
Further the OP asked for configuration items and such, but maybe
that's a whole other storry. (and of course you can write
C Extensions for rails... yeah, yadda, yadda :slight_smile: )

...snip 52 lines Perl, some hundred lines C ...

[Latin]$ time ./Latin1.pl 5 > x5

real 473m45.370s
user 248m59.752s
sys 2m54.598s

[Latin]$ time ./Latin4.pl 5 > x5

real 12m51.178s
user 12m14.066s
sys 0m7.101s

[Latin]$ time ./c_version.sh 5

real 0m5.519s
user 0m4.585s
sys 0m0.691s

Just to show the beauty of ruby:

···

-----------------------------------------------------------
require 'rubygems'
require 'permutation'
require 'set'

$size = (ARGV.shift || 5).to_i

$perms = Permutation.new($size).map{|p| p.value}
$out = $perms.map{|p| p.map{|v| v+1}.join}
$filter = $perms.map do |p|
  s = SortedSet.new
  $perms.each_with_index do |o, i|
    o.each_with_index {|v, j| s.add(i) if p[j] == v}
  end && s.to_a
end

$latins =
def search lines, possibs
  return $latins << lines if lines.size == $size
  possibs.each do |p|
    search lines + [p], (possibs -
$filter[p]).subtract(lines.last.to_i..p)
  end
end

search , SortedSet[*(0...$perms.size)]

$latins.each do |latin|
  $perms.each do |perm|
    perm.each{|p| puts $out[latin[p]]}
    puts
  end
end
-----------------------------------------------------------
(does someone has a nicer/even faster version?)

would you please run that on your machine?
perhaps you have to do a "gem install permutation"
(no I don't think it's faster than your C code, but
it should beat the perl version)

If you really really want that performance boost then take
the following
advice very seriously - "Write it in C".

Agreed, 100%, for those who want speed, speed and nothing
else there is hardly a better way.

thanks

Simon

Whenever the question of performance comes up with scripting languages
such as Ruby, Perl or Python there will be people whose response can be
summarised as "Write it in C". I am one such person. Some people take
offence at this and label us trolls or heretics of the true programming
language (take your pick).

<snip>

Hi,

When reading your C code, I saw that there is a lot of code that is
generated. I'd be interested to see how well the C program does if
it can work for any size of the squares. In this case I think the problem
is well suited for logic languages. I wrote a version in the functional
logic language Curry, which does reasonably well. It will probably not be
faster than the C version, but a lot faster than a program written in
Ruby/Perl/Python.

If you really really want that performance boost then take the following
advice very seriously - "Write it in C".

It can be a good idea to rewrite parts in C, but I would first check if
the algorithms are good, so that it may not even be needed to write any
C code. And perhaps there are libraries or tools that do the trick
efficiently. I would keep writing C code as the last option.

Regards,
Kristof

-------------------- start of latin.curry ----------------------------
-- upto is a nondeterministic function that evaluates to
-- a number from 1 upto n
upto 1 = 1
upto n | n > 1 = n ? upto (n-1)

-- check if the lists r s have no element with the same value at the
-- same position
elems_diff r s = and $ zipWith (/=) r s

-- extend takes a list of columns, and extends each column with a
-- number for the next row. It checks the number agains the column and
-- against the previous numbers in the row.

extend :: [[Int]] -> Int -> [[Int]]
extend cols n = addnum cols where
    addnum _ =
    addnum (col:cs) prev
        > x =:= upto n &
          (x `elem` prev) =:= False &
          (x `elem` col) =:= False = (x:col) : addnum cs (x:prev)
        where x free

latin_square n = latin_square_ n
    where latin_square_ 0 = replicate n -- initalize columns to nil
          latin_square_ m | m > 0 = extend (latin_square_ (m-1)) n

square2str s = unlines $ map format_col s
    where format_col col = unwords $ map show col

main = mapIO_ (putStrLn . square2str) (findall (\s -> s =:= latin_square 5))
------------------------- end latin.curry -----------------------------

···

On Wed, 26 Jul 2006 17:47:13 +0900, Peter Hickman wrote:

Peter Hickman wrote:

If you really really want that performance boost then take the following
advice very seriously - "Write it in C".

Assuming you have no choice but C/C++. That's why I like using the
jvm or clr with languages like jruby, groovy, or boo. You don't have
to use C, you can use java or C# or boo itself (since it is statically
typed with type inference): http://boo.codehaus.org/
or C/C++ as well, although it is 100x easier to interface with a C lib
from the clr than it is from jvm with jni.

just for fun, here's a ruby version (note that the array would actually need
to be reformed into rows, but someone else can play with that)

   harp:~ > cat a.rb
   require 'gsl'

   n = Integer(ARGV.shift || 2)

   width, height = n, n

   perm = GSL::Permutation.alloc width * height

   p perm.to_a until perm.next == GSL::FAILURE

it's not terribly fast to run - but it was to write!

-a

···

On Wed, 26 Jul 2006, Peter Hickman wrote:

Whenever the question of performance comes up with scripting languages such
as Ruby, Perl or Python there will be people whose response can be
summarised as "Write it in C". I am one such person. Some people take
offence at this and label us trolls or heretics of the true programming
language (take your pick).

I am assuming here that when people talk about performance they really mean
speed. Some will disagree but this is what I am talking about.

In this post I want to clear some things up and provide benchmarks as to why
you should take "Write it in C" seriously. Personally I like to write my
projects in Ruby or Perl or Python and then convert them to C to get a
performance boost. How much of a boost? Well here I will give you some hard
data and source code so that you can see just how much of a boost C can give
you.

The mini project in question is to generate all the valid Latin squares. A
Latin square is a grid of numbers (lets say a 9 x 9 grid) that has the
numbers 1 to 9 appear only once in each row and column. Sudoku grids are a
subset of Latin squares.

The approach taken is to create a list of all the permutations and then
build up a grid row by row checking that the newly added row does not
conflict with any of the previous rows. If the final row can be added
without problem the solution is printed and the search for the next one
starts. It is in essence depth first search. The first version of the
program that I wrote in Perl took 473 minutes to generate all the valid 5 x
5 Latin squares, version four of the program took 12 minutes and 51 seconds.
The C version of the program took 5.5 seconds to produce identical results.
All run on the same hardware.

--
suffering increases your inner strength. also, the wishing for suffering
makes the suffering disappear.
- h.h. the 14th dali lama

Sorry for coming late to the party.

Whenever the question of performance comes up with scripting languages such as Ruby, Perl or Python there will be people whose response can be summarised as "Write it in C".

The conclusion is wrong in the general case. Suppose that, instead of computing permutations, your task had been to read ten million lines of textual log files and track statistics about certain kinds of events coded in there. I bet a version coded in perl, making straightforward uses of regexes and hashes, would have performance that would be very hard to match in C or any other language. Ruby would be a little slower I bet just because Perl's regex engine is so highly-tuned, although it's been claimed Oniguruma is faster.

So, first gripe: C is faster than Ruby *in certain problem domains*. In others, it's not.

Second gripe. The notion of doing a wholesale rewrite in C is almost certainly wrong. In fact, the notion of doing any kind of serious hacking, without doing some measuring first, is almost always wrong. The *right* way to build software that performs well is to write a natural, idiomatic implementation, trying to avoid stupid design errors but not worrying too much about performance. If it's fast enough, you're done. If it's not fast enough, don't write another line of code till you've used used a profiler and understand what the problem is. If in fact this is the kind of a problem where C is going to do better, chances are you only have to replace 10% of your code to get 90% of the available speedup.

And don't remember to budget downstream maintenance time for the memory-allocation errors and libc dependencies and so on that cause C programs to be subject to periodic downstream crashes.

  -Tim

···

On Jul 26, 2006, at 1:47 AM, Peter Hickman wrote:

Peter Hickman schrieb:

Whenever the question of performance comes up with scripting languages such as Ruby, Perl or Python there will be people whose response can be summarised as "Write it in C". I am one such person. Some people take offence at this and label us trolls or heretics of the true programming language (take your pick).

I am assuming here that when people talk about performance they really mean speed. Some will disagree but this is what I am talking about.

write in VHDL (or Verilog or SystemC), synthesize you piece of
hardware, plugin into PCI, have fun with most performant (in terms of speed)and efficient (in terms of energy consumption) solution
:slight_smile:

[...]

So what am I recommending here, write all your programs in C? No. Write all your programs in Perl? No. Write them in your favourite scripting language to refine the code and then translate it into C if the performance falls short of your requirements. Even if you intend to write it in C all along hacking the code in Perl first allows you to play with the algorithm without having to worry about memory allocation and other such C style house keeping. Good code is good code in any language.

I like C, but in this context C can be replaced with any compiled language.

If you really really want that performance boost then take the following advice very seriously - "Write it in C".

every platform. As I said above any of compiled languages would speed up
the code.

my 2 cents
Regards, Daniel

···

from my point of view C has the only advantage of running on nearly

Ok, I need to preface this by saying that I'm in no way either a C or
ruby guru, so I may have missed something here, but this is what I'm
seeing. The bottle-neck here is in the printing to screen.

I don't see how the OP got the code to run in 5 seconds while using any
stdout writing function (e.g., printf). The program should print like 4
megs of text to the screen (though I couldn't get the C that was posted
to compile--like I said, I'm no C superstar) -- there's no way that is
happening in 5 seconds.

Taking a very simple test case, which just prints a 10 byte char array
15000 times using puts:

···

--------

#include <stdio.h>
#define SIZE 15000
int main() {
  char str[11] = "ABCDEFGHIJ";
  int i;
  for (i = 0; i < SIZE; ++i) {
    puts(str);
  }
  printf("Printed %i bytes of data\n\0", SIZE*strlen(str));
  return 0;
}

--------

Compiled with:

gcc -o test test.c

This yields:

time ./test

ABCDEFGHIJ
ABCDEFGHIJ
...
ABCDEFGHIJ
Printed 150000 bytes of data

real 0m25.621s
user 0m0.010s
sys 0m0.029s

Now, let's see how Ruby's stdout writing stacks up:

--------

#!/usr/bin/ruby -w
SIZE = 15000
str = "ABCDEFGHIJ"
1.upto(SIZE) {
  STDOUT.syswrite("#{str}\n")
}
STDOUT.syswrite("Printed #{SIZE*str.length} bytes of data\n")

--------

This yields:

ABCDEFGHIJ
ABCDEFGHIJ
...
ABCDEFGHIJ
Printed 150000 bytes of data

real 0m26.796s
user 0m0.202s
sys 0m0.049s

Pretty comparable there.

Of course, the bottle-neck was *supposedly* the maths and array access
and such, which is where C would excel (I'm not denying that ruby is
sometimes pretty slow, or that C is *much* faster all around, just bare
with me here).

So with one ruby implementation of the program mentioned on here:

--------

#!/usr/bin/ruby -w

Wd = (ARGV.shift || 5).to_i
$board = []

# Generate all possible valid rows.
Rows = (0...Wd ** Wd)\
       .map { |n| n.to_s(Wd)\
       .rjust(Wd,'0') }\
       .reject{ |s| s =~ /(.).*\1/ }\
       .map { |s| s.split(//)\
                  .map { |n| n.to_i + 1 }
       }

def check (ary, n)
  ary[0, n + 1].transpose.all? { |x| x.size == x.uniq.size }
end

def add_a_row (row_num)
  if (Wd == row_num)
    STDOUT.syswrite($board.map { |row| row.join }.join(':'))
  else
    Rows.size.times { |i|
      $board[row_num] = Rows[i]
      if (check($board, row_num))
        add_a_row(row_num + 1)
      end
    }
  end
end

add_a_row(0)

---------

This took like 48 minutes! Ouch! But if that syswrite (or puts in the
original version) is replaced with a file write:

--------

def add_a_row (row_num)
  if Wd == row_num
    $outfile << $board.map { |row| row.join }.join(':')
  else
    Rows.size.times { |i|
      $board[row_num] = Rows[i]
      if (check($board, row_num))
        add_a_row(row_num + 1)
      end
    }
  end
end

$outfile = File.open('latins.dump', 'wb')
add_a_row(0)
$outfile.close

--------

Now we're down to 16 minutes. Much better! But still...

Ok, so what about the implementation using the permutation and set
classes?

--------

#!/usr/bin/ruby -w

require('permutation')
require('set')

$size = (ARGV.shift || 5).to_i

$perms = Permutation.new($size).map { |p| p.value }
$out = $perms.map { |p| p.map { |v| v+1 }.join }
$filter = $perms.map { |p|
  s = SortedSet.new
  $perms.each_with_index { |o, i|
    o.each_with_index { |v, j| s.add(i) if p[j] == v }
  } && s.to_a
}

$latins = []
def search lines, possibs
  return $latins << lines if lines.size == $size
  possibs.each { |p|
    search lines + [p], (possibs -
                         $filter[p]).subtract(lines.last.to_i..p)
  }
end

search [], SortedSet[*(0...$perms.size)]

$latins.each { |latin|
  $perms.each { |perm|
    perm.each { |p| STDOUT.syswrite($out[latin[p]] + "\n") }
    STDOUT.syswrite("\n")
  }
}

--------

26 minutes...that's still gonna leave a mark. But the file IO version
of the same program?

--------

outfile = File.open('latins.dump', 'wb')
$latins.each { |latin|
  $perms.each { |perm|
    perm.each { |p| outfile << $out[latin[p]] << "\n" }
    outfile << "\n"
  }
}
outfile.close

--------

time ruby test.rb

real 0m17.227s
user 0m13.969s
sys 0m1.399s

17 seconds! WOOHOO! Yes, indeedy. That's more like it. Try it if you
don't believe me.

So the moral of the story is twofold:

1.) Don't assume the bottle-neck is in the interpreter and just run off
and start writing everything in C or Java or simplified Klingon or
whatever. Testing and coverage and profiling is the key to discovering
the true cause of your woes. Here the bottle-neck was in the writing 4+
megs to stdout -- and going to C won't help for that, contra the claims
of the OP.

2.) Don't write a crapload of text to stdout. It won't be as fast as
you'd like it to be no matter what language you use.

========

NB: All testing was done on:

Linux 2.6.17-gentoo-r5 #2 PREEMPT Thu Aug 10 14:21:37 CDT 2006 i686 AMD
Athlon(tm) XP 1600+ AuthenticAMD GNU/Linux

gcc version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)

GNU C Library development release version 2.4

ruby 1.8.5 (2006-08-18) [i686-linux]

========

Regards,
Jordan

Well put. I totally agree with you.

I'd like to mention that you can use inline for this purpose.
Have a look at

In the article, ruby and inlined c version of prime nubmer checking is
compared. As expected, inlined C runs very fast compared to ruby only
version.

I also have an experience of porting to C++. My application
which heavily relied on matrix and log functions took more than
2 days to finish. After I've ported it to C++, it took about 6hrs.
Yes, it was amazing performance boost.

Sincerely,
Minkoo Seo

Peter Hickman-2 wrote:

···

If you really really want that performance boost then take the following
advice very seriously - "Write it in C".

--
View this message in context: http://www.nabble.com/For-performance%2C-write-it-in-C-tf2002706.html#a5963409
Sent from the ruby-talk forum at Nabble.com.

It might be interesting to see how Java fares too - another route
again.

Pete

···

benj...@fysh.org wrote:

Peter Hickman gave a very good article about prototyping in a scripting
language, and then re-coding in c:

*snip*

> If you really really want that performance boost then take the following
> advice very seriously - "Write it in C".

I totally agree that with the current state of the art, this is the
right approach.

Maybe it doesn't need saying, but I'm going to... in the vasy majority
of applications, almost all of their run time is a tiny wee small
fraction of the code. This is the part that I would write in c (or c++).
The vast majority of the code (and it's not there just for fun, it's
still completely important to the application) will use a vanishingly
small fraction of processor time. This is the bit that I would probably
leave in Ruby.

People talk about the 80:20 principle, but in my experience it's much
more like 99:1 for applications. 99% of the code uses 1% of the run
time. 1% of the code consumes 99% of the run time. That could be the
signal processing and graphics heavy applications that I have
experienced though.

Thanks for the comparison, it was great. And thanks for the very nice
pre-generation of look up tables in perl idea. Nice.

Cheers,
  Benjohn

Sorry, I just couldn't resist - but maybe you should code Java instead -
http://kano.net/javabench/ :wink:

···

On 7/26/06, benjohn@fysh.org <benjohn@fysh.org> wrote:

Peter Hickman gave a very good article about prototyping in a scripting
language, and then re-coding in c:

*snip*

> If you really really want that performance boost then take the following
> advice very seriously - "Write it in C".

I totally agree that with the current state of the art, this is the
right approach.

Jay Levitt wrote:

···

On Wed, 26 Jul 2006 17:47:13 +0900, Peter Hickman wrote:

In this post I want to clear some things up and provide benchmarks as to why you should take "Write it in C" seriously.
    
This is a great post, and should at least be posted to a blog somewhere so
the masses who don't know about USENET can still find it on Google!

Jay

I may well put it in my web site, along with all the source code. Google and Yahoo hit it enough.

> In this post I want to clear some things up and provide benchmarks as to
> why you should take "Write it in C" seriously.

This is a great post, and should at least be posted to a blog somewhere so
the masses who don't know about USENET can still find it on Google!

well, I didn't post the original post (though I did link to it). I
did post my take
on it. At it's core: write it in Ruby and if it's too slow, profile
it and rewrite the
slow parts (in C if need be). Rewriting the whole app in C when Ruby makes
cohabitating with C (or C++ or Objective C) so easy just seems pointless.

My post is at On Ruby: RubyInline, Making Making Things Faster Easier

···

On 7/26/06, Jay Levitt <jay+news@jay.fm> wrote:

On Wed, 26 Jul 2006 17:47:13 +0900, Peter Hickman wrote:

Jay

--
thanks,
-pate
-------------------------

This is the "value proposition" of the "Hot Spot" technology in the
Java Virtual Machine. On the fly, it looks for byte code sections that
get executed repeatedly and it then compiles them to object code,
thereby doing runtime optimization. This allows many Java server
processes to run with near-native speeds. When Ruby runs on a virtual
machine, planned for version 2, then Ruby can do that too. The JRuby
project will effectively accomplish the same goal.

···

On 7/26/06, benjohn@fysh.org <benjohn@fysh.org> wrote:

...

People talk about the 80:20 principle, but in my experience it's much
more like 99:1 for applications. 99% of the code uses 1% of the run
time. 1% of the code consumes 99% of the run time. That could be the
signal processing and graphics heavy applications that I have
experienced though.
...

--
Dean Wampler
http://www.objectmentor.com
http://www.aspectprogramming.com
http://www.contract4j.org

I will run your Ruby version and the Java version that I write and post the results here. Give us a week or so as I have other things to be doing.