For performance, write it in C - Part 2, comparing C, Ruby and Java

This is the follow up to my "Write it in C post" and is intended to report the timings for the Java implementation that I said I would write for Charles O Nutter and the Ruby version by Simon Kroeger. First let us deal with the Ruby version.

The program differs from the Perl and C versions in that the various values it requires are not precomputed. Simon's program is completely self contained.

[Latin]$ time ruby latin.rb 5 > r5

real 0m35.793s
user 0m32.081s
sys 0m0.843s

This quite clearly pisses all over the Perl version, and yes the results were correct. Both faster than the Perl version and considerably less code, a testament to the power and expressiveness of Ruby.

Now the Java version. I will be honest here, I might be paid to program in Java but it hasn't been my language of choice since around 1992. I find it gets in my way and today it found yet another way to do it.

A straight translation like the C version worked fine for a 4 x 4 grid but when I got to the 5 x 5 grid I got the following error 'code too large'. Yes Java has hard coded limits as to the allowed size of various data structures within class files and the Compared array of 120 x 120 boolean values could not be initialised with the following code:

private static boolean[][] Compared = {
    {false, false, ...
    ...
    {true, true, ...
};

I had to have a whole load of 'Compared[0][44] = true;' and the like to get the data in. This got the 5 x 5 grid to run but the 6 x 6 grid blew up even that. Java has a 64Kb limit for various structures in the class file (see http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html). The last time that I had to work round such mind numbingly arbitrary limits was when I was programming Quick Basic. Now the timings.

[Latin]$ time ./j_version.sh 5 > j5

real 0m29.553s
user 0m13.813s
sys 0m10.745s

Sorry Java fans but "as fast as C" or "faster than C" it is not. It's only a bit faster than Ruby despite having far more resources dedicated to speeding it up.

The really odd thing here is that Java should actually be much faster than this. I did manage to get the 4 x 4 grid to be written with the same initialisation method as the C version and the timings (admittedly on a much smaller problem) were much closer to the C version for the same 4 x 4 grid. The solution just didn't scale because of the 64Kb limit in the class files, which is probably not going to change any time in the near future.

In the interest of fairness I also looked at the timings of just the execution of the C and Java version so that the performance of the compilers was not impacting the times. So here are the C and Java versions without the precomputing phase and without the compiling.

[Latin]$ time ./latin > /dev/null 2>&1

real 0m1.961s
user 0m1.680s
sys 0m0.051s

[Latin]$ time java Latin > /dev/null 2>&1

real 0m15.483s
user 0m9.641s
sys 0m4.280s

There you have it, C is still faster by an order of magnitude. Performance is yours for the asking, but it comes at a price - you have to write it in C. Ease of development also comes at a price, you don't get the same performance as C. Of course if you have a fear of C this does show that you can go some of the way by converting to Java, if that is fast enough for you then well and good but know this, C is faster.

Peter Hickman wrote:

*snip*

Now the Java version. I will be honest here, I might be paid to program
in Java but it hasn't been my language of choice since around 1992. I
find it gets in my way and today it found yet another way to do it.

*snip*

In the interest of fairness I also looked at the timings of just the
execution of the C and Java version so that the performance of the
compilers was not impacting the times. So here are the C and Java
versions without the precomputing phase and without the compiling.

[Latin]$ time ./latin > /dev/null 2>&1

real 0m1.961s
user 0m1.680s
sys 0m0.051s

[Latin]$ time java Latin > /dev/null 2>&1

real 0m15.483s
user 0m9.641s
sys 0m4.280s

There you have it, C is still faster by an order of magnitude.
Performance is yours for the asking, but it comes at a price - you have
to write it in C. Ease of development also comes at a price, you don't
get the same performance as C. Of course if you have a fear of C this
does show that you can go some of the way by converting to Java, if that
is fast enough for you then well and good but know this, C is faster.

When people want 'speed' they care about how fast the code runs, not
JVM startup time..

Your benchmarks are utterly irrelevant, sorry.

Isak

Peter Hickman wrote:

Now the Java version. I will be honest here, I might be paid to program
in Java but it hasn't been my language of choice since around 1992. I
find it gets in my way and today it found yet another way to do it.

Um, just a nitpick, but Java didn't exist in 1992. Unless you count Oak.

Was the code for the Java and Ruby versions posted somewhere in
the other thread? (I must admit, I tuned a lot of that thread out once it
became a shouting match about the relative speeds of C and Java.)

On 7/28/06, Peter Hickman <peter@semantico.com> wrote:

This is the follow up to my "Write it in C post" and is intended to
report the timings for the Java implementation that I said I would write
for Charles O Nutter and the Ruby version by Simon Kroeger. First let us
deal with the Ruby version.

--
thanks,
-pate
-------------------------

Peter Hickman wrote:

There you have it, C is still faster by an order of magnitude. Performance is yours for the asking, but it comes at a price - you have to write it in C. Ease of development also comes at a price, you don't get the same performance as C. Of course if you have a fear of C this does show that you can go some of the way by converting to Java, if that is fast enough for you then well and good but know this, C is faster.

In my younger days, I did a lot of development in assembler languages, and for many years my main high-level language was FORTRAN. Towards the end of my FORTRAN days (about 1990) I was still dropping into assembler for speed, even though the (FORTRAN) compilers were quite good by that time. C compilers really sucked, especially for numerical applications.

Now here's where I'm going to put on my asbestos suit. I think the difficulty of C development is *vastly* exaggerated by the fans of "dynamic/scripting/interpreted" languages! In addition, I think the difficulty of *assembler* development is vastly exaggerated, except in bizarre architectures. (Of course, x86 does border on bizarre, until you get to 64-bit addressing). :-)

So what is the source of "fear of C?"

Man, I needed a good laugh today. Where to begin...

This is the follow up to my "Write it in C post" and is intended to
report the timings for the Java implementation that I said I would write
for Charles O Nutter and the Ruby version by Simon Kroeger. First let us
deal with the Ruby version.

You start off right, but it's quickly apparent you're setting out to prove
Java claims wrong. You're starting off with a specific intent.

The program differs from the Perl and C versions in that the various
values it requires are not precomputed. Simon's program is completely
self contained.

[Latin]$ time ruby latin.rb 5 > r5

real 0m35.793s
user 0m32.081s
sys 0m0.843s

Not bad, really, but not even as good as the bogus Java numbers below.

This quite clearly pisses all over the Perl version, and yes the results
were correct. Both faster than the Perl version and considerably less
code, a testament to the power and expressiveness of Ruby.

So then Java is obviously more powerful since the bogus numbers are
faster...you can't draw one conclusion from Ruby numbers and another
conclusion from Java numbers. You're serving the food before setting the
table.

Now the Java version. I will be honest here, I might be paid to program
in Java but it hasn't been my language of choice since around 1992. I
find it gets in my way and today it found yet another way to do it.

A straight translation like the C version worked fine for a 4 x 4 grid
but when I got to the 5 x 5 grid I got the following error 'code too
large'. Yes Java has hard coded limits as to the allowed size of various
data structures within class files and the Compared array of 120 x 120
boolean values could not be initialised with the following code:

As another poster noted, Java hasn't been around since 1992, so I think perhaps
you're mistaken.

private static boolean[][] Compared = {
    {false, false, ...
    ...
    {true, true, ...
};

I had to have a whole load of 'Compared[0][44] = true;' and the like to
get the data in. This got the 5 x 5 grid to run but the 6 x 6 grid blew
up even that. Java has a 64Kb limit for various structures in the class
file (see http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html).
The last time that I had to work round such mind numbingly arbitrary
limits was when I was programming Quick Basic. Now the timings.

First off, I call Troll.

Second, you're not a very good Java programmer if you didn't know about this
limit. Perhaps they didn't teach you this in Java class in 1992? (a response
troll, admittedly)

The limit is not arbitrary; it's to allow the JVM to maintain certain
constraints over the memory used by incoming class definitions, since
they're typically not garbage collected. It would not be advisable to allow
loading an extremely large class definition into permanent memory space,
eating up the entirety of the heap. Put your gigantic data in a separate
file and load it at runtime.
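
A minimal sketch of what "load it at runtime" could look like, assuming a hypothetical text file compared.txt that holds one row of the table per line as 0/1 characters. The file name, the format and the class name are all assumptions made for illustration, not anything taken from Peter's program.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class ComparedLoader {
    // Parses rows of '0'/'1' characters into a boolean table.
    // Any character other than '1' is treated as false.
    static boolean[][] parse(List<String> lines) {
        boolean[][] table = new boolean[lines.size()][];
        for (int r = 0; r < lines.size(); r++) {
            String line = lines.get(r).trim();
            table[r] = new boolean[line.length()];
            for (int c = 0; c < line.length(); c++) {
                table[r][c] = line.charAt(c) == '1';
            }
        }
        return table;
    }

    // Reads the table from a file (assumed format: one row per line).
    static boolean[][] load(Path file) throws IOException {
        return parse(Files.readAllLines(file));
    }

    public static void main(String[] args) throws IOException {
        // "compared.txt" is a made-up default name.
        Path file = Paths.get(args.length > 0 ? args[0] : "compared.txt");
        boolean[][] compared = load(file);
        System.out.println("Loaded " + compared.length + " rows");
    }
}
```

Because the table then lives in a data file rather than in the static initialiser, its size no longer counts against the per-method bytecode limit, at the cost of some file I/O at startup.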

[Latin]$ time ./j_version.sh 5 > j5

real 0m29.553s
user 0m13.813s
sys 0m10.745s

Sorry Java fans but "as fast as C" or "faster than C" it is not. It's
only a bit faster than Ruby despite having far more resources
dedicated to speeding it up.

Startup time is and always has been a concern with Java apps, which is why
their area of choice is primarily long-running server-side applications or
somewhat less-long-running desktop applications. For example, would you
benchmark the speed of Excel's calculation algorithms from the time you
start it up until you'd entered the numbers in and told it to calculate? To
do so would be absurd. If you want to put languages on a level playing field
you must remove limitations that each incurs for different reasons than the
others. I'd also remove any Ruby load/parse time before running any
benchmark, since that's skewing numbers too. Are we benchmarking the
performance of the language implementation or benchmarking how fast we can
load Ruby's couple hundred k of executable data versus Java's many megabytes
of base platform code? Compare apples to apples, man, and just benchmark the
algorithm.

The really odd thing here is that Java should actually be much faster
than this. I did manage to get the 4 x 4 grid to be written with the
same initialisation method as the C version and the timings (admittedly
on a much smaller problem) were much closer to the C version for the
same 4 x 4 grid. The solution just didn't scale because of the 64Kb
limit in the class files, which is probably not going to change any
time in the near future.

No, it's not. You shouldn't stuff data into your class files. Class files
are for code.

In the interest of fairness I also looked at the timings of just the
execution of the C and Java version so that the performance of the
compilers was not impacting the times. So here are the C and Java
versions without the precomputing phase and without the compiling.

[Latin]$ time ./latin > /dev/null 2>&1

real 0m1.961s
user 0m1.680s
sys 0m0.051s

I'm actually surprised C wasn't even faster here.

[Latin]$ time java Latin > /dev/null 2>&1

real 0m15.483s
user 0m9.641s
sys 0m4.280s

Some versions of Java have taken as much as 15 seconds to start up on
certain platforms, and the startup time on Linux is frequently slower than
on other platforms. Java 5 on Windows takes perhaps a second to start up
now, primarily because they do use a shared-memory cache of much of the
static data loaded at startup. Of course, it's not a startup cost of zero,
but people simply don't use Java for command-line tools.

There you have it, C is still faster by an order of magnitude.
Performance is yours for the asking, but it comes at a price - you have
to write it in C. Ease of development also comes at a price, you don't
get the same performance as C. Of course if you have a fear of C this
does show that you can go some of the way by converting to Java, if that
is fast enough for you then well and good but know this, C is faster.

I have no fear of C. I have fear of making C work everywhere, which I do not
have to worry about with either Ruby or Java. I also have a fear of C
fanboys giving up on improving Ruby and always advising that people drop to
C for their problems.

On 7/28/06, Peter Hickman <peter@semantico.com> wrote:

--
Contribute to RubySpec! @ Welcome to headius.com
Charles Oliver Nutter @ headius.blogspot.com
Ruby User @ ruby.mn
JRuby Developer @ www.jruby.org
Application Architect @ www.ventera.com

Peter Hickman wrote:

This is the follow up to my "Write it in C post" and is intended to
report the timings for the Java implementation that I said I would write
for Charles O Nutter and the Ruby version by Simon Kroeger. First let us
deal with the Ruby version.

The program differs from the Perl and C versions in that the various
values it requires are not precomputed. Simon's program is completely
self contained.

[Latin]$ time ruby latin.rb 5 > r5

real 0m35.793s
user 0m32.081s
sys 0m0.843s

In another thread Simon Kroeger wrote:

real 0m4.703s
user 0m0.015s
sys 0m0.000s

(this is a 2.13GHz PentiumM, 1GB RAM, forget the user and sys timings, but
'real' is for real, this is WinXP)

Why is it so much slower on Peter Hickman's machine?

Isak Hansen wrote:

When people want 'speed' they care about how fast the code runs, not
JVM startup time..

Your benchmarks are utterly irrelevant, sorry.

Isak

Interesting, so just how do you run a Java program without the JVM start-up time?

And if you can't run a Java program without the JVM start-up then your point is what exactly?

From: Isak Hansen [mailto:isak.hansen@gmail.com]
Sent: Friday, July 28, 2006 1:15 PM

When people want 'speed' they care about how fast the code runs, not
JVM startup time..

Of course each Benchmark is to be taken with a lot of caution, but I
doubt the JVM takes more than 13 seconds to start, even on a slow
machine.

cheers

Simon

Hmm. I left Uni in 1992 and it was around then that my, obviously flaky, memory says I was reading the O'Reilly Java in a Nutshell.

Digs out the brown book. Oh yes it is dated 1996, what the hell was I doing for four years?

Good catch.

pat eyler wrote:

Was the code for the Java and Ruby versions posted somewhere in
the other thread? (I must admit, I tuned a lot of that thread out once it
became a shouting match about the relative speeds of C and Java.)

The Ruby version was posted in the previous thread by Simon. I didn't post the Java version because, code wise, it is pretty much a line for line translation of the C version. But so you can see what the code was like here is the 3 x 3 version. The 5 x 5 version is just too damn big for a post, being as it is 5449 lines long!

public class Latin {
  private static int WidthOfBoard = 3;

  private static int NumberOfPermutations = 6;

  private static String[] OutputStrings = {
    "321",
    "231",
    "213",
    "312",
    "132",
    "123"
  };

  private static boolean[][] Compared = new boolean[6][6];

  private static int[] work = { 0, 0, 0 };

  private static void addARow(int row) {
    if (row == WidthOfBoard) {
      for (int x = 0; x < WidthOfBoard; x++) {
        if (x == 0) {
          System.out.print(OutputStrings[work[x]]);
        } else {
          System.out.print(":" + OutputStrings[work[x]]);
        }
      }
      System.out.println();
    } else {
      for (int x = 0; x < NumberOfPermutations; x++) {
        work[row] = x;

        boolean is_ok = true;
        if (row != 0) {
          for (int y = 0; y < row; y++) {
            if (Compared[work[row]][work[y]] != true) {
              is_ok = false;
              break;
            }
          }
        }
        if (is_ok == true) {
          addARow(row + 1);
        }
      }
    }
  }

  public static void main(String[] args) {
    // This nonsense is to get around the fact that Java will not allow
    // me to initialise an array in the declaration.

    Compared[0][2] = true;
    Compared[0][4] = true;
    Compared[1][3] = true;
    Compared[1][5] = true;
    Compared[2][0] = true;
    Compared[2][4] = true;
    Compared[3][1] = true;
    Compared[3][5] = true;
    Compared[4][0] = true;
    Compared[4][2] = true;
    Compared[5][1] = true;
    Compared[5][3] = true;

    addARow(0);
  }
}
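
For what it's worth, the table does not have to be typed in at all. Judging from the 3 x 3 listing, Compared[i][j] appears to be true exactly when permutations i and j differ in every column, so it can be filled in by a loop. The following is only a sketch under that assumption; the class and method names are made up here and are not from the 5 x 5 program.

```java
class ComparedBuilder {
    // True when the two rows never share a symbol in the same column,
    // i.e. both rows could appear in the same Latin square.
    static boolean compatible(String a, String b) {
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) == b.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Builds the full table from the list of permutation strings.
    static boolean[][] build(String[] rows) {
        boolean[][] compared = new boolean[rows.length][rows.length];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < rows.length; j++) {
                compared[i][j] = compatible(rows[i], rows[j]);
            }
        }
        return compared;
    }

    public static void main(String[] args) {
        // Same permutation order as OutputStrings in the 3 x 3 listing.
        String[] rows = { "321", "231", "213", "312", "132", "123" };
        for (boolean[] line : build(rows)) {
            StringBuilder sb = new StringBuilder();
            for (boolean ok : line) {
                sb.append(ok ? '1' : '0');
            }
            System.out.println(sb);
        }
    }
}
```

With the six strings above this yields the same twelve true entries as the hand-written assignments in main, and the class file stays small however large the grid gets. It does move the precomputation into the timed run, which changes what is being measured.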

Well, there's a number of "shoot yourself in the foot" and "C
pitfalls" type books out there which can give you a few ideas. My
guess is that, often, new C programmers get tripped up regularly on
things like:

* arrays vs. pointers, extern vs. static, and other possibly tricky
spots in the language,
* build issues, like dealing with cryptic makefiles and gcc args (ex.
passing in -lfoo args in the right order),
* discipline with conventions on memory management

But I agree with you that it's not so bad if you use it for what it's
good at. Maybe what's happened is, folks have a bad taste in their
mouth from trying to use C to write end-user apps, when it's really
best at lower-level libs, drivers, and number crunching.

---John

On 7/28/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:

Peter Hickman wrote:
[snip]

Now here's where I'm going to put on my asbestos suit. I think the
difficulty of C development is *vastly* exaggerated by the fans of
"dynamic/scripting/interpreted" languages! [snip]

So what is the source of "fear of C?"

The limit is not arbitrary; it's to allow the JVM to maintain certain
constraints over the memory used by incoming class definitions, since
they're typically not garbage collected. It would not be advisable to allow
loading an extremely large class definition into permanent memory space,
eating up the entirety of the heap. Put your gigantic data in a separate
file and load it at runtime.

I think you might be misunderstanding the usage of "arbitrary" here. An
arbitrary limit, in uses such as this, is one where someone picks a
"magic number" as a limit.

I have no fear of C. I have fear of making C work everywhere, which I do not
have to worry about with either Ruby or Java. I also have a fear of C
fanboys giving up on improving Ruby and always advising that people drop to
C for their problems.

If you really believe there's no worry about portability of Java code,
you haven't been dealing with multiplatform, multi-VM Java deployments
enough.

On Sat, Jul 29, 2006 at 12:21:48AM +0900, Charles O Nutter wrote:

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
print substr("Just another Perl hacker", 0, -2);

Charles O Nutter wrote:

Not bad, really, but not even as good as the bogus Java numbers below.

The Java numbers are what was genuinely produced by the Java system on my computer. What grounds do you have to call them bogus? State clearly how to time a Java program in a way that you will not call bogus and that can also be applied to the other languages here. Remember the Perl interpreter must be loaded and the source compiled each time the Perl version is run; somehow I don't recall you saying that the Perl numbers were bogus.

Second, you're not a very good Java programmer if you didn't know about this
limit. Perhaps they didn't teach you this in Java class in 1992? (a response
troll, admittedly)

I have never encountered this limit before, but then again I have never had the urge to do this sort of program in Java; I used it to do ALife and other simulations. How about you submit your version of the problem in Java? I can then run it on my system in the manner you describe, so that you won't call it 'bogus', and we use those timings. If you are going to question someone's programming prowess then we can only ask that the master himself teaches us how to do it properly.

Some versions of Java have taken as much as 15 seconds to start up on
certain platforms, and the startup time on Linux is frequently slower than
on other platforms. Java 5 on Windows takes perhaps a second to start up
now, primarily because they do use a shared-memory cache of much of the
static data loaded at startup. Of course, it's not a startup cost of zero,
but people simply don't use Java for command-line tools.

Where are you coming from? We use plenty of command line tools that are written in Java here. Tools to parse, validate and transform XML. Schema and RelaxNG checkers. There are plenty of command line tools written in Java that are used daily by people all over the world. Where did you pull 'people simply don't use Java for command-line tools' from?

I look forward to seeing your code.

M. Edward (Ed) Borasky wrote:

So what is the source of "fear of C?"

I'm pretty sure it's manual memory management. Pointers are fun and
dangerous. C is a great language, it gives you enough power to shoot
yourself in the foot.

In addition, I hear it eats babies.
  --mark

--
sic transit gloria et adulescentia
blog | http://blog.hasno.info/blog
wiki | http://wiki.hasno.info

M. Edward (Ed) Borasky wrote:

In my younger days, I did a lot of development in assembler languages,
and for many years my main high-level language was FORTRAN. Towards the
end of my FORTRAN days (about 1990) I was still dropping into assembler
for speed, even though the (FORTRAN) compilers were quite good by that
time. C compilers really sucked, especially for numerical applications.

So you know only clunky, crude, archaic languages, the newest
of which dates back to 1973 or earlier. Someday you ought
to move out of the dark ages and program in Ruby.

Now here's where I'm going to put on my asbestos suit. I think the
difficulty of C development is *vastly* exaggerated by the fans of
"dynamic/scripting/interpreted" languages! In addition, I think the
difficulty of *assembler* development is vastly exaggerated

(Note that Eddie is no fan of Ruby.)

One or more of these is true:

1. You don't program in Ruby.
2. You're not thinking.
3. You're not being honest.
4. You're simple-minded.

Compared to Ruby, programming in assembly language is very
tedious, very error-prone, very time-consuming, and laden
with a multitude of minuscule details. To a lesser degree,
the same is true of C. And assembly code, of course, is not
portable to another processor.

Interesting, so just how do you run a Java program without the JVM start-up time?

You can time it inside java by fetching the system clock before and after.
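
Something along these lines, as a rough sketch. System.nanoTime() is a standard call; solve() is only a stand-in workload (hypothetical, not the Latin square code), and a single cold run like this still includes class loading and JIT warm-up for the code being measured.

```java
class TimedRun {
    // Placeholder workload (hypothetical); swap in the real solver call.
    static long solve() {
        long sum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Runs the task once and returns the elapsed wall-clock time in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Keep the result so the work cannot be discarded as unused.
        long[] result = new long[1];
        long elapsed = timeNanos(() -> result[0] = solve());
        System.out.printf("result %d, elapsed %.3f s%n", result[0], elapsed / 1e9);
    }
}
```

Comparing this figure with the `time java ...` figure for the same run gives a rough idea of how much of the total is JVM start-up rather than the algorithm.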

And if you can't run a Java program without the JVM start-up then your point is
what exactly?

It makes sense to do that for long running applications but in this
case it doesn't. If what you really wanted to do was calculate this
latin squares thing then the startup time matters. Are there any JVMs
around that keep a shared daemon running for all processes to share so
as to avoid some of the startup time?

Pedro.

On 7/28/06, Peter Hickman <peter@semantico.com> wrote:

Peter Hickman wrote:

Isak Hansen wrote:

When people want 'speed' they care about how fast the code runs, not
JVM startup time..

Your benchmarks are utterly irrelevant, sorry.

Isak

Interesting, so just how do you run a Java program without the JVM start-up time?

And if you can't run a Java program without the JVM start-up then your point is what exactly?

It's like measuring database performance and including the startup time of the database server. Sure, MySQL startup is faster than Oracle's, but does it make sense? I don't know...

Regards,
Roland

Kroeger, Simon (ext) wrote:

> From: Isak Hansen [mailto:isak.hansen@gmail.com]
> Sent: Friday, July 28, 2006 1:15 PM

> When people want 'speed' they care about how fast the code runs, not
> JVM startup time..

Of course each Benchmark is to be taken with a lot of caution, but I
doubt the JVM takes more than 13 seconds to start, even on a slow
machine.

cheers

Simon

Startup time varies with the classes that are being loaded, this hello
world comparison is a wild assed guess
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=hello&lang=all

Kroeger, Simon (ext) wrote:

From: Isak Hansen [mailto:isak.hansen@gmail.com] Sent: Friday, July 28, 2006 1:15 PM

When people want 'speed' they care about how fast the code runs, not
JVM startup time..

Of course each Benchmark is to be taken with a lot of caution, but I doubt the JVM takes more than 13 seconds to start, even on a slow
machine.

This may seem like a reasonable assumption, but it really isn't that simple. Read up on how the JVM/Hotspot works, it's interesting stuff really.

Isak

cheers

Simon