Confirm my Performance Test Against Java?

I'm evaluating Ruby for use in a variety of systems that are planned by
default to be Java.

I've started down a path of doing various performance tests to see what
kind of impact will occur by using Ruby and in my first test the numbers
are very poor - so poor that I have to question if I'm doing something
wrong.

I've tried it on both Linux and Mac OS X and get similar performance
numbers on each. The absolute numbers differ with the hardware, but the
ratio between the results is about the same.

Please take a look at my blog post on my test results and view the
source code and let me know if I'm doing something completely wrong with
the Ruby code or execution - or if these are accurate numbers.

http://benjchristensen.com/2009/08/18/initial-impressions-on-ruby-performance/

NOTE: This is not an attempt to start a flame war. This is a legitimate
effort to take a good look at Ruby and let the numbers speak for
themselves in making decisions for what types of applications I can
choose to use Ruby for without sacrificing the performance of a mature
platform such as Java.

Thank you.

Ben

···

--
Posted via http://www.ruby-forum.com/.

Well.... without having put a ton of thought into this... yes, Ruby
(*especially* 1.8 MRI) is slow. No one's going to argue that the Ruby
interpreter is one of the quicker kids around. If performance is the
#1 priority of whatever you'll be developing, Ruby doesn't fit your
needs, and no one will tell you it does. That's what Java (for the
most part) and C are still hanging around for.

What sort of software needs to be developed here?

Ask yourself: is it critical that my code always performs as fast as
possible? Or is the greater concern speed of development and project
maintainability?

Also, as to the benchmark... can you post your /tmp/file_test.txt?
Posting benchmark code isn't very useful if no one can replicate
your results. Reading the whole file into memory may be faster than
reading it line by line (but obviously the wrong thing to do if the
file's enormous, which... 8 secs to read??? I'd better be moved to
tears by the size of it). And I'm not entirely sure what it is you're
trying to benchmark here. Vague benchmarks are fairly useless, as the
code you're timing is never going to be anywhere close to the actual
code you'll write. Are you trying to just compare file reading times?
Benchmark that, and only that. Is there something specific
string-manipulation-wise you want to measure? Then... measure that. Until
your code gets at least halfway specific, just doing a line-by-line
Java-to-Ruby conversion doesn't tell you anything, as the resulting code
is neither the most "elegant" *nor* the fastest Ruby can do.
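
A rough sketch of what "benchmark that, and only that" could look like,
using Ruby's standard Benchmark module. The sample data and variable names
are made up for illustration; swap in the real input file to get
meaningful numbers.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical sample data; substitute the real input file when measuring.
sample = Tempfile.new('file_test')
sample.write("alpha beta gamma\n" * 10_000)
sample.close

contents = nil
tokens = nil

# Time the file read on its own.
read_seconds = Benchmark.realtime { contents = File.read(sample.path) }

# Time the tokenizing on its own, with the data already in memory.
split_seconds = Benchmark.realtime { tokens = contents.split }

puts "read:  #{(read_seconds * 1000).round(3)} ms"
puts "split: #{(split_seconds * 1000).round(3)} ms"
puts "The number of tokens is: #{tokens.size}"
```

Timing the two phases separately shows whether the I/O or the string
handling dominates, which a single start-to-stop timer cannot tell you.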

···

I'm evaluating Ruby for use in a variety of systems that are planned by
default to be Java.

I've started down a path of doing various performance tests to see what
kind of impact will occur by using Ruby and in my first test the numbers
are very poor - so poor that I have to question if I'm doing something
wrong.

Is this test case in any way representative of the tasks you will
actually be performing?

Test file 1:

uname -a

Linux linux116.ctc.com 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 12:03:43
EST 2008 i686 i686 i386 GNU/Linux

java -version

java version "1.6.0_0"
IcedTea6 1.3.1 (6b12-Fedora-EPEL-5) Runtime Environment (build 1.6.0_0-b12)
OpenJDK Server VM (build 1.6.0_0-b12, mixed mode)

java FileReadParse

Starting to read file...
The number of tokens is: 1954
It took 16 ms

ruby -v file_read_parse.rb

ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-linux]
Starting to read file ...
The number of tokens is: 1954
It took 4.951 ms

Test file 2:

java FileReadParse

Starting to read file...
The number of tokens is: 479623
It took 337 ms

ruby file_read_parse.rb

Starting to read file ...
The number of tokens is: 479623
It took 2526.455 ms

ruby file_read_parse-2.rb

Starting to read file ...
It took 588.065 ms
The number of tokens is: 479623

cat file_read_parse-2.rb

puts "Starting to read file ..."
start = Time.now

tokens = File.new("/tmp/file_test.txt").read.scan(/[^\s]+/)
count = tokens.size

stop = Time.now
puts "It took #{(stop - start) * 1000} ms"
puts "The number of tokens is: #{count}"
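
For anyone who wants to reproduce the comparison, here is a small sketch
that times both strategies side by side. It assumes the original
file_read_parse.rb reads line by line and splits each line (its source is
in the linked blog post, not shown here), and it uses generated sample
data rather than the real /tmp/file_test.txt.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical sample input; replace with the real test file.
sample = Tempfile.new('file_test')
sample.write("one two  three\nfour\tfive\n" * 20_000)
sample.close
path = sample.path

by_line = nil
slurped = nil

Benchmark.bm(12) do |bm|
  # Read line by line, splitting each line on whitespace.
  bm.report('line by line') do
    by_line = 0
    File.foreach(path) { |line| by_line += line.split.size }
  end

  # Read the whole file at once and scan for runs of non-whitespace.
  bm.report('read + scan') do
    slurped = File.read(path).scan(/\S+/).size
  end
end

puts "line by line: #{by_line} tokens, read + scan: #{slurped} tokens"
```

Both strategies should report the same token count; only the timings
differ.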

···

Hi Ben,

The point everyone keeps bringing up--whether this benchmark is indicative
of what you will actually be doing with Ruby, and whether it is "fast
enough"--is worth considering for any project, but the fact remains that for
many things, Java is going to execute faster than Ruby. You can certainly
optimize Ruby code (and yes, writing Ruby extensions in C is actually pretty
easy), but that's not why many of us love Ruby. We love it because it allows
you to turn FileReadParse.java into this: http://gist.github.com/170466.
Now, in the spirit of good fun:

$ ruby file_read_parse_2.rb file_read_parse_2.rb
Starting to read file ...
The number of tokens is: 39.
It took 0.189 ms

$ ruby file_read_parse_2.rb FileReadParse.java
Starting to read file ...
The number of tokens is: 159.
It took 0.215 ms

See? :)

Good luck with Ruby, and don't be afraid to ask more questions!
Mike

···

1.9* is significantly better. I did not try JRuby yet.

robert@fussel /cygdrive/c/Temp/frp
$ /cygdrive/c/Programme/Java/jdk1.6.0_14/bin/javac FileReadParse.java

robert@fussel /cygdrive/c/Temp/frp
$ java -cp . FileReadParse
Starting to read file...
The number of tokens is: 1122
It took 16 ms

robert@fussel /cygdrive/c/Temp/frp
$ allruby file_read_parse.rb
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
Starting to read file ...
The number of tokens is: 1122
It took 3.0 ms
ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-cygwin]
Starting to read file ...
The number of tokens is: 1122
It took 2.0 ms

robert@fussel /cygdrive/c/Temp/frp
$ wc file_test.txt
  190 1114 7579 file_test.txt

robert@fussel /cygdrive/c/Temp/frp
$

···

On 19.08.2009 15:31, Ben Christensen wrote:

http://benjchristensen.com/2009/08/18/initial-impressions-on-ruby-performance/

====================================================================

robert@fussel /cygdrive/c/Temp/frp
$ !w
wc file_test.txt x
   95000 557000 3789500 file_test.txt
   68970 404382 2751177 x
  163970 961382 6540677 insgesamt

robert@fussel /cygdrive/c/Temp/frp
$ java -cp . FileReadParse
Starting to read file...
The number of tokens is: 561000
It took 359 ms

robert@fussel /cygdrive/c/Temp/frp
$ !a
allruby file_read_parse.rb
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
Starting to read file ...
The number of tokens is: 561000
It took 1395.0 ms
ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-cygwin]
Starting to read file ...
The number of tokens is: 561000
It took 872.0 ms

robert@fussel /cygdrive/c/Temp/frp

robert@fussel /cygdrive/c/Temp/frp
$ /cygdrive/c/Programme/Java/jdk1.6.0_14/bin/java -server -cp . FileReadParse
Starting to read file...
The number of tokens is: 561000
It took 515 ms

robert@fussel /cygdrive/c/Temp/frp
$

Cheers

  robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

1.8.6 is pretty slow, compared to other impls. Ruby 1.9 and JRuby will
perform better, as shown by a few folks. JRuby on a Java 6 JVM with
--fast and --server should perform very well.

I'm also pretty confident that I can get JRuby within a few times Java
performance for non-numeric CPU-intensive tasks. Just not sure when it
will be a priority to make it happen.

- Charlie

···

On Wed, Aug 19, 2009 at 8:31 AM, Ben Christensen<benjchristensen@gmail.com> wrote:

I've started down a path of doing various performance tests to see what
kind of impact will occur by using Ruby and in my first test the numbers
are very poor - so poor that I have to question if I'm doing something
wrong.

@Matthew K. Williams

-- 1. How *often* are you going to be processing these files? If they are
-- batch style jobs, then does absolute speed matter over maintainability?

The particular application I'm looking at in the future has a virtually
continuous feed of incoming data from multiple concurrent sources.

Thus I'm looking at what language the processing code would be in. My
default go to is Java - but I want to consider Ruby and not blindly just
use what I'm accustomed to before establishing what will likely be in
existence for the next 3-5 years.

In an existing system doing similar data processing, it is indeed a
batch process, but one we would prefer didn't exist. Potentially
doubling its run time isn't appealing, as it's already a thorn in the
side of operations that we throw hardware at to alleviate.

In another system we horizontally cluster and shard data processing as
much as possible to parallelize the effort - and do as much as we can to
optimize performance. For example, daily jobs are required, but the
volume of data progressed to where the old system was taking days to
process a single job - hence the new system which now handles a job in
4-6 hours - and we're looking at other ways of reducing that further but
so far their cost exceeds business value for now.

-- 2. Are there any reasons to not keep the data in a database and then
-- perform queries, etc.?

SQL is far slower at handling this type of processing in most cases with
large volumes of data where the incremental inefficiencies of things
like REGEX and SQL really add up over 10s of millions of executions.

I have recently dealt with a large database (100+ GB) where to achieve
the necessary performance thresholds we finally had to revert to the use
of C to write UDFs in MySQL that could process the data efficiently
without needing to pull the data out of the database, process in Java
then re-insert, and therefore create huge IO burdens. It was an order of
magnitude or two faster using this approach rather than straight SQL
and/or pulling the data out to process externally.

This is a rare thing - this project was the first time I've ever had to
do that due to very unique needs of the project.

Generally however I have Java in asynchronous processes doing the data
processing and manipulation.

The analysis of Ruby performance doing these types of jobs was intended
to find what cost the adoption of Ruby would incur.

It appears that Ruby is not well suited to data processing type
applications from what I've seen and heard so far.

In another simple test I did where I was iterating over a large amount
of data, I was shocked at how poorly the Ruby implementation did. It
seems the looping itself was a very inefficient action in the Ruby
interpreter.

Hopefully this helps provide some context to my questions about Ruby
with regard to batch processing of data.

···


brabuhr@gmail.com wrote:

···

On Wed, Aug 19, 2009 at 9:31 AM, Ben Christensen <benjchristensen@gmail.com> wrote:

I'm evaluating Ruby for use in a variety of systems that are planned by
default to be Java.

I've started down a path of doing various performance tests to see what
kind of impact will occur by using Ruby and in my first test the numbers
are very poor - so poor that I have to question if I'm doing something
wrong.

Is this test case in any way representative of the tasks you will
actually be performing?

If it is, then you should just do
$ time wc approach.txt
   6836 78325 484114 approach.txt

real 0m0.041s
user 0m0.046s
sys 0m0.015s

Argh! That gist should be http://gist.github.com/170476. Sigh...

···


$ java FileReadParse
Starting to read file...
The number of tokens is: 284717
It took 333 ms
rthompso@raker>~

$ ruby wcinline.rb uscities.txt
Starting to read file ...
   284717
It took 211.72 ms
rthompso@raker>~

$ time wc uscities.txt
  141989 284717 7449038 uscities.txt

real 0m0.333s
user 0m0.307s
sys 0m0.006s

$ java -version
java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03)
Java HotSpot(TM) Server VM (build 14.1-b02, mixed mode)

$ ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-linux]

Not sure how Gentoo handles Java, but all other executables on the box are
compiled with CFLAGS="-march=prescott -O2 -g -pipe" and splitdebug enabled.
The machine is dual core:
Linux raker 2.6.30-gentoo-r4 #2 SMP PREEMPT Wed Aug 5 11:51:00 EDT 2009 i686 Intel(R) Core(TM)2 CPU 6320 @ 1.86GHz GenuineIntel GNU/Linux

wcinline.rb quickly hacked from http://en.literateprograms.org/Special:Downloadcode/Word_count_(C)
and examples/ex_1.rb in the remogatto/ffi-inliner repository on GitHub
(commit 90481c869d9fc0778e12218d3ffa83e3f823acaf)

$ cat wcinline.rb
require 'ffi-inliner'

module MyLib
     extend Inliner
     inline '#include <stdio.h>
     #include <ctype.h>

     int n;

     void wc(const char *fname)
     {
         int ch;
         int words=0;
         int sp=1;   /* 1 while the previous character was whitespace */
         FILE *fp;

         /* 055 is the octal code for a hyphen: a leading hyphen means stdin */
         if(fname[0]!=055) fp=fopen(fname, "r");
         else fp=stdin;
         if(!fp) return;   /* void function, so no value is returned */

         while((ch=getc(fp))!=EOF) {
             if(isspace(ch)) sp=1;
             else if(sp) {
                 ++words;
                 sp=0;
             }
         }

         if(fname[0]!=055) fclose(fp);

         printf("% 8d\n", words);
     }'
end

class Foo
     include MyLib
end

# get the start time
start = Time.now

puts "Starting to read file ..."

Foo.new.wc(ARGV[0])

puts "It took " + ((Time.now-start)*1000).to_s + " ms"

···

On 19.08.2009 15:31, Ben Christensen wrote:

And, of course JRuby adds other possibilities:

$ java FileReadParse
Starting to read file...
The number of tokens is: 234937
It took 2098 ms

$ java FileReadParse
Starting to read file...
The number of tokens is: 234937
It took 788 ms

$ ruby -v file_read_parse.rb
ruby 1.8.2 (2004-12-25) [powerpc-darwin8.0]
Starting to read file ...
The number of tokens is: 234937
It took 2666.646 ms

$ jruby -v file_read_parse.rb
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
Client VM 1.5.0_16) [ppc-java]
Starting to read file ...
The number of tokens is: 234937
It took 3120.0 ms

$ jruby --fast --server -v file_read_parse.rb
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
Client VM 1.5.0_16) [ppc-java]
Starting to read file ...
The number of tokens is: 234937
It took 2809.0 ms

$ jruby -v file_read_parse-2.rb
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
Client VM 1.5.0_16) [ppc-java]
Starting to read file...
The number of tokens is: 234937
It took 593 ms

$ java FileReadParse
Starting to read file...
The number of tokens is: 234937
It took 588 ms

$ jruby -v file_read_parse-2.rb
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
Client VM 1.5.0_16) [ppc-java]
Starting to read file...
The number of tokens is: 234937
It took 595 ms

$ cat file_read_parse-2.rb
require 'java'
java_import 'FileReadParse'

FileReadParse.new.do_stuff

:)

···

On Wed, Aug 19, 2009 at 5:05 PM, Charles Oliver Nutter<headius@headius.com> wrote:

On Wed, Aug 19, 2009 at 8:31 AM, Ben Christensen <benjchristensen@gmail.com> wrote:

I've started down a path of doing various performance tests to see what
kind of impact will occur by using Ruby and in my first test the numbers
are very poor - so poor that I have to question if I'm doing something
wrong.

1.8.6 is pretty slow, compared to other impls. Ruby 1.9 and JRuby will
perform better, as shown by a few folks. JRuby on a Java 6 JVM with
--fast and --server should perform very well.

Edging away from the data driven performance analysis, I'd like to
explore some of my more subjective and opinionated thoughts on the
matter since several of your answers have touched on them. I'd very much
appreciate your rebuttals to my following comments (if you feel like
spending the time reading and responding).

I am obviously biased by my long use of the Java platform, and very
likely by my many years of focusing on processing large amounts of data,
writing search engines and other such applications very sensitive to
performance - and thus I have profiled and optimized virtually every
aspect of the stack and code - and to great gains for the user
experience. On the other hand, I've also connected to mainframes where
no amount of code optimization on my end could make any difference in
how things performed as I depended on the result from an external
resource, and I hid that from the user as much as possible with
asynchronous user interface magic as opposed to worrying about code
optimization, language choice or even hardware.

## Fast Enough ##

I often hear that Ruby is "fast enough" or that the performance
difference is not important since IO is generally the real source of
performance issues.

I understand that in certain cases - though when I see potential
performance degradation as multiples (2x, 3x, etc) as opposed to small
percentages, it makes me question the decision to use Ruby much more
than if we were talking percentages of 10-20% - such as 115ms vs 100ms
on a page response.

For example, a Java environment my team has built provides SOAP/REST
webservices for product catalog search - and responsiveness is a very
significant criterion for measuring the success of the product and
system. Therefore, our average server side response time is something we
watch very closely.

It's difficult for me to accept the use of a language or platform which
means I'm taking a significant hit in performance - similar to the first
5-8 years of Java.

Even Java is still improving ... Java 5 to Java 6 was a very noticeable
improvement in performance at the JVM level (I've had 30-80% performance
increases from 5 to 6). Same hardware, same code - noticeable
improvement in performance and thus responsiveness on webservices,
webapps and shorter data processing times.

In the late 90s it was worth it to me to use Java and take the
performance hit - as the benefits were so strong over C for so many
things.

However, I'm still struggling to see the strong reasons to adopt Ruby
when I'm penalized performance-wise and the answer becomes "it's good
enough" or "network IO is usually the slowest aspect, so it's negligible
what Ruby adds" or "CPU is cheap".

Yes, "CPU is cheap", but that applies to Java as well. 6 months ago we
upgraded our hardware (systems were only 18 months old when we upgraded)
and shaved off another 30% just by taking advantage of new CPU
architectures and bus speeds that had changed in the previous year.

Perhaps in a straight-forward CRUD app where all that's being done is
retrieving/storing data in a database where IO truly is the bottleneck
that no amount of optimization can improve - then it doesn't matter and
"good enough" rings true where IO takes 100ms and the Ruby/Java portion
is only 10-20ms on top of the IO.

## C Extensions to Ruby ##

This feels akin to saying in the late 90s that to make Java perform
well, use JNI to use C. To me it defeats the whole purpose of the "Ruby
is simple and pleasant" paradigm. If I have to optimize it with C, then
it's no longer simple or a joy to use.

## Maintainability, Speed of Development and 'Enjoyment' ##

I hear "speed of development", "maintainability", and "enjoyment of
coding" as the reasons to move to Ruby - and to accept the negatives in
performance, tools, libraries etc.

I'm still not sold on these reasons - as I truly enjoy working in the
Java ecosystem of tools, projects, libraries etc - despite what may or
may not be "crufty" or verbose in some aspects of the language itself.

Nor am I convinced yet that managing a codebase over 5-10 years and 40+
developers is any easier with Ruby - Java's static typing and now its
generics (the polar opposite of 'duck typing' in Ruby) are
actually very nice for readability, navigation of code, refactoring and
other such things on such large codebases when so much of it is from
other coders, teams, or just plain old and forgotten about.

Putting aside these more subjective decision points that I do not yet
have the experience to weigh in on with Ruby - if the performance impact
affects the end user, then that is in my opinion an objective point of
concern. Google and Amazon have both publicized how slowdowns of
100-300ms on a user interface affect user experience and how much those
users utilize their sites. I certainly recognize that fact while
operating a hosted search engine. Speed of search directly impacts how
much someone will choose to use it. Slower performance equals increased
friction to use.

Also, no amount of "throwing hardware at it" will make Ruby faster than
throwing the same hardware at Java - which is ultimately I think the
biggest issue I have with the concept. If I throw the fastest hardware I
can at something, I want my user to get the biggest bang for the buck -
not just make up for me using a slower language.

As for "brevity" equalling "maintainability" and "less bugs" - I tend to
disagree when the pursuit results in code such as this example given:

    puts "The number of tokens is: %d." % File.open(ARGV[0]){|f|
    f.inject(0){|a,l| a+l.split.length } } ,
      "It took #{(Time.now - start) * 1000} ms"

I find it intellectually stimulating, and I admire the power of what
is accomplished in such a short statement.

Understanding it however takes time and thought - and a certain level of
skill.

Perhaps your experiences are different - but most development teams have
a lot more junior and intermediate developers than senior ones - and
more verbose, easy to read, simple to understand code is far
preferable - even for myself when I must review, debug and profile the
code of dozens of others - as opposed to something that looks like an
academic puzzle to unravel.
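
For comparison, the one-liner quoted above can be unrolled into a more
verbose form. This is only a sketch of one way to do it, not code from
the thread; count_tokens is a hypothetical helper name.

```ruby
# Count whitespace-separated tokens in a file, one line at a time.
# count_tokens is a hypothetical helper name, not from the thread.
def count_tokens(path)
  total = 0
  File.open(path) do |file|
    file.each_line do |line|
      words = line.split        # split on runs of whitespace
      total += words.length
    end
  end
  total
end

# Only runs when a file name is given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  start = Time.now
  count = count_tokens(ARGV[0])
  elapsed_ms = (Time.now - start) * 1000
  puts "The number of tokens is: #{count}."
  puts "It took #{elapsed_ms} ms"
end
```

It does the same work as the inject version; whether the extra lines make
it easier to maintain is the judgment call being debated here.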

## Concluding Thoughts ##

I recognize that many of my concerns are similar to the C versus Java
argument of 10 years past.

Moving to Java from C had a variety of very significant benefits
however, amongst them being: garbage collection and vastly simplified
memory management, no need to worry about pointers, "write once, run
[mostly] everywhere" (as long as you were talking server-side and not
desktop where Java is miserable) and APIs designed better to address the
networked world of the then new (to the common public anyways) internet.

In moving from Java to Ruby I don't see the benefits as strongly
motivating - and therefore am much less willing to accept performance
penalties.

In short, after waiting through 10+ years of maturation to get the Java
performance now enjoyed, it feels somewhat odd to step back
significantly in performance, tool maturity and available libraries.

What types of applications and codebases do you feel truly are served
best by Ruby - and therefore not in need of the highest performance
given to the end-user?

···


OK, so speed and excellent concurrency handling (non-JRuby Ruby strikes
out across the board on the second aspect, unfortunately) really are
your priorities then. May I then ask what made you consider Ruby in the
first place? Something along the lines of Scala (runs on the JVM,
*relatively* mature, *fantastic* concurrency support, works well with
functional and imperative/object oriented styles) or Erlang (SMP
almost for free, but *almost* strictly immutable and functional coding
(which is much more of a learning curve than a negative)) would make a
ton more sense here.

···

One of the things this 'benchmark' skips over is the time it takes to
initialize the two environments.

There's no measurement of the time between hitting enter on the
command line and the point where the

start = Time.now

line (or its equivalent in the Java program) gets executed.

It might be interesting to execute those benchmarks using something
like the Linux time command to measure the differences in "womb to
tomb" execution times.
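
As a rough illustration of measuring from the outside, the sketch below
times a whole child process, start-up and shutdown included. It launches
an empty Ruby program, so what it reports is close to pure interpreter
overhead; replacing the command with the real benchmark invocation (for
example `java FileReadParse`) is left as an assumption about your setup.

```ruby
require 'benchmark'
require 'rbconfig'

# Path of the Ruby interpreter running this script.
ruby = RbConfig.ruby

# Wall-clock time for a child interpreter to start, run an empty
# program, and exit: roughly the fixed start-up cost per invocation.
startup_seconds = Benchmark.realtime { system(ruby, '-e', '') }

puts "Interpreter start-up and shutdown: #{(startup_seconds * 1000).round(1)} ms"
```

Subtracting this figure from a full run gives a feel for how much of the
total is start-up rather than the work being benchmarked.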

I suspect, but I may be totally wrong, that Java takes a while to
'warm up' a more complex runtime environment, and that Ruby gets going
faster. The JVM has evolved in an environment where it has tended to
be used for long-running processes.

Depending on the task this may be important.

This can also affect deployment architecture. JRuby tends to
encourage multi-threading in a single process to amortize the start-up
time. Most of us who are using MRI for, say, Rails, are pretty happy
with having multiple server processes which can be brought up as
needed (and terminated when not) under something like Passenger
(a.k.a. mod_rails), particularly using the "Enterprise" version of Ruby,
which allows for sharing VM code between the processes.

···

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

I'm evaluating Ruby for use in a variety of systems that are planned by
default to be Java.

I've started down a path of doing various performance tests to see what
kind of impact will occur by using Ruby and in my first test the numbers
are very poor - so poor that I have to question if I'm doing something
wrong.

Is this test case in any way representative of the tasks you will
actually be performing?

If it is, then you should just do
$ time wc approach.txt
6836 78325 484114 approach.txt

:-)

I got a little crazy; first the numbers (slower hardware this time):

uname -a

Linux eXist 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 00:28:35 UTC
2009 i686 GNU/Linux

java -version

java version "1.6.0_0"
OpenJDK Runtime Environment (IcedTea6 1.4.1) (6b14-1.4.1-0ubuntu11)
OpenJDK Client VM (build 14.0-b08, mixed mode, sharing)

java FileReadParse

Starting to read file...
The number of tokens is: 479623
It took 596 ms

/opt/matzruby/trunk/bin/ruby -v -rubygems file_read_parse.rb

ruby 1.9.2dev (2009-08-14 trunk 24539) [i686-linux]
Starting to read file ...
The number of tokens is: 479623
It took 1751.92544 ms

/opt/matzruby/trunk/bin/ruby -v -rubygems file_read_parse-3.rb

ruby 1.9.2dev (2009-08-14 trunk 24539) [i686-linux]
ffi_c.so: warning: method redefined; discarding old inspect
struct.rb:26: warning: method redefined; discarding old offset
variadic.rb:15: warning: method redefined; discarding old call
library.rb:78: warning: method redefined; discarding old fopen
library.rb:78: warning: method redefined; discarding old fgetc
Starting to read file ...
It took 4565.077896 ms
The number of tokens is: 479623

jruby -v -rubygems file_read_parse.rb

jruby 1.3.0 (ruby 1.8.6p287) (2009-06-03 5dc2e22) (OpenJDK Client VM
1.6.0_0) [i386-java]
Starting to read file ...
The number of tokens is: 479623
It took 2316.0 ms

jruby -v -rubygems file_read_parse-3.rb

jruby 1.3.0 (ruby 1.8.6p287) (2009-06-03 5dc2e22) (OpenJDK Client VM
1.6.0_0) [i386-java]
Starting to read file ...
It took 3117.0 ms
The number of tokens is: 479623

And the code:

cat file_read_parse-3.rb

require 'ffi'

module LibC
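  extend FFI::Library
  # Name the C library explicitly; current ffi releases expect an ffi_lib call.
  ffi_lib FFI::Library::LIBC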

  # FILE *fopen(const char *path, const char *mode);
  attach_function :fopen, [ :string, :string ], :pointer

  # int fgetc(FILE *stream);
  attach_function :fgetc, [ :pointer ], :int
end

puts "Starting to read file ..."
start = Time.now

file = LibC.fopen("/tmp/file_test.txt", "r")
count = 0; in_word = false
while (c = LibC.fgetc(file)) != -1
  if 32 < c and c < 127
    unless in_word
      count += 1
      in_word = true
    end
  else
    in_word = false
  end
end

stop = Time.now
puts "It took #{(stop - start) * 1000} ms"
puts "The number of tokens is: #{count}"

···

On Wed, Aug 19, 2009 at 11:07 AM, Reid Thompson <reid.thompson@ateb.com> wrote:

brabuhr@gmail.com wrote:

On Wed, Aug 19, 2009 at 9:31 AM, Ben Christensen <benjchristensen@gmail.com> wrote:

Ha! I wrote it with inject initially, but then thought, "Nah... I don't want
to blow *too* many minds." :-)

···

On Wed, Aug 19, 2009 at 1:26 PM, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

Mike Sassak wrote:

Argh! That gist should be http://gist.github.com/170476. Sigh...

And you can even, with another ounce of ruby-love, rewrite that as:

    num = 0
    ARGF.each do |l|
      num += l.split.length
    end

Then it also works with stdin or multiple filenames on the cmdline.

I'll leave it to others to #inject... ;-)

@Mike

Thank you for providing the Gist link to a file.
(http://gist.github.com/170476)

However, the changes don't improve the performance once I take into
account what was removed, which I had in there on purpose. Take note of
item #2 below.

1) Object structure

The modified code removed all of the class/object structure, which I
purposefully had in there to simulate this being an object within a
larger project.

That being said, converting the lines of code we're discussing for
performance into a script means nothing to this discussion - but I
purposefully am writing the code in an OO style with classes as opposed
to scripts.

I was also purposefully making the Java and Ruby versions as similar to
each other as possible, so that the performance comparison could be done
with as little difference as possible in how the code is approached.

2) Counting versus Using the Tokens

In the modified code, it is now just counting the tokens:

    num += l.split.length

Obviously that is faster than what I had in the original code. Again
however, I'm doing this on purpose.

Counting the number of tokens in and of itself is not all that I was
doing in the original code or in the Java version. To simulate more
closely what actually occurs in a functional system I am:

- assigning the array of tokens to a variable
- iterating the tokens to do something with each of them

In this case I'm just assigning each token to another variable and then
performing the count.

In a real world use I'd perform some function on the text, put it
somewhere, whatever.
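
For readers without the blog post open, the shape being described is roughly the following. This is a hypothetical reconstruction, not the original benchmark source: the class name is borrowed from the Java program's name, and the method name, the temp file and its contents are invented for illustration.

```ruby
require "tempfile"

# Hypothetical reconstruction of the OO-style benchmark described above.
class FileReadParse
  def initialize(path)
    @path = path
  end

  # Reads line by line, keeps each token array in a variable, and touches
  # every token before counting it.
  def count_tokens
    count = 0
    File.foreach(@path) do |line|
      tokens = line.split(" ")
      tokens.each do |token|
        current = token # stand-in for real per-token work
        count += 1
      end
    end
    count
  end
end

# Invented sample input so the sketch runs on its own.
sample = Tempfile.new("file_test")
sample.write("one two three\nfour five\n")
sample.close

puts "The number of tokens is: #{FileReadParse.new(sample.path).count_tokens}"
```

The inner `each` loop is the part that `l.split.length` skips, which is why the two versions are not doing the same work.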

This change accounts for the difference in time from "7965.289 ms" to
"4821.399 ms" when I run the original code and the modified code.

So yes, the modified code is "faster", but it's not doing the same thing
as the original and therefore not a valid comparison.

What I gather therefore from looking at your changes, is that there
really isn't anything different for me to do in the code - that I am in
fact using the proper API calls and techniques and there is nothing
special.

For example, in Java there are 2 ways of doing this:

a) String.split - which uses REGEX and is much slower as it's intended
for pattern matching, not simple tokenization
b) StringTokenizer - intended for tokenization on a delimiter instead of
REGEX and much faster

Therefore, I'm using option (b) in Java. I was curious if I was
mistakenly using a slower technique of Ruby when in fact there was a
faster alternative.
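
One way to answer that question empirically is to benchmark the candidate tokenizers side by side. The sketch below is only an assumption about which alternatives are worth trying; the sample line and iteration count are invented, and the ranking may well differ between MRI 1.8, 1.9 and JRuby, so treat the output as something to measure rather than a known result.

```ruby
require "benchmark"

# Invented sample data: 180 whitespace-separated tokens per line.
line = "the quick brown fox jumps over the lazy dog " * 20
n = 5_000

candidates = {
  "split (no args)" => lambda { line.split },
  "split(' ')"      => lambda { line.split(" ") },
  "split(/\\s+/)"   => lambda { line.split(/\s+/) },
  "scan(/\\S+/)"    => lambda { line.scan(/\S+/) }
}

# Sanity check: every approach should produce the same number of tokens.
counts = candidates.values.map { |code| code.call.length }.uniq
raise "tokenizers disagree: #{counts.inspect}" unless counts.length == 1

Benchmark.bm(16) do |bm|
  candidates.each do |label, code|
    bm.report(label) { n.times { code.call } }
  end
end
```

Calling `split` with no argument or with a single space splits on runs of whitespace without a user-supplied regular expression, which makes it the closest Ruby analogue to StringTokenizer among these.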


    puts "The number of tokens is: %d." % File.open(ARGV[0]){|f|
    f.inject(0){|a,l| a+l.split.length } } ,
      "It took #{(Time.now - start) * 1000} ms"

I find it intellectually stimulating and admirable at the power of what
is accomplished in such a short statement.

Understanding it however takes time and thought - and a certain level of
skill.

Perhaps your experiences are different - but most development teams have
a lot of more junior and intermediate developers than senior - and the
more verbose, easy to read, simple to understand code is far more
preferable - even for myself when I must review, debug and profile the
code of dozens of others - as opposed to something that looks like an
academic puzzle to unravel.

Hello, Ben,
I apologize if my solution turns you off to Ruby (because of the joy and
excitement you'll miss ;-)). Ruby is supportive of many paradigms, and so I
adapt my code to my preference. I also write my Java like this, as much as
the language allows. Certainly you can write code which is much more
straightforward than mine. Due to the way I read my code later, I like to
get as much accomplished in as little room as possible (I'm perfectly happy
to let it run off the end of the screen), and then supply a comment telling
me what it does, and if it is esoteric, explaining how it does this. This
allows me to very quickly quantify chunks of code, and narrow my attention to
only the relevant portions. If those portions are complex, my comment helps
me quickly figure it out. If your junior developers read code differently,
then perhaps a less terse style would be more fitting. Ruby also adapts
itself very well to legibility, if you choose to write it that way (in
Rails, people often remark that the code documents itself).

Choosing the right tool for the job is a relevant cliche here, and it sounds
like your job, being so performance oriented, requires a tool well suited to
meet these performance needs. If that need is so pressing that you've had to
replace Java with C, then Ruby is probably not the tool you need. You can do
things in Ruby that will make your head spin (these simple examples do not
even hint at the power Ruby grants you), but that doesn't make it the right
tool for this job. If Java is better suited to your needs for this project,
then that would certainly be the responsible choice.

However, I'd encourage you to keep evaluating the language, even if it is
not the best choice for this particular project, because it can take a
little time to figure out 'the Ruby way'. And as Pharrington pointed out,
"Different languages express solutions to different problems in
different ways". I think that once you play with it to the point of comfort
and see how Ruby addresses various patterns (consider
http://www.amazon.com/Design-Patterns-Ruby-Addison-Wesley-Professional/dp/0321490452),
you may begin to feel that same love some of us here do. And then, I suspect
that instead of seeing how well Ruby can pretend to be Java, you'll find
yourself wondering if Java can't be a bit more like Ruby. Perhaps at that
time, a solution like JRuby would look very desirable. Also, great strides
are being made in regards to speed, which would significantly alleviate the
most relevant objection.

Anyway, regardless of your choice, thank you for taking the time to educate
yourself about Ruby.

First, I should say that I'm going to present arguments for Ruby here, whether
or not I think it's the best choice. Without knowing what you need, I really
can't say.

Yes, "CPU is cheap", but that applies to Java as well.

But if you are in a position to be able to throw more hardware at the problem,
it does really become a question of CPU time vs programmer time. That is, if
Ruby really does cost 3x the CPU of Java, you can calculate in real dollars
how much it will cost to use.

Speaking for myself, I certainly feel Ruby makes me more than three times as
productive as Java, and programmer time is much more expensive than CPU time.
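
To make the "real dollars" calculation concrete, here is a back-of-the-envelope sketch. Every figure in it is a made-up placeholder (server cost, developer cost, fleet size); only the 3x CPU ratio comes from this thread. Plug in your own numbers.

```ruby
# All figures are hypothetical placeholders, not measurements.
SERVER_COST_PER_YEAR    = 3_000.0
DEVELOPER_COST_PER_YEAR = 100_000.0

# Extra yearly hardware spend if Ruby needs cpu_ratio times the servers Java does.
def extra_hardware_cost(java_servers, cpu_ratio)
  (java_servers * cpu_ratio - java_servers) * SERVER_COST_PER_YEAR
end

extra = extra_hardware_cost(10, 3)
break_even = extra / DEVELOPER_COST_PER_YEAR

puts "Extra hardware per year: $#{extra.round}"
puts "Break-even: #{break_even} developer-years saved per year"
```

With these invented numbers, Ruby pays for itself if it saves a bit more than half a developer-year annually; with a much larger fleet the answer flips, which is the point made about large organizations below.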

Obviously, there are cases where spending programmer time makes sense. For
example, performance-critical desktop apps (CAD, games, etc) cannot use the
"throw more CPU at it" argument, because you're now forcing your clients to
upgrade their hardware to use your product. A large enough organization might
want to optimize as much as they can -- if rewriting it in C saves a thousand
machines and takes a developer a year, it's probably worth it, unless you can
get a thousand machines cheaper than a developer.

On the other hand, there's a case to be made that you should "write one to
throw away" -- if you can do it in Ruby in a few days, and rewrite it in Java
in a few weeks, the rewrite can take lessons learned from that Ruby sketch.

It would also give you time to evaluate the "fast enough" argument. If it
turns out that you have plenty of extra capacity, and the Ruby version runs
fast enough, it may not be worth rewriting. If it turns out that Ruby is too
slow (even after trying Ruby 1.9 and JRuby), you've only lost a few days.

Perhaps in a straight-forward CRUD app where all that's being done is
retrieving/storing data in a database where IO truly is the bottleneck
that no amount of optimization can improve - then it doesn't matter and
"good enough" rings true where IO takes 100ms and the Ruby/Java portion
is only 10-20ms on top of the IO.

That depends...

Response time is only part of the story. What you really want to benchmark is
requests per second, and that's not always as simple as multiplying response
time.
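
As a sketch of why the two numbers differ: under steady load, throughput is roughly the number of requests in flight divided by the response time (Little's law). The concurrency figure below is an assumption for illustration; the 120 ms is just the 100 ms of IO plus 20 ms of language overhead from the quoted example.

```ruby
# Little's law, rearranged: throughput = requests in flight / response time.
# Assumes steady state and that the server can sustain the given concurrency.
def requests_per_second(concurrent_requests, response_time_ms)
  concurrent_requests / (response_time_ms / 1000.0)
end

puts "1 in flight at 120 ms:  #{requests_per_second(1, 120).round(1)} req/s"
puts "16 in flight at 120 ms: #{requests_per_second(16, 120).round(1)} req/s"
```

If most of the 120 ms is spent waiting on IO, a server can keep many requests in flight at once, so requests per second depends far more on how much CPU each request burns than on the response time alone.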

## C Extensions to Ruby ##

This feels akin to saying in the late 90s that to make Java perform
well, use JNI to use C. To me it defeats the whole purpose of the "Ruby
is simple and pleasant" paradigm. If I have to optimize it with C, then
it's no longer simple or a joy to use.

It's a bit like C -- it's going to be fast enough most of the time, but
there's always the possibility you'll find some small part that can be
rewritten in assembly to squeeze some extra performance out of it.

Most of what I said could be summarized as: The speed of a nonworking program
is irrelevant, and premature optimization is the root of all evil. (I don't
remember who I'm quoting, but it was someone important.)

My preference would be, if I can write 97% of the program in Ruby, and 3% in
C, is that really going to be less pleasant than writing 100% of the program
in Java?

Nor am I convinced yet that managing a codebase over 5-10 years and 40+
developers is any easier with Ruby - Java's static typing and now its
generics (the polar opposite approach of 'duck typing' in Ruby) are
actually very nice for readability, navigation of code, refactoring and
other such things on such large codebases when so much of it is from
other coders, teams, or just plain old and forgotten about.

I suspect most of that is due to the tools, more than the type system itself.

The most compelling argument I've heard is: Type checks are just a special
case of unit tests. You need unit tests anyway, and good unit tests will
already more than cover what type checks were meant to save you from.

I suppose I'm curious -- when was the last time the type system saved you?
That is, when was the last time you tried to pass an object of the wrong type
to a method, got a type error of some sort, and realized you needed to
do something other than a simple typecast on that object?
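
A small illustration of the "type checks are a special case of unit tests" argument, as a sketch only. The helper `count_tokens` and the tiny `assert_equal` are hypothetical stand-ins I wrote for this example; a real project would use Test::Unit or similar.

```ruby
require "stringio"

# Hypothetical duck-typed helper: accepts anything that responds to each_line.
def count_tokens(source)
  total = 0
  source.each_line { |line| total += line.split.length }
  total
end

# Minimal stand-in for a unit-test framework assertion.
def assert_equal(expected, actual)
  raise "expected #{expected.inspect}, got #{actual.inspect}" unless expected == actual
end

# Exercising the behaviour also exercises the "types" it works with.
assert_equal 3, count_tokens("a b\nc\n")
assert_equal 3, count_tokens(StringIO.new("a b\nc\n"))

# An object that cannot play the role fails immediately and loudly.
begin
  count_tokens(42)
  raise "expected NoMethodError for an Integer"
rescue NoMethodError
  puts "Integer rejected, as a type checker would have done"
end
```

The tests never mention a class name, yet they would catch the same mistake a static check on the parameter type would, and they also verify the result.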

Also, no amount of "throwing hardware at it" will make Ruby faster than
throwing the same hardware at Java - which is ultimately I think the
biggest issue I have with the concept.

Indeed -- but again, you're paying for it with increased developer time.

And throwing the same hardware at Java that you would need for Ruby just gives
you a bunch of unused capacity -- you'd be buying less hardware. So it is
pretty much a straight trade.

If I throw the fastest hardware I
can at something, I want my user to get the biggest bang for the buck -

Which is pretty much going to give you benchmark candy. If your site is
slowing down, that's a bug. Once the speed of the site is acceptable, and
you're set up to handle spikes appropriately, more performance doesn't really
buy you anything other than "because you can".

As for "brevity" equalling "maintainability" and "less bugs" - I tend to
disagree when the pursuit results in code such as this example given:

    puts "The number of tokens is: %d." % File.open(ARGV[0]){|f|
    f.inject(0){|a,l| a+l.split.length } } ,
      "It took #{(Time.now - start) * 1000} ms"

Probably true for that -- though, to be fair, I wouldn't have put it that way.

However, there was a study done, at one point, which showed that the ratio of
bugs to lines of code was constant across languages. So while I wouldn't say
it makes sense to make things unreadably brief, Ruby is usually both more
readable and shorter. 100 lines of code is generally easier to read and debug
than a thousand.

Perhaps your experiences are different - but most development teams have
a lot of more junior and intermediate developers than senior - and the
more verbose, easy to read, simple to understand code is far more
preferable - even for myself when I must review, debug and profile the
code of dozens of others - as opposed to something that looks like an
academic puzzle to unravel.

I think that particular code sample was misleading -- certainly, you can play
Perl Golf in any language. But you have coding conventions in Java, and you
would in Ruby.

What types of applications and codebases do you feel truly are served
best by Ruby - and therefore not in need of the highest performance
given to the end-user?

I feel the kind that is served best is any sort of web app, or small scripts
for system administration -- particularly anything that needs to be flexible
and constantly maintained and improved, for which the developer controls the
hardware.

Were there better deployment tools (and Shoes seems to be an effort in that
direction), I'd also say any sort of desktop app that doesn't need the highest
performance possible. Frankly, that's most of them -- an instant messaging
client, for instance, doesn't need to be blazingly fast, just fast enough.

But, it's not always about whether the end-user needs the highest performance.
It's about whether it's possible to throw CPUs at the problem, or whether the
CPU is the bottleneck at all.

···

On Thursday 20 August 2009 12:25:58 am Ben Christensen wrote:

Ben Christensen wrote:

## Fast Enough ##

I often hear that Ruby is "fast enough" or that the performance
difference is not important since IO is generally the real source of
performance issues.

I understand that in certain cases - though when I see potential
performance degradation as multiples (2x, 3x, etc) as opposed to small
percentages, it makes me question the decision to use Ruby much more
than if we were talking percentages of 10-20% - such as 115ms vs 100ms
on a page response.

For example, a java environment my team has built provides SOAP/REST
webservices for product catalog search - and responsiveness is a very
significant measurement criteria of the success of the product and
system. Therefore, our average server side response time is something we
watch very closely.

And in that case, your response time may be dominated by the time to
process the SOAP request and format the SOAP response, so evaluating the
performance of those libraries is important.

I still don't buy the "fastest to execute must be best" reasoning. In
reality, there will be a threshold of acceptability - e.g. 90% of
requests must be returned within 150ms - in which case you're free to
choose a platform which meets that goal, and/or apply money to the
hardware platform as required.
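
Checking a threshold like that is straightforward once response times are logged. The following is a minimal sketch using the nearest-rank percentile method; the sample timings are invented, and the 90% / 150 ms target is simply the example figure from the paragraph above.

```ruby
# Nearest-rank percentile: the smallest sample such that at least pct percent
# of all samples are less than or equal to it.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

def meets_target?(samples_ms, pct, limit_ms)
  percentile(samples_ms, pct) <= limit_ms
end

# Invented response times in milliseconds, including one slow outlier.
response_times_ms = [80, 95, 110, 120, 90, 140, 100, 105, 400, 85]

p90 = percentile(response_times_ms, 90)
puts "90th percentile: #{p90} ms"
puts "Meets 150 ms target: #{meets_target?(response_times_ms, 90, 150)}"
```

Note that the 400 ms outlier does not fail the target here, whereas it would drag an average upward; that is one reason to state the goal as a percentile.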

However, if your competitor using Ruby gets a product to market in one
third of the time, then it doesn't matter if your Java solution performs
50% better - you won't have any customers.

## Maintainability, Speed of Development and 'Enjoyment' ##

I hear "speed of development", "maintainability", and "enjoyment of
coding" as the reasons to move to Ruby - and to accept the negatives in
performance, tools, libraries etc.

I'm still not sold on these reasons - as I truly enjoy working in the
Java ecosystem of tools, projects, libraries etc

Then you have no need to ask anything more here. You're sold on Java,
you're productive with Java, you enjoy Java, so use Java.

But computing is not static. There are probably still people who use
Algol and Fortran daily, and they are Turing-complete of course, but
newer languages make programming easier. You found the same when you
moved from C to Java, so you can see the benefits of keeping abreast of
new developments. It's always good to stretch yourself out of your
comfort zone to experience how people are using different languages
effectively. Perhaps you should try something more radically different
for a C/Java programmer, like Erlang.

Nor am I convinced yet that managing a codebase over 5-10 years and 40+
developers is any easier with Ruby - Java's static typing and now its
generics (the polar opposite approach of 'duck typing' in Ruby) are
actually very nice for readability, navigation of code, refactoring and
other such things on such large codebases

Again, if you find this a benefit, then go with Java. Most people here
find the opposite, but then, you're on a ruby users' mailing list so
what do you expect :-)

That is: most of us hugely value Ruby's speed of development (like
writing the same thing in 1/10th of the number of lines of code), and if
there's a run-time performance penalty, that's a minor issue.

You won't really get a taste for what we mean until you start writing
Ruby (real Ruby, not Java ported line-by-line to Ruby). Perhaps you
could start with using jruby to wire up your Java objects. You'll get
the raw performance of the underlying Java objects, but using jruby as
the integration glue.

As for "brevity" equalling "maintainability" and "less bugs" - I tend to
disagree when the pursuit results in code such as this example given:

    puts "The number of tokens is: %d." % File.open(ARGV[0]){|f|
    f.inject(0){|a,l| a+l.split.length } } ,
      "It took #{(Time.now - start) * 1000} ms"

I find it intellectually stimulating and admirable at the power of what
is accomplished in such a short statement.

Understanding it however takes time and thought - and a certain level of
skill.

I agree with you, this is an unnecessary use of inject, and people who
push this sort of code at newcomers are not doing any favours. I would
write this simply as:

  tokens = 0
  File.foreach(ARGV[0]) do |line|
    tokens += line.split.length
  end
  puts "The number of tokens is: #{tokens}"

Regards,

Brian.
