Confirm my Performance Test Against Java?

Thank you very much for the excellent answers, and your well reasoned
responses to what could have easily been dismissed as someone "not
getting it" or attempting to start a flame war.

I have quoted various aspects of the responses and added my response or
further comments.

-- May I then ask what made you consider Ruby in the first place?

The reason I'm considering it is because I don't want to blindly choose
Java just because it's the default.

As for why Ruby and not Erlang, Scala, Groovy etc -- the honest answer
is because Ruby is getting so much attention these days, to the point of
religious fervor amongst many I speak to that I need to take an
objective look at it and what it does well.

I greatly dislike religious wars amongst technology - for example Linux,
Mac and Windows - and thus want to understand the objective
benefits/drawbacks as opposed to personal taste.

-- "seeing how well Ruby can pretend to be Java"

My intention is not to see how Ruby can pretend to be Java.

I'm using Java as the point of comparison for a few reasons:

- it's what I have the most expertise in
- it's generally the "default" choice in the types of projects and
development teams I lead
- the language has a very wide range of understanding in the development
world and is therefore a good point of reference to discuss from
- I need a valid point of reference for objective performance
comparisons

That being said, I am trying to figure out what the "Ruby way" is -
which so far is far from clear to me.

I appreciate the reference to the Design Patterns in Ruby book. That is
very much the type of recommendation that will probably help me out, so
thank you.

-- You won't really get a taste for what we mean until you start writing
-- Ruby (real Ruby, not Java ported line-by-line to Ruby).

What example opensource projects can you refer me to which espouse the
"real Ruby" style of doing things?

I'd prefer non-Rails projects, as I understand the completely different
approach of webapp dev with Rails.

I'm looking specifically at Ruby.

I keep getting told that I must understand the "Ruby way" - so I'd
appreciate instruction on how to accomplish the "Ruby way" considering I
am apparently boxed in as a "Java/C style programmer" ... despite
disliking C :)

-- But if you are in a position to be able to throw more hardware at the problem,
-- it does really become a question of CPU time vs programmer time. That
-- is, if Ruby really does cost 3x the CPU of Java, you can calculate in real
-- dollars how much it will cost to use.

If scalability was the only issue, this would be a valid response.

For example, if Java and Ruby both performed single-threaded
transactions at 150ms each, and both scaled to 10 concurrent threads
equally well, but Java continued to scale to 30 concurrent threads and
Ruby did not, then that's a scenario where I can add 3 machines to
scale Ruby horizontally and truly argue that the cost of the hardware is
more than made up for by lower developer costs.

But, "per request" performance does not get improved by this type of
solution.
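
To make the distinction concrete, here is a rough back-of-the-envelope
sketch (the numbers are hypothetical, chosen only to illustrate the
throughput-vs-latency point):

# Horizontal scaling multiplies throughput, but leaves per-request latency untouched.
latency_ms = 150                          # time to serve one request
threads_per_machine = 10
machines = 3

requests_per_sec = machines * threads_per_machine * (1000.0 / latency_ms)
puts "Throughput: ~#{requests_per_sec.round} req/s"   # grows with every machine added
puts "Latency:    #{latency_ms} ms"                   # still 150 ms regardless of machine count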

Adding faster hardware does not make Ruby catch up to Java - since Java
also improves with faster hardware.

This is why the "add hardware" answer doesn't win me over on the
performance issue, because performance and scalability are two
completely different problems. I haven't even begun to test scalability
with Ruby yet.

-- Response time is only part of the story. What you really want to benchmark is
-- requests per second, and that's not always as simple as multiplying response time.

That's correct ... but supports my point. Requests per second is the
throughput, or scalability - not performance.

That is something I can throw hardware at - performance is not.

-- Which is pretty much going to give you benchmark candy. If your site is
-- slowing down, that's a bug. Once the speed of the site is acceptable,
-- and you're set up to handle spikes appropriately, more performance doesn't
-- really buy you anything other than "because you can".

I disagree. If I can cut 30% of the transaction time off of a search
engine request - that is valuable.

It provides a better user experience and (according to Google
and Amazon) increases their usage of the system.

Performance of response (not talking about scalability here but actual
performance) is more than just "bragging rights" or "benchmark candy".
The speed at which an application responds to an end user's request
impacts the overall usability of an application.

It is for this same reason that things such as network compression,
network optimization (CDNs, Akamai route acceleration etc) and client
side caching also all play a role.

In the presentation layer, however, I tend to think the performance
degradation of using Ruby is far less of an issue than in backend services,
since IO plays such a huge role - which is more or less what ThoughtWorks
has concluded from their reported experiences using it.

-- premature optimization is the root of all evil

I 100% agree. Martin Fowler, or someone similar, comes to mind.

-- My preference would be, if I can write 97% of the program in Ruby, and
-- 3% in C, is that really going to be less pleasant than writing 100% of the
-- program in Java?

An interesting observation and one I must consider.

-- when was the last time the type system saved you?

This is a valid and interesting question.

I would suggest that it's not that it is "saving" anything - because there
is nothing to save once the application is running, since the code
can't be compiled if things aren't type-safe.

It's the toolset, as you suspected.

The readability of code to know exactly what type a given argument,
variable or array contains.

The IDE telling me as I type when errors are occurring, what objects
relate to what, navigating through code, etc.

I've tried RubyMine, Aptana and NetBeans. They attempt this dynamic
interpretation but are far from accomplishing it.

For example, code completion in these tools to suggest the available API
methods is almost useless: they offer virtually every method under the
sun because they are not inferring the variable's actual type.
Therefore they'll show me 15 different versions of a
method with the same name, all for different object types from the Ruby
API.

Similarly, looking at an array or collection in Ruby does not tell me
what it is, especially if things are being passed around through
methods, across class boundaries etc. Instead of the method signature
telling me "Collection<ZebraAnimal>" I just see a variable.

Thus, I must now depend on a team of developers properly documenting
everything, using very descriptive naming conventions (and properly
refactoring all of that when changes occur), and wrapping everything in
unit tests.

Now, all of those are "ideal" cases - ones I believe in and stress
continually. I have hundreds and hundreds of unit tests and automated
build servers etc - but in the "real world", getting teams to comment
every method, properly name (and refactor) variable names and cover
everything in unit tests just doesn't happen - unless it's a small team
of very competent people who all believe in the same paradigm and treat
their code as art. I wish that's how all dev teams were, but it's not a
reality.

Perhaps if it's a personal project where I know the code and can ensure
all is covered it's a different story.

-- 100 lines of code is generally easier to read and
-- debug than a thousand.

I'll give you that - but I have yet to see anything that proves to me
that a competent developer working in Java (or C# for that matter)
would write 10x as much code as they would in Ruby.

The "cruft" so often referred to are things that I don't even consider
or think of. Boilerplate code ... clutter and sometimes annoying ...
fades into the background and tools remove the pain of it. And with the
advent of annotations, many of these arguments disappear when Java code
is written correctly with modern patterns and frameworks.

-- I think that particular code sample was misleading -- certainly, you can
-- play Perl Golf in any language. But you have coding conventions in Java, and
-- you would in Ruby.

You surely can, and I'm trying to understand what the coding conventions
are in Ruby. The book link offered is something I'm going to go look at.
Amazon referred another book called "The Ruby Way" which may also
provide me good insights. Any experience with that one?

-- ... best served by Ruby ...
-- small scripts for system administration

I completely agree here.

-- any sort of web app ... that needs to be flexible and constantly maintained and improved,
-- for which the developer controls the hardware.

I'm leaning more and more towards this. In fact, I'm trying to figure
out how to rip Java out of my webapps completely and leave that to the
backend webservices and let the presentation layer be as free from
"code" as possible. Java developers typically aren't exactly the best at
client facing solutions (don't attack me on this if you disagree ...
this is ofcourse not a definitive rule, it's just that I find it more
challenging to hire good web developers who are 'Java' skilled as
opposed to PHP, Ruby, Javascript, CSS, etc).

For example, if I can accomplish a dynamic front-end purely driven by
client side Javascript using AJAX techniques with a REST style
webservices backend, I will try to pursue that.

The middle ground seems to be pursuing Ruby or something else that is
still server-side, but better suited to the always-changing pace of
webapp dev and to the more creative, script-driven coding style of web
developers and designers.

-- desktop app that doesn't need the highest performance possible

What you say makes sense here, but I am so far removed from desktop apps
that I'm useless in weighing in on this.

-- It's about whether it's possible to throw CPUs at the problem, or
-- whether the CPU is the bottleneck at all.

Understood and I agree.

···


I suspect, but I may be totally wrong, that Java takes a while to
'warm up' a more complex runtime environment, and that Ruby gets going
faster. The JVM has evolved in an environment where it has tended to
be used for long-running processes.

Put in a few require statements, load some gems in the ruby source and
then repeat the comparison. The last time I checked, e.g., groovy's
startup time was comparable to ruby's.

Ben,

Thanks for provoking a productive discussion.
I think we all agree that Ruby is slow, very slow, and my impression is that you underestimate the slowdown.

But ...

The "is Ruby fast enough?" discussion suffers from the same flaws as many that precede it ( Fortran vs Assembler, C vs Fortran, C++ vs C, C++ vs Fortran, Java vs C++ ...)

The discussion rests on a faulty assumption:

That language runtime performance will dictate system performance. In fact, that is rarely true, and in performance engineering the truth is much more farcical than anyone might think...

The first time I was paid to write Fortran code I'd been warned that Fortran would be unacceptably slow compared to Assembler. It didn't matter. I simply was not capable of writing sophisticated time series analysis code in Assembler. In fact a large part of that project was built and deployed with GW-Basic on a 6MHz 8086 CPU with 512K of RAM. As a newbie programmer I didn't realize that an interpreted language could not perform, and the app successfully predicted wind shifts in real time in about 5% of the time that had been budgeted. Perhaps with more work experience I would have known better ;)

Since then I've done a bunch of performance critical coding and, over the past few years, a bunch of tuning work.

Ruby's 3x performance penalty is enormous.

But it's dwarfed by the performance degradation caused by typical coding and typical physical architectures.

Two real, typical datapoints, from a list of hundreds ...

In Dec 2008 I tuned a production Rails app that had 100,000 users, improving the client-side build time for the test page
from 2.2 sec to 181 ms (a factor of 12). In an appendix to that project I identified more than a dozen unimplemented tunings that could further lower that build time to about 7 ms.

In 2003 I worked on a similar Java project, and spent much longer tuning a similar dynamic page (running on much slower hardware). The team implemented more than 500 performance fixes over six months, improving page build times from approx 2.5 sec to 14 ms (a factor of about 180).

Neither app was built by weak programmers - in fact they were two extremely smart development teams.

So when you describe a web service that responds (server side) in 140 ms, and ask why you should consider a toolset that might triple that response time, I ask

"has someone else deployed a similar web service that responds in 2 seconds?"
"has someone else deployed a similar web service that responds in 2 ms?"

"what would it take for it to respond in 20 ms?
"what would it take for it to respond in 5 ms?"
"what would it take for it to respond in 1 ms?"

  I hate slow code and slow websites and I resent the time I waste waiting for both.
  But our industry norm is for system response times to be 100x or more slower than they need to be.

You might think "BS", or "OK, but he's talking about the doofus programmers, he's not talking about us."

I'm talking about you, me, all of us.

I don't know anything about the web service that you describe but I will happily wager $50 that we can take any Java web service that is currently running in production, and replace it with a Ruby equivalent that is twice as fast.

Note that I'm not saying "I can". I'm saying that you, me or any smart programmer here can do this.

Here's the thing - I've worked on at least a dozen platform rewrite projects, going back more than 20 years (
"we need a C version of this hand optimized assembler file IO layer"
"Hey build a Java version of this C++/X app",
"we need a web version of this desktop app",
"we need a script version of this compiled app").

Typically there's an accompanying message that management understands that it might be twice as slow.

On every single occasion, the surprising outcome is that the new version, built with a higher level "slower" toolset, outperformed the "stable, optimized, tuned" version, typically by a factor of 3 or more. So I'd be a fool to continue being surprised by this. I'm not saying that I'm a better programmer than anyone. I am saying that the amount of wasted resources in most deployed systems is much, much higher than people realize, for a whole set of reasons.

Thanks for initiating such an interesting conversation, and for persisting with it.

Peter Booth.

Worth noting again that for any long-running code or benchmarks,
you'll want to pass --server to use the optimizing JVM in JRuby. Much
faster.

···

On Wed, Aug 19, 2009 at 9:36 PM, <brabuhr@gmail.com> wrote:

On Wed, Aug 19, 2009 at 5:05 PM, Charles Oliver Nutter <headius@headius.com> wrote:

On Wed, Aug 19, 2009 at 8:31 AM, Ben Christensen <benjchristensen@gmail.com> wrote:

I've started down a path of doing various performance tests to see what
kind of impact will occur by using Ruby and in my first test the numbers
are very poor - so poor that I have to question if I'm doing something
wrong.

1.8.6 is pretty slow, compared to other impls. Ruby 1.9 and JRuby will
perform better, as shown by a few folks. JRuby on a Java 6 JVM with
--fast and --server should perform very well.

And, of course JRuby adds other possibilities:

$ java FileReadParse
Starting to read file...
The number of tokens is: 234937
It took 2098 ms

$ java FileReadParse
Starting to read file...
The number of tokens is: 234937
It took 788 ms

$ ruby -v file_read_parse.rb
ruby 1.8.2 (2004-12-25) [powerpc-darwin8.0]
Starting to read file ...
The number of tokens is: 234937
It took 2666.646 ms

$ jruby -v file_read_parse.rb
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
Client VM 1.5.0_16) [ppc-java]
Starting to read file ...
The number of tokens is: 234937
It took 3120.0 ms

$ jruby --fast --server -v file_read_parse.rb
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
Client VM 1.5.0_16) [ppc-java]
Starting to read file ...
The number of tokens is: 234937
It took 2809.0 ms

$ jruby -v file_read_parse-2.rb
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
Client VM 1.5.0_16) [ppc-java]
Starting to read file...
The number of tokens is: 234937
It took 593 ms

$ java FileReadParse
Starting to read file...
The number of tokens is: 234937
It took 588 ms

$ jruby -v file_read_parse-2.rb
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
Client VM 1.5.0_16) [ppc-java]
Starting to read file...
The number of tokens is: 234937
It took 595 ms

$ cat file_read_parse-2.rb
require 'java'
java_import 'FileReadParse'

FileReadParse.new.do_stuff

:)

Thanks everyone for your responses.

Yes, this test is representative of some of the types of applications
and necessary data processing I have current applications doing and am
needing in some future ones.

The file I'm using is 49MB in size unzipped - too large for me to upload
right now as I'm on a mobile cell network.

To provide context on the file, it contains data such as this:

Western Digital Caviar Special Edition Hard Drive - 80GB - 7200rpm -
Ultra ATA - IDE/EIDE - Internal
Kingston 256MB SDRAM Memory Module - 256MB (1 x 256MB) - 133MHz PC133 -
SDRAM - 144-pin
512Mo (1 x 512Mo) - 133MHz PC133 - SDRAM - 168 broches

Its stats are:

wc /tmp/file_test.txt
1778983 7764115 51084191 /tmp/file_test.txt

This is not a test of "file reading". The test is related to the
performance of iterating over large lists of data and performing
processing on them - such as indexing for searching, cleansing,
normalizing etc.

This is a very small representation of the level of complexity and size
of data I would in reality be dealing with.
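
To make that concrete, the kind of per-token work I'm describing looks
roughly like the sketch below, using the same file as the benchmark (the
normalization rule here is only an illustration, not what the real
system does):

# Build a simple frequency index while normalizing each token.
index = Hash.new(0)

File.open("/tmp/file_test.txt") do |file|
  file.each_line do |line|
    line.split.each do |token|
      normalized = token.downcase.gsub(/[^a-z0-9]/, '')   # crude cleansing step
      index[normalized] += 1 unless normalized.empty?
    end
  end
end

puts "Indexed #{index.size} distinct tokens"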

It seems however that the answer is that this is not what Ruby is well
suited for. Am I correct in that determination?

I will however be continuing my ongoing tests with SOAP/REST webservices
and more CRUD focused webapps, where I expect to see Ruby shine.

···


pharrington, in your response you stated:

"as the code that happens is neither the most "elegant" *nor* fastest
Ruby can do."

Can you please provide me a re-write of the Ruby code I used that is
elegant and fast so I can learn from you?

Hi, I was gone for the day, but numerous people in the thread already
did both, so :)

@Mike

Thank you for providing the Gist link to a file.
(http://gist.github.com/170476)

However, the changes don't improve the performance when I take into
account what was removed that I had in there on purpose. Take note of
item #2 below.

1) Object structure

The modified code removed all of the class/object structure, which I
purposefully had in there to simulate this being an object within a
larger project.

Sticking methods in a class really doesn't simulate an object in a
larger project at all; it's just methods in a class. The general
concept of ***larger project*** isn't really something you can factor
out; it's just how the code ends up needing to be structured for the
task at hand.

That being said, converting the lines of code we're discussing for
performance into a script means nothing to this discussion - but I
purposefully am writing the code in an OO style with classes as opposed
to scripts.

Again, coders code to solve the task at hand. When you say "Yes, this
test is representative of some of the types of applications
and necessary data processing I have current applications doing and am
needing in some future ones" we look at what the code *does*, not
guess at a vague idea of a large project which defining a class is
apparently supposed to imply. The code you posted counts words in a
file. Nothing more; nothing less.

I was also purposefully making the Java and Ruby versions as similar to
each other so as to allow a performance comparison to be done with as
little difference as possible in approaching the code.

If you want to write Java code, then why use something that isn't
Java? This is like taking a C program, trying to emulate as closely as
possible, line-for-line the C code in Erlang (using mutable data and
everything) and then dismissing Erlang because it's worse at being C
than C. Different languages express solutions to different problems in
different ways; that's the whole point.

I guess you just wanted to know whether or not the Ruby interpreter is
generally slower than compiled Java bytecode? Of course it is (I
assumed this was common knowledge (to the point of being a cliche
even) but :). If anyone told you otherwise, I'm sorry you were
blatantly lied to. BUT Ruby lets us *produce* faster and more
accurately, giving us plenty of spare time to optimize the code (even
porting specific parts to C if needed) after we've easily made it
correct.

2) Counting versus Using the Tokens

In the modified code, it is now just counting the tokens:

num += l.split.length

Obviously that is faster than what I had in the original code. Again
however, I'm doing this on purpose.

Counting the number of tokens in and of itself is not all that I was
doing in the original code or in the Java version. To simulate more
closely what actually occurs in a functional system I am:

- assigning the array of tokens to a variable
- iterating the tokens to do something with each of them

In this case I'm just assigning each token to another variable and then
performing the count.

In a real world use I'd perform some function on the text, put it
somewhere, whatever.

In the real world, the "do something with each of them" is the real
juicy part that we want to compare. What is the something? Does the
real world program just end up counting tokens? Then we realize this,
count tokens, and be on our merry way. Is the real world program
taking each word in a text file, comparing relationships against a
lexical database, then based off whatever relationships in context and
calculations constructing a sort of hash to classify a given text
document? String token = tk.nextToken(); numTokens++ does not begin to
describe or "simulate" this, so what is the point of the benchmark?

This change accounts for the difference in time from "7965.289 ms" to
"4821.399 ms" when I run the original code and the modified code.

So yes, the modified code is "faster", but it's not doing the same thing
as the original and therefore not a valid comparison.

The input is the same. The output is the same. The person running your
code does not care if it's object-oriented, procedural, functional, a
script, etc.; he only cares whether he gets the expected output in a
reasonable amount of time when he gives his input. Thus the coder only
cares whether she can code fast enough to give the client the features
he wants, and whether she can do this in a way that's easy to keep up
with his increasing feature demands while keeping the code stable and
fast.

But I dunno, maybe I'm still completely missing the point?

Or, perhaps in your case: 9x% in Ruby, y% in Java.

Example 1:

require 'java'
java_import 'FileReadParse'

FileReadParse.new.do_stuff

Example 2:

require 'java'

java_import 'java.util.StringTokenizer'

File.open("/tmp/file_test.txt") do |file|
  file.each_line do |line|
    tokens = StringTokenizer.new(line)
    tokens.each do |token|
      #do_stuff_with(token)
    end
  end
end

(Though in the token counting case, example 2 is slower than pure
ruby: "tokens = StringTokenizer.new(line)" takes more time than
"tokens = line.split"; a measurement sketch follows example 3.)

Example 3:

require 'java'

java_import 'TokenProcessor'

token_processor = TokenProcessor.new

File.open("/tmp/file_test.txt") do |file|
  file.each_line do |line|
    line.split.each do |token|
      token_processor.process(token)
    end
  end
end
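
As promised above, here is a small sketch (JRuby assumed, since it mixes
Java and Ruby) of measuring that split-vs-StringTokenizer difference
with the stdlib Benchmark module; the sample line and iteration count
are arbitrary:

require 'java'
require 'benchmark'

java_import 'java.util.StringTokenizer'

line = "Kingston 256MB SDRAM Memory Module - 256MB (1 x 256MB) - 133MHz PC133"
n = 100_000

Benchmark.bm(18) do |bm|
  bm.report("line.split")      { n.times { line.split.length } }
  bm.report("StringTokenizer") { n.times { StringTokenizer.new(line).count_tokens } }
end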

There could also be room in your toolbox for Ruby to help in the
testing of your Java code:
http://jtestr.codehaus.org/
http://wiki.github.com/aslakhellesoy/cucumber/jruby-and-java

···

On Thu, Aug 20, 2009 at 11:22 AM, Ben Christensen<benjchristensen@gmail.com> wrote:

-- My preference would be, if I can write 97% of the program in Ruby, and
-- 3% in C, is that really going to be less pleasant than writing 100% of the
-- program in Java?

An interesting observation and one I must consider.

Hi Ben,

Three books come to mind when discussing "real Ruby" style:

"The Ruby Way", by Hal Fulton: The Ruby Way (The oldest of
the three, and I don't *think* it covers 1.9)
"The Well-Grounded Rubyist", by David A. Black: The Well-Grounded Rubyist

"Ruby Best Practices", by Gregory Brown: Ruby Best Practices

For projects, I'd recommend Rake or FasterCSV. There are no doubt many, many
more, but those two are written by very well respected Rubyists, and are
also in widespread use.

HTH,
Mike

···

On Thu, Aug 20, 2009 at 11:22 AM, Ben Christensen <benjchristensen@gmail.com > wrote:

-- You won't really get a taste for what we mean until you start writing
-- Ruby (real Ruby, not Java ported line-by-line to Ruby).

What example opensource projects can you refer me to which espouse the
"real Ruby" style of doing things?

I'd prefer non-Rails projects, as I understand the completely different
approach of webapp dev with Rails.

I'm looking specifically at Ruby.

I keep getting told that I must understand the "Ruby way" - so I'd
appreciate instruction on how to accomplish the "Ruby way" considering I
am apparently boxed in as a "Java/C style programmer" ... despite
disliking C :)

If speed and static typing are must-haves, and if you have already got
a lot invested in the JVM, Scala is very well worth a look. It also
has some features that let you write concise, maintainable code in
much the same way that you could with Ruby.
http://www.cordinc.com/blog/2009/04/combinatorial-iterators-in-jav.html
is an interesting look at the same task implemented in all three
languages.

martin

···

On Thu, Aug 20, 2009 at 8:52 PM, Ben Christensen<benjchristensen@gmail.com> wrote:

-- May I then ask what made you consider Ruby in the first place?

The reason I'm considering it is because I don't want to blindly choose
Java just because it's the default.

As for why Ruby and not Erlang, Scala, Groovy etc -- the honest answer
is because Ruby is getting so much attention these days, to the point of
religious fervor amongst many I speak to that I need to take an
objective look at it and what it does well.

Peter,

Taking your experiences one step further, wouldn't it stand to reason
that if a system is being "rebuilt" with all of the lessons learned, but
with the mature "faster performance" language, it could achieve
higher performance than being rebuilt in a new, less mature, "slower
performance" language?

Your well-founded arguments suggest that many (if not the majority of)
performance issues are in poor design and implementation - not the
language itself. I agree that this is often the case - I find and fix
many of them in the systems I profile. A recent example was an issue
causing 2 orders of magnitude in performance degradation because of
absolutely horrible design - nothing to do with any type of language,
platform or infrastructure.

But if a system is being built by a team capable of achieving the
performance gains you claim with a "slower" toolset, if given a "faster"
toolset, would that same team not accomplish an even better performing
end result?

Of course, I'm not suggesting a difference as extreme as Assembler and C
- which is such a different paradigm that this comparison is very
difficult to do.

Current languages though are so often variations on a theme - rather
than revolutionary changes in approach. For example, working with Ruby
doesn't leave me feeling like I've just experienced some nirvana -- it
feels like just a different approach to things that may or may not
benefit certain tasks -- but principally is not so different from Java
(or Groovy, Scala, C#, Python etc) as to make me feel something earth
shattering has occurred.

Thus, if a team equally skilled in both Ruby (and its "way" of doing
things) and Java could approach a project and avoid the design pitfalls
that cause most of the performance issues you have stated - then
wouldn't the team accomplish higher performance with Java?

Ben

···


That being said, I am trying to figure out what the "Ruby way" is -
which so far is far from clear to me.

[...]

What example opensource projects can you refer me to which espouse the
"real Ruby" style of doing things?

I can't think of any particularly good examples, mainly because...

unless it's a small team
of very competent people who all believe in the same paradigm and treat
their code as art.

I was part of just such a team. We built a set of semi-formal rules, and an
always-outdated document about coding style. Mostly, though, our coding style
evolved together because we were always in each other's code and over each
other's shoulder.

So, unfortunately, I've developed a very visceral and intuitive sense of what
"real Ruby" should be, what's idiomatic, but I find it difficult to express.

I can point to a few things you've probably heard:

- Duck typing. The type and class hierarchy is completely irrelevant. All you
care about is whether the object in question responds to a particular method.
(This means you should more often use #respond_to? rather than #kind_of? if
you're testing your arguments at all; there's a small sketch after this list.)

- Encapsulation. Not as in enforcing what's private, because you can't
(there's always #send and #instance_variable_get), but as in, push the logic
back into the appropriate object, rather than into something operating on that
logic.

- DSLs. Or, less buzzword-y, define what you'd like to be able to do, and then
figure out how to do it. Go by what's most expressive, and most sounds like
English -- treat code as communication. "Code like a girl."

- Don't Repeat Yourself.
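
A tiny, contrived sketch of the duck-typing point (the class and method
names are invented for illustration):

class CsvReport
  def each_row
    yield %w[widget 3 4.50]
    yield %w[gadget 7 2.25]
  end
end

# Cares only about behavior (#each_row), not about class or type hierarchy.
def print_rows(source)
  raise ArgumentError, "needs #each_row" unless source.respond_to?(:each_row)
  source.each_row { |row| puts row.join(", ") }
end

print_rows(CsvReport.new)   # any object that responds to #each_row works here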

I can give you some extreme examples: Rake (or even Capistrano), Hpricot (or
better, Nokogiri), Sinatra, Markaby, and Rspec (or test-spec, etc).

I'm not suggesting you read the source of all of them. Rather, see how they
might be used. Sinatra is a particularly powerful example, especially combined
with Markaby -- though I prefer Haml for real projects. Rails is a fine
framework, but it's beautiful to see a framework dissolve into nothing more
than:

get '/' do
  'Hello, world!'
end

For example, if Java and Ruby both performed single-threaded
transactions at 150ms each, and both scaled to 10 concurrent threads
equally well, but Java continued to scale to 30 concurrent threads and
Ruby did not, then that's a scenario where I can add 3 machines to
scale Ruby horizontally and truly argue that the cost of the hardware is
more than made up for by lower developer costs.

But, "per request" performance does not get improved by this type of
solution.

A good point. Still worth investigating whether Ruby can be "fast enough" for
this. Just for fun, here's a quick presentation:

This is also relevant, as there are plans to merge Merb and Rails at some
point, while retaining the advantages of Merb -- particularly performance.

Adding faster hardware does not make Ruby catch up to Java - since Java
also improves with faster hardware.

Yes, you've said this before -- but it doesn't have to. Take your example
above -- if you can get Ruby under 150 ms, that's good enough. Adding faster
hardware gets Ruby under 150 ms. If it gets Java down to 30 ms, what's the
point?

It provides a better user experience and (according to Google
and Amazon) increases their usage of the system.

I'm curious what the threshold was for this to make a difference.

Certainly, at a certain point, it doesn't. The difference between 16 ms and 0.6
ms would actually be invisible to the human eye. But while 100 ms vs 50 ms may
make a difference, I'm skeptical. Users are annoyed at having to wait a tenth
of a second for a response?

The speed at which an application responds to an end user's request
impacts the overall usability of an application.

It is for this same reason that things such as network compression,
network optimization (CDNs, Akamai route acceleration etc) and client
side caching also all play a role.

These all make sense -- Akamai in particular -- in the context of having a 100
ms response instead of, say, 500 ms or a full second, or in the context of
scalability.

-- when was the last time the type system saved you?

It's the toolset, as you suspected.

The readability of code to know exactly what type a given argument,
variable or array contains.

To me, this falls back into Duck Typing. What type does this argument contain?
Why is this a meaningful question? If I want it to contain a string, for
instance, all I really need to know is whether it responds to #to_s.

More likely, it's a more complex object, but it's still the behavior that I
care about, not the type of it. And this intuitively makes sense -- in the
real world, also. When making a hiring decision, do you care about the "type"
of the person -- their degree, their sex, their skin color? Or do you care
what they can do, and how they'll interact with the rest of the team?

Yes, the degree may be an indication of that, but it's not really what you
care about. And certainly, the other things I mentioned shouldn't enter into
the equation at all.

For example, code completion in these tools to suggest the available API
methods is almost useless: they offer virtually every method under the
sun because they are not inferring the variable's actual type.

Because it probably doesn't have one yet.

While it's a bit different, try running an IRB shell with your favorite
framework loaded and some sort of tab completion. It won't be perfect, but
it'll probably work.
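
Concretely (assuming a stock Ruby install; irb's completion module ships
in the standard library):

# ~/.irbrc -- or start irb with:  irb -rirb/completion
require 'irb/completion'

# In a live session irb knows the actual object, so completion is accurate:
#   >> order = { "total" => 12.50 }
#   >> order.ke<TAB>     # completes to order.keys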

In the mean time, I'm going to say that it isn't an issue for me, simply
because if the framework I'm using is so complex that I need code completion
for daily work, I'm probably using the wrong framework. I can think of some
times it would've been convenient, but not nearly worth having to use one of
these other languages.

Therefore they'll show me 15 different versions of a
method with the same name, all for different object types from the Ruby
API.

Any one of them would probably have been a starting place.

Thus, I must now depend on a team of developers properly documenting
everything, using very descriptive naming conventions (and properly
refactoring all of that when changes occur), and wrapping everything in
unit tests.

These are things you should rely on anyway.

No, not Hungarian notation, but calling the variable something more
descriptive than 'a' and 'b'.

Now, all of those are "ideal" cases - ones I believe in and stress
continually. I have hundreds and hundreds of unit tests and automated
build servers etc - but in the "real world", getting teams to comment
every method, properly name (and refactor) variable names and cover
everything in unit tests just doesn't happen

I don't comment every method. I should comment more than I do, but for
example:

def writable_by? user
  # ...
end

Tell me you don't at least have a guess what that does.

-- 100 lines of code is generally easier to read and
-- debug than a thousand.

I'll give you that - but I have yet to see anything that proves to me
that a competent developer working in Java (or C# for that matter)
would write 10x as much code as they would in Ruby.

It's probably an exaggeration, but not much, though I admittedly have limited
experience in Java. But as an example, how much time do you spend writing
interfaces? Maybe it was the nature of the assignment, but I would guess
easily 20-30% of my time doing Java in school was doing things like writing
interface definitions.

That whole file becomes irrelevant in Ruby.

And I would say the same for Ruby or Python, and to a lesser extent, Perl and
Lisp -- it does end up being _significantly_ less code. I'm learning Lisp now,
and this book:

http://gigamonkeys.com/book

opens with just such an anecdote:

"The original team, writing in FORTRAN, had burned through half the money and
almost all the time allotted to the project with nothing to show for their
efforts... A year later, and using only what was left of the original budget,
his team delivered a working application with features that the original team
had given up any hope of delivering. My dad credits his team's success to
their decision to use Lisp.

"Now, that's just one anecdote. And maybe my dad is wrong about why they
succeeded. Or maybe Lisp was better only in comparison to other languages of
the day..."

I could say the same -- certainly Java is going to be better than FORTRAN. But
you'll still occasionally find the story of the team which beat everyone to
market, or swooped in and rewrote a failing project, or won.

The "cruft" so often referred to are things that I don't even consider
or think of. Boilerplate code ... clutter and sometimes annoying ...
fades into the background and tools remove the pain of it.

I don't think tools would remove the pain of looking at it, at least -- and
yes, it is annoying. Even if the language is going to be statically typed,
consider checked exceptions. If the Java compiler knows enough to know that
I forgot to declare what type of exceptions a method might throw, why do I
have to specify them at all? If it's for the sake of other developers, why
can't the tool tell them?

After all, there are going to be plenty of methods which really wouldn't care
about exceptions -- just let them pass through, let some other layer handle
them.
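
A small sketch of that "just let them pass through" point in Ruby (the
file path and method names here are made up):

# No throws clauses anywhere: the exception simply propagates until some
# layer decides it actually cares.
def read_config(path)
  File.read(path)                       # Errno::ENOENT propagates if the file is missing
end

def load_settings
  read_config("/etc/example_app.conf")
rescue Errno::ENOENT => e
  warn "No config found (#{e.message}); using defaults"
  ""
end

puts load_settings.length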

I also find it telling that with Ruby, I can get by with just a good text
editor -- TextMate for OS X was excellent, though I now use Kate on Linux --
whereas with Java, I would pretty much need a tool just to remove the pain of
the language.

Amazon referred another book called "The Ruby Way" which may also
provide me good insights. Any experience with that one?

None. I did read a book called "The Rails Way" which was excellent, and seems
to be from the same series, but by a different author.

In fact, I'm trying to figure
out how to rip Java out of my webapps completely and leave that to the
backend webservices and let the presentation layer be as free from
"code" as possible.

Look at Haml and Sass. You'll either love it or hate it.

For example, if I can accomplish a dynamic front-end purely driven by
client side Javascript using AJAX techniques with a REST style
webservices backend, I will try to pursue that.

I like jQuery for this. Rails and Merb seem to be moving back towards
integrating this kind of thing -- "link_to_remote" is an old-school example,
and I suspect we'll see more of this sort of thing in the future.

I've also been a big fan of replacing the X in AJAX with either JSON or HTML,
as the situation demands. While it's a bit sloppy, HTML makes sense in that I
can then have all the HTML-generation stuff in the server-side views, where
they belong, and the Javascript on the client is that much simpler. But if I
was writing a richer client, JSON would be ideal, at least until someone shows
me a decent Javascript Yaml library.
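
For instance, a minimal Sinatra action serving JSON for exactly this kind
of client (the route and payload are invented for illustration, and it
assumes the sinatra and json gems are installed):

require 'rubygems'
require 'sinatra'
require 'json'

# A tiny REST-style endpoint the client-side JavaScript can call via AJAX.
get '/products/:id' do
  content_type 'application/json'
  { 'id' => params[:id], 'name' => 'example product' }.to_json
end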

The middle ground seems to be pursuing Ruby or something else that is
still server-side, but better suited to the always-changing pace of
webapp dev and to the more creative, script-driven coding style of web
developers and designers.

I think this would work well with the above. In particular, Rails has been
very REST-oriented for a very long time.

···

On Thursday 20 August 2009 10:22:39 am Ben Christensen wrote:

pharrington, in your response you stated:

"as the code that happens is neither the most "elegant" *nor* fastest
Ruby can do."

Can you please provide me a re-write of the Ruby code I used that is
elegant and fast so I can learn from you?

I consider myself quite advanced in Java (14 years of experience there)
but obviously do not have experience in Ruby for performance tuning and
optimization.

I would appreciate your demonstration of how to perform the task I have
attempted in Ruby using an appropriate "Ruby" approach that achieves the
highest performance possible and the "elegance" spoke of.

Thank you.

Ben

···


Ben -- I've been working with Java since '96 (and taught Java for Sun for a while, so I think I can understand where you may be coming from). At this point, I prefer to write Ruby -- it's much more readable and lots less *crufty* than Java, but Java still pays the bills.

I do have the following questions and/or things to consider --

1. How *often* are you going to be processing these files? If they are batch style jobs, then does absolute speed matter over maintainability?

2. Are there any reasons to not keep the data in a database and then perform queries, etc.?

If you're wanting to do things such as indexing and so forth, Ruby's string handling far outshines, imho, Java's. Ruby's "collections" and enumerables are far more robust as well. As a result, I can spend 5 minutes writing something that would take me 30 or even 60 minutes in Java. Yes, ruby may not be faster in execution time -- of course, as the results show, it depends on how you write it (in one instance it was faster than java), but even if a run takes, say, 1 second longer, it'd have to run 1500 times before the total of java's development and runtime caught up with ruby's. And that's not including maintenance time. Then factor in that developer time is usually far more expensive than cpu time, and Ruby tends to come out in the lead.

What would be a far more fair assessment would be to factor in the amount of time it takes to write a test, as well as the number of lines of code, since size of code tends to increase complexity and also maintenance costs. Then run the two and see which is better.

If you're processing these files in realtime to extract data, etc., then perhaps you'd be better loading them into a database. However, if they're batched, as I expect, by simply comparing "speed of execution" you're looking at only one facet of the problem.

Matt

···

On Thu, 20 Aug 2009, Ben Christensen wrote:

This is not a test of "file reading". The test is related to the
performance of iterating over large lists of data and performing
processing on them - such as indexing for searching, cleansing,
normalizing etc.

This is a very small representation of the level of complexity and size
of data I would in reality be dealing with.

It seems however that the answer is that this is not what Ruby is well
suited for. Am I correct in that determination?

Thank you Martin and Mike for the book references; I will go pursue
further education on the subject from those.

This thread has been very instructional to me and I appreciate your
willingness to discuss this subject.

Have a nice weekend everyone.

Ben

···


Ben,

I think the problem is that we are technologists, so we see our work
through a technical lens. But developing systems is a human activity.

Peter,

Taking your experiences one step further, wouldn't it stand to reason
that if a system is being "rebuilt" with all of the lessons learned, but
with the mature "faster performance" language, it could achieve
higher performance than being rebuilt in a new, less mature, "slower
performance" language?

It wouldn't be true, but it might stand to reason. It would be reasonable
for me to expect that Microsoft Word, on my Dual Core MacBook Pro,
should be 100x as responsive as the first dedicated word processor
that I used 25 years ago.

I think that, in general, systems are as slow as is physically possible
whilst still being "good enough." I suspect that degradations of 2 or 3
orders of magnitude are common today because of current hardware, and that
20 years ago the same systems might have shown only 1 or 2 orders of magnitude.

Your well-founded arguments suggest that many (if not the majority of)
performance issues are in poor design and implementation - not the
language itself. I agree that this is often the case - I find and fix
many of them in the systems I profile. A recent example was an issue
causing 2 orders of magnitude in performance degradation because of
absolutely horrible design - nothing to do with any type of language,
platform or infrastructure.

My work has shifted in recent years from development to more short-term,
fire-fighting performance work. I've found that making a dramatic
performance improvement in minimal time requires being willing (and able)
to work at all layers in the technology stack: web page construction,
application code, server configuration, physical architecture (what runs where),
DB schema and query tuning, OS kernel tuning, TCP stack tuning, the RDBMS,
physical network, hardware, hosting, virtualization, etc.

Unless I'm feeling jaded, I wouldn't say poor design and implementation - rather
incomplete or nonexistent design and physical architecture, and flawed implementation.
A positive spin on this is to say that 95% of startups fail and 70% of IT projects
allegedly fail, thus it's a waste of effort to unnecessarily invest this time until
there is evidence that the system will live.

But if a system is being built by a team capable of achieving the
performance gains you claim with a "slower" toolset, if given a "faster"
toolset, would that same team not accomplish an even better performing
end result?

Not at all. The same team that builds a web service that responds in two seconds
is capable of building the same web service with a response time of 200 ms.
But what incentive do they have, if they don't have a 200ms SLA?

The "bloat factor" is not a measure of developer strength. It seems to be more a function
of the expected performance, and how visible performance is.

Here's an example:

I worked on a project that had more than 100 developers building "portlets",
tiny portal widgets that built a single piece of content. My team was responsible for
performance. After a few months of furiously tuning everything we could, we made
a change that had a profound impact:

We reconfigured the integration instance of our system so that any user could view
the build time of each portlet as a small label on the frame of the portlet. At first,
build times ranged up to a few seconds. Within a few days those developers
whose code was especially slow were proactively asking experienced architects to help
them fix performance issues. Within two months there were no portlets building
in more than 100 ms. The force of social disapproval had a much bigger impact than
buying ten licenses of a Java profiler.

... working with Ruby ..... is not so different from Java ... make me feel
something earth shattering has occurred.

I was a bad/intermediate C++ programmer when I learnt Java. My managers, who were
all much stronger at C++, would say similar things. Java did turn out to be ground-breaking
despite looking visually so much like C++.

The significance of a technology change depends upon how it's used, not just syntax or the obvious feature-set.
I think it took more than a decade to clearly see how Java had changed things for developers.

In 2015 I will be happy to discuss whether or not Ruby makes the earth move for me.

Thus, if a team equally skilled in both Ruby (and its "way" of doing
things) and Java could approach a project and avoid the design pitfalls
that cause most of the performance issues you have stated - then
wouldn't the team accomplish higher performance with Java?

No. Like the guy in the bar said, "If my auntie Velma had a pair, she'd be my uncle ..."

I interviewed with two competing, secretive, equity option dealers in 2004. Each had a team of 3 or 4 developers
who had written their own automated trading system. These dealers competed head to head and used
identical technology (Java, Linux on Intel), and both were successful. The first place had identified that
equity options was an important business to them, knew that the US markets would be most profitable,
and had provided the group with almost a blank check to get the job done. They were proud of their
300 server cluster and the profits they made.

The other team was less optimistic and had a much smaller bankroll. They didn't realize that the US was
the place to be so they tried to deal in all global markets. They built a similar system to the first group
but they had a system workload that was about 10x that of the first company. They were just as proud of
their home-grown system, which made them lots of money, even though it was hosted on a mere 4 servers.

Two teams, similar background, built near-identical systems, and one had a capacity that was
1000x the capacity of the other.

This is not unusual.

Peter.

···

On Aug 21, 2009, at 6:09 PM, Ben Christensen wrote:

Ben

I've spent some time trying to understand this. My impressions ..

Human Factors Research says:
A response is perceived as instantaneous if it occurs within 0.1 or 0.2 sec
A response is perceived as immediate, and won't interrupt concentration, if it occurs within about 0.5 to 1.0 sec
If we speed up a website, then a change that's less than 7% to 18% won't be discerned by a user.

The State of the art
Google measured that a 400ms slowdown measurably reduced the number of searches per user by 0.76%
Shopzilla invested in performance tuning their website and saw revenue increases of 7 to 12%
Bing saw measurable changes in user behavior with slowdowns of 200ms but not with slowdowns of 50ms

State of the Art Data Point: A Google search for "state of art performance testing" from a slow broadband (ADSL 1.5MB/s) IE7 client in Dulles VA has an average response time of 861 ms (entire page)

The State of the practice
The web is slow, unnecessarily slow, sometimes painfully slow.
Most website owners (and most developers) don't know that their websites
are slow, and they don't know how to fix them.

Above Average Data Point: If you search either meetup.com or LinkedIn for "performance testing"
from the same client as above, both have an average response time of 2.0 sec (entire page).

Sources
Designing and Engineering Time by Steven Seow
papers from Xerox Parc
The O'Reilly Velocity Conference proceedings for 2007 to 2009
My clients
Remote test tools like Keynote, Neustar, Gomez etc

···

On Aug 23, 2009, at 6:08 PM, David Masover wrote:

On Thursday 20 August 2009 10:22:39 am Ben Christensen wrote:

It provides a better user experience and (according to Google
and Amazon) increases their usage of the system.

I'm curious what the threshold was for this to make a difference.

Certainly, at a certain point, it doesn't. The difference between 16 ms and 0.6
ms would actually be invisible to the human eye. But while 100 ms vs 50 ms may
make a difference, I'm skeptical. Users are annoyed at having to wait a tenth
of a second for a response?

A version of Mike Sassak's gist

start = Time.now
printf "Starting to read file ...\nThe number of tokens is: %d.\nIt took
%.2f ms\n" , File.open(ARGV[0]){|f| f.inject(0){|a,l| a+l.split.length } } ,
(Time.now - start) * 1000

I won't call it elegant, that seems subjective to me, but I do appreciate
brevity.

···

On Wed, Aug 19, 2009 at 12:52 PM, Ben Christensen <benjchristensen@gmail.com > wrote:

pharrington, in your response you stated:

"as the code that happens is neither the most "elegant" *nor* fastest
Ruby can do."

Can you please provide me a re-write of the Ruby code I used that is
elegant and fast so I can learn from you?

I consider myself quite advanced in Java (14 years of experience there)
but obviously do not have experience in Ruby for performance tuning and
optimization.

I would appreciate your demonstration of how to perform the task I have
attempted in Ruby using an appropriate "Ruby" approach that achieves the
highest performance possible and the "elegance" spoke of.

Thank you.

Ben


My previous version would probably be better like this:

start = Time.now
puts "Starting to read file ..."
puts "The number of tokens is: %d." % File.open(ARGV[0]){|f|
f.inject(0){|a,l| a+l.split.length } } ,
  "It took #{(Time.now - start) * 1000} ms"

That way if the file is enormous, it prints the "starting to read file ..."
immediately.

···

On Wed, Aug 19, 2009 at 1:17 PM, Josh Cheek <josh.cheek@gmail.com> wrote:

A version of Mike Sassak's gist

start = Time.now
printf "Starting to read file ...\nThe number of tokens is: %d.\nIt took
%.2f ms\n" , File.open(ARGV[0]){|f| f.inject(0){|a,l| a+l.split.length } } ,
(Time.now - start) * 1000

I won't call it elegant, that seems subjective to me, but I do appreciate
brevity.

On Wed, Aug 19, 2009 at 12:52 PM, Ben Christensen <benjchristensen@gmail.com> wrote:

pharrington, in your response you stated:

"as the code that happens is neither the most "elegant" *nor* fastest
Ruby can do."

Can you please provide me a re-write of the Ruby code I used that is
elegant and fast so I can learn from you?

I consider myself quite advanced in Java (14 years of experience there)
but obviously do not have experience in Ruby for performance tuning and
optimization.

I would appreciate your demonstration of how to perform the task I have
attempted in Ruby using an appropriate "Ruby" approach that achieves the
highest performance possible and the "elegance" spoke of.

Thank you.

Ben


Just for fun, here's a verbose, somewhat less magical version:

start = Time.now
filename = ARGV.first

puts 'Starting to read file...'

count = 0
File.open filename do |file|
  file.each_line do |line|
    count += line.split.length
  end
end

puts "The number of tokens is: #{count}."
duration = Time.now - start
puts "It took #{duration*1000} ms"

That is intended to be somewhat self-documenting, so a bit more verbose than I
might normally do. It does more or less the same thing, in more or less the
same way. It also seems to be following roughly the pattern you did in Java,
and I find it _much_ more readable.

Of course, it's a short (benchmark) example, so it's difficult to show Ruby
really shining here, unless you want to play golf. But even the readable
version is also far less verbose than the equivalent Java.

···

On Wednesday 19 August 2009 01:45:22 pm Josh Cheek wrote:

My previous version would probably be better like this:

start = Time.now
puts "Starting to read file ..."
puts "The number of tokens is: %d." % File.open(ARGV[0]){|f|
f.inject(0){|a,l| a+l.split.length } } ,
  "It took #{(Time.now - start) * 1000} ms"