First, I should point out that I'm new to Ruby, although it seems pretty
similar in some regards to JavaScript and Perl.
Anyway, I'm not sure if it's normal, or specific to Ruby on Mac OS X, or
if I haven't compiled it properly. I used rvm to install it on OS X Lion
to get the latest version (Lion comes with 1.8.7 by default, I believe,
so I installed 1.9.3). Either way, most things I used to do in Perl run
about twice as slow when I replicate them in Ruby. So I ran some basic
benchmarks. Here's one example:
Ruby:
for a in 0..1E8
  a*2
end
Perl:
for $a ( 0..1E8 ) {
  $a*2;
}
Ruby takes 22 seconds, Perl - 9 seconds to execute this. This is very
similar to all other scenarios I tried (one of which is splitting
millions of comma separated rows into arrays).
I would really appreciate any useful suggestions: I would LOVE to be
able to use Ruby for most of the stuff I do (it's not that I don't like
Perl, but I love Ruby's syntax )
Here's another example with significantly bigger performance difference:
Ruby:
s = "This is a test string"
re = Regexp.new( / test / )
for a in 0..1E7
  re.match( s )
end
Perl:
my $s = "This is a test string";
for my $a ( 0..1E7 ) {
  $s =~ / test /;
}
Perl takes about 1.5 seconds to execute this, while Ruby takes a
whopping 16!!! :((( I have a very strong feeling that I didn't compile
Ruby properly - there can't be such a huge difference in regexp matching
Choosing the right language is a lot less important than choosing the right algorithm:
5461 % time ruby -e 'n = 10**8; p (n + 3*n**2 + 2*n**3)/6'
333333338333333350000000
real 0m0.009s
user 0m0.004s
sys 0m0.004s
In most cases (depending on the domain, of course (*)), Ruby is "fast enough". Often, with my slower Ruby, I'll finish coding long before you would in your faster language. This coding-time difference is usually sufficient to make up for run-time differences.
*) your domain is fast enough unless you work for the IRS, NASA, Wall Street **, or Pixar ***.
**) sufficient examples exist to show that those domains are also fast enough.
***) prolly not here tho.
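The one-liner above appears to be the closed form for 1² + 2² + ... + n², that is n(n+1)(2n+1)/6 written out as (n + 3n² + 2n³)/6. A minimal sketch of the comparison it implies, with method names invented here for illustration:

```ruby
# Sum of squares 1..n computed two ways.

# O(n): visits every integer, like the benchmark loops in this thread.
def sum_of_squares_loop(n)
  total = 0
  i = 1
  while i <= n
    total += i * i
    i += 1
  end
  total
end

# O(1): closed form n(n+1)(2n+1)/6, expanded as in the one-liner above.
def sum_of_squares_closed(n)
  (n + 3 * n**2 + 2 * n**3) / 6
end

n = 10_000
puts sum_of_squares_loop(n) == sum_of_squares_closed(n)
```

The loop version does work proportional to n, while the closed form costs the same for any n, which is why the one-liner finishes in milliseconds.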
···
On Feb 2, 2012, at 14:20 , Dmitry Nikiforov wrote:
Ruby:
for a in 0..1E8
a*2
end
$ cat mult.rb
# for a in 0..100000000
#   a*2
# end
require 'rubygems'
require 'inline'

class Multiply
  inline do |builder|
    builder.c "
      long mult(int max) {
        long ctr = 0;
        unsigned long long result = 0;
        while (ctr < max) { result = (ctr++ * 2); }
        return result;
      }"
  end
end
···
On 2/2/2012 5:20 PM, Dmitry Nikiforov wrote:
Hello all!
Here's a question: when I say "for a in ( 0..1E8 )" - does Ruby create
an array and populate it with values from 0 through 1E8, or does it
merely create a counter similar to "for( a = 0; a<=1e8; a++ )" ?
Tried Rubinius and JRuby. Rubinius so far is the fastest one, but still
slower than Perl. The empty "for" loop runs about as fast as "while" or
.each (10 seconds - Rubinius, 3.5 seconds - Perl), although .times
takes only 6 seconds.
Regexp match using / test /.match works about the same as s =~ / test /,
and takes about 5 seconds, vs. Perl's 1.4s (1e7 repetitions). Although,
seeing as there's such a huge difference in just the empty loop alone,
it's hard to tell if it's because loops themselves are slower in Ruby,
or if it's because of the regexp engine...
Curious: Rubinius reports itself as 2.0.0dev ( 1.8.7 ), which is strange
- Ruby 1.8.7 does not support \p{} regexps (like \p{Alnum} for example),
and Rubinius does..
It's all the parens, whitespace, and use of tabs that slows ruby down:
# takes 26.6 seconds on my laptop:
s = "This is a test string"
re = Regexp.new( / test / )
for a in 0..1E7
  re.match( s )
end
# takes 8.67 seconds on my laptop:
s = "This is a test string"
for a in 0..1E7
  s =~ / test /
end
···
On Feb 2, 2012, at 14:55 , Dmitry Nikiforov wrote:
My code was merely an example of a very simple loop. Its purpose was not
to calculate something, but to run through the loop and execute a
multiplication on every iteration.
My main area of development is processing of rather large amounts of
data (billions of entries, primarily processed by regular expressions,
with some statistical analysis on top, and potentially - addition of NLP
later). You _have_ to iterate through every entry of the incoming data
(which might already be in the database, plain text file, or might be
just a "fire hose" of data pouring into the system in real time).
While I'd LOVE to have a nice and clean syntax, performance is still
number 1 on my list of priorities, therefore I asked if maybe there are
ways to improve Ruby performance.
It might make a counter-based loop internally, but at a higher level
it translates it into an iterator call to the range's #each method
(Range#each). The iterator knows how to return the next number in the
range each time. It doesn't create an array with 1E8+1 elements.
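A small sketch of that behaviour, assuming nothing beyond core Ruby: the Countdown class and the collect_with_for helper are made up for illustration. The class defines only #each and never builds an array, yet a for loop can walk it.

```ruby
# A hypothetical collection that only knows how to yield values one by one.
class Countdown
  def initialize(from)
    @from = from
  end

  # `for x in obj` is evaluated by calling obj.each with the loop body as a block.
  def each
    n = @from
    while n > 0
      yield n
      n -= 1
    end
  end
end

# Gathers whatever the for loop hands us, in order.
def collect_with_for(collection)
  seen = []
  for item in collection
    seen << item
  end
  seen
end

p collect_with_for(Countdown.new(3))
p collect_with_for(0..3)
```

The array here exists only because the helper collects the values for display; the for loop itself receives them one at a time from #each.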
···
On Fri, Feb 3, 2012 at 11:17 AM, Dmitry Nikiforov <dniq@dniq-online.com> wrote:
> Here's a question: when I say "for a in ( 0..1E8 )" - does Ruby create
> an array and populate it with values from 0 through 1E8, or does it
> merely create a counter similar to "for( a = 0; a<=1e8; a++ )" ?
The thing you need to know about Rubinius and JRuby is that they both JIT (just-in-time) compile the code, but they need to collect statistics on the runtime profile first. That usually takes a few seconds. So any test that runs for under 10s or so doesn't give the runtime much opportunity to optimize the code.
If you really are going to be working on big datasets, then try to benchmark something that takes at least a minute or so to run. You *cannot* reliably extrapolate performance from a test that runs 4s versus 2s.
cr
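One way to act on this advice is to run the workload a few times untimed before measuring it. The following is only a sketch: count_matches and time_after_warmup are hypothetical helpers, the warm-up count of 3 is arbitrary, and the iteration count is kept small so the example finishes quickly (a real benchmark would run far longer, as suggested above).

```ruby
require 'benchmark'

# Workload from the thread: match a regexp against a string n times.
def count_matches(s, n)
  hits = 0
  i = 0
  while i < n
    hits += 1 if s =~ / test /
    i += 1
  end
  hits
end

# Run the block a few times untimed, then time one more run.
# Returns the elapsed wall-clock seconds of the timed run as a Float.
def time_after_warmup(warmup_runs = 3)
  warmup_runs.times { yield }
  Benchmark.realtime { yield }
end

s = "This is a test string"
seconds = time_after_warmup { count_matches(s, 200_000) }
puts format("timed run after warm-up: %.3fs", seconds)
```

On an interpreter without a JIT the warm-up runs should make little difference; on Rubinius or JRuby they give the runtime a chance to optimize before the clock starts.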
···
On Feb 3, 2012, at 12:56 PM, Dmitry Nikiforov wrote:
> Here's another example with significantly bigger performance difference:
>
> Ruby:
>
> s = "This is a test string"
>
> re = Regexp.new( / test / )
>
> for a in 0..1E7
> re.match( s )
> end
>
> Perl:
>
> my $s = "This is a test string";
>
> for my $a ( 0..1E7 ) {
> $s =~ / test /;
> }
>
> Perl takes about 1.5 seconds to execute this, while Ruby takes a
> whopping 16!!! :((( I have a very strong feeling that I didn't compile
> Ruby properly - there can't be such a huge difference in regexp matching
>
> It's all the parens, whitespace, and use of tabs that slows ruby down:

Euhmmm, I doubt that ...

> # takes 26.6 seconds on my laptop:
> s = "This is a test string"
> re = Regexp.new( / test / )
> for a in 0..1E7
>   re.match( s )
> end
> # takes 8.67 seconds on my laptop:
> s = "This is a test string"
> for a in 0..1E7
>   s =~ / test /
> end
The same "formatted" code with just replacing re.match( s) by
s =~ /test/ also causes the same change from 22 to 7 seconds
on my system (with the same formatting, spaces, etc.).
I rather expect it's because
`match` and `=~` do quite different things ...
`match` returns a complete MatchData object
`=~` returns the index (position) of the first match
017:0> re.match( s )
=> #<MatchData " test ">
018:0> s =~ /test/
=> 10
<speculation>
Maybe (speculation) the MatchData object takes more
dynamic Object allocation and thus more calls to the GC ?
</speculation>
HTH,
Peter
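The difference in return values described above can be checked directly, and the two calls can be timed side by side with the standard Benchmark module. This is a sketch only: index_of_match and match_data are hypothetical wrappers, and no timings are claimed because they depend on the interpreter and version.

```ruby
require 'benchmark'

# `=~` returns the Integer offset of the first match, or nil.
def index_of_match(re, s)
  s =~ re
end

# `match` returns a MatchData object, or nil.
def match_data(re, s)
  re.match(s)
end

s  = "This is a test string"
re = / test /

p index_of_match(re, s)
p match_data(re, s)

n = 200_000
Benchmark.bm(8) do |bm|
  bm.report('=~')    { n.times { index_of_match(re, s) } }
  bm.report('match') { n.times { match_data(re, s) } }
end
```

Both wrappers add the same method-call overhead, so any gap between the two rows comes from the matching calls themselves.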
···
On Fri, Feb 3, 2012 at 12:20 AM, Ryan Davis <ryand-ruby@zenspider.com>wrote:
On Feb 2, 2012, at 14:55 , Dmitry Nikiforov wrote:
Ryan is being a little facetious about the parentheses and whitespace, in
case that isn't clear. He has strong preferences about coding style.
Your test above runs in about 10 seconds on my system under Ruby 1.9.2.
The following equivalent code runs in about 6 seconds and is fairly
idiomatic Ruby:
s = "This is a test string"
(0..1E7).each do
  s =~ / test /
end
This code runs in about 4 seconds, but it is a bit less pretty to my eyes:
s = "This is a test string"
i = 0
while i < 1E7 do
  s =~ / test /
  i += 1
end
I'm sure there are other solutions as well. The thing to keep in mind
is that method calls in Ruby are relatively expensive, so if you need
speed, you should try to avoid them.
Don't get hung up on micro benchmarks like the above though! They can
really be deceiving with respect to real world applications.
-Jeremy
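To compare the loop forms discussed in this thread on your own interpreter, something like the following could be used. It is a sketch, not a definitive benchmark: the with_* method names are invented here, every variant runs exactly n iterations, and the results will differ between Ruby versions and implementations.

```ruby
require 'benchmark'

# Each method runs an empty-ish loop n times and returns the iteration count.

def with_for(n)
  count = 0
  for i in 0...n
    count += 1
  end
  count
end

def with_each(n)
  count = 0
  (0...n).each { count += 1 }
  count
end

def with_times(n)
  count = 0
  n.times { count += 1 }
  count
end

def with_while(n)
  count = 0
  i = 0
  while i < n
    count += 1
    i += 1
  end
  count
end

n = 1_000_000
Benchmark.bm(6) do |bm|
  bm.report('for')   { with_for(n) }
  bm.report('each')  { with_each(n) }
  bm.report('times') { with_times(n) }
  bm.report('while') { with_while(n) }
end
```

The for, each and times variants all pass a block to a method, while the while loop stays inside the method body, which is the method-call overhead mentioned above.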
···
On 02/02/2012 05:20 PM, Ryan Davis wrote:
On Feb 2, 2012, at 14:55 , Dmitry Nikiforov wrote:
It's all the parens, whitespace, and use of tabs that slows ruby down:
One thing you can do is replace for loops with while loops. A for loop in Ruby is translated into a call to the collection's #each method (Range#each in your examples), and in Ruby 1.9 that is slower than an ordinary while loop because of the overhead of processing enumerators. It is actually even slower than #each in Ruby 1.8, because 1.8 does not have enumerators.
···
On 2/2/2012 9:14 PM, Dmitry Nikiforov wrote:
> My code was merely an example of a very simple loop. Its purpose was not
> to calculate something, but to run through the loop and execute a
> multiplication on every iteration.

Yes, and maybe it was a bad example for what you are trying to do:

> My main area of development is processing of rather large amounts of
> data (billions of entries, primarily processed by regular expressions,
> with some statistical analysis on top, and potentially - addition of NLP
> later). You _have_ to iterate through every entry of the incoming data
> (which might already be in the database, plain text file, or might be
> just a "fire hose" of data pouring into the system in real time).
Question: the data needs to come from somewhere. Are you sure that
your processing is CPU bound? If it is IO bound, the difference
between Perl and Ruby won't really show. I reckon it's better to
create a more realistic example of what you are trying to do and
measure again. (And take care to alternate the Ruby and Perl runs
so that OS IO caching does not favor one or the other.)
Kind regards
robert
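A rough way to check whether a job is CPU bound is to compare CPU time with wall-clock time for the same run. The sketch below uses Benchmark.measure from the standard library; the helper names and the sample rows are hypothetical, and the file is tiny compared with the data sizes discussed in this thread.

```ruby
require 'benchmark'
require 'tempfile'

# Split every comma-separated row of a file and count the fields.
def count_fields(path)
  fields = 0
  File.foreach(path) { |line| fields += line.chomp.split(',').size }
  fields
end

# Returns [cpu_seconds, wall_seconds] for the block.
def cpu_and_wall
  tms = Benchmark.measure { yield }
  [tms.utime + tms.stime, tms.real]
end

# Writes hypothetical sample rows to a temporary file, yields its path,
# and removes the file afterwards.
def with_sample_file(rows)
  file = Tempfile.new('rows')
  begin
    rows.times { |i| file.puts "#{i},alpha,beta,gamma" }
    file.flush
    yield file.path
  ensure
    file.close
    file.unlink
  end
end

with_sample_file(50_000) do |path|
  cpu, wall = cpu_and_wall { count_fields(path) }
  puts format("cpu %.3fs, wall %.3fs", cpu, wall)
end
```

As a rule of thumb, if the wall-clock time is much larger than the CPU time, the process is mostly waiting (typically on IO), and a faster interpreter will not help much. If the two are close, the work is CPU bound.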
···
On Fri, Feb 3, 2012 at 3:14 AM, Dmitry Nikiforov <dniq@dniq-online.com> wrote: