Ummmm... yeah... every time you tell it to do ten times as many loops,
it takes almost ten times as long. What's so surprising?
If you tweak the code to make it say how long each loop took, you'll
see that it actually gets FASTER at first, presumably due to assorted
constant overhead, then a tiny pinch slower (possibly due to the
switch to a different kind of number) but little enough that IMHO
that's lost in the noise.
puts("Pwr Tot Secs uS/Loop ")
(3..8).each do |x|
  limit = 10**x
  start_time = Time.now()
  for a in 0 .. limit
    # do nothing here, just timing how long the loops take
  end
  elapsed = Time.now() - start_time
  puts(" #{x} #{elapsed} #{elapsed * 1000000.0 / limit} ")
end
You're replacing a method call (a.match b) with a syntactic construct, a =~ b, the latter of which bypasses method dispatch and goes straight to the C implementation. Nothing else is really different, just a more direct code path. The match data is still available via the usual globals.
IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new (assuming no args/blocks passed). They're throwbacks to java devs and serve no purpose but to make things more verbose. In this specific case, there are tangible reasons to use =~ over #match.
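For anyone who wants to check the =~ versus #match difference on their own machine, here is a minimal sketch using the standard Benchmark library. The iteration count n is an assumption picked to keep the run short, not a figure from this thread, and the timings will vary by Ruby version.

```ruby
require "benchmark"

s  = "This is a test string"
re = / test /
n  = 1_000_000  # assumed iteration count, chosen only to keep the run short

# Both forms find the same match position.
match_pos    = re.match(s).begin(0)
operator_pos = (s =~ re)
# After =~, the match data is still reachable through the $~ global.
last_match = $~[0]

Benchmark.bm(10) do |bm|
  bm.report("re.match") { n.times { re.match(s) } }
  bm.report("s =~ re")  { n.times { s =~ re } }
end
```

Both forms report the same position, so any gap in the timings comes from the call path rather than from the matching itself.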
···
On Feb 2, 2012, at 15:33 , Peter Vandenabeele wrote:
The same "formatted" code with just replacing re.match( s) by
s =~ /test/ also causes the same change from 22 to 7 seconds
on my system (with the same formatting, spaces, etc.).
I tried to drive that point home by showing a ruby solution that was 1000x faster than his perl solution, but unfortunately, rationality and micro-benchmarking don't often play well together.
···
On Feb 2, 2012, at 15:36 , Jeremy Bopp wrote:
Don't get hung up on micro benchmarks like the above though! They can
really be deceiving with respect to real world applications.
Hmmm... Didn't realize this would make a difference. Thanks!
Don't get hung up on micro benchmarks like the above though! They can
really be deceiving with respect to real world applications.
Well, I started doing these benchmarks after I've tried to rewrite parts
of the project I'm working on in Ruby. The project is rather
complicated, so it seemed as if Ruby's neat, clean syntax would make it
easier to handle, but the performance was dreadful. Initially I tried
1.8.7 that came natively with OS X Lion, then installed 1.9.3, without
much difference in performance - it's still mostly multiple times slower
than the Perl version I have. The problem with the Perl version, though,
is that once it reaches certain limit - it becomes rather hard to manage
(especially so if you focus on performance the most - there are tricks
in Perl that make code run significantly faster, but make it virtually
unreadable).
One thing you can do is to replace for loops with while loops. A for loop
in Ruby is translated into a call to the #each method of the object being
iterated (Range#each here), and in Ruby 1.9 that call is slower than an
ordinary while loop because of the overhead of processing enumerators. It
is actually even slower than Ruby 1.8's #each because 1.8 does not have
enumerators.
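A rough way to compare the three loop styles is sketched below. The method names and the limit are made up for illustration; each method walks the same inclusive range 0..limit and returns how many iterations it ran, so the comparison is at least apples to apples.

```ruby
require "benchmark"

# Hypothetical helpers: each one iterates over 0..limit inclusive
# and returns the number of iterations it performed.
def count_with_for(limit)
  count = 0
  for a in 0..limit
    count += 1
  end
  count
end

def count_with_each(limit)
  count = 0
  (0..limit).each { count += 1 }
  count
end

def count_with_while(limit)
  count = 0
  a = 0
  while a <= limit
    count += 1
    a += 1
  end
  count
end

limit = 1_000_000  # assumed size, small enough to finish quickly

Benchmark.bm(6) do |bm|
  bm.report("for")   { count_with_for(limit) }
  bm.report("each")  { count_with_each(limit) }
  bm.report("while") { count_with_while(limit) }
end
```

How large the gap is depends heavily on the interpreter and version, so treat the output as a local measurement rather than a general rule.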
Question: the data needs to come from somewhere. Are you sure that
your processing is CPU bound? If it is IO bound the difference
between Perl and Ruby won't really show. I reckon it's better to
create a more realistic example of what you are trying to do and
measure again. (And take care to run tests between Ruby and Perl
alternating in order to prevent OS IO caching from preferring one or
the other.)
Yep, I'm sure it's CPU bound: the CPU load is at 100%. The data comes in
faster than the Ruby script can process it, unfortunately, at this
point. I'm trying to optimize it, of course, but so far Perl version
beats Ruby hands down. But there aren't too many options at my disposal,
it seems. In my examples, the "for" seems to be the major culprit: it
alone, without ANYTHING within the loop, takes 19 seconds to execute 1E8
times. The (0..1E8).each only saves about 1 second for me. Which doesn't
really matter - most of the loops in my scripts are "while" loops
anyway. Still, the regexps themselves run very slow. I wish Ruby used
standard Perl's PCRE library - that would make at least regexps run as
fast as they do in Perl, and I would be able to write my scripts in Ruby.
i noticed the mult 10 too late. what i was emphasizing is that, given
the simple loop above, your point of acceptance should be less than
10**7. otherwise, beyond that, you'd get unacceptable response time.
just imagine, 7 seconds! this would not be acceptable for database
apps, for example, with response times of less than 5 seconds.
kind regards -botp
···
On Sat, Feb 4, 2012 at 12:27 AM, Dave Aronson <rubytalk2dave@davearonson.com> wrote:
On Thu, Feb 2, 2012 at 23:03, botp <botpena@gmail.com> wrote:
(3..8).each do |x|
t=Time.now();for a in 0..10**x;end; puts("#{x} #{Time.now()-t}")
end
You're replacing a method call (a.match b) with a syntactic construct, a =~
b, the latter of which bypasses method dispatch and goes straight to the
C implementation.
Wow, I never knew that. I don't understand how it accomplishes this, a
could be any kind of object with =~ defined anywhere on it, how can it
bypass method dispatch?
IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed).
I usually use `meth Hash.new` instead of `meth({})`; I think it looks
cleaner.
···
On Thu, Feb 2, 2012 at 7:01 PM, Ryan Davis <ryand-ruby@zenspider.com> wrote:
IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed). They're throwbacks to java devs and
serve no purpose but to make things more verbose. In this specific case,
there are tangible reasons to use =~ over #match.
The reason I tried to use Regexp.new is because I figured it would
pre-compile the regexp - the way "qr/ test /" in Perl would do, so that
it doesn't have to re-compile it on every iteration.
Hard to believe that this thread has gone this long without a mention of other Ruby runtimes.
You may want to also benchmark with JRuby (jruby.org) or with Rubinius (rubini.us). For ease of installation, you may want to consider using "rvm" to manage your Rubies (google for it to figure out how to install it).
cr
···
On Feb 2, 2012, at 8:19 PM, Dmitry Nikiforov wrote:
Jeremy Bopp wrote in post #1043805:
(0..1E7).each do
s =~ / test /
end
It creates a Range, which just iterates, not an array. A more idiomatic way
would probably be (1+10**8).times { ... }
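A small sketch of the point about Range: 0..limit is a Range object that is walked lazily rather than an array built up front, and (1 + limit).times runs the same number of iterations. The limit here is an assumed small value so the check is instant.

```ruby
limit = 1_000  # assumed small value, just for illustration
span  = 0..limit

# The for loop walks the Range; no array of limit + 1 elements is built.
for_count = 0
for a in span
  for_count += 1
end

# The suggested alternative: same number of iterations, no Range involved.
times_count = 0
(1 + limit).times { times_count += 1 }

puts "#{span.class}: for ran #{for_count} times, times ran #{times_count} times"
```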
As an aside, if all the processing is happening in the loop, then it might
make more sense that the loop just delegates work out to other processes
(e.g. parse a line or process a parsed set of data). This could be pretty
simple if done with a thread pool in a single Ruby script (you'll want one
of the alternate implementations here since you're CPU bound and MRI has a
GIL), or as arbitrarily complex as you like.
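The thread pool idea could look roughly like the sketch below. Everything here is hypothetical: process_line stands in for the real per-line work, run_pool and the worker count are made up, and on MRI the GIL means this will not speed up CPU-bound work, so it only pays off on an implementation with parallel threads.

```ruby
require "thread"  # Queue lives here on older Rubies; harmless on newer ones

# Hypothetical per-line work: 1 if the line contains " test ", else 0.
def process_line(line)
  (line =~ / test /) ? 1 : 0
end

# Hypothetical pool: feeds lines to worker threads and sums their results.
def run_pool(lines, worker_count = 4)
  jobs    = Queue.new
  results = Queue.new
  lines.each { |line| jobs << line }
  worker_count.times { jobs << :done }  # one stop marker per worker

  workers = Array.new(worker_count) do
    Thread.new do
      while (line = jobs.pop) != :done
        results << process_line(line)
      end
    end
  end
  workers.each(&:join)

  total = 0
  total += results.pop until results.empty?
  total
end

sample = ["This is a test string", "no match here", "another test line"]
pool_total = run_pool(sample)
puts "matching lines: #{pool_total}"
```

The queue-plus-stop-marker shape keeps the main loop free of parsing work, which is the delegation being suggested; swapping the threads for separate processes would follow the same outline.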
···
On Fri, Feb 3, 2012 at 11:13 AM, Dmitry Nikiforov <dniq@dniq-online.com>wrote:
Robert Klemme wrote in post #1043884:
> Question: the data needs to come from somewhere. Are you sure that
> your processing is CPU bound? If it is IO bound the difference
> between Perl and Ruby won't really show. I reckon it's better to
> create a more realistic example of what you are trying to do and
> measure again. (And take care to run tests between Ruby and Perl
> alternating in order to prevent OS IO caching from preferring one or
> the other.)
Yep, I'm sure it's CPU bound: the CPU load is at 100%. The data comes in
faster than the Ruby script can process it, unfortunately, at this
point. I'm trying to optimize it, of course, but so far Perl version
beats Ruby hands down. But there aren't too many options at my disposal,
it seems. In my examples, the "for" seems to be the major culprit: it
alone, without ANYTHING within the loop, takes 19 seconds to execute 1E8
times. The (0..1E8).each only saves about 1 second for me. Which doesn't
really matter - most of the loops in my scripts are "while" loops
anyway. Still, the regexps themselves run very slow. I wish Ruby used
standard Perl's PCRE library - that would make at least regexps run as
fast as they do in Perl, and I would be able to write my scripts in Ruby.
You're replacing a method call (a.match b) with a syntactic construct, a =~
b, the latter of which bypasses method dispatch and goes straight to the
C implementation.
Wow, I never knew that. I don't understand how it accomplishes this, a
could be any kind of object with =~ defined anywhere on it, how can it
bypass method dispatch?
MAGIC!
The code does extra type-checking at runtime.
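One way to read the "extra type-checking" remark: the interpreter can take a shortcut when the operands are a plain String and Regexp, and otherwise falls back to ordinary dispatch. The shortcut itself is an implementation detail, but the fallback half is easy to observe. The AlwaysMatches class below is made up purely for this demonstration.

```ruby
# Hypothetical class with its own =~, to show that user-defined
# versions of the operator are still dispatched normally.
class AlwaysMatches
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def =~(other)
    @calls += 1
    :custom
  end
end

probe = AlwaysMatches.new
custom_result = (probe =~ "anything")               # goes through AlwaysMatches#=~
string_result = ("This is a test string" =~ / test /)  # built-in String/Regexp match

puts "custom: #{custom_result.inspect}, calls: #{probe.calls}, string: #{string_result}"
```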
IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed).
I usually use `meth Hash.new` instead of `meth({})`; I think it looks
cleaner.
def meth h = {}
# ...
end
takes care of this entirely.
···
On Feb 2, 2012, at 17:36 , Josh Cheek wrote:
On Thu, Feb 2, 2012 at 7:01 PM, Ryan Davis <ryand-ruby@zenspider.com> wrote:
Everything in Ruby is an object, even regexps, so you can save your
regexp to a variable or a constant to avoid a recompile. In addition,
the // expression is pretty much just syntactic sugar for
Regexp.new("some string") or Regexp.new(/some regexp/), so you can
forgo that noise. The sugar is probably faster too since it should
avoid Ruby method calls, unlike Regexp.new, not that it should be an
issue in this example.
To see if this helps at all, try changing the code to the following:
s = "This is a test string"
re = / test /
for a in 0..1E7
  s =~ re
end
Try a similar change to the other looping variations that have been
discussed and see if and how much they may improve. For me I didn't
really see any difference between using re as above or using the simple
regexp directly; however, the code was almost an order of magnitude
slower when I replaced the comparison as follows:
s =~ / test#{} /
It seems that Ruby is smart enough to see that the simple regexp will
never need to be re-evaluated. The regexp used above must force that
optimization off because #{}, while always evaluated to the empty
string, is technically dynamic, so the regexp needs to be re-evaluated
in every iteration of the loop.
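The difference can also be seen without a stopwatch by checking object identity. In this sketch (the method names are made up), a plain literal hands back the same Regexp object on every evaluation, while an interpolated one builds a new object each time even though the resulting pattern is identical.

```ruby
# Hypothetical helpers: each returns the regexp produced by one
# evaluation of the literal inside it.
def literal_regexp
  / test /
end

def interpolated_regexp(word)
  / #{word} /
end

# Same object every time: the literal is compiled once.
same_literal = literal_regexp.equal?(literal_regexp)

# A fresh object per evaluation: the interpolation forces a rebuild.
same_interpolated = interpolated_regexp("test").equal?(interpolated_regexp("test"))

# The patterns themselves are still equal in content.
same_pattern = (interpolated_regexp("test") == literal_regexp)

puts "literal reused: #{same_literal}, interpolated reused: #{same_interpolated}"
```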
If you *really* need performance in the end, however, you might want to
consider coding your critical code paths in something like C and then
calling those from Ruby as a direct extension or using something like
ffi to call into a DLL containing the logic. Your overall code base may
be a little messy, but sometimes the speed you need requires such a
trade-off. Hopefully, you can keep the mess limited to only a small set
of your overall application logic. Of course, the same holds true for
Perl in this regard.
-Jeremy
···
On 02/02/2012 08:21 PM, Dmitry Nikiforov wrote:
Ryan Davis wrote in post #1043813:
IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed). They're throwbacks to java devs and
serve no purpose but to make things more verbose. In this specific case,
there are tangible reasons to use =~ over #match.
The reason I tried to use Regexp.new is because I figured it would
pre-compile the regexp - the way "qr/ test /" in Perl would do, so that
it doesn't have to re-compile it on every iteration.
Not necessary in Ruby: regexp literals are treated specially and are
not recompiled. Usually it's faster to do
  io.each do |line|
    if line =~ /foo/
    end
  end
than
  rx = /foo/
  io.each do |line|
    if line =~ rx
    end
  end
If there is dynamic content, use /o:
  input = gets
  io.each do |line|
    if line =~ /foo:#{input}/o
    end
  end
Kind regards
robert
···
On Fri, Feb 3, 2012 at 3:21 AM, Dmitry Nikiforov <dniq@dniq-online.com> wrote:
Ryan Davis wrote in post #1043813:
IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed). They're throwbacks to java devs and
serve no purpose but to make things more verbose. In this specific case,
there are tangible reasons to use =~ over #match.
The reason I tried to use Regexp.new is because I figured it would
pre-compile the regexp - the way "qr/ test /" in Perl would do, so that
it doesn't have to re-compile it on every iteration.
It creates a Range, which just iterates, not an array. A more idiomatic way
would probably be (1+10**8).times { ... }
Phew...
As an aside, if all the processing is happening in the loop, then it might
make more sense that the loop just delegates work out to other processes
(e.g. parse a line or process a parsed set of data). This could be pretty
simple if done with a thread pool in a single Ruby script (you'll want one
of the alternate implementations here since you're CPU bound and MRI has a
GIL), or as arbitrarily complex as you like.
Yeah, that's how it works in my Perl version - it all runs on Amazon,
with workload delegated to "worker" servers in a MapReduce-like fashion,
using Redis for inter-server communication.
The o flag tells Ruby to only interpolate the first time, and then cache
the regex:
s =~ / test#{} /o
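One caveat worth seeing in action: because /o interpolates only once, later values of the interpolated variable are ignored at that spot in the code. The tagged? method below is a made-up example of how that can bite.

```ruby
# Hypothetical helper: true if the line contains "foo:" followed by tag.
def tagged?(line, tag)
  # /o: the interpolation runs on the first call only; the compiled
  # regexp is then reused, whatever tag is passed later.
  !(line =~ /foo:#{tag}/o).nil?
end

first  = tagged?("foo:bar", "bar")  # compiles /foo:bar/ and matches
second = tagged?("foo:baz", "baz")  # still uses /foo:bar/, so no match
third  = tagged?("foo:bar", "baz")  # matches, although the tag says "baz"

puts "first: #{first}, second: #{second}, third: #{third}"
```

So /o is only safe when the interpolated value really is fixed for the life of the program, as with the one-time gets in the earlier example.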
···
On Thu, Feb 2, 2012 at 8:57 PM, Jeremy Bopp <jeremy@bopp.net> wrote:
Try a similar change to the other looping variations that have been
discussed and see if and how much they may improve. For me I didn't
really see any difference between using re as above or using the simple
regexp directly; however, the code was almost an order of magnitude
slower when I replaced the comparison as follows:
s =~ / test#{} /
It seems that Ruby is smart enough to see that the simple regexp will
never need to be re-evaluated. The regexp used above must force that
optimization off because #{} while constantly evaluated to the empty
string is technically dynamic, thus the regexp needs to be re-evaluated
in every iteration of the loop.
If you *really* need performance in the end, however, you might want to
consider coding your critical code paths in something like C and then
calling those from Ruby as a direct extension or using something like
ffi to call into a DLL containing the logic. Your overall code base may
be a little messy, but sometimes the speed you need requires such a
trade-off. Hopefully, you can keep the mess limited to only a small set
of your overall application logic. Of course, the same holds true for
Perl in this regard.
Well, the performance of Perl has so far been very satisfactory. In
fact, as far as RegExps are concerned - I could barely match Perl's
performance in C++ (and even then had to mix in some plain C code). So
far, it seems that I'm stuck with Perl. Not that it's really a bad
thing - I've been developing in it since 1997, so I know it pretty well,
while I've only spent about 2 weeks with Ruby.
I guess I will have to wait and see if the Ruby interpreter becomes more
efficient. But I have to confess: I'm REALLY tempted to, in some
cases, forgo the performance in favor of handsome code.