Ruby optimization - re-implement in compiled language?

Hi all,

I'm a relative Ruby newbie looking for some advice on optimisation ...

I've been using ruby to implement a make-like build system for my personal
use. The end result was quick to produce, has nice code and does things
the way I like it. Full marks to Ruby :slight_smile:

.... Sadly it's also very slow.

I've read the advice in the Programmatic Programmers Guide, removed all
my obvious newbie errors, but it's still too slow.

The profiler identifies all the major bottlenecks as low level ruby
methods (.each etc), so I'm considering re-implemented the guts of the
dependency analysis in C++. I'm sure people here have done similar things,
so I was wondering how beneficial it was for them?

For what it's worth:
聽聽The algorithms involve quite deep recursive calls (by necessity)
聽聽The scripts appear to pause for several seconds at arbitrary points
聽聽聽聽- I've tried disabling the GC
聽聽I'm using Ruby 1.8.2
聽聽Most data is stored in DBM style tables

Thanks in advance ...

路路路

--
Ant Sims
antsims9999@yahoo.co.uk

Ant Sims, April 4:

The algorithms involve quite deep recursive calls (by necessity)

Are you sure? Perhaps you need to rethink your algorithm. Before you
go rewriting your application in another language, a simpler solution is
to rethink your problem/implementation.

Other than that, recursion can be sort of expensive, as Ruby doesn't do
tail-call optimizations (yet),
        nikolai

路路路

--
Nikolai Weibull, now available free of change at http://bitwi.se/\!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}

Is there something that Rake doesn't do to fit your needs?

Or make, for that matter?

PGP.sig (194 Bytes)

路路路

On 03 Apr 2005, at 14:54, Ant Sims wrote:

Hi all,

I'm a relative Ruby newbie looking for some advice on optimisation ...

I've been using ruby to implement a make-like build system for my personal
use. The end result was quick to produce, has nice code and does things
the way I like it. Full marks to Ruby :slight_smile:

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

I've read the advice in the Programmatic Programmers Guide, removed all
my obvious newbie errors, but it's still too slow.

The profiler identifies all the major bottlenecks as low level ruby
methods (.each etc), so I'm considering re-implemented the guts of the
dependency analysis in C++. I'm sure people here have done similar things,
so I was wondering how beneficial it was for them?

Each shows high in a profiler, because the entire contents of the inner
part of the loop count against it. Just so you're warned.

Ari

So, iiuc, you're saying that the profiler sort of counts things multiple
times--when you look at any individual part of the inner loop
(wherever/whatever that might be), the entire time spent in that inner loop
is "charged" to each individual part that you might look at?

So, you really can't tell which of those parts is the biggest contributor to
the (presumed) problem?

Randy Kramer

路路路

On Monday 04 April 2005 01:06 am, Aredridel wrote:

> The profiler identifies all the major bottlenecks as low level ruby
> methods (.each etc), so I'm considering re-implemented the guts of the
> dependency analysis in C++. I'm sure people here have done similar
> things, so I was wondering how beneficial it was for them?

Each shows high in a profiler, because the entire contents of the inner
part of the loop count against it. Just so you're warned.

Sorry, I probably should have said ... my build process is quite different
to a standard project (lots of preprocessing stages etc) - hence I use a
non-standard build tool. Also, it was kinda an exercise in learning ruby ...

Personally, I reckon make's support for dependency rules that change over
time is a little poor. If rule change during the build then make must be
reinvoked.

make's use of timestamps is annoying, particularly since some of my
source files are served over nfs. Using GNU make on a Linux ext3
filesystem, this is made worst by the fact that the granularity of file
timestamps is one second. Worse: GNU make thinks two identical
timestamps indicates that a file does not need to be re-built! Build a
file in under a second (e.g. a simple preprocessing script) and is
dependent products won't get updated!

I know there's better systems out there now - Rake, SCons, but I decided
to ditch make a long time ago. Certainly before I was aware of the
alternatives and probably before some of them existed. Now I'm kinda
dependent on all my bizarre build stages :slight_smile:

路路路

On Mon, 04 Apr 2005 11:45:51 +0900, Eric Hodel wrote:

On 03 Apr 2005, at 14:54, Ant Sims wrote:

I've been using ruby to implement a make-like build system for my
personal use

Is there something that Rake doesn't do to fit your needs?

Or make, for that matter?

--
Ant Sims
antsims9999@yahoo.co.uk

Recursion is kinda implicit in the task - one step requires another be
performed as part of it's task. The problem is you don't know what
a build step will require till it is performed.

Sure, you can manually save all state during a process and restore it
later, but it's hassle.

One thing I'm thinking of adding is a means of guessing what steps another
will require based on it's past history. Then prepare them up front so no
recursive build process occurs. That gets hard if the step is being
rebuilt because the user has corrected an error resulting in an
un-buildable dependency, but it presents a nice challenge ... :slight_smile:

Incidentally, by deep I mean 10 levels max.

路路路

On Mon, 04 Apr 2005 07:55:05 +0900, Nikolai Weibull wrote:

Ant Sims, April 4:

The algorithms involve quite deep recursive calls (by necessity)

Are you sure? Perhaps you need to rethink your algorithm. Before you
go rewriting your application in another language, a simpler solution is
to rethink your problem/implementation.

Other than that, recursion can be sort of expensive, as Ruby doesn't do
tail-call optimizations (yet),

--
Ant Sims
antsims9999@yahoo.co.uk

That's really helpful to know. Thanks.

Re-examining the profiler output it looks like most of the cost is doing
GDBM lookups. By that time the keys are just strings so it looks like it's
an issue with GDBM or, more likely, the way I'm using it (the .db files
are quite small).

Looks like a re-code in C++ is pointless.

路路路

On Mon, 04 Apr 2005 14:06:45 +0900, Aredridel wrote:

The profiler identifies all the major bottlenecks as low level ruby
methods (.each etc), so I'm considering re-implemented the guts of the
dependency analysis in C++. I'm sure people here have done similar things,
so I was wondering how beneficial it was for them?

Each shows high in a profiler, because the entire contents of the inner
part of the loop count against it. Just so you're warned.

--
Ant Sims
antsims9999@yahoo.co.uk

"Randy Kramer" <rhkramer@gmail.com> schrieb im Newsbeitrag
news:200504040851.14886.rhkramer@gmail.com...

> > The profiler identifies all the major bottlenecks as low level ruby
> > methods (.each etc), so I'm considering re-implemented the guts of

the

> > dependency analysis in C++. I'm sure people here have done similar
> > things, so I was wondering how beneficial it was for them?
>
> Each shows high in a profiler, because the entire contents of the

inner

> part of the loop count against it. Just so you're warned.

So, iiuc, you're saying that the profiler sort of counts things multiple
times--when you look at any individual part of the inner loop
(wherever/whatever that might be), the entire time spent in that inner

loop

is "charged" to each individual part that you might look at?

So, you really can't tell which of those parts is the biggest

contributor to

the (presumed) problem?

It's a bit different. First of all there is a column "cumulative seconds"
in the output which actually sums up time through the output table (i.e.
"cumulative seconds" of one row is the value of the previous row plus this
rows "self seconds".

(Note: percent values are garbled because of #sleep)

15:16:01 [source]: ruby -r profile -e '1000.times { sleep 0.01 }'
  % cumulative self self total
time seconds seconds calls ms/call ms/call name
206.67 0.03 0.03 1 31.00 31.00
Profiler__.start_profile
100.00 0.05 0.01 1000 0.01 0.01 Kernel.sleep
  0.00 0.05 0.00 1 0.00 15.00 Integer#times
  0.00 0.05 0.00 1 0.00 15.00 #toplevel

Then if you iterate only once through any Enumerable then #each shows up
as 1 call - which can be confusing. But it's completely logical since
it's actually called only once. Just the block is invoked multiple times.
So, assuming that you have a lot of stuff going on in a block, that time
can be seen nowhere else than within the time of #each - there's no extra
marker for the block. Of course you'll see times of methods that are
called from the block on rows of their own.

Another example with ungarbled percent values:

15:22:48 [source]: ruby -r profile -e '100.times { i=0; i+=1 while i <
1000 }'
  % cumulative self self total
time seconds seconds calls ms/call ms/call name
59.89 15.11 15.11 1 15113.00 25234.00 Integer#times
20.83 20.37 5.26 100000 0.05 0.05 Fixnum#+
19.28 25.23 4.87 100100 0.05 0.05 Fixnum#<
  0.12 25.27 0.03 1 31.00 31.00
Profiler__.start_profile
  0.00 25.27 0.00 1 0.00 25234.00 #toplevel

Kind regards

    robert

路路路

On Monday 04 April 2005 01:06 am, Aredridel wrote:

"Randy Kramer" <rhkramer@gmail.com> schrieb im Newsbeitrag
> So, you really can't tell which of those parts is the biggest
> contributor to the (presumed) problem?

---<good stuff snipped>---

Then if you iterate only once through any Enumerable then #each shows up
as 1 call - which can be confusing. But it's completely logical since
it's actually called only once. Just the block is invoked multiple times.
So, assuming that you have a lot of stuff going on in a block, that time
can be seen nowhere else than within the time of #each - there's no extra
marker for the block. Of course you'll see times of methods that are
called from the block on rows of their own.

Thanks! (I may need to read this and run the profiler a fewtimes to really
absorb this, but I think I have the gist of it.)

regards,
Randy Kramer

路路路

On Monday 04 April 2005 09:29 am, Robert Klemme wrote:

"Randy Kramer" <rhkramer@gmail.com> schrieb im Newsbeitrag
news:200504041732.52571.rhkramer@gmail.com...

> "Randy Kramer" <rhkramer@gmail.com> schrieb im Newsbeitrag
> > So, you really can't tell which of those parts is the biggest
> > contributor to the (presumed) problem?

---<good stuff snipped>---

> Then if you iterate only once through any Enumerable then #each shows

up

> as 1 call - which can be confusing. But it's completely logical since
> it's actually called only once. Just the block is invoked multiple

times.

> So, assuming that you have a lot of stuff going on in a block, that

time

> can be seen nowhere else than within the time of #each - there's no

extra

> marker for the block. Of course you'll see times of methods that are
> called from the block on rows of their own.

Thanks! (I may need to read this and run the profiler a fewtimes to

really

absorb this, but I think I have the gist of it.)

You're welcome. And you're not alone: it took me some time to understand
profiler output myself when I first dealt with it... :slight_smile: Nowadays I
mostly use module Benchmark for this kind of stuff because it's less
intrusive.

Cheers

    robert

路路路

On Monday 04 April 2005 09:29 am, Robert Klemme wrote: