Interesting discovery

I had a user report a slowdown in PDF::Writer. I'm pretty certain I
know where the problem is, but I found something very interesting.
About 3% of the profiled performance of the manual generation (with
Shugo Maeda's new profiler) -- and near 20,000,000 (yes twenty
MILLION) calls -- are calls to Kernel.respond_to?.

A little digging later, and it looks like 98% of these calls -- or
more -- are in Marshal.dump. That's ... amazing.

-austin

···

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

wow. can you confirm with somthing like

   def Kernel::respond_to?(*a, &b)

···

On Wed, 1 Jun 2005, Austin Ziegler wrote:

I had a user report a slowdown in PDF::Writer. I'm pretty certain I
know where the problem is, but I found something very interesting.
About 3% of the profiled performance of the manual generation (with
Shugo Maeda's new profiler) -- and near 20,000,000 (yes twenty
MILLION) calls -- are calls to Kernel.respond_to?.

A little digging later, and it looks like 98% of these calls -- or
more -- are in Marshal.dump. That's ... amazing.

     #
     # keep yer own count somehow...
     #
     __aliased_respond_to?(*a, &b)
   end

   def Marshal::dump(*a, &b)
     #
     # keep yer own count somehow...
     #
     __aliased_dump(*a, &b)
   end

and see how they scale (linearly/exponentially/etc)? that's so crazy it seems
like more investigation might be needed...

cheers.

-a
--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

I've done quite a bit of confirmation. Much of this has to do with
the unique nature of the object graph in PDF::Writer documents --
there's quite a bit of depth to the object graph.

I was originally investigating Transaction::Simple as the culprit of
my number of calls, but that didn't mesh with what I *did* find. My
final test was to do:

  def Kernel.respond_to?(*a)
    puts caller[0]
    old_respond_to(*a)
  end

After a few moments, I literally had tens of *thousands* of calls
indicating that the caller was Marshal.dump.

My guess -- and I haven't looked at the Marshal.dump code to confirm
-- is that what's beeing looked for is a custom dumper, and that
search is faster than assuming the dumper is there (because in most
cases, it won't be, I don't think).

-austin

···

On 5/31/05, Ara.T.Howard <Ara.T.Howard@noaa.gov> wrote:

On Wed, 1 Jun 2005, Austin Ziegler wrote:

I had a user report a slowdown in PDF::Writer. I'm pretty certain
I know where the problem is, but I found something very
interesting. About 3% of the profiled performance of the manual
generation (with Shugo Maeda's new profiler) -- and near
20,000,000 (yes twenty MILLION) calls -- are calls to
Kernel.respond_to?.

A little digging later, and it looks like 98% of these calls --
or more -- are in Marshal.dump. That's ... amazing.

wow. can you confirm with somthing like

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

great. thanks for the follow-up. i run into things like that alot and always
wonder if calling a method and rescuing NoMethodError is slower or faster than
checking with respond_to?

   harp:~ > cat a.rb
   require 'benchmark'

   class C
     def c; end
     def rescue_test n
       n.times do
         begin
           self.a
         rescue NoMethodError
           begin
             self.b
           rescue NoMethodError
             self.c
           end
         end
       end
     end
     def respond_to_test n
       n.times do
         if respond_to? 'a'
           self.a
         elsif respond_to? 'b'
           self.b
         else
           self.c
         end
       end
     end
   end

   c = C::new
   n = 2 ** 16
   GC::disable

   Benchmark::bm do |b|
     %w(rescue_test respond_to_test).map{|m| b.report(m){c.send m,n}}
   end

   harp:~ > ruby a.rb
         user system total real
   rescue_test 2.940000 0.320000 3.260000 ( 3.264758)
   respond_to_test 0.160000 0.050000 0.210000 ( 0.208268)

looks like respond_to? is indeed faster.

cheers.

-a

···

On Wed, 1 Jun 2005, Austin Ziegler wrote:

On 5/31/05, Ara.T.Howard <Ara.T.Howard@noaa.gov> wrote:

On Wed, 1 Jun 2005, Austin Ziegler wrote:

I had a user report a slowdown in PDF::Writer. I'm pretty certain I know
where the problem is, but I found something very interesting. About 3% of
the profiled performance of the manual generation (with Shugo Maeda's new
profiler) -- and near 20,000,000 (yes twenty MILLION) calls -- are calls
to Kernel.respond_to?. =20 A little digging later, and it looks like 98%
of these calls -- or more -- are in Marshal.dump. That's ... amazing.

wow. can you confirm with somthing like

I've done quite a bit of confirmation. Much of this has to do with the
unique nature of the object graph in PDF::Writer documents -- there's quite
a bit of depth to the object graph.

I was originally investigating Transaction::Simple as the culprit of my
number of calls, but that didn't mesh with what I *did* find. My final test
was to do:

def Kernel.respond_to?(*a)
   puts caller[0]
   old_respond_to(*a)
end

After a few moments, I literally had tens of *thousands* of calls indicating
that the caller was Marshal.dump.

My guess -- and I haven't looked at the Marshal.dump code to confirm -- is
that what's beeing looked for is a custom dumper, and that search is faster
than assuming the dumper is there (because in most cases, it won't be, I
don't think).

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

But if the method *is* defined, rescue is faster:

   class C
     def a; end # <== added
     def c; end
   ...

                           user system total real
rescue_test 0.070000 0.000000 0.070000 ( 0.072631)
respond_to_test 0.130000 0.010000 0.140000 ( 0.140980)

So, if some obejcts have the methods and some dont, the best choice depends on the balance between the two branches (defined vs. undefined).

···

Ara.T.Howard@noaa.gov wrote:

  Benchmark::bm do |b|
    %w(rescue_test respond_to_test).map{|m| b.report(m){c.send m,n}}
  end

  harp:~ > ruby a.rb
        user system total real
  rescue_test 2.940000 0.320000 3.260000 ( 3.264758)
  respond_to_test 0.160000 0.050000 0.210000 ( 0.208268)

looks like respond_to? is indeed faster.

very true - i hadn't thought of that...

thanks.

-a

···

On Fri, 3 Jun 2005, Joel VanderWerf wrote:

Ara.T.Howard@noaa.gov wrote:

  Benchmark::bm do |b|
    %w(rescue_test respond_to_test).map{|m| b.report(m){c.send m,n}}
  end

  harp:~ > ruby a.rb
        user system total real
  rescue_test 2.940000 0.320000 3.260000 ( 3.264758)
  respond_to_test 0.160000 0.050000 0.210000 ( 0.208268)

looks like respond_to? is indeed faster.

But if the method *is* defined, rescue is faster:

  class C
    def a; end # <== added
    def c; end
  ...

                          user system total real
rescue_test 0.070000 0.000000 0.070000 ( 0.072631)
respond_to_test 0.130000 0.010000 0.140000 ( 0.140980)

So, if some obejcts have the methods and some dont, the best choice
depends on the balance between the two branches (defined vs. undefined).

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

* Joel VanderWerf <vjoel@path.berkeley.edu> [2005-06-03 04:43:52 +0900]:

> rescue_test 2.940000 0.320000 3.260000 ( 3.264758)
> respond_to_test 0.160000 0.050000 0.210000 ( 0.208268)
>
>
>looks like respond_to? is indeed faster.

But if the method *is* defined, rescue is faster:

I'm not sure I understand. If the method exists, what is the
need for rescue to be called?

Are the options being tested these:

  # with respond_to?
  if x.respond_to(:meth) x.meth # or x.call(:meth)
  # or with rescue
  begin
    x.meth
  rescue
    # no meth

  So, if meth exists, the only overhead should be the 'if x.respond_to'.
  If the method does nothing, then a check of this type may appear expensive,
  but if the method does something, then the 'if' should not be too much
  of a burden.
  I suppose that this could be a problem in a tight loop. Under such
  conditions, it may be better to just rescue.

···

Ara.T.Howard@noaa.gov wrote:

--
Jim Freeze
Theory and practice are the same, in theory. -- Ryan Davis

class C
   def a; Object.no_such_method; end
end

rescue NoMethodError is an inappropriate test for whether or not a method exists. (This may or may not be important depending on what what else you need to rescue.)

Being faster does not make it the correct solution.

···

On 02 Jun 2005, at 12:43, Joel VanderWerf wrote:

Ara.T.Howard@noaa.gov wrote:

  Benchmark::bm do |b|
    %w(rescue_test respond_to_test).map{|m| b.report(m){c.send m,n}}
  end
  harp:~ > ruby a.rb
        user system total real
  rescue_test 2.940000 0.320000 3.260000 ( 3.264758)
  respond_to_test 0.160000 0.050000 0.210000 ( 0.208268)
looks like respond_to? is indeed faster.

But if the method *is* defined, rescue is faster:

  class C
    def a; end # <== added
    def c; end

So, if some obejcts have the methods and some dont, the best choice depends on the balance between the two branches (defined vs. undefined).

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Jim Freeze wrote:

* Joel VanderWerf <vjoel@path.berkeley.edu> [2005-06-03 04:43:52 +0900]:

rescue_test 2.940000 0.320000 3.260000 ( 3.264758)
respond_to_test 0.160000 0.050000 0.210000 ( 0.208268)

looks like respond_to? is indeed faster.

But if the method *is* defined, rescue is faster:

I'm not sure I understand. If the method exists, what is the
need for rescue to be called?

I was unclear: I was hypothesizing a different test, in which a certain proportion of the objects had the method and the rest did not. It might be interesting to determine what ratio is the break-even point...

···

Ara.T.Howard@noaa.gov wrote:

Ara.T.Howard@noaa.gov said:

···

On Fri, 3 Jun 2005, Joel VanderWerf wrote:

But if the method *is* defined, rescue is faster:

very true - i hadn't thought of that...

Now, if we could only find out if the method is defined, we could choose
the most efficient way of asking if a method is defined.

Or something like that.

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

* Joel VanderWerf <vjoel@path.berkeley.edu> [2005-06-03 06:43:40 +0900]:

Jim Freeze wrote:
>* Joel VanderWerf <vjoel@path.berkeley.edu> [2005-06-03 04:43:52 +0900]:

>I'm not sure I understand. If the method exists, what is the
>need for rescue to be called?

I was unclear: I was hypothesizing a different test, in which a certain
proportion of the objects had the method and the rest did not. It might
be interesting to determine what ratio is the break-even point...

Yes, but doesn't that ratio depend upon the time taken in each method?

I think the rule of thumb is:

  - If a missing method is a rare occurance, use rescue
  - If a missing method is common and expected, use #respond_to
  - If you don't know, then drop back ten yards and punt.

···

--
Jim Freeze
Theory and practice are the same, in theory. -- Ryan Davis