Ruby in 50 milliseconds or less

If you use ruby 1.8 for quick command line tasks, and you use gems, you may notice that the interpreter has an execution overhead that is small but noticeable and irritating when repeated often enough.

$ time RUBYOPT='' ruby -e 1
ruby -e 1 0.01s user 0.00s system 105% cpu 0.011 total
$ time RUBYOPT='rubygems' ruby -e 1
RUBYOPT='rubygems' ruby -e 1 0.58s user 0.06s system 94% cpu 0.675 total

This is greatly improved in 1.9, which has gems built in.

$ time RUBYOPT='rubygems' ruby19 -e 1
RUBYOPT='rubygems' ruby19 -e 1 0.02s user 0.01s system 48% cpu 0.067 total

An order of magnitude improvement makes the delay much more acceptable, but if you're working with 1.8, that's not an option.

So here's a hack for 1.8 that restores the speed of bare-metal ruby but still lets you use gems. What it does is redefine Kernel#require to try loading things without rubygems, but fall back to using rubygems when there is a load failure.

Put the file in a dir on your $LOAD_PATH, and set RUBYOPT to reference it, as shown below. *Note:* I haven't tested this widely yet. It may break libraries that do their own hacking with require or rescue LoadError for their own devious purposes. I advise not using this hack in production code without careful testing.

$ cat gem-fallback.rb
module Kernel
  # Keep a handle on the original require so it can be called and restored.
  req = method :require
  define_method :require do |*args|
    begin
      # Fast path: try the plain require, without loading rubygems.
      req.call(*args)
    rescue LoadError
      # Plain require failed: put the original require back, load rubygems,
      # and let its require handle the retry.
      Kernel.module_eval do
        define_method(:require, &req)
      end
      require 'rubygems'
      require(*args)
    end
  end
end

$ time RUBYOPT='rgem-fallback' ruby -e 1
RUBYOPT='rgem-fallback' ruby -e 1 0.01s user 0.00s system 71% cpu 0.011 total

$ time RUBYOPT='rgem-fallback' ruby -e "require 'tagz'"
RUBYOPT='rgem-fallback' ruby -e "require 'tagz'" 0.60s user 0.07s system 79% cpu 0.850 total
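
To make the fallback behaviour concrete, here is a small sketch of what one would expect (hypothetical; it assumes gem-fallback.rb has been loaded via RUBYOPT as above, and that the tagz gem is installed but not on the plain load path):

require 'set'    # stdlib, found without rubygems; defined?(Gem) is still nil
require 'tagz'   # not on the plain load path: LoadError is rescued, rubygems
                 # is loaded, and the retry succeeds; defined?(Gem) is now "constant"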

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Joel VanderWerf wrote:

If you use ruby 1.8 for quick command line tasks, and you use gems, you
may notice that the interpreter has an execution overhead that is small
but noticeable and irritating when repeated often enough.

I've noticed this too.
My solution: a fake gem_prelude :)
Great minds think alike.
It would be interesting to time things tho.
http://github.com/rogerdpack/faster_rubygems/tree/master
Cheers!
=r

···

--
Posted via http://www.ruby-forum.com/.

Or simpler: only put "require 'rubygems'" at the top of scripts which
use rubygems.

(Obviously less convenient than using RUBYOPT of course, but your script
may be more portable)
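
For a hypothetical script that needs the tagz gem, that is just a "require 'rubygems'" line at the top. A slightly more portable sketch, which only pulls in rubygems when the plain require fails, looks like this:

begin
  require 'tagz'
rescue LoadError
  require 'rubygems'
  require 'tagz'
end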

···

--
Posted via http://www.ruby-forum.com/.

Here's an extreme example where this makes a huge difference:

I have a dir tree with large numbers of small gps log files, in CSV format, and I want to use ruby -a (autosplit) to work with them.
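
For context: -a turns on autosplit, so with -n each input line is split on the -F separator into the $F array. A hypothetical one-liner that prints the second CSV field of every line in one log file would be:

$ ruby -F, -ane 'puts $F[1]' some.log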

With RUBYOPT=rgem-fallback (or of course RUBYOPT=''):

$ time RUBYOPT='' find . -type f -exec ruby -F, -ane '$F' {} \;
RUBYOPT='' find . -type f -exec ruby -F, -ane '$F' {} \; 2.06s user 1.67s system 39% cpu 9.431 total

With RUBYOPT=rubygems:

$ time find . -type f -exec ruby -F, -ane '$F' {} \;
find . -type f -exec ruby -F, -ane '$F' {} \; 219.02s user 61.52s system 93% cpu 4:59.26 total

Of course, awk would probably be even faster, but ...

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Joel VanderWerf wrote:

If you use ruby 1.8 for quick command line tasks, and you use gems, you may notice that the interpreter has an execution overhead that is small but noticeable and irritating when repeated often enough.

[...]

An update, in case anyone uses this: the sinatra gem uses some black magic involving #caller, and the presence of this additional require method on the call stack will confuse sinatra into thinking it is not in "run" mode, so it will not parse ARGV. You can fix this by setting a constant when loading sinatra, as shown below. (To reiterate, I don't recommend this for production code. This is mostly for fast startup when using ruby from the command line. For production code, I am using the crown tool that I announced a few weeks ago [1].)

module Kernel
  req = method :require
  define_method :require do |*args|
    begin
      req.call(*args)
    rescue LoadError => ex
      # Restore the original require, then bring in rubygems and retry.
      Kernel.module_eval do
        define_method(:require, &req)
      end
      require 'rubygems'
      if args.grep(/sinatra/).any?
        # Tell sinatra's #caller inspection to ignore frames from this file,
        # so it still detects that the app is being run directly.
        pat = /gem-fallback.rb/
        if defined?(RUBY_IGNORE_CALLERS)
          RUBY_IGNORE_CALLERS << pat
        else
          RUBY_IGNORE_CALLERS = [pat]
        end
      end
      require(*args)
    end
  end
end

[1] crown: Gather gem lib and bin files under one directory for fast loading and predictable behavior. http://github.com/vjoel/crown

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Roger Pack wrote:

http://github.com/rogerdpack/faster_rubygems/tree/master

Looks nice, but it's solving a different problem, isn't it? It appears that you're actually speeding up the gem loading process. My hack only makes a difference if you're running a script that doesn't use gems at all.

Put them together and it's a win in both cases!

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Brian Candler wrote:

Or simpler: only put "require 'rubygems'" at the top of scripts which use rubygems.

(Obviously less convenient than using RUBYOPT of course, but your script may be more portable)

Except I'd rather not have to guess/remember which things are installed as gems, and do it correctly on each host I'm running the script on. So the require hack figures that out for me. (We do some embedded work on smartphones, gumstix, and geode, so some of our systems don't use gems at all.)

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Roger Pack wrote:

http://github.com/rogerdpack/faster_rubygems/tree/master

On linux, faster_rubygems seems to have even more of an impact than on the windows installation you benchmarked. With about 250 gems installed:

$ time ruby examples/require_rubygems_normal.rb
done
ruby examples/require_rubygems_normal.rb 0.57s user 0.05s system 85% cpu 0.726 total

$ time ruby examples/require_fast_start.rb
done
ruby examples/require_fast_start.rb 0.04s user 0.02s system 46% cpu 0.121 total

Very nice!

I had been thinking of something similar: locating all gem lib dirs, but instead of pushing them all onto $:, setting up a single dir with symlinks to all the gem lib dirs. I expect it would be faster because it would offload more of the path search to the filesystem rather than to ruby; a rough sketch is below.
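
A rough, untested sketch of that idea (the ~/.gemlinks path is hypothetical, and the RubyGems 1.3-era Gem.source_index API is assumed; it links the top-level entries of each gem's lib dir, rather than the lib dirs themselves, so that a single -I entry covers everything):

require 'rubygems'
require 'fileutils'

link_dir = File.expand_path('~/.gemlinks')
FileUtils.mkdir_p(link_dir)

Gem.source_index.latest_specs.each do |spec|
  spec.require_paths.each do |rp|
    lib = File.join(spec.full_gem_path, rp)
    next unless File.directory?(lib)
    Dir.entries(lib).each do |entry|
      next if entry == '.' || entry == '..'
      # last gem wins if two gems provide the same top-level name
      FileUtils.ln_sf(File.join(lib, entry), File.join(link_dir, entry))
    end
  end
end

After that, running ruby -I ~/.gemlinks (or putting the -I in RUBYOPT) lets plain requires of gem libs resolve without loading rubygems at startup.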

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Joel VanderWerf wrote:

[...]
Of course, awk would probably be even faster, but ...

... that would mean using the right tool for the right task ;)
Sorry, couldn't resist. That doesn't mean your contribution isn't
valuable, though: Ruby will be the right tool often enough, and even
here, maybe you have a team where everybody knows Ruby but few know
awk....
Cheers
Robert

···

On Sat, Jul 18, 2009 at 9:52 PM, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

--
If you tell the truth you don't have to remember anything.
--
Samuel Clemens (some call him Mark Twain)

Looks nice, but it's solving a different problem, isn't it? It appears
that you're actually speeding up the gem loading process. My hack only
makes a difference if you're running a script that doesn't use gems at
all.

Put them together and it's a win in both cases!

It's genius! :)
=r

···

--
Posted via http://www.ruby-forum.com/.

Robert Dober wrote:

On Sat, Jul 18, 2009 at 9:52 PM, Joel VanderWerf wrote:

Of course, awk would probably be even faster, but ...

... that would mean using the right tool for the right task ;)

and where's the fun in that! ;)

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407