First script seems slow - What's a better way to write this?

I've inherited a tcl script from previous co-op students, and it's a
little messy so I wanted to clean it up. I wanted to learn Ruby anyway,
so I made a ruby script to search my .tcl file and output a list of all
the procedures and variables, sorted in order of number of times of use
(I'm mainly interested in the unused ones).

The script seems really slow though (~10 seconds for a 3000 line file)-
is that Ruby, or is it just my implementation? I don't care that this
script takes 10 seconds, but I'd like to learn how to write better ruby
code. Here's my script:

def generateTokenList(readFile, token, prefix)
  names = Hash.new
  str = ""
  File.open(readFile, 'r').each do |line|
      if line[token] and not line['#']
        name = line.split[1]
        names[name] = 0 if not names.key?(name)
      end
  end

  names.each do |key, value|
      i = 0
      i = -1 if token == 'proc '
      File.open(readFile, 'r').each do |line|
        i = i + 1 if line[prefix + key] and not line['#']
      end
    names[key] = i
  end

  names = names.sort { |a,b| a[1] <=>b [1] }

  names.each { |pair| str << pair[0] + " uses: " + pair[1].to_s + "\n" }

  return str
end

if ARGV[0] == nil or ARGV[1] == nil
  puts "\nUsage: ruby ProcList.rb inputfilepath outputfilename"
  exit(0)
end

writeFile = File.new(ARGV[1], 'w')
writeFile << "Procedures:\n"
writeFile << generateTokenList(ARGV[0], 'proc ', '')
writeFile << "\n\nVariables:\n"
writeFile << generateTokenList(ARGV[0], 'set ', 36.chr)
writeFile << "Updated: " + File.mtime(ARGV[0]).to_s

--
Posted via http://www.ruby-forum.com/.

As a side issue, there is a tool for generating cross-references in Tcl called zdoc (http://www.oklin.com/zdoc/) that might be a better starting point if all you really want is to get to grips with the existing code. I have never used it, though, not being a particularly good Tcl programmer myself. There is also frink (http://wiki.tcl.tk/2611), which reformats your source code to make it easier to read; that one I have used.

I realise that this does nothing for your Ruby but perhaps it will help you get onto something more interesting :slight_smile:

Thanks for the link. However, I forgot to mention that the reason I'm
doing this myself is because no currently available tools like that work
with my code, as it contains commands specific to the program it extends
and gives errors telling me that they are invalid command names.

Basically, I'm just trying to figure out if it's my fault the script is
slow, or if Ruby just isn't very efficient.

I realise that this does nothing for your Ruby but perhaps it will help
you get onto something more interesting :slight_smile:

This is more interesting :wink:

On a side note, so far I like Ruby better than tcl.


Hi there,

I'm afraid that I can't take the time right now to really pore over the script you've posted, but it doesn't look unreasonable to me. Of course, I'm not terribly clever so take that with a grain of salt. :wink:

It's true that blistering speed is not listed as one of the current Ruby interpreter's features and you may be seeing an example of that. You can probably get a better view of the situation by running your script with the profiling library enabled.

Try running it like this:

ruby -rprofile ProcList.rb inputfilepath outputfilename

It will take even longer, but you'll end up with a report showing you where in your script you are spending the most time. Maybe it will reveal a few hot spots that you can speed up a bit.
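
A lighter-weight complement to the profiler is to time individual steps by hand with the standard `benchmark` library. This is only a minimal sketch; the helper name `timed` and the sample input are made up for illustration and are not part of the original script:

```ruby
require 'benchmark'

# Hypothetical helper: runs the block and returns [result, elapsed_seconds].
def timed
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  [result, elapsed]
end

# Stand-in for a real step such as reading the .tcl file.
lines, seconds = timed { Array.new(3000) { |n| "proc p#{n} {} {}\n" } }
warn format('built %d lines in %.4fs', lines.size, seconds)
```

Wrapping the two loops of `generateTokenList` separately in this way would show which of them accounts for most of the run time.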

Good luck, and don't hesitate to continue posting problems here. There are a lot of awfully smart people on this list that are often willing to help out in situations like yours.

Regards,
Matthew Desmarais

how does this do (untested) :

   harp:~ > cat a.rb

   require 'yaml'
   class TclIndex
     def initialize arg
       @procs, @vars = {}, {}
       parse arg
     end
     def parse a
       read = lambda{|io| io.readlines.map!{|l| l.gsub %r/#.*$/, ''}}
       lines = a.respond_to?('readlines') ? read[a] : open(a){|f|read[f]}
       lines.each do |line|
         case line
           when %r/^ \s* proc \s+ (\w+)/iox
             @procs[$1] = -1
           when %r/^ \s* set \s+ (\w+)/iox
             @vars[$1] = 0
         end
         @procs.keys.each{|k| @procs[k] += 1 if line[%r/\b#{ k }\b/]}
         @vars.keys.each{|k| @vars[k] += 1 if line[%r/\b#{ k }\b|\$#{ k }\b/]}
       end
     end
     def report o
       o << {
         'procs' => @procs.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
         'vars' => @vars.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
       }.to_yaml
     end
   end

   abort "Usage: [inputfilepath = stdin] [outputfilename = stdout]" if
     ARGV.delete('help') or ARGV.delete('--help')

   i = ARGV.shift || STDIN
   o = ARGV.shift || STDOUT

   idx = TclIndex.new i
   idx.report o

regards.

-a

--
be kind whenever possible... it is always possible.
- h.h. the 14th dali lama

On Wednesday, 12 April 2006 18:29, Charlotte wrote:

Basically, I'm just trying to figure out if it's my fault the script is
slow, or if Ruby just isn't very efficient.

Well, I hate to mention the giant purple squid in the middle of the kitchen,
but Ruby is... shall we say... not really speedy. It should still outperform
Tcl, but not much else, if the programming language benchmarks are to be
trusted. (Which they aren't, but hey.)

Then again, YARV looks surprisingly vital for what was mere vaporware only a
few years ago, so there's a chance of a Blazing Fast (well, not really) Ruby
yet.

David Vallner

Just a quick note - precompiling the regular expressions might help here, too.

Regards,

Dan

ara.t.howard@noaa.gov wrote:

how does this do (untested) :

very good point! :

     harp:~ > cat /usr/share/tcl8.3/*tcl |wc -l
        3533

     harp:~ > time cat /usr/share/tcl8.3/*tcl |ruby a.rb >/dev/null

     real 0m0.848s
     user 0m0.810s
     sys 0m0.020s

this is down from 3 sec!

   harp:~ > cat a.rb

   require 'yaml'
   class TclIndex
     def initialize arg
       @procs, @vars = {}, {}
       parse arg
     end
     def parse a
       read = lambda{|io| io.readlines.map!{|l| l.gsub %r/#.*$/, ''}}
       lines = a.respond_to?('readlines') ? read[a] : open(a){|f|read[f]}
       proc_re = Hash.new{|h,k| h[k] = %r/\b#{ k }\b/}
       var_re = Hash.new{|h,k| h[k] = %r/\b#{ k }\b|\$#{ k }\b/}
       lines.each do |line|
         case line
           when %r/^ \s* proc \s+ (\w+)/iox
             @procs[$1] = -1
           when %r/^ \s* set \s+ (\w+)/iox
             @vars[$1] = 0
         end
         @procs.keys.each{|k| @procs[k] += 1 if line[proc_re[k]]}
         @vars.keys.each{|k| @vars[k] += 1 if line[var_re[k]]}
       end
     end
     def report o
       o << {
         'procs' => @procs.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
         'vars' => @vars.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
       }.to_yaml
     end
   end

   abort "Usage: [inputfilepath = stdin] [outputfilename = stdout]" if
     ARGV.delete('help') or ARGV.delete('--help')

   i = ARGV.shift || STDIN
   o = ARGV.shift || STDOUT

   idx = TclIndex.new i
   idx.report o

regards.

-a
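
The `Hash.new{|h,k| ...}` lines in the script above act as a cache: the block runs only the first time a name is looked up, so each pattern is compiled once no matter how many lines are scanned. A stripped-down sketch of that idea follows. The variable names are made up, and `Regexp.escape` is an addition that is not in the script above (an assumption that names should be matched literally):

```ruby
# Cache of compiled patterns: the block runs only the first time a key is seen.
# Regexp.escape is an addition here so that names containing regex
# metacharacters are matched literally.
word_re = Hash.new { |cache, name| cache[name] = /\b#{Regexp.escape(name)}\b/ }

line = 'set total [expr {$total + 1}]'
first  = word_re['total']
second = word_re['total']

puts first.equal?(second)            # the same Regexp object both times
puts line =~ first ? 'hit' : 'miss'  # the cached pattern is used like any other
```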

On Thu, 13 Apr 2006, Daniel Berger wrote:

Just a quick note - precompiling the regular expressions might help here, too.

How do you do that in Ruby?

On 4/12/06, Daniel Berger <Daniel.Berger@qwest.com> wrote:

Just a quick note - precompiling the regular expressions might help here, too.

--
R. Mark Volkmann
Object Computing, Inc.

You use this idiom:

class SomeClass
   def initialize
     @some_re = /some_re/
   end
   def some_method
      # do stuff with @some_re
   end
end

On Apr 12, 2006, at 2:19 PM, Mark Volkmann wrote:

On 4/12/06, Daniel Berger <Daniel.Berger@qwest.com> wrote:

Just a quick note - precompiling the regular expressions might help here, too.

How do you do that in Ruby?

/re/o
    ^

-a
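
For context, the `o` flag only matters for literals that contain `#{...}` interpolation: it tells Ruby to interpolate once and reuse the compiled pattern afterwards. That also means later values of the interpolated variable are ignored, so it would not suit patterns that must differ per name. A small sketch (the method names are made up for illustration):

```ruby
# With /o the interpolation happens only on the first evaluation of this literal.
def match_once(word, text)
  text =~ /\b#{word}\b/o
end

# Without /o the pattern is rebuilt from `word` on every call.
def match_each(word, text)
  text =~ /\b#{word}\b/
end

p match_once('foo', 'foo bar')   # first call: the pattern is built from 'foo'
p match_once('bar', 'foo bar')   # still uses the 'foo' pattern
p match_each('bar', 'foo bar')   # uses a fresh 'bar' pattern
```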

Actually, Ruby is quite smart in some cases.. Try the following:

@re = /^\w+-\w+$/ # Some random expression
def foo(str)
  str =~ @re
end

def bar(str)
  str =~ /^\w+-\w+$/
end

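def qux(str)
  # Regexp.new takes the bare pattern source: no surrounding slashes, and
  # single quotes so that \w is not swallowed by string escaping.
  str =~ Regexp.new('^\w+-\w+$')
end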

require 'benchmark'
include Benchmark

bm(16) do |test|
  test.report("foo") do
    1_000_000.times {foo("abc-xyz")}
  end
  test.report("bar") do
    1_000_000.times {bar("abc-xyz")}
  end
  test.report("qux") do
    1_000_000.times {qux("abc-xyz")}
  end
end

I get something like this on 1.8 cvs:

                      user     system      total        real
foo               4.920000   0.080000   5.000000 (  5.581873)
bar               4.610000   0.060000   4.670000 (  5.457461)
qux              15.280000   0.280000  15.560000 ( 17.514639)

So ruby actually shares a single compiled Regexp object in bar's case
(as can also be proven by counting Regexp's in ObjectSpace with the GC
disabled).

Brian.
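
The sharing can also be checked directly with object identity instead of timing. A minimal sketch with made-up method names; note that `Regexp.new` is given the bare pattern source in single quotes, without surrounding slashes:

```ruby
def literal_re
  /^\w+-\w+$/
end

def constructed_re
  Regexp.new('^\w+-\w+$')
end

puts literal_re.equal?(literal_re)          # true: one shared object
puts constructed_re.equal?(constructed_re)  # false: a new object per call
puts literal_re == constructed_re           # true: same pattern, different objects
```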

2006/4/12, Logan Capaldo <logancapaldo@gmail.com>:

You use this idiom:

class SomeClass
   def initialize
     @some_re = /some_re/
   end
   def some_method
      # do stuff with @some_re
   end
end

Brian Mitchell wrote:

I get something like this on 1.8 cvs:

                      user     system      total        real
foo               4.920000   0.080000   5.000000 (  5.581873)
bar               4.610000   0.060000   4.670000 (  5.457461)
qux              15.280000   0.280000  15.560000 ( 17.514639)

Just for fun, I tried it too... I get

                      user     system      total        real
foo               5.141000   0.000000   5.141000 (  5.157000)
bar               4.765000   0.032000   4.797000 (  4.812000)
qux              22.219000   1.593000  23.812000 ( 23.906000)

Hmm... I know for a fact that my (work) computer is messed up, but still
- it's 3.0Ghz HT P4 with 1Gb RAM. Running Windows XP, as much disabled
as I can to try and convince the thing to run quickly.

Also, I tried the YAML version - at first it told me that I couldn't
modify a frozen string. I read somewhere about this happening if you
try to modify an ARGV value, so I changed

  i = ARGV.shift || STDIN
  o = ARGV.shift || STDOUT

to

  i = File.open(ARGV[0], 'r')
  o = File.new(ARGV[1], 'w')

I don't know if that has an effect on the speed or not, but it worked.
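
One possible explanation for the frozen-string error (an assumption; nothing in the thread confirms it): when a filename is given, `o` is a String taken from ARGV rather than an open file, so `o << ...` tries to append to that string instead of writing to a file. A sketch of a helper that accepts either kind of argument, under the hypothetical name `with_output`:

```ruby
require 'tempfile'

# Hypothetical helper: yields a writable IO for `target`.
# A String is treated as a path and opened for writing (and closed afterwards);
# anything else (STDOUT, an open File, a StringIO) is used as-is.
def with_output(target)
  return yield(target) unless target.is_a?(String)
  File.open(target, 'w') { |file| yield file }
end

# Demo with a temporary file standing in for ARGV[1].
tmp = Tempfile.new('report')
with_output(tmp.path) { |out| out << "Procedures:\n" }
with_output(STDOUT)   { |out| out << "Variables:\n" }
puts File.read(tmp.path)
```

The block form of `File.open` also closes the file when the block ends, which the original script never does explicitly.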

Speed results:
    My script: 450 seconds
    YAML script: 156 seconds

My bottleneck definitely seems to be iterating through each line in the
input file:

  %   cumulative    self               self     total
 time   seconds    seconds    calls  ms/call  ms/call  name
 70.25   316.08     316.08      327   966.61  1369.08  IO#each
 15.07   383.88      67.79   991826     0.07     0.07  String#
 14.15   447.56      63.69   984426     0.06     0.06  String#+
  0.11   448.05       0.49        2   243.00   406.50  Hash#sort
... etc.

Hmm... seems like it would be worthwhile to learn all that
lambda/map/#&@^#* gibberish. Thanks!


Wow, on my paltry 1.7Ghz P4 I get:

                      user     system      total        real
foo               3.990000   0.040000   4.030000 (  4.097883)
bar               3.700000   0.020000   3.720000 (  3.778370)
qux              13.640000   0.130000  13.770000 ( 13.914830)

from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
probably of no concern given developmental status), performance was
significantly worse with 1.9 (especially for qux - around 23 seconds)
and Oniguruma.

On Thu, 2006-04-13 at 04:24 +0900, Charlotte wrote:

> Brian Mitchell wrote:
>
> > I get something like this on 1.8 cvs:
> >
> >                       user     system      total        real
> > foo               4.920000   0.080000   5.000000 (  5.581873)
> > bar               4.610000   0.060000   4.670000 (  5.457461)
> > qux              15.280000   0.280000  15.560000 ( 17.514639)
>
> Just for fun, I tried it too... I get
>
>                       user     system      total        real
> foo               5.141000   0.000000   5.141000 (  5.157000)
> bar               4.765000   0.032000   4.797000 (  4.812000)
> qux              22.219000   1.593000  23.812000 ( 23.906000)
>
> Hmm... I know for a fact that my (work) computer is messed up, but still
> - it's 3.0Ghz HT P4 with 1Gb RAM. Running Windows XP, as much disabled
> as I can to try and convince the thing to run quickly.

--
Ross Bamford - rosco@roscopeco.REMOVE.co.uk

in fact my approach is slowed by those features. what makes it a bit faster
is that it makes one pass through the file, does io in bulk, and pre-compiles
all regexes. those are the keys.

regards.

-a
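
The same ideas (read the file once, work on the lines in memory) can be applied to the original `generateTokenList` without changing its matching rules. This is a rough sketch, not a drop-in replacement: `count_tokens` and the sample lines are made up for illustration. It keeps the original's crude rule of skipping any line that contains `#`, sticks to plain substring checks as the original did, and still compares every name against every line, only against an in-memory array instead of re-reading the file for each name:

```ruby
# Count how often each name declared via `token` ('proc ' or 'set ') is used.
# `lines` is the whole file, already read once (e.g. with File.readlines).
def count_tokens(lines, token, prefix)
  code  = lines.reject { |line| line.include?('#') }
  names = code.select { |line| line.include?(token) }
              .map { |line| line.split[1] }
              .compact.uniq
  base = token == 'proc ' ? -1 : 0   # do not count a proc's own definition
  names.map { |name| [name, base + code.count { |line| line.include?(prefix + name) }] }
       .sort_by { |_name, uses| uses }
end

lines = [
  "proc foo {} {\n", "  set x 1\n", "  foo\n", "}\n",
  "proc bar {} {\n", "# proc baz\n", "  puts $x\n", "}\n"
]
count_tokens(lines, 'proc ', '').each { |name, uses| puts "#{name} uses: #{uses}" }
count_tokens(lines, 'set ', '$').each { |name, uses| puts "#{name} uses: #{uses}" }
```

With a real file, `lines` would come from a single `File.readlines(ARGV[0])` call shared by both invocations.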

On Thu, 13 Apr 2006, Charlotte wrote:

Hmm... seems like it would be worthwhile to learn all that
lambda/map/#&@^#* gibberish. Thanks!

Hi,

In message "Re: First script seems slow - What's a better way to write t" on Thu, 13 Apr 2006 04:48:12 +0900, Ross Bamford <rossrt@roscopeco.co.uk> writes:

from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
probably of no concern given developmental status), performance was
significantly worse with 1.9 (especially for qux - around 23 seconds)
and Oniguruma.

That is because qux creates a lot of Regexp objects, and Oniguruma (the 1.9
regex engine) takes a little longer for pattern compilation and
optimization than the old 1.8 regex engine.

              matz.

Ross Bamford wrote:

Wow, on my paltry 1.7Ghz P4 I get:

                      user     system      total        real
foo               3.990000   0.040000   4.030000 (  4.097883)
bar               3.700000   0.020000   3.720000 (  3.778370)
qux              13.640000   0.130000  13.770000 ( 13.914830)

from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
probably of no concern given developmental status), performance was
significantly worse with 1.9 (especially for qux - around 23 seconds)
and Oniguruma.

Could the operating system have an effect on the speed?


Charlotte wrote:

Could the operating system have an effect on the speed?

Without opening or closing a single program or window, I benchmarked
this on native Windows, colinux-woody and Cygwin.

-------------------------------------------------------------
C:\temp>ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i386-mswin32]
                      user     system      total        real
foo               5.359000   0.062000   5.421000 (  5.875000)
bar               4.969000   0.063000   5.032000 (  5.391000)
qux              18.750000   1.484000  20.234000 ( 22.578000)
-------------------------------------------------------------
colinux:~# ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i486-linux]
                      user     system      total        real
foo               1.360000   2.510000   3.870000 (  3.868129)
bar               1.160000   2.300000   3.460000 (  3.460166)
qux               2.020000  13.190000  15.210000 ( 15.210472)
-------------------------------------------------------------
Simon@XPS /cygdrive/c/temp
$ ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i386-cygwin]
                      user     system      total        real
foo               3.656000   0.000000   3.656000 (  3.973000)
bar               3.422000   0.000000   3.422000 (  3.709000)
qux              12.813000   0.000000  12.813000 ( 13.991000)
-------------------------------------------------------------

Well, I'm puzzled. I thought native windows should be the
fastest one on a windows machine.

cheers

Simon

but why? it's windows!?

:wink:

-a

On Fri, 14 Apr 2006, Simon Kröger wrote:

Well, I'm puzzled. I thought native windows should be the fastest one on a
windows machine.