Runtime disparity - Same program in Perl and Ruby

Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),

The other day I wrote a basic program in Perl, and the following day I
rewrote it in Ruby. I'm curious about the differences in runtime of
the two versions, though.

Let me start by describing the program (I'll append full code for both
to the end): it reads in a list of alphanumeric codes from file
(format is [\w\d\S]+_\d{3}, but they're separated by a comma in the
file), then creates a hash with those codes as keys and empty arrays
as values. After the hash is built, the program traverses through a
given directory and its subdirectories (using File::Find in Perl and
Find.find in Ruby) and checks each file against the hash of codes
(with a few regexps and conditions to prevent lots of unnecessary
looping), adding it to the array for a code if the code is found in
the filename. Finally, it writes the contents of the hash to a .csv
file in the format CODE,PATH for each match.
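Here is a minimal Ruby sketch of the first step described above: reading the code list into a hash of empty arrays. It assumes one `CODE,NNN` pair per line. `load_codes` is a hypothetical name, not taken from either posted program. It uses `sub` (first comma only), which mirrors Perl's `s/,/_/` rather than the Ruby version's `gsub`.

```ruby
# Hypothetical helper: reads lines such as "AB-12,001" and returns
# { "AB-12_001" => [], ... } with one fresh empty array per code.
def load_codes(path)
  File.foreach(path).each_with_object({}) do |line, codes|
    code = line.chomp.sub(',', '_') # first comma only, like Perl's s/,/_/
    codes[code] = [] unless code.empty? # skip blank lines
  end
end
```

Each code gets its own `[]`, so appending to one code's list never affects another, and no `"empty"` placeholder is needed.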

Now, if it were the case that Ruby or Perl were simply -slower- than
the other, I wouldn't be bothering you folks. But here's where it gets
a little unusual: the number of elements in the code list has a
noticeable impact on the run time of the Ruby version, but far less on
the Perl version. I ran each one a few times with code lists of
various sizes, and they both print start/stop timestamps at the end,
so I collected the data:

Entries | Ruby (s) | Perl (s)
--------+----------+---------
      4 |      153 |      291
     64 |      133 |      258
    256 |      222 |      253
    512 |      327 |      248
   1024 |      562 |      353
   1500 |      683 |      363

Ruby runs faster for low numbers of entries, as you can see, but by the
time you get up to 1500, Ruby's time has more than quadrupled (153 s to
683 s) while Perl's has gone up by only about a quarter (291 s to 363 s).
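You can read the growth factors straight off the table. This small Ruby check divides each language's 1500-entry time by its 4-entry time. The variable and method names are illustrative.

```ruby
# Timings in seconds, copied from the table above (entries => seconds).
ruby_times = { 4 => 153, 64 => 133, 256 => 222, 512 => 327, 1024 => 562, 1500 => 683 }
perl_times = { 4 => 291, 64 => 258, 256 => 253, 512 => 248, 1024 => 353, 1500 => 363 }

# Ratio of the time for the largest code list to the time for the smallest.
def growth(times)
  times[times.keys.max].fdiv(times[times.keys.min])
end

printf("Ruby: x%.2f  Perl: x%.2f\n", growth(ruby_times), growth(perl_times))
# prints "Ruby: x4.46  Perl: x1.25"
```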

I've looked over the code for both versions several times, and I don't
see any significant differences. The only important feature the Ruby
version lacks is the sort() before writing the file.

I'd really appreciate any insight into why Ruby's runtime grows so
readily and Perl's does not.

Code of both versions follows.

Thanks,
Andrew Fallows

Perl:

use File::Find;
use strict;
use warnings;
my $code;
my $type;
my %filecodes = ();
my $start_time = "Started: " . localtime();
$| = 1; #Enables flush on print.
$\ = "\n"; #Automatic newlines on print
open(ITEM_LIST, "(path)") or die "Error";

# This loop builds a hash whose keys are the codes/types from file
# and whose values are references to empty arrays
while(my $item = <ITEM_LIST>)
{
  $item =~ s/,/_/;
  $item =~ s/\n//g;
  print $item;
  my @files = ();
  $filecodes{$item} = \@files;
}
print "Hash built";

# Uses File::Find to iterate over the entire subdirectory
find(\&file_seek, "(path)");

# The searching portion: gets each location from File::Find, then compares it
# to all the targets. If there is a match, prints a message and adds that file
# to the related array.
sub file_seek
{
  my $file = $_;
  # Kicks out if the file in question is not of the necessary format
  if(!(-f $file) || !($file =~ /^[\d\w\S]+_\d{3}/)){ return; }

  foreach my $target (keys(%filecodes))
  {
    # If the file name contains the code sought
    if($file =~ /$target/)
    {
      print "found $file in $File::Find::dir";

      # Jumps out if the list for this code already contains this file.
      for (0..@{$filecodes{$target}})
      {
        if(defined(${$filecodes{$target}}[$_])
        && $File::Find::name eq ${$filecodes{$target}}[$_]) {return; }
      }
      push(@{$filecodes{$target}}, $File::Find::name);
    }
  }
}

# After the whole directory has been searched, prints each key and all
# values found for it.
open(RESULTS, "> (path)") or die "Error 2";
foreach my$target ( sort(keys( %filecodes )))
{
  my @results = @{$filecodes{$target}};
  if(@results == 0) { push(@results, "NO FILES FOUND") }
  print $target;
  foreach (@results)
  {
    print RESULTS "$target,$_";
    print "\t$_";
  }
}
close RESULTS;
print $start_time;
print "Ended: " . localtime();

Ruby:

class FileSearcher
  $\ = "\n"
  in_file = File.open( "(path)","r")
  start_time = Time.now
  filecodes = Hash.new
  # This loop reads all the item codes in from file and then
  # adds them to a hash, each linked to its own empty array
  while item = in_file.gets
    item = item.gsub(',','_')
    item = item.gsub("\n","")
    files = Array.new
    files.push("empty");
    filecodes[item]= files
  end
  in_file.close

  # The searching portion: looks at each file/location, then compares it
  # to all the targets. If there is a match, prints a message and adds
  # that file to the related array.
  require "Find"
  require 'ftools'
  Find.find("(path)") do |file|
    if !(FileTest.file?(file)) || !(File.basename(file) =~ /^[\d\w\S]+_\d{3}/)
      next
    else
      filecodes.each_key do |target|
        if(file =~ /#{target}/)
          puts "found " + target + " at " + file
          $stdout.flush
          fail = 0
          for i in 0..filecodes[target].size-1 do
            if(filecodes[target][i] != "empty" &&
            File.basename(file) == File.basename(filecodes[target][i]))
              fail = 1
              break
            end
          end
          if fail == 0
            if filecodes[target][0] == "empty"
              filecodes[target][0] = file
            else
              filecodes[target].push(file)
            end
          end
        end
      end
    end
  end

  # After the whole directory has been searched, prints each key and all
  # values found for it to a file called Ruby_results.csv.
  target_file = File.open("(path)","w")
  filecodes.each_key do |target|
    results = filecodes[target]
    if results[0] == "empty"
      results[0] = "NO FILES FOUND"
    end
    puts target
    for i in 0..(results.size-1)
      target_file.puts target + "," + results[i]
    end
  end
  target_file.close
  end_time = Time.now
  puts "Started: " + start_time.to_s
  puts "Ended: " + end_time.to_s
end

Kaldrenon wrote:

Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),

The other day I wrote a basic program in Perl,

Did you write it in basic or in Perl? :-)

and the following day I
rewrote it in Ruby. I'm curious about the differences in runtime of
the two versions, though.

Let me start by describing the program (I'll append full code for both
to the end): it reads in a list of alphanumeric codes from file
(format is [\w\d\S]+_\d{3},

The character class \d is a subset of \w, and both are subsets of \S, so your expression could be simplified to:

\S+_\d{3}
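Every `\d` character is also a `\w` character, and every `\w` character is also a non-space (`\S`) character. So the union `[\d\w\S]` is just `\S`. Here is a quick Ruby check of that claim on a few made-up names:

```ruby
LONG  = /^[\d\w\S]+_\d{3}/
SHORT = /^\S+_\d{3}/

samples = ['AB12_001.txt', 'x-y.z_123', 'no_digits', 'has space_123', '_999']

samples.each do |name|
  # Both patterns should agree on every input.
  puts format('%-15s %-5s %s', name, LONG.match?(name), SHORT.match?(name))
end
```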

I'd really appreciate any insight into why Ruby's runtime grows so
readily and Perl's does not.

Did you compare the output of the Perl and Ruby versions to see if there were any differences?
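Here is one way to act on that question, sketched in Ruby. It compares the two CSV files as line sets, so the Perl version's `sort` does not cause spurious differences. The method name and the file paths are hypothetical.

```ruby
# Hypothetical helper: reports lines present in only one of two CSV files.
# Order is ignored (only the Perl version sorts its output); the results
# are sorted just to make the report stable.
def csv_diff(path_a, path_b)
  a = File.readlines(path_a, chomp: true).sort
  b = File.readlines(path_b, chomp: true).sort
  { only_in_a: a - b, only_in_b: b - a }
end

# Usage (paths are placeholders):
#   diff = csv_diff('perl_results.csv', 'ruby_results.csv')
#   puts diff[:only_in_a], diff[:only_in_b]
```

The two posted programs differ in at least two ways that could show up in such a comparison. First, the Ruby version matches each code against the full path from `Find.find`, while the Perl version matches against `$_`, the file name. Second, the Ruby version treats two files with the same basename as duplicates, while the Perl version compares full paths.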

Code of both versions follows.

Thanks,
Andrew Fallows

use File::Find;
use strict;
use warnings;
my $code;
my $type;
my %filecodes = ();
my $start_time = "Started: " . localtime();
$| = 1; #Enables flush on print.
$\ = "\n"; #Automatic newlines on print
open(ITEM_LIST, "(path)") or die "Error";

You should include the $! (or $^E) variable in the error message so you know why it failed.
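In the Ruby version this information comes for free. A failed `File.open` raises a subclass of `SystemCallError` (for example `Errno::ENOENT`), and its message already includes the reason. Here is a sketch of reporting it explicitly; the method name is hypothetical.

```ruby
# Hypothetical helper: opens a file for reading, or exits with the OS reason.
def open_or_die(path)
  File.open(path, 'r')
rescue SystemCallError => e
  # e.message reads something like "No such file or directory @ rb_sysopen - ..."
  abort "Can't open #{path}: #{e.message}"
end
```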

# This loop builds a hash whose keys are the codes/types from file
# and whose values are references to empty arrays
while(my $item = <ITEM_LIST>)
{
  $item =~ s/,/_/;
  $item =~ s/\n//g;

That is usually done with chomp:

         chomp $item;

  print $item;
  my @files = ();
  $filecodes{$item} = \@files;

You don't need to create a named array; just assign a reference to an anonymous array:

    $filecodes{$item} = [];

}
print "Hash built";

# Uses File::Find to iterate over the entire subdirectory
find(\&file_seek, "(path)");

# The searching portion: gets each location from File::Find, then compares it
# to all the targets. If there is a match, prints a message and adds that file
# to the related array.
sub file_seek
{
  my $file = $_;
  # Kicks out if the file in question is not of the necessary format
  if(!(-f $file) || !($file =~ /^[\d\w\S]+_\d{3}/)){ return; }

Using $_ instead of the copy in $file:

         return if !-f || !/^\S+_\d{3}/;

  foreach my $target (keys(%filecodes))
  {
    # If the file name contains the code sought
    if($file =~ /$target/)

Because $target may contain regular expression metacharacters, you should quotemeta it:

      if ( $file =~ /\Q$target/ )

Or use the index function:

      if ( 0 <= index $file, $target )
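The Ruby version has the same issue with `/#{target}/`. Below are two Ruby equivalents of these suggestions, shown on a made-up code containing a regex metacharacter. `Regexp.escape` plays the role of `\Q`, and `String#include?` plays the role of `index`. The escaped form still builds a regex per comparison; `include?` builds none at all.

```ruby
name   = 'AB1X23_001.txt'
target = 'AB1.23' # hypothetical code; '.' is a regex metacharacter

# Unescaped interpolation: '.' matches any character, so this is a false hit.
p name.match?(/#{target}/)                # => true
# Escaped pattern, the analogue of Perl's /\Q$target/.
p name.match?(/#{Regexp.escape(target)}/) # => false
# Plain substring test, the analogue of Perl's index().
p name.include?(target)                   # => false
```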

    {
      print "found $file in $File::Find::dir";

      # Jumps out if the list for this code already contains this file.
      for (0..@{$filecodes{$target}})

You have an off-by-one error:

        for (0..$#{$filecodes{$target}})
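Here is the same slip in Ruby terms. An inclusive range `0..a.size` visits one index past the end, where `a[i]` is `nil`; that is what the `defined()` test in the Perl loop quietly absorbs. The exclusive range `0...a.size`, or plain iteration, avoids it. The posted Ruby version's `0..size-1` is already correct; the file names here are made up.

```ruby
files = ['a/X_001', 'b/X_001']

p((0..files.size).map { |i| files[i] })  # => ["a/X_001", "b/X_001", nil]
p((0...files.size).map { |i| files[i] }) # => ["a/X_001", "b/X_001"]
p files.include?('b/X_001')              # => true
```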

      {
        if(defined(${$filecodes{$target}}[$_])
        && $File::Find::name eq ${$filecodes{$target}}[$_]) {return; }

${$filecodes{$target}}[$_] can be written more simply as $filecodes{$target}[$_].

But you don't really need to use an array index:

        for ( @{$filecodes{$target}} )
        {
            return if defined() && $File::Find::name eq $_;

(Or you could use a Hash of Hashes.)
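Here is a Ruby sketch of the Hash-of-Hashes idea, using the standard `Set` class (which is itself hash-based). With it, the duplicate check no longer scans a whole list for every match. Whether this accounts for the timing difference is not established here; the names are illustrative.

```ruby
require 'set'

# Each known code maps to a Set of paths.
filecodes = { 'X_001' => Set.new }

# Returns true if the path was newly recorded, false if it was a duplicate.
# Set#add? returns nil when the element is already present.
def record(filecodes, code, path)
  !filecodes[code].add?(path).nil?
end

p record(filecodes, 'X_001', '/data/X_001.txt') # => true
p record(filecodes, 'X_001', '/data/X_001.txt') # => false
p filecodes['X_001'].to_a                       # => ["/data/X_001.txt"]
```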

      }
      push(@{$filecodes{$target}}, $File::Find::name);
    }
  }
}

# After the whole directory has been searched, prints each key and all
# values found for it.
open(RESULTS, "> (path)") or die "Error 2";

You should include the $! (or $^E) variable in the error message so you know why it failed.

foreach my$target ( sort(keys( %filecodes )))
{
  my @results = @{$filecodes{$target}};

Do you really need to make a copy of the array?

  if(@results == 0) { push(@results, "NO FILES FOUND") }

If the array is empty you can just assign to it:

         @results = 'NO FILES FOUND' unless @results;
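Here is the Ruby counterpart of that idiom, combined with the `sort` the Ruby version was missing. It sketches an output step that writes `CODE,PATH` lines in key order, with a placeholder row for codes that matched nothing. The method name is hypothetical.

```ruby
# Hypothetical helper: filecodes maps each code to an array of paths.
def write_results(filecodes, out_path)
  File.open(out_path, 'w') do |out|
    filecodes.keys.sort.each do |code|
      paths = filecodes[code]
      paths = ['NO FILES FOUND'] if paths.empty? # like: @results = ... unless @results
      paths.each { |path| out.puts "#{code},#{path}" }
    end
  end
end
```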

  print $target;
  foreach (@results)
  {
    print RESULTS "$target,$_";
    print "\t$_";
  }
}
close RESULTS;
print $start_time;
print "Ended: " . localtime();

John

--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Kaldrenon wrote:

Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),

The other day I wrote a basic program in Perl, and the following day I
rewrote it in Ruby. I'm curious about the differences in runtime of
the two versions, though.

It would help to have a view into the input file and to know what your
program should do. I'm sure there has to be another way to code your
problem, and I'm pretty sure that constructs like the following can and
should be avoided:

Find.find("(path)") do |file|
    if !(FileTest.file?(file)) || !(File.basename(file) =~ /^[\d\w\S]+_\d{3}/)
      next
    else
      filecodes.each_key do |target|
        if(file =~ /#{target}/)
          puts "found " + target + " at " + file
          $stdout.flush
          fail = 0
          for i in 0..filecodes[target].size-1 do
            if(filecodes[target][i] != "empty" &&
            File.basename(file) == File.basename(filecodes[target][i]))
              fail = 1
              break
            end
          end
          if fail == 0
            if filecodes[target][0] == "empty"
              filecodes[target][0] = file
            else
              filecodes[target].push(file)
            end
          end
        end
      end
    end
  end

This is both difficult to read and error prone.
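Here is one possible restructuring of that block, as a sketch rather than a drop-in replacement. A guard clause replaces the `if/else` with `next`, `include?` replaces both the per-file regex and the manual index loop, and the `fail` flag disappears. It matches on the basename, as the Perl version does, and drops the `"empty"` placeholder. `scan_tree` is a hypothetical name.

```ruby
require 'find'

# Hypothetical helper: filecodes maps each code to an (initially empty) array.
def scan_tree(root, filecodes)
  Find.find(root) do |path|
    name = File.basename(path)
    # Guard clause: skip anything that is not a regular CODE_NNN-style file.
    next unless File.file?(path) && name.match?(/^\S+_\d{3}/)

    filecodes.each do |code, paths|
      next unless name.include?(code) # substring test, no regex per code
      next if paths.include?(path)    # already recorded for this code
      puts "found #{code} at #{path}"
      paths << path
    end
  end
  filecodes
end
```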

--
Posted via http://www.ruby-forum.com/.

Thanks for the reply, John. There are a number of good tips in your
reply for making my code more "Perl"-y. I don't think many (if any)
will actually change the way the program runs, though, will they? A
lot of the things I did work, but are styled more like Java, the
language I use most. For example, I know I can just use $_ in sub
file_seek, but I prefer to give my vars names that make more sense at
a glance. But I'll keep all of your advice in mind.

Thanks again,
Andrew

On Thu, 14 Jun 2007 19:11:18 GMT, "John W. Krahn" <dummy@example.com> wrote:

Using $_ instead of the copy in $file:

        return if !-f || !/^\S+_\d{3}/;

Or (IMHO more clearly):

  return unless -f and /^\S+_\d{3}/;

Michele

--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

On Thu, 14 Jun 2007 20:15:16 -0000, Kaldrenon <kaldrenon@gmail.com> wrote:

reply for making my code more "Perl"-y. I don't think many (if any)
will actually change the way the program runs, though, will they? A

Just try.

lot of the things I did work, but are styled more like Java, the
language I use most. For example, I know I can just use $_ in sub
file_seek, but I prefer to give my vars names that make more sense at
a glance. But I'll keep all of your advice in mind.

$_ is a pronoun and it makes sense in short enough phrases. If you
have a C<for> loop with a two- or three-line block (or even a C<for>
modifier) then use it. If it's 100 lines long (probably not a good
idea in itself) then use an explicit name.

Michele