My ruby code won't go as fast as my perl code

Dave_Burt · 15 July 2004 06:07

I realise I'm doing this a perlish way, but my question is, is it possible
to do this operation in Ruby in a time more comparable to what the Perl
version's getting? (That's about 4 seconds; my Ruby code runs in about 17
seconds over the same data set, which is far smaller than the production
data set.)

Basically, we have CSV files with a date like 31-DEC-03 23:59:59 as the
first field (always in order), and the task is to grab into an array (to
later process further) just the parts of each file that fall after a given
date.

The main slow bit seems to be the string concatenation and comparison
(...+$4+$5+$6 >= start_date).
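
For readers following along, the key-building step described above can be sketched in isolation. This is only an illustration: the names `MONTH_NUMBER`, `DATE_REGEX` and `sortable_key` are hypothetical and do not appear in the posted scripts; the regex and the two-digit-year pivot ('87') are taken from the code below.

```ruby
# Hypothetical helper (not part of the posted scripts): turns a timestamp such
# as "31-DEC-03 23:59:59" into a sortable "yyyymmddhhmmss" string.
MONTHS = %w(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC)
MONTH_NUMBER = {}
MONTHS.each_with_index { |name, idx| MONTH_NUMBER[name] = format('%02d', idx + 1) }

DATE_REGEX = /^(\d\d)-(\w\w\w)-(\d\d) (\d\d):(\d\d):(\d\d)/

def sortable_key(line)
  m = DATE_REGEX.match(line)
  return nil unless m                     # line does not start with a timestamp
  century = m[3] >= '87' ? '19' : '20'    # same two-digit-year pivot as the post
  "#{century}#{m[3]}#{MONTH_NUMBER[m[2]]}#{m[1]}#{m[4]}#{m[5]}#{m[6]}"
end

puts sortable_key('31-DEC-03 23:59:59,1,2,3')                      # prints 20031231235959
puts sortable_key('12-APR-98 21:59:59,1,2,3') >= '20040000000000'  # prints false
```

Because every key has the same width, plain string comparison orders the keys chronologically, which is what the `>= start_date` test relies on.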

################################################################

#!perl

$start_date = '20040000000000'; # "yyyymmddhhmmss"
$dir = "data";

@months = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);
%mm = {};
for ($i = 0; $i < 12; $i++) {
$mm{$months[$i]} = sprintf('%.2d', $i)
}
undef @months;

@a = ();

opendir DIR, $dir;
while ($_ = readdir DIR) {
next if /^\./; # skip dotfiles
open IN, "$dir/$_";
while (<IN>) {
  /^(\d\d)-(\w\w\w)-(\d\d) (\d\d):(\d\d):(\d\d)/;
  $cc = ($3 ge '87' ? '19' : '20');
  if ("$cc$3$mm{$2}$1$4$5$6" ge $start_date) {
   while (<IN>) {
    push @a, $_;
   }
  }
}
close IN;
}
closedir DIR;

$t = time - $t;
print "Read " . scalar(@a) . " lines in $t seconds$/"; # 4 seconds

$t = time;
open OUT, ">perl.out";
print OUT @a;
$t = time - $t;
print "Wrote in $t seconds$/"; # 3 seconds

################################################################

#!ruby

mm = Hash.new
i = '00'
%w(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC).each do |mmm|
mm[mmm] = i = i.succ
end

start_date = '20040000000000' # "yyyymmddhhmmss"
dir = "data"

date_regex = /^(\d\d)-(\w\w\w)-(\d\d) (\d\d):(\d\d):(\d\d)/
a = []
t = Time.new
reading = false

Dir.open(dir).each do |file|
  next if file[0] == ?. # skip dotfiles
  reading = false
  File.open(dir + '/' + file).each_line do |line|
    reading ||= (date_regex =~ line &&
      (($3>='87'?'19':'20')+$3+mm[$2]+$1+$4+$5+$6 >= start_date))
    a << line if reading
  end
end

t = Time.new - t;
puts "Read #{a.size} lines in #{t} seconds"; # 17 seconds

t = Time.new
File.open('ruby.out', 'w') do |f|
f.print a.join
end
t = Time.new - t;
puts "Wrote #{a.size} lines in #{t} seconds"; # 3 seconds

Lennon_Day-Reynolds1 · 15 July 2004 06:28

I'm not sure how the performance would compare, but if you really want
to write this in the "Ruby style," you might consider using the
DateTime class from the standard 'date' module.

Ex:

require 'date'
d = DateTime.parse('31-DEC-03 12:59:59')
d.year
=> 3
d.month
=> 12
d.hour
=> 12

(etc., etc.)
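
As the output above shows, `parse` returned 3 for the year in that session. A possible variation, sketched here under the assumption that every timestamp follows the DD-MMM-YY HH:MM:SS layout, is to give `DateTime.strptime` an explicit format. The constant name `FORMAT` and the cutoff date are illustrative only. Note that `%y` uses a different century pivot (69) from the '87' used in the original scripts; for data from 1987 onward the result is the same.

```ruby
require 'date'

# Assumed layout of the first CSV field, e.g. "31-DEC-03 23:59:59".
FORMAT = '%d-%b-%y %H:%M:%S'

d = DateTime.strptime('31-DEC-03 23:59:59', FORMAT)
puts d.year    # prints 2003
puts d.month   # prints 12
puts d.hour    # prints 23

# DateTime objects are Comparable, so the cutoff test needs no string key.
cutoff = DateTime.new(2004, 1, 1)
puts d >= cutoff   # prints false
```

Whether this is faster than the string-key approach is not something this sketch measures; object creation per line may well make it slower.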

In general, though, I usually feel that any time my Ruby code is
running within a constant factor of the time equivalent Perl (or
Python) code takes, I'm probably not doing anything wrong. If I start
to see more troublesome scaling issues, though, (i.e., exponential
runtime increases relative to input size) then there's probably
something that needs to be done to the code.

Lennon

Laurent_Julliard4 · 15 July 2004 07:07

Not sure, but maybe you can make it 4 lines instead of 17 by using csv.rb.

This can probably be sped up, but Perl is often faster than Ruby.

PS
Sorry, I'm leaving in ten minutes, so I have little time to play with this.


Nobuyoshi_Nakada · 15 July 2004 11:37

Hi,

At Thu, 15 Jul 2004 15:07:18 +0900,
Dave Burt wrote in [ruby-talk:106480]:

I realise I'm doing this a perlish way, but my question is, is it possible
to do this operation in Ruby in a time more comparable to what the Perl
version's getting? (That's about 4 seconds; my Ruby code runs in about 17
seconds over the same data set, which is far smaller than the production
data set.)

Which version of ruby do you use?

Dir.open(dir).each do |file|
next if file[0] == ?. # skip dotfiles
reading = false
File.open(dir + '/' + file).each_line do |line|
reading ||= (date_regex =~ line &&
(($3>='87'?'19':'20')+$3+mm[$2]+$1+$4+$5+$6 >= start_date))
a << line if reading
end
end

You open files but never close them, which may trigger GC too frequently.

  IO.foreach(dir + '/' + file) do |line|
    if reading
      a << line
    else
      reading = (date_regex =~ line &&
        (($3>='87'?'19':'20')+$3+mm[$2]+$1+$4+$5+$6 >= start_date))
    end
  end
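
A minimal sketch of the two usual ways to make sure each file is closed promptly, for comparison with the snippet above. The helper names `read_lines_block_form` and `read_lines_foreach` and the temporary sample file are made up for the illustration.

```ruby
require 'tmpdir'

# Block form of File.open: the file is closed when the block returns,
# even if an exception is raised inside it.
def read_lines_block_form(path)
  lines = []
  File.open(path) do |f|
    f.each_line { |line| lines << line }
  end
  lines
end

# File.foreach (inherited from IO.foreach) opens the file, yields each line
# and closes the file in a single call.
def read_lines_foreach(path)
  lines = []
  File.foreach(path) { |line| lines << line }
  lines
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.csv')
  File.write(path, "01-JAN-04 00:01:00,1\n01-FEB-04 00:01:05,2\n")
  puts read_lines_block_form(path) == read_lines_foreach(path)   # prints true
end
```

By contrast, `File.open(path).each_line` without a block leaves the handle open until the File object is garbage collected.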

--
Nobu Nakada

Ara.T.Howard3 · 15 July 2004 14:42

can you send me some sample data?

-a

--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

Denis_Mertz · 15 July 2004 15:12

"Dave Burt" <burtdav@hotmail.com> wrote in message news:<y8pJc.1797$K53.1255@news-server.bigpond.net.au>...

The main slow bit seems to be the string concatenation and comparison
(...+$4+$5+$6 >= start_date).

The addition operator always creates a new String object. So when you
chain a lot of additions you have a lot of temporary String objects
that are created. You can avoid that by using the Append ( << ) method

(($3>='87'?'19':'20') << $3 << mm[$2] << $1 << $4 << $5 << $6 >=
start_date)

Alternatively you can use string interpolation

("#{($3>='87'?'19':'20')}#{$3}#{mm[$2]}#{$1}#{$4}#{$5}#{$6}" >=
start_date)

or Array.join, like that

([($3>='87'?'19':'20'), $3, mm[$2], $1, $4, $5, $6].join >=
start_date)

I hope it helps

Denis
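
The four ways of building the key mentioned above can be timed side by side with the standard Benchmark library. This is only a sketch: the field values in `FIELDS` are made up, the iteration count is arbitrary, and the relative timings will depend on the Ruby version, so no results are claimed here.

```ruby
require 'benchmark'

# Made-up field values standing in for the century, $3, mm[$2], $1, $4, $5, $6.
FIELDS = %w(20 03 12 31 23 59 59)

# Four ways of building the same 14-character key.
BUILDERS = {
  '+'             => lambda { |f| f[0] + f[1] + f[2] + f[3] + f[4] + f[5] + f[6] },
  # dup first so that << appends to a fresh string instead of mutating FIELDS[0]
  '<<'            => lambda { |f| f[0].dup << f[1] << f[2] << f[3] << f[4] << f[5] << f[6] },
  'interpolation' => lambda { |f| "#{f[0]}#{f[1]}#{f[2]}#{f[3]}#{f[4]}#{f[5]}#{f[6]}" },
  'join'          => lambda { |f| f.join }
}

n = 100_000   # arbitrary iteration count
Benchmark.bm(14) do |bm|
  BUILDERS.each do |label, builder|
    bm.report(label) { n.times { builder.call(FIELDS) } }
  end
end
```

Running it on the target machine and Ruby version is the only reliable way to decide which variant to use.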

Denis_Mertz · 15 July 2004 16:32

"Dave Burt" <burtdav@hotmail.com> wrote in message news:<y8pJc.1797$K53.1255@news-server.bigpond.net.au>...

The main slow bit seems to be the string concatenation and comparison
(...+$4+$5+$6 >= start_date).

You could try this one also (with String Arrays):

# [cc, yy, mm, dd, hh, mm, ss] as strings
start_date = %w{20 04 00 00 00 00 00}
.
.
.
# Array defines <=> but not >=, so test the sign of the comparison
(([$3 >= '87' ? '19' : '20', $3, mm[$2], $1, $4, $5, $6] <=> start_date) >= 0)

Converting to integers could also help (integer comparison may be
faster than string comparison)

mm = Hash.new
i = 0 # integer here
%w(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC).each do |mmm|
mm[mmm] = i = i.succ
end

# [cc, yy, mm, dd, hh, mm, ss] as integers
start_date = [20, 4, 0, 0, 0, 0, 0]
.
.
.
d3 = $3.to_i
(([d3 >= 87 ? 19 : 20, d3, mm[$2], $1.to_i, $4.to_i, $5.to_i, $6.to_i] <=>
start_date) >= 0)
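
A short sketch of the integer idea, under stated assumptions. Array defines `<=>` but does not include Comparable, so element-wise comparison has to go through `<=>`. A possible alternative is to fold the fields into a single Integer so that one integer comparison suffices. The helper name `date_to_i` is hypothetical.

```ruby
# Element-wise comparison of integer arrays goes through <=>.
start_date = [20, 4, 0, 0, 0, 0, 0]
candidate  = [20, 4, 1, 1, 0, 1, 0]
puts((candidate <=> start_date) >= 0)   # prints true

# Hypothetical helper: packs century, year, month, day, hour, minute, second
# into one Integer of the form ccyymmddhhmmss.
def date_to_i(cc, yy, mon, dd, hh, mi, ss)
  (((((cc * 100 + yy) * 100 + mon) * 100 + dd) * 100 + hh) * 100 + mi) * 100 + ss
end

puts date_to_i(20, 3, 12, 31, 23, 59, 59)                     # prints 20031231235959
puts date_to_i(20, 3, 12, 31, 23, 59, 59) >= 20040000000000   # prints false
```

Whether the six `to_i` calls per line cost more than they save is something only a measurement on real data can answer.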

SER1 · 15 July 2004 16:32

#!ruby
require 'csv'
require 'parsedate'

start_date = Time.local(*ParseDate.parsedate('1-APR-03 00:00:00'))
dir = "data"
a = []

t = Time.new
Dir.entries( dir ).each do |f|
  fpath = File.join( dir, f )
  if FileTest.file?( fpath )
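    # CSV.parse would treat the path string itself as CSV text;
    # CSV.open reads the file and (in the 1.8 csv.rb) yields each row.
    CSV.open( fpath, 'r' ) do |row|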
      a << row if Time.local(*ParseDate.parsedate(row[0].to_s)) >= start_date
    end
  end
end
t = Time.new - t;
puts "Read #{a.size} lines in #{t} seconds";

# It would be more efficient to output the data in the other loop, but I
# assume you're wanting to do some additional processing to it here.
t = Time.new
File.open( "ruby.out", "w" ) do |outfile|
        CSV::Writer.generate(outfile) do |csv|
                a.each {|r| csv.add_row r }
        end
end
t = Time.new - t;
puts "Wrote #{a.size} lines in #{t} seconds"

Ernest_Ellingson1 · 15 July 2004 16:42

"Dave Burt" <burtdav@hotmail.com> wrote in message
news:y8pJc.1797$K53.1255@news-server.bigpond.net.au...

I realise I'm doing this a perlish way, but my question is, is it possible
to do this operation in Ruby in a time more comparable to what the Perl
version's getting? (That's about 4 seconds; my Ruby code runs in about 17
seconds over the same data set, which is far smaller than the production
data set.)

Basically, we have CSV files with a date like 31-DEC-03 23:59:59 as the
first field (always in order), and the task is to grab into an array (to
later process further) just the parts of each file that fall after a given
date.

The main slow bit seems to be the string concatenation and comparison
(...+$4+$5+$6 >= start_date).

################################################################

I built 3 files, each with only 9 lines of data; one of the files also has a
line that fails the regex test.
I read the files and find the appropriate lines 1000 times in just under 4
seconds on a Pentium 850 Windows XP machine running Ruby 1.81-12.

In the code below I use interpolation rather than concatenation, which
speeds things up a little (about 1.5 seconds in the trial).
I also take advantage of a couple of Ruby features: Array#delete_if
eliminates the lines that fail the regex, and I add a method to class Array
that does a binary search for the right place in the array, which avoids
scanning through the whole file. How much this speeds up your search will
depend on how many lines exist in each file and how many lines precede the
one where you want to start.

You could also write a function in Perl that would do a binary search as
well.

Since you have sorted dates to begin with, there is no reason not to do a
binary search.
Please reply to the group with your results if you try this on your data .

Ernie

class Array
  # Binary search over lines sorted by date. Returns the index of the first
  # line whose date is >= start_date, or false if there is no such line.
  # Assumes every element matches date_regex (see delete_if below).
  def findGE(start_date, date_regex, mm)
    starter = 0
    ender = self.length            # search the half-open range [starter, ender)
    while starter < ender
      pt = (ender - starter) / 2 + starter
      date_regex =~ self[pt]
      # yyyymmddhhmmss, including the day ($1)
      if "#{$3 >= '87' ? '19' : '20'}#{$3}#{mm[$2]}#{$1}#{$4}#{$5}#{$6}" >= start_date
        ender = pt
      else
        starter = pt + 1
      end
    end
    starter < self.length ? starter : false
  end
end

mm = Hash.new
i = '00'
%w(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC).each do |mmm|
  mm[mmm] = i = i.succ
end
start_date = '20040000000000'
date_regex = /^(\d\d)-(\w\w\w)-(\d\d) (\d\d):(\d\d):(\d\d)/
dir = "C:/dataTest"
t = Time.now.to_f
1000.times do
  a = []
  aFinal = []
  Dir.open(dir).each do |file|
    next if file[0] == ?.
    File.open(dir + '/' + file) { |f| a = f.readlines }
    a.delete_if { |line| not date_regex =~ line }
    z = a.findGE(start_date, date_regex, mm)
    aFinal = aFinal + a[z...a.length] if z
  end
end
tend = Time.now.to_f
puts "#{tend-t}"

Here is one file, (the one with the bad line)

12-APR-98 21:59:59, aaaa,bbbb,cccc,dddd
30-JUL-99 20:05:35, cccc,ffff,gggg,hhhh
27-JAN-00 15:15:45, xxxx,ffff,cccc,dddd
28-FEB-01 12:30:20, zzzz,bbbb,dddd,gggg
31-DEC-03 23:59:59, xxxx,xxxxx,yyyyy,zzzzz
01-JAN-04 00:01:00, aaaa,bbbb,cccc,dddd
01-FEB-04 00:01:05, bbbb,cccc,dddd,xxxx
05-MAR-04 05:01:59, aaaa,bbbb,cccc,dddd
08-APR-04 05:15:35, aaaa,bbbb,cccc,xxxx
nnyy,aaa,bbb,ccc,xxx
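
On newer Rubies the same lookup can be sketched with the built-in `Array#bsearch_index` (available from Ruby 2.3), applied here to the sample file above. This is an assumption-laden illustration rather than the code that was timed: the names `MM`, `STAMP`, `stamp_key` and `first_at_or_after` are made up, and the lines must already be sorted by date.

```ruby
MM = {}
%w(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC).each_with_index do |name, idx|
  MM[name] = format('%02d', idx + 1)
end
STAMP = /^(\d\d)-(\w\w\w)-(\d\d) (\d\d):(\d\d):(\d\d)/

# Sortable "yyyymmddhhmmss" key for a line that matches STAMP.
def stamp_key(line)
  m = STAMP.match(line)
  "#{m[3] >= '87' ? '19' : '20'}#{m[3]}#{MM[m[2]]}#{m[1]}#{m[4]}#{m[5]}#{m[6]}"
end

# Index of the first line dated at or after start_date, or nil if none.
def first_at_or_after(lines, start_date)
  lines.bsearch_index { |line| stamp_key(line) >= start_date }
end

lines = [
  "12-APR-98 21:59:59, aaaa,bbbb,cccc,dddd",
  "30-JUL-99 20:05:35, cccc,ffff,gggg,hhhh",
  "27-JAN-00 15:15:45, xxxx,ffff,cccc,dddd",
  "28-FEB-01 12:30:20, zzzz,bbbb,dddd,gggg",
  "31-DEC-03 23:59:59, xxxx,xxxxx,yyyyy,zzzzz",
  "01-JAN-04 00:01:00, aaaa,bbbb,cccc,dddd",
  "01-FEB-04 00:01:05, bbbb,cccc,dddd,xxxx",
  "05-MAR-04 05:01:59, aaaa,bbbb,cccc,dddd",
  "08-APR-04 05:15:35, aaaa,bbbb,cccc,xxxx",
  "nnyy,aaa,bbb,ccc,xxx"
].select { |line| STAMP =~ line }   # drop the line that fails the regex

idx = first_at_or_after(lines, '20040000000000')
puts idx                      # prints 5
puts lines[idx..-1] if idx    # the four 2004 records
```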

Dave_Burt · 16 July 2004 14:37

Thanks everyone for your input.

I'll post back in about a week when I've had a chance to try some of
this:
* csv.rb (may well make the program more legible - thanks gabriele renzi)
* making sure files aren't all opened and not closed (oops! thanks Nobu
Nakada)
* binary search (thanks Ernie)
* String#<< or interpolation to gather the match-bits (or maybe even string
arrays... thanks Denis et al.)
* strscan.rb (thanks Ara T. Howard)

I'm using the windows package 1.81 (13), on a P4 running Win XP. The target
system, though, is an old crusty box, maybe P1, running Windows NT.

Ara, for sample data, your generator does a pretty good job. Here are some
stats, in case you're interested:
* about 200 files (increasing very slowly; maybe 1 per year)
* roughly 1 record per file per hour
* oldest files are up to around 4 years old, and around 10MB
* that makes records about 300 bytes on average
* that makes about 35k records in those oldest files
* the records (comma-separated) consist of a date field (DD-MMM-YY HH:MM:SS)
and about 20 decimal fields
* my test runs are on about 10% of these files.

Ara.T.Howard3 · 15 July 2004 16:32

minimize IO and use the fast stringscanner library:

   ~ > parse.rb csv/
   Read 29696 lines in 11.699876 seconds
   Wrote 29696 lines in 0.03556 seconds

   ~ > parse.pl csv/
   Read 29696 lines in 7 seconds
   Wrote in 0 seconds

   ~ > diff -u perl.out ruby.out

here's the code(s). note that your perl script had two bugs in it - times were
not reported correctly and the first line containing a valid starting date was
not written to file. the below assumes (like your code does) that the input is
sorted in ascending order (probably not a good assumption since it will fail
silently if not):

   ~ > cat parse.rb
   #!/usr/bin/env ruby
   require 'strscan'
   dir = ARGV.shift

   mm = Hash.new
   i = '00'
   %w(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC).each do |mmm|
    mm[mmm] = i = i.succ
   end

   start_date = '20040101000000' # "yyyymmddhhmmss"
   date_regex = /^(\d\d)-(\w\w\w)-(\d\d) (\d\d):(\d\d):(\d\d).*$\n/o
   anyline = %r/^.*$\n/o
   a = []
   t = Time.new
   buf = nil

   Dir.foreach(dir) do |path|
     next if path[0] == ?.

     buf = IO.read(File.join(dir, path))
     s = StringScanner.new buf

     while s.rest?
       if s.scan date_regex
         date = "#{ s[3] >= '87' ? '19' : '20' }#{ s[3] }#{ mm[s[2]] }#{ s[1] }#{ s[4] }#{ s[5] }#{ s[6] }"
         if date >= start_date
           a << s[0]
           a << s.scan(anyline) while s.rest?
         end
       else
         s.scan anyline
       end
     end
   end

   t = Time.now - t;
   puts "Read #{ a.size } lines in #{ t } seconds"
   t = Time.now
   File.open('ruby.out', 'w'){|f| a.each{|e| f.print e}}
   t = Time.new - t;
   puts "Wrote #{ a.size } lines in #{ t } seconds"; # 3 seconds

   ~ > cat parse.pl
   #!/usr/bin/env perl
   $dir = shift;
   $start_date = '20040000000000'; # "yyyymmddhhmmss"

   @months = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);
   %mm = {};
   for ($i = 0; $i < 12; $i++) {
    $mm{$months[$i]} = sprintf('%.2d', $i)
   }
   undef @months;

   @a = ();
   $t = time;

   opendir DIR, $dir;
   while ($_ = readdir DIR) {
    next if /^\./; # skip dotfiles
    open IN, "$dir/$_";
    while (<IN>) {
     /^(\d\d)-(\w\w\w)-(\d\d) (\d\d):(\d\d):(\d\d)/;
     $cc = ($3 ge '87' ? '19' : '20');
     if ("$cc$3$mm{$2}$1$4$5$6" ge $start_date) {
      push @a, $_;
      while (<IN>) {
       push @a, $_;
      }
     }
    }
    close IN;
   }
   closedir DIR;

   $t = time - $t;
   print "Read " . scalar(@a) . " lines in $t seconds$/"; # 4 seconds
   $t = time;
   open OUT, ">perl.out";
   print OUT @a;
   $t = time - $t;
   print "Wrote in $t seconds$/"; # 3 seconds

i generated the data sets with this:

   ~ > cat gendata.rb
   require 'fileutils'
   dir = ARGV.shift
   FileUtils.mkdir_p dir

   t_start = Time.mktime(1987)
   t_end = Time.now
   delta_t = t_end - t_start
   t_fmt = '%d-%b-%y %H:%M:%S' # like 31-DEC-03 23:59:59

   1024.times do |fn|
     path = File.join dir, "#{ fn }.csv"
     open(path, 'w') do |f|
       time = t_start + rand(delta_t)
       1024.times do |lineno|
         row = time.strftime(t_fmt).upcase, rand(42), rand(42), rand(42), rand(42)
         f.puts(row.join(','))
         time += rand(42)
       end
     end
   end

it makes 1024 files, each of 1024 lines containing ordered tuples of a format
like your input data.

-a
