# of lines in a file

Is there a built-in method to determine the number of lines in a file?

I tried file.readlines.length, but it is very slow (I'm dealing with files of more than 1 million lines).

thanks,
DAN

Here are a few alternatives that use less memory than File.readlines (which
slurps the entire file into memory):

require 'benchmark'

big_file = '/usr/share/dict/words'

Benchmark.bm do |x|
  x.report('streaming') do
    lines = 0
    File.open(big_file).each_line do |line|
      lines += 1
    end
  end

  x.report('shelling out') do
    # Match the first run of digits; some wc builds pad the count with spaces.
    lines = Integer(%x(wc -l '#{big_file}')[/\d+/])
  end
end

On my machine:
                  user     system      total        real
streaming     0.270000   0.010000   0.280000 (  0.293957)
shelling out  0.000000   0.000000   0.020000 (  0.052078)

(The file is 234936 lines.)

marcel

···

On Tue, Aug 21, 2007 at 07:24:03AM +0900, blufur wrote:

> is there built-in method to determine the number of lines in a file?
>
> i tried file.readlines.length but it is very slow (dealing with files > 1 million lines)

--
Marcel Molina Jr. <marcel@vernix.org>

If on Unix:
`wc -l #{filename}` or similar (I don't remember the exact syntax for wc).

Otherwise:

Try counting \r\n or \n: read the file in a loop, counting the occurrences.
There was a thread recently on how to process a file as fast as possible;
search the archives.
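
The "read in a loop and count" suggestion could look roughly like the sketch below. It is only an illustration: count_newlines and its chunk_size default are made-up names and values, not something from this thread. Counting "\n" alone also covers "\r\n" line endings, since each of those contains exactly one "\n".

```ruby
# Hypothetical helper: count newline characters by reading the file in
# fixed-size chunks, so memory use stays bounded for very large files.
def count_newlines(path, chunk_size = 64 * 1024)
  count = 0
  # Binary mode avoids any newline translation on platforms that do it.
  File.open(path, 'rb') do |file|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      count += chunk.count("\n")
    end
  end
  count
end
```

Calling count_newlines('/usr/share/dict/words') should give the same figure as wc -l for that file, because both count newline characters.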

···


The thread I had in mind is titled "How to reclaim memory without GC.start".

···

On 8/21/07, Jano Svitok <jan.svitok@gmail.com> wrote:

There was a thread recently how to process a file as fast as possible
-- search the archives.

Your use of syntax is correct, there. The -l option tells wc to only
report the number of lines.

···

On Tue, Aug 21, 2007 at 07:34:26AM +0900, Jano Svitok wrote:

if on unix:
`wc -l #{filename}` or similar (I don't remember the exact syntax for wc)

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]

my attempt:

cfp:~ > cat a.rb && ruby a.rb Documents/words.txt && wc -l Documents/words.txt
require 'benchmark'

big_file = ARGV.shift || '/usr/share/dict/words'

Benchmark.bm do |x|
   x.report('streaming') do
     lines = 0
     File.open(big_file).each_line do |line|
       lines += 1
     end
   end

   x.report('shelling out') do
     # Match the first run of digits; some wc builds pad the count with spaces.
     lines = Integer(%x(wc -l '#{big_file}')[/\d+/])
   end

   x.report('letting ruby do the counting') do
     lines = open(big_file){|fd| fd.each{} and fd.lineno}
   end

   x.report('wow') do
     lines = open(big_file){|fd| fd.read(fd.stat.size).count "\n"}
   end

   x.report('smart') do
     class File
       def number_of_lines way_too_big = 2 ** 30
         stat.size > way_too_big ?
           (each{} and lineno) : read(stat.size).count("\n")
       end
     end
     lines = open(big_file){|fd| fd.number_of_lines}
   end
end

                                  user     system      total        real
streaming                     0.420000   0.010000   0.430000 (  0.436458)
shelling out                  0.000000   0.000000   0.010000 (  0.028870)
letting ruby do the counting  0.290000   0.010000   0.300000 (  0.296236)
wow                           0.010000   0.010000   0.020000 (  0.025010)
smart                         0.010000   0.020000   0.030000 (  0.029373)

   483523 Documents/words.txt

a @ http://drawohara.com/
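
One caveat worth keeping in mind when comparing these approaches: counting "\n" characters (as 'wow' and wc -l do) and counting iterated lines (as 'streaming' does) only agree when the file ends with a newline. The sketch below shows the difference on a small temporary file; lines_by_iteration and lines_by_newlines are illustrative names, not methods from the benchmark above.

```ruby
require 'tempfile'

# Count lines by iterating over them, as the 'streaming' report does.
def lines_by_iteration(path)
  File.foreach(path).count
end

# Count newline characters, as the 'wow' report does.
def lines_by_newlines(path)
  File.open(path, 'rb') { |file| file.read.count("\n") }
end

Tempfile.create('no_trailing_newline') do |file|
  file.write("one\ntwo\nthree") # the last line has no terminating newline
  file.flush
  puts lines_by_iteration(file.path) # prints 3
  puts lines_by_newlines(file.path)  # prints 2
end
```

Which answer is "right" depends on whether an unterminated final line should count as a line for your purposes.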

···

On Aug 20, 2007, at 4:33 PM, Marcel Molina Jr. wrote:

> streaming     0.270000   0.010000   0.280000 (  0.293957)
> shelling out  0.000000   0.000000   0.020000 (  0.052078)
>
> (The file is 234936 lines.)


> > if on unix:
> > `wc -l #{filename}` or similar (I don't remember the exact syntax for wc)
>
> Your use of syntax is correct, there. The -l option tells wc to only
> report the number of lines.

Nearly correct: wc also prints the filename. A better approach when calling
it from Ruby is to feed the file to wc on standard input, so that only the
count is printed:

  linecount=`wc -l <#{filename}`.chomp.to_i
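
If the filename may contain spaces or shell metacharacters, the interpolation above can break. A possible variant, sketched here under the assumption that a POSIX shell and wc are available, escapes the path with Shellwords from the standard library; wc_line_count is a made-up helper name.

```ruby
require 'shellwords'

# Hypothetical helper: shell out to wc with the file on standard input,
# so that only the count is printed. Shellwords.escape protects against
# spaces and shell metacharacters in the path.
def wc_line_count(path)
  output = `wc -l < #{Shellwords.escape(path)}`
  # $? holds the exit status of the command run by the backticks.
  raise "wc failed for #{path}" unless $?.success?
  # String#to_i skips leading whitespace and stops at the trailing newline.
  output.to_i
end
```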

···

--
Ronald Fischer <ronald.fischer@venyon.com>
Phone: +49-89-452133-162

# linecount=`wc -l <#{filename}`.chomp.to_i
                                ^^^^^^
It's OK to drop the #chomp here: String#to_i ignores leading whitespace and stops at the trailing newline.

kind regards -botp

···

From: Ronald Fischer [mailto:ronald.fischer@venyon.com]