7stud2
(7stud --)
1
If I want the number of lines of the text file <file>, I may use
    File.readlines(<file>).size
but this builds a useless extra Array, or
    %x(wc -l <file>).to_i
but this needs to be on a *nix system (or have a system command wc.exe
on Windows). Or else a File.read followed by a grep for '\n'...
I feel there should be a simpler way to do that...
_md
--
Posted via http://www.ruby-forum.com/.
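For reference, the third idea in the question (File.read followed by a search for '\n') could be sketched roughly as below. This is only an illustration, not code posted in the thread: the helper name newline_count and the temp-file prefix 'lines' are made up. It also shows a detail that matters when comparing results: counting newline characters agrees with wc -l, while foreach also counts a last line that has no trailing newline.

```ruby
require 'tempfile'

# Hypothetical helper: count newline characters in one String.
# No Array of lines is built, but the whole file is still read
# into memory at once.
def newline_count(path)
  File.read(path).count("\n")
end

Tempfile.create('lines') do |tmp|     # 'lines' is an arbitrary name prefix
  tmp.write("one\ntwo\nthree")        # note: no newline after the last line
  tmp.flush
  puts newline_count(tmp.path)        # prints 2 (newline characters, like wc -l)
  puts File.foreach(tmp.path).count   # prints 3 (the unterminated line counts too)
end
```

For files that end with a newline the two numbers are the same.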
Have you looked at Enumerable's count method?
mike$ wc -l /etc/passwd
83 /etc/passwd
mike$ ruby -e "puts File.open('/etc/passwd') { |f| f.count }"
83
Hope this helps,
Mike
On 2013-10-19, at 10:02 AM, Michel Demazure <lists@ruby-forum.com> wrote:
--
Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/
The "`Stok' disclaimers" apply.
Robert_K1
(Robert K.)
3
lines = File.foreach(file).count
Kind regards
robert
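A small sketch of the two suggestions so far, assuming a throwaway temp file; the helper names line_count and non_blank_line_count are hypothetical. Without a block, File.foreach returns an Enumerator, so Enumerable#count walks the file one line at a time, and count also takes a block when only some lines should be counted.

```ruby
require 'tempfile'

# Total number of lines, read one line at a time.
def line_count(path)
  File.foreach(path).count
end

# Number of lines that contain something other than whitespace.
def non_blank_line_count(path)
  File.foreach(path).count { |line| !line.strip.empty? }
end

Tempfile.create('lines') do |tmp|            # arbitrary name prefix
  tmp.write("alpha\n\nbeta\n   \ngamma\n")
  tmp.flush
  puts line_count(tmp.path)                  # prints 5
  puts File.open(tmp.path) { |f| f.count }   # prints 5, same result
  puts non_blank_line_count(tmp.path)        # prints 3
end
```

The foreach form has the small advantage that it opens and closes the file itself.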
On Sat, Oct 19, 2013 at 4:02 PM, Michel Demazure <lists@ruby-forum.com> wrote:
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
7stud2
(7stud --)
4
Robert Klemme wrote in post #1124923:
> lines = File.foreach(file).count
Thanks, Robert, using 'foreach' is cleaner.
FWIW, I benchmarked. The two File methods are equivalent, and both are much faster than wc -l.
    require 'benchmark'

    file = __FILE__
    n = 10000
    Benchmark.bm do |rep|
      rep.report("readlines") { n.times { File.readlines(file).size } }
      rep.report("wc -l    ") { n.times { `wc -l #{file}`.to_i } }
      rep.report("foreach  ") { n.times { File.foreach(file).count } }
    end
gives
                   user     system      total        real
    readlines  0.219000   0.499000   0.718000 (  0.752043)
    wc -l      2.542000   5.257000   7.799000 ( 83.502776)
    foreach    0.219000   0.531000   0.750000 (  0.761044)
_md
Robert_K1
(Robert K.)
5
> Robert Klemme wrote in post #1124923:
>> lines = File.foreach(file).count
> Thanks, Robert, using 'foreach' is cleaner.
Yes, and it avoids building an Array for the whole file in memory.
> FWIW, I benchmarked. The File methods are equivalent and much faster.
Naturally, since they avoid the overhead of forking and IPC.
It would be interesting to see how that works out for a large file. I
would expect the last version to be more efficient than the first
one.
Kind regards
robert
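The forking overhead mentioned above can be looked at in isolation with a rough sketch like this one. It is an assumption-laden illustration, not the benchmark from the thread: the helper names spawn_time and in_process_time are invented, and the child process is a Ruby interpreter that does nothing (chosen only because it exists wherever the script runs), which is heavier to start than wc. The timings depend entirely on the machine, so no numbers are shown.

```ruby
require 'benchmark'
require 'rbconfig'
require 'tempfile'

# Time spent starting and waiting for one child process that does nothing.
# RbConfig.ruby is the path of the currently running interpreter.
def spawn_time
  Benchmark.realtime { system(RbConfig.ruby, '-e', '') }
end

# Time spent counting the lines of a file inside the current process.
def in_process_time(path)
  Benchmark.realtime { File.foreach(path).count }
end

Tempfile.create('lines') do |tmp|    # arbitrary name prefix
  1_000.times { tmp.puts 'bla' }
  tmp.flush
  printf("spawn a process: %.6f s\n", spawn_time)
  printf("foreach.count:   %.6f s\n", in_process_time(tmp.path))
end
```

For a small file the first figure should dominate, which is the pattern the wc -l row of the benchmark shows.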
On Sun, Oct 20, 2013 at 10:37 AM, Michel Demazure <lists@ruby-forum.com> wrote:
7stud2
(7stud --)
6
Robert Klemme wrote in post #1124958:
> It would be interesting to see how that works out for a large file. I
> would expect the last version to be more efficient than the first
> one.
I would guess so. But the benchmark below shows the same pattern: readlines is a
bit faster.
    require 'benchmark'

    file = File.join(File.dirname(__FILE__), 'test.txt')
    # Write a 3000-line test file next to this script.
    File.open(file, 'w') do |f|
      3000.times { f.puts 'bla' * 10 }
    end
    n = 10000
    Benchmark.bm do |rep|
      rep.report("readlines") { n.times { File.readlines(file).size } }
      rep.report("foreach  ") { n.times { File.foreach(file).count } }
    end

                   user     system      total        real
    readlines 11.341000   1.217000  12.558000 ( 12.686726)
    foreach   12.433000   1.264000  13.697000 ( 13.871793)
7stud2
(7stud --)
7
Michel Demazure wrote in post #1124962:
>                    user     system      total        real
>     readlines 11.341000   1.217000  12.558000 ( 12.686726)
>     foreach   12.433000   1.264000  13.697000 ( 13.871793)
With 300_000 lines and 100 times, instead of 3_000 lines and 10_000
times, one gets the same pattern:
                   user     system      total        real
    readlines 11.622000   1.060000  12.682000 ( 12.692726)
    foreach   12.246000   0.858000  13.104000 ( 13.156753)
but the difference is smaller...
_md
Robert_K1
(Robert K.)
8
    $ ruby x.rb
                   user     system      total        real
    readlines 56.831000   7.597000  64.428000 ( 64.241000)
    foreach   50.357000   5.476000  55.833000 ( 56.153000)
    $ cat x.rb
    require 'tempfile'
    require 'benchmark'

    LINE = 'x' * 99
    n = 100
    # The first argument of Tempfile.open is a basename prefix, not a
    # directory; the file itself is created in Dir.tmpdir.
    Tempfile.open('linecount') do |tmp|
      1_000_000.times { tmp.puts LINE }
      tmp.flush # make sure every buffered line is on disk before measuring
      Benchmark.bm do |rep|
        rep.report("readlines") { n.times { File.readlines(tmp.path).size } }
        rep.report("foreach  ") { n.times { File.foreach(tmp.path).count } }
      end
    end
So with even larger files the difference shows: here foreach is the faster one.
Kind regards
robert
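One more variant that might be worth adding to such a benchmark for very large files, although it is not one of the methods measured in this thread: reading fixed-size chunks into a reused buffer and counting the newline characters. The method name chunked_newline_count and the 64 KB chunk size are arbitrary choices for this sketch, and no claim is made here about how it compares in speed.

```ruby
require 'tempfile'

CHUNK_SIZE = 64 * 1024 # bytes per read; an arbitrary choice for this sketch

# Count newline characters by reading fixed-size chunks into one
# reused buffer, so neither an Array of lines nor the whole file
# is held in memory.
def chunked_newline_count(path, chunk_size = CHUNK_SIZE)
  count = 0
  buffer = String.new
  File.open(path, 'rb') do |f|
    # IO#read(length, outbuf) returns nil at end of file.
    while f.read(chunk_size, buffer)
      count += buffer.count("\n")
    end
  end
  count
end

Tempfile.create('lines') do |tmp|         # arbitrary name prefix
  100_000.times { tmp.puts 'x' * 99 }
  tmp.flush
  puts chunked_newline_count(tmp.path)    # prints 100000
  puts File.foreach(tmp.path).count       # prints 100000
end
```

Like wc -l, this counts newline characters, so a last line without a trailing newline is not counted.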
On Sun, Oct 20, 2013 at 5:14 PM, Michel Demazure <lists@ruby-forum.com> wrote:
What about space? That's also a huge consideration here, isn't it? foreach should win that by lots and lots, too.
On Oct 20, 2013, at 4:13 PM, Robert Klemme <shortcutter@googlemail.com> wrote:
7stud2
(7stud --)
10
tamouse m. wrote in post #1124992:
> What about space? That's also a huge consideration here, isn't it?
> foreach should win that by lots and lots, too.
Sure.
_md
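On the space question, a rough way to see the difference on CRuby is sketched below. It is only an approximation under stated assumptions: ObjectSpace.memsize_of is implementation-specific, the two helper names are hypothetical, and "largest single line" is used as a stand-in for what foreach needs alive at one time (the garbage collector may of course keep older lines around for a while).

```ruby
require 'objspace'
require 'tempfile'

# Approximate bytes held at the same time by the result of
# File.readlines: the Array itself plus every line String.
def readlines_retained_bytes(path)
  lines = File.readlines(path)
  ObjectSpace.memsize_of(lines) + lines.sum { |l| ObjectSpace.memsize_of(l) }
end

# With foreach only the current line has to be alive, so the size of
# the largest single line is a rough stand-in for its working set.
def foreach_largest_line_bytes(path)
  largest = 0
  File.foreach(path) do |line|
    size = ObjectSpace.memsize_of(line)
    largest = size if size > largest
  end
  largest
end

Tempfile.create('lines') do |tmp|       # arbitrary name prefix
  100_000.times { tmp.puts 'x' * 99 }
  tmp.flush
  puts "readlines holds about #{readlines_retained_bytes(tmp.path)} bytes"
  puts "foreach needs about #{foreach_largest_line_bytes(tmp.path)} bytes per line"
end
```

The first figure grows with the number of lines, the second only with the length of the longest line.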