Very slow IO (STDIN.gets and puts) on Linux, ruby 1.8.2_pre3

Why is Ruby 2x slower in IO than php or bash?

data.dat is an 80 MB file with 5,000,000 lines. I use Linux with 2 GB RAM
(tested on another PC with similar results).

--------------------

test.php:
#!/usr/bin/php
<? while (fgets(STDIN)); ?>

$ time ./test.php < data.dat
./test.php < data.dat 5,59s user 0,19s system 88% cpu 6,516 total

--------------------

test.rb:
#!/usr/bin/ruby
while gets
end

$ time ./test.rb < data.dat
./test.rb < data.dat 11,51s user 0,31s system 86% cpu 13,598 total

MiG wrote:

Why is Ruby 2x slower in IO than php or bash?

data.dat is 80 MB file with 5000000 lines. I use Linux, 2GB RAM (tested
on another pc with similar result).

--------------------

test.php:
#!/usr/bin/php
<? while (fgets(STDIN)); ?>

$ time ./test.php < data.dat
./test.php < data.dat 5,59s user 0,19s system 88% cpu 6,516 total

--------------------

test.rb:
#!/usr/bin/ruby
while gets
end

$ time ./test.rb < data.dat
./test.rb < data.dat 11,51s user 0,31s system 86% cpu 13,598 total

English is so much worse than Japanese! When I try to count to one million in English it takes me 3.42 days, but when I try it in Japanese, it only takes me 3.12 days!

Obviously, that means English is the worse language. Why does English suck so bad?!?

-----

In other words: your benchmark is really dumb. That isn't practical code, and trying to draw any conclusions from it is silly. For Ruby to be considered fast, how much time should it take to read and discard a line of text 5 kagillion times? Btw, I found a way to optimize your code:

deleteme.rb
#!/usr/bin/ruby
exit(0)

ben% time ruby deleteme.rb
ruby deleteme.rb 0.00s user 0.00s system 102% cpu 0.006 total

I'm still working on getting it to run in less than 0.004 total.

Ben

MiG wrote:

test.rb:
#!/usr/bin/ruby
while gets
end

$ time ./test.rb < data.dat
./test.rb < data.dat 11,51s user 0,31s system 86% cpu 13,598 total

Well, when you call gets that way, Ruby assigns each line to $_, so it has to construct a String object for every line. Perhaps PHP doesn't do that?
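
To make the point about $_ concrete, here is a minimal sketch. It assumes an in-memory StringIO instead of STDIN so it is self-contained, and inspect_gets_loop is a made-up helper name. It shows the behaviour only and measures nothing:

```ruby
require 'stringio'

# Reads every line from io with a bare gets loop. Returns the lines that
# showed up in $_ and the number of distinct String objects among them.
def inspect_gets_loop(io)
  lines = []
  while io.gets          # gets returns the line and also stores it in $_
    lines << $_
  end
  [lines, lines.map { |l| l.object_id }.uniq.size]
end

lines, distinct = inspect_gets_loop(StringIO.new("one\ntwo\nthree\n"))
puts "#{lines.size} lines read, #{distinct} distinct String objects"
```

Every iteration produces a fresh String, whether or not the loop body ever looks at it.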

--
Florian Frank

1. I have NOTHING against Ruby; it is my favorite language.
2. Is it wrong to ask?
3. My "dumb" benchmark: I used real data. If you have 2GB of free RAM and
use an 80MB file, is that wrong? It would be the same with 1MB of RAM and a
smaller file. I used the real data I have, that's all. It behaves the
same way with smaller files.
4. Thank you for the excellent humour.

MiG

So the solution is maybe to use getc and parse lines on my own...
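
For reference, a hand-rolled reader along those lines might look like the sketch below. It assumes Ruby 1.9 or later, where getc returns a one-character String, and each_line_via_getc is a hypothetical name. Nothing here shows it would be faster than gets; it only checks that both approaches yield the same lines:

```ruby
require 'stringio'

# Builds lines one character at a time with getc and yields each
# completed line, including its trailing newline, like gets does.
def each_line_via_getc(io)
  line = "".dup
  while (ch = io.getc)
    line << ch
    next unless ch == "\n"
    yield line
    line = "".dup
  end
  yield line unless line.empty?   # last line without a trailing newline
end

data = "one\ntwo\nthree"
getc_lines = []
each_line_via_getc(StringIO.new(data)) { |l| getc_lines << l }
gets_lines = StringIO.new(data).each_line.to_a

puts getc_lines == gets_lines
```

Note that this version still allocates a String per line, plus one per character, so it does more work per line than gets rather than less.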

MiG

On 10/3/2005, "Florian Frank" <flori@nixe.ping.de> wrote:

MiG wrote:

test.rb:
#!/usr/bin/ruby
while gets
end

$ time ./test.rb < data.dat
./test.rb < data.dat 11,51s user 0,31s system 86% cpu 13,598 total

Well, Ruby assigns the line string to $_, if you use gets that way. So
Ruby has to construct an object for every line. Perhaps PHP doesn't do that?

--
Florian Frank

Maybe you're missing the point.

The two programs aren't doing the same amount of work; your benchmarks
aren't equivalent. If you change the PHP benchmark slightly, you'll
likely see PHP is just as slow as Ruby.

[navindra@dot /tmp]$ time php -r 'while (fgets(STDIN));' < FILE
8.421u 2.334s 0:26.53 40.5% 0+0k 0+0io 2pf+0w
[navindra@dot /tmp]$ time ruby -e 'while gets;end' < FILE
11.676u 2.586s 0:39.44 36.1% 0+0k 0+0io 11pf+0w
[navindra@dot /tmp]$ time php -r 'while ($blah=fgets(STDIN));' < FILE
10.680u 2.372s 0:37.83 34.4% 0+0k 0+0io 10pf+0w

Cheers,
Navin.

MiG <mig@1984.cz> wrote:

So the solution is maybe to use getc and parse lines on my own...

MiG wrote:

1. I have NOTHING against Ruby, it is my best language
2. Is it wrong-doing to ask?
3. My dumb benchmark: I used real data. If you have 2GB of free RAM and
use 80MB file, is it wrong? It's the same if you have 1MB RAM and use
smaller file. I used the real data I have, that's all. It behaves the
same way with smaller.
4. Thank you for excellent humour.

I'm glad you see the humour. I was a little harsh, but I was having a bad day, sorry.

Really, the benchmark isn't meaningful. You need to do something with the data you're reading. It doesn't matter if it's an 80MB file or a 10-byte file. If you're simply reading the data and discarding it, you aren't doing anything. For the measurement to be meaningful, you actually need to *do something*.

Would you expect these two applications to take the same amount of time:

#!/usr/bin/env ruby

1000.times do
   # do nothing
end

------

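#!/usr/bin/env ruby

1000.times do
   # compute a value, adjust it, then throw it away
   num = Math.sin(rand)   # bare rand gives a Float in [0, 1); rand(1.0) always returns 0
   if num < 0.0
     num += 1.0
   else
     num -= 1.0
   end
end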

Both programs are essentially equivalent. Neither actually *does* anything. If the second one ran slower, could you really draw any conclusions about the speed of Ruby's math operations?

In fact, it may be that Ruby's IO is slower than other languages. If Ruby were even close to the speed of C I'd be stunned. Ruby has to construct an object with every line it reads. C just stuffs things blindly into an array. The problem is that your sample doesn't test Ruby's IO capabilities. In the end, your sample code does absolutely nothing.

If you want to benchmark Ruby's IO, try doing something like writing a program to concatenate a number of files, or even just to copy a file. Open one file for writing, and then open a file for reading, read something from the input file, write to the output file.
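
A rough sketch of such a copy benchmark, using the standard Benchmark and Tempfile libraries. The generated sample data and the copy_lines name are assumptions made to keep it self-contained; point it at real files for a real measurement:

```ruby
require 'benchmark'
require 'tempfile'

# Copies src_path to dst_path line by line; returns the number of lines copied.
def copy_lines(src_path, dst_path)
  count = 0
  File.open(src_path, 'r') do |src|
    File.open(dst_path, 'w') do |dst|
      while (line = src.gets)
        dst.write(line)
        count += 1
      end
    end
  end
  count
end

# Hypothetical sample input, generated here instead of using a real data file.
src = Tempfile.new('copy_src')
src.write("some,sample,line\n" * 10_000)
src.close
dst = Tempfile.new('copy_dst')
dst.close

copied = nil
elapsed = Benchmark.realtime { copied = copy_lines(src.path, dst.path) }
puts "copied #{copied} lines in #{'%.3f' % elapsed}s"
```

Here every line that is read also gets written, so the timing reflects IO work that actually has an observable result.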

In any case, until the slowness of Ruby's IO proves to be a problem in actual use, why do you care how it fares on a benchmark?

Ben

Here are my results on a 14.5 MB file; Ruby wins.

twillis:~$ time ruby -e 'while gets;end'< HL7Audit.csv

real 0m1.481s
user 0m0.924s
sys 0m0.095s
twillis:~$ time php -r 'while($blah=fgets(STDIN));'< HL7Audit.csv

real 0m2.327s
user 0m1.001s
sys 0m0.083s

On Fri, 11 Mar 2005 17:23:50 +0900, Navindra Umanee <navindra@cs.mcgill.ca> wrote:

MiG <mig@1984.cz> wrote:
> So the solution is maybe to use getc and parse lines on my own...

Maybe you're missing the point.

The two programs aren't doing the same amount of work; your benchmarks
aren't equivalent. If you change the PHP benchmark slightly, you'll
likely see PHP is just as slow as Ruby.

[navindra@dot /tmp]$ time php -r 'while (fgets(STDIN));' < FILE
8.421u 2.334s 0:26.53 40.5% 0+0k 0+0io 2pf+0w
[navindra@dot /tmp]$ time ruby -e 'while gets;end' < FILE
11.676u 2.586s 0:39.44 36.1% 0+0k 0+0io 11pf+0w
[navindra@dot /tmp]$ time php -r 'while ($blah=fgets(STDIN));' < FILE
10.680u 2.372s 0:37.83 34.4% 0+0k 0+0io 10pf+0w

Cheers,
Navin.

--
Thomas G. Willis
http://paperbackmusic.net