Newbie question:read file speed

I would expect that using read/write directly instead of going through
the C interface would be faster and more portable than going under the
hood of the C library. If a particular system does not have read/write,
then it can be simulated using fread/fwrite with buffering disabled.

Or am I missing something completely?

Paul

···

On Sat, Apr 05, 2003 at 03:26:01AM +0900, Dan Sugalski wrote:

Perl essentially finds the C RTL’s underlying buffers and memcpy’s
the data out of it, twiddling the internal pointers and counts as it
goes. Can’t get much faster than that, short of having the system do
a DMA read right into a buffer that doesn’t have to move.

But surely, for the operations that matter to Ruby (say "read" and
"write"), the overhead of *calling* C's IO library must be - what -
1E-9 seconds?

See [ruby-talk:68546]

Guy Decoux

“Greg Brondo” greg@brondo.com schrieb im Newsbeitrag
news:UB%ia.3936$2x2.1779642@dca1-nnrp1.news.algx.net…

Ok. New results:

Both files contain 1,572,682 lines.
BIG line file avg line length = 46
SMALL line file avg line length = 11

All tests without writing to screen (only elapsed time written at end of
run):

py22 BIG = 5
py22 SMALL = 3

ruby168 BIG = 319
ruby168 SMALL = 95

ruby18p2 BIG = 12
ruby18p2 SMALL = 12

Java BIG = 7
Java SMALL = 3

So which version of the Java did you benchmark here?

robert

I would expect that using read/write directly instead of going through
the C interface would be faster and more portable than going under the
hood of the C library. If a particular system does not have read/write,
then it can be simulated using fread/fwrite with buffering disabled.

If you use read/write you have to manage a buffer, why do you want to
manage a buffer when you can use the internal buffer used by the C
library.

If 1.8 is faster than 1.6 this is just because ruby make, in some case,
something similar to perl.

Guy Decoux

Robert Klemme wrote:

“Greg Brondo” greg@brondo.com schrieb im Newsbeitrag
news:UB%ia.3936$2x2.1779642@dca1-nnrp1.news.algx.net…

Ok. New results:

Both files contain 1,572,682 lines.
BIG line file avg line length = 46
SMALL line file avg line length = 11

All tests without writing to screen (only elapsed time written at end of
run):

py22 BIG = 5
py22 SMALL = 3

ruby168 BIG = 319
ruby168 SMALL = 95

ruby18p2 BIG = 12
ruby18p2 SMALL = 12

Java BIG = 7
Java SMALL = 3

So which version of the Java did you benchmark here?

robert

1.4.1_02

“ts” decoux@moulon.inra.fr schrieb im Newsbeitrag
news:200304051149.h35Bnw421543@moulon.inra.fr

I would expect that using read/write directly instead of going
through
the C interface would be faster and more portable than going under
the
hood of the C library. If a particular system does not have
read/write,
then it can be simulated using fread/fwrite with buffering disabled.

If you use read/write you have to manage a buffer, why do you want to
manage a buffer when you can use the internal buffer used by the C
library.

Maybe because it’s more portable? If you have to rely on internals of a
lib that may change then you might be better off to memory map a file
yourself and do the buffer management yourself, too. Just guessing since
I’m no OS internals hacker.

Regards

robert

“Greg Brondo” greg@brondo.com schrieb im Newsbeitrag
news:gHija.4094$2x2.1852180@dca1-nnrp1.news.algx.net

Robert Klemme wrote:

“Greg Brondo” greg@brondo.com schrieb im Newsbeitrag
news:UB%ia.3936$2x2.1779642@dca1-nnrp1.news.algx.net…

Ok. New results:

Both files contain 1,572,682 lines.
BIG line file avg line length = 46
SMALL line file avg line length = 11

All tests without writing to screen (only elapsed time written at end
of
run):

py22 BIG = 5
py22 SMALL = 3

ruby168 BIG = 319
ruby168 SMALL = 95

ruby18p2 BIG = 12
ruby18p2 SMALL = 12

Java BIG = 7
Java SMALL = 3

So which version of the Java did you benchmark here?

robert

1.4.1_02

Err, I meant the code not the VM. Sorry for being unclear.

robert

Maybe because it's more portable? If you have to rely on internals of a
lib that may change then you might be better off to memory map a file
yourself and do the buffer management yourself, too. Just guessing since
I'm no OS internals hacker.

Well, sometimes speed and portability are not compatibles.

ruby and perl are evil because they use the internal buffer associated
with a stream, but perl is a little more evil than ruby and this is why
it's faster than ruby.

Guy Decoux

“ts” decoux@moulon.inra.fr schrieb im Newsbeitrag
news:200304071056.h37AudL09677@moulon.inra.fr

Maybe because it’s more portable? If you have to rely on internals
of a
lib that may change then you might be better off to memory map a file
yourself and do the buffer management yourself, too. Just guessing
since
I’m no OS internals hacker.

Well, sometimes speed and portability are not compatibles.

I guess more often than not.

ruby and perl are evil because they use the internal buffer associated
with a stream, but perl is a little more evil than ruby and this is why
it’s faster than ruby.

Warning: Introduction of relative evilness can harm your moral judgement!
:-))

robert

Or at any rate one reason why it’s faster.

That trivial test program was about twice as fast under Perl. I would guess
that calls to the I/O library represent only a small part of that difference

  • I don’t think any changes should be made (particularly ones which might
    sacrifice portability or make the code harder to maintain) without real,
    hard profiling evidence.

Regards,

Brian.

···

On Mon, Apr 07, 2003 at 07:59:56PM +0900, ts wrote:

Well, sometimes speed and portability are not compatibles.

ruby and perl are evil because they use the internal buffer associated
with a stream, but perl is a little more evil than ruby and this is why
it’s faster than ruby.

Robert Klemme wrote:

“Greg Brondo” greg@brondo.com schrieb im Newsbeitrag
news:gHija.4094$2x2.1852180@dca1-nnrp1.news.algx.net

Robert Klemme wrote:

“Greg Brondo” greg@brondo.com schrieb im Newsbeitrag
news:UB%ia.3936$2x2.1779642@dca1-nnrp1.news.algx.net…

Ok. New results:

Both files contain 1,572,682 lines.
BIG line file avg line length = 46
SMALL line file avg line length = 11

All tests without writing to screen (only elapsed time written at end

of

run):

py22 BIG = 5
py22 SMALL = 3

ruby168 BIG = 319
ruby168 SMALL = 95

ruby18p2 BIG = 12
ruby18p2 SMALL = 12

Java BIG = 7
Java SMALL = 3

So which version of the Java did you benchmark here?

robert

1.4.1_02

Err, I meant the code not the VM. Sorry for being unclear.

robert

My code.

That trivial test program was about twice as fast under Perl. I would guess
that calls to the I/O library represent only a small part of that difference

See [ruby-talk:68578]

ruby168 BIG = 319
ruby168 SMALL = 95

ruby18p2 BIG = 12
ruby18p2 SMALL = 12

- I don't think any changes should be made (particularly ones which might
sacrifice portability or make the code harder to maintain) without real,
hard profiling evidence.

The code is not harder to maintain.

Guy Decoux

Oh I’m not arguing that ruby 1.8p2 isn’t better than ruby 1.6.8. I believe
that those tests were done under Windows, and that just shows how dreadful
Windows I/O libraries are.

Under FreeBSD I see almost zero difference between ruby 1.6.8 and ruby 1.8p2
for the attached program, showing that a real operating system comes with
decent I/O libraries.

Perl is still nearly twice as fast as ruby though:

[brian@vaio ruby]$ time ruby x.rb /usr/share/dict/words
Elapsed = 2.146236062

real 0m2.492s
user 0m2.362s
sys 0m0.120s
[brian@vaio ruby]$ time perl x.pl /usr/share/dict/words
Elapsed = 1

real 0m1.347s
user 0m1.251s
sys 0m0.080s

However evil or optimised perl’s I/O libraries are, I doubt that that’s the
reason for the difference.

Regards,

Brian.

x.rb (172 Bytes)

x.pl (183 Bytes)

···

On Mon, Apr 07, 2003 at 09:21:04PM +0900, ts wrote:

See [ruby-talk:68578]

ruby168 BIG = 319
ruby168 SMALL = 95

ruby18p2 BIG = 12
ruby18p2 SMALL = 12

Brian Candler wrote:

See [ruby-talk:68578]

ruby168 BIG = 319
ruby168 SMALL = 95

ruby18p2 BIG = 12
ruby18p2 SMALL = 12

Oh I’m not arguing that ruby 1.8p2 isn’t better than ruby 1.6.8. I believe
that those tests were done under Windows, and that just shows how dreadful
Windows I/O libraries are.

Under FreeBSD I see almost zero difference between ruby 1.6.8 and ruby 1.8p2
for the attached program, showing that a real operating system comes with
decent I/O libraries.

Perl is still nearly twice as fast as ruby though:

[brian@vaio ruby]$ time ruby x.rb /usr/share/dict/words
Elapsed = 2.146236062

real 0m2.492s
user 0m2.362s
sys 0m0.120s
[brian@vaio ruby]$ time perl x.pl /usr/share/dict/words
Elapsed = 1

real 0m1.347s
user 0m1.251s
sys 0m0.080s

However evil or optimised perl’s I/O libraries are, I doubt that that’s the
reason for the difference.

Regards,

Brian.

Does running your code through the profiler provide any insight? Any
chance you could post the Perl and Ruby profiler results side by side?

perl -MDevel::Profiler x.pl # assuming you have Devel::Profiler
ruby -rprofile x.rb

Regards,

Dan

···

On Mon, Apr 07, 2003 at 09:21:04PM +0900, ts wrote:

Not greatly:

$ ruby -rprofile x.rb /usr/share/dict/words
Elapsed = 166.156065
% cumulative self self total
time seconds seconds calls ms/call ms/call name
66.76 93.20 93.20 1 93203.12 139601.56 IO#each_line
33.23 139.60 46.40 235881 0.20 0.20 Fixnum#+
0.00 139.60 0.00 1 0.00 0.00 Float#to_s
0.00 139.60 0.00 1 0.00 0.00 Array#
0.00 139.60 0.00 2 0.00 0.00 IO#write
0.00 139.60 0.00 1 0.00 0.00 Kernel.puts
0.00 139.60 0.00 1 0.00 139617.19 #toplevel
0.00 139.60 0.00 2 0.00 0.00 Time#now
0.00 139.60 0.00 1 0.00 0.00 Float#-
0.00 139.60 0.00 1 0.00 0.00 File#open
0.00 139.60 0.00 2 0.00 0.00 Time#to_f
0.00 139.60 0.00 1 0.00 0.00 String#+

I can’t do Perl without some serious work:

===> Configuring for p5-Devel-Profiler-0.04
Perl 5.006 required–this is only version 5.00503, stopped at ./Makefile.PL line 4.
BEGIN failed–compilation aborted at ./Makefile.PL line 4.
*** Error code 255

However I suspect it would say the same: that it spends most of its time
running a fetch-next-line loop, and a bit of time adding one to a counter
:slight_smile: What it really needs is a C profiler.

Regards,

Brian.

···

On Tue, Apr 08, 2003 at 06:32:29AM +0900, Daniel Berger wrote:

Does running your code through the profiler provide any insight?

…which turns out to be easier than I thought!

$ make clean
$ make CFLAGS=-pg
$ gcc -pg -rdynamic -L. main.o -lruby-static -lcrypt -lm -o ruby
$ ./ruby ~/ruby/x.rb /usr/share/dict/words
Elapsed = 5.885455965995789
$ gprof ./ruby

generates a vast output, edited highlights below.

In summary, of the 2.56 seconds total execution time:

  • 1.11 secs in rb_io_each_line and descendants
  • 0.71 secs in rb_yield_0 and descendants
  • 0.39 secs in rb_eval and descendants

and at a lower level:

  • 0.62 secs in rb_newobj (half str_alloc and half new_dvar)
  • of which 0.43 secs in rb_gc, called 34 times

So garbage collection is certainly an overhead, although how much less
reference-counting would be is hard to say. Mark-and-sweep has the important
advantage that it is much more likely to be implemented correctly :slight_smile:

The actual I/O is minimal - around 0.13 secs in fread and descendants.

Regards,

Brian.

                              called/total       parents 

index %time self descendents called+self name index
called/total children

[4] 98.7 0.94 1.59 144+1415800 <cycle 1 as a whole> [4]
0.01 1.10 1 rb_io_each_line <cycle 1> [10]
0.28 0.43 235881 rb_yield_0 <cycle 1> [15]
0.38 0.01 235882+707663 rb_eval <cycle 1> [19]
0.11 0.00 236043 rb_call0 <cycle 1> [32]
0.05 0.04 236043 call_cfunc <cycle 1> [38]
0.09 0.00 236043 rb_call <cycle 1> [40]
0.02 0.00 235881 rb_yield <cycle 1> [64]
0.00 0.00 8 rb_obj_alloc <cycle 1> [191]

[10] 43.2 0.01 1.10 1 rb_io_each_line <cycle 1> [10]
0.02 1.07 235882/235882 rb_io_getline [11]
0.00 0.00 1/15 rb_io_taint_check [347]
0.00 0.00 1/171 rb_io_check_closed [316]
0.00 0.00 1/13 rb_io_check_readable [349]
235881 rb_yield <cycle 1> [64]

[11] 42.7 0.02 1.07 235882 rb_io_getline [11]
0.06 1.02 235882/235892 rb_io_getline_fast [12]


[12] 41.9 0.06 1.02 235892 rb_io_getline_fast [12]
0.10 0.92 235892/235892 appendline [13]

···

On Tue, Apr 08, 2003 at 07:04:25AM +0900, Brian Candler wrote:

However I suspect it would say the same: that it spends most of its time
running a fetch-next-line loop, and a bit of time adding one to a counter
:slight_smile: What it really needs is a C profiler.


[13] 39.7 0.10 0.92 235892 appendline [13]
0.06 0.59 235890/235890 rb_str_buf_new [16]
0.08 0.05 236005/236005 fread [29]
0.08 0.00 236005/236007 memchr [41]
0.00 0.06 155/156 __srget [42]
0.00 0.00 115/195 rb_str_resize [166]
0.00 0.00 20/180 rb_str_cat [164]
0.00 0.00 155/156 rb_thread_wait_fd [319]
0.00 0.00 155/171 rb_io_check_closed [316]

[15] 27.7 0.28 0.43 235881 rb_yield_0 <cycle 1> [15]
0.05 0.31 235881/235882 new_dvar [21]
0.03 0.00 235881/235882 scope_dup [59]
0.02 0.00 235881/235883 rb_gc_force_recycle [60]
0.02 0.00 235881/235884 rb_block_given_p [61]
0.01 0.00 235881/235921 _setjmp [85]
235881 rb_eval <cycle 1> [19]

[16] 25.2 0.06 0.59 235890 rb_str_buf_new [16]
0.05 0.31 235890/235996 str_alloc [20]
0.05 0.18 235890/242071 ruby_xmalloc [25]

0.04 0.27 235882/474876 new_dvar [21]
0.04 0.27 235996/474876 str_alloc [20]
[17] 24.2 0.09 0.53 474876 rb_newobj [17]
0.00 0.43 34/34 rb_gc [18]
0.10 0.00 474876/475495 memset [36]

[18] 17.0 0.00 0.43 34 rb_gc [18]
0.06 0.26 34/34 gc_sweep [22]
0.06 0.02 578/918 rb_gc_mark <cycle 2> [43]
0.01 0.00 102/918 rb_gc_mark_frame <cycle 2> [321]
0.00 0.01 34/34 rb_gc_mark_threads [81]
0.00 0.00 34/918 mark_locations_array <cycle


[19] 15.6 0.38 0.01 235882+707663 rb_eval <cycle 1> [19]
0.01 0.00 235892/236075 rb_class_of [77]
0.00 0.00 4/4 ev_const_get [231]
0.00 0.00 1/1 rb_str_new3 [246]
0.00 0.00 1/2 new_blktag [260]
0.00 0.00 1/235921 _setjmp [85]


[20] 14.0 0.05 0.31 235996 str_alloc [20]
0.04 0.27 235996/474876 rb_newobj [17]


[21] 13.8 0.05 0.31 235882 new_dvar [21]
0.04 0.27 235882/474876 rb_newobj [17]


[25] 9.3 0.05 0.19 242071 ruby_xmalloc [25]
0.04 0.14 242071/242083 malloc [26]


[29] 5.1 0.08 0.05 236005 fread [29]
0.05 0.00 236005/237339 memcpy [50]