…which turns out to be easier than I thought!
$ make clean
$ make CFLAGS=-pg
$ gcc -pg -rdynamic -L. main.o -lruby-static -lcrypt -lm -o ruby
$ ./ruby ~/ruby/x.rb /usr/share/dict/words
Elapsed = 5.885455965995789
$ gprof ./ruby
generates a vast output, edited highlights below.
In summary, of the 2.56 seconds total execution time:
- 1.11 secs in rb_io_each_line and descendants
- 0.71 secs in rb_yield_0 and descendants
- 0.39 secs in rb_eval and descendants
and at a lower level:
- 0.62 secs in rb_newobj (half str_alloc and half new_dvar)
- of which 0.43 secs in rb_gc, called 34 times
So garbage collection is certainly an overhead, although how much less
reference-counting would be is hard to say. Mark-and-sweep has the important
advantage that it is much more likely to be implemented correctly 
The actual I/O is minimal - around 0.13 secs in fread and descendants.
Regards,
Brian.
called/total parents
index %time self descendents called+self name index
called/total children
[4] 98.7 0.94 1.59 144+1415800 <cycle 1 as a whole> [4]
0.01 1.10 1 rb_io_each_line <cycle 1> [10]
0.28 0.43 235881 rb_yield_0 <cycle 1> [15]
0.38 0.01 235882+707663 rb_eval <cycle 1> [19]
0.11 0.00 236043 rb_call0 <cycle 1> [32]
0.05 0.04 236043 call_cfunc <cycle 1> [38]
0.09 0.00 236043 rb_call <cycle 1> [40]
0.02 0.00 235881 rb_yield <cycle 1> [64]
0.00 0.00 8 rb_obj_alloc <cycle 1> [191]
…
[10] 43.2 0.01 1.10 1 rb_io_each_line <cycle 1> [10]
0.02 1.07 235882/235882 rb_io_getline [11]
0.00 0.00 1/15 rb_io_taint_check [347]
0.00 0.00 1/171 rb_io_check_closed [316]
0.00 0.00 1/13 rb_io_check_readable [349]
235881 rb_yield <cycle 1> [64]
…
[11] 42.7 0.02 1.07 235882 rb_io_getline [11]
0.06 1.02 235882/235892 rb_io_getline_fast [12]
…
[12] 41.9 0.06 1.02 235892 rb_io_getline_fast [12]
0.10 0.92 235892/235892 appendline [13]
···
On Tue, Apr 08, 2003 at 07:04:25AM +0900, Brian Candler wrote:
However I suspect it would say the same: that it spends most of its time
running a fetch-next-line loop, and a bit of time adding one to a counter
What it really needs is a C profiler.
[13] 39.7 0.10 0.92 235892 appendline [13]
0.06 0.59 235890/235890 rb_str_buf_new [16]
0.08 0.05 236005/236005 fread [29]
0.08 0.00 236005/236007 memchr [41]
0.00 0.06 155/156 __srget [42]
0.00 0.00 115/195 rb_str_resize [166]
0.00 0.00 20/180 rb_str_cat [164]
0.00 0.00 155/156 rb_thread_wait_fd [319]
0.00 0.00 155/171 rb_io_check_closed [316]
…
[15] 27.7 0.28 0.43 235881 rb_yield_0 <cycle 1> [15]
0.05 0.31 235881/235882 new_dvar [21]
0.03 0.00 235881/235882 scope_dup [59]
0.02 0.00 235881/235883 rb_gc_force_recycle [60]
0.02 0.00 235881/235884 rb_block_given_p [61]
0.01 0.00 235881/235921 _setjmp [85]
235881 rb_eval <cycle 1> [19]
…
[16] 25.2 0.06 0.59 235890 rb_str_buf_new [16]
0.05 0.31 235890/235996 str_alloc [20]
0.05 0.18 235890/242071 ruby_xmalloc [25]
…
0.04 0.27 235882/474876 new_dvar [21]
0.04 0.27 235996/474876 str_alloc [20]
[17] 24.2 0.09 0.53 474876 rb_newobj [17]
0.00 0.43 34/34 rb_gc [18]
0.10 0.00 474876/475495 memset [36]
[18] 17.0 0.00 0.43 34 rb_gc [18]
0.06 0.26 34/34 gc_sweep [22]
0.06 0.02 578/918 rb_gc_mark <cycle 2> [43]
0.01 0.00 102/918 rb_gc_mark_frame <cycle 2> [321]
0.00 0.01 34/34 rb_gc_mark_threads [81]
0.00 0.00 34/918 mark_locations_array <cycle
…
[19] 15.6 0.38 0.01 235882+707663 rb_eval <cycle 1> [19]
0.01 0.00 235892/236075 rb_class_of [77]
0.00 0.00 4/4 ev_const_get [231]
0.00 0.00 1/1 rb_str_new3 [246]
0.00 0.00 1/2 new_blktag [260]
0.00 0.00 1/235921 _setjmp [85]
…
[20] 14.0 0.05 0.31 235996 str_alloc [20]
0.04 0.27 235996/474876 rb_newobj [17]
…
[21] 13.8 0.05 0.31 235882 new_dvar [21]
0.04 0.27 235882/474876 rb_newobj [17]
…
[25] 9.3 0.05 0.19 242071 ruby_xmalloc [25]
0.04 0.14 242071/242083 malloc [26]
…
[29] 5.1 0.08 0.05 236005 fread [29]
0.05 0.00 236005/237339 memcpy [50]