ram@lilith:~/src/ruby$time gawk '{i=0;j=length($0);while( i <= j) {i++;
z=substr($0,i,1); {count=count+ 1; a[z]=a[z]+1; }}}END{ for ( i in a ) {print
i ":" a[i];}print "Digits: " count;}' < pi.out | sort
0:16095
1:16107
2:15965
3:15915
4:15967
5:16169
6:15934
7:16091
8:15999
9:15790
:1
Digits: 160033
real 0m0.305s
user 0m0.310s
sys 0m0.000s
I'm really new to ruby. So I would like to know, if there is any possible
improvement of the ruby code.
pi.out contains about 160000 digits of PI and is made by "ruby sample/pi.rb >
pi.out' inside the src-dir of the ruby-*.tgz
# get the input
num = STDIN.gets;
# initialize array entries with 0
a = Array.new(255,0);
# count the occurence of each byte in the input
num.each_byte {|i| a[i] += 1};
# only interested in the byte/characters for 0-9
(?0..?9).each {|i| print "#{[i].pack("c")} #{a[i]}\n"};
I suspect the num[i..i].to_i is pretty expensive... and it's not
really needed in this case.
Cheers,
Ruben
···
At Wed, 16 Jun 2004 23:13:26 +0900, Ralf Müller wrote:
ram@lilith:~/src/ruby$time gawk '{i=0;j=length($0);while( i <= j) {i++;
z=substr($0,i,1); {count=count+ 1; a[z]=a[z]+1; }}}END{ for ( i in a )
{print i ":" a[i];}print "Digits: " count;}' < pi.out | sort
0:16095
1:16107
2:15965
3:15915
4:15967
5:16169
6:15934
7:16091
8:15999
9:15790
:1
Digits: 160033
real 0m0.305s
user 0m0.310s
sys 0m0.000s
I'm really new to ruby. So I would like to know, if there is any possible
improvement of the ruby code.
pi.out contains about 160000 digits of PI and is made by "ruby sample/pi.rb
> pi.out' inside the src-dir of the ruby-*.tgz
11111111111111111111111111111111111111111111111111111111111111
ram@lilith:~/src/ruby$cat rec_pi.rb
n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"
The main difference seems to be using "count" instead of "each_byte". But why
is it 5-6x faster to "count" 9.times than "each_byte" only once?
Or did i ignore anything?
The above code is pretty nice.. I didn't know about the
'count'-method. The reason this is faster, is because the 'big loop',
the scan over the string, is in this code done internally, in the C
implementation of the count method. The other code has the loop
explicitly in ruby-code, which is slower.
The difference however is that the above code does the scan over the
String several times (# of chars to check - 1) in C, while all the other
code scans the String only once in ruby.
Probably there is a trade-off... maybe if you would wanna count all
occurences of [a-zA-Z0-9], then you would have to do the
'count'-method 57 times (57 because the above code assumes that only
the characters you count appear in the String), effectively scanning
the String 57 times. This might be slower than scanning the String
only once in ruby-code. (didn't test this though...)
Ruben
···
At Thu, 17 Jun 2004 16:48:21 +0900, Ralf Müller wrote:
11111111111111111111111111111111111111111111111111111111111111
ram@lilith:~/src/ruby$cat rec_pi.rb
n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"
The main difference seems to be using "count" instead of "each_byte". But why
is it 5-6x faster to "count" 9.times than "each_byte" only once?
Or did i ignore anything?
There is a trade-off, but you may need to throw in some kanji or unicode
to get there. With just [a-zA-Z0-9], a single scan in Ruby is still
slower. Here are some timings:
user system total real
array: 0.880000 0.080000 0.960000 ( 0.958946)
hash: 1.210000 0.070000 1.280000 ( 1.330615)
count: 0.100000 0.000000 0.100000 ( 0.091805)
delete: 0.030000 0.020000 0.050000 ( 0.051760)
Both 'array' and 'hash' use #each_byte (M iterations in Ruby code),
'count' uses #count repeatedly (62*M iterations in C code), and 'delete'
uses #delete to partition the string into 5 substrings and #delete! to
count characters (5*M + 62*M/5 = about 17*M iterations in C code).
Benchmark.bm(8) do |bm|
bm.report('array:') do
a = Array.new(255,0);
str.each_byte {|i| a[i] += 1};
[?0..?9,?a..?z,?A..?Z].map {|x| x.to_a}.flatten.each {|i|
"#{[i].pack("c")} #{a[i]}"
};
end
bm.report('hash:') do
c={}
str.each_byte{|b|c[b]||=0; c[b]+=1}
c.sort.each {|k,v| "#{k.chr} #{v}"}
end
bm.report('count:') do
['0'..'9','a'..'z','A'..'Z'].map {|x| x.to_a}.flatten.each do |i|
"#{i} #{str.count(i)}"
end
end
bm.report('delete:') do
['0'..'9','a'..'m','n'..'z','A'..'M','N'..'Z'].each do |range|
s = str.delete("^#{range.first}-#{range.last}")
range.each do |i|
n = s.size
s.delete!(i)
"#{i} #{n - s.size}"
end
end
end
end
__END__
···
On Thu, 17 Jun 2004, Ruben wrote:
The difference however is that the above code does the scan over the
String several times (# of chars to check - 1) in C, while all the other
code scans the String only once in ruby.
Probably there is a trade-off... maybe if you would wanna count all
occurences of [a-zA-Z0-9], then you would have to do the
'count'-method 57 times (57 because the above code assumes that only
the characters you count appear in the String), effectively scanning
the String 57 times. This might be slower than scanning the String
only once in ruby-code. (didn't test this though...)
It does not seem to make any difference if the source i given at the command
line or in a separate file:
ram@lilith:~/src/ruby$cat rec_pi.rb
n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"
ram@lilith:~/src/ruby$time ruby rec_pi.rb < pi.out
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790
real 0m0.048s
user 0m0.040s
sys 0m0.010s
ram@lilith:~/src/ruby$time ruby -e 'n = STDIN.gets; tot = 0; 9.times do |i| c
= n.count(i.to_s); tot += c; puts "#{i} #{c}"; end; puts "9 #{n.size - tot}"'
<pi.out
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790
real 0m0.043s
user 0m0.010s
sys 0m0.030s
But maybe, you're right and these conditions do just not lead to that
difference. Under which circumstances does the difference occur?
regards
ralf
···
Am Donnerstag, 17. Juni 2004 00:00 schrieb Relm:
On Thu, 17 Jun 2004, why the lucky stiff wrote:
> Ralf Müller wrote:
> >tried with Hash instead of Array:
> >ram@lilith:~/src/ruby$time ruby -e 'str=STDIN.gets; h=Hash.new();
> > (0..9).each {|i| h[i.to_s] = 0}; (0..str.length-1).each {|i|
> > h[str[i..i]] += 1 }; h.each {|k,v| print "#{k} #{v}\n"} ' < pi.out |
> > sort
>
> why@stungun$ time ruby -n -e 'c={};$_.each_byte{|b|c[b]||=0; c[b]+=1};
> c.sort.each {|k,v| puts "#{k - ?0} #{v}"}' < pi.out
>
> same idea as yours. just runs 3x swift.
I typo a lot, so no command line. Maybe 150-170x speed of OP.
n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"