I just did a quick benchmark to prove something to myself. But I'd like to get a sanity check from the people on the list.
Basically I want to drop what will be a trailing "\n" from input. But it appears that using String#[] and if statements is nearly 200 times more efficient than chop. Which just seems really weird, so here's the benchmark. Maybe I'm doing something wrong.
Does this seem right? Anyone care to comment?
---- index_vs_chop.rb
require 'benchmark'

n = 100_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-1]
    end
  }
  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }
end
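One thing worth checking about the ranges used in the benchmark: `0..-1` covers the whole string, whereas `0..-2` stops one character short. A minimal sketch, using an illustrative sample string that is not from the original post:

```ruby
# Illustrative input; any string ending in "\n" behaves the same way.
s = "line one\n"

whole   = s[0..-1]  # -1 is the last index, so this range spans the entire string
trimmed = s[0..-2]  # -2 is the second-to-last index, so the final character is dropped
chopped = s.chop    # also drops the final character

puts whole.inspect
puts trimmed.inspect
puts chopped.inspect
```

If that reading is right, the "Indexing" case in the benchmark is not doing the same work as the "Chop" case.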
               user     system      total        real
Indexing   0.140000   0.000000   0.140000 (  0.142257)
Chop       0.110000   0.000000   0.110000 (  0.104612)
Chop2      0.150000   0.000000   0.150000 (  0.152083)
harp:~ > cat a.rb
require 'benchmark'

n = 100_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-1]
    end
  }
  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }
  bench.report("Chop2") {
    n.times do
      bigstring = bigstring[0..-2]
    end
  }
end
-a
On Fri, 28 Jul 2006, Mat Schaffer wrote:
Basically I want to drop what will be a trailing "\n" from input. But it appears that using String#[] and if statements is nearly 200 times more efficient than chop. Which just seems really weird, so here's the benchmark. Maybe I'm doing something wrong.
Well, if you implement chop fully, you get very similar results:
                    user     system      total        real
Indexing        1.780000   3.980000   5.760000 (  7.033924)
Chop            1.670000   3.970000   5.640000 (  7.297766)
Indexing crlf   1.790000   4.020000   5.810000 (  8.969243)
Chop crlf       1.680000   4.000000   5.680000 (  7.480123)
---
require 'benchmark'

n = 10_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-2] == "\r\n" ? bigstring[0..-2] : bigstring[0..-1]
    end
  }
  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }
  bigstring << "\r\n"
  bench.report("Indexing crlf") {
    n.times do
      bigstring[0..-2] == "\r\n" ? bigstring[0..-2] : bigstring[0..-1]
    end
  }
  bench.report("Chop crlf") {
    n.times do
      bigstring.chop
    end
  }
end
I just did a quick benchmark to prove something to myself. But I'd like to get a sanity check from the people on the list.
Basically I want to drop what will be a trailing "\n" from input. But it appears that using String#[] and if statements is nearly 200 times more efficient than chop. Which just seems really weird, so here's the benchmark. Maybe I'm doing something wrong.
Does this seem right? Anyone care to comment?
<snip>
As someone else pointed out, you'll probably want to use String#chop! for faster performance, since it uses the current object instead of creating a new one.
Also note that str[0..-2] is not quite the same as str.chop when "\r\n" is involved:
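A minimal sketch of that difference, with illustrative sample strings:

```ruby
crlf = "hello\r\n"
lf   = "hello\n"

by_index = crlf[0..-2]  # drops only the final "\n", so the "\r" stays behind
by_chop  = crlf.chop    # chop treats a trailing "\r\n" as one unit and removes both

puts by_index.inspect   # => "hello\r"
puts by_chop.inspect    # => "hello"

# With a plain "\n" ending the two approaches agree.
puts (lf[0..-2] == lf.chop).inspect  # => true
```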
I wouldn't think the extra work of checking for "\r\n" would add that much overhead, though.
Regards,
Dan
String#chomp would probably be a better idea for this, but that's OT I suppose. Regardless, its performance is the same as chop, it seems.
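For reference, a small sketch of how the two methods differ when the input may or may not end in a newline (sample strings are illustrative):

```ruby
with_newline    = "input\n"
without_newline = "input"

# chomp only removes a trailing line ending, if there is one.
puts with_newline.chomp.inspect     # => "input"
puts without_newline.chomp.inspect  # => "input"

# chop always removes the last character, newline or not.
puts with_newline.chop.inspect      # => "input"
puts without_newline.chop.inspect   # => "inpu"
```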
Here are my modifications:
require 'benchmark'

class String
  def my_chop
    self[0..-2]
  end
end

n = 100_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-1]
    end
  }
  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }
  bench.report("My Chop") {
    n.times do
      bigstring.my_chop
    end
  }
end
And here are my results:
Rehearsal --------------------------------------------
Indexing   0.310000   0.000000   0.310000 (  0.347943)
Chop      11.940000  30.330000  42.270000 ( 44.501066)
My Chop   12.620000  30.720000  43.340000 ( 46.339651)
---------------------------------- total: 85.920000sec

               user     system      total        real
Indexing   0.230000   0.000000   0.230000 (  0.258177)
Chop      11.980000  30.680000  42.660000 ( 44.966923)
My Chop   12.610000  30.860000  43.470000 ( 45.859064)
Let's see how String#chop is implemented...
static VALUE
rb_str_chop(str)
    VALUE str;
{
    str = rb_str_dup(str);
    rb_str_chop_bang(str);
    return str;
}
So it's in C... interesting....
- Jake McArthur
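The C source above duplicates the string and then chops the copy in place. A rough Ruby-level sketch of the same structure follows; `chop_via_dup` is a hypothetical helper written for illustration, not something Ruby provides:

```ruby
# Hypothetical helper mirroring the dup-then-chop! structure of rb_str_chop.
def chop_via_dup(str)
  copy = str.dup  # corresponds to rb_str_dup(str)
  copy.chop!      # corresponds to rb_str_chop_bang(str); its return value is ignored
  copy
end

original = "abc\n"
result   = chop_via_dup(original)

puts result.inspect    # => "abc"
puts original.inspect  # => "abc\n"
```

The receiver is left untouched because only the copy is modified.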
On Jul 27, 2006, at 11:12 AM, Mat Schaffer wrote:
Basically I want to drop what will be a trailing "\n" from input.
               user     system      total        real
Indexing   0.156000   0.000000   0.156000 (  0.156000)
Chop       0.094000   0.000000   0.094000 (  0.094000)
Chop2      0.187000   0.000000   0.187000 (  0.187000)
ruby -v
ruby 1.8.4 (2005-12-24) [i386-mswin32]
I think the difference in performance is because internally chop does
a dup on the string then calls chop! whereas the index operation
creates a new string which shares the old string but with a different
length. I guess this is also why the rehearsal and final results
differ - cutting out the cost of GC doesn't reflect the true cost of
using chop (especially with big strings).
Regards,
Sean
On 7/27/06, Mat Schaffer <schapht@gmail.com> wrote:
Basically I want to drop what will be a trailing "\n" from input. But it appears that using String#[] and if statements is nearly 200 times more efficient than chop. Which just seems really weird, so here's the benchmark. Maybe I'm doing something wrong.
Well, if you implement chop fully, you get very similar results:
Ah, but rangeless indexing yields much much better results:
                    user     system      total        real
Indexing        0.110000   0.000000   0.110000 (  0.134087)
Chop            3.430000   7.980000  11.410000 ( 14.305555)
Indexing crlf   0.110000   0.000000   0.110000 (  0.125122)
Chop crlf       3.420000   7.990000  11.410000 ( 13.869411)
On 2006-07-27, at 13:36 , Caio Chassot wrote:
---
require 'benchmark'

n = 20_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[-2,2] == "\r\n" ? bigstring[-2,2] : bigstring[-1,1]
    end
  }
  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }
  bigstring << "\r\n"
  bench.report("Indexing crlf") {
    n.times do
      bigstring[-2,2] == "\r\n" ? bigstring[-2,2] : bigstring[-1,1]
    end
  }
  bench.report("Chop crlf") {
    n.times do
      bigstring.chop
    end
  }
end
I think the difference in performance is because internally chop does
a dup on the string then calls chop! whereas the index operation
creates a new string which shares the old string but with a different
length. I guess this is also why the rehearsal and final results
differ - cutting out the cost of GC doesn't reflect the true cost of
using chop (especially with big strings).
Regards,
Sean
Indexing allocates a new string. It has to since (1) Ruby strings are
mutable, (2) Ruby strings have \0 at the end.
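Whatever the interpreter does internally, the part that can be observed from Ruby code is that the result of indexing is a separate, independently mutable object. A small sketch with an illustrative string:

```ruby
original = "I am a big string\n".dup  # dup so the example also works with frozen literals
slice    = original[0..-2]

slice << "!"  # mutate the slice only

puts original.inspect               # => "I am a big string\n"
puts slice.inspect                  # => "I am a big string!"
puts slice.equal?(original).inspect # => false
```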
Basically I want to drop what will be a trailing "\n" from input. But it appears that using String#[] and if statements is nearly 200 times more efficient than chop. Which just seems really weird, so here's the benchmark. Maybe I'm doing something wrong.
Well, if you implement chop fully, you get very similar results:
Ah, but rangeless indexing yields much much better results:
Speaking of catching typos, I apparently got too happy with my de-ranging and implemented the wrong thing: [-2,2] and [-1,1] return the trailing characters rather than the string without them. Here are the actual results. Pretty much the same as with ranges:
                    user     system      total        real
Indexing        3.700000   8.050000  11.750000 ( 14.579216)
Chop            3.520000   8.100000  11.620000 ( 15.165561)
Indexing crlf   3.730000   8.090000  11.820000 ( 15.573669)
Chop crlf       3.520000   8.100000  11.620000 ( 15.706817)
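To make the start/length form concrete, here is a short sketch of what each expression returns for an illustrative CRLF-terminated string:

```ruby
s = "abc\r\n"

tail_two    = s[-2, 2]            # the last two characters
tail_one    = s[-1, 1]            # the last character
minus_two   = s[0, s.length - 2]  # everything except the last two characters
minus_one   = s[0, s.length - 1]  # everything except the last character

puts tail_two.inspect   # => "\r\n"
puts tail_one.inspect   # => "\n"
puts minus_two.inspect  # => "abc"
puts minus_one.inspect  # => "abc\r"
```

The tail expressions only ever produce a one- or two-character string, which would explain why they looked so cheap in the earlier run.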
On 2006-07-27, at 13:40 , Caio Chassot wrote:
On 2006-07-27, at 13:36 , Caio Chassot wrote:
---
require 'benchmark'

n = 20_000
s = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      s[-2,2] == "\r\n" ? s[0, s.length - 2] : s[0, s.length - 1]
    end
  }
  bench.report("Chop") {
    n.times do
      s.chop
    end
  }
  s << "\r\n"
  bench.report("Indexing crlf") {
    n.times do
      s[-2,2] == "\r\n" ? s[0, s.length - 2] : s[0, s.length - 1]
    end
  }
  bench.report("Chop crlf") {
    n.times do
      s.chop
    end
  }
end
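As a quick sanity check that the indexing expression in this version behaves like chop, it can be wrapped in a method and compared against String#chop. `index_chop` is a hypothetical name chosen for this sketch, and the sample inputs are illustrative:

```ruby
# Hypothetical wrapper around the expression used in the benchmark above.
def index_chop(s)
  s[-2, 2] == "\r\n" ? s[0, s.length - 2] : s[0, s.length - 1]
end

samples = ["abc\r\n", "abc\n", "abc", "\r\n", "\n", "a"]

samples.each do |sample|
  same = index_chop(sample) == sample.chop
  puts "#{sample.inspect}: #{same ? 'matches chop' : 'differs from chop'}"
end

# One edge case differs: for the empty string the length argument becomes -1,
# so the indexing expression returns nil, while "".chop returns "".
puts index_chop("").inspect
```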