String performance

       user     system      total        real
   0.010000   0.000000   0.010000 (  0.009087)
   0.010000   0.000000   0.010000 (  0.008774)
   0.000000   0.000000   0.000000 (  0.004621)

Perhaps your machine is more deterministic than mine, but successive
runs of that benchmark (and using #bmbm to be safer about the
measurement) sometimes show 'a' faster than "a", sometimes slower.

Even with benchmarking, I wouldn't trust answers that are within a few
percent of each other. And I certainly wouldn't rush off to refactor
code because of it.
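
(For reference, the benchmark in question was roughly of the following
shape; the original script isn't quoted in this thread, so the loop
count and labels below are a best guess.)

require 'benchmark'

n = 5000
x = "stuff"

Benchmark.bm do |b|
  b.report { n.times { 'a' + x + 'b' } }
  b.report { n.times { "a" + x + "b" } }
  b.report { n.times { "a#{x}b" } }
end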

···

From: Vincent Fourmond

Increase n from 5000 to 500000 or 5000000.

To understand the difference, just think about how many strings are being created with each.

'a' creates a new string, as does 'b'.
The + operation creates a new string, as well.

So, there's a lot of new string creation happening with either of the + examples.

Change the +'s to << and you will see a difference.

'a' << x << 'b'

<< just modifies the existing String in place.

The "a#{x}b" example does the least work.

Kirk Haines

···

Gavin Kistner wrote:

Perhaps your machine is more deterministic than mine, but successive
runs of that benchmark (and using #bmbm to be safer about the
measurement) sometimes show 'a' faster than "a", sometimes slower.

  The thing is, they are rigorously equivalent. As soon as the program
is parsed, they are represented as exactly the same kind of object, a
String. So they are the same, and that's why you get roughly the same
processing times.

  Moreover, it is normal that interpolation is faster, because it only
involves evaluating x as a String and a string copy, whereas addition
involves two method calls (two calls to +), which are rather expensive
(at least more expensive than a string copy for such small strings).
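
  (On MRI 1.9 and later you can even watch this happen: both literal
forms compile to exactly the same instructions. RubyVM::InstructionSequence
does not exist on 1.8, so this is only an illustration.)

single = RubyVM::InstructionSequence.compile("'a' + x").disasm
double = RubyVM::InstructionSequence.compile('"a" + x').disasm
puts single == double   # => true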

  Cheers,

  Vince

···

--
Vincent Fourmond, PhD student
http://vincent.fourmond.neuf.fr/

I have added some alternatives - string interpolation still wins

robert@fussel /cygdrive/c/temp
$ ruby str_bench.rb
Rehearsal -------------------------------------------
'a' +      2.860000   0.000000   2.860000 (  2.859000)
"a" +      2.890000   0.000000   2.890000 (  2.891000)
a#{        1.860000   0.000000   1.860000 (  1.859000)
"" <<      3.734000   0.000000   3.734000 (  3.734000)
"a" <<     2.328000   0.000000   2.328000 (  2.328000)
A +        2.625000   0.000000   2.625000 (  2.625000)
"" << A    3.453000   0.000000   3.453000 (  3.453000)
--------------------------------- total: 19.750000sec

               user     system      total        real
'a' +      2.906000   0.000000   2.906000 (  2.907000)
"a" +      2.891000   0.000000   2.891000 (  2.890000)
a#{        1.859000   0.000000   1.859000 (  1.860000)
"" <<      3.766000   0.000000   3.766000 (  3.765000)
"a" <<     2.344000   0.000000   2.344000 (  2.344000)
A +        2.640000   0.000000   2.640000 (  2.641000)
"" << A    3.469000   0.000000   3.469000 (  3.468000)

robert@fussel /cygdrive/c/temp
$ cat str_bench.rb
require 'benchmark'

n = 1_000_000
c = "stuff"

A = "a"
B = "b"

Benchmark.bmbm do |x|
   x.report('\'a\' +') { n.times {'a' + c + 'b'}}
   x.report('"a" +') { n.times {"a" + c + "b"}}
   x.report('a#{') { n.times {"a#{c}b"}}
   x.report('"" <<') { n.times {"" << "a" << c << "b"}}
   x.report('"a" <<') { n.times {"a" << c << "b"}}

   x.report('A +') { n.times {A + c + B}}
   x.report('"" << A') { n.times {"" << A << c << B}}
end

  robert

···

What do the different columns actually mean?

···

On 1/10/07, Robert Klemme <shortcutter@googlemail.com> wrote:

On 09.01.2007 22:17, khaines@enigo.com wrote:
> On Wed, 10 Jan 2007, Gavin Kistner wrote:
>
>> From: Vincent Fourmond
>>> user system total real
>>> 0.010000 0.000000 0.010000 ( 0.009087)
>>> 0.010000 0.000000 0.010000 ( 0.008774)
>>> 0.000000 0.000000 0.000000 ( 0.004621)
>>

Robert :

robert@fussel /cygdrive/c/temp
$ cat str_bench.rb
require 'benchmark'

n = 1_000_000
c = "stuff"

A = "a"
B = "b"

Benchmark.bmbm do |x|
   x.report('\'a\' +') { n.times {'a' + c + 'b'}}
   x.report('"a" +') { n.times {"a" + c + "b"}}
   x.report('a#{') { n.times {"a#{c}b"}}
   x.report('"" <<') { n.times {"" << "a" << c << "b"}}
   x.report('"a" <<') { n.times {"a" << c << "b"}}

   x.report('A +') { n.times {A + c + B}}
   x.report('"" << A') { n.times {"" << A << c << B}}
end

Hi,

You can also add the format-string approach: "a%sb" % c

x.report('a%sb') { n.times { "a%sb" % c }}

It is slower than string interpolation.

   -- Jean-François.

···

--
À la renverse.

http://ruby-doc.org/stdlib/libdoc/benchmark/rdoc/classes/Benchmark.html

This report shows the user CPU time, system CPU time, the sum of the
user and system CPU times, and the elapsed real time. The unit of time
is seconds.

i.e. time spent in user mode, time spent in kernel mode, the sum of the
two (for these, only time spent by this particular process is counted),
and elapsed real ("wall clock") time.

···

On 1/10/07, Jason Mayer <slamboy@gmail.com> wrote:

What do the different columns actually mean?

Excellent, thanks. I started to benchmark my quiz submission's file
reading and, uh, RAM usage on the ruby process skyrocketed to 200MB.
Does benchmark not play well with blocks of code?

···

So I tried to benchmark my code, and it works for very small numbers of
tests.
2 repetitions:
       user     system      total        real
   0.871000   0.010000   0.881000 (  0.902000)
5 repetitions:
       user     system      total        real
   2.323000   0.050000   2.373000 (  3.024000)
15 repetitions, however, gives the same problem: 5 minutes after I
started the program, I killed it.

Can someone help me understand if this is a problem with benchmark or with
my code?

···

Jason Mayer wrote:

(...)
Can someone help me understand if this is a problem with benchmark or with
my code?

Jason, simply run the same loops without the benchmark library. Since benchmark is just recording some times, I guess you'll see the same behaviour.
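
For example, something as bare as this (run_once standing in for one
repetition of your quiz code) should show whether the slowdown happens
even without Benchmark in the picture:

start = Time.now
15.times { run_once }   # run_once = whatever one repetition of your code does
puts "elapsed: #{Time.now - start} seconds"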

Regards,
Pit