Duplicating characters in a string

So you won't be able to do this any more:

2 + 2

You'll have to use Forth:

2 2 +

or Lisp:

(+ 2 2)

Some awk syntax:

gsub( /./, "&&", s )

···

On Mar 7, 6:56 pm, 7stud -- <bbxx789_0...@yahoo.com> wrote:

(I refuse to consider any code that uses perl syntax)

??? What exactly did you mean by the above text?

Cheers

  robert

···

On 08.03.2008 01:56, 7stud -- wrote:

Robert Klemme wrote:

2008/3/6, Joel VanderWerf <vjoel@path.berkeley.edu>:

It's only about a factor of two faster, according to the following, but
that matters sometimes.

Rehearsal ---------------------------------------------------
s.gsub(re){...} 5.970000 0.000000 5.970000 ( 6.009089)
s.gsub(re,...) 2.900000 0.060000 2.960000 ( 3.148006)
------------------------------------------ total: 8.930000sec

                       user system total real
s.gsub(re){...} 5.760000 0.010000 5.770000 ( 5.925049)
s.gsub(re,...) 2.800000 0.060000 2.860000 ( 2.914432)

What version did you test with? I get much more dramatic differences:

15:56:11 ~
$ ruby /c/Temp/gs.rb
Rehearsal -----------------------------------------------------------
s.gsub(re){|x| x * 2} 25.313000 0.031000 25.344000 ( 25.354000)
s.gsub(re){|x| x << x} 22.812000 0.000000 22.812000 ( 22.845000)
s.gsub(re, '\1\1') 6.516000 0.015000 6.531000 ( 6.539000)
s.gsub(re, '\&\&') 6.578000 0.016000 6.594000 ( 6.595000)
s.gsub(/./){|x| x * 2} 25.172000 0.016000 25.188000 ( 25.182000)
s.gsub(/./){|x| x << x} 22.843000 0.000000 22.843000 ( 22.857000)
s.gsub(/(.)/, '\1\1') 6.344000 0.015000 6.359000 ( 6.355000)
s.gsub(/./, '\&\&') 6.188000 0.032000 6.220000 ( 6.217000)
------------------------------------------------ total: 121.891000sec

                              user system total real
s.gsub(re){|x| x * 2} 25.484000 0.015000 25.499000 ( 25.502000)
s.gsub(re){|x| x << x} 22.813000 0.031000 22.844000 ( 22.856000)
s.gsub(re, '\1\1') 6.312000 0.000000 6.312000 ( 6.337000)
s.gsub(re, '\&\&') 6.359000 0.000000 6.359000 ( 6.359000)
s.gsub(/./){|x| x * 2} 25.922000 0.000000 25.922000 ( 25.994000)
s.gsub(/./){|x| x << x} 22.672000 0.015000 22.687000 ( 22.707000)
s.gsub(/(.)/, '\1\1') 6.375000 0.016000 6.391000 ( 6.389000)
s.gsub(/./, '\&\&') 6.235000 0.000000 6.235000 ( 6.239000)
16:00:22 ~
$ ruby -v
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]
16:00:26 ~
$ cat /c/Temp/gs.rb
require 'benchmark'

s = ("abc" * 1_000_000).freeze
re = /(.)/
re2 = /./

Benchmark.bmbm do |b|
  b.report("s.gsub(re){|x| x * 2}") do
    s.gsub(re) { |x| x * 2 }
  end

  b.report("s.gsub(re){|x| x << x}") do
    s.gsub(re) { |x| x << x }
  end

  b.report("s.gsub(re, '\\1\\1')") do
    s.gsub(re, "\\1\\1")
  end

  b.report("s.gsub(re, '\\&\\&')") do
    s.gsub(re, "\\&\\&")
  end

  b.report("s.gsub(/./){|x| x * 2}") do
    s.gsub(/./) { |x| x * 2 }
  end

  b.report("s.gsub(/./){|x| x << x}") do
    s.gsub(/./) { |x| x << x }
  end

  b.report("s.gsub(/(.)/, '\\1\\1')") do
    s.gsub(/(.)/, "\\1\\1")
  end

  b.report("s.gsub(/./, '\\&\\&')") do
    s.gsub(/./, "\\&\\&")
  end
end
16:00:31 ~
$

And taking your non-perl champion:

s.gsub(/(.)/, '\1\1')

(I refuse to consider any code that uses perl syntax), and pitting it against:

Probably. But note that blocks must be slower because they are Ruby code while the replacement without block is done in compiled C code.

Kind regards

  robert

···

On 07.03.2008 19:41, Joel VanderWerf wrote:

Robert Klemme wrote:

What version did you test with? I get much more dramatic differences:

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]

Surprising. The slower cases in your results are all the block cases. Does cygwin ruby perform poorly with blocks in general? Or is it because my ruby is compiled for i686 instead of i386?

Pascal (FreePascal):

uses sysutils; // for timing

function dup_chars( s: ansistring ): ansistring;
var
  out: ansistring;
  i,j: longint;
  c: char;
begin
  setlength( out, length(s) * 2 );
  for i := 1 to length(s) do
  begin
    c := s[i]; j := (i shl 1);
    out[j-1] := c;
    out[j] := c;
  end;
  exit( out )
end;

var
  s : ansistring;
  i : longint;
  when : tDateTime;
begin
  s := '';
  for i:=1 to 1000000 do s += 'abc';
  when := Time;
  dup_chars( s );
  writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.

--->0.031 seconds

···

On Mar 7, 11:45 pm, Joel VanderWerf <vj...@path.berkeley.edu> wrote:

Rehearsal ---------------------------------------------------
s.gsub(re){...} 6.220000 0.010000 6.230000 ( 6.325230)
s.gsub(re,...) 3.020000 0.060000 3.080000 ( 3.115286)
7stud 4.110000 0.000000 4.110000 ( 4.142912)
7stud 2.050000 0.000000 2.050000 ( 2.072398)
----------------------------------------- total: 15.470000sec

                       user system total real
s.gsub(re){...} 5.910000 0.020000 5.930000 ( 5.993456)
s.gsub(re,...) 3.000000 0.000000 3.000000 ( 3.017119)
7stud 4.100000 0.020000 4.120000 ( 4.300772)
7stud 2.030000 0.010000 2.040000 ( 2.068190)

Some things to keep in mind:
  - captures can be slow, so if you can avoid them, you can get better
    performance
  - looping using "for" is usually faster than using "each { ... }",
    because it doesn't have to create a new scope every time the block
    is called

Three alternative solutions added given the above (my computer is slower
than yours, so I reduced the number of iterations too). Interesting
that I was able to write an enumerator class in ruby that was faster
than the builtin enumerator class (which is written in C). Also
interesting is how much slower the enumerator solution is on YARV:

require 'benchmark'
require 'enumerator'

class EachByteEnumerator
  def initialize(obj)
    @obj = obj
  end

  def each(&block)
    @obj.each_byte(&block)
  end
end

Benchmark.bmbm do |b|
  s = "abc" * 100_000
  re = /(.)/

  b.report("s.gsub(re) {|x| x*2}") do
    s.gsub(re) {|x| x*2}
  end

  b.report("s.gsub(re, '\\1\\1')") do
    s.gsub(re, '\1\1')
  end

  b.report("s.gsub(/./, '\\0\\0')") do
    s.gsub(/./, '\0\0')
  end

  b.report("7stud1") do
    new_str = ""
    s.each_byte do |byte|
      2.times do
        new_str << byte
      end
    end
  end

  b.report("7stud2") do
    new_str = ""
    s.each_byte do |byte|
      new_str << byte << byte
    end
  end

  b.report("7stud_enum") do
    new_str = ""
    o = s.to_enum(:each_byte)
    for byte in o do
      new_str << byte << byte
    end
  end

  b.report("7stud_enum2") do
    new_str = ""
    o = EachByteEnumerator.new(s)
    for byte in o do
      new_str << byte << byte
    end
  end
end

[pbrannan@zaphod tmp]$ ruby test.rb
Rehearsal --------------------------------------------------------
s.gsub(re) {|x| x*2} 2.410000 0.010000 2.420000 ( 2.431675)
s.gsub(re, '\1\1') 0.900000 0.140000 1.040000 ( 1.035942)
s.gsub(/./, '\0\0') 0.840000 0.070000 0.910000 ( 0.919175)
7stud1 1.430000 0.000000 1.430000 ( 1.430979)
7stud2 0.580000 0.010000 0.590000 ( 0.585925)
7stud_enum 0.820000 0.010000 0.830000 ( 0.837720)
7stud_enum2 0.510000 0.000000 0.510000 ( 0.511776)
----------------------------------------------- total: 7.730000sec

                           user system total real
s.gsub(re) {|x| x*2} 2.450000 0.000000 2.450000 ( 2.455824)
s.gsub(re, '\1\1') 0.860000 0.010000 0.870000 ( 0.861705)
s.gsub(/./, '\0\0') 0.820000 0.000000 0.820000 ( 0.818279)
7stud1 1.460000 0.010000 1.470000 ( 1.483312)
7stud2 0.590000 0.000000 0.590000 ( 0.589141)
7stud_enum 0.820000 0.000000 0.820000 ( 0.817231)
7stud_enum2 0.520000 0.000000 0.520000 ( 0.514009)

[pbrannan@zaphod tmp]$ ruby1.9 test.rb
Rehearsal --------------------------------------------------------
s.gsub(re) {|x| x*2} 2.880000 0.030000 2.910000 ( 2.961747)
s.gsub(re, '\1\1') 1.850000 0.120000 1.970000 ( 1.972712)
s.gsub(/./, '\0\0') 1.770000 0.040000 1.810000 ( 1.815295)
7stud1 1.040000 0.000000 1.040000 ( 1.056692)
7stud2 0.640000 0.000000 0.640000 ( 0.660844)
7stud_enum 1.710000 0.080000 1.790000 ( 1.786316)
7stud_enum2 1.640000 0.120000 1.760000 ( 1.770983)
---------------------------------------------- total: 11.920000sec

                           user system total real
s.gsub(re) {|x| x*2} 2.950000 0.030000 2.980000 ( 2.999795)
s.gsub(re, '\1\1') 1.910000 0.030000 1.940000 ( 1.938242)
s.gsub(/./, '\0\0') 1.760000 0.030000 1.790000 ( 1.811339)
7stud1 1.040000 0.000000 1.040000 ( 1.055916)
7stud2 0.650000 0.030000 0.680000 ( 0.685329)
7stud_enum 1.520000 0.120000 1.640000 ( 1.661215)
7stud_enum2 1.450000 0.120000 1.570000 ( 1.590461)

[pbrannan@zaphod tmp]$ ruby1.9 -v
ruby 1.9.0 (2008-01-24 revision 0) [i686-linux]

Your point being?

robert

···

2008/3/11, William James <w_a_x_man@yahoo.com>:

On Mar 7, 11:45 pm, Joel VanderWerf <vj...@path.berkeley.edu> wrote:

> Rehearsal ---------------------------------------------------
> s.gsub(re){...} 6.220000 0.010000 6.230000 ( 6.325230)
> s.gsub(re,...) 3.020000 0.060000 3.080000 ( 3.115286)
> 7stud 4.110000 0.000000 4.110000 ( 4.142912)
> 7stud 2.050000 0.000000 2.050000 ( 2.072398)
> ----------------------------------------- total: 15.470000sec
>
> user system total real
> s.gsub(re){...} 5.910000 0.020000 5.930000 ( 5.993456)
> s.gsub(re,...) 3.000000 0.000000 3.000000 ( 3.017119)
> 7stud 4.100000 0.020000 4.120000 ( 4.300772)
> 7stud 2.030000 0.010000 2.040000 ( 2.068190)

Pascal (FreePascal):

uses sysutils; // for timing

function dup_chars( s: ansistring ): ansistring;
var
  out: ansistring;
  i,j: longint;
  c: char;
begin
  setlength( out, length(s) * 2 );
  for i := 1 to length(s) do
  begin
    c := s[i]; j := (i shl 1);
    out[j-1] := c;
    out[j] := c;
  end;
  exit( out )
end;

var
  s : ansistring;
  i : longint;
  when : tDateTime;
begin
  s := '';
  for i:=1 to 1000000 do s += 'abc';
  when := Time;
  dup_chars( s );
  writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.

  --->0.031 seconds

--
use.inject do |as, often| as.you_can - without end

On my computer:

s.gsub(re, ...) --> 9.641000 seconds
Lua --> 1.078 seconds

Ruby's gsub is far too slow. 7stud even beat it with an each_byte
loop!
Disgraceful. Disgusting. Inexcusable.

The Lua code:

s = string.rep( 'abc', 1000000 )

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print( os.clock() - time, "seconds" )

···

On Mar 11, 9:26 am, Robert Klemme <shortcut...@googlemail.com> wrote:

2008/3/11, William James <w_a_x_...@yahoo.com>:

> On Mar 7, 11:45 pm, Joel VanderWerf <vj...@path.berkeley.edu> wrote:

> > Rehearsal ---------------------------------------------------
> > s.gsub(re){...} 6.220000 0.010000 6.230000 ( 6.325230)
> > s.gsub(re,...) 3.020000 0.060000 3.080000 ( 3.115286)
> > 7stud 4.110000 0.000000 4.110000 ( 4.142912)
> > 7stud 2.050000 0.000000 2.050000 ( 2.072398)
> > ----------------------------------------- total: 15.470000sec

> > user system total real
> > s.gsub(re){...} 5.910000 0.020000 5.930000 ( 5.993456)
> > s.gsub(re,...) 3.000000 0.000000 3.000000 ( 3.017119)
> > 7stud 4.100000 0.020000 4.120000 ( 4.300772)
> > 7stud 2.030000 0.010000 2.040000 ( 2.068190)

> Pascal (FreePascal):

> uses sysutils; // for timing

> function dup_chars( s: ansistring ): ansistring;
> var
> out: ansistring;
> i,j: longint;
> c: char;
> begin
> setlength( out, length(s) * 2 );
> for i := 1 to length(s) do
> begin
> c := s[i]; j := (i shl 1);
> out[j-1] := c;
> out[j] := c;
> end;
> exit( out )
> end;

> var
> s : ansistring;
> i : longint;
> when : tDateTime;
> begin
> s := '';
> for i:=1 to 1000000 do s += 'abc';
> when := Time;
> dup_chars( s );
> writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
> end.

> --->0.031 seconds

Your point being?

robert

--
use.inject do |as, often| as.you_can - without end

On Mar 11, 9:26 am, Robert Klemme <shortcut...@googlemail.com> wrote:

2008/3/11, William James <w_a_x_...@yahoo.com>:

> On Mar 7, 11:45 pm, Joel VanderWerf <vj...@path.berkeley.edu> wrote:

> > Rehearsal ---------------------------------------------------
> > s.gsub(re){...} 6.220000 0.010000 6.230000 ( 6.325230)
> > s.gsub(re,...) 3.020000 0.060000 3.080000 ( 3.115286)
> > 7stud 4.110000 0.000000 4.110000 ( 4.142912)
> > 7stud 2.050000 0.000000 2.050000 ( 2.072398)
> > ----------------------------------------- total: 15.470000sec

> > user system total real
> > s.gsub(re){...} 5.910000 0.020000 5.930000 ( 5.993456)
> > s.gsub(re,...) 3.000000 0.000000 3.000000 ( 3.017119)
> > 7stud 4.100000 0.020000 4.120000 ( 4.300772)
> > 7stud 2.030000 0.010000 2.040000 ( 2.068190)

> Pascal (FreePascal):

> uses sysutils; // for timing

> function dup_chars( s: ansistring ): ansistring;
> var
> out: ansistring;
> i,j: longint;
> c: char;
> begin
> setlength( out, length(s) * 2 );
> for i := 1 to length(s) do
> begin
> c := s[i]; j := (i shl 1);
> out[j-1] := c;
> out[j] := c;
> end;
> exit( out )
> end;

> var
> s : ansistring;
> i : longint;
> when : tDateTime;
> begin
> s := '';
> for i:=1 to 1000000 do s += 'abc';
> when := Time;
> dup_chars( s );
> writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
> end.

> --->0.031 seconds

Your point being?

robert

--
use.inject do |as, often| as.you_can - without end

William James wrote:

Ruby's gsub is far too slow. 7stud even beat it with an each_byte
loop!

I have not been able to reproduce a benchmark in which each_byte was faster than gsub (at least, without resorting to the loop unrolling trick).

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Btw, is there any particular reason why you duplicated the mail content?

···

On 11.03.2008 19:53, William James wrote:

On Mar 11, 9:26 am, Robert Klemme <shortcut...@googlemail.com> wrote:

2008/3/11, William James <w_a_x_...@yahoo.com>:

On Mar 7, 11:45 pm, Joel VanderWerf <vj...@path.berkeley.edu> wrote:
> Rehearsal ---------------------------------------------------
> s.gsub(re){...} 6.220000 0.010000 6.230000 ( 6.325230)
> s.gsub(re,...) 3.020000 0.060000 3.080000 ( 3.115286)
> 7stud 4.110000 0.000000 4.110000 ( 4.142912)
> 7stud 2.050000 0.000000 2.050000 ( 2.072398)
> ----------------------------------------- total: 15.470000sec
> user system total real
> s.gsub(re){...} 5.910000 0.020000 5.930000 ( 5.993456)
> s.gsub(re,...) 3.000000 0.000000 3.000000 ( 3.017119)
> 7stud 4.100000 0.020000 4.120000 ( 4.300772)
> 7stud 2.030000 0.010000 2.040000 ( 2.068190)
Pascal (FreePascal):
uses sysutils; // for timing
function dup_chars( s: ansistring ): ansistring;
var
  out: ansistring;
  i,j: longint;
  c: char;
begin
  setlength( out, length(s) * 2 );
  for i := 1 to length(s) do
  begin
    c := s[i]; j := (i shl 1);
    out[j-1] := c;
    out[j] := c;
  end;
  exit( out )
end;
var
  s : ansistring;
  i : longint;
  when : tDateTime;
begin
  s := '';
  for i:=1 to 1000000 do s += 'abc';
  when := Time;
  dup_chars( s );
  writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.
  --->0.031 seconds

Your point being?

robert

--
use.inject do |as, often| as.you_can - without end

On my computer:

s.gsub(re, ...) --> 9.641000 seconds
Lua --> 1.078 seconds

Ruby's gsub is far too slow. 7stud even beat it with an each_byte
loop!
Disgraceful. Disgusting. Inexcusable.

The Lua code:

s = string.rep( 'abc', 1000000 )

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print( os.clock() - time, "seconds" )

I do not know Lua. Does this return a copied string or does it do it inplace?

Kind regards

  robert

I don't think your code is doing what you think it is doing:

[pbrannan@zaphod tmp]$ cat test.lua
-- s = string.rep( 'abc', 100000 )
s = 'abcabcabc'

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print(s)
print( os.clock() - time, "seconds" )

[pbrannan@zaphod tmp]$ lua test.lua
abcabcabc
0 seconds
[pbrannan@zaphod tmp]$ lua -v
Lua 5.0.2 Copyright (C) 1994-2004 Tecgraf, PUC-Rio

Paul

···

On Wed, Mar 12, 2008 at 03:57:59AM +0900, William James wrote:

Ruby's gsub is far too slow. 7stud even beat it with an each_byte
loop!
Disgraceful. Disgusting. Inexcusable.

The Lua code:

s = string.rep( 'abc', 1000000 )

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print( os.clock() - time, "seconds" )