Confused by 'test'.gsub(/.*/,'x')

Wybo_Dekker · 2 April 2008 20:12

Why do I get "xx" instead of "x" in the following:

$ irb
>> 'test'.gsub(/.*/,'x')
=> "xx"

and even more confusing (to me):

>> "x\n".gsub(/.*/,'y')
=> "yy\ny"

(I expected "y\n")

--
Wybo

Thomas_Wieczorek · 2 April 2008 20:35

> Why do I get "xx" instead of "x" in the following:
>
> $ irb
> >> 'test'.gsub(/.*/,'x')
> => "xx"

.* matches both nothing and everything, so gsub() finds two matches,
'test' and '' (the empty string), and replaces each with 'x', so you get 'xx'

> and even more confusing (to me):
>
> >> "x\n".gsub(/.*/,'y')
> => "yy\ny"

Same goes here as above. If you want to replace each character use
'test'.gsub(/./,'x') #=> 'xxxx'
or if you want to replace all characters in each line, use
"test\ntest".gsub(/.+/,'x') #=> "x\nx"

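A small illustration of how the two suggestions behave on multi-line input, as a sketch only. The sample string with an empty middle line is my own assumption, not from the thread, and `/^.*$/` is shown as just one possible way to replace empty lines as well.

```ruby
text = "a\n\nb" # hypothetical sample with an empty middle line

# .+ needs at least one character, so the empty line is left alone
plus_result = text.gsub(/.+/, 'x')

# ^.*$ ties each match to a whole line, so the empty line is replaced too
anchored_result = text.gsub(/^.*$/, 'x')

p plus_result     # "x\n\nx"
p anchored_result # "x\nx\nx"
```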

Yossef_Mendelssohn · 2 April 2008 20:55

> Why do I get "xx" instead of "x" in the following:

> $ irb
> >> 'test'.gsub(/.*/,'x')
> => "xx"

> .* matches NO and ALL characters, so gsub() substitutes
> ''(empty)(=>'x') and 'test'(=>'x') with x, so you get 'xx'

That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x' more
than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me that the .*
should match [empty string]test[empty string] just once.

> and even more confusing (to me):

> >> "x\n".gsub(/.*/,'y')
> => "yy\ny"

This makes sense because . doesn't normally match \n, so there's the
replacement before and after. Still, the double replacement when there
are actual characters is just weird.

--
-yossef

Thomas_Wieczorek · 2 April 2008 20:59

Yeah, it is confusing me, but I agreed on that explanation with
myself, when I read it once here. I'd also expect 'x' instead of 'xx'

Jens_Wille1 · 2 April 2008 21:13

Thomas Wieczorek [2008-04-02 22:59]:

> Yeah, it is confusing me, but I agreed on that explanation with
> myself, when I read it once here. I'd also expect 'x' instead of 'xx'

can't explain it either, i'm afraid. but you can see what it does
like so:

'test'.gsub(/.*/) { |m| p m; 'x'}

"test"
""
=>"xx"

as soon as you anchor the regexp at the beginning of the string it
gives the expected result:

'test'.gsub(/\A.*/) { |m| p m; 'x'}

"test"
=>"x"

or just do:

'test'.sub(/.*/) { |m| p m; 'x'}

"test"
=>"x"

cheers
jens

Brian_Adkins · 2 April 2008 21:25

That seems like a bug to me. The entire string is matched/consumed
by .*, so why try matching again? Or, if you are going to continue,
why stop with just one additional match? Is there code in gsub to
"only match one time after the string is consumed" ?

irb(main):001:0> 'test' =~ /(.*)(.*)(.*)/
=> 0
irb(main):002:0> $1
=> "test"
irb(main):003:0> $2
=> ""
irb(main):004:0> $3
=> ""
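
On whether there is code that matches only once more after the string is consumed: the behaviour seems to fall out of a general rule rather than a special case. Below is a rough sketch of a global-substitution loop. `sketch_gsub` is a hypothetical helper written for illustration, not Ruby's actual implementation. It assumes the search resumes where the previous match ended, and that an empty match pushes the position forward by one character so the loop cannot get stuck.

```ruby
# Hypothetical, simplified stand-in for String#gsub (illustration only).
def sketch_gsub(str, pattern, replacement)
  result = ''.dup
  pos = 0
  while pos <= str.length
    m = pattern.match(str, pos) # search from the current position
    break unless m

    # copy the unmatched text before the match, then the replacement
    result << str[pos...m.begin(0)] << replacement
    if m.end(0) == m.begin(0)
      # empty match: pass one character through and step over it
      result << str[m.end(0), 1].to_s
      pos = m.end(0) + 1
    else
      pos = m.end(0)
    end
  end
  result << str[pos..-1].to_s # whatever is left after the last match
end

greedy   = sketch_gsub('test', /.*/, 'x')   # "xx"
newline  = sketch_gsub("x\n", /.*/, 'y')    # "yy\ny"
anchored = sketch_gsub('test', /\A.*/, 'x') # "x"
p greedy, newline, anchored
```

After "test" is consumed, the position is 4, which is still a valid place to try a match, and .* succeeds there with zero characters. That empty match moves the position past the end, so there is exactly one extra replacement.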

Wybo_Dekker · 2 April 2008 21:39

Jens Wille wrote:

> as soon as you anchor the regexp at the beginning of the string it
> gives the expected result:
>
> 'test'.gsub(/\A.*/) { |m| p m; 'x'}
> "test"
> =>"x"
>
> or just do:
>
> 'test'.sub(/.*/) { |m| p m; 'x'}
> "test"
> =>"x"

sure, that works, and so does 'test'.gsub(/.+/,'x').
The point is that I don't understand why 'test'.gsub(/.*/,'x') gives me
'xx', since .* means zero or more of any character except the newline
character, i.e. all of the string should be replaced with a single x,
as far as I can see.

--
Wybo

Januski_Ken · 2 April 2008 22:08

Seems wrong to me as well. If you do a destructive gsub and test for
individual letters, e.g. /s.*/,'x', you get 'tex' as you'd expect. Seems
wrong to get the double 'x' when you use your example. Of course my
background is Perl and I believe that's how it would work there.

irb(main):016:0> 'test'.gsub!(/.*/, 'x')
=> "xx"
irb(main):017:0> 'test'.gsub!(/e.*/, 'x')
=> "tx"
irb(main):018:0> 'test'.gsub!(/s.*/, 'x')
=> "tex"
irb(main):019:0> 'test'.gsub!(/t.*/, 'x')
=> "x"
irb(main):020:0> 'test'.gsub!(/st.*/, 'x')
=> "tex"

Ken

_Pena_Botp1 · 3 April 2008 02:38

# sure, that works, and so does test.gsub(/.+/,'x').
# The point is that I don't understand why test.gsub(/.*/,'x') gives me
# 'xx', since .* means: zero or more of any character, except
# the newline
# character, i.e.: all of the string should be replaced with a
# single x, as far as I can see.

you can start (slowly) by comparing these two examples,

irb(main):077:0> ''.gsub(/.*/, 'x')
=> "x"

irb(main):078:0> ''.gsub(/.+/, 'x')
=> ""

kind regards -botp
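
Following that hint, a sketch of why the end of 'test' behaves like the empty string. The offset 4 is simply the length of 'test'; the variable names are my own.

```ruby
# On the empty string, * succeeds with zero characters and + fails
star_on_empty = /.*/.match('')
plus_on_empty = /.+/.match('')

# Searching from offset 4, i.e. the end of 'test', gives the same picture
star_at_end = /.*/.match('test', 4)
plus_at_end = /.+/.match('test', 4)

p star_on_empty[0] # ""
p plus_on_empty    # nil
p star_at_end[0]   # ""
p plus_at_end      # nil
```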

Jens_Wille1 · 2 April 2008 23:03

Januski, Ken [2008-04-03 00:08]:

> Of course my background is Perl and I believe that's how it would
> work there.

no, works the same way there:

> perl -e '$s = "test"; $s =~ s/.*/x/g; print "$s\n"'
xx

(only a lot more complicated)

btw: python, php and javascript, too.

oh, and here's what oniguruma does:

> Oniguruma::ORegexp.new('.*').gsub('test', 'x')
=>"xx"

cheers
jens

Bilyk_Alex · 3 April 2008 00:16

> The point is that I don't understand why test.gsub(/.*/,'x') gives me
> 'xx', since .* means: zero or more of any character, except the newline
> character, ...
>
> Wybo

I would venture to say this is exactly what it does. It finds two
matches and replaces them both with 'x'. The first match is an empty
string <zero>, while the second match is the full string <or more>.

Alex

Zoltan_Dezso · 3 April 2008 01:12

Perl, PHP:

perl -le '$str="test"; $str =~ s/.*?/x/g; print $str;'
xxxxxxxxx

preg_replace('/.*?/', 'x', 'test');
xxxxxxxxx

Ruby:
print 'test'.gsub(/.*?/, 'x')
xtxexsxtx

Zaki
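
For the Ruby result, a quick way to see what the lazy pattern actually matches. This is a sketch, and the bracket markers are just my own choice to make the empty matches visible.

```ruby
# Every match of the lazy pattern is empty: one per position, 0 through 4
lazy_matches = 'test'.scan(/.*?/)

# Mark each match so the untouched characters stand out
bracketed = 'test'.gsub(/.*?/) { |m| "[#{m}]" }

p lazy_matches # ["", "", "", "", ""]
p bracketed    # "[]t[]e[]s[]t[]"
```

So Ruby makes five empty replacements and copies the four letters through unchanged. The nine x's from Perl and PHP suggest those engines also match each single character after an empty match, though I have not checked their implementations.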

--
Posted via http://www.ruby-forum.com/.

Januski_Ken · 3 April 2008 02:23

Right you are. For all the years I've used Perl, and for all that I thought I knew about regexes, I never would have thought I would get that result.

I would have expected one greedy match for the entire text. Instead I guess it's first getting the zero match and then the full match.

Daniel_Sheppard · 3 April 2008 03:32

> I would have expected one greedy match for the entire text.
> Instead I guess it's first getting the zero match and then
> the full match.

Actually, it's vice versa. It matches the whole string (greedy), then
matches the end of string. The "test" string is seen by the regex engine
as:

test<end of string>

.* first matches "test". <end of string> is a special 'character' that
is not consumed by ".", so the remaining string is then "<end of
string>". This is also matched, as it contains zero or more characters
(but is not then matched infinitely, as the position in the string has
not advanced).

Dan.
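
The order of the two matches can be checked with a sketch like the following. `match_spans` is a hypothetical helper name; it only records where each match starts and ends, using String#scan and Regexp.last_match.

```ruby
# Hypothetical helper: returns [start, end, text] for every match of pattern
def match_spans(str, pattern)
  spans = []
  str.scan(pattern) do
    m = Regexp.last_match
    spans << [m.begin(0), m.end(0), m[0]]
  end
  spans
end

test_spans    = match_spans('test', /.*/)
newline_spans = match_spans("x\n", /.*/)

p test_spans    # [[0, 4, "test"], [4, 4, ""]]
p newline_spans # [[0, 1, "x"], [1, 1, ""], [2, 2, ""]]
```

The greedy match over "test" comes first, and the empty match at offset 4 comes second. In the "x\n" case there are two empty matches, one just before the newline and one at the end of the string, which accounts for "yy\ny".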
