Why do I get "xx" instead of "x" in the following:
$ irb
>> 'test'.gsub(/.*/,'x')
=> "xx"
and even more confusing (to me):
>> "x\n".gsub(/.*/,'y')
=> "yy\ny"
(I expected "y\n")
--
Wybo
Why do I get "xx" instead of "x" in the following:
$ irb
>> 'test'.gsub(/.*/,'x')
=> "xx"
.* matches NO and ALL characters, so gsub() substitutes both
'' (empty) and 'test' with 'x', so you get 'xx'
and even more confusing (to me):
>> "x\n".gsub(/.*/,'y')
=> "yy\ny"
Same goes here as above. If you want to replace each character use
'test'.gsub(/./,'x') #=> 'xxxx'
or if you want to replace all characters in each line, use
"test\ntest".gsub(/.+/,'x') #=> "x\nx"
On Wed, Apr 2, 2008 at 10:12 PM, Wybo Dekker <wybo@servalys.nl> wrote:
> Why do I get "xx" instead of "x" in the following:
> $ irb
> >> 'test'.gsub(/.*/,'x')
> => "xx"
> .* matches NO and ALL characters, so gsub() substitutes
> ''(empty)(=>'x') and 'test'(=>'x') with x, so you get 'xx'
That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x' more
than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me that the .*
should match [empty string]test[empty string] just once.
> and even more confusing (to me):
> >> "x\n".gsub(/.*/,'y')
> => "yy\ny"
This makes sense because . doesn't normally match \n, so there's the
replacement before and after. Still, the double replacement when there
are actual characters is just weird.
--
-yossef
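
To make the newline case concrete, here is a minimal sketch that records each match and its starting offset. The variable names (pieces, result) are only illustrative and do not come from the thread.

```ruby
# Record every match gsub finds in "x\n", together with its start offset.
pieces = []
result = "x\n".gsub(/.*/) { |m| pieces << [m, $~.begin(0)]; 'y' }

p pieces  # [["x", 0], ["", 1], ["", 2]]
p result  # "yy\ny"

# For comparison: .+ cannot match the empty string, and /m lets . match "\n".
p "x\n".gsub(/.+/, 'y')   # "y\n"
p "x\n".gsub(/.*/m, 'y')  # "yy"
```

The two empty matches sit just before the newline (offset 1) and at the end of the string (offset 2), which is where the extra replacements come from.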
Yeah, it is confusing me, but I agreed on that explanation with
myself, when I read it once here. I'd also expect 'x' instead of 'xx'
can't explain it either, i'm afraid. but you can see what it does
like so:
'test'.gsub(/.*/) { |m| p m; 'x'}
"test"
""
=>"xx"
as soon as you anchor the regexp at the beginning of the string it
gives the expected result:
'test'.gsub(/\A.*/) { |m| p m; 'x'}
"test"
=>"x"
or just do:
'test'.sub(/.*/) { |m| p m; 'x'}
"test"
=>"x"
cheers
jens
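
The same block trick can also report where each match starts, which shows that the second match is the empty string at the very end. This is only a sketch; match_offsets is a hypothetical helper name, not part of Ruby.

```ruby
# Collect [matched_text, start_offset] for every match gsub would replace.
# match_offsets is a made-up helper name used only for this illustration.
def match_offsets(str, re)
  found = []
  str.gsub(re) { found << [$~[0], $~.begin(0)]; '' }
  found
end

p match_offsets('test', /.*/)    # [["test", 0], ["", 4]]
p match_offsets('test', /\A.*/)  # [["test", 0]]
p match_offsets('test', /.+/)    # [["test", 0]]
```

With /\A.*/ the second attempt starts at offset 4, where \A cannot match, and with /.+/ an empty match is not allowed, so both yield a single replacement.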
That seems like a bug to me. The entire string is matched/consumed
by .*, so why try matching again? Or, if you are going to continue,
why stop with just one additional match? Is there code in gsub to
"only match one time after the string is consumed" ?
irb(main):001:0> 'test' =~ /(.*)(.*)(.*)/
=> 0
irb(main):002:0> $1
=> "test"
irb(main):003:0> $2
=> ""
irb(main):004:0> $3
=> ""
Jens Wille wrote:
> as soon as you anchor the regexp at the beginning of the string it
> gives the expected result:
>
> 'test'.gsub(/\A.*/) { |m| p m; 'x'}
> "test"
> =>"x"
>
> or just do:
>
> 'test'.sub(/.*/) { |m| p m; 'x'}
> "test"
> =>"x"
sure, that works, and so does 'test'.gsub(/.+/,'x').
The point is that I don't understand why 'test'.gsub(/.*/,'x') gives me 'xx', since .* means zero or more of any character except the newline character, i.e. all of the string should be replaced with a single x, as far as I can see.
--
Wybo
Seems wrong to me as well. If you do a destructive gsub and test for
individual letters, e.g. /s.*/,'x', you get 'tex' as you'd expect. Seems
wrong to get the double 'x' when you use your example. Of course my
background is Perl and I believe that's how it would work there.
irb(main):016:0> 'test'.gsub!(/.*/, 'x')
=> "xx"
irb(main):017:0> 'test'.gsub!(/e.*/, 'x')
=> "tx"
irb(main):018:0> 'test'.gsub!(/s.*/, 'x')
=> "tex"
irb(main):019:0> 'test'.gsub!(/t.*/, 'x')
=> "x"
irb(main):020:0> 'test'.gsub!(/st.*/, 'x')
=> "tex"
Ken
# sure, that works, and so does test.gsub(/.+/,'x').
# The point is that I don't understand why test.gsub(/.*/,'x') gives me
# 'xx', since .* means: zero or more of any character, except
# the newline
# character, i.e.: all of the string should be replaced with a
# single x, as far as I can see.
you can start (slowly) by comparing these two examples,
irb(main):077:0> ''.gsub(/.*/, 'x')
=> "x"
irb(main):078:0> ''.gsub(/.+/, 'x')
=> ""
kind regards -botp
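
Extending that comparison a little, the sketch below counts how many matches each pattern finds in the empty string and in 'test'. The counts hash is just an illustrative name.

```ruby
# Count the matches that /.*/ and /.+/ find in an empty and a non-empty string.
counts = {}
['', 'test'].each do |s|
  counts[s] = { star: s.scan(/.*/).size, plus: s.scan(/.+/).size }
  puts "#{s.inspect}: .* matches #{counts[s][:star]} time(s), " \
       ".+ matches #{counts[s][:plus]} time(s)"
end
# "": .* matches 1 time(s), .+ matches 0 time(s)
# "test": .* matches 2 time(s), .+ matches 1 time(s)
```

In both strings .* finds exactly one more match than .+, and that extra match is the empty string at the end.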
Januski, Ken [2008-04-03 00:08]:
Of course my background is Perl and I believe that's how it would
work there.
no, works the same way there:
> perl -e '$s = "test"; $s =~ s/.*/x/g; print "$s\n"'
xx
(only a lot more complicated)
btw: python, php and javascript, too.
oh, and here's what oniguruma does:
> Oniguruma::ORegexp.new('.*').gsub('test', 'x')
=>"xx"
cheers
jens
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The point is that I don't understand why test.gsub(/.*/,'x') gives me
'xx', since .* means: zero or more of any character, except the newline
character, ...
-
Wybo
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I would venture to say this is exactly what it does. It finds two
matches and replaces them both with 'x'. The first match is an empty
string <zero>, while the second match is the full string <or more >.
Alex
Perl, PHP:
perl -le '$str="test"; $str =~ s/.*?/x/g; print $str;'
xxxxxxxxx
preg_replace('/.*?/', 'x', 'test');
xxxxxxxxx
Ruby:
print 'test'.gsub(/.*?/, 'x')
xtxexsxtx
Zaki
--
Posted via http://www.ruby-forum.com/.
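
For the Ruby side of that comparison, a small sketch listing what the lazy pattern actually matches. The names lazy and result are illustrative only.

```ruby
# Record each match of the lazy pattern /.*?/ and where it starts.
lazy = []
result = 'test'.gsub(/.*?/) { |m| lazy << [m, $~.begin(0)]; 'x' }

p lazy    # [["", 0], ["", 1], ["", 2], ["", 3], ["", 4]]
p result  # "xtxexsxtx"
```

Every match is empty, so Ruby inserts an 'x' at each of the five positions and copies the original characters through unchanged. Perl and PHP, judging by the output quoted above, additionally retry at the same position for a non-empty match, which would explain why the letters get replaced there as well.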
Right you are. For all the years I've used Perl, and for all that I thought I knew about regexes, I never would have thought I would get that result.
I would have expected one greedy match for the entire text. Instead I guess it's first getting the zero match and then the full match.
> I would have expected one greedy match for the entire text.
> Instead I guess it's first getting the zero match and then
> the full match.
Actually, it's vice versa. It matches the whole string (greedy), then
matches the end of string. The "test" string is seen by the regex engine
as:
test<end of string>
.* first matches "test". <end of string> is a special 'character' that
is not consumed by ".", so the remaining string is then "<end of
string>". This is also matched, as it contains zero or more characters
(but is not then matched infinitely, as the position in the string has
not advanced).
Dan.
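
The scanning behaviour described above can be sketched as a hand-rolled loop. This is a simplified model for illustration, not Ruby's actual gsub implementation; simple_gsub is a hypothetical name, and the rule "after an empty match, copy one character and step forward" is an assumption chosen to reproduce the results seen in this thread.

```ruby
# Simplified model of a global substitution loop (not Ruby's real gsub).
def simple_gsub(str, re, replacement)
  out = String.new
  pos = 0
  while pos <= str.length
    m = re.match(str, pos)          # search again from the current position
    break unless m
    out << str[pos...m.begin(0)] << replacement
    if m.end(0) == m.begin(0)
      # Empty match: copy one character through and step past it,
      # so the loop cannot match the same empty spot forever.
      out << str[m.end(0), 1].to_s
      pos = m.end(0) + 1
    else
      pos = m.end(0)
    end
  end
  out << str[pos..-1] if pos <= str.length  # unmatched tail, if any
  out
end

p simple_gsub('test', /.*/, 'x')    # "xx"
p simple_gsub("x\n", /.*/, 'y')     # "yy\ny"
p simple_gsub('test', /\A.*/, 'x')  # "x"
```

After "test" is consumed, the position is 4, which is still a valid place to try a match, and .* succeeds there with the empty string. That single extra attempt at the end of the string accounts for the second 'x'.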