Bug in sub?

Hi,

I assume it’s been a long day and I’m missing something but line 3 below
sure looks like a bug to me:

$ irb
irb(main):001:0> "a".sub("a", "+")
=> "+"
irb(main):002:0> "a".sub("a", "\\")
=> "\\"
irb(main):003:0> "a".sub("a", "\\+")
=> ""
irb(main):004:0> exit

robert_feldt@it002473 /tmp/ruby
$ ruby -v
ruby 1.9.0 (2004-04-04) [i386-cygwin]

The same behavior is seen in 1.8.1 i386-cygwin and 1.8.1 i386-mswin32
and for gsub so it’s been in there for some time.

What am I missing?

Regards,

Robert Feldt

irb(main):003:0> "a".sub("a", "\\+")
=> ""

svg% ruby -e 'p "abcdef".sub(/(.)./, "\\+")'
"acdef"
svg%

svg% ruby -e 'p "abcdef".sub(/(.).(..)/, "\\+")'
"cdef"
svg%

Guy Decoux

ts wrote:

"R" == Robert Feldt <feldt@ce.chalmers.se> writes:

irb(main):003:0> "a".sub("a", "\\+")
=> ""

svg% ruby -e 'p "abcdef".sub(/(.)./, "\\+")'
"acdef"
svg%

svg% ruby -e 'p "abcdef".sub(/(.).(..)/, "\\+")'
"cdef"
svg%

Yeah, that’s right the “last matched group”; I don’t use that one much
apparently…

Thanks Guy!

/Robert

Sorry, call me thick but I don’t understand that :slight_smile:

Did Guy confirm that it was a bug, or show that it is not a bug?

Thanks
Martin

On Tuesday 06 Apr 2004 3:01 pm, Robert Feldt wrote:

Yeah, that’s right the “last matched group”; I don’t use that one much
apparently…

Perhaps I’m missing the point, but what are the semantics of consecutive
backslashes in a replacement string?
It feels like a bug, but probably isn’t:
irb(main):008:0> puts "abcd".sub('abcd',"\\").length
1
irb(main):014:0> puts "abcd".sub('abcd',"\\"*2).length
1
irb(main):009:0> puts "abcd".sub('abcd',"\\"*10).length
5

Thx a lot,
benedikt

On Tue, 06 Apr 2004 23:01:55 +0900, Robert Feldt wrote:

ts wrote:

svg% ruby -e 'p "abcdef".sub(/(.).(..)/, "\\+")'
"cdef"
svg%

Yeah, that’s right the “last matched group”; I don’t use that one much
apparently…

On Tue, 6 Apr 2004, Martin Hart wrote:

On Tuesday 06 Apr 2004 3:01 pm, Robert Feldt wrote:

Yeah, that’s right the “last matched group”; I don’t use that one much
apparently…

Sorry, call me thick but I don’t understand that :slight_smile:

Did Guy confirm that it was a bug, or show that it is not a bug?

not_a_bug # that’s python for not a bug

svg% ruby -e 'p "abcdef".sub(/(.)./, "\\+")'
"acdef"
svg%

/(.)./ and '\+'

- match a char, followed by another char. remember the first char
- replace the entire match with the remembered first char

abcdef -> acdef
--
-         -

svg% ruby -e 'p "abcdef".sub(/(.).(..)/, "\\+")'
"cdef"
svg%

/(.).(..)/ and '\+'

- match a char, followed by another char. followed by two more chars.
  remember the last two chars (and the first) 
- replace the entire match with the remembered last two chars  (which is
  the 'last matched group')

abcdef -> cdef
----        
  --      --
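
For anyone who wants to try this, here is a minimal sketch of the same cases in one script. The variable names are made up for illustration, and the MatchData lines are only there to show which group "\+" ends up picking.

```ruby
# "\\+" in a replacement string means "the last matched group".
# In Ruby source a literal backslash has to be written as "\\".

# One group: \+ refers to the same text as \1 here.
one_group = "abcdef".sub(/(.)./, "\\+")        # match "ab", group 1 = "a"

# Two groups: \+ is the highest-numbered group that matched, i.e. \2.
two_groups = "abcdef".sub(/(.).(..)/, "\\+")   # match "abcd", group 2 = "cd"

# No groups at all: \+ expands to nothing, so the match is simply deleted.
no_groups = "a".sub("a", "\\+")

p one_group    # "acdef"
p two_groups   # "cdef"
p no_groups    # ""

# The same groups seen through MatchData.
m = "abcdef".match(/(.).(..)/)
p m[0]         # "abcd"
p m.captures   # ["a", "cd"]
```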

-a


normal strings are escaped thusly:

puts "\\","\\\\","\\\\\\"

\
\\
\\\
=> nil

regexp replacement strings have an extra level of escaping, so that you
can include literal "\1"'s in your substitution. So, they are escaped
thusly:

puts "".sub(//,"\\\\"), "".sub(//,"\\\\\\\\"),
     "".sub(//,"\\\\\\\\\\\\")

\
\\
\\\
=> nil

The somewhat confusing part is that a backslash in a (g)sub replacement
that doesn’t translate to a substitution expression becomes a literal. So,
since there is nothing for the last backslash to escape:

puts "".sub(//,"\\"), "".sub(//,"\\\\\\"), "".sub(//,"\\\\\\\\\\")

\
\\
\\\
=> nil

… another reason why, as someone pointed out in a thread earlier this
month, it’s handy to only use the block form, and avoid the argument
form of (g)sub like the plague :slight_smile:
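
A rough sketch of the two levels side by side, counting characters instead of eyeballing backslashes. The variable names are arbitrary; the lengths in the comments follow from the rules described above.

```ruby
# Level 1: Ruby string literals. "\\" in source is ONE backslash character.
one   = "\\"        # 1 backslash
two   = "\\\\"      # 2 backslashes
three = "\\\\\\"    # 3 backslashes
p [one, two, three].map(&:length)   # [1, 2, 3]

# Level 2: the replacement argument of sub/gsub. A pair of backslashes
# collapses to one; a backslash with nothing to escape stays as it is.
[one, two, three, "\\" * 10].each do |replacement|
  result = "abcd".sub("abcd", replacement)
  puts "#{replacement.length} in -> #{result.length} out"
end
# 1 in -> 1 out
# 2 in -> 1 out
# 3 in -> 2 out
# 10 in -> 5 out

# The block form skips level 2 entirely: the block's value is used verbatim.
p("abcd".sub("abcd") { two }.length)   # 2
```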

cheers,
–Mark

On May 10, 2004, at 11:38 AM, Benedikt Huber wrote:

On Tue, 06 Apr 2004 23:01:55 +0900, Robert Feldt wrote:

ts wrote:

svg% ruby -e 'p "abcdef".sub(/(.).(..)/, "\\+")'
"cdef"
svg%

Yeah, that’s right the “last matched group”; I don’t use that one much
apparently…

Perhaps I’m missing the point, but what are the semantics of consecutive
backslashes in a replacement string?
It feels like a bug, but probably isn’t:
irb(main):008:0> puts "abcd".sub('abcd',"\\").length
1
irb(main):014:0> puts "abcd".sub('abcd',"\\"*2).length
1
irb(main):009:0> puts "abcd".sub('abcd',"\\"*10).length
5

Thx a lot,
benedikt

Ara.T.Howard wrote:

Sorry, call me thick but I don’t understand that :slight_smile:

Did Guy confirm that it was a bug, or show that it is not a bug?

not_a_bug # that’s python for not a bug

svg% ruby -e 'p "abcdef".sub(/(.)./, "\\+")'
"acdef"
svg%

/(.)./ and '\+'

- match a char, followed by another char. remember the first char
- replace the entire match with the remembered first char

abcdef -> acdef
--
-         -

svg% ruby -e 'p "abcdef".sub(/(.).(..)/, "\\+")'
"cdef"
svg%

/(.).(..)/ and '\+'

- match a char, followed by another char. followed by two more chars.
  remember the last two chars (and the first)
- replace the entire match with the remembered last two chars (which is
  the 'last matched group')

abcdef -> cdef
----
  --      --

Yes, and further: If you want to do substitutions and aren’t sure there
will be no backslash sequences in the replacement string you should use
the block form, like so:

$ ruby -e 'p "a".sub("a") {"\\+"}'
"\\+"
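
When the replacement comes from a variable, the difference is easier to see. Below is a small sketch; literal_replacement is a hypothetical helper, not something Ruby provides, that doubles each backslash so the argument form leaves the text alone.

```ruby
# Hypothetical helper, not part of Ruby: doubles every backslash so the
# string survives the argument form of sub/gsub unchanged.
def literal_replacement(text)
  text.gsub("\\") { "\\\\" }
end

replacement = "C:\\1\\+"   # the characters  C:\1\+

# Argument form: \1 and \+ are treated as group references and expanded.
expanded = "path".sub("path", replacement)

# Block form: the block's value is inserted verbatim.
verbatim = "path".sub("path") { replacement }

# Argument form again, but with the backslashes escaped first.
escaped = "path".sub("path", literal_replacement(replacement))

p expanded
p verbatim
p escaped
p verbatim == escaped   # true
```

The block form is the simpler of the two; the helper is only worth it if the argument form has to be kept for some other reason.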

I added a note to my code review checklist to “always” use the block
form… :wink:

Sorry for wasting bandwidth on this; I should have read the docs more
closely.

/Robert

only one problem with that:

mark@imac% cat test.rb
require 'profile'

# Block form: the replacement is the block's return value.
def gsub_block_test(string, pat, str)
  10000.times { string.gsub(pat) { str } }
end

# Argument form: the replacement is passed as a string.
def gsub_arg_test(string, pat, str)
  10000.times { string.gsub(pat, str) }
end

gsub_arg_test("testing this out", /[aeiou]/, ".")
gsub_block_test("testing this out", /[aeiou]/, ".")

mark@imac% ruby test.rb
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 54.97    13.65     13.65    20000     0.68     0.68  String#gsub
 44.95    24.81     11.16        2  5580.00 12405.00  Integer#times
  0.48    24.93      0.12        1   120.00   120.00  Profiler__.start_profile
  0.00    24.93      0.00        2     0.00     0.00  Module#method_added
  0.00    24.93      0.00        1     0.00 16550.00  Object#gsub_block_test
  0.00    24.93      0.00        1     0.00  8260.00  Object#gsub_arg_test
  0.00    24.93      0.00        1     0.00 24830.00  #toplevel

the block version of this same test takes twice as long :slight_smile:
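
A sketch for re-running the comparison without the profile library, which as far as I know no longer ships with recent Rubies. It uses a plain monotonic clock; ITERATIONS and seconds_for are names chosen for this sketch, and the timings will vary by machine and Ruby version, so no figures are claimed here.

```ruby
# Rough timing of the argument form vs. the block form of gsub.
# ITERATIONS and seconds_for are arbitrary names for this sketch.
ITERATIONS = 10_000

# Runs the given block once and returns the elapsed wall-clock seconds.
def seconds_for
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

string = "testing this out"
pat    = /[aeiou]/
str    = "."

arg_time   = seconds_for { ITERATIONS.times { string.gsub(pat, str) } }
block_time = seconds_for { ITERATIONS.times { string.gsub(pat) { str } } }

printf("argument form: %.4fs\n", arg_time)
printf("block form:    %.4fs\n", block_time)
```

Without the profiler's per-call overhead the gap may look quite different from the table above.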

–Mark

On Apr 6, 2004, at 3:29 PM, Robert Feldt wrote:

I added a note to my code review checklist to “always” use the block
form… :wink: