Why is "does" missing from this sub!-stitution?

[~/Desktop] dave$ cat foobar.rb ; foobar.rb
#!/usr/local/bin/ruby

# get a Stanislaw Lem quote containing underscores and backspaces, then
# replace the word following the deleted underscores and backspaces
# with the same word surrounded by emphatic arrow glyphs

`fortune -m Nillity`.each do |x|
  x.sub!( /(_+\010+)([a-zA-Z]+)/, "-->#{$2}<--" )
  print x
end

This produces:

(science)
%
Everyone knows that dragons don’t exist. But while this simplistic
formulation may satisfy the layman, it does not suffice for the
scientific mind. The School of Higher Neantical Nillity is in fact wholly
unconcerned with what --><-- exist. Indeed, the banality of existence has been
so amply demonstrated, there is no need for us to discuss it any further
here. The brilliant Cerebron, attacking the problem analytically,
discovered three distinct kinds of dragon: the mythical, the chimerical,
and the purely hypothetical. They were all, one might say, nonexistent,
but each nonexisted in an entirely different way …
– Stanislaw Lem, “Cyberiad”
%

The word “does” should appear between the --><-- glyphs:

-->does<--

but it doesn’t. What am I overlooking? I’ll admit to being a Perl
convert, so maybe I’m trying to do something non-Ruby?

TIA,
dave

···


[“10110100101101000101000000100010100001100110111010010110001001100000010011000010011101000000010011110010110011100001011010100110001101101001000010010000001001101100011011110110110011100001011010100110001101100000001010110110100001101100011001110100110001101111011010110110010100001100001010100110001001101000011001001110000001000100101010000110000011101001011000100110110011100011010000000100100100101000001010010000000101100010111000101110000011100101110011110100111101000001011011110110101101101010011000001110100001101110011010100110011101001011011010000110110001100111010011000110111101101011011011110100001001101100011011110110110011100001011010100110001101101111010001010000010001001001101010100110001011100000010010000010011101101111011000101110000101101010011001001110000001000100101010101110010001101001111000000100000100101000011011000110110101101010011001001110”].pack(“b*”)

The second argument is evaluated only once (think about it), on method
call, when $2 == nil.

use
  x.sub!( /(_+\010+)([a-zA-Z]+)/ ) { "-->#{$2}<--" }
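A minimal sketch of the difference, using `sub` instead of `sub!` so both results can be kept side by side. The sample line is made up: it assumes, as the script's comments suggest, that the fortune text marks a word with a run of underscores followed by a run of backspaces.

```ruby
# Make sure no earlier match is visible: a failed match sets $~ (and $2) to nil.
"" =~ /x/

# Made-up sample: four underscores, four backspaces ("\b" is 0x08), then the word.
line = "with what ____\b\b\b\bdoes exist."
re   = /(_+\010+)([a-zA-Z]+)/

# String argument: "-->#{$2}<--" is built before sub runs. No match has
# happened yet in this scope, so $2 is nil and interpolates as "".
broken = line.sub(re, "-->#{$2}<--")

# Block argument: sub sets $~ (and so $2) for the match, then calls the
# block, so the interpolation sees the word that was just captured.
fixed = line.sub(re) { "-->#{$2}<--" }

puts broken   # with what --><-- exist.
puts fixed    # with what -->does<-- exist.
```

The order of the two calls matters here: if the string form ran after the block form, `$2` would still hold `"does"` from the previous match and the string form would appear to work.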

···

On Fri, May 23, 2003 at 05:23:47AM +0900, Dave Oshel wrote:

Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

*** PUBLIC flooding detected from erikyyy
THAT’s an erik, pholx… ;-)
– Seen on #LinuxGER

Hi –

The word “does” should appear between the --><-- glyphs:

-->does<--

but it doesn’t. What am I overlooking? I’ll admit to being a Perl
convert, so maybe I’m trying to do something non-Ruby?

You’re just using an uninitialized variable – you can do that in Ruby
as well as other languages :-)

But anyway… to do the backreferencing you want, you have to use a
different notation:

irb(main):001:0> "abcde".sub(/.(.)/, "\\1")
=> "bcde"

or '\1' (depending on which kind of quote marks you need).
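For anyone else tripped up by the quoting: inside double quotes `\1` is an octal character escape, so `sub` never receives a backslash at all, which is why the backslash has to be doubled there. A quick check:

```ruby
# Inside double quotes, "\1" is an octal escape: the single byte 0x01.
p "\1".bytes      # [1]
# Inside single quotes (or written as "\\1") it is two characters, \ and 1,
# which sub later expands to the first capture group.
p '\1'.bytes      # [92, 49]
p "\\1" == '\1'   # true

wrong = "abcde".sub(/.(.)/, "\1")   # "ab" is replaced by the byte 0x01
right = "abcde".sub(/.(.)/, '\1')   # "ab" is replaced by "b"
p right           # "bcde"
```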

David

···

On Fri, 23 May 2003, Dave Oshel wrote:


David Alan Black
home: dblack@superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav


The second argument is evaluated only once (think about it), on method
call, when $2 == nil.

use
  x.sub!( /(_+\010+)([a-zA-Z]+)/ ) { "-->#{$2}<--" }

I’ve thought about it and still don’t get it.

···

Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.

In article 20030522202818.GA24497@student.ei.uni-stuttgart.de,


The second argument is evaluated only once (think about it), on method
call, when $2 == nil.

use
  x.sub!( /(_+\010+)([a-zA-Z]+)/ ) { "-->#{$2}<--" }

Maybe so, but this works as expected:

`/sw/bin/fortune -m Nillity`.each { |x|
  x.sub!( /(_+\010+)([a-zA-Z]+\b)/, "" + '\2' + "" )
  print x
}

Maybe it’s a bug … or a feature … or a bug … or a bug AND a
feature … ;-)

···

Mauricio Fernández batsman.geo@yahoo.com wrote:

On Fri, May 23, 2003 at 05:23:47AM +0900, Dave Oshel wrote:



x.sub!( /(_+\010+)([a-zA-Z]+)/ ) { "-->#{$2}<--" }

I’ve thought about it and still don’t get it.

Never mind, I get it now. I was thinking How Perl Does It.

···


Hi –

In article 20030522202818.GA24497@student.ei.uni-stuttgart.de,

use
  x.sub!( /(_+\010+)([a-zA-Z]+)/ ) { "-->#{$2}<--" }

Maybe so, but this works as expected:

`/sw/bin/fortune -m Nillity`.each { |x|
  x.sub!( /(_+\010+)([a-zA-Z]+\b)/, "" + '\2' + "" )
  print x
}

Maybe it’s a bug … or a feature … or a bug … or a bug AND a
feature … ;-)

Here’s an interesting illustration of the persistence of $1:

str = "Hello there."
=> "Hello there."
str2 = "A bunch of words."
=> "A bunch of words."
if /(Hello) (there)/.match(str); str2.sub!(/(A)/, $1); end
=> "Hello bunch of words."
str2
=> "Hello bunch of words."
$1
=> "A"

So… it all depends on what you want to do. (I haven’t actually thought
of a case where I’d want to do the above, but still :-)

David

···

On Fri, 23 May 2003, Dave Oshel wrote:

Mauricio Fernández batsman.geo@yahoo.com wrote:



It’s definitely a feature and not a bug. $2 behaves like a regular variable,
and as you’ve seen, you have to use normal string interpolation #{…} to
insert it. If $2=="foo" and you write the string "#{$2}" then it is
interpolated as "foo", and then the resulting string is what you
are passing as a parameter to 'sub'. If $2 is nil, then you pass "".
So sub has no idea where to substitute any value.

However, the string '\2' is treated as a special case by sub itself. When it
sees that sequence, instead of inserting \ and 2 into the result, it inserts
the value of $2.
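Dave's working line depends on exactly this. The sketch below uses a variant of it with arrow glyphs as the outer strings, plus a made-up sample line (assuming the underscore/backspace layout from the original script), to show that the concatenation happens before the call but leaves `\2` intact for `sub` to expand:

```ruby
line = "with what ____\b\b\b\bdoes exist."   # made-up sample line
re   = /(_+\010+)([a-zA-Z]+)/

# The + calls run before sub is called, but they only join characters:
# the result still contains a literal backslash followed by "2".
replacement = "-->" + '\2' + "<--"
p replacement          # "-->\\2<--"
p replacement.length   # 8

# sub scans the replacement after matching and swaps \2 for group 2.
p line.sub(re, replacement)   # "with what -->does<-- exist."
```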

It would be very strange if there were strings with some sort of 'deferred
interpolation', i.e. the string contained the sequence '#{expr}' but expr is
re-evaluated every time you read the string. Ruby is flexible enough that
you could implement that if you wanted, though!
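As a minimal illustration of that flexibility (the names `eager` and `deferred` are made up for the example), wrapping the string in a lambda delays the `#{...}` until each call:

```ruby
word = "does"

eager    = "-->#{word}<--"        # interpolated once, on this line
deferred = -> { "-->#{word}<--" } # interpolated every time it is called

word = "might"

p eager           # "-->does<--"
p deferred.call   # "-->might<--"
```

In the thread's problem, the block form of sub plays the role of `deferred`: sub calls the block once per match, after `$2` has been set.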

Regards,

Brian.

···

On Fri, May 23, 2003 at 08:44:16AM +0900, Dave Oshel wrote:


In article 20030523080548.GA4196@uk.tiscali.com,


There’s too much computer science going on here. I don’t actually care
WHY case 2 doesn’t work, as long as I know case 1 DOES work. Thanks for
taking the trouble to suggest causes, though. It does help.

case 1 – the way that worked, after several hours of headscratching

`/sw/bin/fortune -m Nillity`.each { |x|
  x.sub!( /(_+\010+)([a-zA-Z]+\b)/, "" + '\2' + "" )
  print x
}

=begin
(science)
%
Everyone knows that dragons don’t exist. But while this simplistic
formulation may satisfy the layman, it does not suffice for the
scientific mind. The School of Higher Neantical Nillity is in fact wholly
unconcerned with what does exist. Indeed, the banality of existence has been
so amply demonstrated, there is no need for us to discuss it any further
here. The brilliant Cerebron, attacking the problem analytically,
discovered three distinct kinds of dragon: the mythical, the chimerical,
and the purely hypothetical. They were all, one might say, nonexistent,
but each nonexisted in an entirely different way …
– Stanislaw Lem, “Cyberiad”
%
=end

case 2 – the way I assumed it would work

`/sw/bin/fortune -m Nillity`.each { |x|
  x.sub!( /(_+\010+)([a-zA-Z]+\b)/, "#{$2}" )
  print x
}

=begin
(science)
%
Everyone knows that dragons don’t exist. But while this simplistic
formulation may satisfy the layman, it does not suffice for the
scientific mind. The School of Higher Neantical Nillity is in fact wholly
unconcerned with what exist. Indeed, the banality of existence has been
so amply demonstrated, there is no need for us to discuss it any further
here. The brilliant Cerebron, attacking the problem analytically,
discovered three distinct kinds of dragon: the mythical, the chimerical,
and the purely hypothetical. They were all, one might say, nonexistent,
but each nonexisted in an entirely different way …
– Stanislaw Lem, “Cyberiad”
%
=end

···

Brian Candler B.Candler@pobox.com wrote:

On Fri, May 23, 2003 at 08:44:16AM +0900, Dave Oshel wrote:



Hi –


It’s definitely a feature and not a bug. $2 behaves like a regular variable,
and as you’ve seen, you have to use normal string interpolation #{…} to
insert it. If $2=="foo" and you write the string "#{$2}" then it is
interpolated as "foo", and then the resulting string is what you
are passing as a parameter to 'sub'. If $2 is nil, then you pass "".
So sub has no idea where to substitute any value.

It’s interesting, though, that $2 hasn’t been (re)evaluated by the
time the second argument to #sub is evaluated. Compare with:

irb(main):001:0> "abcdef".sub(s="d",s.upcase)
=> "abcDef"

My first thought by way of explaining the $2 behavior was: there’s
been no assignment to $2, because the regex has been compiled but the
match hasn’t happened yet. But then \2 wouldn’t be meaningful… So
I do actually start to wonder why the previous $2 (nil or otherwise)
persists so long.

David

···

On Fri, 23 May 2003, Brian Candler wrote:

On Fri, May 23, 2003 at 08:44:16AM +0900, Dave Oshel wrote:



Well, you did suggest that it was a bug - which is why I tried to explain
why it wasn’t. If you don’t care to understand why, then I’m afraid that’s
your loss :-)

Regards,

Brian.

···

On Sat, May 24, 2003 at 04:06:56AM +0900, Dave Oshel wrote:


dblack@superlink.net wrote:

My first thought by way of explaining the $2 behavior was: there’s
been no assignment to $2, because the regex has been compiled but the
match hasn’t happened yet. But then \2 wouldn’t be meaningful… So
I do actually start to wonder why the previous $2 (nil or otherwise)
persists so long.

\2 is substituted by .sub (ie, it gets passed in as part of the string,
and .sub looks for and substitutes for it). #$2 is substituted during
the initial parse of the string (before .sub is even called)

irb(main):008:0> "abcdef".sub(/(…)/, "#$1")
=> "cdef"
irb(main):009:0> "abcdef".sub(/(…)/, "#$1")
=> "abcdef"
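`"#$1"` is Ruby's shorthand for `"#{$1}"`; the braces can be dropped for global, instance and class variables. The sketch below replays the two calls under one assumption: the pattern is taken to be `/(..)/`, a two-character group, which is consistent with the `"cdef"` result if `$1` starts out nil. A failed match is run first to make sure it does:

```ruby
"" =~ /x/                              # a failed match sets $~ (and $1) to nil

first  = "abcdef".sub(/(..)/, "#$1")   # "#$1" means "#{$1}", here ""
# sub has now set $1 to "ab" in this scope...
second = "abcdef".sub(/(..)/, "#$1")   # ...so this call replaces "ab" with "ab"

p first    # "cdef"
p second   # "abcdef"
```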

Cheers

Dave

It’s interesting, though, that $2 hasn’t been (re)evaluated by the
time the second argument to #sub is evaluated. Compare with:

irb(main):001:0> "abcdef".sub(s="d",s.upcase)
=> "abcDef"

I don’t see any magic there:

  1. The first parameter is evaluated. It evaluates to "d", and has the
    side-effect of setting s.
  2. The second parameter is evaluated. It evaluates to "D".
  3. The method call is made:
    "abcdef".sub("d","D")
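Those three steps can be made visible by logging each argument as it is evaluated and logging again when the call itself starts. A sketch; `log`, `call_sub` and `trace` are illustrative names only:

```ruby
trace = []
# Record a label, then hand the value back unchanged.
log = ->(label, value) { trace << label; value }
# Record that the call has started, then do the real substitution.
call_sub = ->(str, pat, rep) { trace << "sub called"; str.sub(pat, rep) }

result = call_sub.("abcdef", log.("pattern", s = "d"), log.("replacement", s.upcase))

p trace    # ["pattern", "replacement", "sub called"]
p result   # "abcDef"
```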

My first thought by way of explaining the $2 behavior was: there’s
been no assignment to $2, because the regex has been compiled but the
match hasn’t happened yet.

The main reason is expression evaluation sequencing: the arguments to gsub!
are each evaluated individually, and then the results of those evaluations
are passed to gsub!

So "#{$1}" is evaluated, which generates a new string object
containing "", and a reference to that object is passed in to gsub!
as its second parameter, as part of the method call.

But then \2 wouldn’t be meaningful…

\2 is a magic sequence to gsub. What it does is:

  • match on the regular expression
  • for each match, replace the match with the dest string, but first
    replace \1, \2 etc in the dest string with match[1], match[2] etc.
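The two steps above can be sketched like this. It is an illustration, not how Ruby implements gsub: `expand_backrefs` and `my_gsub` are hypothetical names, the loop over matches is delegated to gsub's own block form, and only `\0` to `\9` are expanded (the real methods also understand sequences such as `\&`, `\k<name>` and `\\`).

```ruby
# Hypothetical helper: replace \0..\9 in a template with the corresponding
# group from a MatchData (missing groups become "").
def expand_backrefs(template, match)
  template.gsub(/\\(\d)/) { match[$1.to_i].to_s }
end

# Hypothetical gsub: for each match, substitute the expanded template.
# Inside the block, $~ is the MatchData for the current match.
def my_gsub(str, regexp, template)
  str.gsub(regexp) { expand_backrefs(template, $~) }
end

p my_gsub("abcde", /(.)(.)/, '\2\1')   # "badce"
p "abcde".gsub(/(.)(.)/, '\2\1')       # "badce"
```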

gsub could have been written to interpret a literal "$1" or "$2" in the same
way, but then if you did actually want to insert a "$" there would be more
escaping rules to be learned.

Regards,

Brian.

···

On Sat, May 24, 2003 at 07:31:51AM +0900, dblack@superlink.net wrote:

Hi –

My first thought by way of explaining the $2 behavior was: there’s
been no assignment to $2, because the regex has been compiled but the
match hasn’t happened yet. But then \2 wouldn’t be meaningful… So
I do actually start to wonder why the previous $2 (nil or otherwise)
persists so long.

\2 is substituted by .sub (ie, it gets passed in as part of the string,
and .sub looks for and substitutes for it). #$2 is substituted during
the initial parse of the string (before .sub is even called)

irb(main):008:0> "abcdef".sub(/(…)/, "#$1")
=> "cdef"
irb(main):009:0> "abcdef".sub(/(…)/, "#$1")
=> "abcdef"

I’m with you on the ‘what’, but I’m still curious about the ‘why’.
It’s true that the replacement string gets parsed before sub is
called. But I think “abcdef” gets matched against the regex even
before that parse happens. Which means… if the $1,$2 variables
reflect the results of the most recent match, then one could argue
that by the time the replacement string is parsed, a match has just
taken place and $1,$2 should reflect the results of that match.

So the order of things seems a little odd:

/(…)/.match("abcdef")        # match operation, $1 set
"abcdef".sub(/(…)/, "#$1")   # match operation, $1 not set,
                             # then string parsed,
                             # then $1 is set

David

···

On Sat, 24 May 2003, Dave Thomas wrote:

dblack@superlink.net wrote:



dblack@superlink.net wrote:

I’m with you on the ‘what’, but I’m still curious about the ‘why’.
It’s true that the replacement string gets parsed before sub is
called. But I think “abcdef” gets matched against the regex even
before that parse happens.

But it doesn’t. Think about it this way

def fred(a, b)
  $x = 1
  puts a
  puts b
end

$x = 99
fred(11, $x)
fred(11, $x)

#=>
11
99
11
1

The parameters to a method are evaluated at the point of call: the
method can’t affect the value of its parameters.

Which means… if the $1,$2 variables
reflect the results of the most recent match, then one could argue
that by the time the replacement string is parsed, a match has just
taken place and $1,$2 should reflect the results of that match.

.sub is just a method, so it can’t affect the value of its second parameter.

Cheers

Dave

In the second case, /(…)/ simply creates a Regexp object, like
Regexp.new("(…)"). It does not match it against anything.

This regexp is passed as a parameter to sub; it is inside sub that the
regexp is actually used.
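A small check of that point, with `/(..)/` picked purely for illustration: `$~` is still nil after the regexp literal is evaluated, and only gets set once `sub` has run.

```ruby
"" =~ /x/                 # a failed match leaves $~ nil in this scope
re = /(..)/               # builds a Regexp object; nothing is matched yet
before_sub = $~

p re.class                # Regexp
p re.source               # "(..)"
p before_sub              # nil

result = "abcdef".sub(re, "X")   # the actual matching happens inside sub...
after_sub = $1                   # ...and sub sets $~ in the caller's scope
p result                  # "Xcdef"
p after_sub               # "ab"
```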

You could write it like this:

class String
  def mysub(regexp, newstr)
    if regexp =~ self
      # $` and $' hold the text before and after the match
      return $` + newstr + $'
    end
    self
  end
end
"abcdefabcdef".mysub(/de/,'zz')   #>> "abczzfabcdef"

(except to get the full functionality, after doing the match you'd replace
\1 in newstr with the contents of $1, \2 with $2, etc.)
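A sketch of that fuller version, keeping Brian's method name: after the match, each `\0` to `\9` in `newstr` is replaced with the matching group. Like the built-in, it returns a new string. Other escapes that `sub` supports (such as `\&` or `\\`) are deliberately left out:

```ruby
class String
  # Sketch only: mysub with backreference expansion added.
  def mysub(regexp, newstr)
    m = regexp.match(self)
    return dup unless m                              # no match: plain copy
    # Swap each \N in newstr for capture group N ("" if the group is absent).
    expanded = newstr.gsub(/\\(\d)/) { m[$1.to_i].to_s }
    m.pre_match + expanded + m.post_match
  end
end

p "abcdefabcdef".mysub(/de/, 'zz')          # "abczzfabcdef"
p "abcdefabcdef".mysub(/(d)(e)/, '<\2\1>')  # "abc<ed>fabcdef"
```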

Regards,

Brian.

···

On Sat, May 24, 2003 at 10:23:22AM +0900, dblack@superlink.net wrote:


Hi –

···

On Sat, 24 May 2003, Dave Thomas wrote:

The parameters to a method are evaluated at the point of call: the
method can’t affect the value of its parameters.

I’m hip to this part (see ruby-talk:70238 for proof :-) It’s the \1
thing that I’ve managed to garble. I’ve been thinking: if $1 hasn’t
been reassigned yet, then how can \1 mean anything? But the answer, I
now see, is: it means \1, which gets passed literally to the method,
which then knows what to do with it.

David



In article
Pine.LNX.4.44.0305240751280.28889-100000@candle.superlink.net,

···

dblack@superlink.net wrote:

Hi –

I’m hip to this part (see ruby-talk:70238 for proof :-) It’s the \1
thing that I’ve managed to garble. I’ve been thinking: if $1 hasn’t
been reassigned yet, then how can \1 mean anything? But the answer, I
now see, is: it means \1, which gets passed literally to the method,
which then knows what to do with it.

Ok, I sort of understand that. But the corresponding Perl syntax seems
more obvious, to my aching head.

Is there a Ruby gotchas page, somewhere on the web?



Hi –

In article
Pine.LNX.4.44.0305240751280.28889-100000@candle.superlink.net,

Ok, I sort of understand that. But the corresponding Perl syntax seems
more obvious, to my aching head.

It’s only distantly corresponding, though; it’s not really the same
syntax, just coincidence that they both happen to be dealing with
substitution.

Is there a Ruby gotchas page, somewhere on the web?

(I wouldn’t put this example on it, but then again that wasn’t your
question :-) There are some “Ruby-for-Perl-programmers” resources,
the most in-depth one I know of being Hal Fulton’s:
http://hypermetrics.com/rubyhacker/rubyperl/slide1.html. If you
search for “ruby for perl programmers” on Google you’ll find more.

David

···

On Mon, 26 May 2003, Dave Oshel wrote:

dblack@superlink.net wrote:

