String#split resets regex capture variables (Ruby 1.8.7)

Hi,

I've just spent some time on this problem until I found the solution,
and I wanted to share it/ask for opinions on this:

I've got a string on which I try to match a regexp. If it matches, I
split the string with one of the captured groups, zip the resulting
array with an array of symbols and merge with a last symbol/string array
pair representing the captured separator.

Here's some code to give you an idea:

n = "a:c"
if n =~ /([a-z])([.:])([a-z])/
  p [:first, :second].zip(n.split($2)) | [[:separator, $2]]
end

(see pastie here: http://pastie.org/2445983)

The problem is, the resulting array will contain "nil" for :separator.
Apparently, String#split will reset all regexp-related variables... I
guess it uses regexp internally, but I haven't seen anything documenting
this behavior, and I honestly believed this code would work...

So what's your take on this? Does it feel natural/logical to you?
Should I file a bug to enhance Ruby's documentation?

···

--
Posted via http://www.ruby-forum.com/.

It worked for me; you are probably using an older version of Ruby.

RUBY_VERSION # => "1.9.2"

n = "a:c"
if n =~ /([a-z])([.:])([a-z])/
  p [:first, :second].zip(n.split($2)) | [[:separator, $2]]
end
# >> [[:first, "a"], [:second, "c"], [:separator, ":"]]

Anyway, why not just do [[:first, $1], [:second, $3], [:separator, $2]]?
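
As a rough sketch of that suggestion (the method name pairs_from_captures is made up for illustration), the pairs can be built straight from the captures, so nothing runs between the match and the reads of $1, $2 and $3:

```ruby
# Hypothetical helper, not from the original code: builds the pairs
# directly from the capture groups instead of calling String#split.
def pairs_from_captures(n)
  if n =~ /([a-z])([.:])([a-z])/
    # $1, $2 and $3 are read immediately after the match, before any
    # other method call has a chance to touch them.
    [[:first, $1], [:second, $3], [:separator, $2]]
  end
end

p pairs_from_captures("a:c")
# prints [[:first, "a"], [:second, "c"], [:separator, ":"]]
```

This only applies when the pattern itself captures both sides of the separator, as it does in the example above.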

···

On Sun, Aug 28, 2011 at 7:54 PM, Olivier Lance <bestiol@gmail.com> wrote:


Olivier Lance wrote in post #1018967:

Hi,

I've just spent some time on this problem until I found the solution,
and I wanted to share it/ask for opinions on this:

I've got a string on which I try to match a regexp. If it matches, I
split the string with one of the captured groups, zip the resulting
array with an array of symbols and merge with a last symbol/string array
pair representing the captured separator.

Here's some code to give you an idea:

n = "a:c"
if n =~ /([a-z])([.:])([a-z])/
  p [:first, :second].zip(n.split($2)) | [[:separator, $2]]
end

(see pastie here: http://pastie.org/2445983)

Do you actually consider that a lucid example of the problem? Bejeesus.

The problem is, the resulting array will contain "nil" for :separator.
Apparently, String#split will reset all regexp-related variables... I
guess it uses regexp internally, but I haven't seen anything documenting
this behavior, and I honestly believed this code would work...

So what's your take on this? Does it feel natural/logical to you?
Should I file a bug to enhance Ruby's documentation?

$ cat ruby.rb
str = 'abcd'

if str =~ /(c)(d)/
  p [$1, $2]
  str.split('b')
  p [$1, $2]
end

$ multiruby ruby.rb
/Users/me/.rvm/gems/ruby-1.9.2-p180@rails3tutorial/gems/ZenTest-4.5.0/lib/multiruby.rb:330:
warning: shadowing outer local variable - s
/Users/me/.rvm/gems/ruby-1.9.2-p180@rails3tutorial/gems/ZenTest-4.5.0/lib/multiruby.rb:391:
warning: shadowing outer local variable - s

VERSION = 1.8.6-p420
CMD = ~/.multiruby/install/1.8.6-p420/bin/ruby ruby.rb

["c", "d"]
[nil, nil]

RESULT = pid 3744 exit 0

VERSION = 1.8.7-p352
CMD = ~/.multiruby/install/1.8.7-p352/bin/ruby ruby.rb

["c", "d"]
[nil, nil]

RESULT = pid 3745 exit 0

VERSION = 1.9.1-p431
CMD = ~/.multiruby/install/1.9.1-p431/bin/ruby ruby.rb

["c", "d"]
["c", "d"]

RESULT = pid 3746 exit 0

VERSION = 1.9.2-p290
CMD = ~/.multiruby/install/1.9.2-p290/bin/ruby ruby.rb

["c", "d"]
["c", "d"]

RESULT = pid 3747 exit 0

TOTAL RESULT = 0 failures out of 4

Passed: 1.8.6-p420, 1.8.7-p352, 1.9.1-p431, 1.9.2-p290
Failed:
$

···


Hi all,

thanks for all your replies and tests, I wasn't expecting so many
answers :)

I'm sorry, my example was indeed a bit complicated to illustrate the
issue. The night got my brain, I should have slept ^^

Anyway, thanks for highlighting the different behaviors of those
variables...
I'm actually working on code I didn't write, so I left it as
untouched as possible (as it was working) and patched in what I
needed.
I'll take a closer look, refactor it and use smarter/safer things.

As I've said in my topic's subject, I'm using Ruby 1.8.7, so I think I
cannot use named groups.
What would you suggest instead of using $1 etc.?

Should I store $~ and then use captures[1]...?

Thanks for your help
Olivier
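
For what it's worth, here is a minimal sketch of the "store $~" idea. The method name split_with_separator is an assumption made for illustration; the point is only that the MatchData is copied into a local variable before String#split is called, so it no longer matters what split does to $~ afterwards:

```ruby
# Hypothetical helper, for illustration only.
def split_with_separator(str)
  return nil unless str =~ /([a-z])([.:])([a-z])/
  match = $~               # keep the MatchData in a local variable
  sep = match.captures[1]  # captures is zero-based: index 1 is the second group
  [:first, :second].zip(str.split(sep)) | [[:separator, sep]]
end

p split_with_separator("a:c")
# prints [[:first, "a"], [:second, "c"], [:separator, ":"]]
```

No named groups are involved, so this should not depend on 1.9-only syntax.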

···


I think it makes sense for $1, $2 etc. to change any time a regex match
is performed anywhere in Ruby; it might even be kind of useful in some
perverse scenarios. That said, you shouldn't use these if you can avoid
it: they're a Perl relic (but I admit, sometimes it's the easiest way).

There are two solutions: either bind their values to other variables:

n = "a:c"
if n =~ /([a-z])([.:])([a-z])/
  sep = $2
  p [:first, :second].zip(n.split(sep)) | [[:separator, sep]]
end

or use #match like a man:

n = "a:c"
if mtc = n.match(/([a-z])([.:])([a-z])/)
  p [:first, :second].zip(n.split(mtc[2])) | [[:separator, mtc[2]]]
end

-- Matma Rex

···

2011/8/29 7stud -- <bbxx789_05ss@yahoo.com>:


thanks for all your replies and tests, I wasn't expecting so many
answers :)

I am having the opposite experience over at c.l.scheme - zero replies.
So you're lucky to be on this side of the fence.

I'm sorry, my example was indeed a bit complicated to illustrate the
issue. The night got my brain, I should have slept ^^

:)

Anyway, thanks for highlighting the different behaviors of those
variables...
I'm actually working on a code I haven't written, so I left it as
untouched as possible (as it was working) and patched with what I
needed.
I'll have a closer look to refactor it and use smarter/safer things.

As I've said in my topic's subject, I'm using Ruby 1.8.7, so I think I
cannot use named groups.
What would you suggest instead of using $1 etc.?

Should I store $~ and then use captures[1]...?

Depends on the number of groups. If I want to be sure values are not
changed I typically do something like

if /../ =~ str
  name = $1
  age = $2.to_i
...
  # use name and age
end

Or, if there are many groups you could do

if /../ =~ str
  all, name, age, more, variables, here = *$~
...
  # use name, age, more, variables, here
end

Kind regards

robert
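
A concrete version of the second pattern might look like the sketch below. The input format ("name=... age=...") and the method name parse_person are assumptions made purely for illustration; the placeholder regex /../ above is left as it was:

```ruby
# Hypothetical example: the input format and the method name are
# assumptions made for illustration only.
def parse_person(line)
  return nil unless /name=(\w+) age=(\d+)/ =~ line
  # Splatting $~ calls MatchData#to_a: [whole_match, group1, group2, ...]
  _all, name, age = *$~
  [name, age.to_i]
end

p parse_person("name=Ada age=36")
# prints ["Ada", 36]
```

Once the values sit in locals, later regex matches cannot change them.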

···

On Mon, Aug 29, 2011 at 4:30 PM, Olivier Lance <bestiol@gmail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I think it makes sense for $1, $2 etc to change any time a regex match
is performed anywhere in Ruby, it might be even kind of useful in some
perverse scenarios. That said, you shouldn't use these if you can,
it's a Perl relic (but I admit, sometimes it's the easiest way).

I don't. It means that if you call some method, it can implicitly and
unintentionally change variables you've set, and the only way to know is
to go digging through the source code. You could experiment with it
empirically, but if the regex is evaluated conditionally, you won't know
it was used unless the conditions are right, meaning it might not pop up
until you are in production.

There are two solutions: either bind their values to other variables:

n = "a:c"
if n =~ /([a-z])([.:])([a-z])/
  sep = $2
  p [:first, :second].zip(n.split(sep)) | [[:separator, sep]]
end

or use #match like a man:

n = "a:c"
if mtc = n.match(/([a-z])([.:])([a-z])/)
  p [:first, :second].zip(n.split(mtc[2])) | [[:separator, mtc[2]]]
end

Another option, for more recent rubies, is to use named capture groups

if "a:c" =~ /(?<first>[a-z])(?<separator>[.:])(?<second>[a-z])/
  p $~.names.map(&:intern).zip($~.captures)
end

# >> [[:first, "a"], [:separator, ":"], [:second, "c"]]
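
A variation on the same idea, as a sketch only: with String#match the named groups can be read from the returned MatchData by name, without touching $~ at all. This needs Ruby 1.9 or later, and the method name parse_pair is hypothetical:

```ruby
# Requires named groups (Ruby 1.9+). Hypothetical helper for illustration.
def parse_pair(str)
  m = str.match(/(?<first>[a-z])(?<separator>[.:])(?<second>[a-z])/)
  return nil unless m
  sep = m[:separator]  # MatchData#[] accepts a group name as a Symbol
  [:first, :second].zip(str.split(sep)) | [[:separator, sep]]
end

p parse_pair("a.c")
# prints [[:first, "a"], [:second, "c"], [:separator, "."]]
```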

···

2011/8/29 Bartosz Dziewoński <matma.rex@gmail.com>

I am having the opposite experience over at c.l.scheme - zero replies.
So you're lucky to be on this side of the fence.

Well good luck then ^^

Depends on the number of groups. If I want to be sure values are not
changed I typically do something like

if /../ =~ str
  name = $1
  age = $2.to_i
...
  # use name and age
end

Or, if there are many groups you could do

if /../ =~ str
  all, name, age, more, variables, here = *$~
...
  # use name, age, more, variables, here
end

All right, that's more or less what I've done, I'll improve on that :)

Thanks again to everybody!

···


And actually the situation is much more friendly: $1 etc. are local to
stack frame...

10:57:55 ~$ cat x.rb
def x
  /(a+)/ =~ "aabb" and $1
end

str = 'abcd'

if str =~ /(c)(d)/
  p [$1, $2]
  p x
  p [$1, $2]
end
10:58:02 ~$ allruby x.rb
CYGWIN_NT-5.1 padrklemme2 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin

···

On Mon, Aug 29, 2011 at 10:49 AM, Josh Cheek <josh.cheek@gmail.com> wrote:

2011/8/29 Bartosz Dziewoński <matma.rex@gmail.com>


========================================
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
["c", "d"]
"aa"
["c", "d"]

ruby 1.9.2p290 (2011-07-09 revision 32553) [i386-cygwin]
["c", "d"]
"aa"
["c", "d"]

jruby 1.6.3 (ruby-1.8.7-p330) (2011-07-07 965162f) (Java HotSpot(TM)
Client VM 1.6.0_26) [Windows XP-x86-java]
["c", "d"]
"aa"
["c", "d"]

jruby 1.6.3 (ruby-1.9.2-p136) (2011-07-07 965162f) (Java HotSpot(TM)
Client VM 1.6.0_26) [Windows XP-x86-java]
["c", "d"]
"aa"
["c", "d"]

10:58:11 ~$

... and local to thread:

10:59:54 ~$ cat x.rb
def x
  /(a+)/ =~ "aabb" and $1
end

str = 'abcd'

if str =~ /(c)(d)/
  p [$1, $2]
  Thread.new { /(c+)/ =~ "ccccc" and p $1 }.join
  p [$1, $2]
end
10:59:56 ~$ allruby x.rb
CYGWIN_NT-5.1 padrklemme2 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin

ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
["c", "d"]
"ccccc"
["c", "d"]

ruby 1.9.2p290 (2011-07-09 revision 32553) [i386-cygwin]
["c", "d"]
"ccccc"
["c", "d"]

jruby 1.6.3 (ruby-1.8.7-p330) (2011-07-07 965162f) (Java HotSpot(TM)
Client VM 1.6.0_26) [Windows XP-x86-java]
["c", "d"]
"ccccc"
["c", "d"]

jruby 1.6.3 (ruby-1.9.2-p136) (2011-07-07 965162f) (Java HotSpot(TM)
Client VM 1.6.0_26) [Windows XP-x86-java]
["c", "d"]
"ccccc"
["c", "d"]

11:00:04 ~$

Kind regards

robert
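
One related caveat, sketched under the assumption of Ruby 1.9 or later (I have not checked it on 1.8): a plain block is not a new method frame, so a match inside a block run in the same thread does overwrite the enclosing method's $1, whereas the Thread.new case above leaves it alone. Both method names below are made up for illustration:

```ruby
# Hypothetical methods contrasting a same-thread block with a new thread.
def block_shares_captures
  "abcd" =~ /(c)(d)/
  before = $1
  # Same thread, same method frame: the block's match replaces $1 here.
  ["zz"].each { |s| s =~ /(z)/ }
  [before, $1]
end

def thread_keeps_captures
  "abcd" =~ /(c)(d)/
  # The match runs in another thread, so this method's $1 is untouched.
  Thread.new { "zz" =~ /(z)/ }.join
  $1
end

p block_shares_captures  # prints ["c", "z"]
p thread_keeps_captures  # prints "c"
```

So copying the captures into locals right after the match remains the safer habit even though method calls do not leak them.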


I think it makes sense for $1, $2 etc to change any time a regex match
is performed anywhere in Ruby, it might be even kind of useful in some
perverse scenarios. That said, you shouldn't use these if you can,
it's a Perl relic (but I admit, sometimes it's the easiest way).

I don't. It means that if you call some method, it can implicitly and
unintentionally change variables you've set, and the only way to know is
to go digging through the source code. You could experiment with it
empirically, but if the regex is evaluated conditionally, you won't know
it was used unless the conditions are right, meaning it might not pop up
until you are in production.

That's precisely why you shouldn't use them in anything except for
scripts written in 20 minutes.

···

2011/8/29 Josh Cheek <josh.cheek@gmail.com>:

2011/8/29 Bartosz Dziewoński <matma.rex@gmail.com>

2011/8/29 Robert Klemme <shortcutter@googlemail.com>:

And actually the situation is much more friendly: $1 etc. are local to
stack frame...

... and local to thread:

Fun, fun, fun. Semi-local global variables. I had no idea it's that crazy.

-- Matma Rex