String gsub to replace named capture groups

I have a regexp that looks like this (simplified a bit):

     pattern = /
     (?<lq>
       %7B # single left brace, followed by
       %(?:25|(?![A-Fa-f0-9]{2})) # plaintext '%' or encoded as %25
     )
     | # or
     (?<rq>
       %(?:25|(?![A-Fa-f0-9]{2})) # plaintext '%' or encoded as %25
       %7D # followed by right brace
     )
     /x

(I know this looks similar to CGI.unescape, but there are special cases.)

I want to use this to replace the captured values in a string. Using gsub with a hash seems like the closest thing, but I actually want to replace based on the capture group names, not the captured values.

     string = "%7B% test %%7D"
     replacements = {'\k<lq>' => '{{', '\k<rq>' => '}}'}
     expected = "{% test %}"

     string.gsub(pattern, replacements)

Is it possible?

Andrew

Oops: replacements = {'\k<lq>' => '{%', '\k<rq>' => '%}'}

Andrew Vit

···

On 14-04-15, 16:28, Andrew Vit wrote:

     replacements = {'\k<lq>' => '{{', '\k<rq>' => '}}'}

> expected = "{% string %}"

You could take advantage of the fact that the match sets the (thread-local) $~ (also known as $LAST_MATCH_INFO if you use the English module), so this seems to work.

#!/usr/bin/env ruby

require 'English'

pattern = /
(?<lq>
%7B # single left brace, followed by
%(?:25|(?![A-Fa-f0-9]{2})) # plaintext '%' or encoded as %25
)

| # or

(?<rq>
%(?:25|(?![A-Fa-f0-9]{2})) # plaintext '%' or encoded as %25
%7D # followed by right brace
)
/x

string = "%7B% test %%7D"
replacements = { 'lq' => '{%', 'rq' => '%}' }

puts string.gsub(pattern) {
  matched_name = $LAST_MATCH_INFO.names.find { |n| $LAST_MATCH_INFO[n] }
  replacements[matched_name]
}

produces:

~/tmp ∙ ruby try.rb
{% test %}

Hope this helps,

Mike

···

On Apr 15, 2014, at 7:32 PM, Andrew Vit <andrew@avit.ca> wrote:

--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

If I understand the problem correctly I believe you can solve it passing a
block to gsub.

You can do that with regular captures indeed:

    str.gsub(regexp) do |_|
      if $1
        # the first group matched
      else
        # the second group matched
      end
    end

That relies on two facts: groups are numbered strictly left to right, in
the order their opening parens appear in the regexp, and a group that did
not take part in the match (as can happen in an alternation) evaluates to nil.

With named captures you'd check $~[:lq] and $~[:rq] instead; the same nil
principle applies.
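To make the named-capture variant concrete, here is a minimal sketch using the pattern from the original post, with the alternation written as `|`. The method name `unescape_tags` is made up for illustration; only `gsub`, `$~` and `MatchData#[]` come from Ruby itself.

```ruby
PATTERN = /
  (?<lq>
    %7B                          # encoded left brace, followed by
    %(?:25|(?![A-Fa-f0-9]{2}))   # a plain '%' or one encoded as %25
  )
  |                              # or
  (?<rq>
    %(?:25|(?![A-Fa-f0-9]{2}))   # a plain '%' or one encoded as %25
    %7D                          # followed by an encoded right brace
  )
/x

# Hypothetical helper: picks the replacement by checking which named
# group is non-nil for the current match.
def unescape_tags(string)
  string.gsub(PATTERN) do
    m = $~ # MatchData for the match being replaced
    if m[:lq]
      '{%'
    elsif m[:rq]
      '%}'
    end
  end
end

puts unescape_tags("%7B% test %%7D")
```

Since the two groups are the only alternatives in the pattern, exactly one of them is non-nil for any match, so the block always returns a string.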

+1000 ruby points!

I was actually just arriving at a very similar solution myself:

     replacements = { 'lq' => '{%', 'rq' => '%}' }

     string.gsub(pattern) {
       i = $~.captures.index { |i| !i.nil? }
       k = $~.names[i]
       replacements[k]
     }

(I wonder why there is no better syntax for the MatchData#find...)

My previous attempt used a case statement instead of the lookup hash. I'm not sure which is more performant yet:

     string.gsub(pattern) {
       case
       when $~['ll'] then '{{'
       when $~['rr'] then '}}'
       when $~['lq'] then '{%'
       when $~['rq'] then '%}'
       when $~['sp'] then ' '
       end
     }

Thanks!
Andrew Vit

···

On 14-04-15, 18:01, Mike Stok wrote:

puts string.gsub(pattern) {
   matched_name = $LAST_MATCH_INFO.names.find { |n| $LAST_MATCH_INFO[n] }
   replacements[matched_name]
}

+1000 ruby points!

I was actually just arriving at a very similar solution myself:

    replacements = { 'lq' => '{%', 'rq' => '%}' }

    string.gsub(pattern) {
      i = $~.captures.index { |i| !i.nil? }
      k = $~.names[i]
      replacements[k]
    }

(I wonder why there is no better syntax for the MatchData#find...)

Just a caveat: this approach works only because each of your capturing
groups spans the whole match. It will break if the pattern also matches
text before or after the group which should stay as is.
If that is the case you would have to either use lookaround if
possible OR introduce groups for the prefix and / or suffix and use
them to construct the replacement.
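A small sketch of both workarounds, on a made-up example that is not taken from the thread: the `id=` prefix has to survive and only the digits after it should be replaced. The input string and the group names `prefix` and `num` are assumptions for illustration.

```ruby
text = "id=42 name=7 id=99"

# 1. Capture the prefix in its own group and put it back in the replacement.
with_group = text.gsub(/(?<prefix>id=)(?<num>\d+)/) { "#{$~[:prefix]}N" }

# 2. Use a lookbehind so the prefix is never part of the match at all.
with_lookbehind = text.gsub(/(?<=id=)\d+/, "N")

puts with_group
puts with_lookbehind
```

Both variants leave `name=7` alone because the pattern requires the `id=` prefix.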

If there was only one match then you could use String#[]= for that, e.g.

irb(main):001:0> s="foobar"
=> "foobar"
irb(main):002:0> s[/fo+(b)/, 1]="X"
=> "X"
irb(main):003:0> s
=> "fooXar"

I guess this does not apply in your case.

My previous attempt used a case statement instead of the lookup hash. I'm
not sure which is more performant yet:

    string.gsub(pattern) {
      case
      when $~['ll'] then '{{'
      when $~['rr'] then '}}'
      when $~['lq'] then '{%'
      when $~['rq'] then '%}'
      when $~['sp'] then ' '
      end
    }

You'd have to benchmark. This is fairly easy with the Benchmark module. :-)
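A rough sketch of such a comparison, assuming the two-group pattern from the original post. The method names, the sample input and the iteration counts are arbitrary choices; the timings will differ per machine and Ruby version, so no numbers are shown here.

```ruby
require 'benchmark'

PATTERN = /
  (?<lq> %7B %(?:25|(?![A-Fa-f0-9]{2})) )   # encoded left brace plus percent
  |
  (?<rq> %(?:25|(?![A-Fa-f0-9]{2})) %7D )   # percent plus encoded right brace
/x

REPLACEMENTS = { 'lq' => '{%', 'rq' => '%}' }.freeze

# Variant 1: find the name of the group that matched, then look it up.
def with_hash(string)
  string.gsub(PATTERN) do
    m = $~
    REPLACEMENTS[m.names.find { |n| m[n] }]
  end
end

# Variant 2: test each group in turn with a case statement.
def with_case(string)
  string.gsub(PATTERN) do
    m = $~
    case
    when m['lq'] then '{%'
    when m['rq'] then '%}'
    end
  end
end

SAMPLE = "%7B% test %%7D " * 1_000

Benchmark.bm(6) do |bm|
  bm.report('hash:') { 100.times { with_hash(SAMPLE) } }
  bm.report('case:') { 100.times { with_case(SAMPLE) } }
end
```

`Benchmark.bmbm` can be used in place of `Benchmark.bm` to add a rehearsal round, which helps reduce the effect of warm-up and garbage collection on the first report.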

Kind regards

robert

···

On Wed, Apr 16, 2014 at 3:43 AM, Andrew Vit <andrew@avit.ca> wrote:

On 14-04-15, 18:01, Mike Stok wrote:

--
[guy, jim].each {|him| remember.him do |as, often| as.you_can - without end}
http://blog.rubybestpractices.com/

Right, that makes sense; in this case my pattern is just a list of named captures union'd together.

The previous implementation used a series of gsub calls to replace each pattern one at a time (similar to your String#[]= suggestion), but this was slow because it had to traverse a large string several times, and I suspect it was also allocating a whole new string on each pass.
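For comparison, here is a sketch of the two shapes side by side. The rule table is hypothetical and much simpler than the real patterns (it ignores the plaintext '%' special case); `RULES`, `UNION`, `multi_pass` and `single_pass` are names invented for this example.

```ruby
# Hypothetical rule table: group name => [regexp source, replacement].
RULES = {
  'lq' => ['%7B%25', '{%'],
  'rq' => ['%25%7D', '%}'],
  'sp' => ['%20',    ' ']
}.freeze

# One alternation of named groups, built from the table.
UNION = Regexp.new(
  RULES.map { |name, (source, _)| "(?<#{name}>#{source})" }.join('|')
)

# Several passes: every gsub scans the whole string and returns a new one.
def multi_pass(string)
  RULES.values.reduce(string) do |acc, (source, replacement)|
    acc.gsub(Regexp.new(source), replacement)
  end
end

# Single pass: one scan, replacement chosen by the group that matched.
def single_pass(string)
  string.gsub(UNION) do
    m = $~
    name = m.names.find { |n| m[n] }
    RULES.fetch(name).last
  end
end

input = "%7B%25%20x%20%25%7D"
puts multi_pass(input)
puts single_pass(input)
```

The two are not always interchangeable: in the multi-pass version a later rule can match text produced by an earlier replacement, which cannot happen in a single pass.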

Thanks!
Andrew Vit

···

On 14-04-16, 2:20, Robert Klemme wrote:

Just a caveat: this approach works only because each of your capturing
groups spans the whole match. It will break if the pattern also matches
text before or after the group which should stay as is.
If that is the case you would have to either use lookaround if
possible OR introduce groups for the prefix and / or suffix and use
them to construct the replacement.

I guess this does not apply in your case.