Attaching code to run on regular expression match

Eyal_Oren1 · 19 October 2005 10:19

Hi,

I am parsing query expressions, using a regular expression with
multiple matches in it, e.g. /(\w+):(\w+)/.

I would like some code to execute on the first match (e.g.
constructing some object out of it) and some other code on the second
match (e.g. constructing some other object).

I can of course check the array of matches and find the non-nil
element, and decide which code to execute. But that becomes very
cumbersome with a large regex (with say 10 different matches).

So I would rather like to attach some code in a match directly, as one
does in parsing generators, e.g.
/(\w+:do_method):(\w+:do_other_method)/.

Would something like that be possible in Ruby? I tried searching but
I'm not sure what such a feature would be called.

Brian_Schroder1 · 19 October 2005 10:39

Maybe you can refactor your regexp to be used with scan.

irb(main):001:0> "some words to change".scan(/\w+/) do | w | puts w.upcase end
SOME
WORDS
TO
CHANGE
=> "some words to change"

hth,
Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

Robert · 19 October 2005 10:46

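No, I don't think it's possible. You can do this:

string.scan(/(\w+):(\w+)/) do |match|
  # match is the array of captured groups; inject finds the 1-based
  # position of the first group that is non-nil
  case match.inject(1) { |pos, x| break pos if x; pos + 1 }
    when 1
      # code for group 1
    when 2
      # ...
  end
end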

Kind regards

robert

Pit · 19 October 2005 11:08

Eyal Oren wrote:

So I would rather like to attach some code in a match directly, as one
does in parsing generators, e.g.
/(\w+:do_method):(\w+:do_other_method)/.

Would something like that be possible in Ruby? I tried searching but
I'm not sure how such a feature would be called.

I'm sure I'm missing something, but wouldn't this work:

   string.scan(/(\w+):(\w+)/) do |m1, m2|
     do_method(m1)
     do_other_method(m2)
   end

Maybe you can show us one of your complex regexes?

Regards,
Pit

Eyal_Oren1 · 19 October 2005 10:51

I am not sure that would help: I need to know which of the groups
matched, because the actions are different for different groups (your
example just prints every match).

In your example, with "Some words To change", say I want to print the
capitalised words normally and print the others reversed. I can make
a regex that captures both these kinds of word in two groups, but scan
wouldn't work because I wouldn't know if a match was from group one or
group two.

But AFAIK I cannot ask the resulting match which regex it was matched
by, so I still do not know what to do. I could of course test each
regex on the matched word again, but that is not efficient.
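
A minimal sketch of the capitalised/reversed example, assuming the two kinds of word are captured by two alternative groups (the pattern and the variable names are illustrative, not taken from the thread). With alternatives, scan passes nil for the group that did not take part in a match, so the block can tell the groups apart, although it still needs the per-group check that gets cumbersome with many groups:

```ruby
text = "Some words To change"

# Group 1 captures capitalised words, group 2 captures all other words.
# For each match exactly one of the two block arguments is non-nil.
pattern = /\b([A-Z]\w*)|\b([a-z]\w*)/

output = []
text.scan(pattern) do |capitalised, other|
  if capitalised
    output << capitalised      # keep capitalised words as they are
  else
    output << other.reverse    # reverse the others
  end
end

puts output.join(" ")   # prints: Some sdrow To egnahc
```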

Eyal_Oren1 · 19 October 2005 11:16

Thanks, that might work, but I think the problem lies in the unions of
the regexps that I use (see the example below).

Because of the unions, I don't really want to decide after the match
what to do with it, but rather state it in the constituent regexps
(e.g., I would like to say in the ImplicitWiki regexp what should
happen if it is encountered):

ExplicitWiki = /\[\[([^\]]+)\]\]/

# CamelCase followed by some non-word character, e.g. 'CamelCase.'
ImplicitWiki = /([A-Z]+[a-z]+[A-Z]+\w*)\W/

# <...>, no space inside brackets
Uri = /<([^<>]+)>/

# dc:title
Prefix = /(\w*):(\w+)/

# "hello"
Literal = /"([^"]*)"/

  Wiki = Regexp.union ExplicitWiki, ImplicitWiki
  Pred = Regexp.union Wiki, Uri, Prefix
  Obj = Regexp.union Pred, Literal
  Annotation = /(#{Pred})\s*(#{Obj})\s*\./

  Variable = /(\?\w+)/
  UriPattern = Regexp.union Variable, Pred
  LiteralPattern = Regexp.union Variable, Obj
  Query = /\[\?\s+#{UriPattern}\s+#{UriPattern}\s+#{LiteralPattern}\]/
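
One way to state the action next to each constituent regexp, sketched under the assumption that the input can be consumed token by token: keep a table of pattern/handler pairs and try them in order with StringScanner from Ruby's standard strscan library. The RULES table, the tokenize helper and the symbols returned by the handlers are hypothetical names chosen for illustration, and only four of the patterns above are included.

```ruby
require "strscan"

# Each rule pairs a pattern with the code to run when that pattern matches.
# The handler receives the scanner, whose [] method returns the groups
# of the most recent match.
RULES = [
  [/\[\[([^\]]+)\]\]/, lambda { |m| [:explicit_wiki, m[1]] }],
  [/<([^<>]+)>/,       lambda { |m| [:uri, m[1]] }],
  [/(\w*):(\w+)/,      lambda { |m| [:prefix, m[1], m[2]] }],
  [/"([^"]*)"/,        lambda { |m| [:literal, m[1]] }],
]

def tokenize(input, rules = RULES)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)   # ignore whitespace between tokens
    # scan only matches at the current position, so the first rule
    # that succeeds decides which handler runs.
    rule = rules.find { |pattern, _handler| scanner.scan(pattern) }
    raise ArgumentError, "no rule matches at position #{scanner.pos}" unless rule
    tokens << rule[1].call(scanner)
  end
  tokens
end

p tokenize('[[Some Page]] dc:title "hello"')
```

Because each pattern is tried on its own, the capturing groups inside one rule do not shift the group numbers of another, which is the difficulty with one large union.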

Pit · 19 October 2005 11:47

OK, thanks for your example. I think the regexp engine of Ruby 1.9 called Oniguruma supports something like named sub-expressions, which might be what you need.
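
For reference, a sketch of how named groups could serve this purpose in later Ruby versions: Ruby 1.9 and up accept the (?<name>...) syntax, and MatchData#named_captures (Ruby 2.4 and later) returns a hash from group name to captured text. The HANDLERS table, the group names and the process helper are assumptions made for illustration.

```ruby
# Hypothetical handlers, keyed by the name of the group they belong to.
HANDLERS = {
  "capitalised" => lambda { |word| word },
  "other"       => lambda { |word| word.reverse },
}

# Each alternative is a named group, so the name identifies what matched.
PATTERN = /(?<capitalised>[A-Z]\w*)|(?<other>[a-z]\w*)/

def process(text)
  results = []
  text.scan(PATTERN) do
    match = Regexp.last_match
    # Exactly one alternative took part in the match; its capture is non-nil.
    name, value = match.named_captures.find { |_name, captured| captured }
    results << HANDLERS.fetch(name).call(value)
  end
  results
end

p process("Some words To change")
```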

Regards,
Pit

David_Holroyd · 19 October 2005 12:25

I wrote the following a long time ago when I was new to Ruby. Maybe you
could use a similar pattern,

----------------------------------------------------------------------
# Perform (possibly) multiple global substitutions on a string.
# The regexps given as keys must not use capturing subexpressions
# '(...)'; use non-capturing groups '(?:...)' instead.
class MultiSub
  # hash has regular expression fragments (as strings) as keys, mapped
  # to Procs that will generate replacement text, given the matched value.
  def initialize(hash)
    @procs = hash.values
    # Wrap every fragment in its own capturing group, so that the index
    # of the group that matched identifies the Proc to call.
    @re = Regexp.new(hash.keys.map { |key| "(#{key})" }.join("|"))
  end

  # Perform a global multi-sub on the given text, modifying the passed
  # string in place.
  def gsub!(text)
    text.gsub!(@re) do |match|
      # One capture per fragment; only the fragment that matched is non-nil.
      idx = $~.captures.index { |capture| !capture.nil? }
      idx ? @procs[idx].call(match) : match
    end
  end
end

# example,

mail_sub = proc { |match| "<a href=\"mailto:#{match}\">#{match}</a>" }
url_sub = proc { |match| "<a href=\"#{match}\">#{match}</a>" }

sub = MultiSub.new({
  '(?:mailto:)?[\w\.\-\+\=]+\@[\w\-]+(?:\.[\w\-]+)+\b' => mail_sub,
  '\b(?:http|https|ftp):[^ \t\n<>"]+[\w/]' => url_sub
})

test = "...."
sub.gsub!(test)
puts test
----------------------------------------------------------------------

ta,
dave

--
http://david.holroyd.me.uk/

Kevin_Ballard · 19 October 2005 13:51

Pit Capitain wrote:

OK, thanks for your example. I think the regexp engine of Ruby 1.9
called Oniguruma supports something like named sub-expressions, which
might be what you need.

Oniguruma is indeed the regexp engine of Ruby, but are you sure named
subexpressions aren't already in Ruby? I thought they were, but I've
only actually used them in TextMate (an OS X text editor that uses
Oniguruma as its regex engine).

Hrm, I just tested and it does appear that named subexpressions aren't
in Ruby 1.8. That's interesting, because I thought Oniguruma supported
them quite a while ago.

Christophe_Grandsire · 19 October 2005 13:55

Kevin Ballard <kballard@gmail.com> wrote:

Hrm, I just tested and it does appear that named subexpressions aren't
in Ruby 1.8. That's interesting, because I thought Oniguruma supported
them quite a while ago.

I thought Oniguruma was not yet Ruby's regex engine, but would become so
from Ruby 2 on (is it already the engine in Ruby 1.9?); that is, it is not
the regex engine of Ruby 1.8.

--
Christophe Grandsire.

http://rainbow.conlang.free.fr

It takes a straight mind to create a twisted conlang.

James_Edward_Gray_II · 19 October 2005 13:58

On Oct 19, 2005, at 8:51 AM, Kevin Ballard wrote:

> Pit Capitain wrote:
>
> > OK, thanks for your example. I think the regexp engine of Ruby 1.9
> > called Oniguruma supports something like named sub-expressions, which
> > might be what you need.
>
> Oniguruma is indeed the regexp engine of Ruby

Ruby 1.9 you mean.

> but are you sure named subexpressions aren't already in Ruby?

If you just download and build Ruby 1.8, you don't get Oniguruma yet.

> Hrm, I just tested and it does appear that named subexpressions aren't
> in Ruby 1.8. That's interesting, because I thought Oniguruma supported
> them quite a while ago.

You can build 1.8 to use it, but you must purposefully do so.

James Edward Gray II

Gavin_Kistner2 · 19 October 2005 13:58

On Oct 19, 2005, at 7:51 AM, Kevin Ballard wrote:

> Oniguruma is indeed the regexp engine of Ruby, but are you sure named
> subexpressions aren't already in Ruby?

[snip]

> Hrm, I just tested and it does appear that named subexpressions aren't
> in Ruby 1.8. That's interesting, because I thought Oniguruma supported
> them quite a while ago.

Oniguruma is only the engine in versions 1.9 and later; versions 1.8 and earlier use a different regexp engine.
