I am parsing query expressions using a regular expression with
multiple capture groups in it, e.g. /(\w+):(\w+)/.
I would like some code to execute when the first group matches (e.g.
constructing some object out of it) and some other code when the second
group matches (e.g. constructing some other object).
I can of course check the array of matches, find the non-nil
element, and decide which code to execute. But that becomes very
cumbersome with a large regex (with, say, 10 different groups).
So I would rather attach some code to a group directly, as one
does in parser generators, e.g.
/(\w+:do_method):(\w+:do_other_method)/.
Would something like that be possible in Ruby? I tried searching, but
I'm not sure what such a feature would be called.
Maybe you can refactor your regexp to be used with scan.
irb(main):001:0> "some words to change".scan(/\w+/) do | w | puts w.upcase end
SOME
WORDS
TO
CHANGE
=> "some words to change"
hth,
Brian
On 19/10/05, Eyal Oren <eyal.oren@gmail.com> wrote:
> Hi,
> I am parsing query expressions using a regular expression with
> multiple capture groups in it, e.g. /(\w+):(\w+)/.
> I would like some code to execute when the first group matches (e.g.
> constructing some object out of it) and some other code when the second
> group matches (e.g. constructing some other object).
> I can of course check the array of matches, find the non-nil
> element, and decide which code to execute. But that becomes very
> cumbersome with a large regex (with, say, 10 different groups).
> So I would rather attach some code to a group directly, as one
> does in parser generators, e.g.
> /(\w+:do_method):(\w+:do_other_method)/.
> Would something like that be possible in Ruby? I tried searching, but
> I'm not sure what such a feature would be called.
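No, I don't think it's possible. You can do this instead:
string.scan(/(\w+):(\w+)/) do |match|
  # match is the array of captures for one match; inject yields the
  # 1-based position of the first non-nil group
  case match.inject(1) {|pos,x| break pos if x; pos + 1}
  when 1
    # code for group 1
  when 2
    # ...
  end
end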
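The same group-position dispatch can also be written as a table of handlers indexed by capture group, which avoids growing a long `case` as groups are added. A minimal sketch, assuming Ruby 1.9 or later (for the `->` lambda syntax); the number/word grammar and the names `NUMBER_OR_WORD`, `GROUP_HANDLERS` and `parse_tokens` are hypothetical:

```ruby
# Hypothetical grammar: a token is either a number (group 1)
# or a word (group 2).
NUMBER_OR_WORD = /(\d+)|(\w+)/

# One handler per capture group, in group order.
GROUP_HANDLERS = [
  ->(s) { s.to_i },   # group 1: number
  ->(s) { s.to_sym }  # group 2: word
]

def parse_tokens(text)
  text.scan(NUMBER_OR_WORD).map do |groups|
    # With alternation, exactly one group per match is non-nil;
    # its position selects the handler.
    idx = groups.index { |g| g }
    GROUP_HANDLERS[idx].call(groups[idx])
  end
end

p parse_tokens("abc 42 def") # => [:abc, 42, :def]
```

Adding an alternative then means adding one group to the regexp and one lambda to the array, in the same order.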
I am not sure that would help. I need to know which of the groups
matched, because the actions are different for different groups (you
just 'puts' all matches).
In your example, "Some words To change", say I want to print the
capitalised words normally and print the others reversed. I can make
a regex that captures these two kinds of words in two groups, but scan
wouldn't work because I wouldn't know whether a match came from group one or
group two.
But AFAIK I cannot ask the resulting match which regex it was matched
by, so I still do not know what to do. I could of course test each
regex on the matched word again, but that is not efficient.
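For the "Some words To change" example, the matching group can in fact be queried without re-running any regexp: inside a `gsub` (or `scan`) block Ruby sets `$1`, `$2`, ... for the current match, and the alternative that did not match leaves its group `nil`. A minimal sketch; `WORD` and `transform` are illustrative names, and it returns the transformed string instead of printing word by word:

```ruby
# Capitalised words are captured by group 1, all other words by group 2.
WORD = /\b([A-Z]\w*)|\b([a-z]\w*)/

def transform(text)
  # $1 is non-nil only when the first alternative matched, so a nil
  # check tells us which group produced the current match.
  text.gsub(WORD) { $1 ? $1 : $2.reverse }
end

puts transform("Some words To change") # prints "Some sdrow To egnahc"
```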
On 19/10/05, Brian Schröder <ruby.brian@gmail.com> wrote:
> Maybe you can refactor your regexp to be used with scan.
> irb(main):001:0> "some words to change".scan(/\w+/) do | w | puts w.upcase end
> SOME
> WORDS
> TO
> CHANGE
> => "some words to change"
Thanks, that might work, but I think the problem lies in the unions of
the regexps that I use (see the example below).
Because of the unions, I don't really want to decide after the match
what to do with it; I would rather state it in the constituent regexps
(e.g., I would like to say in the ImplicitWiki regexp what should
happen when it is encountered).
ExplicitWiki = /\[\[([^\]]+)\]\]/
# CamelCase followed by some non-word character, e.g. 'CamelCase.'
ImplicitWiki = /([A-Z]+[a-z]+[A-Z]+\w*)\W/
# <...>, any characters except angle brackets inside
Uri = /<([^<>]+)>/
# dc:title
Prefix = /(\w*):(\w+)/
# "hello"
Literal = /"([^"]*)"/
Wiki = Regexp.union ExplicitWiki, ImplicitWiki
Pred = Regexp.union Wiki, Uri, Prefix
Obj = Regexp.union Pred, Literal
Annotation = /(#{Pred})\s*(#{Obj})\s*\./
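One way to attach code to each constituent regexp, instead of deciding after matching the union, is to drop the union and try the constituents one at a time with Ruby's `StringScanner` (standard library `strscan`), pairing each with its own handler. This is a swapped-in technique, not one proposed in the thread. A minimal sketch, assuming Ruby 1.9 or later; the handlers, the `[type, ...]` token format and the names `RULES` and `tokenize` are hypothetical, and the ImplicitWiki pattern is changed to end in a lookahead `(?=\W)` so that the trailing non-word character (such as the final `.`) is not consumed:

```ruby
require 'strscan'

# Each constituent pattern is paired with the code to run when it
# matches. The handlers are hypothetical: they only build token arrays.
RULES = [
  [/\[\[([^\]]+)\]\]/,              ->(s) { [:wiki, s[1]] }],         # ExplicitWiki
  [/([A-Z]+[a-z]+[A-Z]+\w*)(?=\W)/, ->(s) { [:wiki, s[1]] }],         # ImplicitWiki (lookahead)
  [/<([^<>]+)>/,                    ->(s) { [:uri, s[1]] }],          # Uri
  [/(\w*):(\w+)/,                   ->(s) { [:prefix, s[1], s[2]] }], # Prefix
  [/"([^"]*)"/,                     ->(s) { [:literal, s[1]] }]       # Literal
]

def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)
    # StringScanner#scan only matches at the current position and
    # advances past the match on success, so the first rule that
    # matches here wins; scanner[n] then returns its capture group n.
    rule = RULES.find { |re, _handler| scanner.scan(re) }
    tokens << (rule ? rule[1].call(scanner) : [:char, scanner.getch])
  end
  tokens
end

p tokenize('[[Home]] dc:title "hello" .')
# => [[:wiki, "Home"], [:prefix, "dc", "title"], [:literal, "hello"], [:char, "."]]
```

The order of `RULES` plays the same role as the order of the alternatives in the union: when several patterns could match at the same position, the earlier one wins.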
OK, thanks for your example. I think the regexp engine of Ruby 1.9 called Oniguruma supports something like named sub-expressions, which might be what you need.
On Wed, Oct 19, 2005 at 08:16:58PM +0900, Eyal Oren wrote:
[snip]
I wrote the following a long time ago, when I was new to Ruby. Maybe you
could use a similar pattern:
----------------------------------------------------------------------
# Perform (possibly) multiple global substitutions on a string.
# The regexps given as keys must not use capturing subexpressions
# '(...)'; use non-capturing '(?:...)' instead.
class MultiSub
  # hash has regular expression fragments (as strings) as keys, mapped
  # to Procs that will generate replacement text, given the matched value.
  def initialize(hash)
    @mash = Array.new
    expr = nil
    hash.each do |key, val|
      # Wrap each fragment in its own group: "(a)|(b)|(c)".
      if expr == nil then expr = "(" else expr << "|(" end
      expr << key << ")"
      @mash << val
    end
    @re = Regexp.new(expr)
  end

  # Perform a global multi-sub on the given text, modifying the passed
  # string 'in place'.
  def gsub!(text)
    text.gsub!(@re) { |match|
      # $~.to_a is [whole match, group 1, group 2, ...]; idx ends up as
      # the 0-based position of the first non-nil group.
      idx = -1
      $~.to_a.each { |subexp|
        break unless idx == -1 || subexp == nil
        idx += 1
      }
      idx == -1 ? match : @mash[idx].call(match)
    }
  end
end
> OK, thanks for your example. I think the regexp engine of Ruby 1.9
> called Oniguruma supports something like named sub-expressions, which
> might be what you need.
Oniguruma is indeed the regexp engine of Ruby, but are you sure named
subexpressions aren't already in Ruby? I thought they were, but I've
only actually used them in TextMate (an OS X text editor that uses
Oniguruma as its regex engine).
Hrm, I just tested and it does appear that named subexpressions aren't
in Ruby 1.8. That's interesting, because I thought Oniguruma supported
them quite a while ago.
> Hrm, I just tested and it does appear that named subexpressions aren't
> in Ruby 1.8. That's interesting, because I thought Oniguruma supported
> them quite a while ago.
I thought Oniguruma was not yet the regex engine of Ruby, but would become it
from Ruby 2 on (or is it already the engine in Ruby 1.9?); either way, it is
not the regex engine of Ruby 1.8.
>> OK, thanks for your example. I think the regexp engine of Ruby 1.9
>> called Oniguruma supports something like named sub-expressions, which
>> might be what you need.
> Oniguruma is indeed the regexp engine of Ruby
Ruby 1.9 you mean.
> but are you sure named subexpressions aren't already in Ruby?
If you just download and build Ruby 1.8, you don't get Oniguruma yet.
> Hrm, I just tested and it does appear that named subexpressions aren't
> in Ruby 1.8. That's interesting, because I thought Oniguruma supported
> them quite a while ago.
You can build 1.8 to use it, but you must purposefully do so.
> Oniguruma is indeed the regexp engine of Ruby, but are you sure named
> subexpressions aren't already in Ruby?
> [snip]
> Hrm, I just tested and it does appear that named subexpressions aren't
> in Ruby 1.8. That's interesting, because I thought Oniguruma supported
> them quite a while ago.
Oniguruma is only the engine in versions 1.9 and later; versions 1.8 and earlier use a different regexp engine.
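With named groups (Ruby 1.9 or later), each constituent regexp can carry its own group name, and the handler can be looked up by name instead of by position, which is close to "attaching code to a match". A minimal sketch, assuming Ruby 1.9+; the group names, the `HANDLERS` table and the `dispatch` method are hypothetical, and only three of the thread's patterns are shown:

```ruby
# Requires Ruby 1.9 or later: (?<name>...) groups are not available
# in a stock Ruby 1.8 build.
PATTERN = Regexp.union(
  /\[\[(?<explicit_wiki>[^\]]+)\]\]/,
  /<(?<uri>[^<>]+)>/,
  /"(?<literal>[^"]*)"/
)

# Hypothetical handlers, keyed by group name.
HANDLERS = {
  "explicit_wiki" => ->(v) { [:wiki, v] },
  "uri"           => ->(v) { [:uri, v] },
  "literal"       => ->(v) { [:literal, v] }
}

def dispatch(text)
  results = []
  text.scan(PATTERN) do
    m = Regexp.last_match
    # Only the alternative that matched has its named group set.
    name = m.names.find { |n| m[n] }
    results << HANDLERS.fetch(name).call(m[name])
  end
  results
end

p dispatch('[[Home]] <http://example.org/> "hello"')
# => [[:wiki, "Home"], [:uri, "http://example.org/"], [:literal, "hello"]]
```

Giving each constituent distinct group names keeps the name-to-handler mapping one-to-one.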