I would like to replace '<math>test1<\math>' and '<math>test2<\math>' by
'ok' in the string "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
<math>test2<\math>"
When I try this:
@a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
<math>test2<\math>"
@b = @a.gsub(/<math>.+<\math>/,'ok')
I get "blabla1 blabla1 ok" instead of "blabla1 blabla1 ok blabla2
blabla2 ok"
Could you please help me to tune my regexp so it works the way I want?
As I suspect seven or eight people will tell you, you need to make
your one-or-more quantifier non-greedy. Note the question mark:
a = "blabla1 blabla1 <math>test1<\\math> blabla2 blabla2
<math>test2<\\math>"
puts a.gsub(/<math>.+?<\\math>/,'ok')
I've made a couple of other changes. In a double-quoted string, "\m"
is just the letter m. If you want a \ you have to escape it with
another \. (Are you sure you don't want a forward slash anyway?)
I have a feeling you don't really need instance variables here; if
not, it's better to use locals.
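A minimal sketch of the escaping point above, using local variables. The variable names are only for illustration; the strings are the ones from the thread, put on a single line.

```ruby
# In a double-quoted string, "\m" is not a recognised escape,
# so the backslash is dropped and only "m" remains.
double = "<\math>"

# Single quotes keep the backslash; so does "\\" inside double quotes.
single  = '<\math>'
escaped = "<\\math>"

puts double            # <math>
puts single            # <\math>
puts single == escaped # true

# With a real backslash in both the data and the regexp,
# the non-greedy quantifier replaces each tag pair separately.
text   = "blabla1 blabla1 <math>test1<\\math> blabla2 blabla2 <math>test2<\\math>"
result = text.gsub(/<math>.+?<\\math>/, 'ok')
puts result            # blabla1 blabla1 ok blabla2 blabla2 ok
```

If the closing tag is really meant to be </math>, the same pattern works with a forward slash in place of the escaped backslash.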
David
···
On Fri, 7 Jul 2006, Fred VD wrote:
--
"To fully realize the potential of Rails, it's crucial that you take
the time to fully understand Ruby--and with "Ruby for Rails" David
has provided just what you need to help you achieve that goal."
-- DAVID HEINEMEIER HANSSON, in the foreword to RUBY FOR RAILS.
Complete foreword & sample chapters at http://www.manning.com/black!
You need to replace .+ with .+? - the former is 'greedy'; i.e. it
matches as much as it possibly can, so it matches everything between
the first <math> and the last <\math>.
martin
···
On 7/7/06, Fred VD <outrelouxhe@yahoo.fr> wrote:
Hello,
I'm new to Ruby and I'm fighting with regexp.
I would like to replace '<math>test1<\math>' and '<math>test2<\math>' by
'ok' in the string "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
<math>test2<\math>"
+ and * are greedy. Use +? and *? for non-greedy operation:
a = 'a b <x>c</x> d <x>e</x> f g'
a.gsub(%r!<x>.+?</x>!, 'ok')
=> "a b ok d ok f g"
a.gsub(%r!<x>.+</x>!, 'ok')
=> "a b ok f g"
A greedy expression will match up to the *last* occurrence of what
follows it in the expression, whereas a non-greedy expression will
match up to the *first* occurrence.
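To see exactly which text each form grabs, one option is String#scan, which returns the matched substrings instead of replacing them. This is only a sketch on the same sample string; the variable names greedy and lazy are made up for the example.

```ruby
a = 'a b <x>c</x> d <x>e</x> f g'

# Greedy: one match, running from the first <x> to the last </x>.
greedy = a.scan(%r!<x>.+</x>!)

# Non-greedy: one match per <x>...</x> pair.
lazy = a.scan(%r!<x>.+?</x>!)

p greedy # ["<x>c</x> d <x>e</x>"]
p lazy   # ["<x>c</x>", "<x>e</x>"]
```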
Paul.
···
On 07/07/06, Fred VD <outrelouxhe@yahoo.fr> wrote:
fr fred:
# When I try this :
#
# @a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
# <math>test2<\math>"
# @b = @a.gsub(/<math>.+<\math>/,'ok')
#
# I get "blabla1 blabla1 ok" instead of "blabla1 blabla1 ok blabla2
# blabla2 ok"
#
# Could you please help me to tune my regexp so it works the
# way I want ?
(ignore the newline; I just copied and pasted from your email...)
Interesting -- it actually works only *because* of the newline.
With the newline, the .+ starts again on the second line (since .
doesn't match a newline). Without it, the greed of .+ consumes
everything up to the second <\math>.
So if it's really on two lines, the non-greedy .+? isn't needed. But
it's probably a good idea anyway, as it will work in a larger number
of cases.
David
···
On Fri, 7 Jul 2006, Peña, Botp wrote:
I thought this might warrant some explanation in light of the later posts about greedy operators.
The newline from the paste is what makes this behave differently from the usual example. As already mentioned, .+ is greedy, but . does not match a newline unless you specify a multiline regex (the /m flag), so the match cannot run across lines. The newline is what made the regex appear correct when it wouldn't be in the general case.
# with newline (backslashes doubled so the closing tag really is <\math>)
a = "blabla1 blabla1 <math>test1<\\math> blabla2 blabla2 \n<math>test2<\\math>"
a.gsub(/<math>.+<\\math>/,'ok')
=> "blabla1 blabla1 ok blabla2 blabla2 \nok"
# sans newline
b = "blabla1 blabla1 <math>test1<\\math> blabla2 blabla2 <math>test2<\\math>"
b.gsub(/<math>.+<\\math>/,'ok')
=> "blabla1 blabla1 ok"
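For completeness, here is a short sketch of what the multiline flag changes. The shortened sample string and the variable names are assumptions for illustration, not taken from the original post.

```ruby
two_lines = "x <math>test1<\\math> y\n<math>test2<\\math>"

# Without /m, "." stops at the newline, so even greedy .+ stays on one line.
plain = two_lines.gsub(/<math>.+<\\math>/, 'ok')

# With /m, "." also matches "\n", and greedy .+ runs to the last <\math>.
multi = two_lines.gsub(/<math>.+<\\math>/m, 'ok')

# Non-greedy .+? gives the intended result even with /m.
lazy = two_lines.gsub(/<math>.+?<\\math>/m, 'ok')

p plain # "x ok y\nok"
p multi # "x ok"
p lazy  # "x ok y\nok"
```

So the non-greedy form is the safer choice whenever the input might or might not contain line breaks.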
matthew smillie.
···
On Jul 7, 2006, at 11:52, Peña, Botp wrote:
fr David:
# Interesting -- it actually works only *because* of the newline
# With the newline, the .+ starts again on the second line (since .
# doesn't match a newline). Without it, the greed of .+ consumes
# everything up to the second <\math>.
yes indeed. you're a great teacher.
# So if it's really on two lines, the non-greedy .+? isn't needed. But
# it's probably a good idea anyway, as it will work in a larger number
# of cases.
yes, why does it have to be greedy by default... or is that life?
Well, it needs to be one or the other by default, and in my experience the
choice is usually to make it greedy. If you want a deep dive on regexes, look
at _Mastering Regular Expressions_ by Jeffrey E. F. Friedl.
···
On 7/7/06, Peña, Botp <botp@delmonte-phil.com> wrote:
yes, why does it have to be greedy by default... or is that life?