Regexp

Hello,

I'm new to Ruby and I'm struggling with regexps.

I would like to replace '<math>test1<\math>' and '<math>test2<\math>' with
'ok' in the string "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
<math>test2<\math>"

When I try this:

@a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
<math>test2<\math>"
@b = @a.gsub(/<math>.+<\math>/,'ok')

I get "blabla1 blabla1 ok" instead of "blabla1 blabla1 ok blabla2
blabla2 ok"

Could you please help me tune my regexp so it works the way I want?

Thank you in advance,

Fred.

--
Posted via http://www.ruby-forum.com/.

fr fred:
# When I try this :

#
# @a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
# <math>test2<\math>"
# @b = @a.gsub(/<math>.+<\math>/,'ok')
#
# I get "blabla1 blabla1 ok" instead of "blabla1 blabla1 ok blabla2
# blabla2 ok"
#
# Could you please help me to tune my regexp so it works the
# way I want ?

your code works fine here,

irb(main):001:0> @a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
irb(main):002:0" <math>test2<\math>"
=> "blabla1 blabla1 <math>test1<math> blabla2 blabla2 \n<math>test2<math>"
irb(main):003:0> @b = @a.gsub(/<math>.+<\math>/,'ok')
=> "blabla1 blabla1 ok blabla2 blabla2 \nok"

(ignore the newline; i just copy-n-pasted fr your email...)

kind regards -botp

Hi --

As I suspect seven or eight people will tell you, you need to make
your one-or-more quantifier non-greedy :slight_smile: Note the question mark:

   a = "blabla1 blabla1 <math>test1<\\math> blabla2 blabla2
   <math>test2<\\math>"
   puts a.gsub(/<math>.+?<\\math>/,'ok')

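I've made a couple of other changes. In a double-quoted string, "\m"
is just the letter m. If you want a literal backslash, you have to
escape it with another one: "\\m". The same goes for the regex, where
<\\math> is what matches a literal backslash. (Are you sure you don't
want a forward slash anyway?)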
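A quick sketch of the escaping rule above, checking what each kind of literal actually contains. The sample strings are just for illustration:

```ruby
# Double quotes: an unrecognised escape like \m yields the character
# itself, so the backslash silently disappears.
dq_plain   = "<\math>"   # same as "<math>"
dq_escaped = "<\\math>"  # backslash kept: < \ m a t h >

# Single quotes: only \\ and \' are escapes, so \m stays two characters.
sq = '<\math>'

p dq_plain           # prints "<math>"
p dq_escaped == sq   # prints true
p dq_escaped.length  # prints 7
```

So Fred's original string never contained a backslash at all, which is why botp's irb output shows plain <math> tags.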

I have a feeling you don't really need instance variables here; if
not, it's better to use locals.

David

On Fri, 7 Jul 2006, Fred VD wrote:

--
  "To fully realize the potential of Rails, it's crucial that you take
    the time to fully understand Ruby--and with "Ruby for Rails" David
       has provided just what you need to help you achieve that goal."
       -- DAVID HEINEMEIER HANSSON, in the foreword to RUBY FOR RAILS.
  Complete foreword & sample chapters at http://www.manning.com/black!

You need to replace .+ with .+? - the former is 'greedy'; i.e. it
matches as much as it possibly can, so it matches everything between
the first <math> and the last <\math>.
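To see the difference martin describes, String#scan can list every match. The sample string here is illustrative: it is on one line and contains real backslashes.

```ruby
s = "blabla1 <math>test1<\\math> blabla2 <math>test2<\\math>"

# Greedy: .+ runs to the end of the line, then backs up only as far as
# the LAST <\math>, so the whole span is one match.
greedy = s.scan(/<math>.+<\\math>/)

# Non-greedy: .+? stops at the FIRST <\math> it can, giving two matches.
lazy = s.scan(/<math>.+?<\\math>/)

p greedy.length  # prints 1
p lazy           # prints ["<math>test1<\\math>", "<math>test2<\\math>"]
```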

martin

On 7/7/06, Fred VD <outrelouxhe@yahoo.fr> wrote:

Hello,

I'm new to Ruby and I'm fighting with regexp.

I would like to replace '<math>test1<\math>' and '<math>test2<\math>' by
'ok' in the string "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
<math>test2<\math>"

When I try this :

@a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
<math>test2<\math>"
@b = @a.gsub(/<math>.+<\math>/,'ok')

I get "blabla1 blabla1 ok" instead of "blabla1 blabla1 ok blabla2
blabla2 ok"

Could you please help me to tune my regexp so it works the way I want ?

+ and * are greedy. Use +? and *? for non-greedy operation:

a = 'a b <x>c</x> d <x>e</x> f g'
a.gsub(%r!<x>.+?</x>!, 'ok')
=> "a b ok d ok f g"
a.gsub(%r!<x>.+</x>!, 'ok')
=> "a b ok f g"

A greedy expression will match up to the *last* occurrence of what
follows it in the expression, whereas a non-greedy expression will
match up to the *first* occurrence.
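A capture group makes the "last vs. first occurrence" point visible. This reuses Paul's string; using String#[] with a capture index is just one convenient way to show it:

```ruby
a = 'a b <x>c</x> d <x>e</x> f g'

# Capture whatever the quantifier consumed between <x> and </x>.
greedy_inner = a[%r!<x>(.+)</x>!, 1]   # runs up to the last </x>
lazy_inner   = a[%r!<x>(.+?)</x>!, 1]  # stops at the first </x>

p greedy_inner  # prints "c</x> d <x>e"
p lazy_inner    # prints "c"
```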

Paul.

On 07/07/06, Fred VD <outrelouxhe@yahoo.fr> wrote:

@a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
<math>test2<\math>"
@b = @a.gsub(/<math>.+<\math>/,'ok')

I get "blabla1 blabla1 ok" instead of "blabla1 blabla1 ok blabla2
blabla2 ok"

Hi --

fr fred:
# When I try this :
#
# @a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
# <math>test2<\math>"
# @b = @a.gsub(/<math>.+<\math>/,'ok')
#
# I get "blabla1 blabla1 ok" instead of "blabla1 blabla1 ok blabla2
# blabla2 ok"
#
# Could you please help me to tune my regexp so it works the
# way I want ?

your code works fine here,

irb(main):001:0> @a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
irb(main):002:0" <math>test2<\math>"
=> "blabla1 blabla1 <math>test1<math> blabla2 blabla2 \n<math>test2<math>"
irb(main):003:0> @b = @a.gsub(/<math>.+<\math>/,'ok')
=> "blabla1 blabla1 ok blabla2 blabla2 \nok"

(ignore the newline; i just copiednpaste fr your email...)

Interesting -- it actually works only *because* of the newline :slight_smile:
With the newline, the .+ starts again on the second line (since .
doesn't match a newline). Without it, the greed of .+ consumes
everything up to the second <\math>.

So if it's really on two lines, the non-greedy .+? isn't needed. But
it's probably a good idea anyway, as it will work in a larger number
of cases.
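Here is a sketch of the "larger number of cases" point. It uses illustrative one-line and two-line versions of Fred's string, with real backslashes:

```ruby
greedy = /<math>.+<\\math>/
lazy   = /<math>.+?<\\math>/

two_lines = "blabla1 <math>test1<\\math> blabla2\n<math>test2<\\math>"
one_line  = "blabla1 <math>test1<\\math> blabla2 <math>test2<\\math>"

# Without /m, . never matches "\n", so greedy .+ only gives the intended
# result when no line holds more than one <math>...<\math> pair.
greedy_two = two_lines.gsub(greedy, 'ok')
greedy_one = one_line.gsub(greedy, 'ok')

# Non-greedy .+? gives the intended result in both layouts.
lazy_two = two_lines.gsub(lazy, 'ok')
lazy_one = one_line.gsub(lazy, 'ok')

p greedy_two  # prints "blabla1 ok blabla2\nok"
p greedy_one  # prints "blabla1 ok"
p lazy_two    # prints "blabla1 ok blabla2\nok"
p lazy_one    # prints "blabla1 ok blabla2 ok"
```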

David

On Fri, 7 Jul 2006, Peña, Botp wrote:

I thought this might warrant some explanation in light of the later posts about greedy operators.

The newline from the paste is what makes this behave differently from the general case. As already mentioned, .+ is greedy, but here it can't be greedy across a newline, because . doesn't match a newline unless you specify a multiline regex (the /m flag in Ruby). So the newline is what made the regex appear correct, when it wouldn't be in the general case.

# with newline
a = "blabla1 blabla1 <math>test1<math> blabla2 blabla2 \n<math>test2<math>"
a.gsub(/<math>.+<\math>/,'ok')
=> "blabla1 blabla1 ok blabla2 blabla2 \nok"

# sans newline
b = "blabla1 blabla1 <math>test1<math> blabla2 blabla2 <math>test2<math>"
b.gsub(/<math>.+<\math>/,'ok')
=> "blabla1 blabla1 ok"
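For completeness, here is a sketch of what the multiline flag does to the same kind of input. In Ruby, /m is the flag that lets . match a newline. The sample string is illustrative:

```ruby
a = "blabla1 <math>test1<\\math> blabla2\n<math>test2<\\math>"

# With /m, . also matches "\n", so greedy .+ crosses the line break
# and swallows everything up to the last <\math>.
greedy_m = a.gsub(/<math>.+<\\math>/m, 'ok')

# The non-greedy version still stops at the first <\math> each time.
lazy_m = a.gsub(/<math>.+?<\\math>/m, 'ok')

p greedy_m  # prints "blabla1 ok"
p lazy_m    # prints "blabla1 ok blabla2\nok"
```

In other words, the newline only "fixes" the greedy pattern because Ruby's default is to stop . at line ends.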

matthew smillie.

On Jul 7, 2006, at 11:52, Peña, Botp wrote:

fr fred:
# When I try this :
#
# @a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
# <math>test2<\math>"
# @b = @a.gsub(/<math>.+<\math>/,'ok')
#
# I get "blabla1 blabla1 ok" instead of "blabla1 blabla1 ok blabla2
# blabla2 ok"
#
# Could you please help me to tune my regexp so it works the
# way I want ?

your code works fine here,

irb(main):001:0> @a = "blabla1 blabla1 <math>test1<\math> blabla2 blabla2
irb(main):002:0" <math>test2<\math>"
=> "blabla1 blabla1 <math>test1<math> blabla2 blabla2 \n<math>test2<math>"
irb(main):003:0> @b = @a.gsub(/<math>.+<\math>/,'ok')
=> "blabla1 blabla1 ok blabla2 blabla2 \nok"

(ignore the newline; i just copiednpaste fr your email...)

fr David:
# Interesting -- it actually works only *because* of the newline :slight_smile:
# With the newline, the .+ starts again on the second line (since .
# doesn't match a newline). Without it, the greed of .+ consumes
# everything up to the second <\math>.

yes indeed. you're a great teacher.

# So if it's really on two lines, the non-greedy .+ isn't needed. But
# it's probably a good idea anyway, as it will work in a larger number
# of cases.

yes, why does it have to be greedy by default... or is that life? :slight_smile:

kind regards -botp

Yes! The "+?" was what I needed!
Thanks to all for your help.

Fred.

That's how regexps are defined: on their own, "*" and "+" are always
greedy and back up only if they have to. Just get used to it. :slight_smile:
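A tiny illustration of "back up only if they have to", using a made-up string:

```ruby
# a+ first grabs all three a's, then gives one back so the final "a"
# in the pattern can still match; it backs up only as far as it must.
greedy = "aaa"[/a+a/]

# a+? takes as few a's as possible (one), then the final "a" matches.
lazy = "aaa"[/a+?a/]

p greedy  # prints "aaa"
p lazy    # prints "aa"
```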

Kind regards

robert

2006/7/7, Peña, Botp <botp@delmonte-phil.com>:

yes, why does it have to be greedy by default... or is that life? :slight_smile:

--
Have a look: Robert K. | Flickr

Well, it has to be one or the other by default, and in my experience the
choice has been to make it greedy. If you want a deep dive on regexes, look
at _Mastering Regular Expressions_ by Jeffrey E. F. Friedl.

On 7/7/06, Peña, Botp <botp@delmonte-phil.com> wrote:

yes, why does it have to be greedy by default... or is that life? :slight_smile: