Short regexp question

Fritzek · 18 September 2008 14:47

Hi folks

Short question on how to use regexps the right way.
Given is a="a[b]c"
I need b="a c"

tried to split using (/(\[|\])/) to get ["a", "[", "b", "]", "c"]

b=a.strip.split(/(\[|\])/)

and then joined the bits together. This only works in a simple case
like "a[b]c", but "[b]" could occur multiple times.

I need something like search for any "[b]" and substitute with a
blank.

Thanks in advance

Fritzek

David_A_Black1 · 18 September 2008 14:54

Hi --

I need something like search for any "[b]" and substitute with a
blank.

b = a.delete("[b]")

David

--
Rails training from David A. Black and Ruby Power and Light:
   Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
   Advancing with Rails January 19-22 Fort Lauderdale, FL *
   * Co-taught with Patrick Ewing!
See http://www.rubypal.com for details and updates!
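A note on String#delete for later readers: its argument is a set of characters, not a substring or a pattern, so "[b]" removes every "[", "b" and "]" individually and leaves no blank behind. A small sketch (the second sample string is made up for illustration):

```ruby
a = "a[b]c"

# delete treats "[b]" as the character set {"[", "b", "]"}
p a.delete("[b]")           # => "ac"

# any other "b" in the string is removed as well
p "abba[b]c".delete("[b]")  # => "aac"
```

So this only gets close to the requested "a c" when the bracketed text is known and its characters do not appear elsewhere in the string.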

Brian_Candler · 18 September 2008 15:06

I need something like search for any "[b]" and substitute with a
blank.

b = a.gsub(/\[b\]/,' ')

--
Posted via http://www.ruby-forum.com/.

Sebastian_Hungereck1 · 18 September 2008 15:31

Fritzek wrote:

short question how to use regexp the right way
given is a="a[b]c"
I need b="a c"

"a[b]c".gsub(/\[.*?\]/, " ")

HTH,
Sebastian
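For the "could occur multiple times" part of the question, here is a minimal sketch of the same gsub on a longer string (the sample string is invented for illustration). It also shows why the "?" matters: without it, ".*" is greedy and swallows everything from the first "[" to the last "]".

```ruby
a = "a[b]c[de]f[]g"

# lazy quantifier: each bracketed group is replaced on its own
b = a.gsub(/\[.*?\]/, " ")
p b                        # => "a c f g"

# greedy quantifier: one match from the first "[" to the last "]"
p a.gsub(/\[.*\]/, " ")    # => "a g"
```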

--
Jabber: sepp2k@jabber.org
ICQ: 205544826

Fritzek · 18 September 2008 15:02

Hi David

Thanks for the quick answer. Your code only works if you know "b". I only
know the surrounding brackets "[" and "]"; the bit in between could be
anything. Sorry, I forgot to mention that.

Fritzek


Brian_Candler · 18 September 2008 15:08

b = a.gsub(/\[b\]/,' ')

Also possibly useful for you:

a = "aaa[b]bbb[b]ccc"
bits = a.split(/\[b\]/)
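For reference, this is what the split produces, plus a variant that does not need to know what sits between the brackets. The second sample string and the variable names `other` and `generic` are assumptions made for this sketch.

```ruby
a = "aaa[b]bbb[b]ccc"
bits = a.split(/\[b\]/)
p bits             # => ["aaa", "bbb", "ccc"]
p bits.join(" ")   # => "aaa bbb ccc"

# same idea when the bracket contents are unknown
other = "aaa[x]bbb[yy]ccc"
generic = other.split(/\[[^\]]*\]/)
p generic          # => ["aaa", "bbb", "ccc"]
```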

Fritzek · 18 September 2008 15:42

Hi Sebastian

Thanks for the solution. Works perfectly.

Fritzek

Fritzek · 18 September 2008 15:17

Hi Brian

Thanks for your answer. As I told David, I only know the
surrounding brackets, not the bits between them.

Fritzek

Robert_K1 · 18 September 2008 20:07

Not sure whether it makes a difference performance-wise, but I am always reluctant to use reluctant quantifiers. I'd rather do

irb(main):003:0> "a[b]c".gsub /\[[^\]]*\]/, ' '
=> "a c"

Kind regards

robert
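A small side-by-side sketch of the two patterns on a string with several bracketed groups (the sample string and the constant names NEGATED and LAZY are made up here). On input like this they give the same result; the difference is in how the engine gets there.

```ruby
NEGATED = /\[[^\]]*\]/   # "[", then anything that is not "]", then "]"
LAZY    = /\[.*?\]/      # "[", then as little as possible, then "]"

s = "x[1]y[22]z"

p s.gsub(NEGATED, " ")   # => "x y z"
p s.gsub(LAZY, " ")      # => "x y z"
```

Writing the call with parentheses, as above, also sidesteps the "ambiguous first argument" warning Ruby can emit for `gsub /.../, ' '`.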

Fritzek · 19 September 2008 13:07

Hi Robert

Thanks for the objection, but could you briefly explain the
difference (for regexp dummies like me)?

Fritzek

Robert_K1 · 19 September 2008 13:23

Ideally you read "Mastering Regular Expressions" which explains such
topics very nicely.

I believe it is generally better to be more specific about what is to
match (mainly for robustness reasons). Also, with the reluctant
quantifier, for every character in the input either a match against the
next sub pattern needs to be tested OR there needs to be backtracking to
find out whether there is a shorter match afterwards. Neither seems
very efficient. Granted, this is no hard evidence, but if you are
curious I suggest you do some benchmarks and read the book; it's
really good!

Kind regards

robert
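One concrete case where being specific changes the result, not only the speed, is bracketed text that spans a line break. This case is not raised in the thread; it is offered only as an illustration, with a made-up sample string. In Ruby, "." does not match a newline unless the /m flag is given, while a negated character class does.

```ruby
text = "a[multi\nline]c"

# "." stops at the newline, so the lazy pattern finds no match at all
p text.gsub(/\[.*?\]/, " ")     # => "a[multi\nline]c"

# the negated class accepts any character except "]", newline included
p text.gsub(/\[[^\]]*\]/, " ")  # => "a c"

# the lazy pattern needs /m to behave the same way
p text.gsub(/\[.*?\]/m, " ")    # => "a c"
```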

--
use.inject do |as, often| as.you_can - without end

Fritzek · 19 September 2008 13:42

Hi Robert

Thanks for the explanation and the book hint. I will look for it.
Fritzek

Tod_Beardsley · 19 September 2008 15:41

I wrote a quickie benchmark. CPU speed and compile options will
certainly influence your results.

http://snippets.dzone.com/posts/show/6098

Also, best intro to regular expressions ever:

http://www.regular-expressions.info/tutorial.html

--
todb@planb-security.net | ICQ: 335082155 | Note: Due to Google's
privacy policy <http://tinyurl.com/5xbtl> and the United States'
policy on electronic surveillance <http://tinyurl.com/muuyl>,
please do not IM/e-mail me anything you wish to remain secret.

Robert_K1 · 20 September 2008 11:40

I wrote a quickie benchmark. CPU speed and compile options will
certainly influence your results.

http://snippets.dzone.com/posts/show/6098

Hm, it seems lines 13 and 18 are identical. Where's the lazy quantifier?

Here's what I'd consider a better benchmark, as it covers the
scenarios I was talking about, especially with situations where there
is a second potential end point ("b" in this case):

robert@fussel /cygdrive/c/Temp
$ cat l.rb
#!/bin/env ruby

require 'benchmark'

REP = 1_000
LONG = 1_000

STRINGS = [
  ["short match", "ab"],
  ["short mismatch", "a"],
  ["long match", "a" * LONG + "b"],
  ["long mismatch", "a" * LONG],
  ["short match double", "abab"],
  ["long match double", "a" * LONG + "bb"],
  ["long match double long", "a" * LONG + "b" + "a" * LONG + "b"],
]

Benchmark.bmbm(6 + STRINGS.inject(0) {|m,(a,b)| a.length > m ? a.length : m }) do |b|
  STRINGS.each do |label, str|
    rep = /long mis/ =~ label ? 100 : 100_000

    b.report "neg " + label do
      rep.times { /a[^b]*b/ =~ str }
    end

    b.report "lazy " + label do
      rep.times { /a.*?b/ =~ str }
    end
  end
end

robert@fussel /cygdrive/c/Temp
$ ./l.rb
Rehearsal ---------------------------------------------------------------
neg short match               0.282000   0.000000   0.282000 (  0.288000)
lazy short match              0.297000   0.000000   0.297000 (  0.284000)
neg short mismatch            0.328000   0.000000   0.328000 (  0.341000)
lazy short mismatch           0.375000   0.000000   0.375000 (  0.366000)
neg long match                9.531000   0.000000   9.531000 (  9.982000)
lazy long match              12.625000   0.000000  12.625000 ( 12.764000)
neg long mismatch             4.672000   0.000000   4.672000 (  4.742000)
lazy long mismatch            6.297000   0.000000   6.297000 (  6.422000)
neg short match double        0.297000   0.000000   0.297000 (  0.291000)
lazy short match double       0.281000   0.000000   0.281000 (  0.287000)
neg long match double         9.406000   0.000000   9.406000 (  9.443000)
lazy long match double       12.500000   0.000000  12.500000 ( 12.592000)
neg long match double long    9.516000   0.000000   9.516000 (  9.642000)
lazy long match double long  12.547000   0.000000  12.547000 ( 12.745000)
----------------------------------------------------- total: 78.954000sec

                                  user     system      total        real
neg short match               0.312000   0.000000   0.312000 (  0.305000)
lazy short match              0.297000   0.000000   0.297000 (  0.301000)
neg short mismatch            0.375000   0.000000   0.375000 (  0.388000)
lazy short mismatch           0.359000   0.000000   0.359000 (  0.356000)
neg long match                9.344000   0.000000   9.344000 (  9.637000)
lazy long match              12.547000   0.000000  12.547000 ( 12.777000)
neg long mismatch             4.703000   0.000000   4.703000 (  4.783000)
lazy long mismatch            6.219000   0.000000   6.219000 (  6.242000)
neg short match double        0.297000   0.000000   0.297000 (  0.301000)
lazy short match double       0.297000   0.000000   0.297000 (  0.297000)
neg long match double         9.453000   0.000000   9.453000 (  9.531000)
lazy long match double       12.718000   0.000000  12.718000 ( 13.566000)
neg long match double long    9.407000   0.000000   9.407000 (  9.442000)
lazy long match double long  12.500000   0.000000  12.500000 ( 12.777000)

robert@fussel /cygdrive/c/Temp

Notice how lazy is roughly 30-35% slower for the longer strings, while the short cases are practically indistinguishable.

Also, best intro to regular expressions ever:

http://www.regular-expressions.info/tutorial.html

Good ref!

Kind regards

robert

Tod_Beardsley · 22 September 2008 14:31

Grr, curse my copy-paste skills. Fixed. Thanks for paying attention,
Robert. Your benchmark is, of course, much more useful.

Tod_Beardsley · 22 September 2008 14:41

Anyway, I think the moral of this particular long-missing story is, if
you can regex test for smaller anchors first, you can then fail to
match much faster. IOW:

matched = false
if str.match(/b/)
  matched = true if str.match(/a[^b]*b/)
end
matched

Robert_K1 · 22 September 2008 15:10

I am not sure. This approach is likely slower than a single fast RX -
at least if you expect matches most of the time. It all depends...
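To check "it all depends" on your own machine, here is a rough sketch that times the two-step pre-check against the single regexp on a matching and a non-matching string. The helper names `two_step?` and `single?`, the sample sizes and the repeat count are all assumptions made for this sketch; only the two patterns come from the thread. No timings are shown because they vary by Ruby version and hardware.

```ruby
require 'benchmark'

# Hypothetical helpers; the patterns are the ones discussed above.
def two_step?(str)
  # cheap test for the anchor first, full pattern only if it is present
  !!(str =~ /b/ && str =~ /a[^b]*b/)
end

def single?(str)
  !!(str =~ /a[^b]*b/)
end

LONG = 500   # assumed length, kept small so the sketch runs quickly
N    = 50    # assumed repeat count

SAMPLES = {
  "long match"    => "a" * LONG + "b",
  "long mismatch" => "a" * LONG,
}

SAMPLES.each do |label, str|
  t_two    = Benchmark.realtime { N.times { two_step?(str) } }
  t_single = Benchmark.realtime { N.times { single?(str) } }
  printf("%-14s two-step %.5fs  single %.5fs\n", label, t_two, t_single)
end
```

Raising LONG and N gives steadier numbers at the cost of a longer run.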

Kind regards

robert

Ezra_Zygmuntowicz · 22 September 2008 22:12

Also keep in mind that =~ is generally a lot faster than .match since match has to build the full MatchData object even if you do not use it.

Cheers-
-Ezra
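For readers unsure what the two calls hand back, a minimal sketch (the variable names are arbitrary): =~ returns the index of the match or nil, while match returns a MatchData object or nil.

```ruby
str = "abc"

idx = (/b/ =~ str)
p idx              # => 1   (position of the match; nil if there is none)

m = /b/.match(str)
p m.class          # => MatchData
p m[0]             # => "b"
p m.pre_match      # => "a"

p(/x/ =~ str)      # => nil
p(/x/.match(str))  # => nil
```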

Brian_Candler · 23 September 2008 07:44

Also keep in mind that =~ is generally a lot faster than .match since
match has to build the full MatchData object even if you do not use it.

With =~ the MatchData can still be obtained from $~

Interestingly, not referencing the MatchData *does* give a big speed
improvement.

$ time ruby -e '5_000_000.times { /b/.match("abc") }'

real 0m28.699s
user 0m28.490s
sys 0m0.024s

$ time ruby -e '5_000_000.times { /b/ =~ "abc"; $~ }'

real 0m28.119s
user 0m27.910s
sys 0m0.024s

$ time ruby -e '5_000_000.times { /b/ =~ "abc" }'

real 0m14.311s
user 0m14.285s
sys 0m0.008s

$ ruby -v
ruby 1.8.6 (2008-03-03 patchlevel 114) [i686-linux]
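The same comparison can be run in one script with the Benchmark module instead of three separate `time` invocations. This is only a sketch: the repeat count is an assumption and is much smaller than above. The last line uses Regexp#match?, which was added in Ruby 2.4 and so was not available on the 1.8.6 used here; it returns true or false and does not set $~. No results are shown, since they depend heavily on the Ruby version.

```ruby
require 'benchmark'

N   = 200_000   # assumed repeat count
STR = "abc"

Benchmark.bm(12) do |x|
  x.report(".match")     { N.times { /b/.match(STR) } }
  x.report("=~ then $~") { N.times { /b/ =~ STR; $~ } }
  x.report("=~")         { N.times { /b/ =~ STR } }
  # Ruby 2.4+ only: boolean result, no MatchData, $~ left untouched
  x.report("match?")     { N.times { /b/.match?(STR) } }
end
```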
