My regexp stupidity needs assistance before I lose all my hair!

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I'm stupid and suck at it. It shouldn't have to be this
hair-pulling! Anyway... Can someone please give me the regular expression to
match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This [b]is[b.] a test.
[Hello.]
EOS

Much obliged,
T.

trans. (T. Onoma) wrote:

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't care if it's because I'm stupid and suck at it. It shouldn't have to be this hair-pulling! Anyway... Can someone please give me the regular expression to match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This [b]is[b.] a test.
[Hello.]
EOS

The trick here is to make sure you are non-greedy.

s =~ /\[([^\]]*)\]/

Zach
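For anyone following along, here is a minimal sketch of the negated-class pattern above run against the sample string. The variable name `first` is only illustrative, and String#[] with a capture index is used in place of $1.

```ruby
s = <<-EOS
[Hello]
This [b]is[b.] a test.
[Hello.]
EOS

# \[ and \] match literal brackets; [^\]]* matches any run of characters
# that are not "]", so the match cannot run past the first closing bracket.
first = s[/\[([^\]]*)\]/, 1]   # the second argument selects capture group 1

puts first   # prints Hello
```

The same value is available as $1 right after `s =~ /\[([^\]]*)\]/`.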

I think that this is what you need: /\[[\w]+\]/

This little application might help you (not sure if it is 100% Ruby
compatible, but may be a start) called TestRexp, which you can get
here: http://regexpstudio.com/RegExpStudio.html

hth,
Douglas

trans. (T. Onoma) wrote:

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't care if it's because I'm stupid and suck at it. It shouldn't have to be this hair-pulling! Anyway... Can someone please give me the regular expression to match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This [b]is[b.] a test.
[Hello.]
EOS

s =~ /\[([^\]]*)\]/
puts $1

···

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I'm stupid and suck at it.

Given your other posts in this forum I cannot believe that you are stupid.

So here are some meta-hints on how to "suck less" at Regexes...

Always use the %r{}x form of regexes.

This neatly avoids the leaning toothpick syndrome when\/matching\/paths.

The x modifier allows you to use white space and even comments within the regex to make it readable. (Larry Wall of Perl fame regrets he didn't make it the default...)

My .emacs has a key-binding that will produce "=~ %r{ }x" and leave the cursor in the middle.
(global-set-key [(control %)]
                 `(lambda ()
                    (interactive)
                    (insert "=~ %r{ }x")
                    (backward-char 4)
                    ))
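To make the %r{}x advice concrete, here is a small sketch. The path and the names `path_re`, `name` and `two_words` are made up for illustration; only the %r{}x syntax itself comes from the advice above.

```ruby
# Extended mode: whitespace and "#" comments inside the pattern are ignored.
path_re = %r{
  \A /usr/ (?:local/)? bin/   # fixed prefix, slashes need no backslashes
  ( [\w.-]+ )                 # capture the program name
  \z
}x

m = path_re.match("/usr/local/bin/ruby")
name = m && m[1]
puts name   # prints ruby

# Under /x a literal space has to be escaped (or written as [ ]),
# otherwise it is silently dropped from the pattern.
two_words = %r{ \A \w+ \  \w+ \z }x
puts two_words.match?("hello world")   # prints true
```

The escaped-space rule is the main thing to watch for when converting an existing pattern to /x.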

Pull the development of the regex outside the development of your app. Unit tests are good for that, or you can just write a wee small script, or do it on the command line or in irb.
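A wee small script of that kind might look like the sketch below. It is only one way to do it: the lambda `first_bracket` and the table `cases` are hypothetical names, and the pattern is the bracket-matching one from earlier in this thread.

```ruby
# Hypothetical helper wrapping the pattern under development.
first_bracket = ->(text) { text[/\[([^\]]*)\]/, 1] }

# Input => expected contents of the first bracket pair (nil means no match).
cases = {
  "[Hello]\nThis [b]is[b.] a test.\n" => "Hello",
  "This is a [ fix ]"                 => " fix ",
  "no brackets here"                  => nil,
  "empty [] pair"                     => "",
}

# Keep only the cases where the regex disagrees with the expectation.
failures = cases.reject { |input, expected| first_bracket.call(input) == expected }

puts failures.empty? ? "all cases pass" : "failing: #{failures.keys.inspect}"
```

Adding a line to the table every time the regex surprises you keeps old fixes from quietly breaking later.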

If you are doing it on the command line, beware of nasty interactions between the string and quoting conventions of the shell and Ruby.

(Speaking Unix now...)
e.g. ruby -e "blah" is A Very Bad Idea. The shell will peek inside the "blah" and do things that you really, definitely don't want happening in a regex. Solution: use single quotes; bash never looks inside them. Downside: you must _never_ use single quotes in the Ruby fragment blah, but you can use double quotes.

   ruby -e 'blah'

Grow the regex slowly. Start with the smallest thing, make it match.

If you immediately write down a large regex, odds on it will match nothing.

Sheer murderous frustration lies that way.

Start small, or strip away stuff on the right-hand side of the regex until you match something. Then slowly start adding it back.
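As a rough illustration of growing a regex, here is the bracket pattern from this thread built up in four steps. The intermediate steps are my own guess at a sensible order, not something taken from the thread.

```ruby
s = "[Hello]\nThis [b]is[b.] a test.\n[Hello.]\n"

steps = [
  /\[/,              # 1. just the opening bracket
  /\[[^\]]*/,        # 2. plus everything up to (not including) a "]"
  /\[[^\]]*\]/,      # 3. plus the closing bracket
  /\[([^\]]*)\]/,    # 4. finally, capture the contents
]

# s[regex] returns the text of the first match, or nil if nothing matched.
matched = steps.map { |re| s[re] }
matched.each { |text| p text }
# prints "[", "[Hello", "[Hello]" and "[Hello]", one per line

contents = s[steps.last, 1]
puts contents   # prints Hello
```

If a step suddenly prints nil, the piece just added is the culprit.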

File.read(fileName) is cute. It allows you to pull the whole file in at once as one string and then you can match across lines.
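A small sketch of matching across lines after File.read. The temporary file and its contents are invented for the example; the point is only that "." stops at a newline unless /m is given, while a negated class does not.

```ruby
require "tempfile"

spanning = dot_default = dot_multiline = nil

Tempfile.create("brackets") do |f|
  f.write("intro [spans\ntwo lines] outro\n")
  f.flush

  text = File.read(f.path)                  # whole file as one String

  spanning      = text[/\[([^\]]*)\]/, 1]   # negated class also matches "\n"
  dot_default   = text[/\[(.*?)\]/, 1]      # "." does not match "\n" by default
  dot_multiline = text[/\[(.*?)\]/m, 1]     # /m lets "." match "\n" too
end

p spanning        # prints "spans\ntwo lines"
p dot_default     # prints nil
p dot_multiline   # prints "spans\ntwo lines"
```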

Be aware that since standards are such good things, everyone has their own. I.e., POSIX (grep) regexes are different from Emacs regexes, which are different from Ruby regexes. grep even provides two different regex languages! Ruby and Perl regexes are very similar.

It shouldn't have to be this hair pulling!

It isn't. Really. Do what I suggest and you will slowly find regexes are really a very fun and powerful way of doing things.

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

"The notes I handle no better than many pianists. But the pauses
  between the notes -
  ah, that is where the art resides!" - Artur Schnabel

···

On Tue, 18 Jan 2005, trans. (T. Onoma) wrote:

Zach Dennis wrote:

trans. (T. Onoma) wrote:

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't care if it's because I'm stupid and suck at it. It shouldn't have to be this hair-pulling! Anyway... Can someone please give me the regular expression to match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This [b]is[b.] a test.
[Hello.]
EOS

The trick here is to make sure you are non-greedy.

s =~ /\[([^\]]*)\]/

Almost forgot, $1 is the match you are looking for.

···

Zach

Douglas Livingstone wrote:

I think that this is what you need: /\[[\w]+\]/

Ah, this is nicer and shorter than mine... I think I will use this one too. =)

Zach

Thanks. I _see_ now why mine wasn't working, though I don't _understand_ why
it wasn't working. I was using the / /x extension, because I generally like
to space out the parts of my regexps so they read easier, but for some reason
that causes the above to match [b] instead. Oh well, I just won't do that.

Thanks All for your responses!
T.

···

On Monday 17 January 2005 04:26 pm, Zach Dennis wrote:

trans. (T. Onoma) wrote:
> Let me be painfully honest: I hate parsing, especially w/ regexp, and I
> don't care if it's because I'm stupid and suck at it. It shouldn't have to
> be this hair-pulling! Anyway... Can someone please give me the regular
> expression to match the first square bracket's contents? In this case it
> would be "Hello".
>
> s = <<-EOS
> [Hello]
> This [b]is[b.] a test.
> [Hello.]
> EOS

The trick here is to make sure you are non-greedy.

s =~ /\[([^\]]*)\]/

Or:

s =~ /\[.*?\]/

which uses the ? non-greedy modifier to ensure that only the very next
"]" is matched. For example:

str = <<EOT
[this] [is a test]
here are[some]brackets
[brackets ]
no words
no brackets
EOT
    ==> "[this] [is a test]\nhere are[some]brackets\n[brackets ]\nno words\nno brackets\n"

str.each_line { |line| p line.scan(/\[.*?\]/) }
["[this]", "[is a test]"]
["[some]"]
["[brackets ]"]
[]
[]

cheers,
Mark

Ah this is my major problem. I tend to write whole chunks of code at once and
then go back and tweak to perfection. Not always the best way to go. And
regexp is a perfect example of when not to do this.

Thanks. That lesson will surely help a great deal.

T.

···

On Monday 17 January 2005 06:33 pm, John Carter wrote:

Grow the regex slowly. Start with the smallest thing, make it match.

If you immediately write down a large regex, odds on it will match
nothing.

Hi,

···

On Tuesday, 18 Jan 2005, 06:26:35 +0900, Douglas Livingstone wrote:

I think that this is what you need: /\[[\w]+\]/

What are the square brackets for? As far as I can see, /\[\w+\]/
works too.

Bertram

--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

Regex-coach is also quite helpful here.
There is a Windows and a Linux version.
<http://www.weitz.de/regex-coach/>

···

On Tue, 18 Jan 2005 06:26:35 +0900, Douglas Livingstone <rampant@gmail.com> wrote:

I think that this is what you need: /\[[\w]+\]/

This little application might help you (not sure if it is 100% Ruby
compatible, but may be a start) called TestRexp, which you can get
here: http://regexpstudio.com/RegExpStudio.html

--
Kristof

Oops scratch that. That's not the reason either (sigh). But I got it working
now anyway. Thanks.

T.

And I was thinking "ooh, Zach's looks like a better way to do it" :)

Douglas

···

On Tue, 18 Jan 2005 06:28:37 +0900, Zach Dennis <zdennis@mktec.com> wrote:

Ah, this is nicer and shorter than mine... I think I will use this one
too. =)

Thanks. I _see_ now why mine wasn't working, though I don't
_understand_ why it wasn't working. I was using the / /x extension,
because I generally like to space out the parts of my regexps so
they read easier, but for some reason that causes the above to
match [b] instead. Oh well, I just won't do that.

It has to do with the pattern matching being greedy, not the /x flag.
Your pattern will match a '[', then as many characters as possible,
including ']', until a final closing ']'.
There are two solutions:
1. As shown, match any non-']' character.
2. Make the match non-greedy: %r{ \[(.+?)\] }x

HTH,
Assaph
P.S. If you want all occurrences in the string, use String#scan instead
of String#match.
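A short sketch of the difference, using the middle line of the sample text. The greedy pattern here is my guess at the shape of the problem, since the original failing regex was not posted in full; variable names are illustrative.

```ruby
line = "This [b]is[b.] a test."

greedy     = line[/\[(.+)\]/, 1]      # "b]is[b."  runs on to the last "]"
non_greedy = line[/\[(.+?)\]/, 1]     # "b"        stops at the first "]"
negated    = line[/\[([^\]]+)\]/, 1]  # "b"        cannot cross a "]" at all

s = "[Hello]\nThis [b]is[b.] a test.\n[Hello.]\n"

# String#match stops at the first hit; String#scan collects every hit.
first = s.match(/\[(.+?)\]/)[1]        # "Hello"
all   = s.scan(/\[(.+?)\]/).flatten    # ["Hello", "b", "b.", "Hello."]

p greedy, non_greedy, negated, first, all
```

scan returns one array per match when the pattern has a group, hence the flatten.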

Bertram Scharpf wrote:

Hi,

I think that this is what you need: /\[[\w]+\]/

What are the square brackets for? As far as I can see, /\[\w+\]/
works too.

In a regular expression, square brackets represent a character class. A character class matches one character that is any of the characters listed in the class. Say you are looking for the words "fix" or "fox" in a sentence.

You could write:

/f(i|o)x/

or you could write:

/f[io]x/

You can also negate a character class and match anything that is NOT in it. You do this by starting the character class with a caret (^).

Say you wanted to find anything f-x, but not "fox"

/f[^o]x/

This will find "fix", "fex", "fux", "fgx", etc., but not "fox".
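The three patterns above can be checked quickly against a word list. The list and the \A and \z anchors are additions for the demonstration, so that each word has to match as a whole.

```ruby
words = %w[fix fox fex fgx fx]

either  = words.grep(/\Af[io]x\z/)    # class: "i" or "o" in the middle
grouped = words.grep(/\Af(i|o)x\z/)   # alternation: same result
not_o   = words.grep(/\Af[^o]x\z/)    # negated class: any one char except "o"

p either    # prints ["fix", "fox"]
p grouped   # prints ["fix", "fox"]
p not_o     # prints ["fix", "fex", "fgx"]
```

Note that "fx" never matches: even a negated class still has to consume exactly one character.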

In the regular expression: /\[[\w]+\]/

\[ = you are looking for a literal left square bracket
[\w]+ = you are looking for a character class with any word character
   one or more times
\] = you are looking for a closing right square bracket

This will find the "fix" in the sentence "This is a [fix]", but this regular expression will fail on "This is a [ fix ]", because the spaces before the "f" and after the "x" are not word characters. A better regular expression is (sorry Doug, I'm taking it back, I like mine better now):

/\[([^\]]*)\]/

which will match anything inside of square brackets. This will match:

"This is a [fix]" $1 will equal "fix"
"This is a [ fix ]" $1 will equal " fix "
"This is a [ *sentence inside of a fix* ]" $1 will equal " *sentence inside of a fix* "
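Here is a side-by-side sketch of the two patterns on those three sentences. A capture group has been added to the \w+ version so both can be read the same way; that group is my addition, not part of the pattern as originally posted.

```ruby
samples = [
  "This is a [fix]",
  "This is a [ fix ]",
  "This is a [ *sentence inside of a fix* ]",
]

# \w+ only accepts letters, digits and underscore between the brackets.
word_only = samples.map { |text| text[/\[(\w+)\]/, 1] }

# [^\]]* accepts anything that is not a closing bracket.
anything = samples.map { |text| text[/\[([^\]]*)\]/, 1] }

p word_only   # prints ["fix", nil, nil]
p anything    # prints ["fix", " fix ", " *sentence inside of a fix* "]
```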

I hope this was helpful.

Zach

···

On Tuesday, 18 Jan 2005, 06:26:35 +0900, Douglas Livingstone wrote:

Bertram Scharpf wrote:

Hi,

I think that this is what you need: /\[[\w]+\]/

What are the square brackets for? As far as I can see, /\[\w+\]/
works too.

Almost forgot to hit up your question...

/\[[\w]+\]/

and

/\[\w+\]/

are basically the same since \w covers a whole character class of word characters.

Zach

···

On Tuesday, 18 Jan 2005, 06:26:35 +0900, Douglas Livingstone wrote:

Thanks Assaph,

I had an escape character match in the regexp:

  / [^`] \[(.+?)\] /x

That was messing it up (don't really know why), but I just "zeroed" it:

  / (?=[^`]) \[(.+?)\] /x

And that did the trick.

Just one of those things where you just overlook what you think you know to
the point of seizure ;)

T.
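For what it's worth, here is one reading of why the first pattern picked up [b], sketched in Ruby. This is an interpretation, not something confirmed in the thread, and the negative lookbehind at the end is an assumption about what "not preceded by a backtick" was meant to do (it needs Ruby 1.9 or later).

```ruby
s = "[Hello]\nThis [b]is[b.] a test.\n[Hello.]\n"

# [^`] must consume one real character before the "[", so a bracket at the
# very start of the string cannot match; the first hit is " [b]".
consuming = s[/ [^`] \[(.+?)\] /x, 1]

# (?=[^`]) is a zero-width lookahead: it peeks at the next character (the
# "[" itself) and consumes nothing, so "[Hello]" matches again.
lookahead = s[/ (?=[^`]) \[(.+?)\] /x, 1]

# The lookahead never inspects the character before the bracket, so a
# backtick-escaped bracket still matches.
escaped = "`[skip] [keep]"
still_matches = escaped[/ (?=[^`]) \[(.+?)\] /x, 1]

# Assumption: a negative lookbehind expresses "no backtick just before".
lookbehind_re = / (?<!`) \[(.+?)\] /x
skipped  = escaped[lookbehind_re, 1]
at_start = s[lookbehind_re, 1]

p consuming, lookahead, still_matches, skipped, at_start
# prints "b", "Hello", "skip", "keep", "Hello" (one per line)
```

So the lookahead version works on the sample only because it no longer checks anything about the preceding character.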

Shortcuts like \w define character classes, so the brackets are not needed, as the other poster hinted at. ;)

  \w+ and [\w]+ are identical

You can put them in classes if you want, mainly to add to them:

  [\w']+ match word and ' characters
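A quick sketch of that point applied to the bracket problem. The sample sentence is invented; the patterns are the ones discussed above with a capture group added.

```ruby
text = "He said [don't] and [stop]"

bare    = text.scan(/\[(\w+)\]/).flatten       # \w+ on its own
wrapped = text.scan(/\[([\w]+)\]/).flatten     # same thing inside a class
widened = text.scan(/\[([\w']+)\]/).flatten    # class extended with an apostrophe

p bare      # prints ["stop"]
p wrapped   # prints ["stop"]
p widened   # prints ["don't", "stop"]
```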

Hope that helps.

James Edward Gray II

···

On Jan 18, 2005, at 11:29 AM, Zach Dennis wrote:

[\w]+ = you are looking for a character class with any word character
   one or more times

Hmm... I wonder if he wants to match that much, what about this input text:

"I'm using square brackets in normal text [heh, this is [b]fun![b.]]"

What do you want to match? "[heh, this is [b]" or "[b]"?

Playing on your one, perhaps this is what is needed: /\[([^\[\]\s]*)\]/
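To see what each pattern actually picks out of that input, a small sketch (variable names are illustrative):

```ruby
text = "I'm using square brackets in normal text [heh, this is [b]fun![b.]]"

# Anything up to the first "]": starts at the outer "[" and swallows "[b".
loose = text[/\[([^\]]*)\]/, 1]

# No brackets or whitespace allowed inside: only tag-like tokens qualify.
strict = text[/\[([^\[\]\s]*)\]/, 1]
tags   = text.scan(/\[([^\[\]\s]*)\]/).flatten

p loose    # prints "heh, this is [b"
p strict   # prints "b"
p tags     # prints ["b", "b."]
```

Neither pattern handles true nesting; the stricter one simply refuses anything that does not look like a single tag.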

Douglas

···

On Wed, 19 Jan 2005 02:29:37 +0900, Zach Dennis <zdennis@mktec.com> wrote:

which will match anything inside of square brackets. This will match:

"This is a [fix]" $1 will equal "fix"
"This is a [ fix ]" $1 will equal " fix "
"This is a [ *sentence inside of a fix* ]" $1 will equal " *sentence inside of a fix* "