Regexp character classes and +

(Mark Volkmann) #1

On page 71 of Pickaxe 2, discussing character classes, it says "The
significance of special regular expression characters (including +) is
turned off inside the brackets."

Why is this a valid regular expression (+ not the first character in
the character class)

%r{^[-+*/]$}

but this isn't? (+ is the first character in the character class)

%r{^[+-*/]$}

--
R. Mark Volkmann
Partner, Object Computing, Inc.

(Joel VanderWerf) #2

Mark Volkmann wrote:

On page 71 of Pickaxe 2, discussing character classes, it says "The
significance of special regular expression characters (including +) is
turned off inside the brackets."

Why is this a valid regular expression (+ not the first character in
the character class)

%r{^[-+*/]$}

but this isn't? (+ is the first character in the character class)

%r{^[+-*/]$}

The - character denotes a range in a regexp.

irb(main):001:0> "abc" =~ /[a-c]{3,3}/
=> 0

Putting - at the beginning of the bracket expression makes it literal.
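
A quick sketch of the two cases, assuming a Ruby recent enough to have Regexp#match? (2.4 or later). The variable names are only illustrative and are not from the original posts.

```ruby
# Hyphen between two characters: a range. Hyphen first: a literal.
range_class   = /\A[a-c]+\z/      # one or more of a, b, c
literal_first = %r{\A[-+*/]\z}    # exactly one of - + * /

range_hit  = range_class.match?("abc")   # every character lies in a..c
range_miss = range_class.match?("a-c")   # "-" itself is not in the class
hyphen_hit = literal_first.match?("-")   # leading "-" is taken literally
plus_hit   = literal_first.match?("+")   # "+" is an ordinary member here

puts [range_hit, range_miss, hyphen_hit, plus_hit].inspect
# prints [true, false, true, true]
```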

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

(Jim Weirich) #3

Mark Volkmann said:

Why is this a valid regular expression (+ not the first character in
the character class)

%r{^[-+*/]$}

but this isn't? (+ is the first character in the character class)

%r{^[+-*/]$}

It's not the '+' that is getting you ... it's the '-'.

Inside brackets, the '-' means a range (e.g. 'A-Z' for the characters A
through Z). But "+-*" asks for the range from "+" (43) through "*" (42),
which is invalid because the start of the range comes after its end.

Inside brackets, the '-' loses its special meaning when it is the first
(or last) character.
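
This can be checked roughly as follows. It is a minimal sketch, and the helper compiles? is a hypothetical name introduced here for illustration. It builds each pattern from a string with Regexp.new so that the invalid one raises RegexpError at run time instead of stopping the script at parse time.

```ruby
# "+" sorts after "*", so "+-*" describes a backwards range.
plus_code = "+".ord   # 43
star_code = "*".ord   # 42

# Hypothetical helper: true if the pattern source compiles, false otherwise.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

hyphen_in_middle = compiles?("^[+-*/]$")   # range from "+" down to "*": rejected
hyphen_first     = compiles?("^[-+*/]$")   # leading "-" is literal
hyphen_last      = compiles?("^[+*/-]$")   # trailing "-" is literal

puts [plus_code, star_code].inspect
puts [hyphen_in_middle, hyphen_first, hyphen_last].inspect
```

Moving the '-' to either end of the class is enough to make the pattern compile. The other characters do not need to change.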

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

(Nikolai Weibull) #4

Joel VanderWerf wrote:

The - character denotes a range in a regexp.

Putting - at the beginning of the bracket expression makes it literal.

There's always the all-encompassing (?) backslash escape: /[\-]/,
        nikolai
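
A small sketch of the escaped form applied to the class from the original question, again assuming Regexp#match? is available. It covers only the hyphen and makes no claim about how far the escape is "all-encompassing". The variable names are illustrative.

```ruby
# Escaping the hyphen keeps it literal wherever it sits in the class.
escaped = %r{\A[+\-*/]\z}

all_ops_match = %w[+ - * /].all? { |op| escaped.match?(op) }
comma_escaped = escaped.match?(",")

# For contrast: unescaped, "+-/" is a valid range (43 through 47),
# which also takes in "," (44) and "." (46).
unescaped_range = %r{\A[+-/]\z}
comma_in_range  = unescaped_range.match?(",")

puts [all_ops_match, comma_escaped, comma_in_range].inspect
# prints [true, false, true]
```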

--
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}