Why is this regex invalid?

irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
SyntaxError: compile error
(irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
         from (irb):2

irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
=> /([0-9]*)([^\+ -]+)(.*)/

The only thing that is different between the two regexps is the placement of the space in the square brackets, yet the first regexp is invalid and the second is valid.

Why is this?

Thanks,
Dan

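When an unescaped - appears in any position but the first or final one inside brackets, it is interpreted as a range separator rather than a literal '-'. Here '\+- ' is read as a range from '+' (ASCII 43) to space (ASCII 32), which isn't a valid range because its end point comes before its start point.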

-mental

···

On Thu, 7 Dec 2006 07:38:10 +0900, Daniel Finnie <danfinnie@optonline.net> wrote:

irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
SyntaxError: compile error
(irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
         from (irb):2

irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
=> /([0-9]*)([^\+ -]+)(.*)/

The only thing that it different in the 2 regexps is the placement of
the space in the square brackets, yet the first regexp is invalid and
the 2nd valid.

Why is this?

It's the placement of the - that makes a difference. In a character
class, - between two characters denotes a range. So the first
character class includes a range, from + to <space>, which is invalid
because + comes after space in the relevant character encoding.

If you want a literal hyphen in a character class, it's safest to make
it the last character.
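
To make the point about hyphen placement concrete, here is a minimal sketch. The helper name compiles? and the constant names are made up for illustration; the patterns are built with Regexp.new from strings so that a bad one raises RegexpError at run time instead of stopping the parser.

```ruby
# Hypothetical helper: true if the pattern source compiles, false otherwise.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Single-quoted strings keep the backslashes exactly as written.
REVERSED_RANGE = '[^\+- ]'   # '-' sits between '+' and ' ', so it is a range
HYPHEN_LAST    = '[^\+ -]'   # '-' is the final character, so it is literal
HYPHEN_FIRST   = '[^-+ ]'    # '-' comes right after '^', so it is literal
HYPHEN_ESCAPED = '[^\+\- ]'  # '\-' is a literal hyphen in any position

# The range runs from code point 43 down to 32, which is why it is rejected.
puts "'+' is #{'+'.ord}, ' ' is #{' '.ord}"

[REVERSED_RANGE, HYPHEN_LAST, HYPHEN_FIRST, HYPHEN_ESCAPED].each do |source|
  puts "#{source.inspect} compiles? #{compiles?(source)}"
end
```

Only the first pattern should be rejected; the other three describe the same set of characters.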

-A

···

On 12/6/06, Daniel Finnie <danfinnie@optonline.net> wrote:

irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
SyntaxError: compile error
(irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
         from (irb):2

irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
=> /([0-9]*)([^\+ -]+)(.*)/

The only thing that it different in the 2 regexps is the placement of
the space in the square brackets, yet the first regexp is invalid and
the 2nd valid.

Why is this?

Thanks,
Dan

I'm pretty sure the dash needs to be escaped in regular expressions. The second one works since it is the last character in the character class, and hence isn't defining a range.

/([0-9]*)([^\+\- ]+)(.*)/ should work - note the escaped dash.

-Chris Schneider

···

On Dec 6, 2006, at 3:38 PM, Daniel Finnie wrote:

irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
SyntaxError: compile error
(irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
        from (irb):2

irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
=> /([0-9]*)([^\+ -]+)(.*)/

The only thing that it different in the 2 regexps is the placement of the space in the square brackets, yet the first regexp is invalid and the 2nd valid.

Why is this?

Thanks,
Dan

The hyphen in the middle of the character class is ambiguous, because it
could be either a range separator or a literal.
One way to fix it is to rearrange the order so that the hyphen comes first:
/([0-9]*)([^-+ ]+)(.*)/
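
As a rough check that moving the hyphen does not change what the class means, the sketch below compares the three spellings suggested in this thread over printable ASCII. The names VARIANTS and rejected_chars are invented for the example.

```ruby
# Three ways of writing "anything except '+', '-' or space".
VARIANTS = {
  hyphen_first:   /[^-+ ]/,
  hyphen_last:    /[^\+ -]/,
  hyphen_escaped: /[^\+\- ]/
}.freeze

# Hypothetical helper: the printable ASCII characters a negated class refuses.
def rejected_chars(regex)
  (32..126).map(&:chr).reject { |c| regex =~ c }
end

VARIANTS.each do |name, regex|
  puts "#{name}: #{rejected_chars(regex).inspect}"
end
```

Each variant should refuse exactly the space, '+' and '-'.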

···

On 12/6/06, Daniel Finnie <danfinnie@optonline.net> wrote:

irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
SyntaxError: compile error
(irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
         from (irb):2

irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
=> /([0-9]*)([^\+ -]+)(.*)/

The only thing that it different in the 2 regexps is the placement of
the space in the square brackets, yet the first regexp is invalid and
the 2nd valid.

Daniel Finnie wrote:

irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
SyntaxError: compile error
(irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
        from (irb):2

irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
=> /([0-9]*)([^\+ -]+)(.*)/

The only thing that it different in the 2 regexps is the placement of the space in the square brackets, yet the first regexp is invalid and the 2nd valid.

Why is this?

Thanks,
Dan

Hi,

'-' should be escaped like this.
regex = /([0-9]*)([^\+\- ]+)(.*)/

Jun

Christopher Schneider <cschneid@colostate.edu> writes:

I'm pretty sure the dash needs to be escaped in regular expressions.
The second one works since it is the last character in the character
class, and hence isn't defining a range.

/([0-9]*)([^\+\- ]+)(.*)/ should work - note the escaped dash.

Rubbish. In character ranges, \ is not special.

···

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum

> /([0-9]*)([^\+\- ]+)(.*)/ should work - note the escaped dash.

Rubbish. In character ranges, \ is not special.

A rubbish upon your rubbish. \ is special pretty much anywhere.
Otherwise, how would you match all chars except a ']'?

irb(main):007:0> /([0-9]*)([^\+\- ]+)(.*)/.match('44dads')[0..-1]
=> ["44dads", "44", "dads", ""]
irb(main):008:0> /([0-9]*)([^\+\- ]+)(.*)/.match('44-dads')[0..-1]
=> ["44-dads", "4", "4", "-dads"]
irb(main):009:0> /([0-9]*)([^\+- ]+)(.*)/.match('44-dads')[0..-1]
SyntaxError: compile error
(irb):9: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
        from (irb):9
irb(main):010:0> /([0-9]*)([^\+\- ]+)(.*)/.match('44\\dads')[0..-1]
=> ["44\\dads", "44", "\\dads", ""]

It most certainly is special:
irb(main):014:0> /[a\+\- ]/=~".\\"
=> nil
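
The same point as a small script, offered as a sketch rather than a full account of escaping rules. The constant names are made up for the example. If \ were an ordinary character inside brackets, the first class would match a backslash and the second could not be written at all.

```ruby
# Inside brackets, \+ and \- are an escaped plus and an escaped hyphen.
CLASS_WITH_ESCAPES = /[a\+\- ]/

# The string below is a dot followed by one backslash; neither is in the class.
p CLASS_WITH_ESCAPES =~ ".\\"   # nil, the backslash itself is not a member
p CLASS_WITH_ESCAPES =~ "x-y"   # 1, the hyphen is a member

# \] lets a class mention ']' without closing the brackets.
NOT_CLOSE_BRACKET = /[^\]]+/
p "ab]cd"[NOT_CLOSE_BRACKET]    # "ab"
```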

···

On Thu, 07 Dec 2006 01:24:30 +0100, David Kastrup wrote:

Christopher Schneider <cschneid@colostate.edu> writes:

I'm pretty sure the dash needs to be escaped in regular expressions.
The second one works since it is the last character in the character
class, and hence isn't defining a range.

/([0-9]*)([^\+\- ]+)(.*)/ should work - note the escaped dash.

Rubbish. In character ranges, \ is not special.

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/