Regex bug?

irb(main):008:0> "[" =~ /[A-z]/
=> 0
irb(main):009:0> RUBY_VERSION
=> "2.0.0"

The grouping [A-z] should not match the '[' character, but it does. Is
this known? I found no mention of it looking through the bug tracker but
may have missed something.

Gavin

Woah this is one funny joke material to impress my friends =)

Don't think I'd call it a joke, just something that's slightly counter-intuitive (and a common rookie mistake -- saw this elsewhere just last week). To quote Paul Simon, "The answer is easy if you take it logically." The set [A-z] does *NOT* mean "All upper and lowercase letters." It means "All the ASCII characters from 'A' through 'z'," and that range includes the six punctuation characters that sit between 'Z' and 'a'. A simple "man ascii" (at least, in Linux) shows us this (see the far-right column):

[...]
        030   24    18    CAN (cancel)              130   88    58    X
        031   25    19    EM  (end of medium)       131   89    59    Y
        032   26    1A    SUB (substitute)          132   90    5A    Z
        033   27    1B    ESC (escape)              133   91    5B    [
        034   28    1C    FS  (file separator)      134   92    5C    \  '\\'
        035   29    1D    GS  (group separator)     135   93    5D    ]
        036   30    1E    RS  (record separator)    136   94    5E    ^
        037   31    1F    US  (unit separator)      137   95    5F    _
        040   32    20    SPACE                     140   96    60    `
        041   33    21    !                         141   97    61    a
        042   34    22    "                         142   98    62    b
        043   35    23    #                         143   99    63    c
[...]

Which is why I always, always, always use [A-Za-z] when matching for alpha. You can also use the POSIX class [:alpha:] (written inside a bracket expression, as [[:alpha:]]), but I'm just used to the A-Za-z bit.

-Ken
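
To see exactly which extra characters [A-z] lets in, a minimal Ruby sketch along these lines should do. It assumes plain ASCII input, and the variable names `range_chars` and `extras` are only illustrative:

```ruby
# Every character whose code point lies from 'A' (65) through 'z' (122).
range_chars = ('A'.ord..'z'.ord).map(&:chr)

# Keep only the ones that are not ASCII letters.
extras = range_chars.reject { |c| c =~ /[A-Za-z]/ }

p extras            # the six punctuation characters between 'Z' and 'a'
p extras.map(&:ord) # their code points, 91 through 96
```

Each character in `extras` matches /[A-z]/ but not /[A-Za-z]/, which is the whole difference between the two classes.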

···

On 2015-01-19 23:17, Rick Daniel wrote:

Thank you,

RICK DANIEL
+62 857 6077 8775

http://araishikeiwai.com [2]

On Tue, Jan 20, 2015 at 10:24 AM, Matthew Kerwin <matthew@kerwin.net.au> wrote:

irb(main):001:0> 'A'.ord
=> 65
irb(main):002:0> 'z'.ord
=> 122
irb(main):003:0> '['.ord
=> 91

So '[' is definitely between 'A' and 'z'

On 20 January 2015 at 13:16, Gavin Sinclair <gsinclair@gmail.com> wrote:

irb(main):008:0> "[" =~ /[A-z]/
=> 0
irb(main):009:0> RUBY_VERSION
=> "2.0.0"

The grouping [A-z] should not match the '[' character, but it
does. Is this known? I found no mention of it looking through
the bug tracker but may have missed something.

Gavin

--

Matthew Kerwin
http://matthew.kerwin.net.au/ [1]

Links:
------
[1] http://matthew.kerwin.net.au/
[2] http://araishikeiwai.com/

Also, as a side note: it's usually beneficial to use [[:alpha:]] instead of ASCII ranges, unless you really mean just those ASCII letters, since [[:alpha:]] still works internationally:

  'æ' =~ /[[:alpha:]]/ #=> 0
  'æ' =~ /[a-z]/ #=> nil (obviously)
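
For whole-string checks, a small sketch like the following may help. It assumes UTF-8 strings (the default for Ruby source since 2.0), and `letters_only?` is a hypothetical helper name, not something from the standard library:

```ruby
# Hypothetical helper: true when the string consists only of letters,
# including non-ASCII ones, according to the POSIX bracket class.
def letters_only?(str)
  !!(str =~ /\A[[:alpha:]]+\z/)
end

p letters_only?("æble")   # non-ASCII letter plus ASCII letters
p letters_only?("a[b")    # '[' is not a letter
p "æble" =~ /\A[a-z]+\z/  # the ASCII-only range rejects 'æ'
```

The anchors \A and \z make the pattern cover the whole string rather than any single matching character.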

···

On Tue, 20 Jan 2015 00:29:28 -0500 Ken D'Ambrosio <ken@jots.org> wrote:

[...]

Woah this is one funny joke material to impress my friends =)

Thank you,
*Rick Daniel*
+62 857 6077 8775
http://araishikeiwai.com


irb(main):001:0> 'A'.ord
=> 65
irb(main):002:0> 'z'.ord
=> 122
irb(main):003:0> '['.ord
=> 91

So '[' is definitely between 'A' and 'z'
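
The same comparison can be written as a range check on code points. This is only a sketch for ASCII input; `in_range?` is a made-up helper that mimics what a range such as A-z tests inside a character class, not a description of how the regex engine is implemented:

```ruby
# Made-up helper: is char's code point within first..last (inclusive)?
def in_range?(char, first, last)
  (first.ord..last.ord).cover?(char.ord)
end

p in_range?('[', 'A', 'z')  # true, since 65 <= 91 <= 122
p in_range?('[', 'A', 'Z')  # false, since 91 > 90

# For every ASCII character, compare the helper with the regex /[A-z]/.
agrees = (0..127).all? { |n| in_range?(n.chr, 'A', 'z') == !!(n.chr =~ /[A-z]/) }
p agrees
```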


--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

No need to be jerkish. Do you not remember starting out, and stumbling over
things that later seemed obvious? In fact, do you not still stumble over
them sometimes?

···

On 20 January 2015 at 14:17, Rick Daniel <rick@araishikeiwai.com> wrote:

Woah this is one funny joke material to impress my friends =)

--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Matthew: while I cannot be sure of Rick's intent, given e-mail's infamous lack of inflection, I always assume good faith unless it's demonstrably proven otherwise. I think he was just surprised that the regex match didn't work the way one might intuitively suppose it ought, and found that surprising and perhaps funny.

At least, that's my take.

$.02,

-Ken

···

On 2015-01-20 00:20, Matthew Kerwin wrote:

On 20 January 2015 at 14:17, Rick Daniel <rick@araishikeiwai.com> wrote:

Woah this is one funny joke material to impress my friends =)

No need to be jerkish. Do you not remember starting out, and stumbling
over things that later seemed obvious? In fact, do you not still
stumble over them sometimes?

The "joke" is not about Gavin's misunderstanding, but about how easily people can be
fooled by the fact that "[" =~ /[A-z]/ does not return nil.
Sorry if I worded it badly.

Thank you,
*Rick Daniel*
+62 857 6077 8775
http://araishikeiwai.com

···

On Tue, Jan 20, 2015 at 12:34 PM, Ken D'Ambrosio <ken@jots.org> wrote:

[...]

Something probably got lost in transmission.

···

On 20 January 2015 at 15:37, Rick Daniel <rick@araishikeiwai.com> wrote:

The "joke" is not about Gavin's misunderstanding, but about how easily people can be
fooled by the fact that "[" =~ /[A-z]/ does not return nil.
Sorry if I worded it badly.

--
  Matthew Kerwin
  http://matthew.kerwin.net.au/