Specification of Ruby regex?

Hi all,

I was just wondering… Is there any place where Ruby’s Regex
capabilities are described?

E.g. it seems that /\w{3}/ matches at least three consecutive characters,
but I do not seem to be able to locate any exact documentation on this.

Any ideas?

Ronald.

Pickaxe is your best friend (ideally, the hardcopy). Some information may be
found here:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_stdtypes.html

Scroll down to the “Regular Expressions” section.

Gennady.

···

----- Original Message -----
From: “Ronald Pijnacker” rhp@dse.nl
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Cc: “Ronald Pijnacker” rhp@dse.nl
Sent: Monday, August 25, 2003 7:03 AM
Subject: Specification of Ruby regex?

Hi all,

I was just wondering… Is there any place where Ruby’s Regex
capabilities are described?

E.g. it seems that /\w{3}/ matches at least three consecutive characters,
but I do not seem to be able to locate any exact documentation on this.

Any ideas?

Ronald.

There is certainly a lot of information there, but I have the feeling
that there are things not discussed. My example “r {m}” is not mentioned
as such, but works anyway.

A better example would have been /o . I have seen it being used, but it
is not documented. If it does what I’ve been told it does, it is good to
know about.

Another is /(?: …)/ . It seems to work, but it is not documented either.
Apparently the Pickaxe book is not exhaustive.

As I am currently reading “Mastering Regular Expressions”, I started
wondering what exactly is or is not supported by Ruby.

Ronald.

Ruby’s regular expressions are almost identical to Perl’s. The Mastering
Regular Expressions book has a table comparing various languages’
support for regular expressions, including Ruby’s. I don’t think Ruby’s
regular expressions have changed since. The Ruby syntax is the same as
other standard implementations, so modifiers like /o at the end of a
regex, and (?: …) for grouping without capturing within a regex, are
supported. Jeffrey Friedl notes in his book the more advanced features
that are not generally supported (by Ruby or other languages).

Regards,

Mark
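
For anyone who wants to check the (?: …) behaviour directly, here is a minimal sketch, assuming a current Ruby. The variable names are only illustrative. It shows that (?: …) groups without capturing, and that options can be switched on for part of a pattern with the (?i: …) form.

```ruby
# A plain group captures; a (?: ...) group only groups.
capturing     = /(\d+)-(\d+)/.match("10-20").captures     # => ["10", "20"]
non_capturing = /(?:\d+)-(\d+)/.match("10-20").captures   # => ["20"]

# Options can be enabled for a sub-pattern only, here case-insensitivity.
inline_hit  = ("RUBY rocks" =~ /(?i:ruby) rocks/)   # => 0
inline_miss = ("RUBY ROCKS" =~ /(?i:ruby) rocks/)   # => nil, " rocks" is still case-sensitive
```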

···

On Tuesday, August 26, 2003, at 03:15 AM, Ronald Pijnacker wrote:

Hi all,

I was just wondering… Is there any place where Ruby’s Regex
capabilities are described?

[snip]

In the paper version of the Pickaxe r{m} is described on
page 61. Extensions such as (?:…) on pp. 209-211.

My online version of the Pickaxe (from ruby-doc.org) documents both as
well. For the extensions, click The Ruby Language in the TOC and scroll
down to the Extensions section. Repetition (r{m}) is documented in
Standard Types.

···

On Tue, 26 Aug 2003 16:15:38 +0900, Ronald Pijnacker wrote:

There is certainly a lot of information there, but I have the feeling
that there are things not discussed. My example “r {m}” is not mentioned
as such, but works anyway.

It IS mentioned there, as well as answers to ALL of your other questions
(see also Programming Ruby: The Pragmatic Programmer's Guide ,
scroll down to “Regular Expression Options” and “Regular Expression
Patterns”).

Here’s a cut-and-paste subsection of “Regular Expressions” section at
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_stdtypes.html

Repetition

When we specified the pattern that split the song list line, /\s*\|\s*/, we
said we wanted to match a vertical bar surrounded by an arbitrary amount of
whitespace. We now know that the \s sequences match a single whitespace
character, so it seems likely that the asterisks somehow mean ``an arbitrary
amount.'' In fact, the asterisk is one of a number of modifiers that allow
you to match multiple occurrences of a pattern.

If r stands for the immediately preceding regular expression within a
pattern, then:

  r *  matches zero or more occurrences of r.
  r +  matches one or more occurrences of r.
  r ?  matches zero or one occurrence of r.
  r {m,n}  matches at least ``m'' and at most ``n'' occurrences of r.
  r {m,}  matches at least ``m'' occurrences of r.

These repetition constructs have a high precedence: they bind only to the
immediately preceding regular expression in the pattern. /ab+/ matches an
``a'' followed by one or more ``b''s, not a sequence of ``ab''s. You have to
be careful with the * construct too: the pattern /a*/ will match any string;
every string has zero or more ``a''s.
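
The precedence and repetition rules quoted above can be tried out with a short sketch like the following. It assumes a current Ruby and uses String#[] with a regex to pull out the matched text. The variable names are illustrative only.

```ruby
# /ab+/ repeats only the "b"; a group is needed to repeat "ab" as a unit.
single_b_run = "abbb"[/ab+/]        # => "abbb"
ab_no_group  = "ababab"[/ab+/]      # => "ab"
ab_as_unit   = "ababab"[/(ab)+/]    # => "ababab"

# {m}, {m,} and {m,n} applied to the same input.
exactly_three = "abcdef"[/.{3}/]    # => "abc"
three_or_more = "abcdef"[/.{3,}/]   # => "abcdef"
three_to_four = "abcdef"[/.{3,4}/]  # => "abcd"

# /a*/ succeeds even when there is no "a": it matches the empty string at index 0.
empty_match_index = ("xyz" =~ /a*/)  # => 0
empty_match_text  = "xyz"[/a*/]      # => ""
```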

A cut-and-paste from
Programming Ruby: The Pragmatic Programmer's Guide :

Regular Expression Options

A regular expression may include one or more options that modify the way the
pattern matches strings. If you’re using literals to create the Regexp
object, then the options comprise one or more characters placed immediately
after the terminator. If you’re using Regexp.new, the options are constants
used as the second parameter of the constructor.

  i  Case Insensitive. The pattern match will ignore the case of letters
     in the pattern and string. Matches are also case-insensitive if the
     global variable $= is set.
  o  Substitute Once. Any #{…} substitutions in a particular regular
     expression literal will be performed just once, the first time it is
     evaluated. Otherwise, the substitutions will be performed every time
     the literal generates a Regexp object.
  m  Multiline Mode. Normally, ``.'' matches any character except a
     newline. With the /m option, ``.'' matches any character.
  x  Extended Mode. Complex regular expressions can be difficult to
     read. The `x' option allows you to insert spaces, newlines, and
     comments in the pattern to make it more readable.
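
To see the four options in action, here is a rough sketch, assuming a current Ruby. The helper starts_with_first_word? and all variable names are hypothetical and only there for illustration. The $= variable mentioned above is left out because it is not available in current Ruby versions.

```ruby
# /i: letter case is ignored.
ignore_case = ("RUBY" =~ /ruby/i)          # => 0

# /m: "." also matches a newline.
dot_without_m = "a\nb"[/a.b/]              # => nil
dot_with_m    = "a\nb"[/a.b/m]             # => "a\nb"

# /x: whitespace and comments inside the pattern are ignored.
date = /
  (\d{4})   - # year
  (\d{1,2}) - # month
  (\d{1,2})   # day
/x
date_parts = date.match("2002-4-6").captures   # => ["2002", "4", "6"]

# /o: the #{...} interpolation is evaluated only the first time the literal runs.
def starts_with_first_word?(word, text)   # hypothetical helper
  !(text =~ /\A#{word}/o).nil?
end
first_call  = starts_with_first_word?("cat", "catalog")  # => true
second_call = starts_with_first_word?("dog", "dogma")    # => false, the pattern is still /\Acat/

# Regexp.new takes the options as constants in the second argument.
from_constructor = Regexp.new("ruby", Regexp::IGNORECASE | Regexp::MULTILINE)
```

The /o case is the one most likely to surprise: the second call silently reuses the pattern built from "cat".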

···

----- Original Message -----
From: “Ronald Pijnacker” rhp@dse.nl
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Cc: “Ronald Pijnacker” rhp@dse.nl
Sent: Tuesday, August 26, 2003 12:15 AM
Subject: Re: Specification of Ruby regex?

Hello!

Tim Hunter wrote:

In the paper version of the Pickaxe r{m} is described on
page 61. Extensions such as (?:…) on pp. 209-211.

My online version of the Pickaxe (from ruby-doc.org) documents both as
well. For the extensions, click The Ruby Language in the TOC and scroll
down to the Extensions section. Repetition (r{m}) is documented in
Standard Types.

btw, since there is a thread about that, i wanted to ask:
does ruby support named matches (sorry i don’t know the proper terminology)?
C# does it like this:
"(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"

matches “2002-4-6”
and then in my match groups i have “year”, “month”, “day”.

(looked in pickaxe + google ruby “regexp match group”)

emmanuel

···

On Tue, 26 Aug 2003 16:15:38 +0900, Ronald Pijnacker wrote:

While we are on the subject, and just out of curiosity: is there any
particular reason Ruby’s /m is Perl’s /s?

On the other hand, the interpreter does not complain about /s, so it looks
like an undocumented (AFAIK) option. If it is public, what does it
mean?

– fxn

···

On Tuesday 26 August 2003 18:12, Gennady wrote:

  m  Multiline Mode. Normally, ``.'' matches any character except
     a newline. With the /m option, ``.'' matches any character.

Except where they are different. The biggest glaring difference is that
^ and $ do not mean “match start of string” and “match end of string”

a.untaint if /^[a-z]+$/ =~ a # WRONG and maybe dangerous
a.untaint if /\A[a-z]+\z/ =~ a # right

Regards,

Brian.

···

On Tue, Aug 26, 2003 at 05:37:12PM +0900, Mark Wilson wrote:

I was just wondering… Is there any place where Ruby’s Regex
capabilities are described?

[snip]

Ruby’s regular expressions are almost identical to Perl’s.

Hi,

Thanks for all the feedback. Apparently I have to increase my search
capabilities in ProgrammingRuby, because there are things documented
that I could not find :-(

Ronald.

Ronald Pijnacker Building: QV-106
Medical Imaging IT (MIMIT) Phone: +31 40 27 62 524
Philips Medical Systems, Best Email: Ronald.Pijnacker@best.ms.philips.com

To be fair to Ronald, his specific example is not there:

    r {m}  matches exactly ``m'' occurrences of r.

puts $& if 'abcdef' =~ /.{3}/
puts $& if 'abcdef' =~ /.{3,}/
puts $& if 'abcdef' =~ /.{3,4}/

__END__
abc
abcdef
abcd

···

“Gennady” gfb@tonesoft.com wrote:

  r *  matches zero or more occurrences of r.
  r +  matches one or more occurrences of r.
  r ?  matches zero or one occurrence of r.
  r {m,n}  matches at least ``m'' and at most ``n'' occurrences of r.
  r {m,}  matches at least ``m'' occurrences of r.

I’m 99.99% sure it doesn’t.

Gavin

···

On Tuesday, August 26, 2003, 10:18:24 PM, Emmanuel wrote:

btw, since there is a thread about that, i wanted to ask:
does ruby support named matches (sorry i don’t know the proper terminology)?
C# does it like this:
"(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"

matches “2002-4-6”
and then in my match groups i have “year”, “month”, “day”.

(looked in pickaxe + google ruby “regexp match group”)

I started doing this (xregex: http://zem.novylen.net/ruby/index.html),
but stopped because Oniguruma now supports this feature. The code’s
usable as-is, though, and I’d be happy to add features if you want to
use it.

martin

···

On the other hand, the interpreter does not complain with /s so looks
like an undocumented (AFAIK) option. If it is public, what's its
meaning?

/s means that you want to work with SJIS codeset

Guy Decoux

[Posted and Cc’d]

btw, since there is a thread about that, i wanted to ask:
does ruby support named matches (sorry i don’t know the proper terminology)?

No, but I patched it a couple of years ago to do so.

http://frottage.org/rjp/ruby/revar.html

C# does it like this:
"(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"

matches “2002-4-6”
and then in my match groups i have “year”, “month”, “day”.

string = "text string for this"
string =~ /(?[foo]st.*?) /
puts "#{$foo}" # prints "string"

It’s nice, but not overly useful in most circumstances.

···

On Tue, 26 Aug 2003 21:18:24 +0900, Emmanuel Touzery emmanuel.touzery@wanadoo.fr wrote:

Brian Candler wrote:

···

On Tue, Aug 26, 2003 at 05:37:12PM +0900, Mark Wilson wrote:

I was just wondering… Is there any place where Ruby’s Regex
capabilities are described?

[snip]

Ruby’s regular expressions are almost identical to Perl’s.

Except where they are different. The biggest glaring difference is that
^ and $ do not mean “match start of string” and “match end of string”

a.untaint if /^[a-z]+$/ =~ a # WRONG and maybe dangerous
a.untaint if /\A[a-z]+\z/ =~ a # right

Regards,

Brian.

what do ^ and $ mean then? they do match start and end for me. what
else do they match? *shudders at thought of changing lots of code*

Michael
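
On the ^ and $ question: in Ruby they anchor to the start and end of each line, not of the whole string, which only shows up once the input contains a newline. A minimal sketch, assuming a current Ruby (the sample input and variable names are made up for illustration):

```ruby
# Hypothetical input: a harmless first line followed by something else.
suspicious = "abc\nrm -rf /"

line_anchored   = (suspicious =~ /^[a-z]+$/)     # => 0, the first line alone satisfies it
string_anchored = (suspicious =~ /\A[a-z]+\z/)   # => nil, the whole string must be lowercase letters
clean_input     = ("abc" =~ /\A[a-z]+\z/)        # => 0

# \Z still tolerates one trailing newline; \z does not.
upper_z = ("abc\n" =~ /\A[a-z]+\Z/)              # => 0
lower_z = ("abc\n" =~ /\A[a-z]+\z/)              # => nil
```

For single-line input both forms behave the same, so existing code only needs changing where the string being validated may contain newlines.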

Ronald Pijnacker wrote:

Hi,

Thanks for all the feedback. Apparently I have to increase my search
capabilities in ProgrammingRuby, because there are things documented
that I could not find :-(

There’s a reasonably good summary in chapter 1 of The Ruby Way.
Only a page or two as I recall, but it does have one or two items
not in the Pickaxe.

Hal

Good point - I’ve noted the update…

Cheers

Dave

···

On Wednesday, August 27, 2003, at 04:10 AM, Sabby and Tabby wrote:

To be fair to Ronald, his specific example is not there:

Gavin Sinclair wrote:

I’m 99.99% sure it doesn’t.

Gavin

Is this helpful at all?

year, month, day =
/(\d{4})-(\d{1,2})-(\d{1,2})/.match(s).captures

(where s is the string to be matched; captures leaves out the full match,
which to_a would put first)

The latest Oniguruma supports it. I’m not sure how to use/enable that, but
it does support it.

-austin
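
For readers on Ruby 1.9 or later, where the Oniguruma engine is built in, named groups need no extra setup. A minimal sketch using the year/month/day names from the question (the variable names are illustrative):

```ruby
date_pattern = /(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})/
match = date_pattern.match("2002-4-6")

year        = match[:year]     # => "2002" (symbol key)
month       = match["month"]   # => "4"    (string key works too)
day         = match[:day]      # => "6"
group_names = match.names      # => ["year", "month", "day"]
```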

···

On Tue, 26 Aug 2003 21:28:07 +0900, Gavin Sinclair wrote:

On Tuesday, August 26, 2003, 10:18:24 PM, Emmanuel wrote:

btw, since there is a thread about that, i wanted to ask: does ruby
support named matches (sorry i don’t know the proper terminology)? C#
does it like this: "(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"
matches “2002-4-6”
and then in my match groups i have “year”, “month”, “day”.
(looked in pickaxe + google ruby “regexp match group”)
I’m 99.99% sure it doesn’t.


austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2003.08.26
* 14.57.44