Hi all,
I was just wondering… Is there any place where Ruby’s Regex
capabilities are described?
E.g. it seems that /\w{3}/ matches at least three consecutive
characters,
but I do not seem to be able to locate any exact documentation on
this.
Any ideas?
Pickaxe is your best friend (ideally, the hardcopy). Some information
may be
found here:
Programming Ruby: The Pragmatic Programmer's Guide
Scroll down to “Regular Expressions” sections.
Gennady.
There is certainly a lot of information there, but I have the feeling
that there are things not discussed. My example “r {m}” is not mentioned
as such, but works anyway.
It IS mentioned there, as well as answers to ALL of your other questions
(see also Programming Ruby: The Pragmatic Programmer's Guide,
scroll down to “Regular Expression Options” and “Regular Expression
Patterns”).
Here’s a cut-and-paste of a subsection of the “Regular Expressions” section at
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_stdtypes.html
Repetition
When we specified the pattern that split the song list line, /\s*\|\s*/, we
said we wanted to match a vertical bar surrounded by an arbitrary amount of
whitespace. We now know that the \s sequences match a single whitespace
character, so it seems likely that the asterisks somehow mean “an arbitrary
amount.” In fact, the asterisk is one of a number of modifiers that allow
you to match multiple occurrences of a pattern.
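As a rough illustration of that split pattern (the song line below is made up for the example, not taken from the book):

```ruby
# Hypothetical song-list line; the values are made up for illustration.
line = "song.mp3  | 3:45 | Fats Waller   | Ain't Misbehavin'"

# \| is a literal vertical bar; each \s* absorbs any whitespace next to it.
fields = line.split(/\s*\|\s*/)
p fields
```

Note that the bar has to be escaped: an unescaped | is the alternation operator, not a literal character.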
If r stands for the immediately preceding regular expression within a
pattern, then:
r * matches zero or more occurrences of r.
r + matches one or more occurrences of r.
r ? matches zero or one occurrence of r.
r {m,n} matches at least “m” and at most “n” occurrences of r.
r {m,} matches at least “m” occurrences of r.
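A quick way to check these in irb. This is only a sketch: the test strings are arbitrary, and the \A and \z anchors are added here so that the repetition count alone decides the result. The last two lines try the r{m} form from the original question, which appears to mean exactly “m” occurrences.

```ruby
# \A and \z anchor each pattern to the whole string.
star     = "".match?(/\Aa*\z/)          # zero or more: matches the empty string
plus     = "".match?(/\Aa+\z/)          # one or more: needs at least one "a"
optional = "aa".match?(/\Aa?\z/)        # zero or one: two is too many
range    = "aaa".match?(/\Aa{2,4}\z/)   # at least 2 and at most 4
at_least = "aaaaa".match?(/\Aa{2,}\z/)  # at least 2, no upper bound

# r{m}: exactly m occurrences. Unanchored, it still succeeds on any string
# that contains three consecutive word characters somewhere.
exact = "abcd"[/\w{3}/]                 # the part of "abcd" that matched
short = "ab".match?(/\w{3}/)            # only two word characters available

p star, plus, optional, range, at_least, exact, short
```

That would explain why an unanchored /\w{3}/ looks like it matches “at least three” characters: the match itself is three characters long, but longer strings contain such a match too.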
These repetition constructs have a high precedence—they bind only to the
immediately preceding regular expression in the pattern. /ab+/ matches an
“a” followed by one or more “b”s, not a sequence of “ab”s. You have to be
careful with the * construct too: the pattern /a*/ will match any string;
every string has zero or more “a”s.
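The precedence point is easy to see in irb. A small sketch with arbitrary test strings; the parenthesized pattern is not from the excerpt, it is just the usual way to repeat a whole group:

```ruby
# + binds only to the "b", so /ab+/ is "a" followed by one or more "b".
run_of_b = "abbb"[/ab+/]       # whole string matches
first_ab = "ababab"[/ab+/]     # stops after the first "ab"

# Parentheses make the repetition apply to the whole group "ab".
grouped  = "ababab"[/(ab)+/]

# /a*/ succeeds even where there is no "a": it matches the empty string.
position = "xyz" =~ /a*/       # index where the (empty) match starts
matched  = "xyz"[/a*/]

p run_of_b, first_ab, grouped, position, matched
```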
A cut-and-paste from
Programming Ruby: The Pragmatic Programmer's Guide :
Regular Expression Options
A regular expression may include one or more options that modify the way the
pattern matches strings. If you’re using literals to create the Regexp
object, then the options comprise one or more characters placed immediately
after the terminator. If you’re using Regexp.new, the options are constants
used as the second parameter of the constructor.
i Case Insensitive. The pattern match will ignore the case of letters
in the pattern and string. Matches are also case-insensitive if the global
variable $= is set.
o Substitute Once. Any #{…} substitutions in a particular regular
expression literal will be performed just once, the first time it is
evaluated. Otherwise, the substitutions will be performed every time the
literal generates a Regexp object.
m Multiline Mode. Normally, “.” matches any character except a newline.
With the /m option, “.” matches any character.
x Extended Mode. Complex regular expressions can be difficult to
read. The /x option allows you to insert spaces, newlines, and comments in
the pattern to make it more readable.
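A short sketch of the options in use. The strings and the time pattern are invented for the example, and $= is left out because current Ruby versions no longer support it:

```ruby
# i: ignore case.
ignore_case = "RUBY".match?(/ruby/i)

# m: let "." match a newline as well.
dot_plain = "a\nb".match?(/a.b/)
dot_multi = "a\nb".match?(/a.b/m)

# x: whitespace and comments inside the pattern are ignored.
time = /
  (\d{1,2})   # hours
  :
  (\d{2})     # minutes
/x
clock = "3:45"[time]

# o: the #{...} substitution happens only the first time the literal is evaluated.
once  = (1..3).map { |i| "1".match?(/#{i}/o) }  # pattern stays /1/
fresh = (1..3).map { |i| "1".match?(/#{i}/) }   # pattern rebuilt on every pass

# With Regexp.new the options are constants, combined with |.
re = Regexp.new("ruby", Regexp::IGNORECASE | Regexp::MULTILINE)
built = re.match?("RUBY")

p ignore_case, dot_plain, dot_multi, clock, once, fresh, built
```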
----- Original Message -----
From: “Ronald Pijnacker” rhp@dse.nl
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Cc: “Ronald Pijnacker” rhp@dse.nl
Sent: Tuesday, August 26, 2003 12:15 AM
Subject: Re: Specification of Ruby regex?