From: Selvag Ruby <lists@ruby-forum.com>
To: ruby-talk@ruby-lang.org
Date: 01/27/2014 07:17 AM
Subject: Regexp in Ruby
Sent by: "ruby-talk" <ruby-talk-bounces@ruby-lang.org>
Ruby Regular Expression
1. How do I extract the words that are followed by a symbol such as
punctuation?
For example:
str='Now, I am Ruby programmer. here, how to list the word which is
exist with symbol like punctuation, comma, or
exclamation.'
The output should be:
=>['Now,','programmer.','here,','punctuation,','comma,','exclamation.']
2. From that result, remove the symbol from each word in the array. The
result should be:
=>['Now','programmer','here','punctuation','comma','exclamation']
I'll have a go at explaining, although I'm no expert on Regexp.
str.scan(/(\w+)[[:punct:]]/).flatten
"str" - Take the String object
".scan(" - Call the String method "scan", which returns every match of the
pattern in the string.
"/" - Opens the Regexp literal. A regular expression matches a pattern
using special characters.
"(\w+)" - The parentheses indicate a "capture" group. This means that
you can match a specific subsection of the pattern and extract it. The
"\w" means any character in a word, like a letter, number, or
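underscore. The "+" means one or more.
"[[:punct:]]" - means match one punctuation character. According to a quick
Googling this is any of: - [ ] \ ; ' , . / ! @ # % & * ( ) _ { } : " ?
(the POSIX definition also counts $ + < = > ^ ` | ~ as punctuation).
Because this is outside the capture group, it must be present for a match,
but it is left out of the captured result.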
"/" - Closes the Regexp.
")" - Closes the "scan" argument.
".flatten" - Turns the nested Array returned by using scan with captures
into a simple Array.
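A self-contained sketch of the one-liner above, using the sample string from the question (joined onto one line, which does not change the matches). It also shows what "scan" returns with and without the capture group: dropping the group gives the output asked for in part 1, and the "flatten" step is only needed because of the group. The variable names are just for illustration.

```ruby
# Sample string from the question, joined onto a single line.
str = 'Now, I am Ruby programmer. here, how to list the word which is exist with symbol like punctuation, comma, or exclamation.'

# Without a capture group, scan returns each whole match:
# the word plus the punctuation character that follows it (part 1).
with_punct = str.scan(/\w+[[:punct:]]/)

# With one capture group, scan returns one single-element Array per match,
# e.g. [["Now"], ["programmer"], ...]; flatten turns that into a flat Array (part 2).
nested = str.scan(/(\w+)[[:punct:]]/)
words  = nested.flatten

# Part 2 done literally: strip the trailing punctuation from the part 1 result.
stripped = with_punct.map { |w| w.sub(/[[:punct:]]\z/, '') }

p with_punct  # ["Now,", "programmer.", "here,", "punctuation,", "comma,", "exclamation."]
p words       # ["Now", "programmer", "here", "punctuation", "comma", "exclamation"]
```

As an alternative (not what the reply used), a lookahead such as /\w+(?=[[:punct:]])/ requires the punctuation without consuming or capturing it, so scan returns the bare words directly and no flatten is needed.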
On Jan 28, 2014, at 7:38 AM, Selvag Ruby <lists@ruby-forum.com> wrote:
I understood that explanation,
but specifically I wanted an explanation of [[:punct:]]. About
':punct:':
1. Why is it enclosed in double square brackets?
2. Why are there two colons? Is there a reason for them?
Look up "POSIX bracket expressions". I'm not 100% sure, but I think the
reason for the outer brackets is that something like [:punct:] is a
character class name, and therefore belongs inside a bracket expression,
just like the letters in [aeiou].
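A small sketch of how Ruby treats these pieces, following the "POSIX bracket expressions" pointer: the outer brackets are an ordinary character class, and the [: :] pair marks "punct" as a class name rather than a list of letters, which is what the colons are for. The sample strings and variable names are made up for illustration.

```ruby
# The outer [...] is an ordinary character class (bracket expression);
# [:punct:] is a POSIX class name, recognized only inside such brackets.
comma_hit = (',' =~ /[[:punct:]]/)   # 0: a comma is punctuation

# Without the colons, [punct] is just the set of letters p, u, n, c, t.
letter_hit = ('p' =~ /[punct]/)      # 0: "p" is one of those letters
comma_miss = (',' =~ /[punct]/)      # nil: a comma is not

# Because the class name lives inside a bracket expression, it can be
# combined with other class names or literal characters in one set.
marks = 'a, b!'.scan(/[[:punct:][:space:]]/)

p comma_hit, letter_hit, comma_miss
p marks   # [",", " ", "!"]
```

So the double brackets are not a special "double" syntax: [[:punct:][:space:]] and [[:alpha:]_] are equally valid, with the outer brackets holding one or more class names or characters.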