Regexp in Ruby

Ruby Regular Expression

1. How do I extract the words that are immediately followed by a symbol
such as punctuation?
For example:

str='Now, I am Ruby programmer. here, how to list the word which is
exist with symbol like punctuation, comma, or
exclamation.'

The output should be:
=>['Now,','programmer.','here,','punctuation,','comma,','exclamation.']

2. From that result, how do I remove the symbol from each word in the
array? The result should be:
=>['Now','programmer','here','punctuation','comma','exclamation']

--
Posted via http://www.ruby-forum.com/.

From: Selvag Ruby <lists@ruby-forum.com>
To: ruby-talk@ruby-lang.org
Date: 01/27/2014 07:17 AM
Subject: Regexp in Ruby
Sent by: "ruby-talk" <ruby-talk-bounces@ruby-lang.org>

This will work. There may be a cleaner way than that match in the collect.
Just change the punctuation list in the scan to match what you need.

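# Step 1: words immediately followed by one of the listed punctuation marks
arr = str.scan(/\w+[,.!]/)
# Step 2: replace each element with just its word part
arr.collect! { |element|
  /\w+/.match(element).to_s
}
p arr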

Michael
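
For anyone who wants to run the two steps end to end, here is a minimal sketch using the sample string from the question. The constant SAMPLE and the method names are made up for illustration, and String#[] with a Regexp is swapped in for match(...).to_s as one possibly cleaner option, not the only one.

```ruby
# Sample string from the original question (SAMPLE is an illustrative name).
SAMPLE = 'Now, I am Ruby programmer. here, how to list the word which is
exist with symbol like punctuation, comma, or
exclamation.'

# Step 1: collect each word that is immediately followed by , . or !
def words_with_punctuation(text)
  text.scan(/\w+[,.!]/)
end

# Step 2: keep only the leading word characters of each element.
# String#[] with a Regexp returns the first match as a String (or nil).
def strip_punctuation(words)
  words.map { |word| word[/\w+/] }
end

with_marks = words_with_punctuation(SAMPLE)
p with_marks                    # ["Now,", "programmer.", "here,", "punctuation,", "comma,", "exclamation."]
p strip_punctuation(with_marks) # ["Now", "programmer", "here", "punctuation", "comma", "exclamation"]
```

Using map instead of collect! leaves the first array untouched, which helps if both results are needed.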

You can avoid the collect by using a match group with scan:

str.scan(/(\w+)[[:punct:]]/).flatten

Hi,
1. /\w+[^\w\s]/
2. /\w+[\p{P}]/
Try rubular.com for any regex problem
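
The two patterns above are not quite interchangeable, which a small sketch can show. The input string and the method names below are invented for illustration; the point is only that [^\w\s] accepts any character that is neither a word character nor whitespace, while \p{P} accepts Unicode punctuation only, so a symbol such as "+" is treated differently.

```ruby
# Illustrative input, not taken from the thread.
SYMBOL_TEXT = 'Hi! a+b ok.'

# Pattern 1: a word followed by any non-word, non-space character.
def scan_non_word(text)
  text.scan(/\w+[^\w\s]/)
end

# Pattern 2: a word followed by a Unicode punctuation character.
def scan_unicode_punct(text)
  text.scan(/\w+[\p{P}]/)
end

p scan_non_word(SYMBOL_TEXT)      # ["Hi!", "a+", "ok."]
p scan_unicode_punct(SYMBOL_TEXT) # ["Hi!", "ok."]  ("+" is a math symbol, not punctuation)
```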

--
--------------------------------
Thanks & Regards
Asif Iquebal Sarkar
Bhubaneswar, Orissa

Thanks for all the replies. All of your code works well, and some of it
makes me want to learn more about Ruby. Could you explain this code so
that I can understand it?

I'll have a go at explaining, although I'm no expert on Regexp.

str.scan(/(\w+)[[:punct:]]/).flatten

"str" - Take the String object

".scan(" - Execute the method "scan" which is available to String.

"/" - Regexp shorthand. A Regular Expression matches a pattern using
special characters.

"(\w+)" - The parentheses indicate a "capture" group. This means that
you can match a specific subsection of the pattern and extract it. The
"\w" means any character in a word, like a letter, number, or
underscore. The "+" means 1 or more.

"[[:punct:]]" - means match punctuation. According to a quick Googling
this is "-[]\;',./!@#%&*()_{}::"?". Because this is outside the capture
group, it is excluded from the result.

"/" - Closes the Regexp.

")" - Closes the "scan" argument.

".flatten" - Turns the nested Array returned by using scan with captures
into a simple Array.
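
To see why flatten is needed, it may help to look at what scan returns with and without capture groups. This is only a sketch on a shortened version of the sample sentence; the constant PHRASE is an illustrative name.

```ruby
# Shortened sample sentence (PHRASE is an illustrative name).
PHRASE = 'Now, I am Ruby programmer.'

# No capture group: scan returns the whole match for each hit.
p PHRASE.scan(/\w+[[:punct:]]/)           # ["Now,", "programmer."]

# One capture group: scan returns one inner Array per hit.
p PHRASE.scan(/(\w+)[[:punct:]]/)         # [["Now"], ["programmer"]]

# Two capture groups: each inner Array holds both captures.
p PHRASE.scan(/(\w+)([[:punct:]])/)       # [["Now", ","], ["programmer", "."]]

# flatten collapses the nested Arrays into a simple Array.
p PHRASE.scan(/(\w+)[[:punct:]]/).flatten # ["Now", "programmer"]
```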

____________________________
str.scan(/\w+(?=[[:punct:]])/)

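The difference with this one is that "(?=" starts a lookahead: the
punctuation must follow the word, but it is not part of the match, so it
is excluded from the output without a capture group or flatten.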

According to this:
http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm

(?= re) Specifies position using a pattern. Doesn't have a range.
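
A short sketch of what "specifies position" means in practice: the lookahead checks the next character without consuming it. The constant LINE is an illustrative name, and the input is the tail of the sample sentence.

```ruby
# Tail of the sample sentence (LINE is an illustrative name).
LINE = 'comma, or exclamation.'

# The lookahead requires punctuation after the word but leaves it unmatched.
m = LINE.match(/\w+(?=[[:punct:]])/)
p m[0]         # "comma"
p m.post_match # ", or exclamation."  (the comma was not consumed)

# Both forms from this thread give the same words for this input.
p LINE.scan(/\w+(?=[[:punct:]])/)       # ["comma", "exclamation"]
p LINE.scan(/(\w+)[[:punct:]]/).flatten # ["comma", "exclamation"]
```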

I understood those parts,
but I was specifically hoping for an explanation of [[:punct:]]. In
':punct:':
1. It is enclosed in double square brackets. Why?
2. There are two colons. Is there a reason for them?

Look up "POSIX bracket expressions". I'm not 100% sure, but I think the
reason for the outer brackets is that something like [:punct:] names a
set of characters, and therefore belongs inside a bracket expression,
just like the letters in [aeiuo].
Dear Selvag Ruby,

Building on Joel Pearson's idea...

str.scan(/(\w+)[[:punct:]]/).flatten

It can also be written as ...

str.scan(/\w+(?=[[:punct:]])/)

Abinoam Jr.

On Mon, Jan 27, 2014 at 9:48 AM, Joel Pearson <lists@ruby-forum.com> wrote:

You can avoid the collect by using a match group with scan:

str.scan(/(\w+)[[:punct:]]/).flatten

One place to start is Class: Regexp (Ruby 2.1.0)

Hope this helps,

Mike

On Jan 28, 2014, at 7:38 AM, Selvag Ruby <lists@ruby-forum.com> wrote:

I understood those parts,
but I was specifically hoping for an explanation of [[:punct:]]. In
':punct:':
1. It is enclosed in double square brackets. Why?
2. There are two colons. Is there a reason for them?

--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.