The easiest way to separate substrings from a line string

Sarawut_Poaitwinyu · 9 July 2009 10:12

I am working on a tag system, and there is information like this:

"important urgent project 2009"

I want to separate it into words; one word at a time is okay.

What is the easiest way to separate that string word by word? There might
be more than one space between words.

My idea is to use a loop that checks character by character whether each
one is a space and then cuts out that part, but I thought there might be
an easier way that I don't know.

Thank you in advance

--
Posted via http://www.ruby-forum.com/.

David_A_Black1 · 9 July 2009 10:16

Hi --

words = string.split

When you call split with no argument, it splits on whitespace, treating
a run of several whitespace characters as a single separator.
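
A minimal sketch of this, using the sample line from the question with extra spaces added for illustration (the variable name `line` is just a placeholder):

```ruby
# Sample tag line from the question; the irregular spacing is added for illustration.
line = "  important   urgent project    2009 "

# With no argument, split breaks the string on runs of whitespace
# and ignores leading and trailing whitespace.
words = line.split

p words # ["important", "urgent", "project", "2009"]
```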

David

--
David A. Black / Ruby Power and Light, LLC
Ruby/Rails consulting & training: http://www.rubypal.com
Now available: The Well-Grounded Rubyist (http://manning.com/black2)
Training! Intro to Ruby, with Black & Kastner, September 14-17
(More info: http://rubyurl.com/vmzN)

Sarawut_Poaitwinyu · 9 July 2009 10:21

Thank you, I will try.


Robert_K1 · 9 July 2009 16:06

I am more the "positive" type, meaning I prefer to define explicitly what
I want returned. I would do

words = string.scan(/\w+/)

That way dots, question marks and other punctuation won't end up in the
result. It may not make a difference here, but it's probably good to see
different approaches.
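
As a rough illustration, here is scan applied to a made-up variant of the sample line with punctuation mixed in (the input string is an assumption, not data from the original question):

```ruby
# Hypothetical input: the sample tags with punctuation mixed in.
line = "important, urgent; project 2009!"

# scan returns every non-overlapping match of the pattern;
# \w+ matches runs of letters, digits and underscores.
words = line.scan(/\w+/)

p words # ["important", "urgent", "project", "2009"]
```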

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

David_A_Black1 · 9 July 2009 16:50

Hi --

string.split does explicitly define what I want back; it's just
something different from what you want back. It depends on exactly how
you define "word". I was assuming it was /\S+/, but it may indeed be
/\w+/ (or maybe /[^\W\d_]+/ or something).
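
To make the difference between these three definitions concrete, here is a small sketch that runs each pattern over the same made-up line (the input and the variable names are assumptions chosen for illustration):

```ruby
# Hypothetical input containing punctuation, a hyphen, an underscore and digits.
line = "urgent, e-mail tag_1 2009"

non_space  = line.scan(/\S+/)       # runs of non-whitespace (what split keeps)
word_chars = line.scan(/\w+/)       # runs of letters, digits and underscores
letters    = line.scan(/[^\W\d_]+/) # word characters minus digits and underscore

p non_space  # ["urgent,", "e-mail", "tag_1", "2009"]
p word_chars # ["urgent", "e", "mail", "tag_1", "2009"]
p letters    # ["urgent", "e", "mail", "tag"]
```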

David


Robert_K1 · 9 July 2009 20:40

David A. Black wrote:

string.split does explicitly define what I want back; it's just
something different from what you want back.

That's true. I just wanted to make the point that there are these two
major approaches: define positively what you want in your result, or
define it ex negativo, i.e. state what you want to use as the separator.

The whole point is that both approaches may behave identically with the
original set of test data but will exhibit different behavior as soon
as the input changes. If you use #split, you might get something you
did not want in the first place. With #scan, anything that does not
match is silently dropped, so you won't notice, which could be bad as
well.

The super safe variant would be to first match against the whole
string to ensure it contains only the expected data, and fail if not.
After that, it no longer matters which extraction method one uses.
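
One possible sketch of that "validate first, then extract" idea follows. The helper name `extract_tags` and the exact pattern are assumptions for illustration; what counts as expected data depends on the application.

```ruby
# Hypothetical helper: fail loudly on unexpected input, then extract.
def extract_tags(line)
  # Assumed rule: the whole line must consist of \w+ tokens separated
  # (and optionally surrounded) by whitespace. An empty line is rejected.
  pattern = /\A\s*\w+(?:\s+\w+)*\s*\z/

  unless pattern.match?(line)
    raise ArgumentError, "unexpected characters in #{line.inspect}"
  end

  line.split # after validation, line.scan(/\w+/) returns the same array
end

p extract_tags("important   urgent project 2009")
```

With this in place, a line such as "urgent, project" raises ArgumentError instead of quietly producing tags that differ depending on the extraction method.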

David A. Black wrote:

It depends on exactly how you define "word". I was assuming it was
/\S+/, but it may indeed be /\w+/ (or maybe /[^\W\d_]+/ or something).

Absolutely.

Kind regards

robert


Dave_Burt2 · 13 July 2009 03:10

Moving off-topic from the thread, but David Black wrote:

... /\w+/ (or maybe /[^\W\d_]+/ or something).

Do people actually say /[^\W\d_]/ instead of /[a-z]/i? The latter is
much easier for me to read. Does the former include non-latin
word-characters?

Cheers,
Dave Burt

Sarawut_Poaitwinyu · 13 July 2009 01:47

Thank you again, everyone. It seems you discussed some regular
expressions that I didn't understand, but thank you anyway. I will try
to research them later.


David_A_Black1 · 13 July 2009 09:48

Hi --

On Mon, 13 Jul 2009, Dave Burt wrote:

Do people actually say /[^\W\d_]/ instead of /[a-z]/i? The latter is
much easier for me to read. Does the former include non-latin
word-characters?

Yes:

s = "\u00e9"

=> "é"

/\w/.match(s)

=> #<MatchData "é">

/[a-z]/.match(s)

=> nil

Another choice would be:

/[[:alpha:]]/

which I believe would do the same thing that my character class did
(unless there are some exclusions, inclusions, or edge cases I'm not
remembering).
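
For comparison, here is a small sketch that tries several classes against the same character. Two caveats: the results for \w and /[^\W\d_]/ depend on the Ruby version (as far as I know, current releases treat \w as ASCII-only, so they would not match é there, unlike in the session above), and \p{L} is an extra alternative not mentioned in the thread.

```ruby
s = "\u00e9" # "é", as in the session above

ascii_range = /[a-z]/i.match(s)      # nil: é is outside a-z
posix_alpha = /[[:alpha:]]/.match(s) # matches: POSIX brackets are Unicode-aware
unicode_l   = /\p{L}/.match(s)       # matches: Unicode "Letter" property

# Version-dependent: on Rubies where \w is ASCII-only, both of these are nil.
word_char = /\w/.match(s)
custom    = /[^\W\d_]/.match(s)

p ascii_range, posix_alpha, unicode_l, word_char, custom
```

If non-Latin letters matter for the tags, /[[:alpha:]]/ is probably the safer choice of the ones discussed here.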

David

