Counting words

I've researched this but am still having trouble getting it right ....
Can someone give me code that counts the number of words in a string via
RegExp and MatchData objects? I think I'd like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though am open to other interpretations.

Jamal

Here is a naive implementation:

class String
  def words
    scan(/\b\S+\b/)
  end
end

'this is a sentence with some words'.words
=> ["this", "is", "a", "sentence", "with", "some", "words"]
'this is a sentence with some words'.words.size
=> 7
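For comparison, here is a minimal sketch of the definition given in the original question, where a word is any run of non-whitespace characters. The helper names whitespace_words and boundary_words are made up for this illustration; they are not part of Ruby or of the code above, and the sample string is invented.

```ruby
# Hypothetical helpers, for illustration only.

# Words as defined in the question: runs of non-whitespace characters.
def whitespace_words(str)
  str.scan(/\S+/)
end

# Words as matched by the \b\S+\b pattern shown above.
def boundary_words(str)
  str.scan(/\b\S+\b/)
end

sample = "wait - don't stop"
p whitespace_words(sample)  # ["wait", "-", "don't", "stop"]
p boundary_words(sample)    # ["wait", "don't", "stop"]
```

The two agree on ordinary text. They differ when punctuation stands alone, because \b needs a letter, digit, or underscore on one side of the boundary.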

marcel

···

On Sat, Apr 29, 2006 at 02:43:30AM +0900, Jamal Mazrui wrote:

I've researched this but am still having trouble getting it right ....
Can someone give me code that counts the number of words in a string via
RegExp and MatchData objects? I think I'd like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though am open to other interpretations.

--
Marcel Molina Jr. <marcel@vernix.org>

I'm a bit of a nuby, and this is my first post to the list, but I
think the following one-liner will do the job:

number_of_words = string.split(/\s/).length

I haven't tested it because I'm at work without access to a Ruby interpreter :(.

--
Bira

http://sinfoniaferida.blogspot.com

s.scan(/\w+/).size
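A short sketch of what \w+ counts, using invented sample strings. Since \w only matches letters, digits, and underscores, an apostrophe splits a token in two, which may or may not be the wanted definition of a word.

```ruby
# \w matches letters, digits, and underscores only.
contraction = "don't stop"
p contraction.scan(/\w+/)       # ["don", "t", "stop"]
p contraction.scan(/\w+/).size  # 3

# Punctuation attached to a word is skipped rather than counted.
sentence = "This is a test."
p sentence.scan(/\w+/).size     # 4
```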

--
Have a look: Robert K. | Flickr

One way is like this:

irb(main):020:0> a="This is a test."
=> "This is a test."
irb(main):021:0> a.scan(/\b\S.*?\b/).size
=> 4
irb(main):022:0>

The Regexp in line 21 rewritten in a more readable form is:

a.scan(/
  \b   (?# a word boundary )
  \S   (?# a character that is not a space )
  .*?  (?# maybe some more characters, but don't be greedy )
  \b   (?# a word boundary )
  /x).size

Btw, the Regexp above only works because of the x at the end, which makes it an extended regexp: whitespace and comments inside the pattern are ignored.
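In extended mode a comment can also be written with a plain # that runs to the end of the line, which avoids the (?# ...) groups and their sensitivity to closing parentheses. A minimal sketch, where the constant name WORD is chosen here purely for illustration:

```ruby
# Same pattern as /\b\S.*?\b/, spelled out with the x (extended) flag.
# WORD is a hypothetical name used only in this example.
WORD = /
  \b    # a word boundary
  \S    # a character that is not whitespace
  .*?   # as few further characters as possible
  \b    # the next word boundary
/x

a = "This is a test."
compact = a.scan(/\b\S.*?\b/)
verbose = a.scan(WORD)

p verbose             # ["This", "is", "a", "test"]
p verbose == compact  # true
```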

Regards,
  JJ

---
Help everyone. If you can't do that, then at least be nice.

Eh, sorry. I meant to write:

number_of_words = string.split(/\s+/).length

The "+" is needed to cover words with more than one whitespace
character between them.
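To see what the "+" changes, here is a small sketch with an invented sample string that has two spaces in a row:

```ruby
spaced = "two  spaces"  # two space characters between the words

# Splitting on a single whitespace character leaves an empty string
# between the two adjacent spaces.
p spaced.split(/\s/)          # ["two", "", "spaces"]
p spaced.split(/\s/).length   # 3

# Splitting on a run of whitespace treats both spaces as one separator.
p spaced.split(/\s+/)         # ["two", "spaces"]
p spaced.split(/\s+/).length  # 2
```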

···

On 4/28/06, Bira <u.alberton@gmail.com> wrote:

number_of_words = string.split(/\s/).length

--
Bira

http://sinfoniaferida.blogspot.com

"Marcel Molina Jr." <marcel@vernix.org> writes:

Here is a naive implementation:

class String
  def words
    scan(/\b\S+\b/)
  end
end

And quite a bit more efficient, memory-wise:

class String
  def count_words
    n = 0
    scan(/\b\S+\b/) { n += 1 }
    n
  end
end

Making String#count take regexps would be nice (same for #delete).
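Since the original question asked about MatchData objects, the same kind of count can also be sketched with Regexp#match and an explicit start position. The method name count_matches is hypothetical, and the default \S+ pattern is an assumption taken from the question's whitespace-delimited definition of a word.

```ruby
# Hypothetical helper: counts non-overlapping matches of pattern in str
# by stepping through successive MatchData objects.
# Assumes pattern never matches the empty string; otherwise the position
# would not advance and the loop would not end.
def count_matches(str, pattern = /\S+/)
  count = 0
  pos = 0
  while (m = pattern.match(str, pos))
    count += 1
    pos = m.end(0)  # resume searching right after the current match
  end
  count
end

p count_matches("these are \n some\nwords")    # 4
p count_matches("This is a test.", /\b\S+\b/)  # 4
```

Like the block form of scan, this never builds an array of all the words; it only keeps one MatchData at a time.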

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Bira wrote:

number_of_words = string.split(/\s/).length

Eh, sorry. I meant to write:

number_of_words = string.split(/\s+/).length

The "+" is needed to cover words with more than one whitespace
character between them.

--
Bira
http://compexplicita.blogspot.com
http://sinfoniaferida.blogspot.com

Just plain string.split.length will work as well, and should handle line breaks too:

irb(main):001:0> "these are some words".split.length
=> 4
irb(main):002:0> "these are \n some\nwords".split.length
=> 4
irb(main):003:0> "these are \n some\nwords".split
=> ["these", "are", "some", "words"]
irb(main):004:0>
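One further difference shows up with leading whitespace. A small sketch with an invented, padded sample string:

```ruby
padded = "  these are some words  "

# With no argument, split ignores leading and trailing whitespace.
p padded.split           # ["these", "are", "some", "words"]
p padded.split.length    # 4

# With a regexp, the leading separator produces an empty first element
# (trailing empty strings are dropped).
p padded.split(/\s+/)         # ["", "these", "are", "some", "words"]
p padded.split(/\s+/).length  # 5
```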

Hope that helps.

-Justin

···

On 4/28/06, Bira <u.alberton@gmail.com> wrote: