#split vs. #length. Different returns

I am wondering why these two lines of code at the bottom, which seem to
say the same thing, produce different results.

text is simply a long string.

words = text.scan(/\w+/)

stop_words = %w{the a by on for of are with just but and to the my I has
some in}
key_words = text.split{/\w../}.select{|word| !stop_words.include?(word)}

# This line of code results in a higher percentage of key words to stop words: 76.58%
key_words_to_stop_words = ((key_words.length.to_f / text.split{/\w../}.count.to_f) * 100)
# This line has been rendered as a comment, but produces 75.13% when run through ruby
# key_words_to_stop_words = ((key_words.length.to_f / words.length.to_f) * 100)

puts "#{key_words_to_stop_words} % of key words."


--
Posted via http://www.ruby-forum.com/.

String#split doesn't take a block to specify where to split; the pattern
has to be passed as an argument. So on this Ruby (1.9.2) the block is
ignored, and text.split {/\w../} is the same as text.split, which splits
the text by whitespace.

1.9.2p290 :008 > text = "one, two.three four five"
=> "one, two.three four five"
1.9.2p290 :009 > text.scan(/\w+/)
=> ["one", "two", "three", "four", "five"]
1.9.2p290 :010 > text.split
=> ["one,", "two.three", "four", "five"]
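
A minimal sketch of the difference, using a made-up sample sentence (the text from the original post isn't shown) and the stop word list from the question. It assumes the goal is a consistent percentage, so both counts come from the same scan(/\w+/) token list. On newer Rubies (2.6 and later, as far as I know) String#split does accept a block, but it yields each piece to the block and returns the original string, so the block still doesn't choose the delimiter.

```ruby
# Hypothetical sample text; the text from the original post is not shown.
text = "the quick-brown fox jumps over the lazy dog by the door"

stop_words = %w{the a by on for of are with just but and to my I has some in}

# scan returns every run of word characters, so "quick-brown" counts as two words.
words = text.scan(/\w+/)

# split with no argument breaks on whitespace only, so "quick-brown" stays one token.
whitespace_tokens = text.split

puts words.length              # 12
puts whitespace_tokens.length  # 11

# Build the key word list from the same tokens used for the total,
# so numerator and denominator are counted the same way.
key_words = words.reject { |word| stop_words.include?(word) }

percentage = (key_words.length.to_f / words.length) * 100
puts "#{percentage.round(2)} % of key words."
```

Mixing the two tokenizations, as in the question, divides a count taken one way by a total taken another way, which is where the two different percentages come from.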

Jesus.


On Fri, Feb 8, 2013 at 7:48 AM, Tom Stut <lists@ruby-forum.com> wrote:

I am wondering why these two lines of code at the bottom, which seem to
say the same thing, produce different results.

text is simply a long string.

words = text.scan(/\w+/)

stop_words = %w{the a by on for of are with just but and to the my I has
some in}
key_words = text.split{/\w../}.select{|word| !stop_words.include?(word)}

# This line of code results in a higher percentage of key words to stop words: 76.58%
key_words_to_stop_words = ((key_words.length.to_f / text.split{/\w../}.count.to_f) * 100)
# This line has been rendered as a comment, but produces 75.13% when run through ruby
# key_words_to_stop_words = ((key_words.length.to_f / words.length.to_f) * 100)

puts "#{key_words_to_stop_words} % of key words."

This:

words = text.scan(/\w+/)

"Now is the winter of our discontent".scan(/\w+/)
=> ["Now", "is", "the", "winter", "of", "our", "discontent"]

is not the same as this:

text.split(/\w../)

"Now is the winter of our discontent".split(/\w../)
=> ["", " ", "", " ", "", " ", "", " ", "", "", "t"]
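
The reason, sketched with the same sentence: split treats the pattern as the delimiter and returns what is left between matches, while scan returns the matches themselves. With /\w../ nearly the whole sentence is consumed as delimiters. Splitting on runs of non-word characters (/\W+/, which is my suggestion and not something from the posts above) gives the same list as scan(/\w+/) for this sentence.

```ruby
line = "Now is the winter of our discontent"

# scan returns the matches: each word character plus the two characters after it.
delimiters = line.scan(/\w../)
p delimiters
# => ["Now", "is ", "the", "win", "ter", "of ", "our", "dis", "con", "ten"]

# split returns what lies between those matches, mostly empty strings and spaces.
leftovers = line.split(/\w../)
p leftovers

# Splitting on non-word runs instead yields the words themselves.
by_non_word = line.split(/\W+/)
p by_non_word == line.scan(/\w+/)  # true for this sentence
```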