I was trying to create a stoplist that is case insensitive. When I run
the code below, it includes "In", which I do not want. I was thinking I
could use .match(/[A-Z,a-z]/). I did use downcase on the string
text, which did work, but I want to leave the string "text" in its
original form.
Thanks,
John
text = %q{Los Angeles has some of the nicest weather In the country.}
stopwords = %w{the a by on for of are with just but and to the my in I
has some}
#stopwords = stopwords.match(/[A-Z,a-z]/)
words = text.scan(/\w+/)
keywords = words.select { |word| !stopwords.include?(word) }
puts keywords.join(' ')
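You have probably figured this out by yourself already. Anyway, convert
the stopwords array to lowercase, like this:
stopwords.map!{|el| el.downcase}
This gets rid of the troublesome capital "I" in your stopwords. Then test
word.downcase instead of word in the select block, so that "In" matches
"in" while the string text itself stays unchanged.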
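
Putting the pieces together, here is a minimal sketch of how that could look. It assumes the same text and stopwords as in the question; using reject and uniq is just my choice for illustration, not something the original code requires. Only a lowercased copy of each word is compared, so text keeps its original capitalization.

```ruby
text = %q{Los Angeles has some of the nicest weather In the country.}
stopwords = %w{the a by on for of are with just but and to the my in I has some}

# Normalize the stoplist once; uniq drops the repeated "the".
stopwords = stopwords.map { |el| el.downcase }.uniq

words = text.scan(/\w+/)

# Compare a lowercased copy of each word against the stoplist.
# The word that is kept is the original, so its case is preserved.
keywords = words.reject { |word| stopwords.include?(word.downcase) }

puts keywords.join(' ')
```

Since downcase returns a new string, neither text nor the entries in words are modified along the way.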