I have been working my way through a Ruby book (Beginning Ruby) and I
want to build on an interesting capability dealing with hashes.
Here is the code:
text = ''
line_count = 0
# File.foreach yields each line and closes the file when it is done
File.foreach("txt.txt") do |line|
  line_count += 1
  text << line
end
puts "#{line_count} lines"
total_characters = text.length
puts "#{total_characters} characters"
sentence_count = text.split(/\.|\?|!/).length
total_characters_no_spaces = text.gsub(/\s+/, "").length
puts "#{total_characters_no_spaces} without spaces"
word_count = text.split.length
puts "#{word_count} words in the text and #{sentence_count} sentences"
paragraph_count = text.split(/\n\n/).length
puts "#{paragraph_count} paragraphs"
# Integer division: both averages are truncated to whole numbers
puts "#{sentence_count / paragraph_count} sentences per paragraph on average"
puts "#{word_count / sentence_count} words per sentence"
stop_words = %w{a the by on for of are with just but and to my has some in}
# downcase so that "The" at the start of a sentence matches the stop word "the"
words = text.downcase.scan(/\w+/)
keywords = words.reject { |word| stop_words.include?(word) }
puts "#{((keywords.length.to_f / words.length) * 100).to_i}% non stop words"
This has been fun code to play with, and I have been running various
text files through it.
What I want to know is: is it possible to create a hash where the key is
the word and the value is the number of occurrences of that word in the
text, and then sort the hash by the values?
On Thu, Jun 18, 2009 at 10:40 PM, Steven Demonnin <tooltime@uncletoby.net> wrote:
--
Posted via http://www.ruby-forum.com/.
That is called a "histogram" and is one of the most common examples
(Google is your friend). The sticking point here is what you mean by a
"word". If you're willing to accept a fairly crude definition of this
notion, then that is an example I develop in the Blocks section of my
Ruby tutorial chapter here:
Note that the concept "sort a hash" has no real meaning, since a hash is
not ordered. What you can do is to convert to an array and sort the
array.
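A minimal sketch of such a histogram, assuming a current Ruby and the same crude definition of a word as the original script (runs of \w characters, lowercased here). The sample string and variable names are made up for illustration. The counts go into a Hash whose default value is 0, and the "sort" step produces an array of [word, count] pairs rather than a hash:

```ruby
# Hypothetical sample text; in the original script this would be `text`.
text = "The cat sat on the mat. The dog sat too!"

# Hash.new(0) makes missing keys default to 0, so counting is just += 1.
counts = Hash.new(0)
text.downcase.scan(/\w+/).each { |word| counts[word] += 1 }

# sort_by on a Hash returns an Array of [word, count] pairs.
# Negating the count sorts from most to least frequent; ties are broken
# alphabetically so the result is deterministic.
sorted = counts.sort_by { |word, count| [-count, word] }

sorted.each { |word, count| puts "#{word}: #{count}" }
```

Using `count` instead of `-count` in the sort key would list the rarest words first.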
Steven Demonnin <tooltime@uncletoby.net> wrote:
In 1.9 hashes are ordered, but by key-insertion order. You can't
change the order, so you can't sort back into a hash (unless you
create a new hash manually using the sorted order).
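A small illustration of that behaviour, assuming Ruby 1.9 or later; the word counts are invented. The original hash keeps its insertion order, and the sorted order only exists in a new hash built by inserting the sorted pairs one at a time:

```ruby
# Since Ruby 1.9 a Hash iterates in key-insertion order.
counts = { "sat" => 2, "the" => 3, "cat" => 1 }
p counts.keys          # => ["sat", "the", "cat"]

# The existing hash can't be reordered in place, so build a new one,
# inserting the pairs from most to least frequent.
by_frequency = {}
counts.sort_by { |_word, count| -count }.each do |word, count|
  by_frequency[word] = count
end
p by_frequency.keys    # => ["the", "sat", "cat"]
```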
Since arrays are key/value (from what I can understand), there are only
two parts to the array. I thought you couldn't put a third value in an
array.
It's more that you turn the hash into an array of two-element arrays,
and then sort that array. Iterating through an array of two-element
arrays is similar to iterating through a hash, in the sense that each
iteration yields two values.
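To make the array-of-pairs idea concrete (a sketch with made-up counts): `sort_by` on a hash hands back an Array whose elements are two-element [key, value] arrays, and a block with two parameters unpacks each pair, just as it does when iterating over the hash. Nothing limits an inner array to two elements; they are pairs here only because each hash entry has one key and one value.

```ruby
counts = { "the" => 3, "cat" => 1, "sat" => 2 }

# Sorting a hash gives an Array of [key, value] pairs, not a Hash.
pairs = counts.sort_by { |_word, count| count }
p pairs   # => [["cat", 1], ["sat", 2], ["the", 3]]

# A two-parameter block destructures each pair, so iterating the array
# reads just like iterating the hash itself.
pairs.each  { |word, count| puts "#{word} appears #{count} time(s)" }
counts.each { |word, count| puts "#{word} appears #{count} time(s)" }
```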
IIRC the insertion order is maintained correctly for literals and for
Hash[], thus

Hash[ * a_hash.sort_by{ whatever } ]

should do the trick.
Cheers
Robert
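A sketch of turning sorted pairs back into a hash, assuming Ruby 1.9 or later (and 2.1+ for `Array#to_h`). `whatever` above is a placeholder, so sorting by descending count is an assumption here. Because the splat passes every [word, count] pair to `Hash[]` as a separate argument, the pairs have to be flattened one level first; alternatively, `Hash[]` accepts the array of pairs directly, without the splat:

```ruby
counts = { "cat" => 1, "the" => 3, "sat" => 2 }

# Assumed sort key: most frequent word first.
pairs = counts.sort_by { |_word, count| -count }
# pairs is [["the", 3], ["sat", 2], ["cat", 1]]

# flatten(1) turns the pairs into "the", 3, "sat", 2, "cat", 1, which the
# splat then passes to Hash[] as alternating keys and values.
sorted_hash = Hash[*pairs.flatten(1)]
p sorted_hash.keys               # => ["the", "sat", "cat"]

# Equivalent forms that take the array of pairs as-is.
p Hash[pairs] == sorted_hash     # => true
p pairs.to_h.keys                # => ["the", "sat", "cat"]
```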
On Fri, Jun 19, 2009 at 3:09 PM, David A. Black <dblack@rubypal.com> wrote:
Hi --
On Fri, 19 Jun 2009, Matt Neuburg wrote:
You'd want to throw in a flatten(1) to unwrap the inner arrays: