Need a hash/iteration tutorial...text reading

I have been working my way through a Ruby book (Beginning Ruby) and I
want to extend an interesting capability dealing with hashes.

The code:
text = ''
line_count = 0
# File.foreach reads the file line by line and closes it when done
File.foreach("txt.txt") do |line|
  line_count += 1
  text << line
end

puts "#{line_count} lines"

total_characters = text.length
puts "#{total_characters} characters"
sentence_count = text.split(/\.|\?|!/).length
total_characters_no_spaces = text.gsub(/\s+/, "").length
puts "#{total_characters_no_spaces} without spaces"
word_count = text.split.length
puts "#{word_count} words in the text and #{sentence_count} sentences"
paragraph_count = text.split(/\n\n/).length
puts "#{paragraph_count} paragraphs"
# Integer division: the averages below are truncated to whole numbers
puts "#{sentence_count / paragraph_count} sentences per paragraph on average"
puts "#{word_count / sentence_count} words per sentence"
stop_words = %w{a the by on for of are with just but and to my has some in}
words = text.scan(/\w+/)
keywords = words.select { |word| !stop_words.include?(word) }
puts "#{((keywords.length.to_f / words.length) * 100).to_i}% non stop words"

This has been fun code to play with, and I have been running various text
files through it.

What I want to know is: is it possible to create a hash where the key is
the word and the value is the number of occurrences of the word in the
text, and then sort the hash by the values?
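A minimal sketch of the requested word-to-count hash, reusing the `/\w+/` word definition and the stop-word list from the program above. The method name `keyword_frequencies`, the `STOP_WORDS` constant and the sample sentence are made up for illustration, and lowercasing every word is an added assumption (the original program is case-sensitive). Since a hash is not sorted in place, the method returns an array of `[word, count]` pairs, most frequent first, with ties broken alphabetically.

```ruby
# Hypothetical stop-word list, copied from the program above.
STOP_WORDS = %w{a the by on for of are with just but and to my has some in}

# Hypothetical helper: count non-stop words in +text+ and return
# [word, count] pairs sorted by count (descending), then by word.
def keyword_frequencies(text)
  counts = Hash.new(0)                 # missing keys start at 0
  text.scan(/\w+/).each do |word|
    word = word.downcase               # assumption: "The" and "the" are one word
    counts[word] += 1 unless STOP_WORDS.include?(word)
  end
  # sort_by on a hash returns an array of [key, value] pairs
  counts.sort_by { |word, count| [-count, word] }
end

sample = "The cat sat on the mat. The cat ran."
keyword_frequencies(sample).each { |word, count| puts "#{word}: #{count}" }
```

With the sample sentence this prints `cat: 2` first, followed by `mat`, `ran` and `sat` with one occurrence each.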

--
Posted via http://www.ruby-forum.com/.

Super quick and dirty, but should get you started:

words = Hash.new(0)   # unseen words start with a count of 0
File.foreach("txt.txt") do |line|
  line.split.each { |w| words[w] += 1 }
end

words.sort_by { |_word, count| count }.reverse.each { |k, v| puts "#{k}: #{v}" }
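To try this without a `txt.txt` file, here is a sketch of the same approach as a method over a string; the name `word_counts` is hypothetical. It relies on `Enumerable#tally`, which assumes Ruby 2.7 or later (on older versions the `Hash.new(0)` loop does the same job). Note that `sort_by` is not stable, so words with equal counts can come out in any order.

```ruby
# Hypothetical helper: whitespace-delimited word counts, most frequent first.
def word_counts(text)
  # tally (Ruby 2.7+) builds {word => count}, like a Hash.new(0) loop
  counts = text.split.tally
  # sort_by yields each [key, value] pair; negating the count sorts descending
  counts.sort_by { |_word, count| -count }
end

word_counts("to be or not to be").each { |word, count| puts "#{word}: #{count}" }
```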

On Thu, Jun 18, 2009 at 10:40 PM, Steven Demonnin <tooltime@uncletoby.net> wrote:

--
Blog: http://citizen428.net/
Twitter: http://twitter.com/citizen428

That is called a "histogram" and is one of the most common examples
(Google is your friend). The sticking point here is what you mean by a
"word". If you're willing to accept a fairly crude definition of this
notion, then that is an example I develop in the Blocks section of my
Ruby tutorial chapter here:

http://www.apeth.com/ruby/02justenoughruby.html
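To illustrate why the definition of "word" matters, here is a sketch comparing three definitions on one made-up sentence. The third regex (letters and apostrophes, after lowercasing) is just one assumed choice, not a standard one, and `tally` assumes Ruby 2.7+.

```ruby
text = "Don't stop. Stop, don't!"

# 1. Whitespace splitting: punctuation stays attached to each word,
#    so "stop." and "Stop," are counted separately.
by_whitespace = text.split.tally
# 2. Runs of \w characters: the apostrophe splits "Don't" into "Don" and "t".
by_word_chars = text.scan(/\w+/).tally
# 3. Lowercase first, then runs of letters and apostrophes.
by_letters = text.downcase.scan(/[a-z']+/).tally

p by_whitespace
p by_word_chars
p by_letters
```

Only the third version reports two occurrences each of "don't" and "stop".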

To sort, add this line:

wds = h.sort {|x,y| x[1] <=> y[1]}

Note that the concept "sort a hash" has no real meaning, since a hash is
not ordered. What you can do is to convert to an array and sort the
array.
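A short sketch of the comparator-block sort on a small made-up hash standing in for `h`. The result is an `Array` of `[key, value]` pairs, not a hash, and swapping `x` and `y` in the block reverses the order.

```ruby
# Made-up counts standing in for the histogram hash h.
h = { "cat" => 2, "mat" => 1, "hat" => 3 }

ascending  = h.sort { |x, y| x[1] <=> y[1] }   # least frequent first
descending = h.sort { |x, y| y[1] <=> x[1] }   # swap x and y to reverse

p ascending.class  # Array
p descending       # [["hat", 3], ["cat", 2], ["mat", 1]]
```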

Matt Neuburg wrote:

> Note that the concept "sort a hash" has no real meaning, since a hash is
> not ordered. What you can do is to convert to an array and sort the
> array.

Since arrays are key/value (from what I can understand), there are only
two parts to the array. I thought you couldn't put a third value in an
array.

Thanks for the help. I am going to check out the web page.

(Never knew of Histogram. Learn something new every other day or so.)

Hi --

> To sort, add this line:
>
> wds = h.sort {|x,y| x[1] <=> y[1]}

Or, slightly more compact:

   wds = h.sort_by {|x| x[1] }
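A sketch of the `sort_by` form with made-up counts. Destructuring each pair into `|_word, count|` reads a little more clearly than `x[1]`, and if only the top few words are needed, `Enumerable#max_by(n)` (Ruby 2.2+) returns them, largest first, without sorting the whole collection.

```ruby
h = { "cat" => 2, "mat" => 1, "hat" => 3 }

# sort_by computes one key per [word, count] pair.
by_count = h.sort_by { |_word, count| count }              # ascending
top_two  = h.sort_by { |_word, count| -count }.first(2)     # two most frequent

# max_by(n) returns the n largest elements in descending order.
also_top_two = h.max_by(2) { |_word, count| count }

p by_count      # [["mat", 1], ["cat", 2], ["hat", 3]]
p top_two       # [["hat", 3], ["cat", 2]]
p also_top_two  # [["hat", 3], ["cat", 2]]
```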

> Note that the concept "sort a hash" has no real meaning, since a hash is
> not ordered. What you can do is to convert to an array and sort the
> array.

In 1.9 hashes are ordered, but by key-insertion order. You can't
change the order, so you can't sort back into a hash (unless you
create a new hash manually using the sorted order).
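A sketch of "create a new hash manually using the sorted order", assuming Ruby 1.9 or later, where iteration follows insertion order: insert the sorted pairs one by one into an empty hash. The counts are made up, and the original hash is left untouched.

```ruby
counts = { "cat" => 2, "mat" => 1, "hat" => 3 }

# Insert keys in ascending-count order; a 1.9+ hash remembers that order.
sorted = {}
counts.sort_by { |_word, count| count }.each { |word, count| sorted[word] = count }

p sorted.keys  # ["mat", "cat", "hat"]
p counts.keys  # ["cat", "mat", "hat"]
```

Hash equality ignores order, so `sorted == counts` is still true; only the iteration order differs.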

David

On Fri, 19 Jun 2009, Matt Neuburg wrote:

--
David A. Black / Ruby Power and Light, LLC
Ruby/Rails consulting & training: http://www.rubypal.com
Now available: The Well-Grounded Rubyist (http://manning.com/black2)
"Ruby 1.9: What You Need To Know" Envycasts with David A. Black
http://www.envycasts.com

Hi --

On Fri, 19 Jun 2009, Steven Demonnin wrote:

> Since arrays are key/value (from what I can understand), there are only
> two parts to the array. I thought you couldn't put a third value in an
> array.

It's more that you convert the hash into an array of two-element arrays,
and then sort that array. Iterating through an array of two-element
arrays is similar to iterating through a hash, in the sense that each
iteration yields two values.
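A sketch of what the sorted result looks like and how iterating over it works; the counts are made up. It also shows that arrays are not limited to two elements: the inner arrays are pairs only because that is what sorting a hash produces.

```ruby
pairs = { "cat" => 2, "mat" => 1 }.sort_by { |_word, count| -count }

p pairs        # [["cat", 2], ["mat", 1]]
p pairs.first  # ["cat", 2]

# A two-parameter block destructures each inner array,
# just as it does when iterating over the hash itself.
lines = pairs.map { |word, count| "#{word}: #{count}" }
puts lines

# Nothing stops an array from holding a third value.
with_length = pairs.map { |word, count| [word, count, word.length] }
p with_length  # [["cat", 2, 3], ["mat", 1, 3]]
```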

David

> In 1.9 hashes are ordered, but by key-insertion order. You can't
> change the order, so you can't sort back into a hash (unless you
> create a new hash manually using the sorted order).

IIRC the insertion order is maintained correctly for literals and Hash, thus

   Hash[ * a_hash.sort_by{ whatever } ]

should do the trick.

Cheers
Robert

On Fri, Jun 19, 2009 at 3:09 PM, David A. Black <dblack@rubypal.com> wrote:

--
Toutes les grandes personnes ont d’abord été des enfants, mais peu
d’entre elles s’en souviennent.

All adults have been children first, but not many remember.

[Antoine de Saint-Exupéry]

Hi --

On Sat, 20 Jun 2009, Robert Dober wrote:

> IIRC the insertion order is maintained correctly for literals and Hash, thus
>    Hash[ * a_hash.sort_by{ whatever } ]
> should do the trick.

You'd want to throw in a flatten(1) to unwrap the inner arrays:

   Hash[*hash.sort_by {...}.flatten(1)]
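A sketch of turning sorted pairs back into a hash, assuming descending count as the sort key (the `{...}` block in the line above is left open in the thread). The splat form needs `flatten(1)`; `Hash[]` also accepts an array of pairs directly, and `Array#to_h` (Ruby 2.1+) does the same. All three depend on Ruby 1.9+ keeping insertion order.

```ruby
counts = { "cat" => 2, "mat" => 1, "hat" => 3 }
pairs  = counts.sort_by { |_word, count| -count }   # [["hat", 3], ["cat", 2], ["mat", 1]]

# Without flatten(1), three pairs become three arguments: an odd count.
begin
  Hash[*pairs]
rescue ArgumentError => e
  puts "Hash[*pairs] fails: #{e.message}"
end

# flatten(1) turns the pairs into k1, v1, k2, v2, ... before the splat.
a = Hash[*pairs.flatten(1)]
# Hash[] also accepts the array of pairs as a single argument.
b = Hash[pairs]
# Array#to_h (Ruby 2.1+) does the same.
c = pairs.to_h

p a.keys             # ["hat", "cat", "mat"]
p(a == b && b == c)  # true
```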

David

Ooops, that was yet another mistake, thx for telling me David.

well spotted.
R.

On Sat, Jun 20, 2009 at 2:28 PM, David A. Black <dblack@rubypal.com> wrote: