[QUIZ][SOLUTION] Markov Chains (#74)

Hi all,

Here's my solution for the quiz No 74. It generates text with a first order Markov Chain. Because the matrix of a Markov chain for native language tests is sparse, the chain is stored in a hash of hashes for better memory usage.

I attached a tar archive for all the files and the main classes for easy reading.

There are some open issues i think, but i did not had the time to implement
them
- for usage of higher order markov chains one should add a list of previous elements in the method MarkovChain#add_elems(). But this breaks other parts of the code and these parts have to be adapted to this.
- better performance of the MarkovChain#rand() method
- and of course there are a lot of open issues with respect to natural language processing, the code does not produce grammatical valid sentences, no quotes, no comma, etc.

I programmed it to learn Ruby and to improve my coding skills, because i have less than 2,5 weeks of experience with Ruby. I bought the Programming Ruby" book by Dave Thomas et. al. and started to read it two weeks and two days ago and i used it a lot the last 48 hours. I think there are some parts of the code where there are better solutions or could be better expressed in a more "rubyish" way.

I used Java and Perl a lot in the last years and i will use Ruby now instead of Perl. I can remember my experiences with hashes of hashes in Perl. This was awkward and often a pain. And on friday as i started coding for this quiz i suddenly realized that i had none of this issues in Ruby. Wow ! Because i know the functional language Haskell i am used to iterators, map's, lambda's etc. and i tried to use iterators and block in the program, but i think i have to get more experience here in Ruby.

If you have any comments, i will be glad to receive and answer them.

Extract the archive

$ tar xzf joern_dinkla_quiz_74.tar.gz

Call the program and print the command line syntax.

cd net\.dinkla\.ruby\.quiz\.74/lib ruby main-markov-chain.rb --help

Best regards,

Joern Dinkla

joern_dinkla_quiz_74.tar.gz (4.84 KB)

markov-chain.rb (2.38 KB)

story-generator.rb (1.59 KB)

words.rb (1.07 KB)

main-markov-chains.rb (1.78 KB)

···

--
Joern Dinkla, http://www.dinkla.net

[snip]

I programmed it to learn Ruby and to improve my coding skills, because i
have less than 2,5 weeks of experience with Ruby. I bought the

[snip]

You are doing good.

sometimes you use parentesis and other times you don't.
def test() is the same as def test
if(cond) is the same as if cond
.length() is the same as .length

   File.open(filename) do | file |
     file.each_line() do |line|

is the same as

   IO.readlines(filename).each do |line|

def initialize(mc = nil)
   if mc.nil?
     @mc = MarkovChain.new()
   else
     @mc = mc
   end
end

is the same as

def initialize(mc = nil)
  @mc = mc || MarkovChain.new
end

···

On 4/9/06, Joern Dinkla <joern@dinkla.net> wrote:

--
Simon Strandgaard

For debugging statements like:

if ( $VERBOSE )
   puts "Calculating probabilities ..."
   STDOUT.flush()
end

I like to use the $DEBUG variable. You can toggle it on and off with Ruby's -d command-line switch.

The call to flush is also not needed:

puts "Calculating probabilities ..." if $DEBUG

Hope that helps.

James Edward Gray II

···

On Apr 9, 2006, at 8:59 AM, Joern Dinkla wrote:

I programmed it to learn Ruby and to improve my coding skills, because i have less than 2,5 weeks of experience with Ruby. I bought the Programming Ruby" book by Dave Thomas et. al. and started to read it two weeks and two days ago and i used it a lot the last 48 hours. I think there are some parts of the code where there are better solutions or could be better expressed in a more "rubyish" way.

Hi again,

I wrote:

Hi all,

Here's my solution for the quiz No 74. It generates text with a first order Markov Chain. Because the matrix of a Markov chain for native language tests is sparse, the chain is stored in a hash of hashes for better memory usage.

Now i implemented higher order Markov Chains as well. I could not resist :slight_smile:

A first order chain, for example the following one

     @mc3 = MarkovChain.new()
     @mc3.add(1,2)
     @mc3.add(3,4)

is stored as the following hash of hashes

  {1=>{2=>1}, 3=>{4=>1}}

A higher order chain, for example the following

     @mc7 = MarkovChain.new(order = 2)
     @mc7.add_elems([20,21,20,22,21,22,20,23])

is also stored as a hash of hashes, but the keys are arrays

{[22, 21]=>{22=>1}, [22, 20]=>{23=>1}, [20, 21]=>{20=>1}, [21, 22]=>{20=>1}, [20,22]=>{21=>1}, [21,20]=>{22=>1}}

Because i am new to Ruby, i had to learn a few things about Arrays, Objects and dup(). I still do not understand this fully (perhaps it is to late here 00:46 CET), but in the method MarkovChain.add() i need to dup the arrays, otherwise some keys will get overwritten.

In class MarkovChain
   # Add an edge from node a to node b.

joern_dinkla_quiz_74_ver2.tar.gz (5.51 KB)

markov-chain.rb (2.64 KB)

story-generator.rb (1.87 KB)

words.rb (1.06 KB)

pretty-print.rb (678 Bytes)

main-markov-chains.rb (1.92 KB)

···

#
   def add(a, b)
     # this dup is needed, otherwise elements will be overwritten
     a = a.dup() if a.instance_of?(Array)
     @absolute[a] ||= {}
     @absolute[a][b] ||= 0
     @absolute[a][b] += 1
     @sum_edges[a] ||= 0
     @sum_edges[a] += 1
     @dirty[a] = true
   end

I will try to find it out tommorow. By the way, below is an example text i generated for order 3 after i fed 4 books of Chesterton into the chain. Beginning with order 3 i think the text looks like a patchwork or collage of some phrases and sentences of the original texts.

Best regards,

Joern

. Up to a certain point it was a wordless clamour startlingly close, and loud enough to be distinct if each word had not killed the other. No, replied Fisher. The Renaissance nobles of the Tudor time were like that. It is awful, I think it goes far enough! said Flambeau; but my information is fragmentary, and only twelve members of the Central Anarchist Council. The gentleman who has for some time past played, with propriety and general applause, the difficult part of Thursday, has died quite suddenly. Consequently, he was willing to go into Parliament as a yeoman or a gentleman or a Jacobite or an Ancient Briton, I should say that, Crane went on. This also tempted him, that in a more partial and also a more premature fashion, for his voice was colourless and sad. What I did next does matter: I gave him a rather wild stare, March had yet another unexpected emotion, for his guide hailed the man as Hoggs and introduced him as Sir Howard Horne, then introducing his so-called Socialist budget, and prepared to expound it in an interview with so
promising a penman. Harold March was to have the gold of Glengyle. So far the crime seemed clear enough; and while Boyle was looking into it he heard a thud behind him, and then a sudden stillness, as of a stone statue walking. He might have been firebrands. The mutinies simmered down; the men who must have known that particular house to be so accurately inaccurate. But what makes you think it a specimen. Put the same feather with a ribbon and an artificial flower and everyone will think it a wildcat, though it may have been a clever fellow, answered the priest. As they raced along to the gate out of which poured and rolled, not Roman, but very late, and that this, my successful
masquerade, might be of considerable value to the public, placed it here with his own pale blue ones, and said, Do you want me to tell you all you want to guess what he doing, keep behind him. About his life and fortune on the table, and went below the surface in a way that at once puzzled and reassured him. On the oranges was the equally clear and exact description, Finest Brazil nuts, 4d. a lb. M. Brun had a dark, handsome lady, of no little majesty, and rather like a mosaic palace, rent with earthquakes; or like a Dutch tulip garden blown to the stars. The other did not answer; he was free in free London, and drinking

--
Joern Dinkla, http://www.dinkla.net

"Simon Strandgaard" <neoneye@gmail.com> writes:

   IO.readlines(filename).each do |line|

     IO.foreach(filename) do |line|

···

Simon Strandgaard

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

[snip]

I programmed it to learn Ruby and to improve my coding skills, because i
have less than 2,5 weeks of experience with Ruby. I bought the

[snip]

You are doing good.

sometimes you use parentesis and other times you don't.
def test() is the same as def test
if(cond) is the same as if cond
.length() is the same as .length

   File.open(filename) do | file |
     file.each_line() do |line|

is the same as

   IO.readlines(filename).each do |line|

Let's use foreach() there, so we don't slurp a file only to process it line by line:

File.foreach(filename) do |line| ... end

def initialize(mc = nil)
   if mc.nil?
     @mc = MarkovChain.new()
   else
     @mc = mc
   end
end

is the same as

def initialize(mc = nil)
  @mc = mc || MarkovChain.new
end

Why not just set the default correctly?

def initialize( mc = MarkovChain.new )
   @mc = mc
end

James Edward Gray II

···

On Apr 9, 2006, at 2:41 PM, Simon Strandgaard wrote:

On 4/9/06, Joern Dinkla <joern@dinkla.net> wrote:

Hi Simon,

Simon Strandgaard wrote:

[snip]

I programmed it to learn Ruby and to improve my coding skills, because i
have less than 2,5 weeks of experience with Ruby. I bought the

[snip]

You are doing good.

Thank you very much.

sometimes you use parentesis and other times you don't.
def test() is the same as def test
if(cond) is the same as if cond
.length() is the same as .length

Yes, sometimes i forgot them. For me the code is easier to read when i add parenthesis to functions (in definitions and calls). I think this is because i am still a newbie. But if i do not put the parenthesis at the end i do not know if its an accessor to an attribute or a function.

But for me there is the exception if a block follows a function call.

  list.each() do |x|
  ...
  end

simply does not look as good as

  list.each do |x|
  ...
  end

   File.open(filename) do | file |
     file.each_line() do |line|

is the same as

   IO.readlines(filename).each do |line|

I will use the File.foreach method as suggested by Christian Neukirchen.

def initialize(mc = nil)
   if mc.nil?
     @mc = MarkovChain.new()
   else
     @mc = mc
   end
end

is the same as

def initialize(mc = nil)
  @mc = mc || MarkovChain.new
end

Yes, that looks better.

--
Simon Strandgaard

Best regards,

Joern

···

On 4/9/06, Joern Dinkla <joern@dinkla.net> wrote:

--
Joern Dinkla, http://www.dinkla.net

James Edward Gray II wrote:

I programmed it to learn Ruby and to improve my coding skills, because i have less than 2,5 weeks of experience with Ruby. I bought the Programming Ruby" book by Dave Thomas et. al. and started to read it two weeks and two days ago and i used it a lot the last 48 hours. I think there are some parts of the code where there are better solutions or could be better expressed in a more "rubyish" way.

For debugging statements like:

if ( $VERBOSE )
  puts "Calculating probabilities ..."
  STDOUT.flush()
end

I like to use the $DEBUG variable. You can toggle it on and off with Ruby's -d command-line switch.

Thanks for the hint. But i do want a difference between debug output and verbose output. Debug output is for the developer and the verbose output for developers and users, so that the user can observe what is going on.

I did not know about the -d switch. Thanks.

The call to flush is also not needed:

puts "Calculating probabilities ..." if $DEBUG

It's not strictly needed, yes, but when i did the tests i wanted to get the output as fast as possible, because i wanted to observe the performance and if i did not flush the buffer the output printed out at once at the end of the program (after it finished). Maybe this is because i used the Ruby Development Tools for Eclipse. I have to check this if the behaviour is the same on the command line.

Hope that helps.

James Edward Gray II

Best regards,

Joern

···

On Apr 9, 2006, at 8:59 AM, Joern Dinkla wrote:

--
Joern Dinkla, http://www.dinkla.net

aha, thanks

···

On 4/9/06, Christian Neukirchen <chneukirchen@gmail.com> wrote:

"Simon Strandgaard" <neoneye@gmail.com> writes:

> IO.readlines(filename).each do |line|

     IO.foreach(filename) do |line|

--
Simon Strandgaard