[QUIZ] Markov Chains (#74)

[snip]

These can be quite a bit of fun, depending on what text you prime
them with... :wink:

input_text := ruby_talk; (* hehe *)

···

On 4/7/06, James Edward Gray II <james@grayproductions.net> wrote:

--
Simon Strandgaard

Indeed, I just finished my solution (first Ruby Quiz I've done, yay) and it's all sorts of fun.

···

On Apr 7, 2006, at 10:18 AM, James Edward Gray II wrote:

On Apr 7, 2006, at 8:53 AM, Charlie Bowman wrote:

I'm pumped! I've been reading this mailing list for the last 3 months
and I finally feel ready to try out a quiz. This one sounds fun!

I'm glad.

These can be quite a bit of fun, depending on what text you prime them with... :wink:

James Edward Gray II

I think someone needs to take all the explanations from this thread and run
their markov chainer on that.

;Daniel

···

--
Daniel Baird
http://danielbaird.com (TiddlyW;nks! :: Whiteboard Koala :: Blog :: Things
That Suck)
[[My webhost uptime is ~ 92%.. if no answer pls call again later!]]

Didn't feel funny, I just wasn't used to having to declare objects (hashes and arrays) as I added them as branches on my initial hash. I had to use a little more discipline, which probably is a good thing coming from Perl-land.

-a

···

On 10.4.2006, at 14:05, James Edward Gray II wrote:

On Apr 9, 2006, at 8:45 AM, Albert Vernon Smith wrote:

Coming from Perl, I often rely upon auto-vivification, so I needed to figure out how to work around this. Perhaps there are improved ways of going about this, and I'd appreciate any feedback on how to go about it better.

Can you show an example using Perl's auto-vivification that feels funny in Ruby? Perhaps we would have better ideas after seeing it...

So, the moral of all this:
- don't use symbols if they have to be converted to a string often
- hash lookups might be slower than you think
- premature optimization...

I used symbols in my solution, but primarily because I was trying to
make the hash itself require as little memory as possible to allow for
vast bodies of input (which I admit I've not really tried yet) and I
figured a hash on symbols would be smaller than a hash on strings.

The hash itself will have the same size, because it only stores VALUEs (which are just longs). But each of the string VALUEs will "point" to another "object" on the heap, while the symbol VALUEs don't have an extra "object" attached.

I guess this means that GC is pushed by lots of string objects created
and destroyed during the generation run, but I tended to aim for memory
efficiency over speed for this one, as long as it was running 'fast
enough'.

Yes, memory efficiency was another reason for me to first try the "hash tree of symbols".
But on the other hand: the string objects are created anyway (before they are converted to symbols). They can be collected after conversion to symbols, but when you generate the sentence you are again creating many new string objects (Symbol#to_s generates a new string object every time; Ruby stores only C strings for the symbols), which wouldn't be necessary if you had kept the strings in the first place.
So, yes, symbols are more memory efficient for storing big frequency hashes, but they are slower for generating, so it's a tradeoff.
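
The allocation behaviour described above can be seen in a few lines. This is only a sketch with a made-up word list, not code from either solution, and it shows object identity rather than timings:

```ruby
# Illustration only: the sample words are hypothetical.
words = %w[the quick brown fox]

# Interning: equal symbols are the very same object, so a symbol key
# does not carry its own string object on the heap.
sym_a = "fox".to_sym
sym_b = "fox".to_sym
same_symbol = sym_a.equal?(sym_b)

# Converting back: every Symbol#to_s call hands out a new String object.
str_a = sym_a.to_s
str_b = sym_a.to_s
same_string = str_a.equal?(str_b)

# Generating text from symbols therefore allocates one string per word
# emitted, although the resulting text is the same either way.
from_symbols = words.map { |w| w.to_sym }.map { |s| s.to_s }.join(' ')
from_strings = words.join(' ')

puts "same symbol object:  #{same_symbol}"
puts "same to_s object:    #{same_string}"
puts "same generated text: #{from_symbols == from_strings}"
```

So the symbol table saves memory while the chain is stored, and the cost shows up again as short-lived strings while generating.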

By the way here are some numbers:

order  first   final
2      7.380s  1.973s
4      6.279s  2.002s
6      8.031s  1.972s

Those runs are for a 700K text file and 1000 sentences generated. I didn't measure the memory.

Dominik

···

On Mon, 10 Apr 2006 21:20:07 +0200, Ross Bamford <rossrt@roscopeco.co.uk> wrote:

On Tue, 2006-04-11 at 02:15 +0900, Dominik Bathon wrote:

I would like Ruby support of serial ports
for the following three platforms.
#1 Windows
#2 XP
#3 Mac OS X

Please respond offline if you can
show me how to add any or all of these

Gus

rubySPS@oh-bear.com

Banjo players spend half their lives tuning and the other half playing
out of tune.

Banjos are definitely an acquired taste. I think I've figured out why I
love them so much. They are as close an instrument to a digital signal as
you can get (other than a drum). The sound is so staccato and you pluck
so many notes in a second that it is very near to being digital :slight_smile:

···

On Sat, 2006-04-08 at 00:41 +0900, Keith Lancaster wrote:

On 4/7/06 10:14 AM, "Charlie Bowman" <charlie@castlebranch.com> wrote:

> or ascii based tab files for us banjo pickers!
>

Banjo? I'm pretty sure he said "music", so I'm not sure why you are bringing
up banjos. :slight_smile:

What's the difference between a banjo and a(n)...

Chain Saw:

   1. a chain saw has a dynamic range.
   2. you can turn a chain saw off.

South American Macaw: one is loud, obnoxious, and noisy; and the other
is a bird.

Harley Davidson Motorcycle: you can tune a Harley.

Onion: no one cries when you cut up a banjo.

Trampoline: you take your shoes off to jump on a trampoline.

Uzi: an uzi only repeats forty times.

Sorry - couldn't help myself on this one :slight_smile:

Keith

I've been considering collecting all the Ilias posts (filtering out
other people's comments that are quoted, for purity) and seeing what
it prints out. :wink:

Jacob Fugal

···

On 4/7/06, Simon Strandgaard <neoneye@gmail.com> wrote:

On 4/7/06, James Edward Gray II <james@grayproductions.net> wrote:
[snip]
> These can be quite a bit of fun, depending on what text you prime
> them with... :wink:

input_text := ruby_talk; (* hehe *)

I had to try this out :slight_smile: With 500 messages from the past couple of days,
order of 3, I found lots of cryptic wisdom when using a small word
limit, such as:

rubytalk2text.rb (1.1 KB)

···

On Sat, 2006-04-08 at 03:07 +0900, Simon Strandgaard wrote:

On 4/7/06, James Edward Gray II <james@grayproductions.net> wrote:
[snip]
> These can be quite a bit of fun, depending on what text you prime
> them with... :wink:

input_text := ruby_talk; (* hehe *)

===
windows :\ i guess clarity, just isn't a interesting.

thanks for making def say_stuff puts

bereflected to the foo to design a gui by open source path.

in summary, to get wrote: pointers, or care what you mean.

example, c# guys (including even worse - perl.

warning: pointer targets in boese wrote: use an i'll go digging

+0900, james edward gray would help with reading v_mod/1 m = 'm'

to figure this out? tashiro wrote: you need own license, possibly

april fool's joke. v.- interface and doscommand which next implemented
anobjective-c frontend set of programming

I've attached the quick and dirty script I used to get cleaned-up input
text from ruby-talk archive.

This is a seriously cool quiz :slight_smile: Thanks all concerned.

--
Ross Bamford - rosco@roscopeco.REMOVE.co.uk

Here is my first ruby quiz solution submission. Thanks for posting
this one, it provided a fun break from my school projects.

I tried to make the code as general as possible, with variable order
of both words and letters. It uses a hash of arrays for storage. I was
planning on unifying the MarkovWords and MarkovLetters classes into
one, but decided against it, mostly to keep them simpler. These
examples use an english translation of The Odyssey to derive the text.

  Usage...

[brian@spica] [~/rubyquiz]
$ ruby markov.rb -h
Usage: ruby markov.rb [options] <filename>
  -h, --help show this usage message
  -o, --order set markov chain order
  -s, --sentences set number of sentences to print
  -w, --words set number of words to print
  -l, --letters use letters as the basic unit

  Print 5 sentences of order 2, using words as the base unit...

[brian@spica] [~/rubyquiz]
$ ruby markov.rb -s 5 -o 2 ../etext/dyssy10.txt
My men came out of your wits? If Apollo and the whole world, neither
rich nor poor, who is handsome and clean built, whereas I am so lost
in a beautiful golden ewer, and poured it over with a lie; 'Neptune,'
said I, 'of escaping Charybdis, and at the base of her maids brought
the heifer down with my tears during the darkness of death itself. A
poor unfortunate tramp has come to pass." And Penelope answered,
"Stranger, you must have gone off to bring the contest to an end."
Melanthius lit the fire for her, but she wishes to hear the enchanting
sweetness of their grey hair.

  Print 1 sentence of order 1, using words as the base unit...

[brian@spica] [~/rubyquiz]
$ ruby markov.rb -s 1 -o 1 ../etext/dyssy10.txt
Proserpine had got into a mission to be scandalised at last of them,
and lay a hard task, no more fond of wind blew a man talked and will
leave their sport known that stalks about his wife, and the store for
he stretched out their own house, which I shall be willing to the
house to you, Telemachus.

  Print 50 words of order 3, using letters as the base unit...

[brian@spica] [~/rubyquiz]
$ ruby markov.rb -w 50 -o 3 -l ../etext/dyssy10.txt
the of it into beautiful nose kill as did ther people the wood and him
all the meanthrountry stilltrese ffere peak the the the eleide oly
with blooked ent back the himself over s his hey glarge tood welcomind
where and mannot been spoke aloud valitterince one of his

  Print 50 words of order 2, using letters as the base unit, with text
  from The Aeneid (latin version)...

[brian@spica] [~/rubyquiz]
$ ruby markov.rb -w 50 -o 2 -l ../etext/anidl10.txt
quora desi aras ta acerummarmallenstos es mihinus vade imurbethinc
plia caum turnit que re et sum fuit ade re restaeteri invicaviam
ceuctitur hos aectalisubsillo parmeis suntereffunc meo que
clachilliaeque mul rut moropula sonitotuspiumque terrequebraherautras
nos ad la ciem atque part dubstra repononibus comet ad num undo cisque
retrangeneferat simas obla

[brian@spica] [~/rubyquiz]
$ cat markov.rb
#!/usr/bin/ruby

class MarkovWords

  def initialize(filename, order)
    @order = order
    @words = Hash.new
    @state = Array.new(@order)
    previous = Array.new(@order)
    File.foreach(filename) do |line|
      line.split(/\s+/).each do |word|
        unless previous.include?(nil)
          p = previous.join(' ')
          unless @words.has_key?(p)
            @words[p] = Array.new
          end
          @words[p] << word
        end
        previous.shift
        previous.push(word)
      end
    end
  end

  def print_words(n = 50)
    word = next_word(true)
    1.upto(n) do |i|
      print word, i == n ? "\n" : " "
      word = next_word()
    end
    print "\n"
  end

  # sentences start with a capital or quoted capital and end with
  # punctuation or quoted punctuation
  def print_sentences(n = 5)
    sentences = 0
    word = next_word(true)
    while word !~ /^['"`]?[A-Z]/
      word = next_word()
    end
    begin
      print word
      if word =~ /[?!.]['"`]?$/
        sentences += 1
        if sentences == n
          print "\n"
        else
          print " "
        end
      else
        print " "
      end
      word = next_word()
    end until sentences == n
  end

  def next_word(restart = false)
    if restart or @state.include?(nil)
      key = @words.keys[rand(@words.length)]
      @state = key.split(/\s+/)
    end
    key ||= @state.join(' ')
    # restart if we hit a dead end, rare unless text is small
    if @words[key].nil?
      next_word(true)
    else
      word = @words[key][rand(@words[key].length)]
      @state.shift
      @state.push(word)
      word
    end
  end

end

class MarkovLetters

  def initialize(filename, order)
    @order = order
    @letters = Hash.new
    @state = Array.new(@order)
    previous = Array.new(@order)
    File.foreach(filename) do |line|
      line.strip!
      line << ' ' unless line.length == 0
      line.gsub!(/\s+/, ' ')
      line.gsub!(/[^a-z ]/, '')
      line.split(//).each do |letter|
        unless previous.include?(nil)
          p = previous.join('')
          unless @letters.has_key?(p)
            @letters[p] = Array.new
          end
          @letters[p] << letter
        end
        previous.shift
        previous.push(letter)
      end
    end
  end

  # words begin after a space and end before a space
  def print_words(n = 50)
    letter = next_letter(true)
    while letter != ' '
      letter = next_letter()
    end
    letter = next_letter()
    words = 0
    while words < n
      words += 1 if letter == ' '
      print letter
      letter = next_letter()
    end
    print "\n"
  end

  def next_letter(restart = false)
    if restart or @state.include?(nil)
      key = @letters.keys[rand(@letters.length)]
      @state = key.split(//)
    end
    key ||= @state.join('')
    # restart if we hit a dead end, rare unless text is small
    if @letters[key].nil?
      next_letter(true)
    else
      word = @letters[key][rand(@letters[key].length)]
      @state.shift
      @state.push(word)
      word
    end
  end

end

if $0 == __FILE__
  require 'getoptlong'

  def usage()
    $stderr.puts "Usage: ruby #{$0} [options] <filename>",
                 " -h, --help show this usage message",
                 " -o, --order set markov chain order",
                 " -s, --sentences set number of sentences to print",
                 " -w, --words set number of words to print",
                 " -l, --letters use letters as the basic unit"
  end

  order = 2
  sentences = 5
  words = nil
  letters = false

  opts = GetoptLong.new(["--help", "-h", GetoptLong::NO_ARGUMENT],
                        ["--order", "-o", GetoptLong::REQUIRED_ARGUMENT],
                        ["--sentences", "-s", GetoptLong::REQUIRED_ARGUMENT],
                        ["--words", "-w", GetoptLong::REQUIRED_ARGUMENT],
                        ["--letters", "-l", GetoptLong::NO_ARGUMENT])

  opts.each do |opt, arg|
    case opt
    when "--help"
      usage
      exit 0
    when "--order"
      order = arg.to_i
    when "--sentences"
      sentences = arg.to_i
      words = nil
    when "--words"
      words = arg.to_i
      sentences = nil
    when "--letters"
      letters = true
      words = 50 if words.nil?
    end
  end

  if ARGV.length < 1
    usage
    exit 1
  end

  ARGV.each do |arg|
    begin
      if letters
        m = MarkovLetters.new(arg, order)
        m.print_words(words)
      else
        m = MarkovWords.new(arg, order)
        m.print_words(words) unless words.nil?
        m.print_sentences(sentences) unless sentences.nil?
      end
    rescue
      $stderr.puts $!
    end
  end
end

markov.rb (4.58 KB)

Or everything why's ever said.

--Steve

···

On Apr 9, 2006, at 5:35 PM, Daniel Baird wrote:

I think someone needs to take all the explanations from this thread and run
their markov chainer on that.

Hi,

Can you show an example using Perl's auto-vivification that feels funny in Ruby? Perhaps we would have better ideas after seeing it...

Didn't feel funny, I just wasn't used to having to declare objects (hashes and arrays) as I added them as branches on my initial hash. I had to use a little more discipline, which probably is a good thing coming from Perl-land.

Incidentally, if you ever need an auto-vivifying hash-of-hashes
(as opposed to mixture of hashes and arrays that is possible
because of Perl syntax), you can do the hash-of-hashes
auto-vivify in Ruby:

HashFactory = lambda { Hash.new {|h,k| h[k] = HashFactory.call} }

irb(main):181:0> x = HashFactory.call
=> {}
irb(main):182:0> x['abc']['def']['ghi'] = 123
=> 123
irb(main):183:0> x
=> {"abc"=>{"def"=>{"ghi"=>123}}}

Regards,

Bill
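
For the hash-of-arrays shape that a Markov chain table usually needs, the same default-block idea applies. A minimal sketch, assuming an order-1 chain over a made-up word list (the name `followers` is only for illustration):

```ruby
# Missing keys auto-vivify to a fresh empty Array.
# The block form matters: Hash.new([]) would share ONE array between
# all keys instead of creating a new one per key.
followers = Hash.new { |hash, key| hash[key] = [] }

# Hypothetical sample input; each word is recorded after its predecessor.
words = %w[the cat saw the dog]
words.each_cons(2) { |prev, nxt| followers[prev] << nxt }

followers.each { |word, nexts| puts "#{word}: #{nexts.join(', ')}" }
```

One thing to keep in mind with this approach: merely reading a missing key also creates it, so a check such as `followers.key?(word)` is the safer way to detect a dead end.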

···

From: "Albert Vernon Smith" <smithav@cshl.edu>

On 10.4.2006, at 14:05, James Edward Gray II wrote:

I would like Ruby support of serial ports for the following
three platforms.
#1 Windows
#2 XP
#3 Mac OS X

Can you differentiate between/clarify #1 and #2?

-M

Thanks for the script, I note this interesting result with my implementation:
% ruby markov.rb rt.text 12
Alternate austin austin austin austin halostatue gmail com alternate austin austin austin

···

On Apr 7, 2006, at 7:02 PM, Ross Bamford wrote:

On Sat, 2006-04-08 at 03:07 +0900, Simon Strandgaard wrote:

On 4/7/06, James Edward Gray II <james@grayproductions.net> wrote:
[snip]

These can be quite a bit of fun, depending on what text you prime
them with... :wink:

input_text := ruby_talk; (* hehe *)

I had to try this out :slight_smile: With 500 messages from the past couple of days,
order of 3, I found lots of cryptic wisdom when using a small word
limit, such as:

===
windows :\ i guess clarity, just isn't a interesting.

thanks for making def say_stuff puts

bereflected to the foo to design a gui by open source path.

in summary, to get wrote: pointers, or care what you mean.

example, c# guys (including even worse - perl.

warning: pointer targets in boese wrote: use an i'll go digging

+0900, james edward gray would help with reading v_mod/1 m = 'm'

to figure this out? tashiro wrote: you need own license, possibly

april fool's joke. v.- interface and doscommand which next implemented
anobjective-c frontend set of programming

I've attached the quick and dirty script I used to get cleaned-up input
text from ruby-talk archive.

This is a seriously cool quiz :slight_smile: Thanks all concerned.

--
Ross Bamford - rosco@roscopeco.REMOVE.co.uk
<rubytalk2text.rb>

I got a few interesting bits using just this year's redhanded posts.
Thanks to the small input body and a low variance setting there's almost
as much between the lines as in the original posts :slight_smile:

···

On Mon, 2006-04-10 at 13:08 +0900, Stephen Waits wrote:

On Apr 9, 2006, at 5:35 PM, Daniel Baird wrote:

> I think someone needs to take all the explanations from this thread
> and run
> their markov chainer on that.

Or everything why's ever said.

=====
Ten-Sided is an alias for html. xhtml_strict does the same, but with
root beer and cream soda lip balms instead of Roofies or Vitamin K.
Actually, red.trust_sphere.each patches up batsman’s user by then
reflect ACTIVERECORD.

Service Overrides Adding stuff like sessioning and authentication
demands hooks on the web. Skeptical? I’d say you invented tumblelogging
way back in the above equation start out equal. Try seeding the
rankings. Maybe you want trust from Matz to be full projects.

And, of course, he really can be seen as a not entirely bad reason for
actually providing a time of 1.815 seconds, compared with St.
Valentine’s YARV which will format the date according to a class’ page.

I think Marcel has fixed this in Edge Rails. We get on these little
kicks. As we’re all snooping around. Little protocols or obscure version
control systems. Or domino games or something. Trust metric, man. It
just strikes me all the time.

We have, right now, an over-night offsite meeting at a Chupei hackathon
to get Perl’s YAML::Syck module on its legs and with a gem, but you
never know. Radicals often portray future animal executions in the above
code and also a follow-up with darcs and switchtower.

Install FuseFS. Run rubyfs.rb. We’ve been through this day before we
talk about this at CampingSessions on the wiki. Good thing gabriele is
out there, because I’ve been playing with a timezone setting, right?
These are tough issues, with a rewrite of miniature file-sharing and I
really appreciate him.

The textarea technique is very hard to get even with net auctions. Don’t
get me wrong. It does not mean the book is bad.

--
Ross Bamford - rosco@roscopeco.REMOVE.co.uk

I went and grabbed a bunch of Grimm's fairy tales from Gutenberg. I
still need to tweak my algorithm some; the most annoying thing I find
is punctuation, especially quoted strings (i.e. speech) so that the
output text looks like there is some valid speech in there and not
just randomly scattered quote marks.

Anyway, here's a sample I produced with my first attempt:

There was a man who kills seven at one blow? I leapt over the tree
because the huntsmen are shooting down there in the closet on the
porch.' The miller said: 'The Devil must go out,' and opened the door
and opened it. As she went in, a little dwarf no bigger than my
finger. And before her stood princes, and dukes, and earls: and the
fisherman went up to her palace all of shining gold; and told her
mother that she must be the handsomest lady in the land; and she went
to the fire and growled contentedly and comfortably. It was not long
in saying 'Yes' to all this; and as they were entering the village,
the son followed the fox's counsel, and without looking about him went
to the other anvil. The old man led him back into the water. The girls
came just in time; they held him fast and tried to free his beard from
the line, but all in vain; he heard or saw nothing of Jorinda. At last
he thought to himself: 'How the old woman is snoring! I must just see
if she wants anything.'

and:

An aged count once lived in Switzerland, who had an only son, but he
was now grown too old to work; so the farmer would give him his only
daughter to her bed-side, and said, 'Always be a good girl, and I will
never by myself leave the path, to run into the wood, till at last
they danced out at the gate together. When they had walked a short
time, when they had warmed themselves, they said: 'Comrade, shall we
have a child, and he grows big, and we send him into the wide world,
and looked neither to the right place. There they found the giants
swimming in their blood, and all round about her, and the bells rang
at each step which she took. Then she was in a great hurry. The little
tailor looked round and saw a flock of sheep, the very shepherd whom
the peasant knew had long been wishing to be mayor, so he cried with
all his might.

Correction to earlier post AGS Calabrese
00000000000000000000000000000000000000000000000000
I would like Ruby support of serial ports
for the following three platforms.
#1 Windows
#2 Linux
#3 Mac OS X Tiger

Please respond offline if you can
show me how to add serial support to any or all of these

Gus

rubySPS@oh-bear.com

my spam filter has been eating messages so I may miss your message.
I am working on the problem
000000000000000000000000000000000000000000000000000

···

On 2006-Apr 07, at 10:46 AM, Mike wrote:

I would like Ruby support of serial ports for the following
three platforms.
#1 Windows
#2 XP
#3 Mac OS X

Can you differentiate between/clarify #1 and #2?

-M

how did you manage to get complete sentences?

···

On 4/8/06, Matthew Moss <matthew.moss.coder@gmail.com> wrote:

I went and grabbed a bunch of Grimm's fairy tales from Gutenberg. I
still need to tweak my algorithm some; the most annoying thing I find
is punctuation, especially quoted strings (i.e. speech) so that the
output text looks like there is some valid speech in there and not
just randomly scattered quote marks.

Anyway, here's a sample I produced with my first attempt:

There was a man who kills seven at one blow? I leapt over the tree
because the huntsmen are shooting down there in the closet on the
porch.' The miller said: 'The Devil must go out,' and opened the door
and opened it. As she went in, a little dwarf no bigger than my
finger. And before her stood princes, and dukes, and earls: and the
fisherman went up to her palace all of shining gold; and told her
mother that she must be the handsomest lady in the land; and she went
to the fire and growled contentedly and comfortably. It was not long
in saying 'Yes' to all this; and as they were entering the village,
the son followed the fox's counsel, and without looking about him went
to the other anvil. The old man led him back into the water. The girls
came just in time; they held him fast and tried to free his beard from
the line, but all in vain; he heard or saw nothing of Jorinda. At last
he thought to himself: 'How the old woman is snoring! I must just see
if she wants anything.'

and:

An aged count once lived in Switzerland, who had an only son, but he
was now grown too old to work; so the farmer would give him his only
daughter to her bed-side, and said, 'Always be a good girl, and I will
never by myself leave the path, to run into the wood, till at last
they danced out at the gate together. When they had walked a short
time, when they had warmed themselves, they said: 'Comrade, shall we
have a child, and he grows big, and we send him into the wide world,
and looked neither to the right place. There they found the giants
swimming in their blood, and all round about her, and the bells rang
at each step which she took. Then she was in a great hurry. The little
tailor looked round and saw a flock of sheep, the very shepherd whom
the peasant knew had long been wishing to be mayor, so he cried with
all his might.

For sentences, I picked words until it started with a [A-Z], then
printed subsequent words until it ended with [?!.]
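
Spelled out on its own, that rule looks roughly like the sketch below. The helper name `build_sentence` and the canned word stream are assumptions for illustration; in a real run the lambda would ask the chain for its next word:

```ruby
# Hypothetical helper: next_word is any callable that returns the next
# word of the chain each time it is called.
def build_sentence(next_word)
  word = next_word.call
  # Skip words until one starts with a capital letter.
  word = next_word.call until word =~ /\A[A-Z]/
  sentence = [word]
  # Keep appending words until one ends with sentence punctuation.
  sentence << next_word.call until sentence.last =~ /[?!.]\z/
  sentence.join(' ')
end

# A fixed stream stands in for the chain so the example is repeatable.
stream = %w[over the tree. The miller said nothing. And then].each
puts build_sentence(lambda { stream.next })
```

With a fixed stream like this one, running out of words raises StopIteration; a real chain keeps producing words, so the loop ends at the first word with closing punctuation.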

Using KJV Bible with verse numbers removed:

irb(main):001:0> load 'markov.rb'
=> true
irb(main):002:0> m = Markov.new('../etext/kjv10x.txt')
irb(main):003:0> m.print_sentences 4
Mehujael begat Anub, and taught his house. And in the wheat into the
field for my fury upon the Red sea. Nevertheless what I judge the
people shall turn to his clothes. And a great and he hath cast far
off.
=> nil
irb(main):004:0> m.print_sentences 1
Which, when the kingdom for meat, and he had indignation against thy
land, the violent man.
=> nil

···

On 08/04/06, Meinrad Recheis <meinrad.recheis@gmail.com> wrote:

On 4/8/06, Matthew Moss <matthew.moss.coder@gmail.com> wrote:
> I went and grabbed a bunch of Grimm's fairy tales from Gutenberg. I
> still need to tweak my algorithm some; the most annoying thing I find
> is punctuation, especially quoted strings (i.e. speech) so that the
> output text looks like there is some valid speech in there and not
> just randomly scattered quote marks.
>
> Anyway, here's a sample I produced with my first attempt:
>
> There was a man who kills seven at one blow? I leapt over the tree
> because the huntsmen are shooting down there in the closet on the
> porch.' The miller said: 'The Devil must go out,' and opened the door
> and opened it. As she went in, a little dwarf no bigger than my
> finger. And before her stood princes, and dukes, and earls: and the
> fisherman went up to her palace all of shining gold; and told her
> mother that she must be the handsomest lady in the land; and she went
> to the fire and growled contentedly and comfortably. It was not long
> in saying 'Yes' to all this; and as they were entering the village,
> the son followed the fox's counsel, and without looking about him went
> to the other anvil. The old man led him back into the water. The girls
> came just in time; they held him fast and tried to free his beard from
> the line, but all in vain; he heard or saw nothing of Jorinda. At last
> he thought to himself: 'How the old woman is snoring! I must just see
> if she wants anything.'
>
>
> and:
>
> An aged count once lived in Switzerland, who had an only son, but he
> was now grown too old to work; so the farmer would give him his only
> daughter to her bed-side, and said, 'Always be a good girl, and I will
> never by myself leave the path, to run into the wood, till at last
> they danced out at the gate together. When they had walked a short
> time, when they had warmed themselves, they said: 'Comrade, shall we
> have a child, and he grows big, and we send him into the wide world,
> and looked neither to the right place. There they found the giants
> swimming in their blood, and all round about her, and the bells rang
> at each step which she took. Then she was in a great hurry. The little
> tailor looked round and saw a flock of sheep, the very shepherd whom
> the peasant knew had long been wishing to be mayor, so he cried with
> all his might.
>
>
how did you manage to get complete sentences?

Cheating. Sort of.

When I break up input texts to find words, punctuation is just part of
the word. So you end up seeing the punctuation with similar frequency
and distribution as you might in the original texts. So it ends up
looking like complete sentences, and sometimes they actually are, but
there's no code to help that along. The order does help a bit,
though, in keeping it reasonable... 2-3 seems to generate the best
results.
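
A minimal sketch of that idea, assuming the input is split on whitespace only. The sample text and variable names are made up, and order 1 is used here only so that the tiny sample shows some branching:

```ruby
# Hypothetical sample text; real input would be a whole book.
text = %q{He said: 'Go out,' and left. She said: 'Stay in,' and stayed.}
order = 1

# Split on whitespace only, so "said:" and "out,'" stay single tokens
# and punctuation travels along with the word it was attached to.
tokens = text.split

# Map each run of `order` tokens to the tokens seen right after it.
table = Hash.new { |hash, key| hash[key] = [] }
tokens.each_cons(order + 1) do |window|
  prefix   = window[0...order].join(' ')
  follower = window[order]
  table[prefix] << follower
end

table.each { |prefix, nexts| puts "#{prefix} -> #{nexts.join(' | ')}" }
```

Because the quote marks and full stops are stored inside the tokens, they come back out with roughly the frequency and placement they had in the source, which is why the output tends to look like complete sentences without any sentence-aware code.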