[SUMMARY] Word Loop (#149)

There were two kinds of solvers this week: hardcore programmers who love a good
challenge and cheaters like me. Both approaches are neat, but I want to focus
on the under appreciated cheater this time. Contrary to the best practices for
gamblers, cheater is a wonderful label for a programmer to have.

Cheaters always find the system and make it work for them. So what's the system
this time? Have another look at one of the quiz examples:

   i
   p
   p
  Mis
   ss
   si

Okay, the first priority is to find a possible loop. We need a repeated letter
with some letters between the two occurrences, obviously. How many letters
between though? Well, the minimum loop is:

  *.
  ..

The next smallest loop is what we see in Mississippi:

  *.
  ..
  ..

That's three and five and we can already see that each size must add two more
letters. So, we need an odd number of letters and at least three of those. Now
we can find loops.

Once we see the loop, it becomes clear that the loop divides the word into
pieces. Let me call them out:

   ^ > = before the loop characters
   ^ * = the repeated letter
   ^ . = loop characters
  >*. ^ = after the loop characters
   ..
   ..

Seeing these makes a cheater wonder, can I output each part naturally down the
lines? To do that, we would first need to spit out those after the loop
characters, in reverse order. That shouldn't be tough since we already know how
to find a loop, but notice that each of those is also indented. It turns out
that they are just indented by the length of characters before the loop, so
that's trivial to deal with.

Now we would need the get those before the loop characters in the output. No
problem there, it's a simple print statement.

The loop is the trickiest part. First, we see that the repeated letter and the
first loop character can be printed along with the before characters. The rest
of the characters have an indent, but it's the same thing we figured out earlier
and we can deal with that. Then outputting the remaining characters of the loop
can be as simple as printing two columns: one pulling letters off the back of
the loop and the other pulling letters off the front.

Wow, cheaters work harder than you think, eh? If we can do all of those pieces,
we can print the word loop as we go. The reason I like trying an approach like
this is that each step is pretty simple and easy to understand. We had to do a
bit of thinking that the computer probably could have done for us, but then you
have to be smart enough to tell the computer how to do the thinking.

Let's move the ideas into code at this point. We will examine Ken Bloom's
solution. Ken is a cheater and a better one than I am at that! I'll tell you
what Ken taught me shortly, but for first let's see the code. Brace yourself
because I'm giving you the whole thing in one shot:

  def loopword word
   matchinfo=
     word.match(/(.*?)(.)(.(?:..)+?)\2(.*)/i)
   if matchinfo
     _,before,letter,looplets,after=matchinfo.to_a
     pad=" "*before.size
     after.reverse.split(//).each{|l| puts pad+l}
     looplets=looplets.split(//)
     puts before+letter+looplets.shift
     until looplets.empty?
       puts pad+looplets.pop+looplets.shift
     end
   else
     puts "No loop."
   end
  end
  
  # ...

Isn't it great to be a cheater?

Let's break it down. Probably the hardest part to understand is the very first
step. Ken hits the word with a tricky Regexp to figure out if it has a loop in
it. That's right, you can count an odd number of at least three characters with
a regular expression. Let's examine the pieces:

  / (.*?) # characters before the loop, if any
    (.) # the first appearance of our repeated character
    (.(?:..)+?) # an odd number of at least three loop characters
    \2 # the repeat of our repeated character
    (.*) # characters after the loop, if any
    /ix # make the expression case insensitive

There are two good tricks in there. First, matching at least three odd
characters is shown to be just any character followed by one or more groups of
two characters. That's handy to remember.

The other trick is using a back-reference to catch the repeated character. What
I didn't know about this trick though was that Ruby's regular expression engine
is smarter than I gave it credit for. I knew this would match:

  >> "-i---i-"[/(\w).+\1/]
  => "i---i"

However, I didn't know the following would work if you made the expression case
insensitive:

  >> "-I---i-"[/(\w).+\1/i]
  => "I---i"

I spent too much effort in my code to get a match to work on the word Markham
when I could have just used the /i switch. Thanks for the tip, Ken.

Once we have tried the expression, the if statement checks to see if we found a
loop. If we didn't the else clause can print our error message and we're done.

When we did find a match, Ken starts by pulling out each capture of the
expression into easy to manage variables. Here's my chance to teach Ken a new
trick though. See how he created an unused variable (_) to capture the match as
a whole? It's not really needed if you switch the method call on the MatchData
object:

  >> md = "Mississippi".match(/(.*?)(.)(.(?:..)+?)\2(.*)/i)
  => #<MatchData:0x58a188>
  >> md.to_a
  => ["Mississippi", "M", "i", "ssiss", "ppi"]
  >> md.captures
  => ["M", "i", "ssiss", "ppi"]

Let's get back to the code. Here it is again to save scrolling:

  def loopword word
   matchinfo=
     word.match(/(.*?)(.)(.(?:..)+?)\2(.*)/i)
   if matchinfo
     _,before,letter,looplets,after=matchinfo.to_a
     pad=" "*before.size
     after.reverse.split(//).each{|l| puts pad+l}
     looplets=looplets.split(//)
     puts before+letter+looplets.shift
     until looplets.empty?
       puts pad+looplets.pop+looplets.shift
     end
   else
     puts "No loop."
   end
  end
  
  # ...

The rest of the code prints the word as I described earlier. First, a variable
is set to that indent we will need in multiple places. The next line reverse()s
and split()s the after loop characters, printing each one with the indent.
After that, Ken breaks up the loop characters into an Array for easy removal at
both ends. The next line prints the before the loop characters, the repeat, and
the first loop character. Finally, the until loop handles the remaining loop
character two at a time, one from each end. That process prints the entire word
and makes a complete solution.

The rest of Ken's code just called the method on a set of sample words:

  # ...
  
  loopword "Mississippi"
  puts
  loopword "Markham"
  puts
  loopword "yummy"
  puts
  loopword "Dana"
  puts
  loopword "Organization"

The non-cheating solutions are also very interesting. They decided that this
problem wasn't hardcore enough for them and they could maximize the fun by
trying to create multiple loops and reuse as many characters as possible. It's
great code that leads to insane output, so be sure to check those out as well.

My thanks to cheaters and hardcore solvers alike. You always give me more
interesting material than I can even talk about.

Tomorrow we will tackle a typical programming task in a completely non-typical
manner...