Please Forward: Ruby Quiz Submission (Part 1)

I believe this message was rejected for length when I sent it the first time, so trying a two-parter...

James Edward Gray II

···

Begin forwarded message:

From: James Quick <jimquick@mac.com>
Date: July 12, 2006 9:51:17 AM CDT
To: submission@rubyquiz.com
Subject: Please Forward: Ruby Quiz Submission

I am not on the ruby-talk list, and would appreciate if you could pass this
along. This was a nice problem to get my feet wet in this language. I
began looking at Ruby last week after hearing about what an old colleague
of mine was doing with it. Because I am familiar with several languages
with which it shares some features (smalltalk, eiffel, objective-c, perl) it
is deceptively easy to pick up, and occasionally infuriating. Thank you,
James, for providing this excellent service to the community. Sometimes
it is essential to have throwaway problems to guide one's exploration.

My first attempts were very classy (in a bad way) partitioning things into separate
classes, rigorously defining interfaces and accessor methods. They were also
very slow, and usually failed to converge on a solution.

Then, by reading source code for a bunch of modules in /usr/lib/ruby, I found
that mixing a few methods into the base classes was not only sufficient for
most of my uses but preferable. Profiling showed that much of my time was
being spent in each_byte loops so I ripped out the conversions to and from
Strings and replaced it with a byte comparison factory which stored lambda
functions. This way, I could pretend that I was producing arrays of byte counts
from string operations without actually doing them more than once.

This was more than twice as fast as my previous attempts but was still
not converging very quickly. I had already optimized my target_counts
array by initializing it with complete information on what was knowable
about a solution before starting a search. Unfortunately during the search,
if several hundred thousand loops had occurred, I would do some sanity
checking which proved to be insane. I converted the target counts into a
string then back into a count array. If the real result of the sanity check was
beyond the range of the target and source, I would munge one or more
values into my target array, thus throwing out all the good work!

It was not until I saw the excellent submissions by Simon Krer that I realized
that my sanity checking was demoting me to a random search. Optimal random
Robinizing is more like solving a rubix cube than like searching randomly
for text. If properly initialized the target and result are mathematically linked.
If you make any change to the target or source without making a complementary
change to the other, you will immediately devolve into a random search.
Doing so is the equivalent of occasionally removing a piece from the
cube, twisting it, and putting it back. You may have made it impossible
to solve (unless by chance you randomly undo what you just did).

Basically, the target contains a direct representation of the pangram search
space in the form of c->n. Each count expresses the assertion that there be
n occurrences of the character 'c' in the pangram. For a particular value of
n it may be a completely bogus assertion. All that matters is that the assertion
is congruent with the information stored in the result_string.

I initialize my target and source with the exact values for constant text
and their spelled out occurrences. Simon does it far more simply at the
expense of a few more iterations (but his chainsaw has a much more
powerful engine!). He initializes his target (which he calls guess) to
an array of 1's indicating the invariant assumption that there will be
1 occurrence of each letter of the alphabet in the pangram. His result
starts with an array of 1's (representing a-z from the itemized
list n a's, n b's .... and n z's). He then adds the counts for "and", the prefix
string, and 26 copies of the word "one". The 26 copies of the word "one"
provide the required link which binds the guess and the "real" result
together. As soon as he chooses a better value than 1 for a letter
frequency he will subtract "one" and then add the spelled out version
of the new choice.

These 26 copies of "one", or the loop of calls to adder lambda functions
in my version, each perform an equivalent task. They ensure that the
target and result are rigorously linked.

From here on, despite their quite different implementations, our solutions
are essentially the same. Random Robinizing proceeds as follows:
When the target and source differ, randomly choose a new value(n).
Add or subtract that value from the source (plain integer arithmetic).
In the destination, decrement the letter counts for each letter in the
spelled out number (before the change), then increment the letter
counts for the new spelled out number.

As you can see, the source and destination have completely different
operations performed on them. They are different but embody the
essence of the pangram itself. A self referential sentence whose
implementation (spelling) is identical to its semantic meaning
(numerical commentary about the sentence).

If you (like me) ran into a performance wall, and found that
answers were converging slowly or not at all, take a step back.
As soon as you break the underlying relationship between the
target and destination, your program will stray into a random walk,
which may contain long cycles of aimless wandering.

I think there may be some bugs lurking here, though I think most are
gone. I would appreciate any stylistic or other advice on best practices.
I alternated several times between studlyCaps and c style naming.
I also realize that I have know idea about the relative performance
tradeoffs between cacheing variables and evaluating method calls.

I did see some fairly large performance boosts when variables are
found first in the outer scope rather then being allocated and lost
with each iteration.

Finally, I am interested in learning where to look for up to date
information on the language definition or syntax. I could only find
something circa 1.6.

The following code makes me realize that I am far from being
proficient in this language. However, I hope the comments and
the basic algorithm are clear enough to provide some useful
insight into the problem domain.

here is some sample output.

As you can see 9 of the 26 letters are already solved before the first pass
through the loop. This neighborhood quickly converged. In retrospect
I am now sure that the 'q' from my initials helped. Zebras are useful
too.
lili% ruby jqpangram.rb
"--- 0 Wed Jul 12 10:32:42 EDT 2006"
[7, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2]
[7, 2, 2, 2, 24, 2, 2, 2, 2, 2, 1, 1, 3, 24, 28, 2, 2, 5, 12, 10, 1, 2, 8, 1, 1, 2]
"--- 10000 Wed Jul 12 10:32:47 EDT 2006"
[7, 2, 2, 2, 28, 5, 3, 7, 8, 2, 1, 2, 3, 16, 17, 2, 2, 12, 31, 25, 3, 7, 12, 3, 4, 2]
[7, 2, 2, 2, 35, 5, 4, 8, 8, 2, 1, 3, 3, 16, 15, 2, 2, 10, 32, 26, 2, 9, 13, 2, 4, 2]
"--- 20000 Wed Jul 12 10:32:52 EDT 2006"
[7, 2, 2, 2, 26, 6, 2, 4, 10, 2, 1, 1, 3, 22, 16, 2, 2, 10, 31, 26, 2, 5, 14, 4, 5, 2]
[7, 2, 2, 2, 21, 7, 2, 3, 9, 2, 1, 1, 3, 17, 20, 2, 2, 9, 31, 25, 4, 4, 14, 5, 5, 2]
"--- 30000 Wed Jul 12 10:32:57 EDT 2006"
[7, 2, 2, 2, 28, 6, 4, 6, 10, 2, 1, 2, 3, 17, 16, 2, 2, 9, 30, 25, 3, 5, 12, 3, 4, 2]
[7, 2, 2, 2, 27, 6, 3, 6, 10, 2, 1, 2, 3, 16, 15, 2, 2, 10, 32, 24, 3, 6, 12, 4, 4, 2]
"--- 40000 Wed Jul 12 10:33:01 EDT 2006"
[7, 2, 2, 2, 29, 8, 4, 8, 9, 2, 1, 3, 3, 15, 17, 2, 2, 12, 30, 23, 4, 6, 12, 2, 4, 2]
[7, 2, 2, 2, 28, 7, 4, 7, 9, 2, 1, 3, 3, 17, 16, 2, 2, 11, 30, 25, 4, 5, 13, 2, 4, 2]
"--- 50000 Wed Jul 12 10:33:06 EDT 2006"
[7, 2, 2, 2, 30, 6, 2, 7, 10, 2, 1, 1, 3, 19, 17, 2, 2, 11, 30, 23, 3, 6, 10, 4, 4, 2]
[7, 2, 2, 2, 28, 4, 2, 6, 7, 2, 1, 2, 3, 19, 16, 2, 2, 11, 31, 23, 3, 5, 10, 3, 4, 2]
"--- 60000 Wed Jul 12 10:33:11 EDT 2006"
[7, 2, 2, 2, 29, 7, 2, 6, 9, 2, 1, 3, 3, 19, 16, 2, 2, 11, 33, 23, 4, 5, 12, 4, 4, 2]
[7, 2, 2, 2, 31, 6, 2, 6, 9, 2, 1, 3, 3, 20, 16, 2, 2, 12, 31, 23, 4, 6, 12, 3, 4, 2]
"solution found in 69833 iterations"
[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, 3, 5, 12, 4, 4, 2]
[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, 3, 5, 12, 4, 4, 2]
"mypan----------------------------------------"
[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, 3, 5, 12, 4, 4, 2]
"A pangram from jq contains one zebra, seven a's, two b's, two c's, two d's, twenty-six e's, eight f's, three g's, six h's, ten i's, two j's, one k, two l's, three m's, fifteen n's, sixteen o's, two p's, two q's, ten r's, thirty-one s's, twenty-five t's, three u's, five v's, twelve w's, four x's, four y's, and two z's."
true

Sorry, I obviously didn't wait long enough. :frowning:

James Edward Gray II

···

On Jul 12, 2006, at 3:04 PM, James Edward Gray II wrote:

I believe this message was rejected for length when I sent it the first time, so trying a two-parter...