Please Forward: Ruby Quiz Submission

James_Edward_Gray_II · 12 July 2006 19:32

From: James Quick <jimquick@mac.com>
Date: July 12, 2006 9:51:17 AM CDT
To: submission@rubyquiz.com
Subject: Please Forward: Ruby Quiz Submission

I am not on the ruby-talk list, and would appreciate if you could pass this
along. This was a nice problem to get my feet wet in this language. I
began looking at Ruby last week after hearing about what an old colleague
of mine was doing with it. Because I am familiar with several languages
with which it shares some features (smalltalk, eiffel, objective-c, perl) it
is deceptively easy to pick up, and occasionally infuriating. Thank you,
James, for providing this excellent service to the community. Sometimes
it is essential to have throwaway problems to guide one's exploration.

My first attempts were very classy (in a bad way) partitioning things into separate
classes, rigorously defining interfaces and accessor methods. They were also
very slow, and usually failed to converge on a solution.

Then, by reading source code for a bunch of modules in /usr/lib/ruby, I found
that mixing a few methods into the base classes was not only sufficient for
most of my uses but preferable. Profiling showed that much of my time was
being spent in each_byte loops so I ripped out the conversions to and from
Strings and replaced it with a byte comparison factory which stored lambda
functions. This way, I could pretend that I was producing arrays of byte counts
from string operations without actually doing them more than once.

This was more than twice as fast as my previous attempts but was still
not converging very quickly. I had already optimized my target_counts
array by initializing it with complete information on what was knowable
about a solution before starting a search. Unfortunately during the search,
if several hundred thousand loops had occurred, I would do some sanity
checking which proved to be insane. I converted the target counts into a
string then back into a count array. If the real result of the sanity check was
beyond the range of the target and source, I would munge one or more
values into my target array, thus throwing out all the good work!

It was not until I saw the excellent submissions by Simon Krer that I realized
that my sanity checking was demoting me to a random search. Optimal random
Robinizing is more like solving a rubix cube than like searching randomly
for text. If properly initialized the target and result are mathematically linked.
If you make any change to the target or source without making a complementary
change to the other, you will immediately devolve into a random search.
Doing so is the equivalent of occasionally removing a piece from the
cube, twisting it, and putting it back. You may have made it impossible
to solve (unless by chance you randomly undo what you just did).

Basically, the target contains a direct representation of the pangram search
space in the form of c->n. Each count expresses the assertion that there be
n occurrences of the character 'c' in the pangram. For a particular value of
n it may be a completely bogus assertion. All that matters is that the assertion
is congruent with the information stored in the result_string.

I initialize my target and source with the exact values for constant text
and their spelled out occurrences. Simon does it far more simply at the
expense of a few more iterations (but his chainsaw has a much more
powerful engine!). He initializes his target (which he calls guess) to
an array of 1's indicating the invariant assumption that there will be
1 occurrence of each letter of the alphabet in the pangram. His result
starts with an array of 1's (representing a-z from the itemized
list n a's, n b's .... and n z's). He then adds the counts for "and", the prefix
string, and 26 copies of the word "one". The 26 copies of the word "one"
provide the required link which binds the guess and the "real" result
together. As soon as he chooses a better value than 1 for a letter
frequency he will subtract "one" and then add the spelled out version
of the new choice.

These 26 copies of "one", or the loop of calls to adder lambda functions
in my version, each perform an equivalent task. They ensure that the
target and result are rigorously linked.

From here on, despite their quite different implementations, our solutions
are essentially the same. Random Robinizing proceeds as follows:
When the target and source differ, randomly choose a new value(n).
Add or subtract that value from the source (plain integer arithmetic).
In the destination, decrement the letter counts for each letter in the
spelled out number (before the change), then increment the letter
counts for the new spelled out number.

As you can see, the source and destination have completely different
operations performed on them. They are different but embody the
essence of the pangram itself. A self referential sentence whose
implementation (spelling) is identical to its semantic meaning
(numerical commentary about the sentence).

If you (like me) ran into a performance wall, and found that
answers were converging slowly or not at all, take a step back.
As soon as you break the underlying relationship between the
target and destination, your program will stray into a random walk,
which may contain long cycles of aimless wandering.

I think there may be some bugs lurking here, though I think most are
gone. I would appreciate any stylistic or other advice on best practices.
I alternated several times between studlyCaps and c style naming.
I also realize that I have know idea about the relative performance
tradeoffs between cacheing variables and evaluating method calls.

I did see some fairly large performance boosts when variables are
found first in the outer scope rather then being allocated and lost
with each iteration.

Finally, I am interested in learning where to look for up to date
information on the language definition or syntax. I could only find
something circa 1.6.

The following code makes me realize that I am far from being
proficient in this language. However, I hope the comments and
the basic algorithm are clear enough to provide some useful
insight into the problem domain.

here is some sample output.

As you can see 9 of the 26 letters are already solved before the first pass
through the loop. This neighborhood quickly converged. In retrospect
I am now sure that the 'q' from my initials helped. Zebras are useful
too.
lili% ruby jqpangram.rb
"--- 0 Wed Jul 12 10:32:42 EDT 2006"
[7, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2]
[7, 2, 2, 2, 24, 2, 2, 2, 2, 2, 1, 1, 3, 24, 28, 2, 2, 5, 12, 10, 1, 2, 8, 1, 1, 2]
"--- 10000 Wed Jul 12 10:32:47 EDT 2006"
[7, 2, 2, 2, 28, 5, 3, 7, 8, 2, 1, 2, 3, 16, 17, 2, 2, 12, 31, 25, 3, 7, 12, 3, 4, 2]
[7, 2, 2, 2, 35, 5, 4, 8, 8, 2, 1, 3, 3, 16, 15, 2, 2, 10, 32, 26, 2, 9, 13, 2, 4, 2]
"--- 20000 Wed Jul 12 10:32:52 EDT 2006"
[7, 2, 2, 2, 26, 6, 2, 4, 10, 2, 1, 1, 3, 22, 16, 2, 2, 10, 31, 26, 2, 5, 14, 4, 5, 2]
[7, 2, 2, 2, 21, 7, 2, 3, 9, 2, 1, 1, 3, 17, 20, 2, 2, 9, 31, 25, 4, 4, 14, 5, 5, 2]
"--- 30000 Wed Jul 12 10:32:57 EDT 2006"
[7, 2, 2, 2, 28, 6, 4, 6, 10, 2, 1, 2, 3, 17, 16, 2, 2, 9, 30, 25, 3, 5, 12, 3, 4, 2]
[7, 2, 2, 2, 27, 6, 3, 6, 10, 2, 1, 2, 3, 16, 15, 2, 2, 10, 32, 24, 3, 6, 12, 4, 4, 2]
"--- 40000 Wed Jul 12 10:33:01 EDT 2006"
[7, 2, 2, 2, 29, 8, 4, 8, 9, 2, 1, 3, 3, 15, 17, 2, 2, 12, 30, 23, 4, 6, 12, 2, 4, 2]
[7, 2, 2, 2, 28, 7, 4, 7, 9, 2, 1, 3, 3, 17, 16, 2, 2, 11, 30, 25, 4, 5, 13, 2, 4, 2]
"--- 50000 Wed Jul 12 10:33:06 EDT 2006"
[7, 2, 2, 2, 30, 6, 2, 7, 10, 2, 1, 1, 3, 19, 17, 2, 2, 11, 30, 23, 3, 6, 10, 4, 4, 2]
[7, 2, 2, 2, 28, 4, 2, 6, 7, 2, 1, 2, 3, 19, 16, 2, 2, 11, 31, 23, 3, 5, 10, 3, 4, 2]
"--- 60000 Wed Jul 12 10:33:11 EDT 2006"
[7, 2, 2, 2, 29, 7, 2, 6, 9, 2, 1, 3, 3, 19, 16, 2, 2, 11, 33, 23, 4, 5, 12, 4, 4, 2]
[7, 2, 2, 2, 31, 6, 2, 6, 9, 2, 1, 3, 3, 20, 16, 2, 2, 12, 31, 23, 4, 6, 12, 3, 4, 2]
"solution found in 69833 iterations"
[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, 3, 5, 12, 4, 4, 2]
[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, 3, 5, 12, 4, 4, 2]
"mypan----------------------------------------"
[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, 3, 5, 12, 4, 4, 2]
"A pangram from jq contains one zebra, seven a's, two b's, two c's, two d's, twenty-six e's, eight f's, three g's, six h's, ten i's, two j's, one k, two l's, three m's, fifteen n's, sixteen o's, two p's, two q's, ten r's, thirty-one s's, twenty-five t's, three u's, five v's, twelve w's, four x's, four y's, and two z's."
true

cut here VVVV jqpangram.rb follows
--------------------------------------
#!/usr/local/bin/ruby -w

class Integer
    ONES = %w[ zero one two three four five six seven eight nine ]
    TEEN = %w[ten eleven twelve thirteen fourteen fifteen
                  sixteen seventeen eighteen nineteen ]
    TENS = %w[ zero ten twenty thirty forty fifty
                sixty seventy eighty ninety ]

    def to_en
        d, n = self.divmod(10)
        if d == 0
            ONES[n]
        elsif d == 1
            TEEN[n]
        elsif d < 10
            TENS[d] + ((n > 0) ? ("-" + ONES[n]) : "")
        else
            raise "Class Integer#to_en only goes to 99 not #{self}"
        end
    end
end

class String
    # to_counts returns a 26 element array containing the counts of
    # each letter in the input string. e.g. "add".to_counts -> [1,0,0,2,0...]
    # Only counting lower case for speed. Be sure to downcase.
    def to_counts
        counts = Array.new(26).fill(0)
        each_byte do |b| counts[b - ?a] += 1 if b >= ?a && b <= ?z end
        counts
    end

    # similar to to_counts except adds the result into an existing array.
    def accum_counts(counts)
        each_byte do |b| counts[b - ?a] += 1 if b >= ?a && b <= ?z end
    end
end

class Array
    include Enumerable
    # Return an array of textual pangram sections from a count_array
    def pan_body()
        self.zip((?a..?z).to_a).collect do |n, c|
            sprintf(" %s%s %c%s", \
                (c==?z ? "and " : ""), \
                n.to_en, c, \
                (n>1) ? "'s" : "")
        end
    end

    # Turn a pangram text array into a string.
    def pan_string()
        pan_body().join(',') + "."
    end
end

class CounterFactory
    # The designated initializer is called with a pangram prefix string
    # e.g. "This string contains". It sets up a pair of hash tables
    # filled with pattern counting functions and other initial values.
    def initialize(withString=nil)
        @add_word = {}
        @subtract_word = {}
        add_numbers('+', @add_word)
        add_numbers('-', @subtract_word)
        add_initial(withString) if withString != nil
        add_numbers('+', @add_word)
    end

    # add_function takes an input string and creates lambda functions which
    # transform count arrays (like those made by to_counts).
    # add_function(1, "one", "+", add_word) creates a function of arity 1
    # and adds it to the @add_word hash table with a key of 1.
    # Calling this function with a count array increments the 'o', 'n', and 'e'
    # letter counts. If the input array was initially filled with 0, then
    # the content of that array is equivalent to "one".to_counts.
    def add_function(key, aString, op, table)
        counts = aString.downcase.to_counts
        src = 'lambda { |targ| '
        counts.each_with_index do |n,i|
                src += "targ[#{i}] #{op}= #{n}; " if (n > 0)
        end
        src += '}'
        table[key] = eval(src)
    end

    # Add functions for each named number, with integer keys, to a table.
    def add_numbers(op, table)
        #add_function(0, "no", op, table)
        add_function(1, "one", op, table)
        for n in (2..99)
            add_function(n, n.to_en + "s", op, table)
        end
    end

    # Add a function mapping the constant elements of a pangram to a count array.
    def add_initial(prefix)
        # A pangram is : "prefix" + counts of each letter + "and" before 'z'
        known_string = prefix.downcase + ('a'..'z').to_a.join("") + "and"
        add_function(:initial, known_string, '+', @add_word)

        # known_count[letter] is the count of each letter we know must exist
        known_count = known_string.to_counts

        name_bytes = Array.new(26).fill(0)
        (1..99).each { |n| n.to_en.accum_counts(name_bytes) }
        in_names = name_bytes.collect {|n| n > 0}
        # Now we have an array of booleans representing whether a character
        # is constant (only appearing in the prefix or enumerated list)
        # or is a variable quantity (because it also appears in number names

        # From the count of what we initially knew to be present, return
        # our initial target. This is our best initial guess for a result
        # which will converge quickly. Basically, anything which is
        # not found in a name, is now known to occur a constant number of
        # times in the result. If it is in a name, I just punt and
        # prepare to see it 1 time like Simon does.
        @target_template = in_names.zip(known_count).collect do |in_name, known|
            if in_name
                1
            else
                known
            end
        end

        # Starting with the count of known contents add the letter counts
        # for the numbers we expect to see. The counts in the target template
        # indicate that we expect to see N occurences of that (char) in the
        # result. Thus if 7 a's are reflected in the target we must have
        # an associated set of bytes {seven's} in the result. The following
        # code initializes the result template with the static byte counts
        # that we know must occur in the result. Using the target template as
        # a guide, we then call the appropriate adder_function to increment
        # the counts for the spelled out numbers which MUST be present in
        # the result if our guess remains congruent. e.g. if the target contained
        # [7, 2, 1, 2]... we MUST increment the result_template by
        # "sevens" "twos" "one" "twos"
        @result_template = known_count.dup
        @target_template.each do |n|
            @add_word[n].call(@result_template)
        end

    end

    # Make sure these don't get clobbered so return a copy
    # Should these be dup'ed or frozen instead?
    def target_template()
        @target_template.dup
    end
    def result_template()
        @result_template.dup
    end

    # This is syntactic sugar, taking any number of arguments
    # or an array of arguments and calling all of them on a fresh array.
    def array_from_functions(*args)
        counts = Array.new(26).fill(0)
        for arg in args
            if arg.is_a?(Array)
                arg.each do |o| @add_word[o].call(counts); end
            else
                p arg
                @add_word[arg].call(counts);
            end
        end
        counts
    end

    # Easy access to the function maps
    attr_reader :add_word, :subtract_word
end

def pangram_main()
    sallowPrefix = 'This pangram tallies'
    sallowTarget = [5, 1, 1, 2, 28, 8, 6, 8, 13, 1, 1,
        3, 2, 18, 15, 2, 1, 7, 25, 22, 4, 4, 9, 2, 4, 1];
    prefix = "A pangram from jq contains one zebra,"

    # Initialize the CounterFactory and grab the maps
    cf = CounterFactory.new(prefix)
    add_word=cf.add_word
    subtract_word=cf.subtract_word

    # Set up the target and result arrays.
    # Though these will converge eventually these are not interchangeable!
    # They are complementary not equivalent. The target initially contains
    # a set of small integers. Each n refers to the expectation of finding
    # N of a particular character in the output. Each N == 1 indicates
    # that we have no idea how many occur in the end result.
    # Any fortunate N > 1 means, we know exactly how may will be in the
    # final tally. The result_count entries contain sums for each byte
    # we know we will encounter and also the sum for the spelled out
    # numbers reflected in out target_count array.
    #
    # These two structure mirror what a pangram is.
    # The target reflects the semantic intent ->
    # It (the pangram) will have: 1 a, 2 b's...
    # The result reflects the mechanical instantiation
    # "This pangram sentence has one a, two b's..."
    target_count = cf.target_template
    result_count = cf.result_template

    delta, i, f, target, result = nil
    loopcount = 0
    different = true

    # We will loop until we converge
    while different
        if loopcount % 10000 == 0
            p "--- #{loopcount} " + Time.new.to_s
            p target_count, result_count
        end
        loopcount += 1

        # Each time through, visit and compare each target and result element
        # While they differ, we will change our guess by updating the
        # counts in the target_array. Change a guess by increasing
        # or decreasing the expected occurence of a single character.
        # Reflect that change in the result by adding and subtracting
        # multiple counts in the result corresponding to spelled out
        # numbers. For instance if target_count[i] is changed from
        # 7 to 12 we decrement the result buckets for the bytes in
        # 'sevens' and increment by the bytes in 'twelves'. The 'seve'
        # changes are a wash so we have 1 less n and one more t, w, and l
        # in the result.

        different = false

        # If the target and result differ make
        for i in (0..25)
            target = target_count[i]
            result = result_count[i]

            if target != result
                delta = rand((target - result).abs + 1)

                begin
                    subtract_word[target].call(result_count)
                    if target < result
                        target = target + delta
                    else
                        target = result + delta
                    end
                    add_word[target].call(result_count)
                    target_count[i] = target
                    different = true
                rescue
                    # I'm not sure how to handle this yet.
                    # A while ago I was getting errors with some
                    # pangram prefix strings. The calculations
                    # diverged, causing calls to nonexistent
                    # adders or subtractors: with keys 0, -1, -2...
                    # instead of fixnums between 1 and 99
                    # This usually happens only after 500K or so iterations.
                    raise
                end

            end
        end
    end

    p "solution found in #{loopcount} iterations"
    p target_count, result_count
    mypan = prefix + target_count.pan_string
    t2=(mypan.downcase).to_counts
    p "mypan" + "--"*20
    p t2, mypan
    p t2 == target_count
end

if __FILE__ == $0
    pangram_main()
end
#[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, 3, 5, 12, 4, 4, 2]
#panstring="A pangram from jq contains one zebra, seven a's, two b's, two c's, \
#two d's, twenty-six e's, eight f's, three g's, six h's, ten i's, two j's, one k, \
#two l's, three m's, fifteen n's, sixteen o's, two p's, two q's, ten r's, \
#thirty-one s's, twenty-five t's, three u's, five v's, twelve w's, four x's, \
#four y's, and two z's."
#true

···

Begin forwarded message:

Topic		Replies	Views
Please Forward: Ruby Quiz Submission (Part 1) ruby-talk	1	150	12 July 2006
Please Forward: Ruby Quiz Submission ruby-talk	0	115	23 April 2006
[ANN] Ruby Quiz ruby-talk	4	81	20 September 2004
Please Forward: Ruby Quiz Submission ruby-talk	0	171	29 August 2006
Please Forward: Ruby Quiz Submission ruby-talk	0	126	16 January 2007

Please Forward: Ruby Quiz Submission

Related topics