[QUIZ] Posix Pangrams (#97)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

···

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Martin DeMello

A pangram is a sentence containing every letter of the alphabet at least once (a
famous example in English being "the quick brown fox jumps over the lazy dog").
For maximum style points a pangram should read smoothly, and have both as few
repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1] - write a program to find
pangrammatic collections of posix utilities that (1) use the fewest utilities
and (2) have the minimum number of repeated letters. In either case, break ties
on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

[1] http://www.unix.org/version3/apis/cu.html has a complete list

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.

···

On Fri, Oct 06, 2006 at 10:19:10PM +0900, Ruby Quiz wrote:

by Martin DeMello

A pangram is a sentence containing every letter of the alphabet at least once (a
famous example in English being "the quick brown fox jumps over the lazy dog").
For maximum style points a pangram should read smoothly, and have both as few
repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1] - write a program to find
pangrammatic collections of posix utilities that (1) use the fewest utilities
and (2) have the minimum number of repeated letters. In either case, break ties
on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

Ruby Quiz <james@grayproductions.net> writes:

A pangram is a sentence containing every letter of the alphabet at least once (a
famous example in English being "the quick brown fox jumps over the lazy dog").
For maximum style points a pangram should read smoothly, and have both as few
repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1] - write a program to find
pangrammatic collections of posix utilities that (1) use the fewest utilities
and (2) have the minimum number of repeated letters. In either case, break ties
on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

Who has a better solution than these?

"awk chgrp df jobs lex mv tty uniq zcat"
"awk chgrp df join lex mv tty qsub zcat"

I think these are optimal for both (1) and (2).

···

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Here is my solution. It's not what I planned. I intended to submit an algorithm that would find all the minimal Posix pangrams, and I worked out a rather nice (IMO) recursive algorithm to do just that. But Ruby doesn't allocate enough stack space for that algorithm to run to completion <sigh>. So I've settled on an algorithm that finds just one minimal pangram, but uses pseudo-random numbers to produce a different pangram each time it is run.

My best pangram so far is

[
    "fg", "jobs", "qhold", "stty", "umask",
    "unexpand", "vi", "write", "zcat"
]

Regards, Morton

<code>
#! /usr/bin/env ruby -w

···

#
# Created by Morton Goldberg on October 08, 2006.
#
# Ruby Quiz 97 -- POSIX Pangrams
# quiz_97.3.rb

class Array

    def shuffle!
       n = size - 1
       while n > 0
          k = rand(n)
          self[k], self[n] = self[n], self[k]
          n -= 1
       end
       self
    end

end

# Removed c99, fort77, and m4 -- the complication they add is IMHO
# unedifying.
WORDS = %w[
    admin alias ar asa at awk basename batch bc bg cal cat cd cflow
    chgrp chmod chown cksum cmp comm command compress cp crontab csplit
    ctags cut cxref date dd delta df diff dirname du echo ed env ex expand
    expr false fc fg file find fold fuser gencat get getconf getopts
    grep hash head iconv id ipcrm ipcs jobs join kill lex link ln locale
    localedef logger logname lp ls mailx make man mesg mkdir mkfifo more
    mv newgrp nice nl nm nohup od paste patch pathchk pax pr printf prs ps
    pwd qalter qdel qhold qmove qmsg qrerun qrls qselect qsig qstat qsub
    read renice rm rmdel rmdir sact sccs sed sh sleep sort split strings
    strip stty tabs tail talk tee test time touch tput tr true tsort tty
    type ulimit umask unalias uname uncompress unexpand unget uniq unlink
    uucp uudecode uuencode uustat uux val vi wait wc what who write xargs
    yacc zcat
]

# Return true if _wds_ is a pangram.
def pangram?(wds)
    wds.join.split(//).uniq.size == 26
end

# Return array giving pangram statistics:
# [<words>, <total-chars>, <repeated-chars>]
def stats(pan)
    tmp = pan.join.split(//)
    [pan.size, tmp.size, tmp.size - tmp.uniq.size]
end

# Given a pangram, return list of pangrams, where each panaram in the list
# is derived from the given one by removing one word.
def diminish(pan)
    result = pan.collect do |item|
       rest = pan - [item]
       rest if pangram?(rest)
    end
    result.compact.shuffle!
end

# Given a list of pangrams return a minimal pangram that can be derived
# from it.
def find_minimal(pans)
    pan = pans.pop
    reduced = diminish(pan)
    return pan if reduced.empty?
    find_minimal(reduced)
end

# Find a minimal pangram.
pangram = find_minimal([WORDS])
p pangram # =>
# [
# "fg", "jobs", "qhold", "stty", "umask",
# "unexpand", "vi", "write", "zcat"
# ]
p stats(pangram) # => [9, 39, 13]
</code>

I woke up this morning with the realization that the code I posted yesterday could be considerably simplified. Here is the simpler version.

Regards, Morton

<code>
#! /usr/bin/env ruby -w

···

#
# Created by Morton Goldberg on October 09, 2006.
#
# Ruby Quiz 97 -- POSIX Pangrams
# quiz_97.4.rb

# Removed c99, fort77, and m4 -- the complication they add is IMHO
# unedifying.
WORDS = %w[
    admin alias ar asa at awk basename batch bc bg cal cat cd cflow
    chgrp chmod chown cksum cmp comm command compress cp crontab csplit
    ctags cut cxref date dd delta df diff dirname du echo ed env ex expand
    expr false fc fg file find fold fuser gencat get getconf getopts
    grep hash head iconv id ipcrm ipcs jobs join kill lex link ln locale
    localedef logger logname lp ls mailx make man mesg mkdir mkfifo more
    mv newgrp nice nl nm nohup od paste patch pathchk pax pr printf prs ps
    pwd qalter qdel qhold qmove qmsg qrerun qrls qselect qsig qstat qsub
    read renice rm rmdel rmdir sact sccs sed sh sleep sort split strings
    strip stty tabs tail talk tee test time touch tput tr true tsort tty
    type ulimit umask unalias uname uncompress unexpand unget uniq unlink
    uucp uudecode uuencode uustat uux val vi wait wc what who write xargs
    yacc zcat
]

# Return true if _wds_ is a pangram.
def pangram?(wds)
    wds.join.split(//).uniq.size == 26
end

# Return array giving pangram statistics:
# [<words>, <total-chars>, <repeated-chars>]
def stats(pan)
    tmp = pan.join.split(//)
    [pan.size, tmp.size, tmp.size - tmp.uniq.size]
end

# Given a pangram, return a pangram derived from it by removing one word.
def remove_one(pan)
    result = pan.collect do |item|
       diff = pan - [item]
       diff if pangram?(diff)
    end
    result.compact!
    result[rand(result.size)] unless result.empty?
end

# Given a pangram return a minimal pangram derived from it.
def find_minimal(pan)
    nxt = remove_one(pan)
    return pan unless nxt
    find_minimal(nxt)
end

# Find a minimal pangram.
pangram = find_minimal(WORDS)
p pangram # =>
# [
# "expr", "getconf", "jobs", "mv", "qdel",
# "type", "unlink", "what", "zcat"
# ]
p stats(pangram) # => [9, 39, 13]
</code>

Finding the pangram with the minimum number of words is NP-Hard. The
reduction is from Minimum Set Cover[1]:

MINIMUM SET COVER:
INSTANCE: Collection C of subsets of a finite set S.
SOLUTION: A set cover for S, i.e., a subset C' of C such that
every element in S belongs to at least one member of C'.
MEASURE: Cardinality of the set cover, i.e., size(C')

Reduction:
Each element in S becomes mapped to a unique letter in the alphabet used
by pangrams. (Note that in general, we can play Pangrams in any
langauage, the alphabet can be of length we want) Each subset K (which is
an element of C) becomes a word, made up of the letters corresponding to
the elements of K. Finding a pangram with the minimum number of words in
this setup would give a solution to the set-cover instance.

I suspect that the other optimization problem here may also be NP-Hard.
I'm not sure (in all of the 5 minutes that I'm writing this post) how to do
a simple reduction to that problem though.

--Ken Bloom

[1] http://www.ensta.fr/~diam//ro/online/viggo_wwwcompendium/node146.html

···

On Fri, 06 Oct 2006 22:19:10 +0900, Ruby Quiz wrote:

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Martin DeMello

A pangram is a sentence containing every letter of the alphabet at least once (a
famous example in English being "the quick brown fox jumps over the lazy dog").
For maximum style points a pangram should read smoothly, and have both as few
repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1] - write a program to find
pangrammatic collections of posix utilities that (1) use the fewest utilities
and (2) have the minimum number of repeated letters. In either case, break ties
on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

[1] http://www.unix.org/version3/apis/cu.html has a complete list

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
I've added a signing subkey to my GPG key. Please update your keyring.

Ruby Quiz wrote:

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Martin DeMello

A pangram is a sentence containing every letter of the alphabet at least once (a
famous example in English being "the quick brown fox jumps over the lazy dog").
For maximum style points a pangram should read smoothly, and have both as few
repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1] - write a program to find
pangrammatic collections of posix utilities that (1) use the fewest utilities
and (2) have the minimum number of repeated letters. In either case, break ties
on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

[1] http://www.unix.org/version3/apis/cu.html has a complete list

Here's my solution to the problem. I'm not quite sure why it doesn't
find the better result already found by Ezwan as I expected it to be
optimal - please point out my error if you see it. It may well be that
making it optimal would also make it converge much more slowly though.

This uses an A* search algorithm to find the sequence with the least
number of total letters that forms a pangram (= least number of
repeated characters). The script is enormous but on my laptop (1.4GHz
Celeron M processor, 768MB memory) it manages to find the following
solution in under 0.2 seconds:

Words: qalter jobs find chgrp awk uux mv zcat tty
Length: 34
Cost: 34
Pangram? true
Unused Letters: (0)

I'll break the code into parts to make it easier to read:

···

#########################################
# Raw Data #
#########################################

LETTERS = (?a..?z)
WORDS = %w{admin alias ar asa at awk basename batch bc bg c99 cal cat
cd cflow chgrp chmod chown cksum cmp comm command compress cp crontab
csplit ctags cut cxref date dd delta df diff dirname du echo ed env ex
expand expr FALSE fc fg file find fold fort77 fuser gencat get getconf
getopts grep hash head iconv id ipcrm ipcs jobs join kill lex link ln
locale localedef logger logname lp ls m4 mailx make man mesg mkdir
mkfifo more mv newgrp nice nl nm nohup od paste patch pathchk pax pr
printf prs ps pwd qalter qdel qhold qmove qmsg qrerun qrls qselect qsig
qstat qsub read renice rm rmdel rmdir sact sccs sed sh sleep sort split
strings strip stty tabs tail talk tee test time touch tput tr TRUE
tsort tty type ulimit umask unalias uname uncompress unexpand unget
uniq unlink uucp uudecode uuencode uustat uux val vi wait wc what who
write xargs yacc zcat}

#########################################
# Statistics #
#########################################

# This is all the letters used by a particular word
USED_LETTERS = Hash.new do |h, word|
  letters = word.downcase.split(//)
  letters.uniq!
  h[word] = letters.map {|l| l[0]}.select {|l| LETTERS.include?(l)}
end

# This is the length in letters of a word (auto-caching)
WORD_LENGTH = Hash.new do |h, word|
  length = 0
  word.downcase.each_byte do |byte|
    length += 1 if LETTERS.include?(byte)
  end
  h[word] = length
end

# This is the minimum number of letters that must be included to add
# a particular letter into the sequence. It is the length (letters
only)
# of the shortest word containing this letter.
LETTER_COSTS = {}
WORDS.each do |word|
  letters = Hash.new{0}
  word.each_byte {|byte| letters[byte] += 1}
  letters.each do |letter, count|
    if !LETTER_COSTS[letter] || LETTER_COSTS[letter] >
WORD_LENGTH[word]
      LETTER_COSTS[letter] = WORD_LENGTH[word]
    end
  end
end

#########################################
# Pangram Class - Search Tree States #
#########################################

class Pangram

  include Comparable

  def initialize(words = )
    @words = words
  end

  def children
    remaining = WORDS - @words
    remaining.map do |word|
      Pangram.new(@words + [word])
    end
  end

  def unused_letters
    unless @unused
      @unused = LETTERS.to_a
      @words.each do |word|
        @unused -= USED_LETTERS[word]
      end
    end
    @unused
  end

  def length
    unless @length
      @length = 0
      @words.each do |word|
        @length += WORD_LENGTH[word]
      end
    end
    @length
  end

  def complete?
    unused_letters.length == 0
  end

  # Estimate of the cost to reach the end goal.
  # Must always over-estimate
  def heuristic
    unused_letters.inject(0) {|sum, letter| sum + LETTER_COSTS[letter]}
  end

  def cost
    unless @cost
      @cost = length + heuristic
    end
    @cost
  end

  def <=>(other)
    self.cost <=> other.cost
  end

  def stats
    %~
Words: #{@words.join(" ")}
Length: #{length}
Cost: #{self.length + self.heuristic}
Pangram? #{complete?}
Unused Letters: #{unused_letters.map{|l| l.chr}.join(", ")}
(#{unused_letters.length})
    ~.strip
  end
end

#########################################
# Ordered Queue for A* Search #
#########################################

class OrderedQueue
  def initialize(array = )
    @queue = array.sort
  end

  # Add items to the queue
  # This is optimised for large queues
  def add(items)
    sorted_items = items.sort
    idx = 0
    loop do
      break if idx >= @queue.length - 1
      qitem = @queue[idx]
      while !sorted_items.empty? && sorted_items.first < qitem
        @queue.insert(idx, sorted_items.shift)
        idx += 1
      end
      idx += 1
    end
    sorted_items.each {|item| @queue << item}
  end

  def pop
    @queue.shift
  end

  def empty?
    @queue.length == 0
  end
end

#########################################
# Search implementation #
#########################################

if $0 == __FILE__
  # Start queue with the empty initial pangram
  options = OrderedQueue.new([Pangram.new])
  loop do
    # Check for exhaustion
    if options.empty?
      puts "No solution could be found"
      exit 1
    end

    # Get the best current option
    option = options.pop

    # Check for completion
    if option.complete?
      puts option.stats
      break
    end

    # Add children to options
    options.add(option.children)
  end
end

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.

It's really all about trying to make swearwords, isn't it? Copper is
your friend.

On the unix front, I nominate 'mingetty'.

Martin

Logan Capaldo <logancapaldo@gmail.com> writes:

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.

We use to play that because the lessons are so boring. :slight_smile:

Longest English words made up from chemical symbols:

THErMoPHOsPHOrEsCEnCe
NONRePReSEnTaTIONAlISm

Longest German words made up from chemical symbols:

INFLaTiONSINdIKAtOReN
AuSLaNdSINVEsTiTIONeN
KNOTeNReCHNErAPPLiKAtION

···

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

I haven't found a solution with fewer repeated characters, but

"zcat stty jobs cxref newgrp iconv cksum qhold"

uses one fewer utility

Christian Neukirchen wrote:

···

Ruby Quiz <james@grayproductions.net> writes:

Who has a better solution than these?

"awk chgrp df jobs lex mv tty uniq zcat"
"awk chgrp df join lex mv tty qsub zcat"

I think these are optimal for both (1) and (2).
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

I suspect that the other optimization problem here may also be NP-Hard.
I'm not sure (in all of the 5 minutes that I'm writing this post) how to do
a simple reduction to that problem though.

True, but in this case the set to cover was small enough to make it computationally feasible without having to rewrite it in Fortran. Phew.

Martin

Martin Coxall wrote:

It's really all about trying to make swearwords, isn't it? Copper is
your friend.

On the unix front, I nominate 'mingetty'.

My innocent eyes!

*hurries to cover the pickle jar with small white round objects floating
in formaldehyde with a cloth*

David Vallner

Christian Neukirchen wrote:

Logan Capaldo <logancapaldo@gmail.com> writes:

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.

We use to play that because the lessons are so boring. :slight_smile:

Longest English words made up from chemical symbols:

THErMoPHOsPHOrEsCEnCe
NONRePReSEnTaTIONAlISm

Longest German words made up from chemical symbols:

INFLaTiONSINdIKAtOReN
AuSLaNdSINVEsTiTIONeN
KNOTeNReCHNErAPPLiKAtION

I love that kind of wordplay. I appreciate your
sharing it.

Thanks,
Hydrogen Aluminum

Isn't it supposed to use every letter?

Martin

···

On 10/9/06, camerooni@gmail.com <camerooni@gmail.com> wrote:

I haven't found a solution with fewer repeated characters, but

"zcat stty jobs cxref newgrp iconv cksum qhold"

camerooni@gmail.com writes:

I haven't found a solution with fewer repeated characters, but

"zcat stty jobs cxref newgrp iconv cksum qhold"

uses one fewer utility

That's good to know, thank you.

···

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

There's also the matter of program structure. When dealing with an
NP-Hard problem, you know that any solution geared to do a heuristic
search quickly is not guaranteed to find the minimum.

And if you do naive searches of the entire solution space, the
number of POSIX utilities involved here is not all that trivial.

···

On Thu, 12 Oct 2006 17:06:26 +0900, Martin Coxall wrote:

I suspect that the other optimization problem here may also be NP-
Hard.
I'm not sure (in all of the 5 minutes that I'm writing this post)
how to do
a simple reduction to that problem though.

True, but in this case the set to cover was small enough to make it
computationally feasible without having to rewrite it in Fortran. Phew.

Martin

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
I've added a signing subkey to my GPG key. Please update your keyring.

I'm under the impression that it does? What letter am I missing?

-Cameron

Martin Coxall wrote:

···

On 10/9/06, camerooni@gmail.com <camerooni@gmail.com> wrote:
> I haven't found a solution with fewer repeated characters, but
>
> "zcat stty jobs cxref newgrp iconv cksum qhold"
>

Isn't it supposed to use every letter?

Martin

Which letters are missing?

$ ruby -e 'puts(("a".."z").to_a - "zcat stty jobs cxref newgrp iconv cksum qhold".split(""))'
$

James Edward Gray II

···

On Oct 9, 2006, at 11:15 AM, Martin Coxall wrote:

On 10/9/06, camerooni@gmail.com <camerooni@gmail.com> wrote:

I haven't found a solution with fewer repeated characters, but

"zcat stty jobs cxref newgrp iconv cksum qhold"

Isn't it supposed to use every letter?

We haven't talked a lot about this in the past, but I take a pretty relaxed approach to tough quiz problems. Find a good heuristic and get close. That's fine with me.

To me, Ruby Quiz is all about keeping the brain sharp. I think programmers are often susceptible to the "That's not possible!" mentality and we have to fight against that. MJD once gave a neat speech about how when Alchemy was as old as our profession is now, they were still actively trying to turn lead into gold but we seem content to bury ourselves in rules about what we can't do.

Was this week's quiz solvable beyond a shadow of a doubt in all cases? No. Does that mean we shouldn't do it? No way. The solution I talked up in the summary finds a darn good pangram in under a second! It's within 7 characters of perfect. It maybe guesswork, but it's damn good guesswork.

I have a great respect for those who can think outside the box like that. I think we should cultivate that skill in ourselves.

Even at work, I'm the company's resident language geek, so I often see a problem and launch into a detailed plan of how we can solve it to death. Luckily, my two far more practical coworkers usually just politely point out the simple fix that is plenty good enough for our purposes. If it wasn't for them I would never accomplish a complete anything.

I'm all about thinking outside the box. I think we should all learn to love the guesswork!

There's my two cents on this issue.

James Edward Gray II

···

On Oct 12, 2006, at 10:25 AM, Ken Bloom wrote:

On Thu, 12 Oct 2006 17:06:26 +0900, Martin Coxall wrote:

I suspect that the other optimization problem here may also be NP-
Hard.
I'm not sure (in all of the 5 minutes that I'm writing this post)
how to do
a simple reduction to that problem though.

True, but in this case the set to cover was small enough to make it
computationally feasible without having to rewrite it in Fortran. Phew.

Martin

There's also the matter of program structure. When dealing with an
NP-Hard problem, you know that any solution geared to do a heuristic
search quickly is not guaranteed to find the minimum.

Which letters are missing?

$ ruby -e 'puts(("a".."z").to_a - "zcat stty jobs cxref newgrp iconv
cksum qhold".split(""))'
$

Aha! So it does.

I'm just jealous that yours is smaller than mine.

Martin