[QUIZ] Numbers Can Be Words (#133)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

···

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Morton Goldberg

When working with hexadecimal numbers it is likely that you've noticed some hex
numbers are also words. For example, 'bad' and 'face' are both English words and
valid hex numbers (2989 and 64206, respectively, in decimal). I got to thinking
that it would be interesting to find out how many and which hex numbers were
also valid English words. Of course, almost immediately I started to think of
generalizations. What about other bases? What about languages other than
English?
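
The decimal values quoted above can be checked with Ruby's built-in base conversions. A quick sketch using String#to_i and Integer#to_s, both of which accept a radix:

```ruby
# String#to_i(base) reads a string as a number in the given base;
# Integer#to_s(base) converts the number back to a string.
bad  = "bad".to_i(16)
face = "face".to_i(16)

puts bad            # 2989
puts face           # 64206
puts face.to_s(16)  # face
```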

Your mission is to pick a word list in some language (it will have to be one that
uses roman letters) and write Ruby code to filter the list to extract all the
words which are valid numbers in a given base. For many bases this isn't an
interesting task--for bases 2-10, the filter comes up empty; for bases 11-13,
the filter output is uninteresting (IMO); for bases approaching 36, the filter
passes almost everything (also uninteresting IMO). However, for bases in the
range from 14 to about 22, the results can be interesting and even surprising,
especially if one constrains the filter to accept only words of some length.
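
To make the effect of the base concrete, here is a small sketch run against a hand-picked sample list. The list and the helper name `words_in_base` are made up for illustration and are not part of the quiz.

```ruby
# Hypothetical helper: keep the words whose letters are all digits in `base`.
# Letter digits start at 'a' (value 10), so bases up to 10 have none.
def words_in_base(words, base, min_length = 1)
  return [] if base < 11 || base > 36
  last_digit = (base - 1).to_s(base)   # e.g. "f" for base 16
  pattern = /\A[a-#{last_digit}]{#{min_length},}\z/
  words.grep(pattern)
end

SAMPLE = %w[a baa bad cab deaf face hedge jig zoo]

[10, 11, 12, 16, 18, 36].each do |base|
  puts "base #{base}: #{words_in_base(SAMPLE, base).join(' ')}"
end
```

With this sample, base 10 prints nothing, base 11 passes only "a", and base 36 passes the whole list.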

I used `/usr/share/dict/words` for my word list. Participants who don't have
that list on their system or want a different one can go to Kevin's Word List
Page (http://wordlist.sourceforge.net/) as a source of other word lists.

Some points you might want to consider: Do you want to omit short words like 'a'
and 'ad'? (I made word length a parameter). Do you want to allow capitalized
words (I prohibited them)? Do you want to restrict the bases allowed (I didn't)?

I'm not sure why this quiz is being phrased as "numbers that are
words". Aren't you just asking for a program that finds words that use
only the first n letters of the alphabet? Or am I missing something
obvious (tends to happen :) )?

Actually, one interesting variant that would tie it to numbers would
be if you could include digits that look like letters, i.e.:
0 -> O
1 -> I
2 -> Z
5 -> S
6 -> G
8 -> B

In this case, even numbers in base 10 could be words.
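
A rough sketch of that variant, restricted to base 10 to keep it simple. The constant and method names are invented for illustration.

```ruby
# Look-alike table from the list above: O->0, I->1, Z->2, S->5, G->6, B->8.
LOOKALIKE_LETTERS = "OIZSGB"
LOOKALIKE_DIGITS  = "012568"

# Hypothetical helper: return the decimal digit string a word "spells",
# or nil if the word uses any letter without a look-alike digit.
def decimal_lookalike(word)
  upper = word.upcase
  return nil unless upper =~ /\A[#{LOOKALIKE_LETTERS}]+\z/
  upper.tr(LOOKALIKE_LETTERS, LOOKALIKE_DIGITS)
end

%w[boss gig zoo bad].each do |w|
  puts "#{w} -> #{decimal_lookalike(w).inspect}"
end
```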

···

On Aug 3, 9:01 am, Ruby Quiz <ja...@grayproductions.net> wrote:

When working with hexadecimal numbers it is likely that you've noticed some hex
numbers are also words. For example, 'bad' and 'face' are both English words and
valid hex numbers (2989 and 64206, respectively, in decimal). I got to thinking
that it would be interesting to find out how many and which hex numbers were
also valid English words. Of course, almost immediately I started to think of
generalizations. What about other bases? What about languages other than
English?

Hi,

I have come up with this one-liner:

----------8<----------
puts File.readlines('/usr/share/dict/words').grep(/\A[a-#{((b=ARGV[0].to_i)-1).to_s(b)}]+\Z/)
---------->8----------

Example:
$ ruby ./rq133_numberscanbewords_rafc.rb 16
a
abed
accede
acceded
ace
aced
ad
add
added
b
baa
baaed
babe
bad
bade
be
bead
beaded
bed
bedded
bee
beef
beefed
c
cab
cabbed
cad
cede
ceded
d
dab
dabbed
dad
dead
deaf
deb
decade
decaf
deed
deeded
deface
defaced
e
ebb
ebbed
efface
effaced
f
fa
facade
face
faced
fad
fade
faded
fed
fee
feed

Regards,
R.
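
For readers who want to see what the one-liner is doing, here is an unpacked sketch. The method name `number_lines` and the guard for out-of-range bases are additions for illustration. Without such a guard, a base of 10 or less makes `(b-1).to_s(b)` a decimal digit, and a character class like `[a-9]` is not a valid range.

```ruby
# Hypothetical unpacking of the one-liner: the largest digit of base b is
# (b - 1).to_s(b), e.g. 15.to_s(16) == "f", so the character class is [a-f].
def number_lines(lines, base)
  return [] if base < 11 || base > 36   # added guard, not in the original
  biggest_digit = (base - 1).to_s(base)
  # \Z also matches just before a trailing newline, so lines from
  # File.readlines can be matched without chomping them first.
  lines.grep(/\A[a-#{biggest_digit}]+\Z/)
end

lines = ["abed\n", "ace\n", "acre\n", "Bad\n", "fee\n"]
puts number_lines(lines, 16)
```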

···

2007/8/3, Ruby Quiz <james@grayproductions.net>:

Your mission is to pick a word list in some language (it will have to be one that
uses roman letters) and write Ruby code to filter the list to extract all the
words which are valid numbers in a given base.

Here are some solutions to this quiz. The first solution deliberately avoids using regular expressions. Note the use of next to skip over words that are too short or capitalized, and of break to stop the iteration once the words start with a letter beyond the largest digit of the given base (this relies on the word list being sorted).

<code>
WORD_LIST = "/usr/share/dict/words"
WORDS = File.read(WORD_LIST).split

def number_words(base=16, min_letters=3)
    result = []
    WORDS.each do |w|
       next if w.size < min_letters || (?A..?Z).include?(w[0])
       # .ord keeps the comparison numeric on Ruby 1.9+, where ?a and w[0] are strings
       break if w[0].ord > ?a.ord + (base - 11)
       result << w if w.to_i(base).to_s(base) == w
    end
    result
end
</code>

<example>
number_words(18, 5) # => ["abaca", "abaff", "accede", "achage", "adage", "added", "adead", "aface", "ahead", "bacaba", "bacach", "bacca", "baccae", "bache", "badge", "baggage", "bagged", "beach", "beached", "beachhead", "beaded", "bebed", "bedad", "bedded", "bedead", "bedeaf", "beech", "beedged", "beefhead", "beefheaded", "beehead", "beeheaded", "begad", "behead", "behedge", "cabbage", "cabbagehead", "cabda", "cache", "cadge", "caeca", "caffa", "caged", "chafe", "chaff", "chebec", "cheecha", "dabba", "dagaba", "dagga", "dahabeah", "deadhead", "debadge", "decad", "decade", "deedeed", "deface", "degged", "dhabb", "echea", "edged", "efface", "egghead", "facade", "faced", "faded", "fadge", "feedhead", "gabgab", "gadbee", "gadded", "gadge", "gaffe", "gagee", "geggee", "hache", "haggada", "hagged", "headache", "headed", "hedge"]
</example>
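
The round-trip test `w.to_i(base).to_s(base) == w` works because String#to_i stops parsing at the first character that is not a digit in the given base, so a word containing such a character cannot survive the conversion back. A small sketch (the helper name `round_trips?` is made up):

```ruby
# True when the word is unchanged by converting to a number and back.
def round_trips?(word, base)
  word.to_i(base).to_s(base) == word
end

puts "abeg".to_i(16)            # only "abe" is parsed: 2750
puts round_trips?("abed", 16)   # true
puts round_trips?("abeg", 16)   # false
puts round_trips?("Bad", 16)    # false, because to_s produces lowercase
```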

The second solution uses #inject rather than #each, but doesn't seem to be much of an improvement, if any. I found it interesting because it's one of the few times I've ever needed to pass an argument to break and next.

<code>
WORD_LIST = "/usr/share/dict/words"
WORDS = File.read(WORD_LIST).split

def number_words(base=16, min_letters=3)
    WORDS.inject([]) do |result, w|
       next result if w.size < min_letters || (?A..?Z).include?(w[0])
       # .ord keeps the comparison numeric on Ruby 1.9+, where ?a and w[0] are strings
       break result if w[0].ord > ?a.ord + (base - 11)
       result << w if w.to_i(base).to_s(base) == w
       result
    end
end
</code>

<example>
number_words(20, 7) # => ["accidia", "accidie", "acidific", "babiche", "bacchiac", "bacchic", "bacchii", "badiaga", "baggage", "beached", "beachhead", "beedged", "beefhead", "beefheaded", "beehead", "beeheaded", "behedge", "bighead", "cabbage", "cabbagehead", "caddice", "caddiced", "caffeic", "cheecha", "cicadid", "dahabeah", "deadhead", "debadge", "debeige", "decadic", "decafid", "decided", "deedeed", "deicide", "diffide", "edifice", "egghead", "feedhead", "giffgaff", "haggada", "haggadic", "headache", "jibhead"]
</example>
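
For anyone who has not passed a value to next or break before: inside #inject the block's value becomes the next accumulator, so `next result` hands the accumulator back unchanged, while `break result` makes it the return value of #inject itself. A toy sketch, unrelated to the word list:

```ruby
# Sum the even numbers, stopping at the first even number greater than 10.
total = [2, 3, 4, 12, 6].inject(0) do |sum, n|
  next sum if n.odd?    # skip this element, keep the accumulator as it is
  break sum if n > 10   # stop early; sum becomes the result of inject
  sum + n
end

puts total   # 6
```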

In my third and last solution, I take the obvious route and use regular expressions. Maybe regular expressions are better after all.

<code>
WORD_LIST = "/usr/share/dict/words"
WORDS = File.read(WORD_LIST).split

def number_words(base=16, min_letters=3)
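    # .ord keeps the arithmetic working on Ruby 1.9+, where ?a is a string
    biggest_digit = ?a.ord + (base - 11)
    return [] if base < 11   # no letter digits; the range below would be invalid
    regex = /\A[a-#{biggest_digit.chr}]+\z/
    result = []
    WORDS.each do |w|
       next if w.size < min_letters || w =~ /^[A-Z]/
       break if w[0].ord > biggest_digit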
       result << w if w =~ regex
    end
    result
end
</code>

The following are all the hex numbers in the word list which have at least three letters.

<example>
number_words # => ["aba", "abac", "abaca", "abaff", "abb", "abed", "acca", "accede", "ace", "adad", "add", "adda", "added", "ade", "adead", "aface", "affa", "baa", "baba", "babe", "bac", "bacaba", "bacca", "baccae", "bad", "bade", "bae", "baff", "bead", "beaded", "bebed", "bed", "bedad", "bedded", "bedead", "bedeaf", "bee", "beef", "cab", "caba", "cabda", "cad", "cade", "caeca", "caffa", "cede", "cee", "dab", "dabb", "dabba", "dace", "dad", "dada", "dade", "dae", "daff", "dead", "deaf", "deb", "decad", "decade", "dee", "deed", "deedeed", "deface", "ebb", "ecad", "edea", "efface", "facade", "face", "faced", "fad", "fade", "faded", "fae", "faff", "fed", "fee", "feed"]
</example>

Regards, Morton

Here are my 4 solutions :) (all rely on ?a being an integer, so they will not work in 1.9)

# solution #1 - Simple one-liner

p File.read(ARGV[0]).split("\n").reject{|w| w !~
%r"^[a-#{(?a-11+ARGV[1].to_i).chr}]+$"}.sort_by{|w| [w.length,w]} if
(?a...?z)===?a-11+ARGV[1].to_i

# solution #2 - Non-hackery substs, like Olaf

p File.read(ARGV[0]).split("\n").reject{|w| w !~
%r"^[a-#{(?a-11+ARGV[1].to_i).chr}|lO]+$"}.sort_by{|w| [w.length,w]} if
(?a...?k)===?a-11+ARGV[1].to_i

# solution #3 - c001 hackerz

p File.read(ARGV[0]).split("\n").reject{|w| w !~
%r"^[a-#{(?a-11+ARGV[1].to_i).chr}|lo]+$"i}.map{|w|
w.downcase.gsub('o','0').gsub('l','1')}.sort_by{|w| [w.length,w]} if
(?a...?k)===?a-11+ARGV[1].to_i

# solution #4 - B16 5H0UT1N6 HACKER2

base=ARGV[1].to_i
base_=base+?a-11

raise "Bad base: [#{base}]" if base<1 || base_>?z

sub0=base_ < ?o
sub1=base>1 && base_ < ?l
sub2=base>2 && base_ < ?z
sub5=base>5 && base_ < ?s
sub6=base>6 && base_ < ?g
sub8=base>8 && base_ < ?b

reg="^["
reg<<'O' if sub0
reg<<'I' if sub1
reg<<'Z' if sub2
reg<<'S' if sub5
reg<<'G' if sub6
reg<<'B' if sub8
reg<<"|a-#{base_.chr}" if base>10
reg<<']+$'

result=File.read(ARGV[0]).split("\n").reject{|w| w !~ %r"#{reg}"i}.map{|w|
w.upcase}.sort_by{|w| [w.length,w]}
result.map!{|w| w.gsub('O','0')} if sub0
result.map!{|w| w.gsub('I','1')} if sub1
result.map!{|w| w.gsub('Z','2')} if sub2
result.map!{|w| w.gsub('S','5')} if sub5
result.map!{|w| w.gsub('G','6')} if sub6
result.map!{|w| w.gsub('B','8')} if sub8
result.reject!{|w| w !~ /[A-Z]/} # NUM8ER5-0NLY LIKE 61885 ARE N0T READA8LE
p result
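
The reason these need Ruby 1.8 is that `?a` evaluated to the integer 97 there, whereas from 1.9 on it is the one-character string "a", so `?a-11` no longer works. A hedged sketch of solution #1 for newer Rubies, using String#ord. It takes an array instead of reading ARGV, and the method name `words_for_base` is invented.

```ruby
# Hypothetical 1.9+ variant of solution #1.
def words_for_base(words, base)
  last = "a".ord - 11 + base   # code point of the largest letter digit
  # Unlike the original's exclusive range, this also accepts base 36.
  return [] unless ("a".ord.."z".ord).cover?(last)
  pattern = /\A[a-#{last.chr}]+\z/
  words.grep(pattern).sort_by { |w| [w.length, w] }
end

p words_for_base(%w[face bad zoo a deed], 16)
```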

Just a simple regex, the rest is just option parsing.

#!/usr/bin/env ruby -wKU

require "optparse"

options = {
  :base => 16,
  :min_length => 1,
  :word_file => "/usr/share/dict/words",
  :ignore_case => false
}

ARGV.options do |opts|
  opts.banner = "Usage: #{File.basename($PROGRAM_NAME)} [OPTIONS]"

  opts.separator ""
  opts.separator "Specific Options:"

  opts.on( "-b", "--base BASE", Integer,
           "Specify base (default #{options[:base]})" ) do |base|
    options[:base] = base
  end

  opts.on( "-l", "--min-word-length LENGTH", Integer,
           "Specify minimum length" ) do |length|
    options[:min_length] = length
  end

  opts.on( "-w", "--word-file FILE",
           "Specify word file",
           "(default #{options[:word_file]})" ) do |word_file|
    options[:word_file] = word_file
  end

  opts.on( "-i", "--ignore-case",
           "Ignore case distinctions in word file." ) do |i|
    options[:ignore_case] = true
  end

  opts.separator "Common Options:"

  opts.on( "-h", "--help",
           "Show this message." ) do
    puts opts
    exit
  end

  begin
    opts.parse!
  rescue
    puts opts
    exit
  end
end

last_letter = (options[:base] - 1).to_s(options[:base])
letters = ("a"..last_letter).to_a.join
exit if letters.size.zero?

criteria = Regexp.new("^[#{letters}]{#{options[:min_length]},}$",
                   options[:ignore_case])

open(options[:word_file]).each do |word|
  puts word if word =~ criteria
end

···

Here is my solution. I tried to make things easy to follow...

First, I create a regular expression to match all words in a number base.
This method basically generates a regex matching single words consisting of
letters in the base. Matching is case insensitive:

def get_regexp(base_num)
  # Get number of letters in the base
  num_letters = base_num - 10
  num_letters = 26 if num_letters > 26 # Cap at all letters in alphabet
  return nil if num_letters < 1 # Nothing would match

  # Create a regular expression to match all letters in the base
  # Move back from 'z' until we reach the last char in the base
  end_c = ("z".ord - (26 - num_letters)).chr
  regexp_str = "^([a-#{end_c}])+$" # Always starts at 'a'
  Regexp.new(regexp_str, Regexp::IGNORECASE)
end

Next we have a "main" method to read file, base, and length parameters from
the command line, and find all words. The code uses a boilerplate
read_words_from_file method to read the words:

if ARGV.size != 3
  puts "Usage: words_as_numbers.rb word_file number_base minimum_word_length"
else
  word_file = ARGV[0]
  base = ARGV[1].to_i
  word_length = ARGV[2].to_i
  regexp = get_regexp(base)

  # Find all words
  if (regexp != nil)
    for word in read_words_from_file(word_file)
      if word.size >= word_length
        puts word if regexp.match(word)
      end
    end
  end
end
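
The read_words_from_file helper is not shown in this post. A minimal sketch of what such a helper might look like (an assumption for illustration, not the code from the pastie):

```ruby
require "tempfile"

# Hypothetical stand-in for the helper: one word per line,
# surrounding whitespace stripped, blank lines dropped.
def read_words_from_file(path)
  File.readlines(path).map { |line| line.strip }.reject { |word| word.empty? }
end

# Try it on a throwaway file so the example does not need a system word list.
Tempfile.create("words") do |f|
  f.puts "accede", "", "  beaded  ", "zebra"
  f.flush
  p read_words_from_file(f.path)
end
```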

And here is a test run:

words_as_numbers.rb linux.words.txt 16 6

accede
acceded
beaded
bedded
beefed
decade
deeded
deface
facade
facaded

It's interesting that each subsequent base (11, 12, etc.) contains all the words
found for the previous base. It would be interesting to analyze the frequency
of words found at each iteration, or create a visualization of the process.
Anyway, here is a pastie of everything: http://pastie.caboo.se/85060
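
Both observations are easy to check in a few lines. The sketch below uses a tiny inline sample instead of a real word list, and the helper name `matches_for` is made up. The sets nest because the character class for each base contains the one for the base before it.

```ruby
SAMPLE_WORDS = %w[a baa bad cab deaf face hedge jig zoo]

# Hypothetical helper: words made only of the letter digits of `base`.
def matches_for(words, base)
  return [] if base < 11 || base > 36
  words.grep(/\A[a-#{(base - 1).to_s(base)}]+\z/i)
end

previous = []
(11..36).each do |base|
  current = matches_for(SAMPLE_WORDS, base)
  # Every word found for the previous base must still be present.
  raise "not nested at base #{base}" unless (previous - current).empty?
  puts "base #{base}: #{current.size} words" if current.size != previous.size
  previous = current
end
```

Only the bases where the count changes are printed, which gives a crude picture of how the matches accumulate.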

Thanks,

Justin

···

My solution.

robert

word-filter.rb (382 Bytes)

That's pretty much the quiz, yes. It's not too hard to solve, but the results are pretty interesting.

James Edward Gray II

···

On Aug 3, 2007, at 8:20 AM, Karl von Laudermann wrote:

On Aug 3, 9:01 am, Ruby Quiz <ja...@grayproductions.net> wrote:

When working with hexadecimal numbers it is likely that you've noticed some hex
numbers are also words. For example, 'bad' and 'face' are both English words and
valid hex numbers (2989 and 64206, respectively, in decimal). I got to thinking
that it would be interesting to find out how many and which hex numbers were
also valid English words. Of course, almost immediately I started to think of
generalizations. What about other bases? What about languages other than
English?

I'm not sure why this quiz is being phrased as "numbers that are
words". Aren't you just asking for a program that finds words that use
only the first n letters of the alphabet? Or am I missing something
obvious (tends to happen :) )?

I used a one-liner too:

ruby -sne 'print if $_.downcase =~ /\A[\d\s#{("a".."z").to_a.join[0...($size.to_i - 10)]}]+\Z/' -- -size=12 /usr/share/dict/words

James Edward Gray II

···

On Aug 5, 2007, at 10:46 AM, Raf Coremans wrote:

I have come up with this one-liner:

----------8<----------
puts File.readlines('/usr/share/dict/words').grep(/\A[a-#{((b=ARGV[0].to_i)-1).to_s(b)}]+\Z/)
---------->8----------

Crude but effective.

Written in about 20 minutes.

···

###################################################

@words = File.new('/usr/share/dict/words').read.downcase.scan(/[a-z]+/).uniq
@chars = '0123456789abcdefghijklmnopqrstuvwxyz'

def print_matches(base,minsize=0)

   print "Base: " + base.to_s + "\n"

   alphabet = @chars[0,base]

   print "Alphabet: " + alphabet + "\n\nMatching Words:\n\n"

   @words.each do |w|

     if w.length >= minsize
       hexword = true
       w.each_byte { |c|
         if !alphabet.include?(c.chr)
           hexword = false
           break
         end
       }
       p w if hexword
     end
   end

end

print_matches 18,4

#################################################

Output:

Base: 18
Alphabet: 0123456789abcdefgh

Matching Words:

"ababdeh"
"abac"
"abaca"
"abaff"
"abba"
"abed"
"acca"
"accede"
"achage"
"ache"
"adad"
"adage"
"adda"
"added"
"adead"
"aface"
"affa"
"agade"
"agag"
"aged"
"agee"
"agha"
...

Douglas F Shearer
http://douglasfshearer.com
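
The same alphabet check can be written more compactly with all?. A sketch, with `DIGIT_CHARS` and `in_base?` as invented names:

```ruby
DIGIT_CHARS = "0123456789abcdefghijklmnopqrstuvwxyz"

# True when every character of the word is among the first `base` digits.
def in_base?(word, base)
  alphabet = DIGIT_CHARS[0, base]
  word.each_char.all? { |c| alphabet.include?(c) }
end

puts in_base?("ache", 18)   # true:  'h' is the last digit of base 18
puts in_base?("ache", 16)   # false: 'h' is not a hex digit
```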

There is no end of numerological[0,1] variations that could be used by
anyone who feels the need for an additional challenge this week.

Regards,

Paul

[0] http://en.wikipedia.org/wiki/Numerology
[1] http://en.wikipedia.org/wiki/Kabbalah#Number-Word_mysticism

Well then, along those lines I have a Hebrew gematria counter. Give it
words on the command line, and it will tell you the gematria of each of
those words, and the total gematria.

I use this to check when converting Hebrew citations of Jewish books into
English for the benefit of those reading English newsgroups.

#!/usr/bin/env ruby
$KCODE = "u"
require "jcode"
require 'generator'
class String
   def u_reverse; split(//).reverse.join; end
end

LETTERVALUES=Hash.new(0).merge \
 Hash['א' => 1, 'ב' => 2, 'ג' => 3, 'ד' => 4, 'ה' => 5,
 'ו' => 6, 'ז' => 7, 'ח' => 8, 'ט' => 9, 'י' => 10, 'כ' => 20,
 'ל' => 30, 'מ' => 40, 'נ' => 50, 'ס' => 60, 'ע' => 70, 'פ' => 80,
 'צ' => 90, 'ק' => 100, 'ר' => 200, 'ש' => 300, 'ת' => 400,
 'ם' => 40, 'ך' => 20 , 'ן' => 50, 'ף' => 80, 'ץ' => 90]
gematrias=ARGV.collect do |word|
   word.split(//).inject(0) do |t,l|
      t+LETTERVALUES[l]
   end
end

SyncEnumerator.new(ARGV, gematrias).each do |word,value|
   #reverse the word to print it RTL if all of the characters in it
   #are hebrew letters

   #note that this doesn't find nikudot, but then we don't care
   #anyway because the terminal mangles nikudot -- the result will be
   #so mangled anyway that we don't care whether it's reversed
   word=word.u_reverse if word.split(//)-LETTERVALUES.keys==[]
   printf "%s %d\n", word, value
end

printf "Total %d\n", gematrias.inject(0) {|t,l| t+l}
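
The script above depends on $KCODE, jcode and generator (SyncEnumerator), which are Ruby 1.8 features. A hedged sketch of the same counting on a newer Ruby, where strings are already split into characters correctly. It uses a fixed sample instead of ARGV and leaves out the right-to-left reversal step. The names `LETTER_VALUES` and `gematria` are invented for this sketch.

```ruby
# Letter values copied from the table above, including the final forms.
LETTER_VALUES = Hash.new(0).merge(
  'א' => 1, 'ב' => 2, 'ג' => 3, 'ד' => 4, 'ה' => 5,
  'ו' => 6, 'ז' => 7, 'ח' => 8, 'ט' => 9, 'י' => 10, 'כ' => 20,
  'ל' => 30, 'מ' => 40, 'נ' => 50, 'ס' => 60, 'ע' => 70, 'פ' => 80,
  'צ' => 90, 'ק' => 100, 'ר' => 200, 'ש' => 300, 'ת' => 400,
  'ם' => 40, 'ך' => 20, 'ן' => 50, 'ף' => 80, 'ץ' => 90
)

# Sum the letter values; characters not in the table count as 0.
def gematria(word)
  word.each_char.sum { |letter| LETTER_VALUES[letter] }
end

words  = ["שלום", "אמת"]   # sample input; the original reads ARGV
values = words.map { |w| gematria(w) }

words.zip(values).each { |word, value| printf "%s %d\n", word, value }
printf "Total %d\n", values.sum
```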

···

On Fri, 03 Aug 2007 13:46:16 +0000, Paul Novak wrote:

There is no end of numerological[0,1] variations that could be used by
anyone who feels the need for an additional challenge this week.

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/