[QUIZ] English Numerals (#25)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

···

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Timothy Byrd

While we normally write numbers using Arabic (or since Quiz #22, Roman)
numerals, numbers can also be written out as English phrases.

For example:

  7 == seven (the hard way)
  42 == forty-two (a very important number)
  2001 == two thousand and one (a space odyssey)
  1999 == (party like it's) nineteen hundred and ninety-nine

So the quiz is a problem from a Pi Mu Epsilon (US national math club)
newsletter:

"When the integers 1 to 10_000_000_000 are written in the English language, then
sorted as strings, which odd number appears first in the list?"

Your mission, should you choose to accept it, is to:

- Create Ruby code to translate a number to it's English language form. (My
sample version works with integers up to 10**72-1.)

- Determine programmatically which odd number in 1..10_000_000_000 would sort
first if written in English. (Brute force is the obvious solution, but the
computer may have to think about it...)

- Would the answer change for a larger range of values, say 10**30?

- Do French and German Rubyists get a different answer than the Americans?

Extra credit:

-Add a Pinyin translator: http://www.mandarintools.com/numbers.html

Ok sounds interesting, is the rule, less than 2000

1000 -> ten hundred
1800 -> eighteen hundred
2000 -> two thousand

Also what are the rules for the "and"?

1050050 -> one million fifty thousand and fifty
or
1050050 -> one million and fifty thousand and fifty

Also we hyphenate when 20 < x > 100 (and not a multple of 10)?

···

On Sat, 26 Mar 2005 05:18:43 +0900, Ruby Quiz <james@grayproductions.net> wrote:

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Timothy Byrd

While we normally write numbers using Arabic (or since Quiz #22, Roman)
numerals, numbers can also be written out as English phrases.

For example:

        7 == seven (the hard way)
        42 == forty-two (a very important number)
        2001 == two thousand and one (a space odyssey)
        1999 == (party like it's) nineteen hundred and ninety-nine

So the quiz is a problem from a Pi Mu Epsilon (US national math club)
newsletter:

"When the integers 1 to 10_000_000_000 are written in the English language, then
sorted as strings, which odd number appears first in the list?"

Your mission, should you choose to accept it, is to:

- Create Ruby code to translate a number to it's English language form. (My
sample version works with integers up to 10**72-1.)

- Determine programmatically which odd number in 1..10_000_000_000 would sort
first if written in English. (Brute force is the obvious solution, but the
computer may have to think about it...)

- Would the answer change for a larger range of values, say 10**30?

- Do French and German Rubyists get a different answer than the Americans?

Extra credit:

-Add a Pinyin translator: http://www.mandarintools.com/numbers.html

Note: I'm afraid I could not bring myself to code up some random ill-defined method of expressing numbers in English, so I did it the way I was taught in school, using hypens and absolutely no "ands" or commas. I think I've got Strunk & White on my side. Regardless, I'll be somewhat surprised if my answer comes out very different from others.

Brute force was not a very practical option when looking for the answer, so I assumed it would be OK to do a little bit of analysis and simplify the problem. If my analysis was wrong, then I'll probably lose badly on this one. :slight_smile: There may also be a much more elegant way to express all of this, but I thought I would just give you a short brain dump here.

Every number from 1 to 10**10 can be expressed in English as a concatenation of one or more independent phrases. Each phrase expresses the value of a triplet of numerals from the original number. For example, 56_106_021 uses three phrases: "fifty-six million" for the millions triplet (56), "one hundred six thousand" for the thousands triplet (106), and "twenty-one" for the ones triplet (021). I don't allow "twelve hundred" for 1200, but I think it would only complicate the logic slightly to handle this.

I proceed to find the lowest possible phrase for each of the four triplets in the range of interest. It must be possible to express the lowest number name using a subset of the lowest possible phrases for each triplet. And since the desired number must be odd, valid subsets will all end with the ones triplet phrase.

The ones triplet range has only 500 phrases: the odd numbers from 1 to 999. The thousands triplet has 1000 phrases: the multiples of 1000 from 1000 to 999_999. Likewise, the millions triplet has 1000 phrases. The billions triplet has only ten phrases because we stop caring after 10_000_000_000 (instead of going to 999_999_999_999). Finding the four lowest phrases for each triplet requires examining only 2510 numbers. There are only eight combinations of the resulting four phrases that represent an odd number.

The result: "eight billion eight hundred eight million eight hundred eight thousand eight hundred eighty-five".

#!/usr/bin/ruby

class Integer

   Ones = %w[ zero one two three four five six seven eight nine ]
   Teen = %w[ ten eleven twelve thirteen fourteen fifteen
              sixteen seventeen eighteen nineteen ]
   Tens = %w[ zero ten twenty thirty forty fifty
              sixty seventy eighty ninety ]
   Mega = %w[ none thousand million billion ]

   def to_english
     places = to_s.split(//).collect {|s| s.to_i}.reverse
     name = []
     ((places.length + 2) / 3).times do |p|
       strings = Integer.trio(places[p * 3, 3])
       name.push(Mega[p]) if strings.length > 0 and p > 0
       name += strings
     end
     name.push(Ones[0]) unless name.length > 0
     name.reverse.join(" ")
   end

   private

   def Integer.trio(places)
     strings = []
     if places[1] == 1
       strings.push(Teen[places[0]])
     elsif places[1] and places[1] > 0
       strings.push(places[0] == 0 ? Tens[places[1]] :
                    "#{Tens[places[1]]}-#{Ones[places[0]]}")
     elsif places[0] > 0
       strings.push(Ones[places[0]])
     end
     if places[2] and places[2] > 0
       strings.push("hundred", Ones[places[2]])
     end
     strings
   end

end

# If there are command-line args, just print out English names.
if ARGV.length > 0
   ARGV.each {|arg| puts arg.to_i.to_english}
   exit 0
end

# Return the name of the number in the specified range that is the
# lowest lexically.
def minimum_english(start, stop, incr)
   min_name = start.to_english
   start.step(stop, incr) do |i|
     name = i.to_english
     min_name = name if min_name > name
   end
   min_name
end

# Find the lowest phrase for each 3-digit cluster of place-values.
# The lowest overall string must be composed of elements from this list.
components =
   [ minimum_english(10**9, 10**10, 10**9),
     minimum_english(10**6, 10**9 - 1, 10**6),
     minimum_english(10**3, 10**6 - 1, 10**3),
     minimum_english(10**0, 10**3 - 1, 2) ]

$result = components[-1]
def search_combinations(list, selected = [])
   if elem = (list = list.dup).shift
     if list.empty?
       # Always include the final element from list in the selection.
       string = selected.dup.push(elem).join(" ")
       $result = string if $result > string
     else
       search_combinations(list, selected)
       search_combinations(list, selected.dup.push(elem))
     end
   end
   $result
end

puts search_combinations(components)

···

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

I decided to add a Japanese translator rather than pinyinwhatever...
since I know a smidgeon of it. Of course, my translator is probably
wrong in some cases, but what the heck. I didn't attack the
first-sorted-odd problem, but here's my number to english/japanese
translators.

class JapaneseTranslator
    # My knowledge of counting Japanese is limited, so this may not
    # be entirely correct; in particular, I don't know what rules
    # to follow after 'hyaku man' (1,000,000).
    # I also combine a digit with its group, such as 'gohyaku' rather
    # than 'go hyaku'; I just like reading it better that way.

    DIGITS = %w(zero ichi ni san shi go roku shichi hachi kyu)
    GROUPS = %w(nothingtoseeheremovealong ju hyaku sen)
    MAN = 10000

    def to_spoken(val)
        case val <=> 0
        when -1
            '- ' + to_spoken(-val)
        when 0
            DIGITS[0]
        else
            group(val, 0)
        end
    end

    private

    def group(val, level)
        if val >= MAN
            group(val / MAN, 0) + 'man ' + group(val % MAN, 0)
        else
            case val
            when 0
                ''
            when 1
                level == 0 ? DIGITS[val] : GROUPS[level]
            when 2...10
                DIGITS[val] + (GROUPS[level] if level > 0).to_s
            else
                group(val / 10, level+1) + ' ' + group(val % 10, level)
            end
        end
    end
end

class USEnglishTranslator
    # Formal, US English. Optional 'and'. Will not produce things
    # such as 'twelve hundred' but rather 'one thousand two hundred'.
    # The use of 'and' is incomplete; it is sometimes missed.

    DIGITS = %w(zero one two three four five six seven eight nine)
    TEENS = %w(ten eleven twelve thirteen fourteen fifteen sixteen
                seventeen eighteen nineteen)
    TENS = %w(hello world twenty thirty forty fifty sixty seventy
                eighty ninety)
    GROUPS = %w(thousand million billion trillion quadrillion
                quintillion sextillion septillion octillion nonillion
                decillion)
    K = 1000

    def initialize(conjunction = true)
        @conjunction = conjunction
    end

    def to_spoken(val)
        case val <=> 0
        when -1
            'negative ' + to_spoken(-val)
        when 0
            DIGITS[0]
        else
            group(val, 0).flatten.join(' ')
        end
    end

    private

    def group(val, level)
        x = group(val / K, level + 1) << GROUPS[level] if val >= K
        x.to_a << under_1000(val % K, level)
    end

    def under_1000(val, level)
        x = [DIGITS[val / 100]] << 'hundred' if val >= 100
        x.to_a << under_100(val % 100, (level == 0 and not x.nil?))
    end

    def under_100(val, junction)
        x = [('and' if @conjunction and junction)] # wyf?
        case val
        when 0
            []
        when 1...10
            x << DIGITS[val]
        when 10...20
            x << TEENS[val - 10]
        else
            d = val % 10
            x << (TENS[val / 10] + ('-' + DIGITS[d] if d != 0).to_s)
        end
    end
end

class Integer
    def to_spoken(translator = USEnglishTranslator.new)
        translator.to_spoken(self).squeeze(' ').strip
    end
end

SAMPLES = [ 0, 1, 2, 5, 10, 11, 14, 18, 20, 21, 29, 33, 42, 50, 87, 99,
            100, 101, 110, 167, 199, 200, 201, 276, 300, 314, 500, 610,
            1000, 1039, 1347, 2309, 3098, 23501, 32767, 70000, 5480283,
            2435489238, 234100090000, -42, -2001 ]

TRANSLATORS = { 'US English' => USEnglishTranslator.new,
                'Japanese' => JapaneseTranslator.new }

# main
TRANSLATORS.each do |lang, translator|
    puts
    puts lang
    puts '-' * lang.length
    SAMPLES.each do |val|
        puts "%12d => %s" % [val, val.to_spoken(translator)]
    end
end

- "When the integers 1 to 10_000_000_000 are written in the English
language, then
sorted as strings, which odd number appears first in the list?"

This is a really interesting question. There's a little more creativity in
the answer than I feel comfortable delegating to my computer at this point,
so my answer is incomplete. I haven't "determined [the answer]
programmatically" as required, I've used a computer to help me find the
answer. I figure a proper brute force would be at least a couple of days in
the solving anyway.

My answer is that "a baker's dozen" comes immediately before "a billion, a
hundred and eight thousand, a hundred and eighty five". I haven't yet
enhanced my program to give me this result :slight_smile:

- Would the answer change for a larger range of values, say 10**30? I don't
believe it would, as "billion" < ['trillion', 'quadrillion', 'quintillion',
'sextillion', 'septillion', 'octillion', 'nonillion', 'decillion',
'undecillion', 'duodecillion', 'tredecillion', 'quattuordecillion',
'quindecillion', 'sexdecillion', 'septendecillion', 'octodecillion',
'novemdecillion', 'vigintillion'].min

- Do French and German Rubyists get a different answer than the Americans?
I think the answer is either yes or no depending on whether your answer is
the number or the words. I believe the words for the French/German
interpretation are the same, but the meaning of the billion is different
(1_000_000_000_000 rather than 1_000_000_000).

Here is my helper program, in case anyone considers it relevant :slight_smile:

Cheers,
Dave

# load my translation of the Perl modules Number::Spell and
# Lingua::EN::Numericalize (see CPAN)
require 'numeric-english'

# add to_english method to integers
class Integer; include Numeric::English end

a = []

# return the english representation of the given integer if it is less than
# the one stored in memo (otherwise return memo).
inject_proc = proc do |memo, int|
if int[0] == 0 # test low bit
  memo # ignore even numbers
else
  english = int.to_english
  if english < memo
   english
  else
   memo
  end
end
end

# I'm not sure why I chose these partitions of the problem space. I think
it's
# something to do with independent sections of numbers. We group in threes.

# check numbers one to a million
a << (0..1_000_000).inject('zzz', &inject_proc)
# check numbers from a million to a billion
a << (0..1_000).map{|x| x * 1_000_000 + 1 }.inject('zzz', &inject_proc)
# check numbers from a billion to ten billion
a << (0..10).map{|x| x * 1_000_000_000 + 1 }.inject('zzz', &inject_proc)

# this result is a list of words which have to be combined in an interesting
# way that I haven't formalized. Sorry.
p a

__END__

Output:
["eight hundred eight thousand eight hundred eighty five", "eight hundred
eight
million five", "eight billion five"]

I wrote a quick program to implement the to_en (to_english) version as
well. It supports using and and commas -- the and has a significant
effect on the lowest collated odd number returned. As the brute force
solution was rather brutish (I let it run for two nights without
reaching the correct solution), I came up with an "optimized"
solution. Below is the output and below that the code - the output is
too wide -- sorry, but the code is still properly formatted :slight_smile:

Patrick

-------------- Output

···

-------------------------------------------------------------------

Using both commas and 'and'
eight billion and eighty-five

Repeat without 'and' or comma
eight billion eight hundred eight million eight hundred eight thousand
eight hundred eighty-five

-------------- Code
----------------------------------------------------------------------

class Integer
  DIVISORS = [[100, ''],
              [10, 'hundred'],
              [1000, 'thousand'],
              [1000, 'million'],
              [1000, 'billion'],
              [1000, 'trillion'],
              [1000, 'quadrillion'],
              [1000, 'quintillion'],
              [1000, 'sextillion'],
              [1000, 'septillion'],
              [1000, 'octillion'],
              [1000, 'nonillion'],
              [1000, 'decillion'],
              [1000, 'undecillion'],
              [1000, 'duodecillion'],
              [1000, 'tredecillion'],
              [1000, 'quattuordecillion'],
              [1000, 'quindecillion'],
              [1000, 'sexdecillion'],
              [1000, 'septendecillion'],
              [1000, 'octodecillion'],
              [1000, 'novemdecillion'],
              [1000, 'vigintillion'],
              [1000, 'unvigintillion'],
              [1000, 'duovigintillion'],
              [1000, 'trevigintillion'],
              [1000, 'quattuorvigintillion'],
              [1000, 'quinvigintillion'],
              [1000, 'sexvigintillion'],
              [1000, 'septvigintillion'],
              [1000, 'octovigintillion'],
              [1000, 'novemvigintillion'],
              [1000, 'trigintillion'],
              [1000, 'untrigintillion'],
              [1000, 'duotrigintillion'],
              [1000, 'tretrigintillion'],
              [1000, 'quattuortrigintillion'],
              [1000, 'quintrigintillion'],
              [1000, 'sextrigintillion'],
              [1000, 'septtrigintillion'],
              [1000, 'octotrigintillion'],
              [1000, 'novemtrigintillion'],
              [1000, 'quadragintillion'],
              [1000, 'unquadragintillion'],
              [1000, 'duoquadragintillion'],
              [1000, 'trequadragintillion'],
              [1000, 'quattuorquadragintillion'],
              [1000, 'quinquadragintillion'],
              [1000, 'sexquadragintillion'],
              [1000, 'septquadragintillion'],
              [1000, 'octoquadragintillion'],
              [1000, 'novemquadragintillion'],
              [1000, 'quinquagintillion'],
              [1000, 'unquinquagintillion'],
              [1000, 'duoquinquagintillion'],
              [1000, 'trequinquagintillion'],
              [1000, 'quattuorquinquagintillion'],
              [1000, 'quinquinquagintillion'],
              [1000, 'sexquinquagintillion'],
              [1000, 'septquinquagintillion'],
              [1000, 'octoquinquagintillion'],
              [1000, 'novemquinquagintillion'],
              [1000, 'sexagintillion'],
              [1000, 'unsexagintillion'],
              [1000, 'duosexagintillion'],
              [1000, 'tresexagintillion'],
              [1000, 'quattuorsexagintillion'],
              [1000, 'quinsexagintillion'],
              [1000, 'sexsexagintillion'],
              [1000, 'septsexagintillion'],
              [1000, 'octosexagintillion'],
              [1000, 'novemsexagintillion'],
              [1000, 'septuagintillion'],
              [1000, 'unseptuagintillion'],
              [1000, 'duoseptuagintillion'],
              [1000, 'treseptuagintillion'],
              [1000, 'quattuorseptuagintillion'],
              [1000, 'quinseptuagintillion'],
              [1000, 'sexseptuagintillion'],
              [1000, 'septseptuagintillion'],
              [1000, 'octoseptuagintillion'],
              [1000, 'novemseptuagintillion'],
              [1000, 'octogintillion'],
              [1000, 'unoctogintillion'],
              [1000, 'duooctogintillion'],
              [1000, 'treoctogintillion'],
              [1000, 'quattuoroctogintillion'],
              [1000, 'quinoctogintillion'],
              [1000, 'sexoctogintillion'],
              [1000, 'septoctogintillion'],
              [1000, 'octooctogintillion'],
              [1000, 'novemoctogintillion'],
              [1000, 'nonagintillion'],
              [1000, 'unnonagintillion'],
              [1000, 'duononagintillion'],
              [1000, 'trenonagintillion'],
              [1000, 'quattuornonagintillion'],
              [1000, 'quinnonagintillion'],
              [1000, 'sexnonagintillion'],
              [1000, 'septnonagintillion'],
              [1000, 'octononagintillion'],
              [1000, 'novemnonagintillion'],
              [1000, 'centillion'],
             ]

  ENGLISH_MAP = {
          1 => 'one',
          2 => 'two',
          3 => 'three',
          4 => 'four',
          5 => 'five',
          6 => 'six',
          7 => 'seven',
          8 => 'eight',
          9 => 'nine',
         10 => 'ten',
         11 => 'eleven',
         12 => 'twelve',
         13 => 'thirteen',
         14 => 'fourteen',
         15 => 'fifteen',
         16 => 'sixteen',
         17 => 'seventeen',
         18 => 'eighteen',
         19 => 'nineteen',
         20 => 'twenty',
         30 => 'thirty',
         40 => 'forty',
         50 => 'fifty',
         60 => 'sixty',
         70 => 'seventy',
         80 => 'eighty',
         90 => 'ninety',
  }

  ENGLISH_TEXT = {
    :and => 'and',
    :comma => ',',
    :zero => 'zero',
    :minus => 'minus'
  }

  def Integer.insert_and?(number, result, recurse)
    number > 0 &&
    result.empty? &&
    !recurse && !ENGLISH_TEXT[:and].empty?
  end

  def Integer.insert_comma?(index, result)
    (index >= 2) &&
    (
      !ENGLISH_TEXT[:and].empty? &&
      !result.empty? &&
      !result[/^#{Regexp.escape(ENGLISH_TEXT[:and])}/]
    ) ||
    (
      ENGLISH_TEXT[:and].empty? &&
      result[/\s\S+$/]
    )
  end

  def Integer.to_en(number, recurse=false)
    result = ''
    DIVISORS.each_with_index do |(divisor, name), index|
      break unless number > 0
      remainder = number % divisor
      number /= divisor
      next if (remainder == 0)

      # check for special case 1100...2000
      # if (1 == number && (1..9) === remainder && index == 1 && !recurse)

      # check for special case 1100...1999 skipping [2-9]0..
      if ((1..9) === number && (1..9) === remainder && index == 1 && !recurse)
        remainder += number * 10
        number = 0
      end

      part = ''
      case remainder
        when 1..19
          part = ENGLISH_MAP[remainder]
        when 20..99
          units = remainder % 10
          part = ENGLISH_MAP[remainder - units]
          part += '-' + ENGLISH_MAP[units] if (units != 0)
        else
          # Recurse
          part = Integer.to_en(remainder, true)
      end

      # the following section is complicated by supporting
      # commas and ands
      full_part = ''
      if insert_and?(number, result, recurse)
        full_part += ENGLISH_TEXT[:and] + ' '
      end

      full_part += part
      full_part += ' ' + name unless name.empty?

      if insert_comma?(index, result)
        full_part += ENGLISH_TEXT[:comma]
      end

      full_part += ' ' + result unless result.empty?
      result = full_part
    end
    result
  end

  def to_en
    Integer.to_en(self)
  end
end

# assumptions based upon US number to words algorithm
# - the first digit of the number in question must begin with
# the lowest number name alphabetically (this is a pretty good
# prune all by itself
#
# - the number in question must be a member of the lowest
# divisor name (alphabetically)
#

# we know the number must be eight somethings
#number = EnglishNumber::ENGLISH_MAP.values.sort
#puts number.first

# ok so we must be in the eight billions
#names = EnglishNumber::DIVISORS.map { |d,name| name }
#names = names.reject { |n| n.size == 0 }
#names = names.sort
#puts names.first

# now we re-apply the same rules as above, but
# not forgetting to "odd" rule (meaning we cannot
# use eight on the final digit
# now we know the last digit will be a five
#number = EnglishNumber::ENGLISH_MAP.to_a.reject { |k,n| k % 2 == 0 }
#number = number.map { |k,name| name }
#number.sort
#puts number.first

# So putting it all together we check all entries in
# our heavily pruned tree of choices
# numbers ending in 5 and having a combination of
# 0 and 8's

def try_some_numbers
  lowest = 'zzzz'
  number = 5
  prefix = 0
  while number < 10_000_000_000
    number = (prefix.to_s(2).gsub(/1/, '8') + '5').to_i
    name = number.to_en
    if name < lowest
      lowest = name
    end
    prefix += 1
  end
  puts lowest
end

puts "Using both commas and 'and'"
try_some_numbers

puts "\nRepeat without 'and' or comma"
Integer::ENGLISH_TEXT[:comma] = ''
Integer::ENGLISH_TEXT[:and] = ''
try_some_numbers

The writeup is coming later. For now, here is my solution.

I implemented both European (more logical - BetaMax?) and American
(will dominate - VHS?) numbers. Though I'd put in 'and's originally,
I also ran with them disabled by removing the comment mark in the
first line of find_for_interval(), so I could compare answers. First
I did brute force, then tried for something more elegant.

I have to admit that when I read Glenn Parker's solution, I discovered
my 'elegant' version was elegantly incorrect. It inspired to to fix
things and then refactor out a bit of the duplicated code.

My coding style here is very old-school & c-like. We'll see what I
learn of the Ruby Way during the write-up.

Answers:

Without 'and's or commas, I get:

english.rb 10_000_000_000

E: 808808885 - eight hundred eight million eight hundred eight
thousand eight hundred eighty-five

···

A: 8808808885 - eight billion eight hundred eight million eight
hundred eight thousand eight hundred eighty-five

english.rb 10_000_000_000_000

E: 8808808808885 - eight billion eight hundred eight milliard eight
hundred eight million eight hundred eight thousand eight hundred
eighty-five

A: 8808808885 - eight billion eight hundred eight million eight
hundred eight thousand eight hundred eighty-five

If I include the 'and's (still no commas), I get:

english.rb 10_000_000_000

E: 885 - eight hundred and eighty-five

A: 8000000885 - eight billion eight hundred and eighty-five

english.rb 10_000_000_000_000

E: 8000000000885 - eight billion eight hundred and eighty-five

A: 8000000885 - eight billion eight hundred and eighty-five

Code:

#############################################################

=begin
While we normally write numbers using Arabic (or since Quiz #22,
Roman) numerals, numbers can also be written out as English phrases.

For example:

    7 == seven (the hard way)
    42 == forty-two (a very important number)
    2001 == two thousand and one (a space odyssey)
    1999 == (party like it's) nineteen hundred and ninety-nine

So the quiz is a problem from a Pi Mu Epsilon (US national math club)
newsletter:

"When the integers 1 to 10_000_000_000 are written in the English
language, then sorted as strings, which odd number appears first in
the list?"

- Create Ruby code to translate a number to it's English language form.

- Determine programatically which odd number in 1..10_000_000_000
would sort first if written in English. (Brute force is the obvious
solution, but the computer may have to think about it...)

- Would the answer change for a larger range of values, say 10**30?

- Do French and German Rubyists get a different answer than the
Americans?

=end

module EnglishNumerals

    Numbers = {
        1 => 'one',
        2 => 'two',
        3 => 'three',
        4 => 'four',
        5 => 'five',
        6 => 'six',
        7 => 'seven',
        8 => 'eight',
        9 => 'nine',
        10 => 'ten',
        11 => 'eleven',
        12 => 'twelve',
        13 => 'thirteen',
        14 => 'fourteen',
        15 => 'fifteen',
        16 => 'sixteen',
        17 => 'seventeen',
        18 => 'eighteen',
        19 => 'nineteen',
        20 => 'twenty',
        30 => 'thirty',
        40 => 'forty',
        50 => 'fifty',
        60 => 'sixty',
        70 => 'seventy',
        80 => 'eighty',
        90 => 'ninety'
        }

    AmExponents = {
        3 => 'thousand',
        6 => 'million',
        9 => 'billion',
        12 => 'trillion',
        15 => 'quadrillion',
        18 => 'quintillion',
        21 => 'sexillion',
        24 => 'septillion',
        27 => 'octillion',
        30 => 'nonillion',
        33 => 'decillion',
        36 => 'undecillion',
        39 => 'duodecillion',
        42 => 'tredecillion',
        45 => 'quattuordecillion',
        48 => 'quindecillion',
        51 => 'sexdecillion',
        54 => 'septendecillion',
        57 => 'octodecillion',
        60 => 'novemdecillion',
        63 => 'vigintillion',
        66 => 'unvigintillion',
        69 => 'duovigintillion'
        }

    EurExponents = {
        3 => 'thousand',
        6 => 'million',
        9 => 'milliard',
        12 => 'billion',
        15 => 'billiard',
        18 => 'trillion',
        21 => 'trilliard',
        24 => 'quadrillion',
        27 => 'quadrilliard',
        30 => 'quintillion',
        33 => 'quintilliard',
        36 => 'sextillion',
        39 => 'sextilliard',
        42 => 'septillion',
        45 => 'septilliard',
        48 => 'octillion',
        51 => 'octilliard',
        54 => 'noventillion',
        57 => 'noventilliard',
        60 => 'decillion',
        63 => 'decilliard',
        66 => 'undecillion',
        69 => 'undecilliard'
        }

     Max_exponent = 69

     def self.to_English_base(val, include_and = false)

        result = ''

        sep = include_and ? ' and ' : ' ';

        if val >= 100 then
            v1 = val / 100
            result << ' ' << Numbers[v1] << ' hundred'
            val -= v1 * 100
        end

        if val >= 20 then
            v1 = val / 10
            result << sep << Numbers[v1 * 10]
            val -= v1 * 10
            sep = '-'
        end

        if val > 0 then
            result << sep << Numbers[val]
        end

        result
    end

    def self.to_English(val, eu_names = true, include_and = true)
        val = val.to_i.abs

        return "zero" if val == 0

        include_and = false if val <= 100

        exp_hash = eu_names ? EurExponents : AmExponents

        exp = Max_exponent

        result = ''

        while val > 0
            n = 10 ** exp

            if exp == 3 && val >= 1100 && val < 2000 then
                v1 = val / 100
                val -= v1 * 100
                result << to_English_base(v1, false) << ' hundred'
            elsif val >= n
                v1 = val / n
                val -= v1 * n
                result << to_English_base(v1, exp == 0 && include_and)
                if exp > 0 then
                    result << ' ' << exp_hash[exp]
                end
            end

            exp -= 3
        end

        result.strip
    end

    def self.to_American(val, include_and = true)
        to_English(val, false, include_and)
    end

    def self.lowest_brute(val, eu_names)
        lowStr = 'zero'
        lowV = 0

        (1..val).each { |v|

            next if (v & 1) == 0

            str = to_English(v, eu_names)
            if str < lowStr then
                lowV = v
                lowStr = str
            end
            }

        [ lowV, lowStr ]
    end

    def self.find_for_interval(val, exp, eu_names)

        include_and = (0 == exp) # && false

        lowStr = 'zero'
        lowV = 0

        if (exp <= 0) then
            suffix = ''
        elsif eu_names then
            suffix = ' ' + EurExponents[exp]
        else
            suffix = ' ' + AmExponents[exp]
        end

        (1..val).each { |v|

            # Only skip odd numbers if exp == 0
            #
            next if (0 == exp) && (v & 1) == 0

            str = to_English(v, eu_names, include_and) + suffix
            if str < lowStr then
                lowV = v
                lowStr = str
            end
            }

        [ lowV, lowStr ]
    end

    def self.lowest_elegant(val, eu_names)

        # get the first (exp == 0) interval
        # smallest with 'and' in 1..999
        # (cumulative answer)
        #
        exp = 0
        interval_size = val < 1000 ? val : 999;
        lowV, lowStr = find_for_interval(interval_size, exp, eu_names)

        while (val = val / 1000) > 0

            exp += 3

            # intermediate interval to tack in front of number so far
            #
            interval_size = val < 1000 ? val : 999;
            vInter, strInter =
                    find_for_interval(interval_size, exp, eu_names)

            str = strInter + ' ' + lowStr
            if str < lowStr then
                lowStr = str
                lowV = lowV + (vInter * 10 ** exp)
            end
        end

        [ lowV, lowStr ]
    end
end

# If no argument, then work in interactive mode
#
if !ARGV[0] || ARGV[0].to_i == 0 then

    $<.each_line { |line|
         puts EnglishNumerals.to_English(line.chomp)
         puts EnglishNumerals.to_American(line.chomp)
    }

else
    val = ARGV[0].to_i

    # Positive argument, try something clever...
    #
    if val > 0
        vE, strE = EnglishNumerals.lowest_elegant(val, true)
        vA, strA = EnglishNumerals.lowest_elegant(val, false)

    # Negative argument, do a brute force solution
    #
    else
        vE, strE = EnglishNumerals.lowest_brute(val.abs, true)
        vA, strA = EnglishNumerals.lowest_brute(val.abs, false)
    end

    puts "E: #{vE} - #{strE}"
    puts "A: #{vA} - #{strA}"
end

#############################################################

* Ruby Quiz (Mar 26, 2005 02:10):

- Create Ruby code to translate a number to it's English language
form. (My sample version works with integers up to 10**72-1.)

My Lisp module proves yet again to be capable of answering a Ruby Quiz:

  require 'lisp/format'
  ARGV.each{ |arg| puts Lisp.format("~R", arg.to_i) }

Enjoy,
        nikolai

···

--
::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
::: page: minimalistic.org :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}

When writing numbers using Arabic numerals, we customarily group them
into phrases or clauses of three digits each. These clauses are
identical and are separated by scaling factors such as 'thousand' and
'billion'. All the solutions took advantage of this repetition, though
I'd given irregular examples such as 'twelve hundred' or using 'and'
to separate the final tens and ones from the rest of the number.

Most of the solutions (though not mine, infamously) extended the
integer class to have a method for translating the number to English.

Eliah Hecht submitted the first solution. He extended the Integer
with a to_en method. Unlike the others, he only had an array for
the numbers 1..9 and defined methods to build items like seventeen or
sixty from their component syllables. His general method was to
convert the number to a string and split off the initial clause,
generate the English form of that with a degree suffix and concatenate
the result of a recursive call on the remaining part of the number.

He put in the 'and's, and his reasoned out answer of "eight billion
and eighty-five" was - I'm chagrined to say - more correct than my
"eight billion eight hundred and eighty-five". (Including 'and'
really screws up a recursive or iterative approach - or at least my
approach - to building the number, since you don't include the 'and'
without something in front of it that you may want to replace later.)

Glenn Parker gave the next solution, noting: "I'm afraid I could not
bring myself to code up some random ill-defined method of expressing
numbers in English, so I did it the way I was taught in school, using
hypens and absolutely no "ands" or commas. I think I've got Strunk &
White on my side. Regardless, I'll be somewhat surprised if my answer
comes out very different from others."

A spirited defense of the purity of the English language. And his
thoughts on finessing around the brute force approach are worth
quoting:

'Every number from 1 to 10**10 can be expressed in English as a
concatenation of one or more independent phrases. Each phrase
expresses the value of a triplet of numerals from the original number.
For example, 56_106_021 uses three phrases: "fifty-six million" for
the millions triplet (56), "one hundred six thousand" for the
thousands triplet (106), and "twenty-one" for the ones triplet (021).
I don't allow "twelve hundred" for 1200, but I think it would only
complicate the logic slightly to handle this.

'I proceed to find the lowest possible phrase for each of the four
triplets in the range of interest. It must be possible to express the
lowest number name using a subset of the lowest possible phrases for
each triplet. And since the desired number must be odd, valid subsets
will all end with the ones triplet phrase.

'The ones triplet range has only 500 phrases: the odd numbers from 1
to 999. The thousands triplet has 1000 phrases: the multiples of 1000
from 1000 to 999_999. Likewise, the millions triplet has 1000
phrases. The billions triplet has only ten phrases because we stop
caring after 10_000_000_000 (instead of going to 999_999_999_999).
Finding the four lowest phrases for each triplet requires examining
only 2510 numbers. There are only eight combinations of the resulting
four phrases that represent an odd number.

'The result: "eight billion eight hundred eight million eight hundred
eight thousand eight hundred eighty-five".'

His code is concise. He effectively splits up the number into clauses
or phrases in a single line by splitting it into an array of
individual digits, and then processes them by grabbing three at a time
from the bottom.

Matthew D Moss did a Japanese translator in addition to the English
one. He implemented his Integer.to_spoken to accept a translator as
an optional argument - used as a function object.

Dave Burt used code from "[his] translation of the Perl modules
Number::Spell and Lingua::EN::Numericalize (see CPAN)". He also
comments:

'My answer is that "a baker's dozen" comes immediately before "a
billion, a hundred and eight thousand, a hundred and eighty five". I
haven't yet enhanced my program to give me this result :)'

Patrick Hurley applied some logic to his solution, pruning his search
space down to: "numbers ending in 5 and having a combination of 0 and
8's".

Nikolai Weibull commented "My Lisp module proves yet again to be
capable of answering a Ruby Quiz".

As for me, I discovered how horribly slow my solution was. I was able
to improve it by an order of magnitude - from just over five seconds
to just over half a second. Most of the gain was from changing one
routine:

    def self.to_English(val, eu_names = true, include_and = true)
        v = val.to_i.abs
        return "zero" if v == 0
        include_and = false if v <= 100
        exp_hash = eu_names ? EurExponents : AmExponents

        a = []
        (a.push v % 1000; v /= 1000) while v > 0

        r = ''

        # allow for numbers like 'twelve hundred'

···

#
        if a[1] == 1 and a[0] >= 100 then
            a[1] = 0
            a[0] += 1000
        end

        a.each_with_index {|obj, i|
            next if obj == 0
            r = "#{to_English_base(obj, include_and && \
                i == 0)} #{exp_hash[i]}".strip + " #{r}"
            }

        r.strip
    end

I used to counting down through the entire exponent table for each
number testing if the number were larger than 10**exponent. I now
chop the number up into three digit chunks and iterate up through the
chunks. (Note that I've also changed all my hashes to arrays, but
they only helped by a few percentage points.) At first I thought this
code wouldn't support numbers like 'twelve hundred', but it turn out
to take just a few lines. It's a textbook example that one
well-placed optimization at the end can make a world of difference.

-- Timothy

"The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow
words; on occasion, English has pursued other languages down alleyways
to beat them unconscious and rifle their pockets for new vocabulary."
-James D. Nicoll

Ok sounds interesting, is the rule, less than 2000

Close. It's at least 1100 and less than 2000, so:

1000 -> one thousand
1100 -> eleven hundred
1900 -> nineteen hundred
2000 -> two thousand

Also what are the rules for the "and"?

I was assuming the "and" only goes in the final three digits and only
if not a multiple of 100, so your:

1050050 -> one million fifty thousand and fifty

is what I'd pick.

Also we hyphenate when 20 < x > 100 (and not a multple of 10)?

Yes, between 20 and 100 (non-inclusive). This applies to each group of
three digits, so:

57057057 -> fifty-seven million fifty-seven thousand and fifty-seven

Hope that clears things up.

BTW, here's a fun-fact on the original post:

    7 == seven (the hard way)

    In the dice game craps, a "hardway" bet is that a number
    will come up as a pair rather than some other combination.
    For example a six would have to be a pair of 3's rather than
    5 & 1 or 4 & 2. Sevewn the hard way would require rolling a
    pair of 3.5's, which is generally considered quite difficult.
    (See The Odds: Bets - How Craps Works | HowStuffWorks )

-- Timothy

Patrick Hurley wrote:

Also what are the rules for the "and"?

1050050 -> one million fifty thousand and fifty
or
1050050 -> one million and fifty thousand and fifty

Also we hyphenate when 20 < x > 100 (and not a multple of 10)?

I was always taught there is no and (except at the decimal place), e.g. one million fifty thousand fifty and five tenths. I don't remember who taught this but I remember learning it disctinctly.

That hyphenation thing gives me a headache. What's wrong with
one hundred twenty five?

Glenn Parker wrote:

I'll be somewhat surprised if my answer comes out very different from others.

OK, not _too_ surprised. Since "and" sorts earlier than any number in English, using it can throw off the results by quite a bit.

···

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/&gt;

[snip Timothy's nice summary]

My big thanks to Timothy for running the quiz this week, start to finish!

James Edward Gray II

Hans Fugal wrote:

Patrick Hurley wrote:

Also what are the rules for the "and"?

1050050 -> one million fifty thousand and fifty
or
1050050 -> one million and fifty thousand and fifty

Also we hyphenate when 20 < x > 100 (and not a multple of 10)?

I was always taught there is no and (except at the decimal place), e.g. one million fifty thousand fifty and five tenths. I don't remember who taught this but I remember learning it disctinctly.

I was taught this, too. They stressed it, in fact, as though it
were something that actually mattered.

That hyphenation thing gives me a headache. What's wrong with
one hundred twenty five?

I have never seen "twenty five" in English -- AFAIK it's always
hyphenated.

Hal

I was always taught there is no and (except at the decimal
place), e.g. one million fifty thousand fifty and five tenths.
I don't remember who taught this but I remember learning it
disctinctly.

Well, numbers aren't really exact things, after all. :slight_smile:

I wanted to put a little twist into the quiz, and "two thousand and
one" simply sounds right. (Though I had to break 1776 to get it.)

Here's a page with more info:

http://www.absoluteastronomy.com/encyclopedia/N/Na/Names_of_numbers_in_English.htm

-- Timothy

Timothy Byrd wrote:

I was always taught there is no and (except at the decimal
place), e.g. one million fifty thousand fifty and five tenths.
I don't remember who taught this but I remember learning it
disctinctly.

Well, numbers aren't really exact things, after all. :slight_smile:

I wanted to put a little twist into the quiz, and "two thousand and
one" simply sounds right. (Though I had to break 1776 to get it.)

Here's a page with more info:

http://www.absoluteastronomy.com/encyclopedia/N/Na/Names_of_numbers_in_English.htm

The "and" is not my only problem with that page.

For example, they think "one hundred" is singular? Please.

By the way, I hear that one hundred are expected at the
next Ruby Conf... :wink:

Hal

Ok almost done asking for information before I hide from my kids and
do the quiz :slight_smile:

One last quesiton should'nt we have comma's between number clauses?

i.e.

1_111_111_111

one billion, one hundred eleven million, one hundred eleven thousand
and one hundred eleven

Patrick

···

On Sat, 26 Mar 2005 07:54:49 +0900, Timothy Byrd <timothy@etap.com> wrote:

> I was always taught there is no and (except at the decimal
> place), e.g. one million fifty thousand fifty and five tenths.
> I don't remember who taught this but I remember learning it
> disctinctly.

Well, numbers aren't really exact things, after all. :slight_smile:

I wanted to put a little twist into the quiz, and "two thousand and
one" simply sounds right. (Though I had to break 1776 to get it.)

Here's a page with more info:

http://www.absoluteastronomy.com/encyclopedia/N/Na/Names_of_numbers_in_English.htm

-- Timothy

For example, they think "one hundred" is singular? Please.

By the way, I hear that one hundred are expected at the
next Ruby Conf... :wink:

Well, "one hundred" is a good number. :wink:

(Going even more off topic.) That just reminded me of something. In
the US, the media refer to corporations in the singular - "Microsoft is
a ...". But I seem to recall reading British articles that use
"Microsoft are going to ..."

-- Timothy

(Posting from home now)

One last quesiton should'nt we have comma's between number clauses?
i.e.
1_111_111_111
one billion, one hundred eleven million, one hundred eleven thousand
and one hundred eleven

My answer is: "if you like, go for it". Does it change the result?
(The answer to smallest odd number...)

Btw, I would have the last bit of that as "one hundred eleven thousand,
one hundred and eleven". Putting the "and" inside the last clause adds
a twist to the problem so we can see how people deal with an
inconsistency like this. Your way is like the 'comma', 'comma' & 'and'
form of English which is consistent, but 2_100 would give "two thousand
and one hundred" which sounds odd to me. (On the other hand, many
numbers don't sound right with my algorithm, either. And it's a valid
point of view to want to try to eliminate inconsistency in the first
place.)

-- Timothy

Timothy Byrd wrote:

For example, they think "one hundred" is singular? Please.

By the way, I hear that one hundred are expected at the
next Ruby Conf... :wink:

Well, "one hundred" is a good number. :wink:

(Going even more off topic.) That just reminded me of something. In
the US, the media refer to corporations in the singular - "Microsoft is
a ...". But I seem to recall reading British articles that use
"Microsoft are going to ..."

Yes. Don't get me started.

Those are called "collective nouns." They appear singular but
act as a plural and should be treated plurally. Thus "Microsoft
are" is correct, and "Microsoft is" is a failure of the US
education system.

I can see the rationale that a corporation is a single entity.
But when it's time for a pronoun, even Americans go for the
plural -- even in the same sentence:

   "Microsoft are angry, and they plan to sue." (UK/correct)
   "Microsoft is angry, and it plans to sue." (at least consistent)
   "Microsoft is angry, and they plan to sue." (American)

Hal