[QUIZ.SUMMARY] Dice Roller (#61)

My reason for choosing a dice roller is somewhat selfish: I was interested
to see how people would solve a problem that required parsing a
mini-language. I've written lexers and parsers before, years ago, but I
wanted to see what methods the Ruby gurus would employ.

I was not unsurprised. While there were some "traditional" solutions, there
were also a few that made me remember something I realized in past Ruby
Quizzes: it's all about pattern matching. One of the solutions to past quiz
#24 (Texas Hold'Em) showed how much power can be gained by a careful
examination of the patterns in the problem; with a few carefully built
regular expressions, some gsub calls and a bit of magic, you can turn what
looks like a complex problem into something much similar. I should have
remembered that (or, at the least, realized that someone else would).

Discussion on the list about the quiz was rather active, most of the time
getting into the nitty-gritty details of what d10 and 5d6d7 meant, and
occassionally joking about munchkins and their stats of all 18 (with a
strength of 18/00, of course). As solutions came in, it was nice to see
about four or five people making their first attempt. With them, this quiz
gets the bronze medal for submission count, behind the LCD Numbers quiz
(#14) and the Numeric Maze quiz (#60).

I found unique bits in most every solution; even those that took almost
identical approaches would often, at the least, have different regular
expressions. If you are afraid of the mighty regexp, now would be a good
time to study some samples, since the syntax for the dice expressions is
reasonably simple, and many people documented their regular expressions.

Most solutions fell into one of a few types. I'll cover each a bit and point
out anything in particular that attracted my attention.

The first class of solutions massaged the input expression a bit, then used
Ruby's eval method to do the work. This simple solution eluded me, since I
was so fixed on seeing parsing code. Apparently a bunch of you realized that
(as Paul Novak so nicely put) "we don't need no steenking parsers." A few
substitutions and a helper function or two was enough to do the trick, since
aside from the 'd' operator, the expression were valid Ruby already. Let's
take a look at Christian Neukirchen's solution:

    class Integer
      def d(n)
        (1..self).inject(0) { |a, e| a + rand(n) + 1 }
      end
    end

First off we have the random number generation; most solutions had this or a
very similar variant. So the call 3.d(6) will roll and accumulate three
six-sided dice, and the expression 3.d(6) is almost a valid dice expression.
It is in Christian's initialization method that dice expressions are turned
into Ruby expressions:

    def initialize(dice)
       @src = dice.gsub(/d(%|00)(\D|$)/, 'd100\2').
                   gsub(/d(\d+)/, 'd(\1)').
                   gsub(/(\d+|\))d/, '\1.d').
                   gsub(/\d+/) { $&.gsub(/^0+/, '') }

       @dice = eval "lambda { #@src }"
    end

(I've left out a bit of error checking; see the full solution to see
everything.) The first substitution fixes up percentage and double-zero
values as 100. The second turns 'd' as a binary operator into 'd' as a
method call. The third substitution provides the default dice count of 1
if it wasn't specified. And the last substitution removes leading zeroes of
integers; this last substitution prevents the Ruby from interpreting values
as octal.

The morphed express is saved away in a lambda, which allows Christian to
reevaluate the expression repeatedly without performing those substitutions
every time roll is called.

There were several variations of the eval method solution, mostly in the
regular expression substitutions. A couple variations rewrote existing
operators (rather than adding new methods to Integer or Fixnum). Rather than
change the 'd' operator into a method call, he made a slightly different
twist and rolled the dice in method_missing. Perhaps Bill didn't know that
classes could be extended, or maybe he was hacking around for fun. Dennis
Ranke had a eval-looking solution with no eval to be found, because the gsub
calls actually substituted the results during parsing. And Stefan Walk wins
for the shortest solution: three lines of hard to read Ruby code.

The second class of solutions involved using a parsing library or parser
generator tool. Three of these showed up: Pierre Barbier de Reuille used
racc, Austin Ziegler used syntax.rb, and David Tran pulled out ANTLR to
generate his parser. Each of these will let you describe your language in
BNF format, or something quite similar to it, and it will generate a Ruby
language parser for you. These can be quite powerful solutions, although
readability of the language specification and the support code varies
widely. But I do recommend looking into these tools if you have need to do
something more powerful that dice expressions; generating a complex parser
with one of these would save much time and effort over a handcoded solution.

And now we're at the third class of solutions: the handcrafted parsers.
About nine people handcrafted parsers, mostly recursive descent which is
rather easy to code. Because these generally involve more code, I won't show
much, but I suggest taking a look at some of them.

Dominik Bathon's solution, while it prefers to use the eval method as it
goes along rather than build a parse tree, is a easy to read solution and
makes good use of StringScanner. Morus Walter's solution is decent to follow
and understand and builds up an expression tree with his Expr class, which
gets evaluated at the end with a single call to roll. My own solution is
similar, but creates a tree of proc objects. Andrew McGuinness and Christer
Nilsson also have recursive descent parsers that can not only roll dice, but
calculation frequency distributions and histograms, and cheat.

I want to look now at a couple of standouts. First, let's look at Dennis
Ranke's second submission. It's a recursive descent parser, contained in his
class RDParser which is included with the submission but a little too long
to summarize here. However, what RDParser has done is allow Dennis to write
this solution:

    parser = RDParser.new do
      token(/\s+/)
      token(/\d+/) {|m| m.to_i }
      token(/./) {|m| m }

      start :expr do
        match(:expr, '+', :term) {|a, _, b| a + b }
        match(:expr, '-', :term) {|a, _, b| a - b }
        match(:term)
      end

      rule :term do
        match(:term, '*', :dice) {|a, _, b| a * b }
        match(:term, '/', :dice) {|a, _, b| a / b }
        match(:dice)
      end

      def roll(times, sides)
        (1..times).inject(0) {|a, b| a + rand(sides) + 1 }
      end

      rule :dice do
        match(:atom, 'd', :sides) {|a, _, b| roll(a, b) }
        match('d', :sides) {|_, b| roll(1, b) }
        match(:atom)
      end

      rule :sides do
        match('%') { 100 }
        match(:atom)
      end

      rule :atom do
        match(Integer)
        match('(', :expr, ')') {|_, a, _| a }
      end
    end

Do I need to say how beautiful that is? It's short, clean, easy to read and
understand... and all Ruby. Package that RDParser into a module and ship
it! I think I've found my next parsing tool...

Let's look at the guts of one more solution, by Pablo Hoch. Pablo's solution
didn't exactly fit into the other classes of solution: it's somewhere
between a parser and an eval-uator. He decided to take the infix dice
expression and turn it into a RPN (Reverse Polish Notation, that is,
postfix) expression:

    def to_rpn(infix)
       stack, rpn, last = [], [], nil
       infix.scan(/\d+|[-+*\/()d%]/) do |token|
         case token
           when /\d+/
             rpn << token
           when '%'
             rpn << "100"
           when /[-+*\/d]/
             while stack.any? && stronger(stack.last, token)
               rpn << stack.pop
             end
             rpn << "1" unless last =~ /\d+|\)|%/
             stack << token
           when '('
             stack << token
           when ')'
             while (op = stack.pop) && (op != '(')
               rpn << op
             end
         end
         last = token
       end
       while op = stack.pop
         rpn << op
       end
       rpn
    end

I love to see code that I can understand immediately, and also recognize how
it could be useful for other applications. This is definitely one of those
because postfix expressions (or prefix, to be complete) are a breeze for a
computer to evaluate:

    def roll
       stack = []
       @expr.each do |token|
         case token
           when /\d+/
             stack << token.to_i
           when /[-+*\/d]/
             b = stack.pop
             a = stack.pop
             stack << a.send(token.to_sym, b)
         end
       end
       stack.pop
    end

Thanks to everyone for the great submissions and lively discussion on the
mailing list (and the correction to the quiz that I missed!). I just hope
when we all sit down to play that I don't see any of you with all characters
stats of 18.

I want to thank Matthew Moss for a wonderful problem and a sensational summary! I also happen to know that Matthew already has another quiz in the queue for a week from tomorrow. How cool is that?

Matthew, you rock.

James Edward Gray II

···

On Jan 12, 2006, at 1:04 AM, Matthew Moss wrote:

Thanks to everyone for the great submissions and lively discussion on the
mailing list (and the correction to the quiz that I missed!).

Matthew Moss <matthew.moss.coder@gmail.com> writes:

First off we have the random number generation; most solutions had this or a
very similar variant. So the call 3.d(6) will roll and accumulate three
six-sided dice, and the expression 3.d(6) is almost a valid dice expression.
It is in Christian's initialization method that dice expressions are turned
into Ruby expressions:

    def initialize(dice)
       @src = dice.gsub(/d(%|00)(\D|$)/, 'd100\2').
                   gsub(/d(\d+)/, 'd(\1)').
                   gsub(/(\d+|\))d/, '\1.d').
                   gsub(/\d+/) { $&.gsub(/^0+/, '') }

       @dice = eval "lambda { #@src }"
    end

(I've left out a bit of error checking; see the full solution to see
everything.) The first substitution fixes up percentage and double-zero
values as 100. The second turns 'd' as a binary operator into 'd' as a
method call. The third substitution provides the default dice count of 1
if it wasn't specified. And the last substitution removes leading zeroes of
integers; this last substitution prevents the Ruby from interpreting values
as octal.

The third substitution doesn't provide a default count (Dice#d does that),
instead, it converts 2d(6) to 2.d(6).

The morphed express is saved away in a lambda, which allows Christian to
reevaluate the expression repeatedly without performing those substitutions
every time roll is called.

I could have evaluated @src each time without substituting anew, but a
lambda only makes it need to *parse* once.

Thanks for the nice writeup,

···

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

[...]
} Most solutions fell into one of a few types. I'll cover each a bit and point
} out anything in particular that attracted my attention.
}
} The first class of solutions massaged the input expression a bit, then used
} Ruby's eval method to do the work.
[...]
} The second class of solutions involved using a parsing library or parser
} generator tool.
[...]
} And now we're at the third class of solutions: the handcrafted parsers.
} About nine people handcrafted parsers, mostly recursive descent which is
} rather easy to code.
[...]

I want to go through my handcrafted parser solution, which may have gotten
missed in the shuffle. This is especially likely since I also submitted one
using the eval method. In fact, my handcrafted parser used eval, but only
for the +, -, /, and * operators.

The important bit is that all of the nodes in my syntax tree respond to
to_i with the computed result of its subexpression. This was largely to
make numbers behave the same way that my syntactic elements did, and vice
versa. My parsing method looks like this (with a fail check to make sure
nothing went wrong after each line, mainly for debugging purposes) to
create the syntax tree:

1) token_array = tokenize(expression)
  uses String#scan and checks for garbage
2) token_array = validate_and_cook(token_array)
  replaces some tokens with symbols, deals with dX => 1dX, d00 =>
  d100, d% => d100, makes sure open parens have close parens and
  the operators and operands are in grammatically appropriate places
3) parse_tokens(token_array)
   a) token_array = parse_parens(token_array)
  finds parenthesized expressions and runs the subarray through
  parse_tokens; nested parens are handled recursively. Note that
  there is no need for error checking, since the expression has
  already been validated.
   b) token_array = parse_ops(token_array, /^d$/) if token_array.length > 1
  finds all d operators and creates a DieOperator object with its
  left and right operands. Since all the operators in this grammar
  are left-associative, parse_ops is left-associative. The regex is
  used to determine if the token is the operator(s) being handled.
   c) token_array = parse_ops(token_array, /^[*\/]$/) if token_array.length > 1
     same as b, but for * and / operators. In this case, ArithOperator
  objects are created. The parse_ops method uses an OpsClass hash
  that matches the operator (d, *, /, +, or -) to the appropriate
  class to call new on.
   d) token_array = parse_ops(token_array, /^[-+]$/) if token_array.length > 1
  same as c, but for + and - operators. Still using ArithOperator
  objects.

The ArithOperator class is what uses eval. It is given the operator as a
string, and its to_i method calls eval "#{@left.to_i}#{@op}#{@right.to_i}"
to produce its value.

The points I'm trying to make are:

1) the grammar was so simple that there was no need for a proper
   recursive-descent parser
2) duck typing is way cool and let me treat numbers, DieOperators, and
   ArithOperators identically
3) a little eval let me use one class instead of four for ArithOperator
4) doing all that validation ahead of time let me replace my syntax tree
   with using eval for the whole expression without giving up my nice,
   clear failure messages

} Thanks to everyone for the great submissions and lively discussion on the
} mailing list (and the correction to the quiz that I missed!). I just hope
} when we all sit down to play that I don't see any of you with all characters
} stats of 18.

This one was loads of fun, if not as challenging as the Numeric Maze (which
was my first). It's a pity I'll be out of town this weekend and won't be
able to do the next one.

--Greg

···

On Thu, Jan 12, 2006 at 04:04:58PM +0900, Matthew Moss wrote:

Rather than change the 'd' operator into a method call, he made a slightly
different twist and rolled the dice in method_missing. Perhaps Bill didn't
know that classes could be extended, or maybe he was hacking around for fun.

Yes I could probably dominate the "cheesiest approach" category. :slight_smile:
I did extend a class in my "solution"... String#die_to_meth :stuck_out_tongue:

Thanks for a fun quiz and an excellent summary.

Regards,

Bill

···

From: "Matthew Moss" <matthew.moss.coder@gmail.com>

And I can see I already that I need to read my regexps a little more
carefully. The third regexp in Christian Neukirchen's solution doesn't
provide the default 1 to the 'd' operator; it actually turns the 'd'
operator into a method call on Integer. And the second regexp sets up
parenthesis around the number following the 'd' as arguments for that call.
Silly me....

···

On 1/12/06, James Edward Gray II <james@grayproductions.net> wrote:

On Jan 12, 2006, at 1:04 AM, Matthew Moss wrote:

> Thanks to everyone for the great submissions and lively discussion
> on the
> mailing list (and the correction to the quiz that I missed!).

I want to thank Matthew Moss for a wonderful problem and a
sensational summary! I also happen to know that Matthew already has
another quiz in the queue for a week from tomorrow. How cool is that?

Matthew, you rock.

James Edward Gray II

Ya, you did extend a class.

Man, I must really be tired. Or forgot how to read. Or had a few brain cells
short out. Or something. Or be really tired.

···

On 1/12/06, Bill Kelly <billk@cts.com> wrote:

From: "Matthew Moss" <matthew.moss.coder@gmail.com>
>
> Rather than change the 'd' operator into a method call, he made a
slightly
> different twist and rolled the dice in method_missing. Perhaps Bill
didn't
> know that classes could be extended, or maybe he was hacking around for
fun.

Yes I could probably dominate the "cheesiest approach" category. :slight_smile:
I did extend a class in my "solution"... String#die_to_meth :stuck_out_tongue:

Thanks for a fun quiz and an excellent summary.

Regards,

Bill

Hey Christian,

I didn't have time to write my own solution to quiz #61, so I'm going over yours, which is similar to what I was kicking around in my head. Unfortunately, I confused by your eval statement, which is wrapped over a new Proc object. Instead, why not wrap the lambda around the eval like so?

@dice = lambda { eval @src }

The both _seem_ to produce the same output, yet conceptually, my approach makes more sense to me. Perhaps I'm missing something crucial in my statement like...

Does eval parse @src for each call() message? Or does the parsing happen on initialization of the Proc (like I would expect)? I suppose I could trace the execution using the debugger, but I'm unaware how to step inside a Kernel method like eval or lambda with it.

~ ryan ~

···

On Jan 12, 2006, at 10:22 AM, Christian Neukirchen wrote:

Matthew Moss <matthew.moss.coder@gmail.com> writes:

@dice = eval "lambda { #@src }"

The morphed express is saved away in a lambda, which allows Christian to
reevaluate the expression repeatedly without performing those substitutions
every time roll is called.

I could have evaluated @src each time without substituting anew, but a
lambda only makes it need to *parse* once.

Yes. You pass lambda() a block of code that is executed each time the code it called.

James Edward Gray II

···

On Jan 13, 2006, at 1:57 PM, J. Ryan Sobol wrote:

On Jan 12, 2006, at 10:22 AM, Christian Neukirchen wrote:

Matthew Moss <matthew.moss.coder@gmail.com> writes:

@dice = eval "lambda { #@src }"

The morphed express is saved away in a lambda, which allows Christian to
reevaluate the expression repeatedly without performing those substitutions
every time roll is called.

I could have evaluated @src each time without substituting anew, but a
lambda only makes it need to *parse* once.

Hey Christian,

I didn't have time to write my own solution to quiz #61, so I'm going over yours, which is similar to what I was kicking around in my head. Unfortunately, I confused by your eval statement, which is wrapped over a new Proc object. Instead, why not wrap the lambda around the eval like so?

@dice = lambda { eval @src }

The both _seem_ to produce the same output, yet conceptually, my approach makes more sense to me. Perhaps I'm missing something crucial in my statement like...

Does eval parse @src for each call() message?

@dice.call has eval parse @src everytime (@src may have changed). By doing something like eval "lambda { #{@src} }" you parse @src once and get back a lambda.

···

On Jan 13, 2006, at 2:57 PM, J. Ryan Sobol wrote:

On Jan 12, 2006, at 10:22 AM, Christian Neukirchen wrote:

Matthew Moss <matthew.moss.coder@gmail.com> writes:

@dice = eval "lambda { #@src }"

The morphed express is saved away in a lambda, which allows Christian to
reevaluate the expression repeatedly without performing those substitutions
every time roll is called.

I could have evaluated @src each time without substituting anew, but a
lambda only makes it need to *parse* once.

Hey Christian,

I didn't have time to write my own solution to quiz #61, so I'm going over yours, which is similar to what I was kicking around in my head. Unfortunately, I confused by your eval statement, which is wrapped over a new Proc object. Instead, why not wrap the lambda around the eval like so?

@dice = lambda { eval @src }

The both _seem_ to produce the same output, yet conceptually, my approach makes more sense to me. Perhaps I'm missing something crucial in my statement like...

Does eval parse @src for each call() message? Or does the parsing happen on initialization of the Proc (like I would expect)? I suppose I could trace the execution using the debugger, but I'm unaware how to step inside a Kernel method like eval or lambda with it.

~ ryan ~

Executed yes. But when is it parsed / checked for valid grammar, syntax, etc.? Did you mean to say that the block of code is parsed AND executed for each call() it receives?

~ ryan ~

···

On Jan 13, 2006, at 3:03 PM, James Edward Gray II wrote:

Does eval parse @src for each call() message?

Yes. You pass lambda() a block of code that is executed each time the code it called.

lambda() doesn't imply that, but eval() does. It must be reparsed with each call.

James Edward Gray II

···

On Jan 13, 2006, at 2:15 PM, J. Ryan Sobol wrote:

On Jan 13, 2006, at 3:03 PM, James Edward Gray II wrote:

Does eval parse @src for each call() message?

Yes. You pass lambda() a block of code that is executed each time the code it called.

Executed yes. But when is it parsed / checked for valid grammar, syntax, etc.? Did you mean to say that the block of code is parsed AND executed for each call() it receives?

obj = lambda { eval "#{2**10}" }

(1..1_000_000).each do
   obj.call
end

$ time ruby test.rb

real 0m26.046s
user 0m24.377s
sys 0m0.358s

···

On Jan 13, 2006, at 3:15 PM, J. Ryan Sobol wrote:

On Jan 13, 2006, at 3:03 PM, James Edward Gray II wrote:

Does eval parse @src for each call() message?

Yes. You pass lambda() a block of code that is executed each time the code it called.

Executed yes. But when is it parsed / checked for valid grammar, syntax, etc.? Did you mean to say that the block of code is parsed AND executed for each call() it receives?

~ ryan ~

-------------------------------------------

obj = eval "lambda { #{2**10} }"

(1..1_000_000).each do
   obj.call
end

$ time ruby test.rb

real 0m3.132s
user 0m2.895s
sys 0m0.047s

What a difference! I'll agree that parsing of eval's string happens on every call() message in the first example. :slight_smile:

~ ryan ~