[NON-SOLUTION] Learning Tic-Tac-Toe (#11)

Ok, I hope I don't become infamous for non-solutions, but I like to share what didn't work because that can be almost as instructive as what did work.

As TTT is a trivial and largely uninteresting game (as has been discussed), I was more interested in the journey than the result. I decided to brush up on my Q-learning, so I coded up a Q-learner AI. I then pitted the Q-learner against itself and let it go for several hours. (I didn't have the time to observe it closely, or the inclination to implement some reporting to make pretty graphs or whatnot.) I expected that when I came back it would be drawing with itself most of the time, but it was not. The two sides seemed to be playing more or less decent games, each winning roughly half the time, but they would very rarely draw. Perhaps they were taking more risks than they ought to have, or perhaps I had set my constant of randomness on their decisions too high.
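For anyone curious what that looks like, here is a minimal sketch of a tabular Q-learner of the kind described above, in Ruby. This is not Hans's code; the 9-character board encoding and the ALPHA/GAMMA/EPSILON values are illustrative assumptions.

```ruby
ALPHA   = 0.1   # learning rate
GAMMA   = 0.9   # discount factor
EPSILON = 0.1   # the "constant of randomness": chance of a random move

# Q[state][action] => estimated value, defaulting to 0.0
Q = Hash.new { |h, k| h[k] = Hash.new(0.0) }

def legal_moves(state)
  (0..8).select { |i| state[i] == '-' }
end

def choose_move(state)
  moves = legal_moves(state)
  return moves.sample if rand < EPSILON   # explore
  moves.max_by { |m| Q[state][m] }        # exploit
end

# Standard one-step Q-learning update.
def update(state, move, reward, next_state)
  best_next = legal_moves(next_state).map { |m| Q[next_state][m] }.max || 0.0
  Q[state][move] += ALPHA * (reward + GAMMA * best_next - Q[state][move])
end
```

Setting EPSILON too high keeps the players gambling forever, which would produce exactly the win/loss-heavy, draw-poor behavior described.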

Of course it's also entirely possible I have an implementation bug. :)

The other possibility I'm afraid of is a theoretical bug. I'm not entirely sure Q-learning is appropriate for this solution the way I implemented it. My concern is that, as I implemented it, it might not be a true MDP, because the state transition is not deterministic: the reward/next-state pair is taken from when it's our turn again, and what the other player does in between is nondeterministic.
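For what it's worth, Q-learning is defined for stochastic MDPs, so a nondeterministic reward/next-state pair is not by itself a problem; the usual convergence argument only needs the opponent's policy to be stationary, and in self-play both players keep changing, which is the real wrinkle. The transition described — folding the opponent's reply into the environment — can be sketched like this (illustrative board encoding, win detection omitted for brevity):

```ruby
def apply_move(state, idx, mark)
  s = state.dup
  s[idx] = mark
  s
end

# One learner transition: our move, then the opponent's (sampled) reply.
# The next state the learner sees is the board when its turn comes again,
# so from the learner's point of view the transition is stochastic.
def transition(state, my_move, my_mark, opp_mark)
  mid = apply_move(state, my_move, my_mark)
  opp_moves = (0..8).select { |i| mid[i] == '-' }
  return [mid, :terminal] if opp_moves.empty?
  [apply_move(mid, opp_moves.sample, opp_mark), :ok]
end
```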

I left the laptop at home and turned off, but I'd be happy to send the code later if anyone's interested in seeing it. :)

I too ran into roadblocks. I tried to recreate Donald Michie's famous "matchbox machine", MENACE, in Ruby. Unfortunately, I tripped myself up with my own cleverness.

I was optimizing the number of positions the machine had to consider (see my earlier message about rotation and mirroring). My implementation was flawed, though: it failed to shift the weights for the possible moves accordingly, which led my AI to try bogus moves.
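That bookkeeping is easy to get wrong in exactly this way: a move chosen against the canonical (rotated/mirrored) board has to be mapped back through the same permutation before it is played. A sketch in Ruby, assuming a 9-character string board since the original code isn't shown:

```ruby
ROT = [6, 3, 0, 7, 4, 1, 8, 5, 2]  # 90-degree clockwise rotation
MIR = [2, 1, 0, 5, 4, 3, 8, 7, 6]  # left-right mirror

# permuted[n] = board[perm[n]]
def permute(board, perm)
  perm.map { |i| board[i] }.join
end

# All 8 symmetries of the square, each as an index permutation.
SYMS = begin
  rots = [(0..8).to_a]
  3.times { rots << ROT.map { |i| rots.last[i] } }
  rots + rots.map { |p| MIR.map { |i| p[i] } }
end

# Canonical form: the lexicographically smallest image of the board,
# remembering which permutation produced it.
def canonical(board)
  SYMS.map { |perm| [permute(board, perm), perm] }.min_by(&:first)
end

# The step that is easy to forget: a move at index m on the canonical
# board sits at index perm[m] on the real board.
def real_move(canonical_move, perm)
  perm[canonical_move]
end
```

Skipping the `real_move` mapping means a move that is legal on the canonical board can land on an occupied (bogus) square of the real one.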

There's also a more subtle flaw in my matchbox/bead logic, affecting the machine's ability to learn from its mistakes. I was in the process of tracking that down when I began running into the above problem.
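The bead bookkeeping is simple enough that a sketch may help readers who haven't met MENACE. This is a guess at the general scheme — a box of beads per position, a bead-weighted random draw, beads added or removed after the game — not James's code, and the reinforcement amounts are illustrative rather than Michie's originals:

```ruby
class Matchbox
  attr_reader :beads

  def initialize(moves, initial = 4)
    @beads = moves.each_with_object({}) { |m, h| h[m] = initial }
  end

  # Weighted random draw: every bead is one "ticket".
  def pick
    ticket = rand(@beads.values.sum)
    @beads.each do |move, count|
      return move if ticket < count
      ticket -= count
    end
  end

  # Add beads after a win, remove after a loss.
  def reinforce(move, delta)
    @beads[move] = [@beads[move] + delta, 1].max
  end
end
```

The floor of one bead is just a simple safeguard against emptying a box; it is not necessarily how Michie's machine handled that case.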

Both of these problems are fixable, of course, but I ran out of time this weekend. In the end, I blame myself: I overcomplicated it and could have made things easier on myself. Live and learn, as the saying goes.

James Edward Gray II

···

On Dec 13, 2004, at 12:07 PM, Hans Fugal wrote:

> Ok, I hope I don't become infamous for non-solutions, but I like to share what didn't work because that can be almost as instructive as what did work.