### Search

 Search DelphiForFun.org only

### Support DFF

If you shop at Amazon anyway,  consider using this link. We receive a few cents from each purchase.   Thanks.

### Support DFF

If you benefit from the website,  in terms of knowledge, entertainment value, or something otherwise useful, consider making a donation via PayPal  to help defray the costs.  (No PayPal account necessary to donate via credit card.)  Transaction is secure.

### Contact

 Search DelphiForFun.org only

### Problem Description

Here is a program that can be taught to play tic-tac-toe.  It learns from it's mistakes and successes to improve it's play over time.  This is the virtual world  version of  a game that uses 304 match boxes representing the possible unique board positions for the 1st player.  Each match box has a diagram of the board that it represents, and colored beads. Each bead color uniquely identifies an available cell on the board.   For the  machine  move, a  bead is selected randomly from  the  box representing the current board configuration (one color for each available space on the board) and an X placed in that square.  The machine always plays first and uses X's.   The boxes used remain open and the bead selected remains on top of the box,  At the end of the game, if X wins, it is rewarded by adding beads of the selected colors to each open box.   If it loses, the machine is punished by removing the beads that led to the defeat.

The program learns by playing against human or computer opponents.

### Background & Techniques

This is another  program based on an article in Martin Gardner's The Colossal Book of Mathematics.   He describes the Naughts and Crosses machine designed by Donald Michie in 1961.   Dr.  Michie was a professor at the University of Edinburgh, where tic-tac-toe is known as Naughts and Crosses.   (Actually all Brits know tic-tac-toe as Naughts and Crosses.)    He called his machine MENACE (Matchbox Educable Naughts And Crosses Engine).   Gardner devotes a full chapter to the machine, so I'll just present an outline  here.  Michie placed 4 beads for each of the 9 colors representing the 9 available cells in the initial move boxes, 3 beads for each of the 7 available cells for the machine's 2nd move,  2 beads for each of the 5 available cells for the machine's 3rd moves,  1 bead for each of the 3 available cells for the machine's 4th move.  A 5th move might be required to fill in the last available cell, but we didn't need matchboxes and beads for that case since there is only one choice.  When the machine won, each open box was rewarded with 3 beads of the color on top of the box.  For draws, the reward was one bead; and for losses the bead on top of the open boxes was confiscated.

Michie reports that he trained his machine in 220 games, being beaten by the machine in 8 of the last 10 games.  He probably  played intelligent games to train it that rapidly.  My computerized version   takes 1000 or 2000 games to learn how to play well when playing against a computerized opponent that always plays randomly in any open cell.   Or maybe there's just a bug in the program that makes my machine a slow learner.

#### Notes for Programmers

The main problems solved were:

 Selecting a representation of  board configurations.  My original design used 2 bits to represent the state of each of the 9 board positions forming an 18 bit integer.  I decided this was overkill, and made debugging difficult so I switch to using a 9 character string made up of X, O, and - characters.  Much simpler and fast enough.   A TWeight record was defined to contain information about the beads associated with each box.  The boxes are kept a Stringlist, Positions,  and the Objects property holds pointers to the TWeight records. Generating the virtual matchboxes - Gardner says that the Michie machine used 300 matchboxes.  It took a while but I finally got down to 304 boxes.   There are 2201 non-winning positions presented to the first player in the first four moves.   The transforms described below reduce this to 304 positions.  Note that the 2201 positions would work just fine in the computerized version, probably faster than using the transform procedure.  I wanted to get down to 300 boxes "just for fun".

Procedure GenNext generates a list of positions.  The transform version takes several seconds, so the resulting list is saved to a file named Positions.str and only recreated when the file does not exist.

 The 2201 true board positions are transformed into the 304 virtual matchboxes by rotating and/or flipping the board.   I ended up with 7 transforms, (8 counting the identity transform):  rotate right 90 degrees rotate left 90 degrees rotate 180 degrees mirror (vertical flip, exchange 1st and 3rd rows) mirror +  left 90 mirror + right 90 mirror + 180 Each transform is defined by a 9 digit array with the new index positions for each of the input cells.  This happens in procedure Transform.   An inverse transform procedure, InvTransform,  exists for putting transformed boards back in their original configuration after a move has been made.  All of the transforms are symmetric except for rotate-left and rotate-right.  For the other 5, performing the transform a second time get us back to the original configuration. Selecting a move based on the relative frequency of the bead colors in a box is a little tricky.     It's probably easiest to explain the procedure with an example:  Assume we have 1 red bead, 2 blue beads and 3 green beads to choose from, represented by the array [1,2,3].  We want to simulate putting them all in a hat and drawing one randomly.   I do it by summing the number of beads (6), selecting a random number in the range 1 to total beads, say 4, and running through the array of bead counts accumulating partial sums, until the partial sum is equals or exceeds our random number.   In this case partial sums are 1, 3 and 6 so 6 is the first partial sum exceeding our random number so we select the third entry and place our X in the position represented by the green bead.

At startup time we create a 3X3 array of Tedit components to represent the playing board.  An OnClick exit records opponent moves.  The machine moves by using function FindTransform procedure to locate the Position list entry that corresponds to the current board configuration.   The Objects property in the list contains a pointer to a TWeightRec record that has the bead counts we'll use to make the move.

Quite a bit more to explore here if you are interested.

Addendum November 23, 2004:   As a viewer recently reported, on occasion. the beads remaining in a box seem to get out of sync with the available open cells.  This could result in the machine placing its X in an occupied cell.  I haven't found the cause of the problem yet but did post a new version today that  bypasses the erroneous result.   The problem was only detectable during manual play when the machine would occasionally overwrite your blocking "O" with his "X" and claim a win anyway.