The TicTacToe Machine

[Home] [Puzzles & Projects] [Delphi Techniques] [Math topics] [Library] [Utilities]

Search

Search WWW

Search DelphiForFun.org

As of October, 2016, Embarcadero is offering a free release of Delphi (Delphi 10.1 Berlin Starter Edition ). There are a few restrictions, but it is a welcome step toward making more programmers aware of the joys of Delphi. They do say "Offer may be withdrawn at any time", so don't delay if you want to check it out. Please use the feedback link to let me know if the link stops working.

Support DFF - Shop

If you shop at Amazon anyway, consider using this link.

We receive a few cents from each purchase. Thanks

Support DFF - Donate

If you benefit from the website, in terms of knowledge, entertainment value, or something otherwise useful, consider making a donation via PayPal to help defray the costs. (No PayPal account necessary to donate via credit card.) Transaction is secure.

Mensa^® Daily Puzzlers

For over 15 years Mensa Page-A-Day calendars have provided several puzzles a year for my programming pleasure. Coding "solvers" is most fun, but many programs also allow user solving, convenient for "fill in the blanks" type. Below are Amazon links to the two most recent years.

Mensa^® 365 Puzzlers Calendar 2017

Mensa ^® 365 Puzzlers Calendar 2018

(Hint: If you can wait, current year calendars are usually on sale in January.)

Contact

Feedback: Send an e-mail with your comments about this program (or anything else).

Problem Description

Here is a program that can be taught to play tic-tac-toe. It learns from it's mistakes and successes to improve it's play over time. This is the virtual world version of a game that uses 304 match boxes representing the possible unique board positions for the 1st player. Each match box has a diagram of the board that it represents, and colored beads. Each bead color uniquely identifies an available cell on the board. For the machine move, a bead is selected randomly from the box representing the current board configuration (one color for each available space on the board) and an X placed in that square. The machine always plays first and uses X's. The boxes used remain open and the bead selected remains on top of the box, At the end of the game, if X wins, it is rewarded by adding beads of the selected colors to each open box. If it loses, the machine is punished by removing the beads that led to the defeat.

The program learns by playing against human or computer opponents.

Background & Techniques

This is another program based on an article in Martin Gardner's The Colossal Book of Mathematics. He describes the Naughts and Crosses machine designed by Donald Michie in 1961. Dr. Michie was a professor at the University of Edinburgh, where tic-tac-toe is known as Naughts and Crosses. (Actually all Brits know tic-tac-toe as Naughts and Crosses.) He called his machine MENACE (Matchbox Educable Naughts And Crosses Engine). Gardner devotes a full chapter to the machine, so I'll just present an outline here. Michie placed 4 beads for each of the 9 colors representing the 9 available cells in the initial move boxes, 3 beads for each of the 7 available cells for the machine's 2nd move, 2 beads for each of the 5 available cells for the machine's 3rd moves, 1 bead for each of the 3 available cells for the machine's 4th move. A 5th move might be required to fill in the last available cell, but we didn't need matchboxes and beads for that case since there is only one choice. When the machine won, each open box was rewarded with 3 beads of the color on top of the box. For draws, the reward was one bead; and for losses the bead on top of the open boxes was confiscated.

Michie reports that he trained his machine in 220 games, being beaten by the machine in 8 of the last 10 games. He probably played intelligent games to train it that rapidly. My computerized version takes 1000 or 2000 games to learn how to play well when playing against a computerized opponent that always plays randomly in any open cell. Or maybe there's just a bug in the program that makes my machine a slow learner.

Non-programmers can jump to the bottom of the page to download the executable version of the machine.

Notes for Programmers

The main problems solved were:

Selecting a representation of board configurations. My original design used 2 bits to represent the state of each of the 9 board positions forming an 18 bit integer. I decided this was overkill, and made debugging difficult so I switch to using a 9 character string made up of X, O, and - characters. Much simpler and fast enough. A TWeight record was defined to contain information about the beads associated with each box. The boxes are kept a Stringlist, Positions, and the Objects property holds pointers to the TWeight records.

Generating the virtual matchboxes - Gardner says that the Michie machine used 300 matchboxes. It took a while but I finally got down to 304 boxes. There are 2201 non-winning positions presented to the first player in the first four moves. The transforms described below reduce this to 304 positions. Note that the 2201 positions would work just fine in the computerized version, probably faster than using the transform procedure. I wanted to get down to 300 boxes "just for fun".

Procedure GenNext generates a list of positions. The transform version takes several seconds, so the resulting list is saved to a file named Positions.str and only recreated when the file does not exist.

The 2201 true board positions are transformed into the 304 virtual matchboxes by rotating and/or flipping the board. I ended up with 7 transforms, (8 counting the identity transform):

rotate right 90 degrees
rotate left 90 degrees
rotate 180 degrees
mirror (vertical flip, exchange 1st and 3rd rows)
mirror + left 90
mirror + right 90
mirror + 180

Each transform is defined by a 9 digit array with the new index positions for each of the input cells. This happens in procedure Transform. An inverse transform procedure, InvTransform, exists for putting transformed boards back in their original configuration after a move has been made. All of the transforms are symmetric except for rotate-left and rotate-right. For the other 5, performing the transform a second time get us back to the original configuration.

Selecting a move based on the relative frequency of the bead colors in a box is a little tricky. It's probably easiest to explain the procedure with an example: Assume we have 1 red bead, 2 blue beads and 3 green beads to choose from, represented by the array [1,2,3]. We want to simulate putting them all in a hat and drawing one randomly. I do it by summing the number of beads (6), selecting a random number in the range 1 to total beads, say 4, and running through the array of bead counts accumulating partial sums, until the partial sum is equals or exceeds our random number. In this case partial sums are 1, 3 and 6 so 6 is the first partial sum exceeding our random number so we select the third entry and place our X in the position represented by the green bead.

At startup time we create a 3X3 array of Tedit components to represent the playing board. An OnClick exit records opponent moves. The machine moves by using function FindTransform procedure to locate the Position list entry that corresponds to the current board configuration. The Objects property in the list contains a pointer to a TWeightRec record that has the bead counts we'll use to make the move.

Quite a bit more to explore here if you are interested.

Addendum November 23, 2004: As a viewer recently reported, on occasion. the beads remaining in a box seem to get out of sync with the available open cells. This could result in the machine placing its X in an occupied cell. I haven't found the cause of the problem yet but did post a new version today that bypasses the erroneous result. The problem was only detectable during manual play when the machine would occasionally overwrite your blocking "O" with his "X" and claim a win anyway.

January 24, 2017: Wow! Here we are 12 years later, still getting requests and getting the chance to work on this interesting project again. This month's effort was triggered by a request from an engineer with the dream building an automated MENACE machine. The original program ran OK, but didn't do a very good job of allowing closer examination.

Version 2.0 posted today, allows selection of Opponent strategies: 1) Random - the original response strategy. 2) Smart Random - handle a couple of tricky board configuration on moves 2 and 4, then choose the center or an available corner. 3) Block "X" winning cell first, then random. 4) Complete our ("O") wins if available then random. 5) (Most human-like?) Apply strategies 4, 3, 2, 1 in that order. The most sophisticated strategy now wins or ties every game. I implemented the 5th strategy after messing up and allowing the computer to win 8 out of 50 manual games. (Perfect play will always result in a draw.) It takes about 500 games with strategy 5 to get the "draw" percentage up over 80%.

Also, the AutoPlay button and Manual play now display information about the last games played (up to 100). Clicking an entry in the list will animate a playback of that game.   The Pattern display now shows the initial weights (# of beads in each cell of the board,) as well as the current weights.

Running/Exploring the Program

	Download source
	Download executable

Suggestions for Further Explorations

	Done January 2017. Michie was not playing randomly, but as an expert. ~~Would a smarter opponent train the program faster? It bothers me that this program is such a slow learner compared to Michie's version.~~
	Done January, 2017: Accomplished by providing pattern number for each game. Verbose mode will display the information for each move within each game. ~~It would be nice to highlight the list entry used to make each move.~~
	Done January 2017: "Playkeys" now provide the move sequence for each game (E.g. Playkey 7136958 says that "X"s were played in cells 7,3,9, and 8 in moves 1,3,5, and 7. "O";s were placed in cells 1, 6, and 5 in moves 2, 4, and 6.: The list of board positions (matchboxes) must be sorted in order to efficiently find our current position. It would more pleasing if the level numbers were pre-pended to the key data when the list is built and when doing look-ups so that entries appeared in move number sequence.

Original: June 10, 2002

Modified: May 15, 2018

[Feedback] [Newsletters (subscribe/view)] [About me]

Copyright © 2000-2018, Gary Darby All rights reserved.