HOME

TheInfoList



OR:

The Matchbox Educable Noughts and Crosses Engine (sometimes called the Machine Educable Noughts and Crosses Engine or MENACE) was a
mechanical computer A mechanical computer is a computer built from mechanical components such as levers and gears rather than electronic components. The most common examples are adding machines and mechanical counters, which use the turning of gears to increment outp ...
made from 304
matchbox Phillumeny (also known as phillumenism) is the hobby of collecting different match-related items: matchboxes, matchbox labels, matchbooks, matchcovers, matchsafes, etc. Matchbox A matchbox is a box made of cardboard or thin wood and designe ...
es designed and built by
artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech re ...
researcher
Donald Michie Donald Michie (; 11 November 1923 – 7 July 2007) was a British researcher in artificial intelligence. During World War II, Michie worked for the Government Code and Cypher School at Bletchley Park, contributing to the effort to solve " Tunny ...
in 1961. It was designed to play human opponents in games of
noughts and crosses Tic-tac-toe (American English), noughts and crosses (Commonwealth English), or Xs and Os (Canadian or Irish English) is a paper-and-pencil game for two players who take turns marking the spaces in a three-by-three grid with ''X'' or ''O''. T ...
(tic-tac-toe) by returning a move for any given state of play and to refine its strategy through
reinforcement learning Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine ...
. Michie did not have a computer readily available, so he worked around this restriction by building it out of matchboxes. The matchboxes used by Michie each represented a single possible layout of a noughts and crosses grid. When the computer first played, it would randomly choose moves based on the current layout. As it played more games, through a reinforcement loop, it disqualified strategies that led to losing games, and supplemented strategies that led to winning games. Michie held a tournament against MENACE in 1961, wherein he experimented with different openings. Following MENACE's maiden tournament against Michie, it demonstrated successful artificial intelligence in its strategy. Michie's essays on MENACE's weight initialisation and the BOXES algorithm used by MENACE became popular in the field of computer science research. Michie was honoured for his contribution to machine learning research, and was twice commissioned to program a MENACE simulation on an actual computer.


Origin

Donald Michie Donald Michie (; 11 November 1923 – 7 July 2007) was a British researcher in artificial intelligence. During World War II, Michie worked for the Government Code and Cypher School at Bletchley Park, contributing to the effort to solve " Tunny ...
(1923–2007) had been on the team decrypting the German Tunny Code during
World War II World War II or the Second World War, often abbreviated as WWII or WW2, was a world war that lasted from 1939 to 1945. It involved the vast majority of the world's countries—including all of the great powers—forming two opposin ...
. Fifteen years later, he wanted to further display his mathematical and computational prowess with an early
convolutional neural network In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network (ANN), most commonly applied to analyze visual imagery. CNNs are also known as Shift Invariant or Space Invariant Artificial Neural Netwo ...
. Since computer equipment was not obtainable for such uses, and Michie did not have a computer readily available, he decided to display and demonstrate artificial intelligence in a more esoteric format and constructed a functional
mechanical computer A mechanical computer is a computer built from mechanical components such as levers and gears rather than electronic components. The most common examples are adding machines and mechanical counters, which use the turning of gears to increment outp ...
out of matchboxes and beads. MENACE was constructed as the result of a
bet Black Entertainment Television (acronym BET) is an American basic cable channel targeting African-American audiences. It is owned by the CBS Entertainment Group unit of Paramount Global via BET Networks and has offices in New York City, Los A ...
with a computer science colleague who postulated that such a machine was impossible. Michie undertook the task of collecting and defining each matchbox as a 'fun project', later turned into a demonstration tool. Michie completed his essay on MENACE in 1963, "Experiments on the mechanization of game-learning", as well as his essay on the BOXES Algorithm, written with R. A. Chambers and had built up an AI research unit in Hope Park Square,
Edinburgh Edinburgh ( ; gd, Dùn Èideann ) is the capital city of Scotland and one of its 32 Council areas of Scotland, council areas. Historically part of the county of Midlothian (interchangeably Edinburghshire before 1921), it is located in Lothian ...
,
Scotland Scotland (, ) is a country that is part of the United Kingdom. Covering the northern third of the island of Great Britain, mainland Scotland has a border with England to the southeast and is otherwise surrounded by the Atlantic Ocean to the ...
. MENACE "learned" by playing increasing matches of noughts and crosses. Each time, it would eliminate a losing strategy by the human player confiscating the beads that corresponded to each move. It reinforced winning strategies by making the moves more likely, by supplying extra beads. This was one of the earliest versions of the Reinforcement Loop, the schematic algorithm of looping the algorithm, dropping unsuccessful strategies until only the winning ones remain. This model starts as completely random, and gradually learns.


Composition

MENACE was made from 304 matchboxes glued together in an arrangement similar to a chest of drawers.The Science Book, Second Edition, Dorling Kindersley Ltd., 2015, pg. 288 Each box had a code number, which was keyed into a chart. This chart had drawings of
tic-tac-toe Tic-tac-toe (American English), noughts and crosses (Commonwealth English), or Xs and Os (Canadian or Irish English) is a paper-and-pencil game for two players who take turns marking the spaces in a three-by-three grid with ''X'' or ''O''. T ...
game grids with various configurations of X's, O's and empty squares, corresponding to all possible permutations a game could go through as it progressed. After removing duplicate arrangements (ones that were simply rotations or mirror images of other configurations), MENACE used 304 permutations in its chart and thus that many matchboxes. Each individual matchbox tray contained a collection of coloured beads. Each colour represented a move on a square on the game grid, and so matchboxes with arrangements where positions on the grid were already taken would not have beads for that position. Additionally, at the front of the tray were two extra pieces of card in a "V" shape, the point of the "V" pointing at the front of the matchbox. Michie and his artificial intelligence team called MENACE's algorithm "Boxes", after the apparatus used for the machine. The first stage "Boxes" operated in five phases, each setting a definition and a precedent for the rules of the
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
in relation to the game.


Operation

MENACE played first, as O, since all matchboxes represented permutations only relevant to the "X" player. To retrieve MENACE's choice of move, the opponent or operator located the matchbox that matched the current game state, or a rotation or mirror image of it. For example, at the start of a game, this would be the matchbox for an empty grid. The tray would be removed and lightly shaken so as to move the beads around. Then, the bead that had rolled into the point of the "V" shape at the front of the tray was the move MENACE had chosen to make. Its colour was then used as the position to play on, and, after accounting for any rotations or flips needed based on the chosen matchbox configuration's relation to the current grid, the O would be placed on that square. Then the player performed their move, the new state was located, a new move selected, and so on, until the game was finished. When the game had finished, the human player observed the game's outcome. As a game was played, each matchbox that was used for MENACE's turn had its tray returned to it ajar, and the bead used kept aside, so that MENACE's choice of moves and the game states they belonged to were recorded. Michie described his reinforcement system with "reward" and "punishment". Once the game was finished, if MENACE had won, it would then receive a "reward" for its victory. The removed beads showed the sequence of the winning moves. These were returned to their respective trays, easily identifiable since they were slightly open, as well as three bonus beads of the same colour. In this way, in future games MENACE would become more likely to repeat those winning moves, reinforcing winning strategies. If it lost, the removed beads were not returned, "punishing" MENACE, and meaning that in future it would be less likely, and eventually incapable if that colour of bead became absent, to repeat the moves that cause a loss. If the game was a draw, one additional bead was added to each box.


Results in practice


Optimal strategy

Noughts and crosses has a well-known optimal strategy. A player must place their symbol in a way that blocks the other player from achieving any rows while simultaneously making a row themself. However, if both players use this strategy, the game always ends in a draw. If the human player is familiar with the optimal strategy, and MENACE can quickly learn it, then the games will eventually only end in draws. The likelihood of the computer winning increases quickly when the computer plays against a random-playing opponent. When playing against a player using optimal strategy, the odds of a draw grow to 100%. In Donald Michie's official tournament against MENACE in 1961 he used optimal strategy, and he and the computer began to draw consistently after twenty games. Michie's tournamentTrial and Error, Michie Donald, Penguin Science Surveys 1961 Vol 2 had the following milestones: Michie began by consistently opening with "Variant 0", the middle square. At 15 games, MENACE abandoned all non-corner openings. At just over 20, Michie switched to consistently using "Variant 1", the bottom-right square. At 60, he returned to Variant 0. As he neared 80 games, he moved to "Variant 2", the top-middle. At 110, he switched to "Variant 3", the top right. At 135, he switched to "Variant 4", middle-right. At 190, he returned to Variant 1, and at 210, he returned to Variant 0. The trend in changes of beads in the "2" boxes runs:


Correlation

Depending on the strategy employed by the human player, MENACE produces a different trend on scatter graphs of wins. Using a random turn from the human player results in an almost-perfect positive trend. Playing the optimal strategy returns a slightly slower increase. The reinforcement does not create a perfect standard of wins; the algorithm will draw random uncertain conclusions each time. After the ''j''-th round, the correlation of near-perfect play runs: \sum_^j D^ V_i Where ''Vi'' is the outcome (+1 is win, 0 is draw and -1 is loss) and ''D'' is the decay factor (average of past values of wins and losses). Below, ''Mn'' is the multiplier for the ''n''-th round of the game.


Legacy

Donald Michie's MENACE proved that a computer could "learn" from failure and success to become good at a task. It used what would become core principles within the field of machine learning before they had been properly theorised. For example, the combination of how MENACE starts with equal numbers of types of beads in each matchbox, and how these are then selected at random, creates a learning behaviour similar to weight initialisation in modern
artificial neural networks Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected unit ...
. In 1968, Donald Michie and R.A Chambers made another "BOXES"-based algorithm called GLEE, (Game Learning Expectimaxing Engine) which had to learn how to balance a pole on a cart. After the resounding reception of MENACE, Michie was invited to the US Office of Naval Research, where he was commissioned to build a "Boxes"-running program for an IBM Computer for use at
Stanford University Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California. The campus occupies , among the largest in the United States, and enrolls over 17,000 students. Stanford is consider ...
. Michie created a simulation program of MENACE on a
Pegasus Pegasus ( grc-gre, Πήγασος, Pḗgasos; la, Pegasus, Pegasos) is one of the best known creatures in Greek mythology. He is a winged divine stallion usually depicted as pure white in color. He was sired by Poseidon, in his role as hor ...
2 computer with the aid of D. Martin. There have been multiple recreations of MENACE in more recent years, both in its original physical form and as a computer program. Its algorithm was later converged into Christopher Watkin's Q-Learning algorithm. Although not as a functional computer, in examples of demonstration, MENACE has been used as a teaching aid for various neural network classes, including a public demonstration from University College London Researcher Matthew Scroggs. A copy of MENACE built by Scroggs was featured in the 2019
Royal Institution Christmas Lectures The Royal Institution Christmas Lectures are a series of lectures on a single topic each, which have been held at the Royal Institution in London each year since 1825, missing 1939–1942 because of the Second World War. The lectures present sci ...
, and in a 2023 episode of QI XL.


See also

*
Hexapawn Hexapawn is a deterministic two-player game invented by Martin Gardner. It is played on a rectangular board of variable size, for example on a 3×3 board or on a regular chessboard. On a board of size ''n''×''m'', each player begins with ''m' ...


References


Sources

* , Michie and R. A Chambers' paper on the AI implications of BOXES and MENACE. * {{citation , last=Russell , first=David W., year=2012 , title=The BOXES Methodology: Black Box Dynamic Control , publisher=Springer London , isbn=978-1849965286, a book on the "Boxes" algorithm employed by MENACE.


External links


Online simulation of MENACE
Machine learning Mechanical computers