TD-Gammon
   HOME

TheInfoList



OR:

TD-Gammon is a
computer A computer is a machine that can be programmed to Execution (computing), carry out sequences of arithmetic or logical operations (computation) automatically. Modern digital electronic computers can perform generic sets of operations known as C ...
backgammon Backgammon is a two-player board game played with counters and dice on tables boards. It is the most widespread Western member of the large family of tables games, whose ancestors date back nearly 5,000 years to the regions of Mesopotamia and Pe ...
program developed in 1992 by Gerald Tesauro at IBM's
Thomas J. Watson Research Center The Thomas J. Watson Research Center is the headquarters for IBM Research. The center comprises three sites, with its main laboratory in Yorktown Heights, New York, U.S., 38 miles (61 km) north of New York City, Albany, New York and with ...
. Its name comes from the fact that it is an
artificial neural net Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected unit ...
trained by a form of
temporal-difference learning Temporal difference (TD) learning refers to a class of Model-free (reinforcement learning), model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the envi ...
, specifically TD-Lambda. TD-Gammon achieved a level of play just slightly below that of the top human backgammon players of the time. It explored strategies that humans had not pursued and led to advances in the theory of correct backgammon play.


Algorithm for play and learning

During play, TD-Gammon examines on each turn all possible legal moves and all their possible responses (two-
ply Ply, Pli, Plies or Plying may refer to: Common uses * Ply (layer), typically of paper or wood ** Plywood, made of layers of wood ** Tire ply, a layer of cords embedded in the rubber of a tire Places * Plymouth railway station, England, station ...
lookahead), feeds each resulting board position into its
evaluation function An evaluation function, also known as a heuristic evaluation function or static evaluation function, is a function used by game-playing computer programs to estimate the value or goodness of a position (usually at a leaf or terminal node) in a g ...
, and chooses the move that leads to the board position that got the highest score. In this respect, TD-Gammon is no different than almost any other computer board-game program. TD-Gammon's innovation was in how it learned its evaluation function. TD-Gammon's learning algorithm consists of updating the weights in its neural net after each turn to reduce the difference between its evaluation of previous turns' board positions and its evaluation of the present turn's board position—hence "
temporal-difference learning Temporal difference (TD) learning refers to a class of Model-free (reinforcement learning), model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the envi ...
". The score of any board position is a set of four numbers reflecting the program's estimate of the likelihood of each possible game result: White wins normally, Black wins normally, White wins a gammon, Black wins a gammon. For the final board position of the game, the algorithm compares with the actual result of the game rather than its own evaluation of the board position. After each turn, the learning algorithm updates each weight in the neural net according to the following rule: : w_ - w_t = \alpha(Y_ - Y_t)\sum_^\lambda^ \nabla_w Y_k where: :


Experiments and stages of training

Unlike previous neural-net backgammon programs such as
Neurogammon Neurogammon is a computer backgammon program written by Gerald Tesauro at IBM's Thomas J. Watson Research Center. It was the first viable computer backgammon program implemented as a neural net, and set a new standard in computer backgammon play. ...
(also written by Tesauro), where an expert trained the program by supplying the "correct" evaluation of each position, TD-Gammon was at first programmed "knowledge-free". In early experimentation, using only a raw board encoding with no human-designed features, TD-Gammon reached a level of play comparable to Neurogammon: that of an intermediate-level human backgammon player. Even though TD-Gammon discovered insightful features on its own, Tesauro wondered if its play could be improved by using hand-designed features like Neurogammon's. Indeed, the self-training TD-Gammon with expert-designed features soon surpassed all previous computer backgammon programs. It stopped improving after about 1,500,000 games (self-play) using 80 hidden units.


Advances in backgammon theory

TD-Gammon's exclusive training through self-play (rather than tutelage) enabled it to explore strategies that humans previously had not considered or had ruled out erroneously. Its success with unorthodox strategies had a significant impact on the backgammon community. For example, on the opening play, the conventional wisdom was that given a roll of 2-1, 4-1, or 5-1, White should move a single checker from point 6 to point 5. Known as "slotting", this technique trades the risk of a hit for the opportunity to develop an aggressive position. TD-Gammon found that the more conservative play of 24-23 was superior. Tournament players began experimenting with TD-Gammon's move, and found success. Within a few years, slotting had disappeared from tournament play, though in 2006 it made a reappearance for 2-1. Backgammon expert
Kit Woolsey Kit Woolsey (born Christopher Robin Woolsey in 1943) is an American bridge and backgammon player. He was inducted into the ACBL Hall of Fame in 2005. Personal life Woolsey was born in Washington, DC. He graduated from Oberlin College in 1964 and ...
found that TD-Gammon's positional judgement, especially its weighing of risk against safety, was superior to his own or any human's. TD-Gammon's excellent positional play was undercut by occasional poor endgame play. The endgame requires a more analytical approach, sometimes with extensive lookahead. TD-Gammon's limitation to two-ply lookahead put a ceiling on what it could achieve in this part of the game. TD-Gammon's strengths and weaknesses were the opposite of
symbolic artificial intelligence In artificial intelligence, symbolic artificial intelligence is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, logic and search. S ...
programs and most computer software in general: it was good at matters that require an intuitive "feel" but bad at systematic analysis.


See also

*
World Backgammon Federation The World Backgammon Federation (WBGF), formerly the European Backgammon Federation (EUBGF) until 2018, is the international body established to support and promote the tables game of backgammon worldwide. Their functions include the regulation o ...


References


External links


TD-Gammon
at IBM * {{tables games Backgammon Applied machine learning Reinforcement learning