TD-Gammon

TD-Gammon is a computer backgammon program developed in the 1990s by Gerald Tesauro at IBM's Thomas J. Watson Research Center. Its name comes from the fact that it is an artificial neural network trained by a form of temporal-difference learning, specifically TD-Lambda. It explored strategies that humans had not pursued and led to advances in the theory of correct backgammon play. In 1993, TD-Gammon (version 2.1) was trained on 1.5 million games of self-play and achieved a level of play just slightly below that of the top human backgammon players of the time. In 1998, during a 100-game series, it was defeated by the world champion by a margin of only 8 points. Its unconventional assessment of some opening strategies has been accepted and adopted by expert players. TD-Gammon is commonly cited as an early success of reinforcement learning and neural networks, and was cited in, for example, the papers on deep Q-learning and AlphaGo. Algorithm for p ...

Backgammon

Backgammon is a two-player board game played with counters and dice on tables boards. It is the most widespread Western member of the large family of tables games, whose ancestors date back at least 1,600 years. The earliest record of backgammon itself dates to 17th-century England; the game is descended from the 16th-century game of Irish (Forgeng, Johnson and Cram 2003, p. 269). Backgammon is a game of contrary movement in which each player has fifteen pieces, known traditionally as men (short for "tablemen") but increasingly known as "checkers" in the United States in recent decades. The pieces move along twenty-four "points" according to the roll of two dice. The objective of the game is to move the fifteen pieces around the board and be the first to "bear off", i.e., remove them from the board. The achievement of this while the opponent is still a long way behind results in a triple win known as a ...

Gerald Tesauro

Gerald J. "Gerry" Tesauro is an American computer scientist and a researcher at IBM, known for his development of TD-Gammon, a backgammon program that taught itself to play at a world-championship level through self-play and temporal difference learning, an early success in reinforcement learning and neural networks. He subsequently researched autonomic computing and multi-agent systems for e-commerce, and contributed to the game strategy algorithms for IBM Watson.

Career

Education

Tesauro earned a B.S. in physics from the University of Maryland, College Park. He then pursued graduate studies in plasma physics at Princeton University, supported by a Hertz Foundation Fellowship starting in 1980. He completed his Ph.D. in theoretical physics in 1986 under the supervision of Nobel laureate Philip W. Anderson.

Backgammon

After completing his Ph.D., he undertook postdoctoral research at the Center for Complex Systems Research, University of Illinois at Urbana-Champaign. During ...

Temporal-difference Learning

Temporal difference (TD) learning refers to a class of model-free reinforcement learning methods that learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, like Monte Carlo methods, and perform updates based on current estimates, like dynamic programming methods. While Monte Carlo methods adjust their estimates only once the final outcome is known, TD methods adjust predictions to match later, more accurate predictions about the future before the final outcome is known. This is a form of bootstrapping, as illustrated by the following example: Suppose you wish to predict the weather for Saturday, and you have some model that predicts Saturday's weather given the weather of each day in the week. In the standard case, you would wait until Saturday and then adjust all your models. However, when it is, for example, Friday, you should have a pretty good idea of what the weather would be on Saturday – and thus be able ...
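
The bootstrapping idea described above can be sketched in a few lines of Python. This is a minimal illustration of the simplest one-step variant, usually called TD(0), and not the TD-Lambda method used by TD-Gammon. The function names, the three-state chain, and the parameter values (alpha, gamma, number of episodes) are assumptions made up for this example.

```python
def td0_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """Nudge values[state] toward the bootstrapped target.

    The target uses the current estimate of the next state instead of
    waiting for the final outcome, which is what distinguishes TD from
    Monte Carlo updates. Returns the TD error for inspection.
    """
    target = reward + gamma * values[next_state]
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


def run_chain(episodes=500, n_states=3, alpha=0.1):
    """Hypothetical toy task: states 0..n_states-1 are visited in order,
    followed by a terminal state (index n_states) whose value stays 0.
    Only the final transition pays a reward of 1.
    """
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            td0_update(values, s, s + 1, reward, alpha)
    return values


if __name__ == "__main__":
    # Every non-terminal state should approach the true return of 1.
    print(run_chain())
```

In this sketch the reward information propagates backward one state at a time: the last state learns from the actual reward, and each earlier state learns from its successor's improving estimate, much as Friday's forecast informs the prediction made on Thursday.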