TD-Gammon

	TD-Gammon TD-Gammon is a computer backgammon program developed in 1992 by Gerald Tesauro at IBM's Thomas J. Watson Research Center. Its name comes from the fact that it is an artificial neural net trained by a form of temporal-difference learning, specifically TD-Lambda. TD-Gammon achieved a level of play just slightly below that of the top human backgammon players of the time. It explored strategies that humans had not pursued and led to advances in the theory of correct backgammon play. Algorithm for play and learning: During play, TD-Gammon examines on each turn all possible legal moves and all their possible responses (two-ply lookahead), feeds each resulting board position into its evaluation function, and chooses the move that leads to the board position with the highest score. In this respect, TD-Gammon is no different from almost any other computer board-game program. TD-Gammon's innovation was in how it learned its evaluation function. TD-Gammon's learning algorithm consists ...
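The move-selection step described in the TD-Gammon entry above (score every position reachable by a legal move, then pick the best) can be sketched roughly as follows. This is a simplified one-ply illustration, not Tesauro's code: it ignores the opponent's responses and the dice, and the names `choose_move`, `legal_moves`, `apply_move`, and `evaluate` are hypothetical placeholders supplied by the caller.

```python
from typing import Callable, Iterable, Optional, TypeVar

Position = TypeVar("Position")
Move = TypeVar("Move")


def choose_move(
    position: Position,
    legal_moves: Callable[[Position], Iterable[Move]],
    apply_move: Callable[[Position, Move], Position],
    evaluate: Callable[[Position], float],
) -> Optional[Move]:
    """Return the legal move whose resulting position scores highest.

    All three callables are assumptions for illustration: `legal_moves`
    lists the moves available in a position, `apply_move` returns the
    position after a move, and `evaluate` is the evaluation function.
    Returns None when there is no legal move.
    """
    best_move = None
    best_score = float("-inf")
    for move in legal_moves(position):
        score = evaluate(apply_move(position, move))
        if score > best_score:
            best_move, best_score = move, score
    return best_move


# Toy usage: positions are integers, a move adds 1, 2, or 3,
# and the (made-up) evaluation prefers positions close to 10.
best = choose_move(
    8,
    legal_moves=lambda p: (1, 2, 3),
    apply_move=lambda p, m: p + m,
    evaluate=lambda p: -abs(10 - p),
)
```

A two-ply version would, for each candidate move, also enumerate the opponent's replies before scoring; the selection loop itself stays the same.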
	Temporal-difference Learning Temporal difference (TD) learning refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, like Monte Carlo methods, and perform updates based on current estimates, like dynamic programming methods. While Monte Carlo methods only adjust their estimates once the final outcome is known, TD methods adjust predictions to match later, more accurate, predictions about the future before the final outcome is known. This is a form of bootstrapping, as illustrated with the following example: "Suppose you wish to predict the weather for Saturday, and you have some model that predicts Saturday's weather, given the weather of each day in the week. In the standard case, you would wait until Saturday and then adjust all your models. However, when it is, for example, Friday, you should have a pretty go ...
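The bootstrapping idea in the entry above can be illustrated with tabular TD(0), the simplest temporal-difference method. The sketch below is only an illustration under stated assumptions: the five-state random walk, the function name `td0_random_walk`, and all parameter values are chosen here for the example and do not come from the entry.

```python
import random


def td0_random_walk(episodes=2000, alpha=0.05, gamma=1.0, n_states=5, seed=0):
    """Estimate state values of a simple random walk with tabular TD(0).

    States 1..n_states are non-terminal; states 0 and n_states + 1 are
    terminal. Reaching the right terminal state gives reward 1, every
    other transition gives reward 0.
    """
    rng = random.Random(seed)
    # Terminal states keep value 0; non-terminal states start at 0.5.
    values = [0.0] * (n_states + 2)
    for s in range(1, n_states + 1):
        values[s] = 0.5

    for _ in range(episodes):
        state = (n_states + 1) // 2  # start in the middle
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # Bootstrapped target: uses the current estimate of the next
            # state instead of waiting for the final outcome of the episode.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state

    return values[1:n_states + 1]


estimates = td0_random_walk()
```

For this particular walk the true values are 1/6, 2/6, ..., 5/6, so the estimates should rise from left to right and settle near those fractions.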
	Backgammon Backgammon is a two-player board game played with counters and dice on tables boards. It is the most widespread Western member of the large family of tables games, whose ancestors date back nearly 5,000 years to the regions of Mesopotamia and Persia. The earliest record of backgammon itself dates to 17th-century England, being descended from the 16th-century game of Irish (Forgeng, Johnson and Cram (2003), p. 269). Backgammon is a two-player game of contrary movement in which each player has fifteen pieces, known traditionally as 'men' (short for 'tablemen') but increasingly known as 'checkers' in the US in recent decades. These pieces move along twenty-four 'points' according to the roll of two dice. The objective of the game is to move the fifteen pieces around the board and be first to bear off, i.e., remove them from the board. The achievement of this while the opponent is still a long way behind results in a triple wi ...
	Neurogammon Neurogammon is a computer backgammon program written by Gerald Tesauro at IBM's Thomas J. Watson Research Center. It was the first viable computer backgammon program implemented as a neural net, and set a new standard in computer backgammon play. It won the 1st Computer Olympiad in London in 1989, handily defeating all opponents. Its level of play was that of an intermediate-level human player. Neurogammon contains seven separate neural networks, each with a single hidden layer. One network makes doubling-cube decisions; the other six choose moves at different stages of the game. The networks were trained by backpropagation from transcripts of 400 games in which the author played himself. The author's move was taught as the best move in each position. In 1992, Tesauro completed TD-Gammon, which combined a form of reinforcement learning ...
	Evaluation Function An evaluation function, also known as a heuristic evaluation function or static evaluation function, is a function used by game-playing computer programs to estimate the value or goodness of a position (usually at a leaf or terminal node) in a game tree. Most of the time, the value is either a real number or a quantized integer, often in nths of the value of a playing piece such as a stone in go or a pawn in chess, where n may be tenths, hundredths or other convenient fraction, but sometimes, the value is an array of three values in the unit interval, representing the win, draw, and loss percentages of the position. There do not exist analytical or theoretical models for evaluation functions for unsolved games, nor are such functions entirely ad hoc. The composition of evaluation functions is determined empirically by inserting a candidate function into an automaton and evaluating its subsequent performance. A significant body of evidence now exists for several games l ...
	Gradient In vector calculus, the gradient of a scalar-valued differentiable function $f$ of several variables is the vector field (or vector-valued function) $\nabla f$ whose value at a point $p$ is the "direction and rate of fastest increase". If the gradient of a function is non-zero at a point $p$, the direction of the gradient is the direction in which the function increases most quickly from $p$, and the magnitude of the gradient is the rate of increase in that direction, the greatest absolute directional derivative. Further, a point where the gradient is the zero vector is known as a stationary point. The gradient thus plays a fundamental role in optimization theory, where it is used to maximize a function by gradient ascent. In coordinate-free terms, the gradient of a function $f(\mathbf{r})$ may be defined by $df = \nabla f \cdot d\mathbf{r}$, where $df$ is the total infinitesimal change in $f$ for an infinitesimal displacement $d\mathbf{r}$, and is seen to be maximal when $d\mathbf{r}$ is in the direction of the gradi ...
	World Backgammon Federation The World Backgammon Federation (WBGF), formerly the European Backgammon Federation (EUBGF) until 2018, is the international body established to support and promote the tables game of backgammon worldwide. Its functions include the regulation of competition rules worldwide, the assessment and ranking of players and the establishment of regional governing bodies. Among its objectives is the legal recognition of backgammon as a mind sport at national and international levels. The WBGF is based in Schwaz, Austria (World Backgammon Association at ucolours.com, retrieved 4 March 2022). Members: The national backgammon organisations of the following countries are members of the WBGF: Member Na ...
	Symbolic Artificial Intelligence In artificial intelligence, symbolic artificial intelligence is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, logic and search. Symbolic AI used tools such as logic programming, production rules, semantic nets and frames, and it developed applications such as knowledge-based systems (in particular, expert systems), symbolic mathematics, automated theorem provers, ontologies, the semantic web, and automated planning and scheduling systems. The Symbolic AI paradigm led to seminal ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems. Symbolic AI was the dominant paradigm of AI research from the mid-1950s until the mid-1990s. Researchers in the 1960s and the 1970s were convinced that symbolic approaches would eventually succeed in creating ...
	Kit Woolsey Kit Woolsey (born Christopher Robin Woolsey in 1943) is an American bridge and backgammon player. He was inducted into the ACBL Hall of Fame in 2005. Personal life: Woolsey was born in Washington, DC. He graduated from Oberlin College in 1964 and earned a master's degree in mathematics from the University of Illinois at Urbana–Champaign in 1965. He lives in Kensington, California with his wife, world champion finalist bridge player Sally Woolsey. Career: In bridge, he was the winner of the 1986 Rosenblum Cup world teams championship. He was also runner-up in the 1982 Rosenblum Cup and the 1989 Bermuda Bowl, and won the Senior Teams at the 2000 World Team Olympiad and another gold at the 2003 Senior Bowl, as well as more than a dozen American Contract Bridge League (ACBL) North American Bridge Championships (NABC-level) events. Many of his successes were in partnership with Ed Manfield. He is a World Bridge Federation (WBF) World Grand Master and was inducted into the ACBL Hall of Fa ...
	Computer A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations (computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These programs enable computers to perform a wide range of tasks. A computer system is a nominally complete computer that includes the hardware, operating system (main software), and peripheral equipment needed and used for full operation. This term may also refer to a group of computers that are linked and function together, such as a computer network or computer cluster. A broad range of industrial and consumer products use computers as control systems. Simple special-purpose devices like microwave ovens and remote controls are included, as are factory devices like industrial robots and computer-aided design, as well as general-purpose devi ...
	Learning Rate In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly acquired information overrides old information, it metaphorically represents the speed at which a machine learning model "learns". In the adaptive control literature, the learning rate is commonly referred to as gain. In setting a learning rate, there is a trade-off between the rate of convergence and overshooting. While the descent direction is usually determined from the gradient of the loss function, the learning rate determines how big a step is taken in that direction. Too high a learning rate will make the learning jump over minima, but too low a learning rate will either take too long to converge or get stuck in an undesirable local minimum. In order to achieve faster convergence, prevent oscillations and getting stuck in undesirable ...
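The trade-off described in the learning-rate entry above can be seen on a one-dimensional toy problem. The sketch below runs plain gradient descent on the loss f(x) = x**2; the loss, the function name `gradient_descent`, and the three step sizes are assumptions picked for illustration only.

```python
def gradient_descent(learning_rate, steps=50, x0=1.0):
    """Minimise f(x) = x**2 by gradient descent and return the final x.

    The gradient of f is 2 * x, so each step multiplies x by
    (1 - 2 * learning_rate).
    """
    x = x0
    for _ in range(steps):
        grad = 2.0 * x
        x -= learning_rate * grad
    return x


too_small = gradient_descent(0.001)  # barely moves away from x0 in 50 steps
moderate = gradient_descent(0.1)     # converges close to the minimum at 0
too_large = gradient_descent(1.1)    # overshoots every step and diverges
```

On this loss any learning rate above 1.0 makes each step overshoot by more than it corrects, so the iterate grows in magnitude instead of shrinking.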
	Combinatorial Search In computer science and artificial intelligence, combinatorial search studies search algorithms for solving instances of problems that are believed to be hard in general, by efficiently exploring the usually large solution space of these instances. Combinatorial search algorithms achieve this efficiency by reducing the effective size of the search space or employing heuristics. Some algorithms are guaranteed to find the optimal solution, while others may only return the best solution found in the part of the state space that was explored. Classic combinatorial search problems include solving the eight queens puzzle or evaluating moves in games with a large game tree, such as reversi or chess. A study of computational complexity theory helps to motivate combinatorial search. Combinatorial search algorithms are typically concerned with problems that are NP-hard. Such problems are not believed to be efficiently solvable in general. However, the vari ...
