
In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in software that plays board games. In that context MCTS is used to solve the game tree. MCTS was combined with neural networks in 2016 and has been used in multiple board games like chess, shogi, checkers, backgammon, contract bridge, Go, Scrabble, and Clobber, as well as in turn-based-strategy video games (such as Total War: Rome II's implementation in the high-level campaign AI).


History


Monte Carlo method

The Monte Carlo method, which uses random sampling for deterministic problems which are difficult or impossible to solve using other approaches, dates back to the 1940s. In his 1987 PhD thesis, Bruce Abramson combined minimax search with an ''expected-outcome model'' based on random game playouts to the end, instead of the usual static evaluation function. Abramson said the expected-outcome model "is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent." He experimented in-depth with tic-tac-toe and then with machine-generated evaluation functions for Othello and chess. Such methods were then explored and successfully applied to heuristic search in the field of automated theorem proving by W. Ertel, J. Schumann and C. Suttner in 1989, thus improving the exponential search times of uninformed search algorithms such as breadth-first search, depth-first search or iterative deepening.

In 1992, B. Brügmann employed it for the first time in a Go-playing program. In 2002, Chang et al. proposed the idea of "recursive rolling out and backtracking" with "adaptive" sampling choices in their Adaptive Multi-stage Sampling (AMS) algorithm for the model of Markov decision processes. AMS was the first work to explore the idea of UCB-based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT (Upper Confidence Trees).


Monte Carlo tree search (MCTS)

In 2006, inspired by these predecessors, Rémi Coulom described the application of the Monte Carlo method to game-tree search and coined the name Monte Carlo tree search, L. Kocsis and Cs. Szepesvári developed the UCT (Upper Confidence bounds applied to Trees) algorithm, and S. Gelly et al. implemented UCT in their program MoGo. In 2008, MoGo achieved dan (master) level in 9×9 Go, and the Fuego program began to win against strong amateur players in 9×9 Go. In January 2012, the Zen program won 3:1 in a Go match on a 19×19 board with an amateur 2 dan player.

Google DeepMind developed the program AlphaGo, which in October 2015 became the first computer Go program to beat a professional human Go player without handicaps on a full-sized 19×19 board. In March 2016, AlphaGo was awarded an honorary 9-dan (master) level in 19×19 Go for defeating Lee Sedol in a five-game match with a final score of four games to one. AlphaGo represents a significant improvement over previous Go programs as well as a milestone in machine learning, as it uses Monte Carlo tree search with artificial neural networks (a deep learning method) for policy (move selection) and value, giving it efficiency far surpassing previous programs.

The MCTS algorithm has also been used in programs that play other board games (for example Hex, Havannah, Game of the Amazons, and Arimaa), real-time video games (for instance Ms. Pac-Man and Fable Legends), and nondeterministic games (such as skat, poker, Magic: The Gathering, or Settlers of Catan).


Principle of operation

The focus of MCTS is on the analysis of the most promising moves, expanding the search tree based on random sampling of the search space. The application of Monte Carlo tree search in games is based on many ''playouts'', also called ''roll-outs''. In each playout, the game is played out to the very end by selecting moves at random. The final game result of each playout is then used to weight the nodes in the game tree so that better nodes are more likely to be chosen in future playouts.

The most basic way to use playouts is to apply the same number of playouts after each legal move of the current player, then choose the move which led to the most victories. The efficiency of this method, called ''Pure Monte Carlo Game Search'', often increases with time as more playouts are assigned to the moves that have frequently resulted in the current player's victory according to previous playouts.

Each round of Monte Carlo tree search consists of four steps, condensed into a code sketch below the list:
* ''Selection'': Start from root ''R'' and select successive child nodes until a leaf node ''L'' is reached. The root is the current game state and a leaf is any node that has a potential child from which no simulation (playout) has yet been initiated. The section below says more about a way of biasing choice of child nodes that lets the game tree expand towards the most promising moves, which is the essence of Monte Carlo tree search.
* ''Expansion'': Unless ''L'' ends the game decisively (e.g. win/loss/draw) for either player, create one (or more) child nodes and choose node ''C'' from one of them. Child nodes are any valid moves from the game position defined by ''L''.
* ''Simulation'': Complete one random playout from node ''C''. This step is sometimes also called playout or rollout. A playout may be as simple as choosing uniform random moves until the game is decided (for example in chess, the game is won, lost, or drawn).
* ''Backpropagation'': Use the result of the playout to update information in the nodes on the path from ''C'' to ''R''.

This graph shows the steps involved in one decision, with each node showing the ratio of wins to total playouts from that point in the game tree for the player that the node represents. In the Selection diagram, black is about to move. The root node shows there are 11 wins out of 21 playouts for white from this position so far. It complements the total of 10/21 black wins shown along the three black nodes under it, each of which represents a possible black move. Note that this graph does not follow the UCT algorithm described below.

If white loses the simulation, all nodes along the selection incremented their simulation count (the denominator), but among them only the black nodes were credited with wins (the numerator). If instead white wins, all nodes along the selection would still increment their simulation count, but among them only the white nodes would be credited with wins. In games where draws are possible, a draw causes the numerator for both black and white to be incremented by 0.5 and the denominator by 1. This ensures that during selection, each player's choices expand towards the most promising moves for that player, which mirrors the goal of each player to maximize the value of their move.

Rounds of search are repeated as long as the time allotted to a move remains. Then the move with the most simulations made (i.e. the highest denominator) is chosen as the final answer.
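The following is a minimal Python sketch of one full search, not a reference implementation: the game-state interface (''legal_moves'', ''play'', ''is_terminal'', ''winner'', ''player_who_just_moved'') is hypothetical, playouts are uniformly random, and selection uses the UCT formula explained in the next section.

    import math
    import random

    class Node:
        """One search-tree node: a game state plus playout statistics."""
        def __init__(self, state, parent=None, move=None):
            self.state = state                      # game state at this node
            self.parent = parent
            self.move = move                        # move that led here from the parent
            self.children = []
            self.untried_moves = list(state.legal_moves())
            self.wins = 0.0                         # wins for the player who just moved
            self.visits = 0

    def mcts(root_state, n_rounds, c=math.sqrt(2)):
        root = Node(root_state)
        for _ in range(n_rounds):
            node = root
            # 1. Selection: descend through fully expanded nodes via UCT.
            while not node.untried_moves and node.children:
                node = max(node.children,
                           key=lambda ch: ch.wins / ch.visits
                           + c * math.sqrt(math.log(node.visits) / ch.visits))
            # 2. Expansion: add one child for a not-yet-tried move.
            if node.untried_moves:
                move = node.untried_moves.pop()
                node.children.append(Node(node.state.play(move), node, move))
                node = node.children[-1]
            # 3. Simulation: random playout to the end of the game.
            state = node.state
            while not state.is_terminal():
                state = state.play(random.choice(state.legal_moves()))
            result = state.winner()                 # e.g. a player id, or None for a draw
            # 4. Backpropagation: update statistics on the path back to the root.
            while node is not None:
                node.visits += 1
                if result is None:
                    node.wins += 0.5                # draws count half for both sides
                elif result == node.state.player_who_just_moved():
                    node.wins += 1
                node = node.parent
        # The final answer is the move with the most simulations.
        return max(root.children, key=lambda ch: ch.visits).move

Storing wins from the perspective of the player who just moved means that, during selection, each player maximizes its own win rate, matching the alternating credit assignment described above.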


Pure Monte Carlo game search

This basic procedure can be applied to any game whose positions necessarily have a finite number of moves and finite length. For each position, all feasible moves are determined: ''k'' random games are played out to the very end, and the scores are recorded. The move leading to the best score is chosen. Ties are broken by fair coin flips. Pure Monte Carlo Game Search results in strong play in several games with random elements, as in the game ''EinStein würfelt nicht!''. It converges to optimal play (as ''k'' tends to infinity) in board filling games with random turn order, for instance in the game Hex with random turn order. DeepMind's AlphaZero replaces the simulation step with an evaluation based on a neural network.
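A sketch of this procedure, assuming the same hypothetical game-state interface as in the previous section (with ''state.to_move'' naming the player about to move):

    import random

    def pure_monte_carlo_move(state, k):
        """Play k random games after each legal move; return the best-scoring move."""
        player = state.to_move
        best_moves, best_wins = [], -1
        for move in state.legal_moves():
            wins = 0
            for _ in range(k):                      # k playouts per candidate move
                s = state.play(move)
                while not s.is_terminal():
                    s = s.play(random.choice(s.legal_moves()))
                if s.winner() == player:
                    wins += 1
            if wins > best_wins:
                best_moves, best_wins = [move], wins
            elif wins == best_wins:
                best_moves.append(move)
        return random.choice(best_moves)            # ties broken uniformly at random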


Exploration and exploitation

The main difficulty in selecting child nodes is maintaining some balance between the ''exploitation'' of deep variants after moves with high average win rate and the ''exploration'' of moves with few simulations. The first formula for balancing exploitation and exploration in games, called UCT (''Upper Confidence Bound 1 applied to trees''), was introduced by Levente Kocsis and Csaba Szepesvári. UCT is based on the UCB1 formula derived by Auer, Cesa-Bianchi, and Fischer and the provably convergent AMS (Adaptive Multi-stage Sampling) algorithm first applied to multi-stage decision-making models (specifically, Markov Decision Processes) by Chang, Fu, Hu, and Marcus. Kocsis and Szepesvári recommend choosing in each node of the game tree the move for which the expression

\frac{w_i}{n_i} + c\sqrt{\frac{\ln N_i}{n_i}}

has the highest value. In this formula:
* ''w_i'' stands for the number of wins for the node considered after the ''i''-th move
* ''n_i'' stands for the number of simulations for the node considered after the ''i''-th move
* ''N_i'' stands for the total number of simulations after the ''i''-th move run by the parent node of the one considered
* ''c'' is the exploration parameter (theoretically equal to \sqrt{2}; in practice usually chosen empirically)

The first component of the formula above corresponds to exploitation; it is high for moves with high average win ratio. The second component corresponds to exploration; it is high for moves with few simulations.

Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT that traces its roots back to the AMS simulation optimization algorithm for estimating the value function in finite-horizon Markov Decision Processes (MDPs) introduced by Chang et al. (2005) in Operations Research. (AMS was the first work to explore the idea of UCB-based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT.)
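Transcribed directly, the selection rule is a one-line function; the names mirror the symbols above:

    import math

    def uct(w_i, n_i, N_i, c=math.sqrt(2)):
        """Exploitation term (average win rate) plus exploration term."""
        return w_i / n_i + c * math.sqrt(math.log(N_i) / n_i)

A child with few simulations (small ''n_i'') receives a large exploration term, so it will eventually be revisited even if its current win rate is poor.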


Advantages and disadvantages

Although it has been proven that the evaluation of moves in Monte Carlo tree search converges to minimax, the basic version of Monte Carlo tree search converges only in so-called "Monte Carlo Perfect" games. However, Monte Carlo tree search does offer significant advantages over alpha–beta pruning and similar algorithms that minimize the search space.

In particular, pure Monte Carlo tree search does not need an explicit evaluation function. Simply implementing the game's mechanics is sufficient to explore the search space (i.e. generating the allowed moves in a given position and detecting the game-end conditions). As such, Monte Carlo tree search can be employed in games without a developed theory or in general game playing.

The game tree in Monte Carlo tree search grows asymmetrically as the method concentrates on the more promising subtrees. Thus it achieves better results than classical algorithms in games with a high branching factor.

A disadvantage is that in certain positions, there may be moves that look superficially strong, but that actually lead to a loss via a subtle line of play. Such "trap states" require thorough analysis to be handled correctly, particularly when playing against an expert player; however, MCTS may not "see" such lines due to its policy of selective node expansion. It is believed that this may have been part of the reason for AlphaGo's loss in its fourth game against Lee Sedol. In essence, the search attempts to prune sequences which are less relevant. In some cases, a play can lead to a very specific line of play which is significant, but which is overlooked when the tree is pruned, and this outcome is therefore "off the search radar".


Improvements

Various modifications of the basic Monte Carlo tree search method have been proposed to shorten the search time. Some employ domain-specific expert knowledge, others do not.

Monte Carlo tree search can use either ''light'' or ''heavy'' playouts. Light playouts consist of random moves while heavy playouts apply various heuristics to influence the choice of moves. These heuristics may employ the results of previous playouts (e.g. the Last Good Reply heuristic) or expert knowledge of a given game. For instance, in many Go-playing programs certain stone patterns in a portion of the board influence the probability of moving into that area. Paradoxically, playing suboptimally in simulations sometimes makes a Monte Carlo tree search program play stronger overall.

Domain-specific knowledge may be employed when building the game tree to help the exploitation of some variants. One such method assigns nonzero ''priors'' to the number of won and played simulations when creating each child node, leading to artificially raised or lowered average win rates that cause the node to be chosen more or less frequently, respectively, in the selection step. A related method, called ''progressive bias'', consists in adding to the UCB1 formula a \frac{b_i}{n_i} element, where ''b_i'' is a heuristic score of the ''i''-th move.

The basic Monte Carlo tree search collects enough information to find the most promising moves only after many rounds; until then its moves are essentially random. This exploratory phase may be reduced significantly in a certain class of games using RAVE (''Rapid Action Value Estimation''). In these games, permutations of a sequence of moves lead to the same position. Typically, they are board games in which a move involves placement of a piece or a stone on the board. In such games the value of each move is often only slightly influenced by other moves.

In RAVE, for a given game tree node ''N'', its child nodes ''C_i'' store not only the statistics of wins in playouts started in node ''N'' but also the statistics of wins in all playouts started in node ''N'' and below it, if they contain move ''i'' (also when the move was played in the tree, between node ''N'' and a playout). This way the contents of tree nodes are influenced not only by moves played immediately in a given position but also by the same moves played later.

When using RAVE, the selection step selects the node for which the modified UCB1 formula

(1-\beta(n_i, \tilde{n}_i))\frac{w_i}{n_i} + \beta(n_i, \tilde{n}_i)\frac{\tilde{w}_i}{\tilde{n}_i} + c\sqrt{\frac{\ln t}{n_i}}

has the highest value (a code sketch of this rule appears at the end of this section). In this formula, \tilde{w}_i and \tilde{n}_i stand for the number of won playouts containing move ''i'' and the number of all playouts containing move ''i'', and the \beta(n_i, \tilde{n}_i) function should be close to one and to zero for relatively small and relatively big ''n_i'' and \tilde{n}_i, respectively. One of many formulas for \beta(n_i, \tilde{n}_i), proposed by D. Silver, says that in balanced positions one can take \beta(n_i, \tilde{n}_i)=\frac{\tilde{n}_i}{n_i+\tilde{n}_i+4b^2n_i\tilde{n}_i}, where ''b'' is an empirically chosen constant.

Heuristics used in Monte Carlo tree search often require many parameters. There are automated methods to tune the parameters to maximize the win rate.

Monte Carlo tree search can be concurrently executed by many threads or processes. There are several fundamentally different methods of its parallel execution:
* ''Leaf parallelization'', i.e. parallel execution of many playouts from one leaf of the game tree.
* ''Root parallelization'', i.e. building independent game trees in parallel and making the move based on the root-level branches of all these trees.
* ''Tree parallelization'', i.e. parallel building of the same game tree, protecting data from simultaneous writes either with one global mutex, with more mutexes, or with non-blocking synchronization.
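The sketch referred to above: the RAVE-modified selection value with D. Silver's \beta schedule. The default values of ''c'' and ''b'' are illustrative assumptions, and both simulation counts are assumed to be nonzero:

    import math

    def rave_value(w_i, n_i, w_tilde, n_tilde, t, c=math.sqrt(2), b=0.5):
        """Blend per-node statistics with all-moves-as-first (RAVE) statistics.

        w_i, n_i         -- wins/simulations of playouts started in this node
        w_tilde, n_tilde -- wins/simulations of all playouts containing this move
        t                -- number of simulations run by the parent node
        """
        beta = n_tilde / (n_i + n_tilde + 4 * b * b * n_i * n_tilde)
        return ((1 - beta) * w_i / n_i
                + beta * w_tilde / n_tilde
                + c * math.sqrt(math.log(t) / n_i))

For small ''n_i'' the weight \beta is close to one and the RAVE estimate dominates; as ''n_i'' grows, \beta falls towards zero and the node's own statistics take over, as the formula's definition requires.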


See also

* AlphaGo, a Go program using Monte Carlo tree search, reinforcement learning and deep learning.
* AlphaGo Zero, an updated Go program using Monte Carlo tree search, reinforcement learning and deep learning.
* AlphaZero, a generalized version of AlphaGo Zero using Monte Carlo tree search, reinforcement learning and deep learning.
* Leela Chess Zero, a free software implementation of AlphaZero's methods to chess, which is currently among the leading chess playing programs.




External links


* Beginner's Guide to Monte Carlo Tree Search