In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in software that plays board games. In that context MCTS is used to solve the game tree.
MCTS was combined with neural networks in 2016 and has been used in multiple board games like chess, shogi, checkers, backgammon, contract bridge, Go, Scrabble, and Clobber, as well as in turn-based-strategy video games (such as Total War: Rome II's implementation in the high-level campaign AI) and applications outside of games.
History
Monte Carlo method
History
Monte Carlo method
The Monte Carlo method, which uses random sampling for deterministic problems which are difficult or impossible to solve using other approaches, dates back to the 1940s. In his 1987 PhD thesis, Bruce Abramson combined minimax search with an ''expected-outcome model'' based on random game playouts to the end, instead of the usual static evaluation function. Abramson said the expected-outcome model "is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent."
He experimented in-depth with
tic-tac-toe and then with machine-generated evaluation functions for
Othello and chess.
Such methods were then explored and successfully applied to heuristic search in the field of
automated theorem proving by W. Ertel, J. Schumann and C. Suttner in 1989, thus improving the exponential search times of uninformed search algorithms such as breadth-first search, depth-first search, or iterative deepening.
In 1992, B. Brügmann employed it for the first time in a
Go-playing program.
In 2002, Chang et al.
proposed the idea of "recursive rolling out and backtracking" with "adaptive" sampling choices in their Adaptive Multi-stage Sampling (AMS) algorithm for the model of
Markov decision processes. AMS was the first work to explore the idea of
UCB-based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT (Upper Confidence Trees).
Monte Carlo tree search (MCTS)

In 2006, inspired by its predecessors,
Rémi Coulom described the application of the Monte Carlo method to game-tree search and coined the name Monte Carlo tree search; L. Kocsis and Cs. Szepesvári developed the UCT (Upper Confidence bounds applied to Trees) algorithm,
and S. Gelly et al. implemented UCT in their program MoGo.
In 2008, MoGo achieved
dan (master) level in 9×9 Go, and the Fuego program began to win against strong amateur players in 9×9 Go.
In January 2012, the Zen program won 3:1 in a Go match on a 19×19 board with an
amateur 2 dan player.
Google DeepMind developed the program AlphaGo, which in October 2015 became the first computer Go program to beat a professional human Go player without handicaps on a full-sized 19×19 board.
In March 2016, AlphaGo was awarded an honorary 9-dan (master) level in 19×19 Go for defeating
Lee Sedol in
a five-game match with a final score of four games to one. AlphaGo represents a significant improvement over previous Go programs as well as a milestone in
machine learning as it uses Monte Carlo tree search with artificial neural networks (a deep learning method) for policy (move selection) and value, giving it efficiency far surpassing previous programs.
The MCTS algorithm has also been used in programs that play other
board games (for example Hex, Havannah, Game of the Amazons, and Arimaa), real-time video games (for instance Ms. Pac-Man and Fable Legends), and nondeterministic games (such as skat, poker, Magic: The Gathering, or Settlers of Catan).
Principle of operation
The focus of MCTS is on the analysis of the most promising moves, expanding the
search tree based on random sampling of the search space.
The application of Monte Carlo tree search in games is based on many ''playouts,'' also called ''roll-outs''. In each playout, the game is played out to the very end by selecting moves at random. The final game result of each playout is then used to weight the nodes in the game tree so that better nodes are more likely to be chosen in future playouts.
The most basic way to use playouts is to apply the same number of playouts after each legal move of the current player, then choose the move which led to the most victories.
The efficiency of this method—called ''Pure Monte Carlo Game Search''—often increases with time as more playouts are assigned to the moves that have frequently resulted in the current player's victory according to previous playouts. Each round of Monte Carlo tree search consists of four steps:
* ''Selection'': Start from root ''R'' and select successive child nodes until a leaf node ''L'' is reached. The root is the current game state and a leaf is any node that has a potential child from which no simulation (playout) has yet been initiated. The section below says more about a way of biasing choice of child nodes that lets the game tree expand towards the most promising moves, which is the essence of Monte Carlo tree search.
* ''Expansion'': Unless ''L'' ends the game decisively (e.g. win/loss/draw) for either player, create one (or more) child nodes and choose node ''C'' from one of them. Child nodes are any valid moves from the game position defined by ''L''.
* ''Simulation'': Complete one random playout from node ''C''. This step is sometimes also called playout or rollout. A playout may be as simple as choosing uniformly random moves until the game is decided (for example in chess, the game is won, lost, or drawn).
* ''Backpropagation'': Use the result of the playout to update information in the nodes on the path from ''C'' to ''R''.
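A minimal sketch of one full search loop built from these four steps is shown below. The game-state interface it assumes (legal_moves, play, is_terminal, result, player_just_moved) is hypothetical, chosen only for illustration; play is assumed to return a new state rather than mutate the old one.

```python
import math
import random

class Node:
    """One node of the game tree."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried = list(state.legal_moves())  # moves not yet expanded
        self.wins = 0.0   # from the perspective of the player who moved into this node
        self.visits = 0

def mcts(root_state, n_rounds=1000, c=math.sqrt(2)):
    root = Node(root_state)
    for _ in range(n_rounds):
        node = root
        # Selection: descend while the node is fully expanded and has children.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch:
                       ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # Expansion: create one child for a previously untried move.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.play(move), parent=node)
            node.children.append(child)
            node = child
        # Simulation: uniformly random playout to the end of the game.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        # Backpropagation: credit each node on the path from the perspective
        # of the player who moved into it (1 win, 0.5 draw, 0 loss).
        while node is not None:
            node.visits += 1
            node.wins += state.result(node.state.player_just_moved)
            node = node.parent
    # The move with the most simulations is chosen as the final answer.
    return max(root.children, key=lambda ch: ch.visits)
```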

This graph shows the steps involved in one decision, with each node showing the ratio of wins to total playouts from that point in the game tree for the player that the node represents. In the Selection diagram, black is about to move. The root node shows there are 11 wins out of 21 playouts for white from this position so far. It complements the total of 10/21 black wins shown along the three black nodes under it, each of which represents a possible black move. Note that this graph does not follow the UCT algorithm described below.
If white loses the simulation, all nodes along the selection increment their simulation count (the denominator), but among them only the black nodes are credited with wins (the numerator). If instead white wins, all nodes along the selection would still increment their simulation count, but among them only the white nodes would be credited with wins. In games where draws are possible, a draw causes the numerator for both black and white to be incremented by 0.5 and the denominator by 1. This ensures that during selection, each player's choices expand towards the most promising moves for that player, which mirrors the goal of each player to maximize the value of their move.
Rounds of search are repeated as long as the time allotted to a move remains. Then the move with the most simulations made (i.e. the highest denominator) is chosen as the final answer.
Pure Monte Carlo game search
This basic procedure can be applied to any game whose positions necessarily have a finite number of moves and finite length. For each position, all feasible moves are determined: ''k'' random games are played out to the very end, and the scores are recorded. The move leading to the best score is chosen. Ties are broken by
fair coin flips. Pure Monte Carlo Game Search results in strong play in several games with random elements, as in the game ''EinStein würfelt nicht!''. It converges to optimal play (as ''k'' tends to infinity) in board filling games with random turn order, for instance in the game
Hex with random turn order. DeepMind's AlphaZero replaces the simulation step with an evaluation based on a neural network.
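The procedure can be sketched as follows, reusing the hypothetical state interface from the sketch above; ''k'' playouts are run after each legal move and ties are broken uniformly at random.

```python
import random

def pure_monte_carlo_move(state, k=100):
    """Pure Monte Carlo game search: k random playouts per legal move."""
    scores = {}
    for move in state.legal_moves():
        after = state.play(move)
        mover = after.player_just_moved  # the player who made `move`
        total = 0.0
        for _ in range(k):
            s = after
            while not s.is_terminal():
                s = s.play(random.choice(list(s.legal_moves())))
            total += s.result(mover)     # 1 win, 0.5 draw, 0 loss
        scores[move] = total / k
    best = max(scores.values())
    # Ties are broken by fair coin flips, i.e. uniformly at random.
    return random.choice([m for m, s in scores.items() if s == best])
```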
Exploration and exploitation
The main difficulty in selecting child nodes is maintaining some balance between the ''exploitation'' of deep variants after moves with high average win rate and the ''exploration'' of moves with few simulations. The first formula for balancing exploitation and exploration in games, called UCT (''Upper Confidence Bound'' 1 ''applied to trees''), was introduced by
Levente Kocsis and
Csaba Szepesvári.
UCT is based on the UCB1 formula derived by Auer, Cesa-Bianchi, and Fischer and the provably convergent AMS (Adaptive Multi-stage Sampling) algorithm first applied to multi-stage decision-making models (specifically,
Markov Decision Processes) by Chang, Fu, Hu, and Marcus.
Kocsis and Szepesvári recommend choosing in each node of the game tree the move for which the expression

$$\frac{w_i}{n_i} + c\sqrt{\frac{\ln N_i}{n_i}}$$

has the highest value. In this formula:
* $w_i$ stands for the number of wins for the node considered after the $i$-th move
* $n_i$ stands for the number of simulations for the node considered after the $i$-th move
* $N_i$ stands for the total number of simulations after the $i$-th move run by the parent node of the one considered
* $c$ is the exploration parameter, theoretically equal to $\sqrt{2}$; in practice usually chosen empirically
The first component of the formula above corresponds to exploitation; it is high for moves with high average win ratio. The second component corresponds to exploration; it is high for moves with few simulations.
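Translated directly into code, with names mirroring the symbols above (a sketch, not tied to any library):

```python
import math

def uct_value(w_i, n_i, N_i, c=math.sqrt(2)):
    """UCT score of a move: exploitation term plus exploration term."""
    if n_i == 0:
        return float("inf")  # unsimulated moves are tried first
    exploitation = w_i / n_i                          # average win ratio
    exploration = c * math.sqrt(math.log(N_i) / n_i)  # high for rarely tried moves
    return exploitation + exploration
```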
Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT that traces its roots back to the AMS simulation optimization algorithm for estimating the value function in finite-horizon
Markov Decision Processes (MDPs) introduced by Chang et al. (2005) in ''Operations Research''.
Advantages and disadvantages
Although it has been proven that the evaluation of moves in Monte Carlo tree search converges to
minimax when using UCT, the basic version of Monte Carlo tree search converges only in so-called "Monte Carlo Perfect" games. However, Monte Carlo tree search does offer significant advantages over
alpha–beta pruning and similar algorithms that minimize the search space.
In particular, pure Monte Carlo tree search does not need an explicit
evaluation function. Simply implementing the game's mechanics (i.e. generating the allowed moves in a given position and detecting the game-end conditions) is sufficient to explore the search space. As such, Monte Carlo tree search can be employed in games without a developed theory or in
general game playing.
The game tree in Monte Carlo tree search grows asymmetrically as the method concentrates on the more promising subtrees. Thus, it achieves better results than classical algorithms in games with a high
branching factor.
A disadvantage is that in certain positions, there may be moves that look superficially strong, but that actually lead to a loss via a subtle line of play. Such "trap states" require thorough analysis to be handled correctly, particularly when playing against an expert player; however, MCTS may not "see" such lines due to its policy of selective node expansion. It is believed that this may have been part of the reason for
AlphaGo's loss in its fourth game against Lee Sedol. In essence, the search attempts to prune sequences which are less relevant. In some cases, a move can lead to a very specific line of play which is significant but is overlooked when the tree is pruned; this outcome is therefore "off the search radar".
Improvements
Various modifications of the basic Monte Carlo tree search method have been proposed to shorten the search time. Some employ domain-specific expert knowledge, others do not.
Monte Carlo tree search can use either ''light'' or ''heavy'' playouts. Light playouts consist of random moves while heavy playouts apply various heuristics to influence the choice of moves. These heuristics may employ the results of previous playouts (e.g. the Last Good Reply heuristic) or expert knowledge of a given game. For instance, in many Go-playing programs certain stone patterns in a portion of the board influence the probability of moving into that area.
Paradoxically, playing suboptimally in simulations sometimes makes a Monte Carlo tree search program play stronger overall.

Domain-specific knowledge may be employed when building the game tree to help the exploitation of some variants. One such method assigns nonzero ''priors'' to the number of won and played simulations when creating each child node, leading to artificially raised or lowered average win rates that cause the node to be chosen more or less frequently, respectively, in the selection step.
A related method, called ''progressive bias'', consists of adding to the UCB1 formula a $\frac{b_i}{n_i}$ element, where $b_i$ is a heuristic score of the $i$-th move.
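Both injections can be sketched as a modified selection score. The prior counts, the exploration constant, and the node.heuristic field (standing for $b_i$) below are illustrative assumptions rather than prescribed values.

```python
import math

def knowledge_biased_score(node, N_i, c=1.4, prior_wins=1.0, prior_visits=2.0):
    """UCT score with nonzero priors folded into the counts and a
    progressive-bias term b_i / n_i that fades as simulations accumulate."""
    wins = node.wins + prior_wins        # prior: pretend playouts already won
    visits = node.visits + prior_visits  # prior visits also keep counts nonzero
    exploitation = wins / visits
    exploration = c * math.sqrt(math.log(N_i) / visits)
    progressive_bias = node.heuristic / visits  # b_i / n_i
    return exploitation + exploration + progressive_bias
```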
The basic Monte Carlo tree search collects enough information to find the most promising moves only after many rounds; until then its moves are essentially random. This exploratory phase may be reduced significantly in a certain class of games using RAVE (''Rapid Action Value Estimation'').
In these games, permutations of a sequence of moves lead to the same position. Typically, they are board games in which a move involves placement of a piece or a stone on the board. In such games the value of each move is often only slightly influenced by other moves.
In RAVE, for a given game tree node ''N'', the child node corresponding to move ''i'' stores not only the statistics of wins in playouts started in node ''N'' but also the statistics of wins in all playouts started in node ''N'' and below it that contain move ''i'' (also when the move was played in the tree, between node ''N'' and a playout). This way the contents of tree nodes are influenced not only by moves played immediately in a given position but also by the same moves played later.

When using RAVE, the selection step selects the node for which the modified UCB1 formula

$$\frac{w_i}{n_i}\left(1-\beta(n_i, \tilde{n}_i)\right) + \frac{\tilde{w}_i}{\tilde{n}_i}\beta(n_i, \tilde{n}_i) + c\sqrt{\frac{\ln N_i}{n_i}}$$

has the highest value. In this formula, $\tilde{w}_i$ and $\tilde{n}_i$ stand for the number of won playouts containing move $i$ and the number of all playouts containing move $i$, and the $\beta(n_i, \tilde{n}_i)$ function should be close to one for relatively small $n_i$ and $\tilde{n}_i$, and close to zero for relatively big $n_i$ and $\tilde{n}_i$. One of many formulas for $\beta(n_i, \tilde{n}_i)$, proposed by D. Silver, says that in balanced positions one can take $\beta(n_i, \tilde{n}_i) = \frac{\tilde{n}_i}{n_i + \tilde{n}_i + 4 b^2 n_i \tilde{n}_i}$, where $b$ is an empirically chosen constant.
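A sketch of the resulting selection score, assuming each node additionally stores the AMAF ("all moves as first") statistics described above under the hypothetical fields amaf_wins and amaf_visits:

```python
import math

def rave_value(node, N_i, c=1.4, b=0.05):
    """RAVE-modified UCB1: blend per-node and AMAF statistics via beta."""
    if node.visits == 0:
        return float("inf")                  # unsimulated nodes are tried first
    n, n_amaf = node.visits, node.amaf_visits
    # D. Silver's schedule: beta near 1 for small counts, near 0 for large ones.
    beta = n_amaf / (n + n_amaf + 4 * b * b * n * n_amaf) if n_amaf else 0.0
    mc = node.wins / n                                 # w_i / n_i
    amaf = node.amaf_wins / n_amaf if n_amaf else 0.0  # w~_i / n~_i
    exploration = c * math.sqrt(math.log(N_i) / n)
    return (1 - beta) * mc + beta * amaf + exploration
```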
Heuristics used in Monte Carlo tree search often require many parameters. There are automated methods to tune the parameters to maximize the win rate.
Monte Carlo tree search can be concurrently executed by many
threads or
processes. There are several fundamentally different methods of its
parallel execution:
* ''Leaf parallelization'', i.e. parallel execution of many playouts from one leaf of the game tree.
* ''Root parallelization'', i.e. building independent game trees in parallel and making the move based on the root-level branches of all these trees.
* ''Tree parallelization'', i.e. parallel building of the same game tree, protecting data from simultaneous writes either with a single global mutex, with several mutexes, or with non-blocking synchronization.
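Root parallelization is the simplest to sketch. The mcts_tree helper below is a hypothetical variant of the earlier mcts() sketch that returns the root node instead of a move, and each child is assumed to record the move that created it.

```python
import multiprocessing as mp

def search_one_tree(args):
    # Worker: build one independent tree, return (move, visits) per root child.
    state, rounds = args
    root = mcts_tree(state, n_rounds=rounds)  # hypothetical helper, see above
    return [(child.move, child.visits) for child in root.children]

def root_parallel_mcts(state, n_workers=4, rounds_per_worker=250):
    """Build independent trees in parallel, sum root-level visit counts,
    and play the most simulated move overall."""
    with mp.Pool(n_workers) as pool:
        results = pool.map(search_one_tree,
                           [(state, rounds_per_worker)] * n_workers)
    totals = {}
    for pairs in results:
        for move, visits in pairs:
            totals[move] = totals.get(move, 0) + visits
    return max(totals, key=totals.get)
```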
See also
* AlphaGo, a Go program using Monte Carlo tree search, reinforcement learning and deep learning.
* AlphaGo Zero, an updated Go program using Monte Carlo tree search, reinforcement learning and deep learning.
* AlphaZero, a generalized version of AlphaGo Zero using Monte Carlo tree search, reinforcement learning and deep learning.
* Leela Chess Zero, a free software implementation of AlphaZero's methods to chess, which is currently among the leading chess playing programs.
External links
Beginner's Guide to Monte Carlo Tree Search