AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi, and Go. The algorithm uses an approach similar to AlphaGo Zero.
On December 5, 2017, the DeepMind team released a preprint introducing AlphaZero, which within 24 hours of training achieved a superhuman level of play in these three games by defeating world-champion programs Stockfish, elmo, and the three-day version of AlphaGo Zero. In each case it made use of custom tensor processing units (TPUs) that the Google programs were optimized to use. AlphaZero was trained solely via self-play using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After four hours of training, DeepMind estimated AlphaZero was playing chess at a higher Elo rating than Stockfish 8; after nine hours of training, the algorithm defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws). The trained algorithm played on a single machine with four TPUs.
DeepMind's paper on AlphaZero was published in the journal ''Science'' on 7 December 2018. However, the AlphaZero program itself has not been made available to the public. In 2019 DeepMind published a new paper detailing MuZero, a new algorithm able to generalise AlphaZero's work, playing both Atari and board games without knowledge of the rules or representations of the game.
Relation to AlphaGo Zero
AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include:
* AZ has hard-coded rules for setting search hyperparameters.
* The neural network is now updated continually.
* Go (unlike chess) is symmetric under certain reflections and rotations; AlphaGo Zero was programmed to take advantage of these symmetries. AlphaZero is not.
* Chess can end in a draw, unlike Go; therefore, AlphaZero takes into account the possibility of a drawn game.
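The symmetry point in the list above can be illustrated with a short sketch. The following Python generates the eight rotations and reflections of a square board position, the kind of transformation a Go program can exploit. The helper names `rotate90`, `reflect`, and `symmetries` are hypothetical and are not taken from DeepMind's code.

```python
from typing import List, Tuple

# Square grid of stones; 0 = empty, 1 = black, 2 = white (encoding is an assumption).
Board = Tuple[Tuple[int, ...], ...]


def rotate90(board: Board) -> Board:
    """Rotate the board 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def reflect(board: Board) -> Board:
    """Mirror the board left to right."""
    return tuple(row[::-1] for row in board)


def symmetries(board: Board) -> List[Board]:
    """Return the 8 rotations and reflections of a square board."""
    result = []
    current = board
    for _ in range(4):
        result.append(current)
        result.append(reflect(current))
        current = rotate90(current)
    return result


if __name__ == "__main__":
    # A small hypothetical position with no symmetry of its own.
    position: Board = ((1, 0, 0), (0, 0, 2), (0, 0, 0))
    variants = symmetries(position)
    print(len(variants), "variants,", len(set(variants)), "distinct")
```

Chess positions cannot be treated this way, since pawn direction and castling rights break these symmetries.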
Stockfish and elmo
AlphaZero, which uses Monte Carlo tree search, searches just 80,000 positions per second in chess and 40,000 in shogi, compared to 70 million for Stockfish and 35 million for elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variation.
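One way to see how a network's move priors can concentrate a search on a few variations is the PUCT selection rule described in the AlphaGo Zero paper, which the Python sketch below implements. It is an illustration, not DeepMind's code: the `Edge` class, the `select_move` function, the exploration constant `c_puct = 1.5`, and the sample priors are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float            # P(s, a): move probability from the policy network
    visits: int = 0         # N(s, a): number of times this move was explored
    value_sum: float = 0.0  # sum of evaluations backed up through this move

    @property
    def mean_value(self) -> float:
        """Q(s, a): average backed-up value, 0 for an unvisited move."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q + U.

    U = c_puct * P * sqrt(total visits) / (1 + N) favors moves with a high
    prior that have been visited rarely. The value of c_puct is an assumption.
    """
    total_visits = sum(edge.visits for edge in edges.values())

    def score(edge: Edge) -> float:
        exploration = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.mean_value + exploration

    return max(edges, key=lambda move: score(edges[move]))


if __name__ == "__main__":
    # Hypothetical statistics for three candidate moves in some position.
    edges = {
        "e4": Edge(prior=0.70, visits=10, value_sum=5.5),
        "d4": Edge(prior=0.25, visits=3, value_sum=1.2),
        "a3": Edge(prior=0.05, visits=0),
    }
    print(select_move(edges))
```

Because the exploration term is scaled by the prior, a move the network considers unlikely receives few visits unless its backed-up value turns out to be high, which is how a search can stay narrow.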
Training
AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks. In parallel, the in-training AlphaZero was periodically matched against its benchmark (Stockfish, elmo, or AlphaGo Zero) in brief one-second-per-move games to determine how well the training was progressing. DeepMind judged that AlphaZero's performance exceeded the benchmark after around four hours of training for Stockfish, two hours for elmo, and eight hours for AlphaGo Zero.
Preliminary results
Outcome
Chess
In AlphaZero's chess match against Stockfish 8 (2016 TCEC world champion), each program was given one minute per move. Stockfish was allocated 64 threads and a hash size of 1 GB, a setting that Stockfish's Tord Romstad later criticized as suboptimal. AlphaZero was trained on chess for a total of nine hours before the match. During the match, AlphaZero ran on a single machine with four application-specific TPUs. In 100 games from the normal starting position, AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72. In a series of twelve 100-game matches (of unspecified time or resource constraints) against Stockfish starting from the 12 most popular human openings, AlphaZero won 290, drew 886 and lost 24.
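To put such results on a rating scale, a match score can be converted to an implied rating difference with the standard logistic Elo formula. The Python sketch below does this for the tallies quoted above. The function names are hypothetical, and the result is only a rough point estimate that ignores sampling error and the match conditions.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # (wins, draws, losses) from AlphaZero's side, as quoted in the text.
    matches = {
        "100 games, starting position": (28, 72, 0),
        "1,200 games, human openings": (290, 886, 24),
    }
    for label, (wins, draws, losses) in matches.items():
        s = score_fraction(wins, draws, losses)
        print(f"{label}: score {s:.3f}, about {elo_difference(s):+.0f} Elo")
```

Under this model the 100-game result (a 64% score) corresponds to a gap of roughly 100 rating points.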
Shogi
AlphaZero was trained on shogi for a total of two hours before the tournament. In 100 shogi games against elmo (World Computer Shogi Championship 27 summer 2017 tournament version with YaneuraOu 4.73 search), AlphaZero won 90 times, lost 8 times and drew twice. As in the chess games, each program got one minute per move, and elmo was given 64 threads and a hash size of 1 GB.
Go
After 34 hours of self-learning of Go, AlphaZero played against AlphaGo Zero, winning 60 games and losing 40.
Analysis
DeepMind stated in its preprint, "The game of chess represented the pinnacle of AI research over several decades. State-of-the-art programs are based on powerful engines that search many millions of positions, leveraging handcrafted domain expertise and sophisticated domain adaptations. AlphaZero is a generic reinforcement learning algorithm originally devised for the game of go that achieved superior results within a few hours, searching a thousand times fewer positions, given no domain knowledge except the rules." DeepMind's Demis Hassabis, a chess player himself, called AlphaZero's play style "alien": It sometimes wins by offering counterintuitive sacrifices, like offering up a queen and bishop to exploit a positional advantage. "It's like chess from another dimension."
Given the difficulty in chess of forcing a win against a strong opponent, the +28 –0 =72 result is a significant margin of victory. However, some grandmasters, such as Hikaru Nakamura and Komodo developer Larry Kaufman, downplayed AlphaZero's victory, arguing that the match would have been closer if the programs had access to an opening database (since Stockfish was optimized for that scenario). Romstad additionally pointed out that Stockfish is not optimized for rigidly fixed-time moves and the version used was a year old.
Similarly, some shogi observers argued that the elmo hash size was too low, that the resignation settings and the "EnteringKingRule" settings (cf. shogi § Entering King) may have been inappropriate, and that elmo is already obsolete compared with newer programs.
Reaction and criticism
Newspapers headlined that the chess training took only four hours: "It was managed in little more than the time between breakfast and lunch." ''Wired'' described AlphaZero as "the first multi-skilled AI board-game champ". AI expert Joanna Bryson noted that Google's "knack for good publicity" was putting it in a strong position against challengers. "It's not only about hiring the best programmers. It's also very political, as it helps make Google as strong as possible when negotiating with governments and regulators looking at the AI sector."
Human chess grandmasters generally expressed excitement about AlphaZero. Danish grandmaster Peter Heine Nielsen likened AlphaZero's play to that of a superior alien species. Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero's play as "insane attacking chess" with profound positional understanding. Former champion Garry Kasparov said "It's a remarkable achievement, even if we should have expected it after AlphaGo."
Grandmaster Hikaru Nakamura was less impressed. Top US correspondence chess player Wolff Morrow was also unimpressed, claiming that AlphaZero would probably not make the semifinals of a fair competition such as TCEC, where all engines play on equal hardware. Morrow further stated that although he might not be able to beat AlphaZero if AlphaZero played drawish openings such as the Petroff Defence, AlphaZero would not be able to beat him in a correspondence chess game either.
Motohiro Isozaki, the author of YaneuraOu, noted that although AlphaZero did comprehensively beat elmo, AlphaZero's shogi rating stopped growing at a point at most 100 to 200 rating points higher than elmo's. In his view this gap is not large, and elmo and other shogi software should be able to catch up in 1–2 years.
Final results
DeepMind addressed many of the criticisms in their final version of the paper, published in December 2018 in ''Science''. They further clarified that AlphaZero was not running on a supercomputer; it was trained using 5,000 tensor processing units (TPUs), but only ran on four TPUs and a 44-core CPU in its matches.
Chess
In the final results, Stockfish version 8 ran under the same conditions as in the TCEC superfinal: 44 CPU cores, Syzygy endgame tablebases, and a 32 GB hash size. Instead of a fixed time control of one minute per move, both engines were given 3 hours plus 15 seconds per move to finish the game. In a 1,000-game match, AlphaZero won with a score of 155 wins, 6 losses, and 839 draws. DeepMind also played a series of games using the TCEC opening positions, which AlphaZero also won convincingly. Stockfish needed 10-to-1 time odds to match AlphaZero.
Shogi
As with Stockfish, elmo ran under the same conditions as in the 2017 CSA championship. The version of elmo used was WCSC27 in combination with YaneuraOu 2017 Early KPPT 4.79 64AVX2 TOURNAMENT. elmo operated on the same hardware as Stockfish: 44 CPU cores and a 32 GB hash size. AlphaZero won 98.2% of games when playing sente (i.e. having the first move) and 91.2% overall.
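The two percentages above also imply a win rate for the second player. Assuming AlphaZero played equally many games as sente and as gote (an assumption; the split is not stated here), the overall rate is the average of the two, which the Python sketch below inverts. The function name is hypothetical.

```python
def implied_gote_rate(sente_rate: float, overall_rate: float) -> float:
    """Win rate as gote, assuming an equal number of games with each side.

    With equal counts, overall = (sente + gote) / 2,
    so gote = 2 * overall - sente.
    """
    gote_rate = 2.0 * overall_rate - sente_rate
    if not 0.0 <= gote_rate <= 1.0:
        raise ValueError("rates are inconsistent with an equal split of games")
    return gote_rate


if __name__ == "__main__":
    # Figures quoted in the text: 98.2% as sente, 91.2% overall.
    print(f"{implied_gote_rate(0.982, 0.912):.1%}")
```

If the games were not split evenly, the implied figure would differ, so it should be read only as a consistency check on the quoted numbers.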
Reactions and criticisms
Human grandmasters were generally impressed with AlphaZero's games against Stockfish. Former world champion Garry Kasparov said it was a pleasure to watch AlphaZero play, especially since its style was open and dynamic like his own.
In the computer chess community, Komodo developer Mark Lefler called it a "pretty amazing achievement", but also pointed out that the data was old, since Stockfish had gained a lot of strength since January 2018 (when Stockfish 8 was released). Fellow developer Larry Kaufman said AlphaZero would probably lose a match against the latest version of Stockfish, Stockfish 10, under Top Chess Engine Championship (TCEC) conditions. Kaufman argued that the only advantage of neural network–based engines was that they used a GPU, so if there was no regard for power consumption (e.g. in an equal-hardware contest where both engines had access to the same CPU and GPU) then anything the GPU achieved was "free". Based on this, he stated that the strongest engine was likely to be a hybrid with neural networks and standard alpha–beta search.
AlphaZero inspired the computer chess community to develop Leela Chess Zero, using the same techniques as AlphaZero. Leela contested several championships against Stockfish, where it showed roughly similar strength to Stockfish, although Stockfish has since pulled away.
In 2019 DeepMind published MuZero, a unified system that played excellent chess, shogi, and Go, as well as games in the Atari Learning Environment, without being pre-programmed with their rules.
See also
* AlphaGo
* AlphaFold
* General game playing
** MuZero
** ReBeL, Facebook's general game-player that additionally handles poker
* Leela Chess Zero
* Pluribus (poker bot)