AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in the journal ''Nature'' on 19 October 2017, introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version. By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.

Training artificial intelligence (AI) without datasets derived from human experts has significant implications for the development of AI with superhuman skills because expert data is "often expensive, unreliable or simply unavailable." Demis Hassabis, the co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge". Furthermore, AlphaGo Zero performed better than standard deep reinforcement learning models (such as DQN implementations) due to its integration of Monte Carlo tree search. David Silver, one of the first authors of DeepMind's papers published in ''Nature'' on AlphaGo, said that it is possible to have generalised AI algorithms by removing the need to learn from humans.

Google later developed AlphaZero, a generalized version of AlphaGo Zero that could play chess and Shōgi in addition to Go. In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed AlphaGo Lee on an Elo scale. AlphaZero also defeated a top chess program (Stockfish) and a top Shōgi program (Elmo).
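To give a sense of scale for the match results quoted above, a score can be converted to an approximate rating gap. The following is a minimal sketch that assumes the standard logistic Elo model with a 400-point scale; the function names are illustrative and are not taken from DeepMind's papers, whose published ratings were computed by their own evaluation procedure.

```python
import math


def elo_gap_from_score(wins: float, losses: float) -> float:
    """Rating gap implied by a match score under the logistic Elo model.

    Assumption: draws are ignored and the usual 400-point scale is used.
    """
    score = wins / (wins + losses)
    if not 0.0 < score < 1.0:
        # A clean sweep (such as 100 to 0) gives no finite estimate.
        raise ValueError("a clean sweep implies an unbounded rating gap")
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(gap: float) -> float:
    """Expected score of the stronger side for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


if __name__ == "__main__":
    gap = elo_gap_from_score(60, 40)
    print(round(gap))  # 70
```

Under this model a 60 to 40 result corresponds to a gap of roughly 70 points, while a 100 to 0 result only says that the gap is too large to estimate from that match alone.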

Training

AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference. The neural network initially knew nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero only perceived the board's stones, rather than having some rare human-programmed edge cases to help recognize unusual Go board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome. In the first three days AlphaGo Zero played 4.9 million games against itself in quick succession. It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to achieve the same level. For comparison, the researchers also trained a version of AlphaGo Zero using human games, AlphaGo Master, and found that it learned more quickly, but actually performed more poorly in the long run. DeepMind submitted its initial findings in a paper to ''Nature'' in April 2017, which was then published in October 2017.
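Learning to anticipate both its own moves and the game's outcome corresponds to training one network with two outputs: a move distribution and a value estimate. The following is a minimal sketch, assuming the combined objective described in the ''Nature'' paper (squared error on the outcome, cross-entropy against the search move probabilities, and an L2 weight penalty). The function name, argument names, and the default coefficient are illustrative assumptions, not DeepMind's code.

```python
import math


def self_play_loss(search_probs, policy, outcome, value, weights, c=1e-4):
    """Per-position training loss for a policy-and-value network.

    search_probs: move probabilities produced by the tree search (target)
    policy:       move probabilities predicted by the network
    outcome:      final game result from the current player's view (+1 or -1)
    value:        outcome predicted by the network, in [-1, 1]
    weights:      flat list of network parameters (for the L2 penalty)
    c:            L2 coefficient (placeholder default, an assumption)
    """
    # How far the predicted outcome is from the actual result.
    value_loss = (outcome - value) ** 2
    # Cross-entropy between the search distribution and the predicted policy;
    # predictions are clamped to avoid log(0).
    policy_loss = -sum(
        t * math.log(max(q, 1e-12))
        for t, q in zip(search_probs, policy)
        if t > 0.0
    )
    # Penalty that discourages large weights.
    l2_penalty = c * sum(w * w for w in weights)
    return value_loss + policy_loss + l2_penalty


if __name__ == "__main__":
    loss = self_play_loss(
        search_probs=[0.7, 0.2, 0.1],
        policy=[0.5, 0.3, 0.2],
        outcome=1.0,
        value=0.4,
        weights=[0.1, -0.2, 0.3],
    )
    print(f"loss = {loss:.4f}")
```

The value term falls as the network predicts game results more accurately, and the policy term falls as its move preferences approach those found by the search, which matches the two abilities the paragraph above describes.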

Hardware cost

The hardware cost for a single AlphaGo Zero system in 2017, including the four TPUs, has been quoted as around $25 million.

Applications

According to Hassabis, AlphaGo's algorithms are likely to be of the most benefit to domains that require an intelligent search through an enormous space of possibilities, such as protein folding (see AlphaFold) or accurately simulating chemical reactions. AlphaGo's techniques are probably less useful in domains that are difficult to simulate, such as learning how to drive a car. DeepMind stated in October 2017 that it had already started active work on attempting to use AlphaGo Zero technology for protein folding, and stated it would soon publish new findings.

Reception

AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. Oren Etzioni of the Allen Institute for Artificial Intelligence called AlphaGo Zero "a very impressive technical result" in "both their ability to do it—and their ability to train the system in 40 days, on four TPUs". ''The Guardian'' called it a "major breakthrough for artificial intelligence", citing Eleni Vasilaki of Sheffield University and Tom Mitchell of Carnegie Mellon University, who called it an impressive feat and an "outstanding engineering accomplishment" respectively. Mark Pesce of the University of Sydney called AlphaGo Zero "a big technological advance" taking us into "undiscovered territory".

Gary Marcus, a psychologist at New York University, has cautioned that for all we know, AlphaGo may contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go" and will need to be tested in other domains before being sure that its base architecture is effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains".

In response to the reports, South Korean Go professional Lee Sedol said, "The previous version of AlphaGo wasn’t perfect, and I believe that’s why AlphaGo Zero was made." On the potential for AlphaGo's development, Lee said he will have to wait and see but also said it will affect young Go players. Mok Jin-seok, who directs the South Korean national Go team, said the Go world has already been imitating the playing styles of previous versions of AlphaGo and creating new ideas from them, and he is hopeful that new ideas will come out from AlphaGo Zero. Mok also added that general trends in the Go world are now being influenced by AlphaGo's playing style. "At first, it was hard to understand and I almost felt like I was playing against an alien. However, having had a great amount of experience, I’ve become used to it," Mok said. "We are now past the point where we debate the gap between the capability of AlphaGo and humans. It’s now between computers." Mok has reportedly already begun analyzing the playing style of AlphaGo Zero along with players from the national team. "Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said. Chinese Go professional Ke Jie commented on the remarkable accomplishments of the new program: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement."

Comparison with predecessors

AlphaZero

On 5 December 2017, the DeepMind team released a preprint on arXiv, introducing AlphaZero, a program using a generalized version of AlphaGo Zero's approach, which achieved within 24 hours a superhuman level of play in chess, shogi, and Go, defeating the world-champion programs Stockfish and Elmo and the 3-day version of AlphaGo Zero in each case.

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include:

* AZ has hard-coded rules for setting search hyperparameters.
* The neural network is now updated continually.
* Chess (unlike Go) can end in a tie; therefore AZ can take into account the possibility of a tie game.

An open source program, Leela Zero, based on the ideas from the AlphaGo papers is available. It uses a GPU instead of the TPUs recent versions of AlphaGo rely on.
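The tie-handling difference listed above can be illustrated with a small calculation. This is a minimal sketch, assuming outcomes are scored +1 for a win, 0 for a tie, and -1 for a loss, so that a three-outcome game is summarised by its expected result instead of a win probability alone. The scoring convention and function names are illustrative assumptions, not DeepMind's code.

```python
# Assumed scoring convention for the three possible results.
WIN, TIE, LOSS = 1.0, 0.0, -1.0


def expected_outcome(p_win: float, p_tie: float, p_loss: float) -> float:
    """Expected game result when ties are possible (chess, shogi)."""
    if abs(p_win + p_tie + p_loss - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return p_win * WIN + p_tie * TIE + p_loss * LOSS


def expected_outcome_binary(p_win: float) -> float:
    """Same quantity when every game ends in a win or a loss."""
    return expected_outcome(p_win, 0.0, 1.0 - p_win)


if __name__ == "__main__":
    # Two positions with the same chance of winning but different tie rates.
    drawish = expected_outcome(0.3, 0.6, 0.1)
    sharp = expected_outcome(0.3, 0.0, 0.7)
    print(f"drawish: {drawish:+.2f}, sharp: {sharp:+.2f}")
```

Both positions have a 30% chance of winning, yet the first is clearly preferable once ties count for something, which a win-or-loss estimate cannot express.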


External links and further reading

* AlphaGo blog
* AlphaGo Zero Games
* AMA on Reddit