OpenAI Five is a computer program by OpenAI that plays the five-on-five video game ''Dota 2''. Its first public appearance occurred in 2017, when it was demonstrated in a live one-on-one game against the professional player Dendi, who lost to it. The following year, the system had advanced to the point of performing as a full team of five, and it began playing against, and showing the capability to defeat, professional teams. By choosing a game as complex as ''Dota 2'' to study machine learning, OpenAI thought it could more accurately capture the unpredictability and continuity seen in the real world, and thus construct more general problem-solving systems. The algorithms and code used by OpenAI Five were eventually borrowed by another neural network in development by the company, one which controlled a physical robotic hand. OpenAI Five has been compared to other cases of artificial intelligence (AI) playing against and defeating humans, such as AlphaStar in the video game ''StarCraft II'', AlphaGo in the board game Go, Deep Blue in chess, and Watson on the television game show ''Jeopardy!''.


History

Development on the algorithms used for the bots began in November 2016. OpenAI decided to use ''Dota 2'', a competitive five-on-five video game, as a base because it was popular on the live streaming platform Twitch, had native support for Linux, and had an application programming interface (API) available. The first public demonstration, which predated the full team of five, took place in August at The International 2017, the game's annual premier championship tournament, where Dendi, a professional Ukrainian player, lost against an OpenAI bot in a live one-on-one matchup. After the match, CTO Greg Brockman explained that the bot had learned by playing against itself for two weeks of real time, and that the learning software was a step toward creating software that can handle complex tasks "like being a surgeon".

OpenAI used a methodology called reinforcement learning, in which the bots learn over time by playing against themselves hundreds of times a day for months and are rewarded for actions such as killing an enemy and destroying towers. By June 2018, the bots had progressed to playing together as a full team of five and were able to defeat teams of amateur and semi-professional players.

At The International 2018, OpenAI Five played two games against professional teams, one against the Brazil-based paiN Gaming and the other against an all-star team of former Chinese players. Although the bots lost both matches, OpenAI still considered it a successful venture, stating that playing against some of the best players in ''Dota 2'' allowed them to analyze and adjust their algorithms for future games. The bots' final public demonstration occurred in April 2019, when they won a best-of-three series against The International 2018 champions OG at a live event in San Francisco. A four-day online event in which the public could play against the bots was held the same month. There, the bots played 42,729 public games, winning all but 4,075 of them.
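The reward scheme described above can be illustrated with a minimal sketch. It assumes that each game tick yields a list of named events and that each event carries a fixed weight; the event names, weights, and discount factor below are hypothetical and are not OpenAI Five's actual values. The sketch sums the per-tick rewards and computes the discounted return that a reinforcement learning algorithm would try to maximize.

```python
# Hypothetical event weights, for illustration only; the article does not
# give OpenAI Five's real reward weights.
EVENT_REWARDS = {
    "enemy_hero_killed": 1.0,
    "tower_destroyed": 0.5,
    "own_hero_died": -1.0,
}


def step_reward(events):
    """Sum the weights of the events observed on one tick (unknown events score 0)."""
    return sum((EVENT_REWARDS.get(event, 0.0) for event in events), 0.0)


def discounted_return(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed from the last tick backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three ticks: a kill, nothing, then a destroyed tower.
    episode = [["enemy_hero_killed"], [], ["tower_destroyed"]]
    rewards = [step_reward(events) for events in episode]
    print(rewards)                          # [1.0, 0.0, 0.5]
    print(discounted_return(rewards, 0.5))  # [1.125, 0.25, 0.5]
```

A smaller discount factor makes early rewards dominate; in a long game such as ''Dota 2'' a value close to 1 would be needed for distant events like destroying a tower to influence early decisions.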


Architecture

Each OpenAI Five bot is a neural network containing a single layer with a 4096-unit LSTM that observes the current game state extracted from the game developer's API. The network acts through several action heads (trained without human data), each of which has its own meaning: for instance, the number of ticks by which to delay an action, which action to select, and the X or Y coordinate of the action in a grid around the unit. The action heads are computed independently. The system observes the world as a list of 20,000 numbers and takes an action by emitting a list of eight enumeration values. Selecting among different actions and targets in this way requires deciding how to encode every action and how to observe the world.

OpenAI Five was trained using "Rapid", a general-purpose reinforcement learning training system. Rapid consists of two layers: the first spins up thousands of machines and helps them "talk" to each other, and the second runs the training software. By 2018, OpenAI Five had played around 180 years' worth of games, with reinforcement learning running on 256 GPUs and 128,000 CPU cores, using Proximal Policy Optimization (PPO), a policy gradient method.
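The factorized action output described above can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenAI Five's implementation: a random vector stands in for the 4096-unit LSTM's output, the four head names and their sizes are hypothetical, and the clipping parameter `epsilon=0.2` is a commonly used PPO default rather than a value given in this article. Each head is a linear layer followed by a softmax and is sampled independently from the shared hidden state, so the log-probability of the full action is the sum of the per-head log-probabilities; that sum is what enters PPO's clipped surrogate objective.

```python
import math
import random

# Hypothetical heads and sizes, for illustration only (not OpenAI Five's real heads).
HEAD_SIZES = {"action_type": 5, "delay_ticks": 4, "offset_x": 9, "offset_y": 9}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_heads(hidden_size, rng):
    """One randomly initialized linear head per action component."""
    return {
        name: (
            [[rng.gauss(0.0, 0.1) for _ in range(hidden_size)] for _ in range(size)],
            [0.0] * size,
        )
        for name, size in HEAD_SIZES.items()
    }


def sample_action(heads, hidden, rng):
    """Sample every head independently; the joint log-prob is the sum over heads."""
    action, log_prob = {}, 0.0
    for name, (weights, bias) in heads.items():
        probs = softmax(linear(weights, bias, hidden))
        index = rng.choices(range(len(probs)), weights=probs)[0]
        action[name] = index
        log_prob += math.log(probs[index])
    return action, log_prob


def ppo_clip_objective(new_log_prob, old_log_prob, advantage, epsilon=0.2):
    """PPO clipped surrogate for one sample, to be maximized:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A) with r = pi_new / pi_old."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [rng.gauss(0.0, 1.0) for _ in range(16)]  # stand-in for the LSTM output
    heads = make_heads(len(hidden), rng)
    action, log_prob = sample_action(heads, hidden, rng)
    print(action, round(log_prob, 3))
```

Keeping the heads independent means the policy never has to enumerate the joint action space; each head only scores its own small set of options. The clipping keeps a single update from moving the probability ratio far from 1, which is the property that distinguishes PPO from a plain policy gradient step.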


Comparisons with other game AI systems

Prior to OpenAI Five, other AI-versus-human experiments and systems had been successful, such as Watson on ''Jeopardy!'', Deep Blue in chess, and AlphaGo in Go. In comparison with other games in which AI systems have played against human players, ''Dota 2'' differs in the following ways:

Long run view: The bots run at 30 frames per second for an average match time of 45 minutes, which results in about 80,000 ticks per game. OpenAI Five observes every fourth frame, generating about 20,000 moves. By comparison, chess games usually end before 40 moves, while Go games end before 150 moves.

Partially observed state of the game: Players and their allies can only see the map directly around them. The rest of it is covered in a fog of war, which hides enemy units and their movements. Playing ''Dota 2'' therefore requires making inferences from this incomplete data, as well as predicting what opponents might be doing at the same time. By comparison, chess and Go are "full-information games", as they do not hide elements from the opposing player.

Continuous action space: Each playable character in a ''Dota 2'' game, known as a hero, can take dozens of actions that target either another unit or a position. The OpenAI Five developers discretize this space into 170,000 possible actions per hero. Not counting the continuous parts of the action space, there are on average about 1,000 valid actions each tick. By comparison, the average number of actions is 35 in chess and 250 in Go.

Continuous observation space: ''Dota 2'' is played on a large map with ten heroes, five on each team, along with dozens of buildings and non-player character (NPC) units. The OpenAI system observes the state of a game through the developer's bot API as 20,000 numbers that constitute all the information a human player is allowed to access. By comparison, a chess board is represented as about 70 enumeration values, whereas a Go board takes about 400.
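The horizon figures above follow from simple arithmetic, sketched below. The `naive_log10_sequences` helper is a hypothetical illustration added here: it raises the quoted average branching factors (35, 250, and about 1,000) to the quoted game lengths (40, 150, and 20,000) and reports the exponent in base 10. This mixes moves with ticks and ignores transpositions and invalid actions, so it gives only a rough sense of scale, not a true game-tree size.

```python
import math


def ticks_per_game(fps, minutes):
    """Frames (ticks) in a match of the given length."""
    return fps * 60 * minutes


def decisions_per_game(fps, minutes, frame_skip):
    """Decisions when the agent observes and acts on every `frame_skip`-th frame."""
    return ticks_per_game(fps, minutes) // frame_skip


def naive_log10_sequences(branching, length):
    """log10(branching ** length): a crude count of action sequences, not a game-tree size."""
    return length * math.log10(branching)


if __name__ == "__main__":
    print(ticks_per_game(30, 45))         # 81000, i.e. "about 80,000"
    print(decisions_per_game(30, 45, 4))  # 20250, i.e. "about 20,000"
    for name, b, d in [("chess", 35, 40), ("Go", 250, 150), ("Dota 2", 1000, 20000)]:
        print(name, round(naive_log10_sequences(b, d)))
```

Working in log10 avoids building integers with tens of thousands of digits while still showing how far apart the three games are.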


Reception

OpenAI Five has received acknowledgement from the AI, tech, and video game communities at large. Microsoft co-founder Bill Gates called it a "big deal", as their victories "required teamwork and collaboration". Chess player Garry Kasparov, who lost against the Deep Blue AI in 1997, stated that despite their losing performance at The International 2018, the bots would eventually "get there, and sooner than expected". In a conversation with ''MIT Technology Review'', AI experts also described the OpenAI Five system as a significant achievement, noting that ''Dota 2'' was an "extremely complicated game", so even beating non-professional players was impressive. ''PC Gamer'' wrote that their wins against professional players were a significant event in machine learning. In contrast, ''Motherboard'' wrote that the victory was "basically cheating" due to the simplified hero pools on both sides, as well as the fact that the bots were given direct access to the API, as opposed to using computer vision to interpret pixels on the screen. ''The Verge'' wrote that the bots were evidence that the company's approach to reinforcement learning and its general philosophy about AI was "yielding milestones".

In 2019, DeepMind unveiled a similar bot for ''StarCraft II
'', AlphaStar. Like OpenAI Five, AlphaStar used reinforcement learning and self-play. ''The Verge'' reported that "the goal with this type of AI research is not just to crush humans in various games just to prove it can be done. Instead, it’s to prove that — with enough time, effort, and resources — sophisticated AI software can best humans at virtually any competitive cognitive challenge, be it a board game or a modern video game." They added that the DeepMind and OpenAI victories were also a testament to the power of certain uses of reinforcement learning. It was OpenAI's hope that the technology could have applications outside of the digital realm. In 2018, they were able to reuse the same reinforcement learning algorithms and training code from OpenAI Five for
Dactyl, a human-like robot hand with a neural network built to manipulate physical objects. In 2019, Dactyl solved the Rubik's Cube.

