Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves".

Definition and motivation

In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more other agents. These agents learn by trial and error, and researchers may choose to have the learning algorithm play the role of two or more of the agents. When successfully executed, this technique has two advantages:

1. It provides a straightforward way to determine the actions of the other agents, resulting in a meaningful challenge.
2. It increases the amount of experience that can be used to improve the policy by a factor of two or more, since the viewpoints of each of the different agents can be used for learning.
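Both advantages can be seen in a minimal sketch. It assumes a hypothetical toy game (a Nim variant in which players alternately take 1 to 3 stones and whoever takes the last stone wins) and uses tabular Q-learning with a negamax target as the learning algorithm. Neither the game nor the algorithm comes from the text above, and every function name and parameter value is illustrative. A single Q-table chooses the moves of both players, so the opponent's actions come for free (advantage 1), and every move of every game, whichever player made it, updates that table (advantage 2).

```python
import random

MAX_TAKE = 3  # hypothetical toy game: each move removes 1 to 3 stones


def legal_actions(pile):
    """Numbers of stones the player to move may take from this pile."""
    return range(1, min(MAX_TAKE, pile) + 1)


def choose_action(q, pile, epsilon, rng):
    """Epsilon-greedy move from the single shared Q-table.

    Both players call this with the same table: the algorithm plays both roles.
    """
    actions = list(legal_actions(pile))
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(pile, a)])


def play_episode(q, pile, epsilon, rng):
    """Play one game of the shared policy against itself.

    Returns the (pile, action) pair of every move by either player, so the
    viewpoints of both players are available for learning.
    """
    transitions = []
    while pile > 0:
        action = choose_action(q, pile, epsilon, rng)
        transitions.append((pile, action))
        pile -= action
    return transitions


def update(q, transitions, alpha):
    """Negamax Q-learning update applied to the moves of both players.

    Values are stored from the point of view of the player to move, so the
    opponent's best value is negated (two-player zero-sum game).
    """
    for pile, action in reversed(transitions):  # last move first
        remaining = pile - action
        if remaining == 0:
            target = 1.0  # the mover took the last stone and wins
        else:
            target = -max(q[(remaining, a)] for a in legal_actions(remaining))
        q[(pile, action)] += alpha * (target - q[(pile, action)])


def train(max_pile=12, episodes=20_000, alpha=0.5, epsilon=0.3, seed=0):
    """Self-play training loop; all defaults are illustrative choices."""
    rng = random.Random(seed)
    q = {(p, a): 0.0 for p in range(1, max_pile + 1) for a in legal_actions(p)}
    for _ in range(episodes):
        start = rng.randint(1, max_pile)  # vary the start so every pile size is seen
        update(q, play_episode(q, start, epsilon, rng), alpha)
    return q


def greedy_policy(q, pile):
    """Best move according to the learned table (no exploration)."""
    return max(legal_actions(pile), key=lambda a: q[(pile, a)])
```

After `q = train()`, `greedy_policy(q, pile)` should recover the known winning strategy for this game. When the pile is not a multiple of 4, take `pile % 4` stones so the opponent is left with a multiple of 4. The agent is never told this rule. It emerges because each side's moves are learned against the other side, which is the same table.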

Usage

Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi and Go. Self-play is also used to train the Cicero AI system to outperform humans at the game of Diplomacy. The technique is also used in training the DeepNash system to play the game of Stratego.

Connections to other disciplines

Self-play has been compared to the epistemological concept of tabula rasa, which describes the way that humans acquire knowledge from a "blank slate".
