In game theory, a stochastic game (or Markov game), introduced by

Lloyd Shapley Lloyd Stowell Shapley (; June 2, 1923 – March 12, 2016) was an American mathematician and Nobel Prize-winning economist. He contributed to the fields of mathematical economics and especially game theory. Shapley is generally considered one of ...

in the early 1950s, is a

repeated game In game theory, a repeated game is an extensive form game that consists of a number of repetitions of some base game (called a stage game). The stage game is usually one of the well-studied 2-person games. Repeated games capture the idea that a ...

with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage the game is in some state. The players select actions and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the

limit inferior In mathematics, the limit inferior and limit superior of a sequence can be thought of as limiting (that is, eventual and extreme) bounds on the sequence. They can be thought of in a similar fashion for a function (see limit of a function). For ...

of the averages of the stage payoffs. Stochastic games generalize Markov decision processes to multiple interacting decision makers, as well as strategic-form games to dynamic situations in which the environment changes in response to the players’ choices.

Two-player games

Stochastic two-player games on

directed graphs Director may refer to: Literature * ''Director'' (magazine), a British magazine * ''The Director'' (novel), a 1971 novel by Henry Denker * ''The Director'' (play), a 2000 play by Nancy Hasty Music * Director (band), an Irish rock band * ''D ...

are widely used for modeling and analysis of discrete systems operating in an unknown (adversarial) environment. Possible configurations of a system and its environment are represented as vertices, and the transitions correspond to actions of the system, its environment, or "nature". A run of the system then corresponds to an infinite path in the graph. Thus, a system and its environment can be seen as two players with antagonistic objectives, where one player (the system) aims at maximizing the probability of "good" runs, while the other player (the environment) aims at the opposite. In many cases, there exists an equilibrium value of this probability, but optimal strategies for both players may not exist. We introduce basic concepts and algorithmic questions studied in this area, and we mention some long-standing open problems. Then, we mention selected recent results.

Theory

The ingredients of a stochastic game are: a finite set of players

I

; a state space

M

(either a finite set or a

measurable space In mathematics, a measurable space or Borel space is a basic object in measure theory. It consists of a set and a σ-algebra, which defines the subsets that will be measured. Definition Consider a set X and a σ-algebra \mathcal A on X. Then ...

(M,)

); for each player

i\in I

, an action set

S^i

(either a finite set or a measurable space

(S^i,^i)

); a transition probability

P

from

M\times S

, where

S=\times_S^i

is the action profiles, to

M

, where

P(A \mid m, s)

is the probability that the next state is in

A

given the current state

m

and the current action profile

s

; and a payoff function

g

from

M\times S

R^I

, where the

i

-th coordinate of

g

g^i

, is the payoff to player

i

as a function of the state

m

and the action profile

s

. The game starts at some initial state

m_1

. At stage

t

, players first observe

m_t

, then simultaneously choose actions

s^i_t\in S^i

, then observe the action profile

s_t=(s^i_t)_i

, and then nature selects

m_

according to the probability

P(\cdot\mid m_t,s_t)

. A play of the stochastic game,

m_1,s_1,\ldots,m_t,s_t,\ldots

, defines a stream of payoffs

g_1,g_2,\ldots

, where

g_t=g(m_t,s_t)

. The discounted game

\Gamma_\lambda

with discount factor

\lambda

(

0<\lambda \leq 1

) is the game where the payoff to player

i

\lambda \sum_^(1-\lambda)^g^i_t

. The

n

-stage game is the game where the payoff to player

i

\bar^i_n:=\frac1n\sum_^ng^i_t

. The value

v_n(m_1)

, respectively

v_(m_1)

, of a two-person zero-sum stochastic game

\Gamma_n

, respectively

\Gamma_

, with finitely many states and actions exists, and Truman Bewley and

Elon Kohlberg Elon commonly refers to Elon Musk. Elon may also refer to: People * Elon (name), a given name and surname Places in the United States * Elon, Iowa, an unincorporated community * Elon, North Carolina, a town * Elon, Virginia, an unincorporated c ...

(1976) proved that

v_n(m_1)

converges to a limit as

n

goes to infinity and that

v_(m_1)

converges to the same limit as

\lambda

goes to

0

. The "undiscounted" game

\Gamma_\infty

is the game where the payoff to player

i

is the "limit" of the averages of the stage payoffs. Some precautions are needed in defining the value of a two-person zero-sum

\Gamma_

and in defining equilibrium payoffs of a non-zero-sum

\Gamma_

. The uniform value

v_

of a two-person zero-sum stochastic game

\Gamma_\infty

exists if for every

\varepsilon>0

there is a positive integer

N

and a strategy pair

\sigma_

of player 1 and

\tau_

of player 2 such that for every

\sigma

and

\tau

and every

n\geq N

the expectation of

\bar^i_n

with respect to the probability on plays defined by

\sigma_

and

\tau

is at least

v_ -\varepsilon

, and the expectation of

\bar^i_n

with respect to the probability on plays defined by

\sigma

and

\tau_

is at most

v_ +\varepsilon

Jean-François Mertens Jean-François Mertens (11 March 1946 – 17 July 2012) was a Belgian game theorist and mathematical economist. Mertens contributed to economic theory in regards to order-book of market games, cooperative games, noncooperative games, repeated ga ...

and

Abraham Neyman Abraham Neyman (born June 14, 1949, Israel) is an Israeli mathematician and game theorist, Professor of Mathematics at the Federmann Center for the Study of Rationality and the Einstein Institute of Mathematics at the Hebrew University of Jeru ...

(1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a uniform value. If there is a finite number of players and the action sets and the set of states are finite, then a stochastic game with a finite number of stages always has a

Nash equilibrium In game theory, the Nash equilibrium, named after the mathematician John Nash, is the most common way to define the solution of a non-cooperative game involving two or more players. In a Nash equilibrium, each player is assumed to know the equ ...

. The same is true for a game with infinitely many stages if the total payoff is the discounted sum. The non-zero-sum stochastic game

\Gamma_\infty

has a uniform equilibrium payoff

v_

if for every

\varepsilon>0

there is a positive integer

N

and a strategy profile

\sigma

such that for every unilateral deviation by a player

i

, i.e., a strategy profile

\tau

with

\sigma^j=\tau^j

for all

j\neq i

, and every

n\geq N

the expectation of

\bar^i_n

with respect to the probability on plays defined by

\sigma

is at least

v^i_ -\varepsilon

, and the expectation of

\bar^i_n

with respect to the probability on plays defined by

\tau

is at most

v^i_ +\varepsilon

Nicolas Vieille Nicolas or Nicolás may refer to: People Given name * Nicolas (given name) Mononym * Nicolas (footballer, born 1999), Brazilian footballer * Nicolas (footballer, born 2000), Brazilian footballer Surname Nicolas * Dafydd Nicolas (c.1705–1774), ...

has shown that all two-person stochastic games with finite state and action spaces have a uniform equilibrium payoff. The non-zero-sum stochastic game

\Gamma_\infty

has a limiting-average equilibrium payoff

v_

if for every

\varepsilon>0

there is a strategy profile

\sigma

such that for every unilateral deviation by a player

i

, the expectation of the limit inferior of the averages of the stage payoffs with respect to the probability on plays defined by

\sigma

is at least

v^i_ -\varepsilon

, and the expectation of the limit superior of the averages of the stage payoffs with respect to the probability on plays defined by

\tau

is at most

v^i_ +\varepsilon

and

(1981) proves that every two-person zero-sum stochastic game with finitely many states and actions has a limiting-average value, and

has shown that all two-person stochastic games with finite state and action spaces have a limiting-average equilibrium payoff. In particular, these results imply that these games have a value and an approximate equilibrium payoff, called the liminf-average (respectively, the limsup-average) equilibrium payoff, when the total payoff is the limit inferior (or the limit superior) of the averages of the stage payoffs. Whether every stochastic game with finitely many players, states, and actions, has a uniform equilibrium payoff, or a limiting-average equilibrium payoff, or even a liminf-average equilibrium payoff, is a challenging open question. A

Markov perfect equilibrium A Markov perfect equilibrium is an equilibrium concept in game theory. It has been used in analyses of industrial organization, macroeconomics, and political economy. It is a refinement of the concept of subgame perfect equilibrium to extensive ...

is a refinement of the concept of

sub-game perfect Nash equilibrium In game theory, a subgame perfect equilibrium (or subgame perfect Nash equilibrium) is a refinement of a Nash equilibrium used in dynamic games. A strategy profile is a subgame perfect equilibrium if it represents a Nash equilibrium of every ...

to stochastic games. Stochastic games have been combined with

Bayesian game In game theory, a Bayesian game is a game that models the outcome of player interactions using aspects of Bayesian probability. Bayesian games are notable because they allowed, for the first time in game theory, for the specification of the solut ...

s to model uncertainty over player strategies. The resulting stochastic Bayesian game model is solved via a recursive combination of the Bayesian Nash equilibrium equation and the Bellman optimality equation.

Applications

Stochastic games have applications in

economics Economics () is the social science that studies the production, distribution, and consumption of goods and services. Economics focuses on the behaviour and interactions of economic agents and how economies work. Microeconomics analy ...

evolutionary biology Evolutionary biology is the subfield of biology that studies the evolutionary processes (natural selection, common descent, speciation) that produced the diversity of life on Earth. It is also defined as the study of the history of life fo ...

and computer networks. They are generalizations of

s which correspond to the special case where there is only one state.

Notes

External links

Lecture on Stochastic Two-Player Games by Antonin Kucera
{{DEFAULTSORT:Stochastic Game Game theory game classes Mathematical and quantitative methods (economics)

Two-player games

Theory

Applications

See also

Notes

Further reading

External links