The decentralized partially observable Markov decision process (Dec-POMDP) is a model for coordination and decision-making among multiple agents. It is a probabilistic model that can consider uncertainty in outcomes, sensors and communication (i.e., costly, delayed, noisy or nonexistent communication). It is a generalization of a Markov decision process (MDP) and a partially observable Markov decision process (POMDP) to consider multiple decentralized agents.

Definition

Formal definition

A Dec-POMDP is a 7-tuple <math>(S,\{A_i\},T,R,\{\Omega_i\},O,\gamma)</math>, where
* <math>S</math> is a set of states,
* <math>A_i</math> is a set of actions for agent ''i'', with <math>A=\times_i A_i</math> the set of joint actions,
* <math>T</math> is a set of conditional transition probabilities between states, <math>T(s,a,s')=P(s'\mid s,a)</math>,
* <math>R: S \times A \to \mathbb{R}</math> is the reward function,
* <math>\Omega_i</math> is a set of observations for agent ''i'', with <math>\Omega=\times_i \Omega_i</math> the set of joint observations,
* <math>O</math> is a set of conditional observation probabilities, <math>O(s',a,o)=P(o\mid s',a)</math>, and
* <math>\gamma \in [0,1]</math> is the discount factor.

At each time step, each agent takes an action <math>a_i \in A_i</math>, the state updates based on the transition function <math>T(s,a,s')</math> (using the current state and the joint action), each agent receives an observation based on the observation function <math>O(s',a,o)</math> (using the next state and the joint action), and a reward is generated for the whole team based on the reward function <math>R(s,a)</math>. The goal is to maximize the expected cumulative reward over a finite or infinite number of steps. These time steps repeat until some given horizon (called finite horizon) or forever (called infinite horizon). The discount factor <math>\gamma</math> keeps the sum finite in the infinite-horizon case, which requires <math>\gamma \in [0,1)</math>.
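As an illustrative sketch only (not drawn from any particular Dec-POMDP library), the tuple and the per-step dynamics described above can be written in Python. The class name `DecPOMDP`, the dictionary-based encodings of ''T'', ''R'' and ''O'', the helper `discounted_return`, and the two-agent toy problem built by `make_toy_model` are all assumptions made for this example. The joint action used in the loop is fixed by hand and is not an optimized policy.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class DecPOMDP:
    states: list        # S
    actions: list       # [A_1, ..., A_n], one action list per agent
    observations: list  # [Omega_1, ..., Omega_n], one observation list per agent
    T: dict             # T[(s, a)] -> {s': P(s' | s, a)}, a is a joint action tuple
    R: dict             # R[(s, a)] -> team reward
    O: dict             # O[(s', a)] -> {o: P(o | s', a)}, o is a joint observation tuple
    gamma: float        # discount factor

    def joint_actions(self):
        # A is the Cartesian product of the per-agent action sets.
        return list(itertools.product(*self.actions))

    def step(self, s, a, rng):
        """Sample one time step; returns (next state, joint observation, reward)."""
        next_dist = self.T[(s, a)]
        s_next = rng.choices(list(next_dist), weights=list(next_dist.values()))[0]
        # The observation depends on the next state and the joint action.
        obs_dist = self.O[(s_next, a)]
        o = rng.choices(list(obs_dist), weights=list(obs_dist.values()))[0]
        # The reward depends on the current state and the joint action.
        return s_next, o, self.R[(s, a)]


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite list of rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def make_toy_model():
    """Hypothetical two-agent, two-state problem used only for illustration."""
    states = ["left", "right"]
    acts = ["stay", "push"]
    obs = ["L", "R"]
    flip = {"left": "right", "right": "left"}
    T, R, O = {}, {}, {}
    for s in states:
        for a in itertools.product(acts, acts):
            both_push = a == ("push", "push")
            # The state flips only when both agents push (deterministic here).
            T[(s, a)] = {flip[s]: 1.0} if both_push else {s: 1.0}
            # Team reward of 1 for a coordinated push out of "left", else 0.
            R[(s, a)] = 1.0 if (both_push and s == "left") else 0.0
            # Here s plays the role of the next state s'. Each agent
            # independently sees the correct side with probability 0.8.
            true_o = "L" if s == "left" else "R"
            O[(s, a)] = {
                (o1, o2): (0.8 if o1 == true_o else 0.2) * (0.8 if o2 == true_o else 0.2)
                for o1 in obs
                for o2 in obs
            }
    return DecPOMDP(states, [acts, acts], [obs, obs], T, R, O, gamma=0.9)


model = make_toy_model()
rng = random.Random(0)
s, rewards = "left", []
for _ in range(4):
    a = ("push", "push")  # fixed joint action chosen by hand
    s, o, r = model.step(s, a, rng)
    rewards.append(r)
print(rewards, discounted_return(rewards, model.gamma))
```

Each agent would in practice choose its action from its own observation history only; the sketch omits policies and shows just the model and one sampled trajectory.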

External links

maspan.org

The Dec-POMDP page