probability theory Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...

, de Finetti's theorem states that exchangeable observations are

conditionally independent In probability theory, conditional independence describes situations wherein an observation is irrelevant or redundant when evaluating the certainty of a hypothesis. Conditional independence is usually formulated in terms of conditional probabi ...

relative to some

latent variable In statistics, latent variables (from Latin: present participle of ) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured. Such '' latent va ...

. An

epistemic probability Uncertainty quantification (UQ) is the science of quantitative characterization and estimation of uncertainties in both computational and real world applications. It tries to determine how likely certain outcomes are if some aspects of the system ...

distribution Distribution may refer to: Mathematics *Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations *Probability distribution, the probability of a particular value or value range of a varia ...

could then be assigned to this variable. It is named in honor of

Bruno de Finetti Bruno de Finetti (13 June 1906 – 20 July 1985) was an Italian probabilist statistician and actuary, noted for the "operational subjective" conception of probability. The classic exposition of his distinctive theory is the 1937 , which discuss ...

, and one of its uses is in providing a pragmatic approach to de Finetti's well-known dictum "Probability does not exist". For the special case of an exchangeable sequence of Bernoulli random variables it states that such a sequence is a "

mixture In chemistry, a mixture is a material made up of two or more different chemical substances which can be separated by physical method. It is an impure substance made up of 2 or more elements or compounds mechanically mixed together in any proporti ...

" of sequences of

independent and identically distributed Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in Pennsylvania, United States * Independentes (English: Independents), a Portuguese artist ...

(i.i.d.) Bernoulli random variables. A sequence of random variables is called exchangeable if the joint distribution of the sequence is unchanged by any permutation of a finite set of indices. In general, while the variables of the exchangeable sequence are not ''themselves'' independent, only exchangeable, there is an ''underlying'' family of i.i.d. random variables. That is, there are underlying, generally unobservable, quantities that are i.i.d. – exchangeable sequences are mixtures of i.i.d. sequences.

Background

A Bayesian statistician often seeks the conditional probability distribution of a random quantity given the data. The concept of

exchangeability In statistics, an exchangeable sequence of random variables (also sometimes interchangeable) is a sequence ''X''1, ''X''2, ''X''3, ... (which may be finitely or infinitely long) whose joint probability distribution does not change wh ...

was introduced by de Finetti. De Finetti's theorem explains a mathematical relationship between independence and exchangeability. An infinite sequence :

X_1, X_2, X_3, \dots

of random variables is said to be exchangeable if for any

natural number In mathematics, the natural numbers are the numbers 0, 1, 2, 3, and so on, possibly excluding 0. Some start counting with 0, defining the natural numbers as the non-negative integers , while others start with 1, defining them as the positive in ...

''n'' and any finite sequence ''i''₁, ..., ''i''_''n'' and any permutation of the sequence π: → , :

(X_,\dots,X_) \text (X_,\dots,X_)

both have the same

joint probability distribution A joint or articulation (or articular surface) is the connection made between bones, ossicles, or other hard structures in the body which link an animal's skeletal system into a functional whole.Saladin, Ken. Anatomy & Physiology. 7th ed. McGraw- ...

. If an identically distributed sequence is

independent Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in Pennsylvania, United States * Independentes (English: Independents), a Portuguese artist ...

, then the sequence is exchangeable; however, the converse is false—there exist exchangeable random variables that are not statistically independent, for example the Pólya urn model.

Statement of the theorem

random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...

''X'' has a

Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with pro ...

if Pr(''X'' = 1) = ''p'' and Pr(''X'' = 0) = 1 − ''p'' for some ''p'' ∈ (0, 1). De Finetti's theorem states that the probability distribution of any infinite exchangeable sequence of Bernoulli random variables is a "

" of the probability distributions of independent and identically distributed sequences of Bernoulli random variables. "Mixture", in this sense, means a weighted average, but this need not mean a finite or countably infinite (i.e., discrete) weighted average: it can be an integral over a measure rather than a sum. More precisely, suppose ''X''₁, ''X''₂, ''X''₃, ... is an infinite exchangeable sequence of Bernoulli-distributed random variables. Then there is some probability measure ''m'' on the interval , 1and some random variable ''Y'' such that * The probability measure of ''Y'' is ''m'', and * The

conditional probability distribution In probability theory and statistics, the conditional probability distribution is a probability distribution that describes the probability of an outcome given the occurrence of a particular event. Given two jointly distributed random variables X ...

of the whole sequence ''X''₁, ''X''₂, ''X''₃, ... given the value of ''Y'' is described by saying that ** ''X''₁, ''X''₂, ''X''₃, ... are

given ''Y'', and ** For any ''i'' ∈ , the conditional probability that ''X''_''i'' = 1, given the value of ''Y'', is ''Y''.

Another way of stating the theorem

Suppose

X_1,X_2,X_3,\ldots

is an infinite exchangeable sequence of Bernoulli random variables. Then

X_1,X_2,X_3,\ldots

are conditionally independent and identically distributed given the exchangeable sigma-algebra (i.e., the sigma-algebra consisting of events that are measurable with respect to

X_1,X_2,\ldots

and invariant under finite permutations of the indices).

A plain-language consequence of the theorem

According to

David Spiegelhalter Sir David John Spiegelhalter (born 16 August 1953) is a British statistician and a Fellow of Churchill College, Cambridge. From 2007 to 2018 he was Winton Professorship of the Public Understanding of Risk, Winton Professor of the Public Under ...

(ref 1) the theorem provides a pragmatic approach to de Finetti's statement that "Probability does not exist". If our view of the probability of a sequence of events is subjective but remains unaffected by the order in which we make our observations, then the sequence can be regarded as ''exchangeable''. De Finetti's theorem then implies that believing the sequence to be exchangeable is mathematically equivalent to acting as if the events are ''independent'' and have an objective underlying probability of occurring, with our uncertainty about what that probability is being expressed by a subjective probability distribution function. According to Spiegelhalter: ″This is remarkable: it shows that, starting from a specific, but purely subjective, expression of convictions, we should act as if events were driven by objective chances."

Example

As a concrete example, we construct a sequence :

X_1, X_2, X_3, \dots

of random variables, by "mixing" two i.i.d. sequences as follows. We assume ''p'' = 2/3 with probability 1/2 and ''p'' = 9/10 with probability 1/2. Given the event ''p'' = 2/3, the conditional distribution of the sequence is that the ''X''_i are independent and identically distributed and ''X''₁ = 1 with probability 2/3 and ''X''₁ = 0 with probability 1 − 2/3. Given the event ''p'' = 9/10, the conditional distribution of the sequence is that the ''X''_i are independent and identically distributed and ''X''₁ = 1 with probability 9/10 and ''X''₁ = 0 with probability 1 − 9/10. This can be interpreted as follows: Make two biased coins, one showing "heads" with 2/3 probability and one showing "heads" with 9/10 probability. Flip a fair coin once to decide which biased coin to use for all flips that are recorded. Here "heads" at flip i means X_i=1. The independence asserted here is ''conditional'' independence, i.e. the Bernoulli random variables in the sequence are conditionally independent given the event that ''p'' = 2/3, and are conditionally independent given the event that ''p'' = 9/10. But they are not unconditionally independent; they are positively

correlated In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistic ...

. In view of the

strong law of large numbers In probability theory, the law of large numbers is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the law o ...

, we can say that :

\lim_ \frac = \begin
2/3 & \text1/2, \\
9/10 & \text1/2.
\end

Rather than concentrating probability 1/2 at each of two points between 0 and 1, the "mixing distribution" can be any

probability distribution In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...

supported on the interval from 0 to 1; which one it is depends on the joint distribution of the infinite sequence of Bernoulli random variables. The definition of exchangeability, and the statement of the theorem, also makes sense for finite length sequences :

X_1,\dots, X_n,

but the theorem is not generally true in that case. It is true if the sequence can be extended to an exchangeable sequence that is infinitely long. The simplest example of an exchangeable sequence of Bernoulli random variables that cannot be so extended is the one in which ''X''₁ = 1 − ''X''₂ and ''X''₁ is either 0 or 1, each with probability 1/2. This sequence is exchangeable, but cannot be extended to an exchangeable sequence of length 3, let alone an infinitely long one.

As a categorical limit

De Finetti's theorem can be expressed as a categorical limit in the

category of Markov kernels In mathematics, the category of Markov kernels, often denoted Stoch, is the category whose objects are measurable spaces and whose morphisms are Markov kernels. It is analogous to the category of sets and functions, but where the arrows can be ...

. Let

(X,\mathcal)

be a

standard Borel space In mathematics, a standard Borel space is the Borel space associated with a Polish space. Except in the case of discrete Polish spaces, the standard Borel space is unique, up to isomorphism of measurable spaces. Formal definition A measurable ...

, and consider the space of sequences on

X

, the countable product

X^\mathbb

(equipped with the

product sigma-algebra Product may refer to: Business * Product (business), an item that can be offered to a market to satisfy the desire or need of a customer. * Product (project management), a deliverable or set of deliverables that contribute to a business solution ...

). Given a finite

permutation In mathematics, a permutation of a set can mean one of two different things: * an arrangement of its members in a sequence or linear order, or * the act or process of changing the linear order of an ordered set. An example of the first mean ...

\sigma

, denote again by

\sigma

the permutation action on

X^\mathbb

, as well as the

Markov kernel In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finit ...

X^\mathbb\to X^\mathbb

induced by it. In terms of

category theory Category theory is a general theory of mathematical structures and their relations. It was introduced by Samuel Eilenberg and Saunders Mac Lane in the middle of the 20th century in their foundational work on algebraic topology. Category theory ...

, we have a

diagram A diagram is a symbolic Depiction, representation of information using Visualization (graphics), visualization techniques. Diagrams have been used since prehistoric times on Cave painting, walls of caves, but became more prevalent during the Age o ...

with a single object,

X^\mathbb

, and a countable number of arrows, one for each permutation. Recall now that a

probability measure In mathematics, a probability measure is a real-valued function defined on a set of events in a σ-algebra that satisfies Measure (mathematics), measure properties such as ''countable additivity''. The difference between a probability measure an ...

p

is equivalently a

from the one-point measurable space. A

p

X^\mathbb

is exchangeable if and only if, as Markov kernels,

\sigma\circ p=p

for every permutation

\sigma

. More generally, given any standard Borel space

Y

, one can call a Markov kernel

k:Y\to X

''exchangeable'' if

\sigma\circ k=k

for every

\sigma

, i.e. if the following diagram commutes, Definetti-cone

giving a

cone In geometry, a cone is a three-dimensional figure that tapers smoothly from a flat base (typically a circle) to a point not contained in the base, called the '' apex'' or '' vertex''. A cone is formed by a set of line segments, half-lines ...

. De Finetti's theorem can be now stated as the fact that the space

PX

of probability measures over

X

( Giry monad) forms a

universal Universal is the adjective for universe. Universal may also refer to: Companies * NBCUniversal, a media and entertainment company that is a subsidiary of Comcast ** Universal Animation Studios, an American Animation studio, and a subsidiary of N ...

(or limit) cone. More in detail, consider the Markov kernel

\mathrm_\mathbb:PX\to X^\mathbb

constructed as follows, using the

Kolmogorov extension theorem In mathematics, the Kolmogorov extension theorem (also known as Kolmogorov existence theorem, the Kolmogorov consistency theorem or the Daniell-Kolmogorov theorem) is a theorem that guarantees that a suitably "consistent" collection of finite-dim ...

: :

\mathrm_\mathbb(A_1\times\dots\times A_n\times X\times\dots, p) = p(A_1)\cdots p(A_n)

for all measurable subsets

A_1,\dots,A_n

X

. Note that we can interpret this kernel as taking a probability measure

p\in PX

as input and returning an iid sequence on

X^\mathbb

distributed according to

p

. Since iid sequences are exchangeable,

\mathrm_\mathbb:PX\to X^\mathbb

is an exchangeable kernel in the sense defined above. The kernel

\mathrm_\mathbb:PX\to X^\mathbb

doesn't just form a cone, but a limit cone: given any exchangeable kernel

k:Y\to X

, there exists a unique kernel

\tilde:Y\to PX

such that

k=\mathrm_\mathbb\circ\tilde

, i.e. making the following diagram commute: Definetti-limit

In particular, for any exchangeable probability measure

p

X^\mathbb

, there exists a unique probability measure

\tilde

PX

(i.e. a probability measure over probability measures) such that

p=\mathrm_\mathbb\circ\tilde

, i.e. such that for all measurable subsets

A_1,\dots,A_n

X

, :

p(A_1\times\dots\times A_n\times X\times\dots) = \int_ \mathrm_\mathbb(A_1\times\dots\times A_n\times X\times\dots, q) \, \tilde(dq) = \int_ q(A_1)\cdots q(A_n) \, \tilde(dq) .

In other words,

p

is a

of iid measures on

X

(the ones formed by

q

in the integral above).

Extensions

Versions of de Finetti's theorem for ''finite'' exchangeable sequences, and for ''Markov exchangeable'' sequences have been proved by Diaconis and Freedman and by Kerns and Szekely. Two notions of partial exchangeability of arrays, known as ''separate'' and ''joint exchangeability'' lead to extensions of de Finetti's theorem for arrays by Aldous and Hoover. The computable de Finetti theorem shows that if an exchangeable sequence of real random variables is given by a computer program, then a program which samples from the mixing measure can be automatically recovered. In the setting of

free probability Free probability is a mathematics, mathematical theory that studies non-commutative random variables. The "freeness" or free independence property is the analogue of the classical notion of statistical independence, independence, and it is connecte ...

, there is a noncommutative extension of de Finetti's theorem which characterizes noncommutative sequences invariant under quantum permutations. Extensions of de Finetti's theorem to quantum states have been found to be useful in

quantum information Quantum information is the information of the state of a quantum system. It is the basic entity of study in quantum information theory, and can be manipulated using quantum information processing techniques. Quantum information refers to both t ...

, in topics like

quantum key distribution Quantum key distribution (QKD) is a secure communication method that implements a cryptographic protocol involving components of quantum mechanics. It enables two parties to produce a shared random secret key known only to them, which then can b ...

and entanglement detection. A multivariate extension of de Finetti’s theorem can be used to derive

Bose–Einstein statistics In quantum statistics, Bose–Einstein statistics (B–E statistics) describes one of two possible ways in which a collection of non-interacting identical particles may occupy a set of available discrete energy states at thermodynamic equilibri ...

from the statistics of classical (i.e. independent) particles.

References

External links

*
What is so cool about De Finetti's representation theorem?
*{{nlab, id=de+Finetti's+theorem, title=De Finetti's theorem Theorems in probability theory Bayesian statistics Integral representations