In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of the typical set used in theories of data compression.
Roughly speaking, the theorem states that although there are many series of results that may be produced by a random process, the one actually produced is most probably from a loosely defined set of outcomes that all have approximately the same chance of being the one actually realized. (This is a consequence of the law of large numbers and ergodic theory.) Although there are individual outcomes which have a higher probability than any outcome in this set, the vast number of outcomes in the set almost guarantees that the outcome will come from the set. One way of intuitively understanding the property is through Cramér's large deviation theorem, which states that the probability of a large deviation from the mean decays exponentially with the number of samples. Such results are studied in large deviations theory; intuitively, it is the large deviations that would violate equipartition, but these are unlikely.
In the field of
pseudorandom number generation, a candidate generator of undetermined quality whose output sequence lies too far outside the typical set by some statistical criteria is rejected as insufficiently random. Thus, although the typical set is loosely defined, practical notions arise concerning ''sufficient'' typicality.
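As an illustration, the following minimal sketch (an assumption for illustration, not a standard test) checks whether a candidate bit sequence lies in the <math>\varepsilon</math>-typical set of an i.i.d. Bernoulli(<math>p</math>) reference source, rejecting sequences whose per-symbol surprisal strays too far from the source entropy:

<syntaxhighlight lang="python">
import math
import random

def is_typical(bits, p, eps):
    """Check whether a bit sequence lies in the eps-typical set of an
    i.i.d. Bernoulli(p) source: |-(1/n) log2 Pr(bits) - H(p)| <= eps."""
    n = len(bits)
    k = sum(bits)  # number of ones
    # per-symbol negative log-probability of the observed sequence
    surprisal = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return abs(surprisal - entropy) <= eps

random.seed(0)
# A faithful Bernoulli(0.3) sample is almost always typical for large n ...
good = [1 if random.random() < 0.3 else 0 for _ in range(10_000)]
print(is_typical(good, p=0.3, eps=0.02))   # expected: True

# ... while a biased candidate generator lands outside the typical set.
bad = [1 if random.random() < 0.45 else 0 for _ in range(10_000)]
print(is_typical(bad, p=0.3, eps=0.02))    # expected: False
</syntaxhighlight>

The tolerance <math>\varepsilon</math> here trades false rejections of a faithful generator against sensitivity to bias, which is one way the "loosely defined" typical set becomes a practical criterion.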
Definition
Given a discrete-time stationary ergodic stochastic process <math>X</math> on the probability space <math>(\Omega, \mathcal{B}, p)</math>, the asymptotic equipartition property is an assertion that, almost surely,

<math>-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X) \quad \text{as } n \to \infty,</math>

where <math>H(X)</math>, or simply <math>H</math>, denotes the entropy rate of <math>X</math>, which must exist for all discrete-time stationary processes, including the ergodic ones. The asymptotic equipartition property is proved for finite-valued (i.e. <math>|\Omega| < \infty</math>) stationary ergodic stochastic processes in the Shannon–McMillan–Breiman theorem using ergodic theory, and for any i.i.d. source directly using the law of large numbers, in both the discrete-valued case (where <math>H</math> is simply the entropy of a symbol) and the continuous-valued case (where <math>H</math> is the differential entropy instead). The definition of the asymptotic equipartition property can also be extended to certain classes of continuous-time stochastic processes for which a typical set exists for long enough observation time. In all of these cases the convergence is proven almost sure.
Discrete-time i.i.d. sources
Given an i.i.d. source <math>X</math> which may take values in the alphabet <math>\mathcal{X}</math>, its time series <math>X_1, \ldots, X_n</math> is i.i.d. with entropy <math>H(X)</math>. The weak law of large numbers gives the asymptotic equipartition property with convergence in probability,

<math>\lim_{n\to\infty} \Pr\left[\left|-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) - H(X)\right| > \varepsilon\right] = 0 \qquad \text{for all } \varepsilon > 0,</math>

since the entropy is equal to the expectation of <math>-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n)</math>. The strong law of large numbers asserts the stronger almost sure convergence,

<math>\Pr\left[\lim_{n\to\infty} -\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) = H(X)\right] = 1,</math>

and convergence in the sense of <math>L^1</math> asserts the even stronger

<math>\operatorname{E}\left[\left|-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) - H(X)\right|\right] \to 0.</math>
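The convergence can be watched numerically. The following sketch (illustrative parameters, not from the theorem itself) samples increasingly long words from a three-symbol i.i.d. source and compares the per-symbol surprisal with <math>H(X)</math>:

<syntaxhighlight lang="python">
import math
import random

# For an i.i.d. source over {'a','b','c'} with distribution p, the
# per-symbol surprisal -(1/n) log2 p(X_1,...,X_n) approaches H(X).
p = {"a": 0.5, "b": 0.3, "c": 0.2}
H = -sum(q * math.log2(q) for q in p.values())   # entropy in bits

random.seed(1)
symbols, weights = zip(*p.items())
for n in (100, 10_000, 1_000_000):
    sample = random.choices(symbols, weights=weights, k=n)
    surprisal = -sum(math.log2(p[s]) for s in sample) / n
    print(f"n={n:>9}: -(1/n) log2 p = {surprisal:.4f}  (H = {H:.4f})")
</syntaxhighlight>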
Discrete-time finite-valued stationary ergodic sources
Consider a finite-valued sample space <math>\Omega</math>, i.e. <math>|\Omega| < \infty</math>, for the discrete-time stationary ergodic process <math>X</math> defined on the probability space <math>(\Omega, \mathcal{B}, p)</math>. The Shannon–McMillan–Breiman theorem, due to Claude Shannon, Brockway McMillan, and Leo Breiman, states that we have convergence in the sense of <math>L^1</math>. Chung Kai-lai generalized this to the case where <math>X</math> may take values in a set of countable infinity, provided that the entropy rate is still finite.
Non-stationary discrete-time source producing independent symbols
The assumptions of stationarity/ergodicity/identical distribution of random variables are not essential for the asymptotic equipartition property to hold. Indeed, as is quite clear intuitively, the asymptotic equipartition property requires only some form of the law of large numbers to hold, which is fairly general. However, the expression needs to be suitably generalized, and the conditions need to be formulated precisely.
Consider a source that produces independent symbols, possibly with different output statistics at each instant, for which the statistics of the process are known completely, that is, the marginal distribution of the process seen at each time instant is known. The joint distribution is then just the product of the marginals. Under the condition (which can be relaxed) that <math>\operatorname{Var}[\log p(X_i)] < M</math> for all ''i'', for some ''M'' > 0, the following holds (AEP):

<math>\lim_{n\to\infty} \Pr\left[\left|-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) - \bar{H}_n(X)\right| < \varepsilon\right] = 1 \qquad \text{for all } \varepsilon > 0,</math>

where

<math>\bar{H}_n(X) = \frac{1}{n} H(X_1, X_2, \ldots, X_n).</math>
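The following sketch illustrates this with an assumed drifting Bernoulli source (an arbitrary choice for illustration): the per-symbol surprisal tracks the average entropy <math>\bar{H}_n(X)</math>, even though no single fixed entropy describes the source:

<syntaxhighlight lang="python">
import math
import random

def h2(q):
    """Binary entropy in bits."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

# Independent but non-identically distributed source: X_i ~ Bernoulli(p_i)
# with p_i drifting over time; Var[log p(X_i)] stays bounded as required.
random.seed(2)
n = 200_000
ps = [0.2 + 0.6 * i / n for i in range(n)]        # p_i drifts from 0.2 to 0.8
xs = [1 if random.random() < q else 0 for q in ps]

surprisal = -sum(math.log2(q if x else 1 - q) for x, q in zip(xs, ps)) / n
h_bar = sum(h2(q) for q in ps) / n                # average entropy H_bar_n
print(f"-(1/n) log2 p = {surprisal:.4f}, H_bar_n = {h_bar:.4f}")  # nearly equal
</syntaxhighlight>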
Applications
The asymptotic equipartition property for non-stationary discrete-time independent processes leads us to (among other results) the source coding theorem for non-stationary sources (with independent output symbols) and the noisy-channel coding theorem for non-stationary memoryless channels.
Measure-theoretic form
Let <math>T : \Omega \to \Omega</math> be a measure-preserving map on the probability space <math>(\Omega, \mathcal{B}, \mu)</math>.
If <math>\mathcal{P}</math> is a finite or countable partition of <math>\Omega</math>, then its entropy is

<math>H(\mathcal{P}) := -\sum_{p \in \mathcal{P}} \mu(p) \ln \mu(p),</math>

with the convention that <math>0 \ln 0 = 0</math>. We only consider partitions with finite entropy: <math>H(\mathcal{P}) < \infty</math>.
If <math>\mathcal{P}</math> is a finite or countable partition of <math>\Omega</math>, then we construct a sequence of partitions by iterating the map:

<math>\mathcal{P}_n := \mathcal{P} \vee T^{-1}\mathcal{P} \vee \cdots \vee T^{-(n-1)}\mathcal{P},</math>

where <math>\vee</math> denotes the least upper bound partition, that is, the least refined partition that refines both <math>\mathcal{P}</math> and <math>\mathcal{Q}</math>:

<math>\mathcal{P} \vee \mathcal{Q} := \{p \cap q : p \in \mathcal{P}, q \in \mathcal{Q}\}.</math>
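On a toy four-point probability space (an assumed example for illustration), both definitions are easy to compute directly:

<syntaxhighlight lang="python">
import math
from itertools import product

# Probability measure mu on Omega = {1, 2, 3, 4}.
mu = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def entropy(partition):
    """Entropy H(P) = -sum mu(p) ln mu(p) in nats, with 0 ln 0 = 0."""
    total = 0.0
    for part in partition:
        m = sum(mu[w] for w in part)
        if m > 0:
            total -= m * math.log(m)
    return total

def join(P, Q):
    """Least common refinement P v Q = {p ∩ q}, dropping empty sets."""
    return [p & q for p, q in product(P, Q) if p & q]

P = [{1, 2}, {3, 4}]
Q = [{1, 3}, {2, 4}]
print(entropy(P), entropy(Q), entropy(join(P, Q)))
# Here join(P, Q) = {{1},{2},{3},{4}}, the finest partition of Omega.
</syntaxhighlight>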
Write <math>\mathcal{P}(x)</math> for the element of <math>\mathcal{P}</math> in which <math>x</math> falls. So, for example, <math>\mathcal{P}_n(x)</math> is the <math>n</math>-letter initial segment of the <math>(T, \mathcal{P})</math>-name of <math>x</math>.
Write <math>I_{\mathcal{P}}(x)</math> for the information (in units of ''nats'') about <math>x</math> that we can recover if we know which element of the partition <math>\mathcal{P}</math> that <math>x</math> falls in:

<math>I_{\mathcal{P}}(x) := -\ln \mu(\mathcal{P}(x)).</math>

Similarly, the conditional information of partition <math>\mathcal{P}</math>, conditional on partition <math>\mathcal{Q}</math>, about <math>x</math>, is

<math>I_{\mathcal{P} \mid \mathcal{Q}}(x) := -\ln \frac{\mu(\mathcal{P}(x) \cap \mathcal{Q}(x))}{\mu(\mathcal{Q}(x))}.</math>
The Kolmogorov–Sinai entropy of the partition is

<math>h_T(\mathcal{P}) := \lim_{n\to\infty} \frac{1}{n} H(\mathcal{P}_n) = \lim_{n\to\infty} \frac{1}{n} \operatorname{E}\left[I_{\mathcal{P}_n}\right].</math>

In other words, by definition, there is a convergence in expectation. The Shannon–McMillan–Breiman theorem states that when <math>T</math> is ergodic, there is convergence in <math>L^1</math>:

<math>\frac{1}{n} I_{\mathcal{P}_n}(x) \to h_T(\mathcal{P}) \quad \text{in } L^1.</math>
If <math>T</math> is not necessarily ergodic, then the underlying probability space would be split up into multiple subsets, each invariant under <math>T</math>. In this case, we still have <math>L^1</math> convergence to some function, but that function is no longer a constant function. When <math>T</math> is ergodic, the <math>\sigma</math>-algebra of <math>T</math>-invariant sets is trivial, and so the limit function simplifies into the constant function <math>x \mapsto h_T(\mathcal{P})</math>.
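To make these objects concrete, here is a small simulation of a standard example (chosen for illustration; the section above does not fix one): the Bernoulli shift. Take <math>\Omega = \{0,1\}^{\mathbb{N}}</math>, <math>T</math> the left shift, <math>\mu</math> the Bernoulli(<math>p</math>) product measure, and <math>\mathcal{P}</math> the partition by the first coordinate. Then <math>\mathcal{P}_n(x)</math> is the cylinder determined by the first <math>n</math> symbols of <math>x</math>, and the theorem predicts <math>\tfrac{1}{n} I_{\mathcal{P}_n}(x) \to h_T(\mathcal{P})</math>, the entropy of one symbol in nats:

<syntaxhighlight lang="python">
import math
import random

# Bernoulli shift: mu(P_n(x)) = p^k (1-p)^(n-k), where k counts the ones
# among the first n symbols, so I_{P_n}(x) = -ln mu(P_n(x)).
p = 0.3
h_T = -(p * math.log(p) + (1 - p) * math.log(1 - p))  # h_T(P) in nats

random.seed(3)
for n in (100, 10_000, 1_000_000):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    k = sum(x)
    info = -(k * math.log(p) + (n - k) * math.log(1 - p))  # I_{P_n}(x)
    print(f"n={n:>9}: (1/n) I = {info / n:.4f}   h_T(P) = {h_T:.4f}")
</syntaxhighlight>

Since the shift with a product measure is ergodic, the limit is the constant <math>h_T(\mathcal{P})</math>, recovering the i.i.d. form of the asymptotic equipartition property.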