
In the mathematical theory of probability, the entropy rate or source information rate is a function assigning an entropy to a stochastic process. For a strongly stationary process, the conditional entropy of the latest random variable eventually tends towards this rate value.


Definition

A process X with a countable index gives rise to the sequence of its joint entropies H_n(X_1, X_2, \dots, X_n). If the limit exists, the entropy rate is defined as

:H(X) := \lim_{n \to \infty} \tfrac{1}{n} H_n.

Note that given any sequence (a_n)_n with a_0 = 0 and letting \Delta a_k := a_k - a_{k-1}, by telescoping one has a_n = \sum_{k=1}^{n} \Delta a_k. The entropy rate thus computes the mean of the first n such entropy changes, with n going to infinity. The behaviour of joint entropies from one index to the next is also made an explicit subject in some characterizations of entropy.
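
To make the definition concrete, here is a minimal Python sketch (not from the original article) that computes H_n by brute-force enumeration for an assumed two-state Markov source and prints H_n / n, which tends to the entropy rate; the transition matrix and its stationary distribution are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Hypothetical two-state source: a Markov chain with an assumed transition
# matrix, started from its stationary distribution (0.75, 0.25).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
mu = np.array([0.75, 0.25])  # solves mu P = mu, so the process is stationary

def joint_entropy(n):
    """H_n = H(X_1, ..., X_n) in bits, by summing over all 2**n sample paths."""
    h = 0.0
    for path in product(range(2), repeat=n):
        p = mu[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a, b]
        h -= p * np.log2(p)
    return h

# H_n / n tends to the entropy rate H(X) as n grows.
for n in range(1, 11):
    print(n, joint_entropy(n) / n)
```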


Discussion

While X may be understood as a sequence of random variables, the entropy rate H(X) represents the average entropy change per random variable, in the long term. It can be thought of as a general property of stochastic sources; this is the subject of the asymptotic equipartition property.


For strongly stationary processes

A stochastic process also gives rise to a sequence of conditional entropies, conditioning on more and more random variables. For strongly stationary stochastic processes, the entropy rate equals the limit of that sequence:

:H(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, X_{n-2}, \dots, X_1)

The quantity given by the limit on the right is also denoted H'(X); the notation is motivated by the fact that this is then again a rate associated with the process, in the above sense.
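
Continuing the sketch above (it reuses P, mu and joint_entropy from the previous block), the chain rule gives H(X_n \mid X_{n-1}, \dots, X_1) = H_n - H_{n-1}, so this limit can be watched directly; for a stationary Markov source the conditional entropies are constant from n = 2 onward.

```python
# For the stationary Markov source above, the conditional entropies
# H(X_n | X_{n-1}, ..., X_1) = H_n - H_{n-1} are constant from n = 2 on
# and equal the entropy rate, so the limit is reached immediately.
for n in range(2, 11):
    print(n, joint_entropy(n) - joint_entropy(n - 1))
```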


For Markov chains

Since a stochastic process defined by a Markov chain that is irreducible and aperiodic has a stationary distribution, the entropy rate is independent of the initial distribution.

For example, consider a Markov chain defined on a countable number of states. Given its right stochastic transition matrix P_{ij} and an entropy

:h_i := -\sum_{j} P_{ij} \log P_{ij}

associated with each state, one finds

:H(X) = \sum_{i} \mu_i h_i,

where \mu_i is the asymptotic distribution of the chain. In particular, it follows that the entropy rate of an i.i.d. stochastic process is the same as the entropy of any individual member of the process.
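
A minimal Python sketch of this closed form, under the same assumed transition matrix as above; the stationary distribution \mu is obtained as the left eigenvector of P for eigenvalue 1, and the printed value agrees with the limits computed by enumeration earlier.

```python
import numpy as np

P = np.array([[0.9, 0.1],   # assumed right stochastic transition matrix
              [0.3, 0.7]])

# Stationary distribution mu: left eigenvector of P with eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
mu /= mu.sum()

# Per-state entropies h_i = -sum_j P_ij log2 P_ij (all entries here are > 0).
h = -np.sum(P * np.log2(P), axis=1)

print("entropy rate:", mu @ h, "bits per step")
```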


For hidden Markov models

The entropy rate of hidden Markov models (HMM) has no known closed-form solution. However, it has known upper and lower bounds. Let the underlying Markov chain X_1, X_2, \dots be stationary, and let Y_1, Y_2, \dots be the observable states; then we have

:H(Y_n \mid X_1, Y_{n-1}, \dots, Y_1) \leq H(Y) \leq H(Y_n \mid Y_{n-1}, \dots, Y_1)

and in the limit n \to \infty, both sides converge to the middle.
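
The bounds can be evaluated exactly for small n. The following Python sketch does this for a hypothetical two-state HMM (the matrices A and B are assumptions chosen for illustration); for brevity it computes only the upper bound, via H(Y_n \mid Y_{n-1}, \dots, Y_1) = H(Y_1, \dots, Y_n) - H(Y_1, \dots, Y_{n-1}).

```python
import numpy as np
from itertools import product

# Hypothetical two-state HMM; transition matrix A and emission matrix B
# are illustrative assumptions, not values from the article.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],   # B[x, y] = P(Y = y | X = x)
              [0.4, 0.6]])

# Stationary distribution of the hidden chain: left eigenvector of A.
evals, evecs = np.linalg.eig(A.T)
mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
mu /= mu.sum()

def seq_prob(y):
    """P(Y_1, ..., Y_n = y) via the forward algorithm, chain started at mu."""
    alpha = mu * B[:, y[0]]
    for obs in y[1:]:
        alpha = (alpha @ A) * B[:, obs]
    return alpha.sum()

def joint_entropy(n):
    """H(Y_1, ..., Y_n) in bits, by exact enumeration of all 2**n sequences."""
    ps = np.array([seq_prob(y) for y in product(range(2), repeat=n)])
    return -np.sum(ps * np.log2(ps))

# The upper bound H(Y_n | Y_{n-1}, ..., Y_1) = H(Y_{1:n}) - H(Y_{1:n-1})
# decreases toward the entropy rate H(Y) as n grows.
for n in range(2, 9):
    print(n, joint_entropy(n) - joint_entropy(n - 1))
```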


Applications

The entropy rate may be used to estimate the complexity of stochastic processes. It is used in diverse applications, ranging from characterizing the complexity of languages and blind source separation to optimizing quantizers and data compression algorithms. For example, a maximum entropy rate criterion may be used for feature selection in machine learning.


See also

* Information source (mathematics)
* Markov information source
* Asymptotic equipartition property
* Maximal entropy random walk - chosen to maximize entropy rate


References

Cover, T. M. and Thomas, J. A. ''Elements of Information Theory''. 2nd edition, John Wiley & Sons, 2006.