Entropy rate
In the mathematical theory of probability, the entropy rate or source information rate of a stochastic process is, informally, the time density of the average information in the process. For stochastic processes with a countable index, the entropy rate H(X) is the limit of the joint entropy of n members of the process X_k divided by n, as n tends to infinity:

:H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n)

when the limit exists. An alternative, related quantity is:

:H'(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, X_{n-2}, \dots, X_1)

For strongly stationary stochastic processes, H(X) = H'(X). The entropy rate can be thought of as a general property of stochastic sources; this is the asymptotic equipartition property. The entropy rate may be used to estimate the complexity of stochastic processes. It is used in diverse applications, ranging from characterizing the complexity of languages and blind source separation to optimizing quantizers and data compression algorithms. For example, a maximum entropy rate criterion may be used for feature selection in machine learning.
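
The defining limit can be checked numerically. The following is a minimal sketch (not part of the article; the Bernoulli parameter p = 0.3 and the sample length are arbitrary choices) that estimates the per-symbol block entropies H(X_1, ..., X_n)/n of a simulated i.i.d. binary source and compares them with the single-symbol entropy, which for an i.i.d. process equals the entropy rate:

 import math
 import random
 from collections import Counter
 
 def block_entropy(seq, n):
     """Empirical joint entropy, in bits, of the length-n blocks of seq."""
     blocks = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
     total = len(blocks)
     return -sum((c / total) * math.log2(c / total)
                 for c in Counter(blocks).values())
 
 random.seed(0)
 p = 0.3  # P(X = 1) for the simulated i.i.d. Bernoulli source
 sample = [1 if random.random() < p else 0 for _ in range(200_000)]
 
 # H(X_1, ..., X_n)/n should stay close to the single-symbol entropy.
 for n in (1, 2, 4, 8):
     print(f"H_{n}/{n} = {block_entropy(sample, n) / n:.4f}")
 print(f"single-symbol entropy = {-(p * math.log2(p) + (1 - p) * math.log2(1 - p)):.4f}")

For a stationary process with memory, such as a Markov chain, the same estimates decrease with n toward the entropy rate rather than staying flat.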


Entropy rates for Markov chains

Since a stochastic process defined by a Markov chain that is irreducible, aperiodic and positive recurrent has a stationary distribution, the entropy rate is independent of the initial distribution. For example, for such a Markov chain Y_k defined on a countable number of states, given the transition matrix P_{ij}, H(Y) is given by:

:H(Y) = -\sum_{ij} \mu_i P_{ij} \log P_{ij}

where \mu_i is the asymptotic distribution of the chain. A simple consequence of this definition is that an i.i.d. stochastic process has an entropy rate that is the same as the entropy of any individual member of the process: independence gives H(X_1, X_2, \dots, X_n) = n H(X_1), so the limit of the joint entropy divided by n is simply H(X_1).
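
The formula is straightforward to evaluate. Below is a minimal sketch (the two-state transition matrix is a made-up example, not from the article) that obtains the stationary distribution \mu as the left eigenvector of the transition matrix for eigenvalue 1 and then applies the formula above, working in bits:

 import numpy as np
 
 # Hypothetical transition matrix: P[i, j] = Pr(next state j | current state i).
 P = np.array([[0.9, 0.1],
               [0.4, 0.6]])
 
 # The stationary distribution mu solves mu P = mu: it is the left
 # eigenvector of P (eigenvector of P.T) for eigenvalue 1, normalized.
 eigvals, eigvecs = np.linalg.eig(P.T)
 mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
 mu /= mu.sum()
 
 # H(Y) = -sum_ij mu_i P_ij log2 P_ij, with the convention 0 log 0 = 0.
 H = -sum(mu[i] * P[i, j] * np.log2(P[i, j])
          for i in range(P.shape[0])
          for j in range(P.shape[1])
          if P[i, j] > 0)
 
 print("stationary distribution:", mu)          # [0.8, 0.2] for this matrix
 print(f"entropy rate: {H:.4f} bits per step")  # ~0.569 for this matrix

Each term of the sum weights the uncertainty of the next transition out of state i by how often the chain occupies state i in the long run, which is why the initial distribution plays no role.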


See also

* Information source (mathematics)
* Markov information source
* Asymptotic equipartition property
* Maximal entropy random walk - chosen to maximize entropy rate


