The '''entropic vector''' or '''entropic function''' is a concept arising in information theory. It represents the possible values of Shannon's information entropy that subsets of one set of random variables may take. Understanding which vectors are entropic is a way to represent all possible inequalities between entropies of various subsets. For example, for any two random variables X and Y, their joint entropy H(X,Y) (the entropy of the random variable representing the pair (X,Y)) is at most the sum of the entropies of X and of Y:

:H(X,Y) ≤ H(X) + H(Y)
Other information-theoretic measures such as conditional entropy, mutual information, or total correlation can be expressed in terms of joint entropy and are thus related by the corresponding inequalities.
Many inequalities satisfied by entropic vectors can be derived as linear combinations of a few basic ones, called ''Shannon-type inequalities''.
However, it has been proven that already for n = 4 variables, no finite set of linear inequalities is sufficient to characterize all entropic vectors.
Definition
Shannon's information entropy of a random variable X is denoted H(X).
For a tuple of random variables X_1, …, X_n, we denote the joint entropy of a subset S = {i_1, …, i_k} ⊆ {1, …, n} as H(X_{i_1}, …, X_{i_k}), or more concisely as H(X_S), where X_S can be understood as the random variable representing the tuple (X_{i_1}, …, X_{i_k}).
For the empty subset S = ∅, X_∅ denotes a deterministic variable with entropy 0.
A vector ''h'' in ℝ^(2^n) indexed by subsets of {1, …, n} is called an ''entropic vector'' of order ''n'' if there exists a tuple of random variables X_1, …, X_n such that h(S) = H(X_S) for each subset S ⊆ {1, …, n}.
The set of all entropic vectors of order ''n'' is denoted by Γ''n''*.
Zhang and Yeung proved that it is not closed (for n ≥ 3), but its closure, denoted cl(Γ''n''*), is a convex cone and hence characterized by the (infinitely many) linear inequalities it satisfies.
Describing the region cl(Γ''n''*) is thus equivalent to characterizing all possible inequalities on joint entropies.
Example
Let ''X'', ''Y'' be two independent random variables with discrete uniform distribution over the set {0, 1}. Then

:H(X) = H(Y) = 1

(since each is uniformly distributed over a two-element set), and

:H(X,Y) = 2

(since the two variables are independent, which means the pair (X,Y) is uniformly distributed over {0,1}², a four-element set).

The corresponding entropic vector is thus:

:h = (h(∅), h({1}), h({2}), h({1,2})) = (0, 1, 1, 2)
On the other hand, the vector (0, 1, 1, 3) is not entropic (that is, (0, 1, 1, 3) ∉ Γ''2''*), because any pair of random variables (independent or not) must satisfy H(X,Y) ≤ H(X) + H(Y), whereas 3 > 1 + 1.
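This example can be checked numerically. The sketch below (plain Python; `entropy` and `marginal` are ad-hoc helper names, not from any library) computes the entropic vector of two independent fair bits and confirms subadditivity, which the vector (0, 1, 1, 3) would violate:

```python
import math

def entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as a dict outcome -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    """Marginal pmf on the coordinates listed in `coords`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

# Joint distribution of two independent fair bits X, Y:
# each of the four pairs (x, y) has probability 1/4.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

h_X = entropy(marginal(joint, (0,)))
h_Y = entropy(marginal(joint, (1,)))
h_XY = entropy(joint)

print((0.0, h_X, h_Y, h_XY))  # the entropic vector (h(∅), h({1}), h({2}), h({1,2}))
# Subadditivity H(X,Y) <= H(X) + H(Y) rules out (0, 1, 1, 3):
assert h_XY <= h_X + h_Y + 1e-12
```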
Characterizing entropic vectors: the region Γ''n''*
Shannon-type inequalities and Γ''n''
For a tuple of random variables X_1, …, X_n, their entropies satisfy:

:H(X_∅) = 0

:H(X_S) ≤ H(X_T), for any subsets S ⊆ T of {1, …, n}

In particular, H(X_S) ≥ 0, for any S ⊆ {1, …, n}.
The Shannon inequality says that an entropic vector is submodular:

:H(X_S) + H(X_T) ≥ H(X_{S∪T}) + H(X_{S∩T}), for any S, T ⊆ {1, …, n}

It is equivalent to the inequality stating that the conditional mutual information is non-negative:

:I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z) ≥ 0

(For one direction, observe that this last form expresses Shannon's inequality for the subsets (X,Z) and (Y,Z) of the tuple (X,Y,Z); for the other direction, substitute X = X_{S∖T}, Y = X_{T∖S}, Z = X_{S∩T}).
Many inequalities can be derived as linear combinations of Shannon inequalities; they are called Shannon-type inequalities or ''basic information inequalities'' of Shannon's information measures.
The set of vectors that satisfy them is called Γ''n''; it contains Γ''n''*.
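The containment of the entropic region in the Shannon cone can be illustrated empirically: every joint distribution yields an entropy vector satisfying submodularity. A minimal sketch (random sampling is of course no proof, and the helper names are ad-hoc):

```python
import itertools
import math
import random

def joint_entropy(p, subset):
    """Entropy (bits) of the marginal of pmf `p` (dict: outcome tuple -> prob)
    on the coordinates in `subset`."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

def powerset(xs):
    return [list(c) for r in range(len(xs) + 1) for c in itertools.combinations(xs, r)]

random.seed(0)
subsets = powerset([0, 1, 2])
for _ in range(100):
    # random joint pmf over three binary variables
    w = [random.random() for _ in range(8)]
    total = sum(w)
    p = {o: wi / total for o, wi in zip(itertools.product((0, 1), repeat=3), w)}
    # submodularity: H(S) + H(T) >= H(S ∪ T) + H(S ∩ T) for all S, T
    for S in subsets:
        for T in subsets:
            lhs = joint_entropy(p, S) + joint_entropy(p, T)
            rhs = (joint_entropy(p, sorted(set(S) | set(T)))
                   + joint_entropy(p, sorted(set(S) & set(T))))
            assert lhs >= rhs - 1e-9
print("submodularity verified on 100 random distributions")
```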
Software has been developed to automate the task of proving Shannon-type inequalities.
Given an inequality, such software is able to determine whether the given inequality is a valid Shannon-type inequality (i.e., whether the half-space it defines contains the cone Γ''n'').
Non-Shannon-type inequalities
The question of whether Shannon-type inequalities are the only ones, that is, whether they completely characterize the region cl(Γ''n''*), was first asked by Te Sun Han in 1981 and more precisely by Nicholas Pippenger in 1986.
It is not hard to show that this is true for two variables, that is, Γ''2''* = Γ''2''.
For three variables, Zhang and Yeung proved that Γ''3''* ≠ Γ''3''; however, it is still asymptotically true, meaning that the closure is equal: cl(Γ''3''*) = Γ''3''.
In 1998, Zhang and Yeung showed that cl(Γ''n''*) ≠ Γ''n'' for all n ≥ 4, by proving that the following inequality on four random variables A, B, C, D (in terms of conditional mutual information) is true for any entropic vector, but is not Shannon-type:

:2I(C;D) ≤ I(A;B) + I(A;C,D) + 3I(C;D|A) + I(C;D|B)
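The Zhang–Yeung inequality can be spot-checked numerically on random four-variable distributions (such sampling illustrates validity but proves nothing; the helper names are ad-hoc):

```python
import itertools
import math
import random

def H(p, subset):
    """Joint entropy (bits) of the marginal of pmf `p` on coordinates `subset`."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

def I(p, X, Y, Z=()):
    """Conditional mutual information I(X;Y|Z), expressed via joint entropies."""
    u = lambda *ts: tuple(sorted(set().union(*ts)))
    return H(p, u(X, Z)) + H(p, u(Y, Z)) - H(p, u(X, Y, Z)) - H(p, u(Z))

A, B, C, D = (0,), (1,), (2,), (3,)
random.seed(1)
for _ in range(200):
    # random joint pmf over four binary variables
    w = [random.random() for _ in range(16)]
    s = sum(w)
    p = {o: wi / s for o, wi in zip(itertools.product((0, 1), repeat=4), w)}
    lhs = 2 * I(p, C, D)
    rhs = I(p, A, B) + I(p, A, C + D) + 3 * I(p, C, D, A) + I(p, C, D, B)
    assert lhs <= rhs + 1e-9
print("Zhang-Yeung inequality verified on 200 random distributions")
```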
Further inequalities and infinite families of inequalities have been found.
These inequalities provide outer bounds for cl(Γ''n''*) better than the Shannon-type bound Γ''n''.
In 2007, Matúš proved that no finite set of linear inequalities is sufficient (to deduce all valid inequalities as linear combinations), for n ≥ 4 variables. In other words, the region cl(Γ''n''*) is not polyhedral.
Whether they can be characterized in some other way (allowing to effectively decide whether a vector is entropic or not) remains an open problem.
Analogous questions for von Neumann entropy in quantum information theory have been considered.
Inner bounds
Some inner bounds of cl(Γ''n''*) are also known.
One example is that cl(Γ''4''*) contains all vectors in Γ''4'' which additionally satisfy the following inequality (and those obtained by permuting variables), known as Ingleton's inequality for entropy:

:I(X_1;X_2) ≤ I(X_1;X_2|X_3) + I(X_1;X_2|X_4) + I(X_3;X_4)
Entropy and groups
Group-characterizable vectors and quasi-uniform distributions
Consider a group G and subgroups G_1, G_2, …, G_n of G.
Let G_S denote the intersection ∩_{i ∈ S} G_i, for S ⊆ {1, 2, …, n}; this is also a subgroup of G.
It is possible to construct a probability distribution for n random variables X_1, …, X_n such that

:H(X_S) = log₂ (|G| / |G_S|).
(The construction essentially takes an element a of G uniformly at random and lets X_i be the corresponding coset aG_i.) Thus any information-theoretic inequality implies a group-theoretic one. For example, the basic inequality H(X_1,X_2) ≤ H(X_1) + H(X_2) implies that

:|G_1| · |G_2| ≤ |G| · |G_1 ∩ G_2|.
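The coset construction can be carried out concretely. A minimal sketch for G = Z₂ × Z₂ with two order-2 subgroups (helper names are ad-hoc):

```python
import math

# G = Z_2 x Z_2 as pairs under componentwise addition mod 2
G = [(a, b) for a in (0, 1) for b in (0, 1)]
G1 = [(0, 0), (1, 0)]  # subgroup along the first coordinate
G2 = [(0, 0), (0, 1)]  # subgroup along the second coordinate

def coset(g, Hsub):
    """Coset g + H as a frozenset (the group is abelian, written additively)."""
    return frozenset(((g[0] + h[0]) % 2, (g[1] + h[1]) % 2) for h in Hsub)

def entropy(values):
    """Entropy (bits) of the distribution induced by drawing uniformly from `values`."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Draw g uniformly from G; X_i is the coset g + G_i.
X1 = [coset(g, G1) for g in G]
X2 = [coset(g, G2) for g in G]
X12 = list(zip(X1, X2))

h1, h2, h12 = entropy(X1), entropy(X2), entropy(X12)
# Matches H(X_S) = log2(|G| / |G_S|):
assert abs(h1 - math.log2(len(G) / len(G1))) < 1e-9
assert abs(h12 - math.log2(len(G) / 1)) < 1e-9  # G1 ∩ G2 = {(0,0)}
# and the implied group inequality |G1|·|G2| <= |G|·|G1 ∩ G2| holds: 2·2 <= 4·1
assert len(G1) * len(G2) <= len(G) * 1
```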
It turns out the converse is essentially true.
More precisely, a vector is said to be ''group-characterizable'' if it can be obtained from a tuple of subgroups as above.
The set of group-characterizable vectors is denoted Υ''n''.
As said above, Υ''n'' ⊆ Γ''n''*.
On the other hand, Γ''n''* (and thus cl(Γ''n''*)) is contained in the topological closure of the convex closure of Υ''n''.
In other words, a linear inequality holds for all entropic vectors if and only if it holds for all vectors ''h'' of the form h(S) = log₂ (|G| / |G_S|), where S goes over subsets of some tuple of subgroups G_1, …, G_n in a group G.
Group-characterizable vectors that come from an abelian group satisfy Ingleton's inequality.
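This claim can be checked exhaustively for a small abelian group. The sketch below builds group-characterizable vectors from every 4-tuple of subgroups of the Klein four-group and verifies Ingleton's inequality in its mutual-information form (helper names are ad-hoc):

```python
import itertools
import math

# All subgroups of G = Z_2 x Z_2 (the Klein four-group, written additively)
G = frozenset({(0, 0), (1, 0), (0, 1), (1, 1)})
subgroups = [
    frozenset({(0, 0)}),
    frozenset({(0, 0), (1, 0)}),
    frozenset({(0, 0), (0, 1)}),
    frozenset({(0, 0), (1, 1)}),
    G,
]

def h(Gs, S):
    """Group-characterizable entropy h(S) = log2(|G| / |∩_{i in S} G_i|)."""
    inter = G
    for i in S:
        inter = inter & Gs[i]
    return math.log2(len(G) / len(inter))

def ingleton_gap(Gs, a, b, c, d):
    """RHS - LHS of Ingleton: I(Xa;Xb) <= I(Xa;Xb|Xc) + I(Xa;Xb|Xd) + I(Xc;Xd)."""
    I_ab = h(Gs, [a]) + h(Gs, [b]) - h(Gs, [a, b])
    I_ab_c = h(Gs, [a, c]) + h(Gs, [b, c]) - h(Gs, [a, b, c]) - h(Gs, [c])
    I_ab_d = h(Gs, [a, d]) + h(Gs, [b, d]) - h(Gs, [a, b, d]) - h(Gs, [d])
    I_cd = h(Gs, [c]) + h(Gs, [d]) - h(Gs, [c, d])
    return I_ab_c + I_ab_d + I_cd - I_ab

# Every choice of four subgroups, in every role assignment, satisfies Ingleton.
for Gs in itertools.product(subgroups, repeat=4):
    for a, b, c, d in itertools.permutations(range(4)):
        assert ingleton_gap(Gs, a, b, c, d) >= -1e-9
print("Ingleton holds for all subgroup tuples of Z_2 x Z_2")
```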
Kolmogorov complexity
Kolmogorov complexity satisfies essentially the same inequalities as entropy.
Namely, denote the Kolmogorov complexity of a finite string x as K(x) (that is, the length of the shortest program that outputs x).
The joint complexity of two strings x, y, defined as the complexity of an encoding of the pair (x, y), can be denoted K(x,y).
Similarly, the conditional complexity can be denoted K(x|y) (the length of the shortest program that outputs x given y).
Andrey Kolmogorov noticed these notions behave similarly to Shannon entropy, for example:

:K(x,y) ≤ K(x) + K(y) + O(log(|x| + |y|))
In 2000, Hammer et al.
proved that indeed an inequality holds for entropic vectors if and only if the corresponding inequality in terms of Kolmogorov complexity holds up to logarithmic terms for all tuples of strings.
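Kolmogorov complexity is uncomputable, but compressed length gives a crude, purely illustrative stand-in for it. The sketch below shows the flavor of subadditivity; the slack constant 64 is an arbitrary allowance for compressor headers, standing in for the logarithmic term, not a rigorous bound:

```python
import zlib

def c(s: bytes) -> int:
    """Length of the zlib-compressed string: a crude, computable stand-in
    for the (uncomputable) Kolmogorov complexity K(s)."""
    return len(zlib.compress(s, 9))

x = b"abc" * 300  # highly regular strings: both compress well
y = b"xyz" * 300

c_x, c_y, c_xy = c(x), c(y), c(x + y)
print(c_x, c_y, c_xy)
# Subadditivity up to an additive term, mirroring K(x,y) <= K(x) + K(y) + O(log):
assert c_xy <= c_x + c_y + 64
```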
See also
* Inequalities in information theory
References
* Thomas M. Cover, Joy A. Thomas. ''Elements of Information Theory''. New York: Wiley, 1991.
* Raymond Yeung. ''A First Course in Information Theory'', Chapter 12, ''Information Inequalities'', 2002. {{ISBN|0-306-46791-7}}