Shannon's Noisy Channel Theorem

In information theory, the noisy-channel coding theorem (sometimes Shannon's theorem or Shannon's limit) establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data (digital information) nearly error-free up to a computable maximum rate through the channel. This result was presented by Claude Shannon in 1948 and was based in part on earlier work and ideas of Harry Nyquist and Ralph Hartley.

The Shannon limit or Shannon capacity of a communication channel refers to the maximum rate of error-free data that can theoretically be transferred over the channel, for a particular noise level, if the link is subject to random data transmission errors. It was first described by Shannon (1948), and shortly afterwards published in a book by Shannon and Warren Weaver entitled ''The Mathematical Theory of Communication'' (1949). This founded the modern discipline of information theory.


Overview

Stated by Claude Shannon in 1948, the theorem describes the maximum possible efficiency of error-correcting methods versus levels of noise interference and data corruption. Shannon's theorem has wide-ranging applications in both communications and data storage. This theorem is of foundational importance to the modern field of information theory. Shannon only gave an outline of the proof; the first rigorous proof for the discrete case was given by Amiel Feinstein in 1954.

The Shannon theorem states that given a noisy channel with channel capacity ''C'' and information transmitted at a rate ''R'', then if R < C there exist codes that allow the probability of error at the receiver to be made arbitrarily small. This means that, theoretically, it is possible to transmit information nearly without error at any rate below a limiting rate, ''C''.

The converse is also important. If R > C, an arbitrarily small probability of error is not achievable. All codes will have a probability of error greater than a certain positive minimal level, and this level increases as the rate increases. So, information cannot be guaranteed to be transmitted reliably across a channel at rates beyond the channel capacity. The theorem does not address the rare situation in which rate and capacity are equal.

The channel capacity ''C'' can be calculated from the physical properties of a channel; for a band-limited channel with Gaussian noise, it is given by the Shannon–Hartley theorem.

Simple schemes such as "send the message 3 times and use a best 2 out of 3 voting scheme if the copies differ" are inefficient error-correction methods, unable to asymptotically guarantee that a block of data can be communicated free of error. Advanced techniques such as Reed–Solomon codes and, more recently, low-density parity-check (LDPC) codes and turbo codes come much closer to reaching the theoretical Shannon limit, but at a cost of high computational complexity. Using these highly efficient codes, and with the computing power in today's digital signal processors, it is now possible to reach very close to the Shannon limit. In fact, it was shown that LDPC codes can reach within 0.0045 dB of the Shannon limit (for binary additive white Gaussian noise (AWGN) channels, with very long block lengths).
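To make the gap between a naive repetition scheme and the Shannon limit concrete, the following Python sketch computes the capacity of a binary symmetric channel and the residual error of the 3-fold repetition code described above; the crossover probability used is an arbitrary example value, not taken from the article.

    # Minimal sketch (assumed parameters): capacity of a binary symmetric
    # channel (BSC) versus the naive "send 3 times, majority vote" scheme.
    from math import log2

    def h2(p):
        """Binary entropy function H2(p) in bits."""
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    p = 0.1                       # assumed BSC crossover probability
    capacity = 1 - h2(p)          # Shannon capacity of the BSC, bits per channel use

    # Triple repetition with majority voting fails when 2 or 3 copies are flipped.
    p_rep_error = 3 * p**2 * (1 - p) + p**3
    rate_rep = 1 / 3              # three channel uses per information bit

    print(f"BSC(p={p}): capacity = {capacity:.3f} bits/use")
    print(f"Repetition-3: rate = {rate_rep:.3f}, residual error = {p_rep_error:.4f}")
    # The repetition scheme pays a heavy rate penalty (1/3, well below capacity)
    # and still has a nonzero error floor, whereas the theorem guarantees codes
    # at any rate below capacity with arbitrarily small error probability.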


Mathematical statement

The basic mathematical model for a communication system is the following:

: \xrightarrow[\text{Message}]{W} \boxed{\begin{array}{c}\text{Encoder} \\ f_n\end{array}} \xrightarrow[\text{Encoded sequence}]{X^n} \boxed{\begin{array}{c}\text{Channel} \\ p(y|x)\end{array}} \xrightarrow[\text{Received sequence}]{Y^n} \boxed{\begin{array}{c}\text{Decoder} \\ g_n\end{array}} \xrightarrow[\text{Estimated message}]{\hat{W}}

A message ''W'' is transmitted through a noisy channel by using encoding and decoding functions. An encoder maps ''W'' into a pre-defined sequence of channel symbols of length ''n''. In its most basic model, the channel distorts each of these symbols independently of the others. The output of the channel (the received sequence) is fed into a decoder which maps the sequence into an estimate of the message. In this setting, the probability of error is defined as:

:: P_e = \Pr\left\{ \hat{W} \neq W \right\}.

Theorem (Shannon, 1948):

: 1. For every discrete memoryless channel, the channel capacity, defined in terms of the mutual information I(X;Y) as

:: C = \sup_{p_X} I(X;Y),

: has the following property. For any \epsilon > 0 and R < C, for large enough N, there exists a code of length N and rate \geq R and a decoding algorithm, such that the maximal probability of block error is \leq \epsilon.

: 2. If a probability of bit error p_b is acceptable, rates up to R(p_b) are achievable, where

:: R(p_b) = \frac{C}{1 - H_2(p_b)}

: and H_2(p_b) is the ''binary entropy function''

:: H_2(p_b) = -\left[ p_b \log_2 p_b + (1 - p_b) \log_2 (1 - p_b) \right].

: 3. For any p_b, rates greater than R(p_b) are not achievable.

(MacKay (2003), p. 162; cf. Gallager (1968), ch. 5; Cover and Thomas (1991), p. 198; Shannon (1948) thm. 11)
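As a small worked illustration of part 2 of the theorem, the following sketch evaluates R(p_b) = C / (1 - H_2(p_b)) for a few tolerated bit-error probabilities; the capacity value C is an assumed example, not a quantity from the article.

    # Minimal sketch of part 2 of the theorem: if a bit-error probability p_b
    # is tolerated, rates up to R(p_b) = C / (1 - H2(p_b)) are achievable.
    # The capacity C used here is an assumed example value.
    from math import log2

    def h2(p):
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    C = 0.5                        # assumed channel capacity, bits per channel use
    for p_b in (0.0, 0.01, 0.05, 0.1):
        R = C / (1 - h2(p_b))
        print(f"p_b = {p_b:4.2f}: achievable rates up to R(p_b) = {R:.4f} bits/use")
    # At p_b = 0 this reduces to R = C; tolerating some residual bit errors
    # allows rates above C, exactly as the theorem states.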


Outline of proof

As with several other major results in information theory, the proof of the noisy-channel coding theorem includes an achievability result and a matching converse result. These two components serve to bound the set of possible rates at which one can communicate over a noisy channel, and the matching converse shows that these bounds are tight. The following outlines represent only one of many proof styles available for study in information theory texts.


Achievability for discrete memoryless channels

This particular proof of achievability follows the style of proofs that make use of the asymptotic equipartition property (AEP). Another style can be found in information theory texts using error exponents. Both types of proofs make use of a random coding argument where the codebook used across a channel is randomly constructed; this serves to make the analysis simpler while still proving the existence of a code satisfying a desired low probability of error at any data rate below the channel capacity.

By an AEP-related argument, given a channel, length-n strings of source symbols X_1^n, and length-n strings of channel outputs Y_1^n, we can define a ''jointly typical set'' by the following:

: A_\varepsilon^{(n)} = \left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \left| -\tfrac{1}{n}\log p(x^n) - H(X) \right| < \varepsilon,\ \left| -\tfrac{1}{n}\log p(y^n) - H(Y) \right| < \varepsilon,\ \left| -\tfrac{1}{n}\log p(x^n, y^n) - H(X,Y) \right| < \varepsilon \right\}.

We say that two sequences X_1^n and Y_1^n are ''jointly typical'' if they lie in the jointly typical set defined above.

Steps:

# In the style of the random coding argument, we randomly generate 2^{nR} codewords of length n from a probability distribution Q.
# This code is revealed to the sender and receiver. It is also assumed that one knows the transition matrix p(y|x) for the channel being used.
# A message W is chosen according to the uniform distribution on the set of codewords. That is, \Pr(W = w) = 2^{-nR}, w = 1, 2, \dots, 2^{nR}.
# The message W is sent across the channel.
# The receiver receives a sequence according to P(y^n | x^n(w)) = \prod_{i=1}^n p(y_i | x_i(w)).
# Sending these codewords across the channel, we receive Y_1^n, and decode to some source sequence if there exists exactly one codeword that is jointly typical with Y. If there are no jointly typical codewords, or if there is more than one, an error is declared. An error also occurs if a decoded codeword does not match the original codeword. This is called ''typical set decoding''.

The probability of error of this scheme is divided into two parts:

# First, an error can occur if no jointly typical X sequence is found for a received Y sequence.
# Second, an error can occur if an incorrect X sequence is jointly typical with a received Y sequence.

* By the randomness of the code construction, we can assume that the average probability of error averaged over all codes does not depend on the index sent. Thus, without loss of generality, we can assume ''W'' = 1.
* From the joint AEP, we know that the probability that no jointly typical X exists goes to 0 as n grows large. We can bound this error probability by \varepsilon.
* Also from the joint AEP, we know the probability that a particular X_1^n(i) and the Y_1^n resulting from ''W'' = 1 are jointly typical is \le 2^{-n(I(X;Y) - 3\varepsilon)}.

Define E_i = \{ (X_1^n(i), Y_1^n) \in A_\varepsilon^{(n)} \}, i = 1, 2, \dots, 2^{nR}, as the event that message i is jointly typical with the sequence received when message 1 is sent.

: \begin{align} P(\text{error}) & = P(\text{error} \mid W = 1) \le P(E_1^c) + \sum_{i=2}^{2^{nR}} P(E_i) \\ & \le P(E_1^c) + (2^{nR} - 1)\, 2^{-n(I(X;Y) - 3\varepsilon)} \\ & \le \varepsilon + 2^{-n(I(X;Y) - R - 3\varepsilon)}. \end{align}

We can observe that as n goes to infinity, if R < I(X;Y) for the channel, the probability of error will go to 0. Finally, given that the average codebook is shown to be "good", we know that there exists a codebook whose performance is better than the average, and so satisfies our need for arbitrarily low error probability communicating across the noisy channel.
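The random-coding argument can be illustrated with a small Monte Carlo sketch. The code below draws a random codebook for a binary symmetric channel and decodes by minimum Hamming distance, used here as a simple stand-in for the typical-set decoder of the proof; the block lengths, rate, and crossover probability are arbitrary example values.

    # Monte Carlo sketch of the random-coding argument over a BSC.
    # Minimum-Hamming-distance decoding is used as a simple stand-in for the
    # typical-set decoder of the proof; all parameters are arbitrary examples.
    import random

    def block_error_rate(n, R, p, trials=200):
        M = max(2, int(2 ** (n * R)))              # roughly 2^{nR} codewords
        codebook = [[random.randint(0, 1) for _ in range(n)] for _ in range(M)]
        errors = 0
        for _ in range(trials):
            w = random.randrange(M)                # message drawn uniformly
            received = [bit ^ (random.random() < p) for bit in codebook[w]]
            # decode to the codeword closest in Hamming distance
            w_hat = min(range(M), key=lambda i: sum(a != b for a, b in
                                                    zip(codebook[i], received)))
            errors += (w_hat != w)
        return errors / trials

    p, R = 0.2, 0.15                               # capacity of BSC(0.2) is ~0.28 > R
    for n in (12, 24, 48):
        print(f"n = {n:2d}: empirical block error rate ~ {block_error_rate(n, R, p):.3f}")
    # Because R is below capacity, the observed block error rate tends to
    # shrink as the block length n grows, as the achievability proof predicts.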


Weak converse for discrete memoryless channels

Suppose a code of 2^{nR} codewords. Let W be drawn uniformly over this set as an index. Let X^n and Y^n be the transmitted codewords and received codewords, respectively.

# nR = H(W) = H(W|Y^n) + I(W; Y^n), using identities involving entropy and mutual information
# \le H(W|Y^n) + I(X^n(W); Y^n), since X is a function of W
# \le 1 + P_e^{(n)} nR + I(X^n(W); Y^n), by the use of Fano's inequality
# \le 1 + P_e^{(n)} nR + nC, by the fact that capacity is the maximized mutual information.

The result of these steps is that

: P_e^{(n)} \ge 1 - \frac{1}{nR} - \frac{C}{R}.

As the block length n goes to infinity, we obtain that P_e^{(n)} is bounded away from 0 if R is greater than C; we can get arbitrarily low probabilities of error only if R is less than C.
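The weak-converse bound is easy to evaluate numerically. The sketch below, with assumed example values of C, R and n, shows that 1 - C/R - 1/(nR) stays bounded away from zero for any rate above capacity.

    # Minimal numerical check of the weak-converse bound
    # P_e >= 1 - C/R - 1/(nR), for assumed example values with R > C.
    C = 0.5     # assumed channel capacity, bits per channel use
    R = 0.6     # attempted rate, above capacity
    for n in (10, 100, 1000, 10000):
        print(f"n = {n:5d}: P_e >= {1 - C / R - 1 / (n * R):.4f}")
    # The bound approaches 1 - C/R (about 0.167 here) as n grows, so the error
    # probability stays bounded away from zero at any rate above capacity.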


Strong converse for discrete memoryless channels

A strong converse theorem, proven by Wolfowitz in 1957 (see Gallager 1968), states that

: P_e \geq 1 - \frac{4A}{n(R - C)^2} - e^{-\frac{n(R - C)}{2}}

for some finite positive constant A. While the weak converse states that the error probability is bounded away from zero as n goes to infinity, the strong converse states that the error goes to 1. Thus, C is a sharp threshold between perfectly reliable and completely unreliable communication.
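The contrast between the two converses can be illustrated numerically. In the sketch below the channel constants are assumed example values, and in particular the constant A (which depends on the channel) is set to a purely hypothetical value.

    # Sketch comparing the weak and strong converse bounds for assumed values.
    # The constant A depends on the channel; A = 1.0 here is purely hypothetical.
    from math import exp

    C, R, A = 0.5, 0.6, 1.0
    for n in (100, 1000, 10000, 100000):
        weak = 1 - C / R - 1 / (n * R)
        strong = 1 - 4 * A / (n * (R - C) ** 2) - exp(-n * (R - C) / 2)
        print(f"n = {n:6d}: weak bound {weak:7.4f}, strong bound {strong:7.4f}")
    # The weak bound saturates near 1 - C/R, while the strong bound tends to 1:
    # above capacity the block error probability must itself approach 1.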


Channel coding theorem for non-stationary memoryless channels

We assume that the channel is memoryless, but its transition probabilities change with time, in a fashion known at the transmitter as well as the receiver. Then the channel capacity is given by

: C = \liminf_{n \to \infty} \, \max_{p(x_1), p(x_2), \ldots} \frac{1}{n} \sum_{i=1}^n I(X_i; Y_i).

The maximum is attained at the capacity-achieving distributions for each respective channel. That is,

: C = \liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^n C_i,

where C_i is the capacity of the ''i''-th channel.
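As a concrete sketch, the snippet below computes the time-averaged capacity (1/n) \sum C_i when each channel use is a binary symmetric channel whose crossover probability follows an assumed periodic pattern; the pattern and horizon are arbitrary example values.

    # Sketch: time-averaged capacity of a non-stationary memoryless channel.
    # Each use i is a BSC whose crossover probability follows an assumed
    # periodic pattern; C_i is the capacity of the i-th use.
    from math import log2

    def h2(p):
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    pattern = [0.05, 0.1, 0.2]          # assumed per-use crossover probabilities
    n = 300
    caps = [1 - h2(pattern[i % len(pattern)]) for i in range(n)]
    print(f"(1/n) * sum of C_i for n = {n}: {sum(caps) / n:.4f} bits/use")
    # Because the pattern repeats, the running average converges and the
    # lim inf in the capacity formula coincides with an ordinary limit.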


Outline of the proof

The proof runs through in almost the same way as that of the channel coding theorem. Achievability follows from random coding with each symbol chosen randomly from the capacity-achieving distribution for that particular channel. Typicality arguments use the definition of typical sets for non-stationary sources defined in the asymptotic equipartition property article. The technicality of lim inf comes into play when \frac{1}{n}\sum_{i=1}^n C_i does not converge.
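The need for the lim inf can be seen with an assumed sequence of per-use capacities whose running average oscillates instead of converging, as in the following sketch.

    # Sketch: per-use capacities C_i arranged in doubling blocks of alternating
    # quality, so the running average (1/n) * sum C_i never converges.
    # The construction is an assumed example, not derived from a specific channel.
    caps, value, block = [], 0.2, 1
    while len(caps) < 8191:                        # 13 blocks: 1, 2, 4, ..., 4096
        caps.extend([value] * block)
        value = 0.8 if value == 0.2 else 0.2
        block *= 2

    for n in (1023, 2047, 4095, 8191):             # ends of successive blocks
        print(f"n = {n:4d}: running average = {sum(caps[:n]) / n:.3f}")
    # The running average keeps swinging between roughly 0.4 and 0.6, so only
    # the lim inf (the pessimistic value, about 0.4) is well defined, and it is
    # this quantity that gives the capacity of such a channel.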


See also

* Asymptotic equipartition property (AEP)
* Fano's inequality
* Rate–distortion theory
* Shannon's source coding theorem
* Shannon–Hartley theorem
* Turbo code


References

* Cover, T. M.; Thomas, J. A., ''Elements of Information Theory'', John Wiley & Sons, 1991.
* Fano, R. M., ''Transmission of Information: A Statistical Theory of Communications'', MIT Press, 1961.
* Feinstein, Amiel, "A new basic theorem of information theory", ''IEEE Transactions on Information Theory'', 4(4): 2–22, 1954.
* Gallager, Robert G., ''Information Theory and Reliable Communication'', New York: John Wiley & Sons, 1968.
* MacKay, David J. C., ''Information Theory, Inference, and Learning Algorithms'', Cambridge University Press, 2003 (free online).
* Shannon, C. E., "A Mathematical Theory of Communication", ''The Bell System Technical Journal'', 27(3): 379–423, 1948.
* Shannon, C. E., ''A Mathematical Theory of Communication'', Urbana, IL: University of Illinois Press, 1948 (reprinted 1998).
* Wolfowitz, J., "The coding of messages subject to chance errors", ''Illinois J. Math.'', 1: 591–606, 1957.


External links


* On Shannon and Shannon's law
* Shannon's Noisy Channel Coding Theorem