HOME

TheInfoList



OR:

A binary symmetric channel (or BSCp) is a common
communications channel A communication channel refers either to a physical transmission medium such as a wire, or to a logical connection over a multiplexed medium such as a radio channel in telecommunications and computer networking. A channel is used for informat ...
model used in
coding theory Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are stud ...
and
information theory Information theory is the scientific study of the quantification (science), quantification, computer data storage, storage, and telecommunication, communication of information. The field was originally established by the works of Harry Nyquist a ...
. In this model, a transmitter wishes to send a
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
(a zero or a one), and the receiver will receive a bit. The bit will be "flipped" with a "crossover
probability Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
" of ''p'', and otherwise is received correctly. This model can be applied to varied communication channels such as telephone lines or
disk drive Disk storage (also sometimes called drive storage) is a general category of storage mechanisms where data is recorded by various electronic, magnetic, optical, or mechanical changes to a surface layer of one or more rotating disks. A disk drive is ...
storage. The
noisy-channel coding theorem In information theory, the noisy-channel coding theorem (sometimes Shannon's theorem or Shannon's limit), establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data (dig ...
applies to BSCp, saying that information can be transmitted at any rate up to the channel capacity with arbitrarily low error. The channel capacity is 1 - \operatorname H_\text(p) bits, where \operatorname H_\text is the
binary entropy function In information theory, the binary entropy function, denoted \operatorname H(p) or \operatorname H_\text(p), is defined as the entropy of a Bernoulli process with probability p of one of two values. It is a special case of \Eta(X), the entropy fun ...
. Codes including Forney's code have been designed to transmit information efficiently across the channel.


Definition

A binary symmetric channel with crossover probability p, denoted by BSCp, is a channel with binary input and binary output and probability of error p. That is, if X is the transmitted
random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
and Y the received variable, then the channel is characterized by the
conditional probabilities In probability theory, conditional probability is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) has already occurred. This particular method relies on event B occur ...
: :\begin \operatorname X = 0 &= 1 - p \\ \operatorname X = 1 &= p \\ \operatorname X = 0 &= p \\ \operatorname X = 1 &= 1 - p \end It is assumed that 0 \le p \le 1/2. If p > 1/2, then the receiver can swap the output (interpret 1 when it sees 0, and vice versa) and obtain an equivalent channel with crossover probability 1 - p \le 1/2.


Capacity

The channel capacity of the binary symmetric channel, in
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
s, is: :\ C_ = 1 - \operatorname H_\text(p), where \operatorname H_\text(p) is the
binary entropy function In information theory, the binary entropy function, denoted \operatorname H(p) or \operatorname H_\text(p), is defined as the entropy of a Bernoulli process with probability p of one of two values. It is a special case of \Eta(X), the entropy fun ...
, defined by: :\operatorname H_\text(x)=x\log_2\frac+(1-x)\log_2\frac :


Noisy-channel coding theorem

Shannon's
noisy-channel coding theorem In information theory, the noisy-channel coding theorem (sometimes Shannon's theorem or Shannon's limit), establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data (dig ...
gives a result about the rate of information that can be transmitted through a communication channel with arbitrarily low error. We study the particular case of \text_p. The noise e that characterizes \text_ is a
random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
consisting of n independent random bits (n is defined below) where each random bit is a 1 with probability p and a 0 with probability 1-p. We indicate this by writing "e \in \text_". What this theorem actually implies is, a message when picked from \^k, encoded with a random encoding function E, and sent across a noisy \text_, there is a very high probability of recovering the original message by decoding, if k or in effect the rate of the channel is bounded by the quantity stated in the theorem. The decoding error probability is exponentially small.


Proof

The theorem can be proved directly with a
probabilistic method The probabilistic method is a nonconstructive method, primarily used in combinatorics and pioneered by Paul Erdős, for proving the existence of a prescribed kind of mathematical object. It works by showing that if one randomly chooses objects fr ...
. Consider an encoding function E: \^k \to \^n that is selected at random. This means that for each message m \in \^k, the value E(m) \in \^n is selected at random (with equal probabilities). For a given encoding function E, the decoding function D:\^n \to \^k is specified as follows: given any received codeword y \in \^n, we find the message m\in\^ such that the
Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chan ...
\Delta(y, E(m)) is as small as possible (with ties broken arbitrarily). (D is called a
maximum likelihood decoding In coding theory, decoding is the process of translating received messages into codewords of a given code. There have been many common methods of mapping messages to codewords. These are often used to recover messages sent over a noisy channel, su ...
function.) The proof continues by showing that at least one such choice (E,D) satisfies the conclusion of theorem, by integration over the probabilities. Suppose p and \epsilon are fixed. First we show that, for a fixed m \in \^ and E chosen randomly, the probability of failure over \text_p noise is exponentially small in ''n''. At this point, the proof works for a fixed message m. Next we extend this result to work for all messages m. We achieve this by eliminating half of the codewords from the code with the argument that the proof for the decoding error probability holds for at least half of the codewords. The latter method is called expurgation. This gives the total process the name ''random coding with expurgation''. : :


Converse of Shannon's capacity theorem

The converse of the capacity theorem essentially states that 1 - H(p) is the best rate one can achieve over a binary symmetric channel. Formally the theorem states: The intuition behind the proof is however showing the number of errors to grow rapidly as the rate grows beyond the channel capacity. The idea is the sender generates messages of dimension k, while the channel \text_p introduces transmission errors. When the capacity of the channel is H(p), the number of errors is typically 2^ for a code of block length n. The maximum number of messages is 2^. The output of the channel on the other hand has 2^ possible values. If there is any confusion between any two messages, it is likely that 2^2^ \ge 2^. Hence we would have k \geq \lceil (1 - H(p + \epsilon)n) \rceil, a case we would like to avoid to keep the decoding error probability exponentially small.


Codes

Very recently, a lot of work has been done and is also being done to design explicit error-correcting codes to achieve the capacities of several standard communication channels. The motivation behind designing such codes is to relate the rate of the code with the fraction of errors which it can correct. The approach behind the design of codes which meet the channel capacities of \text or the
binary erasure channel In coding theory and information theory, a binary erasure channel (BEC) is a communications channel model. A transmitter sends a bit (a zero or a one), and the receiver either receives the bit correctly, or with some probability P_e receives a me ...
\text have been to correct a lesser number of errors with a high probability, and to achieve the highest possible rate. Shannon's theorem gives us the best rate which could be achieved over a \text_, but it does not give us an idea of any explicit codes which achieve that rate. In fact such codes are typically constructed to correct only a small fraction of errors with a high probability, but achieve a very good rate. The first such code was due to George D. Forney in 1966. The code is a concatenated code by concatenating two different kinds of codes.


Forney's code

Forney constructed a
concatenated code In coding theory, concatenated codes form a class of error-correcting codes that are derived by combining an inner code and an outer code. They were conceived in 1966 by Dave Forney as a solution to the problem of finding a code that has both expo ...
C^ = C_\text \circ C_\text to achieve the capacity of the noisy-channel coding theorem for \text_p. In his code, * The outer code C_\text is a code of block length N and rate 1-\frac over the field F_, and k = O(\log N). Additionally, we have a decoding algorithm D_\text for C_\text which can correct up to \gamma fraction of worst case errors and runs in t_\text(N) time. * The inner code C_\text is a code of block length n, dimension k, and a rate of 1 - H(p) - \frac. Additionally, we have a decoding algorithm D_\text for C_\text with a decoding error probability of at most \frac over \text_p and runs in t_\text(N) time. For the outer code C_\text, a Reed-Solomon code would have been the first code to have come in mind. However, we would see that the construction of such a code cannot be done in polynomial time. This is why a binary linear code is used for C_\text. For the inner code C_\text we find a
linear code In coding theory, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although turbo codes can be seen as ...
by exhaustively searching from the
linear code In coding theory, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although turbo codes can be seen as ...
of block length n and dimension k, whose rate meets the capacity of \text_p, by the noisy-channel coding theorem. The rate R(C^) = R(C_\text) \times R(C_\text) = (1-\frac) ( 1 - H(p) - \frac ) \geq 1 - H(p)-\epsilon which almost meets the \text_p capacity. We further note that the encoding and decoding of C^ can be done in polynomial time with respect to N. As a matter of fact, encoding C^ takes time O(N^)+O(Nk^) = O(N^). Further, the decoding algorithm described takes time Nt_\text(k) + t_\text(N) = N^ as long as t_\text(N) = N^; and t_\text(k) = 2^.


Decoding error probability

A natural decoding algorithm for C^ is to: * Assume y_^ = D_\text(y_i), \quad i \in (0, N) * Execute D_\text on y^ = (y_1^ \ldots y_N^) Note that each block of code for C_\text is considered a symbol for C_\text. Now since the probability of error at any index i for D_\text is at most \tfrac and the errors in \text_p are independent, the expected number of errors for D_\text is at most \tfrac by linearity of expectation. Now applying
Chernoff bound In probability theory, the Chernoff bound gives exponentially decreasing bounds on tail distributions of sums of independent random variables. Despite being named after Herman Chernoff, the author of the paper it first appeared in, the result is d ...
, we have bound error probability of more than \gamma N errors occurring to be e^\frac. Since the outer code C_\text can correct at most \gamma N errors, this is the decoding error probability of C^. This when expressed in asymptotic terms, gives us an error probability of 2^. Thus the achieved decoding error probability of C^ is exponentially small as the noisy-channel coding theorem. We have given a general technique to construct C^. For more detailed descriptions on C_\text and C_\text please read the following references. Recently a fe