
In coding theory, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although turbo codes can be seen as a hybrid of these two types. Linear codes allow for more efficient encoding and decoding algorithms than other codes (cf. syndrome decoding).

Linear codes are used in forward error correction and are applied in methods for transmitting symbols (e.g., bits) on a communications channel so that, if errors occur in the communication, some errors can be corrected or detected by the recipient of a message block. The codewords in a linear block code are blocks of symbols that are encoded using more symbols than the original value to be sent. A linear code of length ''n'' transmits blocks containing ''n'' symbols. For example, the [7,4,3] Hamming code is a linear binary code which represents 4-bit messages using 7-bit codewords. Two distinct codewords differ in at least three bits. As a consequence, up to two errors per codeword can be detected while a single error can be corrected. This code contains 2^4 = 16 codewords.
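The defining closure property can be checked directly on a small example. The following Python sketch (written for illustration here, not taken from any library) verifies that the binary single parity-check code of length 3 is closed under addition modulo 2:

 import itertools
 
 # The [3,2,2] binary single parity-check code: all words of even weight.
 C = {c for c in itertools.product([0, 1], repeat=3) if sum(c) % 2 == 0}
 
 # Linearity: the sum (mod 2) of any two codewords is again a codeword.
 for c1 in C:
     for c2 in C:
         s = tuple((a + b) % 2 for a, b in zip(c1, c2))
         assert s in C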


Definition and parameters

A linear code of length ''n'' and dimension ''k'' is a linear subspace ''C'' with dimension ''k'' of the vector space \mathbb{F}_q^n where \mathbb{F}_q is the finite field with ''q'' elements. Such a code is called a ''q''-ary code. If ''q'' = 2 or ''q'' = 3, the code is described as a binary code, or a ternary code respectively. The vectors in ''C'' are called ''codewords''. The size of a code is the number of codewords and equals ''q''^''k''.

The weight of a codeword is the number of its elements that are nonzero, and the distance between two codewords is the Hamming distance between them, that is, the number of elements in which they differ. The distance ''d'' of the linear code is the minimum weight of its nonzero codewords, or equivalently, the minimum distance between distinct codewords. A linear code of length ''n'', dimension ''k'', and distance ''d'' is called an [''n'',''k'',''d''] code (or, more precisely, an [n,k,d]_q code).

We want to give \mathbb{F}_q^n the standard basis because each coordinate represents a "bit" that is transmitted across a "noisy channel" with some small probability of transmission error (a binary symmetric channel). If some other basis is used then this model cannot be used and the Hamming metric does not measure the number of errors in transmission, as we want it to.
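As an illustration of these definitions, here is a minimal Python sketch for the binary case (q = 2); the function names are chosen only for exposition, and the code is given as an explicit list of codewords:

 def hamming_weight(c):
     # Number of nonzero coordinates of the codeword c.
     return sum(1 for x in c if x != 0)
 
 def hamming_distance(c1, c2):
     # Number of coordinates in which c1 and c2 differ.
     return sum(1 for a, b in zip(c1, c2) if a != b)
 
 def minimum_distance(codewords):
     # For a linear code this equals the minimum weight of a nonzero codeword.
     return min(hamming_weight(c) for c in codewords if any(c))
 
 # The [3,1,3] binary repetition code C = {000, 111}.
 assert minimum_distance([(0, 0, 0), (1, 1, 1)]) == 3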


Generator and check matrices

As a linear subspace of \mathbb{F}_q^n, the entire code ''C'' (which may be very large) may be represented as the span of a set of k codewords (known as a basis in linear algebra). These basis codewords are often collated in the rows of a matrix G known as a generating matrix for the code ''C''. When G has the block matrix form \boldsymbol{G} = [I_k \mid P], where I_k denotes the k \times k identity matrix and P is some k \times (n-k) matrix, then we say G is in standard form.

A matrix ''H'' representing a linear function \phi : \mathbb{F}_q^n \to \mathbb{F}_q^{n-k} whose kernel is ''C'' is called a check matrix of ''C'' (or sometimes a parity check matrix). Equivalently, ''H'' is a matrix whose null space is ''C''. If ''C'' is a code with a generating matrix ''G'' in standard form, \boldsymbol{G} = [I_k \mid P], then \boldsymbol{H} = [-P^T \mid I_{n-k}] is a check matrix for C. The code generated by ''H'' is called the dual code of C. It can be verified that G is a k \times n matrix, while H is an (n-k) \times n matrix.

Linearity guarantees that the minimum Hamming distance ''d'' between a codeword ''c''0 and any of the other codewords ''c'' ≠ ''c''0 is independent of ''c''0. This follows from the property that the difference ''c'' − ''c''0 of two codewords in ''C'' is also a codeword (i.e., an element of the subspace ''C''), and the property that ''d''(''c'', ''c''0) = ''d''(''c'' − ''c''0, 0). These properties imply that

:\min_{c \in C,\ c \neq c_0} d(c, c_0) = \min_{c \in C,\ c \neq c_0} d(c - c_0, 0) = \min_{c \in C,\ c \neq 0} d(c, 0) = d.

In other words, in order to find out the minimum distance between the codewords of a linear code, one would only need to look at the non-zero codewords. The non-zero codeword with the smallest weight has then the minimum distance to the zero codeword, and hence determines the minimum distance of the code.

The distance ''d'' of a linear code ''C'' also equals the minimum number of linearly dependent columns of the check matrix ''H''.

''Proof:'' Because \boldsymbol{H} \cdot \boldsymbol{c}^T = \boldsymbol{0}, which is equivalent to \sum_{i=1}^n (c_i \cdot \boldsymbol{H_i}) = \boldsymbol{0}, where \boldsymbol{H_i} is the i^{th} column of \boldsymbol{H}. Remove those terms with c_i = 0; the columns \boldsymbol{H_i} with c_i \neq 0 are linearly dependent. Therefore, d is at least the minimum number of linearly dependent columns. On the other hand, consider a minimum set of linearly dependent columns \{ \boldsymbol{H_j} \mid j \in S \}, where S is the column index set. Then \sum_{i=1}^n (c_i \cdot \boldsymbol{H_i}) = \sum_{j \in S} (c_j \cdot \boldsymbol{H_j}) + \sum_{j \notin S} (c_j \cdot \boldsymbol{H_j}) = \boldsymbol{0}. Now consider the vector \boldsymbol{c'} such that c_j' = 0 if j \notin S. Note \boldsymbol{c'} \in C because \boldsymbol{H} \cdot \boldsymbol{c'}^T = \boldsymbol{0}. Therefore, we have d \le wt(\boldsymbol{c'}) \le |S|, the minimum number of linearly dependent columns in \boldsymbol{H}. The claimed property is therefore proven.
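To make the relationship between G and H concrete, the following Python sketch (NumPy, binary case only; helper names are chosen here for illustration) builds the check matrix from a generator matrix assumed to be in standard form, encodes a message, and tests codeword membership via the syndrome:

 import numpy as np
 
 def check_matrix_from_standard_form(G):
     # G is assumed to be k x n in standard form [I_k | P] over GF(2),
     # so H = [P^T | I_{n-k}] (the sign of -P^T is irrelevant mod 2).
     k, n = G.shape
     P = G[:, k:]
     return np.hstack([P.T % 2, np.eye(n - k, dtype=int)])
 
 def encode(m, G):
     # Codeword of the message m: the linear combination m G over GF(2).
     return m @ G % 2
 
 def syndrome(v, H):
     # Zero syndrome <=> v lies in the null space of H, i.e. v is a codeword.
     return H @ v % 2
 
 # [3,1,3] repetition code: G = [I_1 | P] with P = (1 1).
 G = np.array([[1, 1, 1]])
 H = check_matrix_from_standard_form(G)
 c = encode(np.array([1]), G)
 assert not syndrome(c, H).any()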


Example: Hamming codes

As the first class of linear codes developed for error correction purposes, ''Hamming codes'' have been widely used in digital communication systems. For any positive integer r \ge 2, there exists a [2^r-1,\ 2^r-r-1,\ 3]_2 Hamming code. Since d = 3, this Hamming code can correct a 1-bit error.

Example: The linear block code with the following generator matrix and parity check matrix is a [7,4,3]_2 Hamming code.

: \boldsymbol{G}=\begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 0 & 1 \end{pmatrix}, \qquad \boldsymbol{H}=\begin{pmatrix} 1 & 0 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}
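The following Python sketch (NumPy, written for this example; not a library routine) encodes a 4-bit message with the G above and shows how the syndrome locates a single flipped bit, since for a single error the syndrome equals the column of H at the error position:

 import numpy as np
 
 G = np.array([[1, 0, 0, 0, 1, 1, 0],
               [0, 1, 0, 0, 0, 1, 1],
               [0, 0, 1, 0, 1, 1, 1],
               [0, 0, 0, 1, 1, 0, 1]])
 H = np.array([[1, 0, 1, 1, 1, 0, 0],
               [1, 1, 1, 0, 0, 1, 0],
               [0, 1, 1, 1, 0, 0, 1]])
 
 m = np.array([1, 0, 1, 1])          # 4-bit message
 c = m @ G % 2                       # 7-bit codeword
 assert not (H @ c % 2).any()        # codewords have zero syndrome
 
 r = c.copy()
 r[2] ^= 1                           # a single bit error in transmission
 s = H @ r % 2                       # nonzero syndrome detects the error
 err = next(i for i in range(7) if (H[:, i] == s).all())
 r[err] ^= 1                         # flip the located bit back
 assert (r == c).all()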


Example: Hadamard codes

The Hadamard code is a [2^r,\ r,\ 2^{r-1}]_2 linear code and is capable of correcting many errors. The Hadamard code can be constructed column by column: the i^{th} column consists of the bits of the binary representation of the integer i, as shown in the following example. The Hadamard code has minimum distance 2^{r-1} and therefore can correct 2^{r-2}-1 errors.

Example: The linear block code with the following generator matrix is an [8,3,4]_2 Hadamard code:

: \boldsymbol{G}_{\mathrm{Had}}=\begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}

The Hadamard code is a special case of the Reed–Muller code. If we take the first column (the all-zero column) out from \boldsymbol{G}_{\mathrm{Had}}, we get the [7,3,4]_2 ''simplex code'', which is the ''dual code'' of the Hamming code.
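The column-by-column construction described above can be written out directly. This Python sketch (NumPy, illustrative only) builds the r = 3 generator matrix shown above and checks that every nonzero codeword has weight exactly 2^{r-1} = 4:

 import itertools
 import numpy as np
 
 def hadamard_generator(r):
     # Column i is the r-bit binary representation of i, most significant bit first.
     n = 2 ** r
     return np.array([[(i >> (r - 1 - row)) & 1 for i in range(n)]
                      for row in range(r)])
 
 G_Had = hadamard_generator(3)       # reproduces the 3 x 8 matrix above
 
 for m in itertools.product([0, 1], repeat=3):
     c = np.array(m) @ G_Had % 2
     if any(m):
         assert c.sum() == 4         # every nonzero codeword has weight 2^(r-1)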


Nearest neighbor algorithm

The parameter d is closely related to the error correcting ability of the code. The following construction/algorithm illustrates this (called the nearest neighbor decoding algorithm):

Input: A ''received vector'' v in \mathbb{F}_q^n.

Output: A codeword w in C closest to v, if any.

* Starting with t = 0, repeat the following two steps.
* Enumerate the elements of the ball of (Hamming) radius t around the received word v, denoted B_t(v).
** For each w in B_t(v), check if w is in C. If so, return w as the solution.
* Increment t. Fail only when t > (d - 1)/2, so that enumeration is complete and no solution has been found.

We say that a linear code C is t-error correcting if there is at most one codeword in B_t(v), for each v in \mathbb{F}_q^n.
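A direct, illustrative Python rendering of this procedure for binary codes follows; the code is represented as an explicit set of codeword tuples, so this is a sketch of the algorithm above rather than an efficient decoder:

 from itertools import combinations
 
 def ball(v, t):
     # All binary words at Hamming distance exactly t from v.
     for positions in combinations(range(len(v)), t):
         w = list(v)
         for p in positions:
             w[p] ^= 1
         yield tuple(w)
 
 def nearest_neighbor_decode(v, code, d):
     # Grow the radius t; fail once t would exceed (d - 1) / 2.
     for t in range((d - 1) // 2 + 1):
         for w in ball(tuple(v), t):
             if w in code:
                 return w
     return None                     # decoding failure
 
 # [3,1,3] repetition code: any single-bit error is corrected.
 C = {(0, 0, 0), (1, 1, 1)}
 assert nearest_neighbor_decode((1, 0, 1), C, d=3) == (1, 1, 1)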


Popular notation

Codes in general are often denoted by the letter ''C'', and a code of length ''n'' and of rank ''k'' (i.e., having ''k'' code words in its basis and ''k'' rows in its ''generating matrix'') is generally referred to as an (''n'', ''k'') code. Linear block codes are frequently denoted as [''n'', ''k'', ''d''] codes, where ''d'' refers to the code's minimum Hamming distance between any two code words.

(The [''n'', ''k'', ''d''] notation should not be confused with the (''n'', ''M'', ''d'') notation used to denote a ''non-linear'' code of length ''n'', size ''M'' (i.e., having ''M'' code words), and minimum Hamming distance ''d''.)


Singleton bound

''Lemma'' (Singleton bound): Every linear [n,k,d] code C satisfies k + d \leq n + 1.

A code C whose parameters satisfy k + d = n + 1 is called maximum distance separable or MDS. Such codes, when they exist, are in some sense best possible.

If C_1 and C_2 are two codes of length n and if there is a permutation p in the symmetric group S_n for which (c_1, \ldots, c_n) is in C_1 if and only if (c_{p(1)}, \ldots, c_{p(n)}) is in C_2, then we say C_1 and C_2 are permutation equivalent. In more generality, if there is an n \times n monomial matrix M\colon \mathbb{F}_q^n \to \mathbb{F}_q^n which sends C_1 isomorphically to C_2 then we say C_1 and C_2 are equivalent.

''Lemma'': Any linear code is permutation equivalent to a code which is in standard form.
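As a small numerical illustration (the helper below is written for this article, not a standard function), the bound can be checked for two codes that appear elsewhere on this page:

 def is_mds(n, k, d):
     # Singleton bound: k + d <= n + 1; equality characterizes MDS codes.
     assert k + d <= n + 1
     return k + d == n + 1
 
 assert is_mds(3, 1, 3)              # the [3,1,3] repetition code is MDS
 assert not is_mds(7, 4, 3)          # the [7,4,3] Hamming code is not MDS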


Bonisoli's theorem

A code is defined to be equidistant if and only if there exists some constant ''d'' such that the distance between any two of the code's distinct codewords is equal to ''d''. In 1984 Arrigo Bonisoli determined the structure of linear one-weight codes over finite fields and proved that every equidistant linear code is a sequence of dual Hamming codes.


Examples

Some examples of linear codes include:
* Repetition codes
* Parity codes
* Cyclic codes
* Hamming codes
* Golay code, both the binary and ternary versions
* Polynomial codes, of which BCH codes are an example
* Reed–Solomon codes
* Reed–Muller codes
* Goppa codes
* Low-density parity-check codes
* Expander codes
* Multidimensional parity-check codes
* Toric codes
* Turbo codes


Generalization

Hamming spaces over non-field alphabets have also been considered, especially over finite rings, most notably Galois rings over Z4. This gives rise to modules instead of vector spaces and ring-linear codes (identified with submodules) instead of linear codes. The typical metric used in this case is the Lee distance. There exists a Gray isometry between \mathbb{Z}_2^{2m} (i.e. GF(2^{2m})) with the Hamming distance and \mathbb{Z}_4^m (also denoted as GR(4,m)) with the Lee distance; its main attraction is that it establishes a correspondence between some "good" codes that are not linear over \mathbb{Z}_2^{2m} as images of ring-linear codes from \mathbb{Z}_4^m. More recently, some authors have referred to such codes over rings simply as linear codes as well.
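The Lee metric and the Gray map can be illustrated concretely. The sketch below (plain Python) assumes the standard Gray map 0 ↦ 00, 1 ↦ 01, 2 ↦ 11, 3 ↦ 10 and checks the isometry on one pair of Z4 vectors:

 GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
 
 def lee_distance(x, y, q=4):
     # Lee distance: sum of min(|a - b|, q - |a - b|) over the coordinates.
     return sum(min(abs(a - b), q - abs(a - b)) for a, b in zip(x, y))
 
 def gray_map(x):
     # Maps a Z4 vector of length m to a binary vector of length 2m.
     return tuple(bit for symbol in x for bit in GRAY[symbol])
 
 def hamming_distance(u, v):
     return sum(a != b for a, b in zip(u, v))
 
 x, y = (0, 1, 2, 3), (3, 3, 2, 1)
 assert lee_distance(x, y) == hamming_distance(gray_map(x), gray_map(y))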


See also

* Decoding methods


References


Bibliography

* Chapter 5 contains a more gentle introduction (than this article) to the subject of linear codes.


External links


* ''q''-ary code generator program
* Code Tables: Bounds on the parameters of various types of codes, ''IAKS, Fakultät für Informatik, Universität Karlsruhe (TH)''. Online, up to date table of the optimal binary codes; includes non-binary codes.
* The database of Z4 codes. Online, up to date database of optimal Z4 codes.