Locally Recoverable Code
   HOME

TheInfoList



OR:

Locally recoverable codes are a family of
error correction code In computing, telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels. The centra ...
s that were introduced first by D. S. Papailiopoulos and A. G. Dimakis and have been widely studied in
information theory Information theory is the mathematical study of the quantification (science), quantification, Data storage, storage, and telecommunications, communication of information. The field was established and formalized by Claude Shannon in the 1940s, ...
due to their applications related to distributive and
cloud storage Cloud storage is a model of computer data storage in which data, said to be on "the cloud", is stored remotely in logical pools and is accessible to users over a network, typically the Internet. The physical storage spans multiple servers (so ...
systems. An , k, d, r LRC is an
, k, d The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
linear code In coding theory, a linear code is an error-correcting code for which any linear combination of Code word (communication), codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although t ...
such that there is a
function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-orie ...
f_ that takes as input i and a
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
of r other
coordinates In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to uniquely determine and standardize the Position (geometry), position of the Point (geometry), points or other geometric elements on a manifold such as ...
of a codeword c = (c_, \ldots, c_) \in C different from c_, and outputs c_.


Overview

Erasure-correcting codes, or simply erasure codes, for distributed and
cloud storage Cloud storage is a model of computer data storage in which data, said to be on "the cloud", is stored remotely in logical pools and is accessible to users over a network, typically the Internet. The physical storage spans multiple servers (so ...
systems, are becoming more and more popular as a result of the present spike in demand for
cloud computing Cloud computing is "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand," according to International Organization for ...
and storage services. This has inspired researchers in the fields of
information Information is an Abstraction, abstract concept that refers to something which has the power Communication, to inform. At the most fundamental level, it pertains to the Interpretation (philosophy), interpretation (perhaps Interpretation (log ...
and
coding theory Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and computer data storage, data sto ...
to investigate new facets of codes that are specifically suited for use with storage systems. It is well-known that LRC is a
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communicati ...
that needs only a limited
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
of other symbols to be accessed in order to restore every symbol in a codeword. This idea is very important for distributed and
cloud storage Cloud storage is a model of computer data storage in which data, said to be on "the cloud", is stored remotely in logical pools and is accessible to users over a network, typically the Internet. The physical storage spans multiple servers (so ...
systems since the most common error case is when one storage node fails (erasure). The main objective is to recover as much
data Data ( , ) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted for ...
as possible from the fewest additional storage nodes in order to restore the node. Hence, Locally Recoverable Codes are crucial for such systems. The following
definition A definition is a statement of the meaning of a term (a word, phrase, or other set of symbols). Definitions can be classified into two large categories: intensional definitions (which try to give the sense of a term), and extensional definitio ...
of the LRC follows from the description above: an , k, r/math>-Locally Recoverable Code (LRC) of length n is a
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communicati ...
that produces an n-symbol codeword from k information symbols, and for any symbol of the codeword, there exist at most r other symbols such that the value of the symbol can be recovered from them. The locality
parameter A parameter (), generally, is any characteristic that can help in defining or classifying a particular system (meaning an event, project, object, situation, etc.). That is, a parameter is an element of a system that is useful, or critical, when ...
satisfies 1 \leq r \leq k because the entire codeword can be found by accessing k symbols other than the erased symbol. Furthermore, Locally Recoverable Codes, having the minimum
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects, points, people, or ideas are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two co ...
d, can recover d-1 erasures.


Definition

Let C be a
, k, d The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
linear code In coding theory, a linear code is an error-correcting code for which any linear combination of Code word (communication), codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although t ...
. For i \in \, let us denote by r_ the minimum number of other
coordinates In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to uniquely determine and standardize the Position (geometry), position of the Point (geometry), points or other geometric elements on a manifold such as ...
we have to look at to recover an erasure in
coordinate In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to uniquely determine and standardize the position of the points or other geometric elements on a manifold such as Euclidean space. The coordinates are ...
i. The number r_i is said to be the ''locality of the i-th
coordinate In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to uniquely determine and standardize the position of the points or other geometric elements on a manifold such as Euclidean space. The coordinates are ...
'' of the code. The ''locality'' of the code is defined as
r = \max\.
An , k, d, r ''locally recoverable code'' (LRC) is an
, k, d The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
q
linear code In coding theory, a linear code is an error-correcting code for which any linear combination of Code word (communication), codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although t ...
C \in \mathbb F_q^n with locality r. Let C be an
, k, d The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
q-locally recoverable code. Then an erased component can be recovered linearly, i.e. for every i \in \, the space of
linear equation In mathematics, a linear equation is an equation that may be put in the form a_1x_1+\ldots+a_nx_n+b=0, where x_1,\ldots,x_n are the variables (or unknowns), and b,a_1,\ldots,a_n are the coefficients, which are often real numbers. The coeffici ...
s of the code contains elements of the form x_i = f(x_, \ldots, x_), where i_j \neq i.


Optimal locally recoverable codes

Theorem Let n = (r+1)s and let C be an
, k, d The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
-locally recoverable code having s disjoint locality sets of size r+1. Then
d \leq n - k - \left\lceil\frac\right\rceil + 2.
An , k, d, r-LRC C is said to be optimal if the minimum
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects, points, people, or ideas are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two co ...
of C satisfies
d = n - k - \left\lceil\frac\right\rceil + 2 .


Tamo–Barg codes

Let f \in \mathbb F_ /math> be a
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
and let \ell be a positive
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
. Then f is said to be (r, \ell)-good if :• f has degree r+1, :• there exist distinct
subset In mathematics, a Set (mathematics), set ''A'' is a subset of a set ''B'' if all Element (mathematics), elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they a ...
s A_, \ldots, A_ of \mathbb F_ such that ::– for any i \in \, f(A_) = \ for some t_ \in \mathbb F_ , i.e., f is constant on A_, ::– \# A_ = r + 1, ::– A_ \cap A_ = \varnothing for any i \neq j. We say that is a splitting covering for f.


Tamo–Barg construction

The Tamo–Barg construction utilizes good polynomials. :• Suppose that a (r, \ell)-good polynomial f(x) over \mathbb F_ is given with splitting covering i \in \. :• Let s \leq \ell-1 be a positive
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
. :• Consider the following \mathbb F_q-
vector space In mathematics and physics, a vector space (also called a linear space) is a set (mathematics), set whose elements, often called vector (mathematics and physics), ''vectors'', can be added together and multiplied ("scaled") by numbers called sc ...
of
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
s V = \left\. :• Let T = \bigcup_^\ell A_i. :• The code \ is an ((r+1)\ell,(s+1)r,d,r)-optimal locally coverable code, where \operatorname_T denotes evaluation of g at all points in the
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
T.


Parameters of Tamo–Barg codes

:• Length. The length is the number of evaluation points. Because the sets A_i are disjoint for i \in \, the length of the code is , T, = (r+1)\ell. :• Dimension. The
dimension In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coo ...
of the code is (s+1)r, for s\ell-1, as each g_i has degree at most \deg(f(x))-2, covering a
vector space In mathematics and physics, a vector space (also called a linear space) is a set (mathematics), set whose elements, often called vector (mathematics and physics), ''vectors'', can be added together and multiplied ("scaled") by numbers called sc ...
of
dimension In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coo ...
\deg(f(x))-1=r, and by the construction of V, there are s+1 distinct g_i. :• Distance. The
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects, points, people, or ideas are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two co ...
is given by the fact that V \subseteq \mathbb F_q , where k = r + 1 - 2 + s(r+1), and the obtained code is the Reed-Solomon code of degree at most k, so the minimum
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects, points, people, or ideas are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two co ...
equals (r+1)\ell-((r+1)-2+s(r+1)). :• Locality. After the erasure of the single component, the evaluation at a_i \in A_i, where , A_i, =r+1, is unknown, but the evaluations for all other a \in A_i are known, so at most r evaluations are needed to uniquely determine the erased component, which gives us the locality of r. : To see this, g restricted to A_j can be described by a
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
h of degree at most \deg(f(x))-2 = r + 1 - 2 = r - 1 thanks to the form of the elements in V (i.e., thanks to the fact that f is constant on A_j, and the g_i's have degree at most \deg(f(x))-2). On the other hand , A_j \backslash \, = r, and r evaluations uniquely determine a
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
of degree r-1. Therefore h can be constructed and evaluated at a_j to recover g(a_j).


Example of Tamo–Barg construction

We will use x^5 \in \mathbb F_ /math> to construct 5, 8, 6, 4/math>-LRC. Notice that the degree of this
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
is 5, and it is constant on A_i for i \in \, where A_1 = \, A_2 = 2A_1, A_3 = 3A_1, A_4 = 4A_1, A_5 = 5A_1, A_6 = 6A_1, A_7 = 11A_1, and A_8 = 15A_1: A_1^5 = \, A_2^5 = \, A_3^5 = \, A_4^5 = \, A_5^5 = \, A_6^5 = \, A_7^5 = \, A_8^5 = \. Hence, x^5 is a (4,8)-good polynomial over \mathbb F_ by the definition. Now, we will use this
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
to construct a code of
dimension In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coo ...
k = 8 and length n = 15 over \mathbb F_. The locality of this code is 4, which will allow us to recover a single
server Server may refer to: Computing *Server (computing), a computer program or a device that provides requested information for other programs or devices, called clients. Role * Waiting staff, those who work at a restaurant or a bar attending custome ...
failure by looking at the information contained in at most 4 other servers. Next, let us define the encoding
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
: f_a(x) = \sum_^ f_i(x)x^i, where f_i(x) = \sum_^ a_g(x)^j. So, f_a(x) = a_ + a_x^5 + a_x + a_x^6 + a_x^2 + a_x^7 + a_x^3 + a_x^8. Thus, we can use the obtained encoding
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
if we take our
data Data ( , ) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted for ...
to encode as the
row vector In linear algebra, a column vector with elements is an m \times 1 matrix consisting of a single column of entries, for example, \boldsymbol = \begin x_1 \\ x_2 \\ \vdots \\ x_m \end. Similarly, a row vector is a 1 \times n matrix for some , co ...
a = (a_, a_, a_, a_, a_, a_, a_, a_). Encoding the
vector Vector most often refers to: * Euclidean vector, a quantity with a magnitude and a direction * Disease vector, an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematics a ...
m to a length 15 message
vector Vector most often refers to: * Euclidean vector, a quantity with a magnitude and a direction * Disease vector, an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematics a ...
c by multiplying m by the
generator matrix In coding theory, a generator matrix is a matrix whose rows form a basis for a linear code. The codewords are all of the linear combinations of the rows of this matrix, that is, the linear code is the row space of its generator matrix. Terminolo ...
:
G = \begin 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1 & 32 & 32 & 32 & 32 & 32 & 38 & 38 & 38 & 38 & 38\\ 1 & 10 & 16 & 18 & 37 & 2 & 20 & 32 & 33 & 36 & 3 & 7 & 13 & 29 & 30\\ 1 & 10 & 16 & 18 & 37 & 23 & 25 & 40 & 31 & 4 & 32 & 20 & 2 & 36 & 33\\ 1 & 18 & 10 & 37 & 16 & 4 & 31 & 40 & 23 & 25 & 9 & 8 & 5 & 21 & 39\\ 1 & 18 & 10 & 37 & 16 & 5 & 8 & 9 & 39 & 21 & 14 & 17 & 26 & 19 & 6\\ 1 & 16 & 37 & 10 & 18 & 8 & 5 & 9 & 21 & 39 & 27 & 15 & 24 & 35 & 22\\ 1 & 16 & 37 & 10 & 18 & 10 & 37 & 1 & 16 & 18 & 1 & 37 & 10 & 18 & 16 \end .
For example, the encoding of information
vector Vector most often refers to: * Euclidean vector, a quantity with a magnitude and a direction * Disease vector, an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematics a ...
m = (1, 1, 1, 1, 1, 1, 1, 1) gives the codeword c = mG = (8, 8, 5, 9, 21, 3, 36, 31, 32, 12, 2, 20, 37, 33, 21). Observe that we constructed an optimal LRC; therefore, using the
Singleton bound In coding theory, the Singleton bound, named after the American mathematician Richard Collom Singleton (1928–2007), is a relatively crude upper bound on the size of an arbitrary block code C with block length n, size M and minimum distance d. It ...
, we have that the
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects, points, people, or ideas are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two co ...
of this code is d = n - k - \left\lceil\frac\right\rceil + 2 = 15 - 8 - 2 + 2 = 7. Thus, we can recover any 6 erasures from our codeword by looking at no more than 8 other components.


Locally recoverable codes with availability

A code C has all-symbol locality r and availability t if every code symbol can be recovered from t disjoint repair sets of other symbols, each
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
of
size Size in general is the Magnitude (mathematics), magnitude or dimensions of a thing. More specifically, ''geometrical size'' (or ''spatial size'') can refer to three geometrical measures: length, area, or volume. Length can be generalized ...
at most r symbols. Such codes are called (r,t)_a-LRC. Theorem The minimum
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects, points, people, or ideas are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two co ...
of ,k,dq-LRC having locality r and availability t satisfies the
upper bound In mathematics, particularly in order theory, an upper bound or majorant of a subset of some preordered set is an element of that is every element of . Dually, a lower bound or minorant of is defined to be an element of that is less ...
d \leq n - \sum_^t \left\lfloor\frac\right\rfloor.
If the code is
systematic Systematic may refer to: Science * Short for systematic error * Systematic fault * Systematic bias, errors that are introduced by an inaccuracy inherent to the system Economy * Systematic trading, a way of defining trade goals, risk control ...
and locality and availability apply only to its information symbols, then the code has information locality r and availability t, and is called (r,t)_i-LRC. Theorem The minimum
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects, points, people, or ideas are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two co ...
d of an ,k,dq linear (r,t)_i-LRC satisfies the
upper bound In mathematics, particularly in order theory, an upper bound or majorant of a subset of some preordered set is an element of that is every element of . Dually, a lower bound or minorant of is defined to be an element of that is less ...
d \leq n-k-\left\lceil\frac\right\rceil+2.


References

{{reflist Cryptography Information theory Error detection and correction