In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, Luby transform codes (LT codes) are the first class of practical
fountain code In coding theory, fountain codes (also known as rateless erasure codes) are a class of erasure codes with the property that a potentially limitless sequence of encoding symbols can be generated from a given set of source symbols such that the origi ...
s that are near-optimal
erasure correcting codes. They were invented by
Michael Luby
Michael George Luby is a mathematician and computer scientist, CEO of BitRipple, Senior Research Scientist at the International Computer Science Institute (ICSI), former VP Technology at Qualcomm, co-founder and former Chief Technology Offic ...
in 1998 and published in 2002.
[M.Luby, "LT Codes", The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.](_blank)
/ref> Like some other fountain code In coding theory, fountain codes (also known as rateless erasure codes) are a class of erasure codes with the property that a potentially limitless sequence of encoding symbols can be generated from a given set of source symbols such that the origi ...
s, LT codes depend on sparse bipartite graph
In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V, that is every edge connects a vertex in U to one in V. Vertex sets U and V ar ...
s to trade reception overhead for encoding and decoding speed. The distinguishing characteristic of LT codes is in employing a particularly simple algorithm based on the exclusive or
Exclusive or or exclusive disjunction is a logical operation that is true if and only if its arguments differ (one is true, the other is false).
It is symbolized by the prefix operator J and by the infix operators XOR ( or ), EOR, EXOR, , ...
operation () to encode and decode the message.[The ''exclusive or'' (XOR) operation, symbolized by ⊕, has the very useful property that ''A'' ⊕ ''A'' = 0, where ''A'' is an arbitrary string of ]bit
The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented a ...
s.
LT codes are ''rateless'' because the encoding algorithm can in principle produce an infinite number of message packets (i.e., the percentage of packets that must be received to decode the message can be arbitrarily small). They are ''erasure correcting codes'' because they can be used to transmit digital data reliably on an erasure channel.
The next generation beyond LT codes are Raptor codes
In computer science, Raptor codes (''rapid tornado''; see Tornado codes) are the first known class of fountain codes with linear time encoding and decoding. They were invented by Amin Shokrollahi in 2000/2001 and were first published in 2004 as an ...
(see for example IETF RFC 5053 or IETF RFC 6330), which have linear time encoding and decoding. Raptor codes are fundamentally based on LT codes, i.e., encoding for Raptor codes uses two encoding stages, where the second stage is LT encoding. Similarly, decoding with Raptor codes primarily relies upon LT decoding, but LT decoding is intermixed with more advanced decoding techniques. The RaptorQ code specified in IETF RFC 6330, which is the most advanced fountain code, has vastly superior decoding probabilities and performance compared to using only an LT code.
Why use an LT code?
The traditional scheme for transferring data across an erasure channel depends on continuous two-way communication.
*The sender encodes and sends a packet of information.
*The receiver attempts to decode the received packet. If it can be decoded, the receiver sends an acknowledgment back to the transmitter. Otherwise, the receiver asks the transmitter to send the packet again.
*This two-way process continues until all the packets in the message have been transferred successfully.
Certain networks, such as ones used for cellular wireless broadcasting, do not have a feedback channel. Applications on these networks still require reliability. Fountain code In coding theory, fountain codes (also known as rateless erasure codes) are a class of erasure codes with the property that a potentially limitless sequence of encoding symbols can be generated from a given set of source symbols such that the origi ...
s in general, and LT codes in particular, get around this problem by adopting an essentially one-way communication protocol.
*The sender encodes and sends packet after packet of information.
*The receiver evaluates each packet as it is received. If there is an error, the erroneous packet is discarded. Otherwise the packet is saved as a piece of the message.
*Eventually the receiver has enough valid packets to reconstruct the entire message. When the entire message has been received successfully the receiver signals that transmission is complete.
As mentioned above, the RaptorQ code specified in IETF RFC 6330 outperforms an LT code in practice.
LT encoding
The encoding process begins by dividing the uncoded message into ''n'' blocks of roughly equal length. Encoded packets are then produced with the help of a pseudorandom number generator
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generate ...
.
*The degree ''d'', 1 ≤ ''d'' ≤ ''n'', of the next packet is chosen at random.
*Exactly ''d'' blocks from the message are randomly chosen.
*If ''M''''i'' is the ''i''th block of the message, the data portion of the next packet is computed as
::
:where are the randomly chosen indices for the ''d'' blocks included in this packet.
*A prefix is appended to the encoded packet, defining how many blocks ''n'' are in the message, how many blocks ''d'' have been exclusive-ored into the data portion of this packet, and the list of indices .
*Finally, some form of error-detecting code (perhaps as simple as a cyclic redundancy check) is applied to the packet, and the packet is transmitted.
This process continues until the receiver signals that the message has been received and successfully decoded.
LT decoding
The decoding process uses the "exclusive or
Exclusive or or exclusive disjunction is a logical operation that is true if and only if its arguments differ (one is true, the other is false).
It is symbolized by the prefix operator J and by the infix operators XOR ( or ), EOR, EXOR, , ...
" operation to retrieve the encoded message.
*If the current packet isn't clean, or if it replicates a packet that has already been processed, the current packet is discarded.
*If the current cleanly received packet is of degree ''d'' > 1, it is first processed against all the fully decoded blocks in the message queuing area (as described more fully in the next step), then stored in a buffer area if its reduced degree is greater than 1.
*When a new, clean packet of degree ''d'' = 1 (block ''M''''i'') is received (or the degree of the current packet is reduced to 1 by the preceding step), it is moved to the message queueing area, and then matched against all the packets of degree ''d'' > 1 residing in the buffer. It is exclusive-ored into the data portion of any buffered packet that was encoded using ''M''''i'', the degree of that matching packet is decremented, and the list of indices for that packet is adjusted to reflect the application of ''M''''i''.
*When this process unlocks a block of degree ''d'' = 2 in the buffer, that block is reduced to degree 1 and is in its turn moved to the message queueing area, and then processed against the packets remaining in the buffer.
*When all ''n'' blocks of the message have been moved to the message queueing area, the receiver signals the transmitter that the message has been successfully decoded.
This decoding procedure works because ''A'' ''A'' = 0 for any bit string ''A''. After ''d'' − 1 distinct blocks have been exclusive-ored into a packet of degree ''d'', the original unencoded content of the unmatched block is all that remains. In symbols we have
:
Variations
Several variations of the encoding and decoding processes described above are possible. For instance, instead of prefixing each packet with a list of the actual message block indices , the encoder might simply send a short "key" which served as the seed for the pseudorandom number generator
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generate ...
(PRNG) or index table used to construct the list of indices. Since the receiver equipped with the same RNG or index table can reliably recreate the "random" list of indices from this seed, the decoding process can be completed successfully. Alternatively, by combining a simple LT code of low average degree with a robust error-correcting code, a raptor code can be constructed that will outperform an optimized LT code in practice.[Fountain codes](_blank)
by D.J.C. MacKay, first published in IEEE Proc.-Commun., Vol. 152, No. 6, December 2005.
Optimization of LT codes
There is only one parameter that can be used to optimize a straight LT code: the degree distribution function (described as a pseudorandom number generator for the degree ''d'' in the LT encoding section above). In practice the other "random" numbers (the list of indices ) are invariably taken from a uniform distribution on [0, ''n''), where ''n'' is the number of blocks into which the message has been divided.[Optimizing the Degree Distribution of LT Codes with an Importance Sampling Approach](_blank)
by Esa Hyytiä, Tuomas Tirronen, and Jorma Virtamo (2006).
Luby himself discussed the "ideal soliton distribution" defined by
:
This degree distribution theoretically minimizes the expected number of redundant code words that will be sent before the decoding process can be completed. However the ideal soliton distribution does not work well in practice because any fluctuation around the expected behavior makes it likely that at some step in the decoding process there will be no available packet of (reduced) degree 1 so decoding will fail. Furthermore, some of the original blocks will not be xor-ed into any of the transmission packets. Therefore, in practice, a modified distribution, the "robust soliton distribution", is substituted for the ideal distribution. The effect of the modification is, generally, to produce more packets of very small degree (around 1) and fewer packets of degree greater than 1, except for a spike of packets at a fairly large quantity chosen to ensure that all original blocks will be included in some packet.
See also
*Online codes In computer science, online codes are an example of rateless erasure codes. These codes can encode a message into a number of symbols such that knowledge of any fraction of them allows one to recover the original message (with high probability). '' ...
*Raptor codes
In computer science, Raptor codes (''rapid tornado''; see Tornado codes) are the first known class of fountain codes with linear time encoding and decoding. They were invented by Amin Shokrollahi in 2000/2001 and were first published in 2004 as an ...
*Tornado codes In coding theory, Tornado codes are a class of erasure codes that support error correction. Tornado codes require a constant C more redundant blocks than the more data-efficient Reed–Solomon erasure codes, but are much faster to generate and can ...
Notes and references
External links
"Implementation of Luby transform Code in C#"
"Introduction to fountain codes: LT Codes with Python"
Coding theory