Fountain codes
   HOME

TheInfoList



OR:

In
coding theory Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are studied ...
, fountain codes (also known as rateless erasure codes) are a class of
erasure codes In coding theory, an erasure code is a forward error correction (FEC) code under the assumption of bit erasures (rather than bit errors), which transforms a message of ''k'' symbols into a longer message (code word) with ''n'' symbols such that the ...
with the property that a potentially limitless
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is called ...
of encoding symbols can be generated from a given set of source symbols such that the original source symbols can ideally be recovered from any subset of the encoding symbols of size equal to or only slightly larger than the number of source symbols. The term ''fountain'' or ''rateless'' refers to the fact that these codes do not exhibit a fixed
code rate In telecommunication and information theory, the code rate (or information rateHuffman, W. Cary, and Pless, Vera, ''Fundamentals of Error-Correcting Codes'', Cambridge, 2003.) of a forward error correction code is the proportion of the data-st ...
. A fountain code is optimal if the original ''k'' source symbols can be recovered from any ''k'' successfully received encoding symbols (i.e., excluding those that were erased). Fountain codes are known that have efficient encoding and decoding
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
s and that allow the recovery of the original ''k'' source symbols from any ''k’'' of the encoding symbols with high probability, where ''k’'' is just slightly larger than ''k''.
LT codes In computer science, Luby transform codes (LT codes) are the first class of practical fountain codes that are near-optimal erasure correcting codes. They were invented by Michael Luby in 1998 and published in 2002. Like some other fountain codes, L ...
were the first practical realization of fountain codes.
Raptor codes In computer science, Raptor codes (''rapid tornado''; see Tornado codes) are the first known class of fountain codes with linear time encoding and decoding. They were invented by Amin Shokrollahi in 2000/2001 and were first published in 2004 as an ...
and
online codes In computer science, online codes are an example of rateless erasure codes. These codes can encode a message into a number of symbols such that knowledge of any fraction of them allows one to recover the original message (with high probability). '' ...
were subsequently introduced, and achieve linear time encoding and decoding
complexity Complexity characterises the behaviour of a system or model whose components interact in multiple ways and follow local rules, leading to nonlinearity, randomness, collective dynamics, hierarchy, and emergence. The term is generally used to ch ...
through a pre-coding stage of the input symbols.


Applications

Fountain codes are flexibly applicable at a fixed
code rate In telecommunication and information theory, the code rate (or information rateHuffman, W. Cary, and Pless, Vera, ''Fundamentals of Error-Correcting Codes'', Cambridge, 2003.) of a forward error correction code is the proportion of the data-st ...
, or where a fixed code rate cannot be determined a priori, and where efficient encoding and decoding of large amounts of data is required. One example is that of a data carousel, where some large file is continuously broadcast to a set of receivers. Using a fixed-rate erasure code, a receiver missing a source symbol (due to a transmission error) faces the coupon collector's problem: it must successfully receive an encoding symbol which it does not already have. This problem becomes much more apparent when using a traditional short-length erasure code, as the file must be split into several blocks, each being separately encoded: the receiver must now collect the required number of missing encoding symbols for ''each'' block. Using a fountain code, it suffices for a receiver to retrieve ''any'' subset of encoding symbols of size slightly larger than the set of source symbols. (In practice, the broadcast is typically scheduled for a fixed period of time by an operator based on characteristics of the network and receivers and desired delivery reliability, and thus the fountain code is used at a code rate that is determined dynamically at the time when the file is scheduled to be broadcast.) Another application is that of
hybrid ARQ Hybrid automatic repeat request (hybrid ARQ or HARQ) is a combination of high-rate forward error correction (FEC) and automatic repeat request (ARQ) error-control. In standard ARQ, redundant bits are added to data to be transmitted using an er ...
in
reliable multicast A reliable multicast is any computer networking protocol that provides a '' reliable'' sequence of packets to multiple recipients simultaneously, making it suitable for applications such as multi-receiver file transfer. Overview Multicast is a n ...
scenarios: parity information that is requested by a receiver can potentially be useful for ''all'' receivers in the multicast group.


Fountain codes in standards

Raptor code In computer science, Raptor codes (''rapid tornado''; see Tornado codes) are the first known class of fountain codes with linear time encoding and decoding. They were invented by Amin Shokrollahi in 2000/2001 and were first published in 2004 as a ...
s are the most efficient fountain codes at this time, having very efficient linear time encoding and decoding algorithms, and requiring only a small constant number of
XOR Exclusive or or exclusive disjunction is a logical operation that is true if and only if its arguments differ (one is true, the other is false). It is symbolized by the prefix operator J and by the infix operators XOR ( or ), EOR, EXOR, , ...
operations per generated symbol for both encoding and decoding.
IETF The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster or requirements an ...
RFC 5053 specifies in detail a systematic Raptor code, which has been adopted into multiple standards beyond the IETF, such as within the 3GPP
MBMS Multimedia Broadcast Multicast Services (MBMS) is a point-to-multipoint interface specification for existing 3GPP cellular networks, which is designed to provide efficient delivery of broadcast and multicast services, both within a cell as well ...
standard for broadcast file delivery and streaming services, the
DVB-H DVB-H (Digital Video Broadcasting - Handheld) is one of three prevalent mobile TV formats. It is a technical specification for bringing broadcast services to mobile handsets. DVB-H was formally adopted as ETSI standard EN 302 304 in November 2 ...
IPDC standard for delivering IP services over
DVB Digital Video Broadcasting (DVB) is a set of international open standards for digital television. DVB standards are maintained by the DVB Project, an international industry consortium, and are published by a Joint Technical Committee (JTC) o ...
networks, and
DVB-IPTV IP over DVB implies that Internet Protocol datagrams are distributed using some digital television system, for example DVB-H, DVB-T, DVB-S, DVB-C or their successors like DVB-S2. This may take the form of IP over MPEG, where the datagrams are tran ...
for delivering commercial TV services over an IP network. This code can be used with up to 8,192 source symbols in a source block, and a total of up to 65,536 encoded symbols generated for a source block. This code has an average relative reception overhead of 0.2% when applied to source blocks with 1,000 source symbols, and has a relative reception overhead of less than 2% with probability 99.9999%. The relative reception overhead is defined as the extra encoding data required beyond the length of the source data to recover the original source data, measured as a percentage of the size of the source data. For example, if the relative reception overhead is 0.2%, then this means that source data of size 1 
megabyte The megabyte is a multiple of the unit byte for digital information. Its recommended unit symbol is MB. The unit prefix ''mega'' is a multiplier of (106) in the International System of Units (SI). Therefore, one megabyte is one million bytes o ...
can be recovered from 1.002 megabytes of encoding data. A more advanced Raptor code with greater flexibility and improved reception overhead, called RaptorQ, has been specified in
IETF The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster or requirements an ...
RFC 6330. The specified RaptorQ code can be used with up to 56,403 source symbols in a source block, and a total of up to 16,777,216 encoded symbols generated for a source block. This code is able to recover a source block from any set of encoded symbols equal to the number of source symbols in the source block with high probability, and in rare cases from slightly more than the number of source symbols in the source block. The RaptorQ code is an integral part of the ROUTE instantiation specified in ATSC A-331 (ATSC 3.0)


Fountain codes for data storage

Erasure codes are used in data storage applications due to massive savings on the number of storage units for a given level of redundancy and reliability. The requirements of erasure code design for data storage, particularly for distributed storage applications, might be quite different relative to communication or data streaming scenarios. One of the requirements of coding for data storage systems is the systematic form, i.e., the original message symbols are part of the coded symbols. Systematic form enables reading off the message symbols without decoding from a storage unit. In addition, since the bandwidth and communication load between storage nodes can be a bottleneck, codes that allow minimum communication are very beneficial particularly when a node fails and a system reconstruction is needed to achieve the initial level of redundancy. In that respect, fountain codes are expected to allow efficient repair process in case of a failure: When a single encoded symbol is lost, it should not require too much communication and computation among other encoded symbols in order to resurrect the lost symbol. In fact, repair latency might sometimes be more important than storage space savings. Repairable fountain codes are projected to address fountain code design objectives for storage systems. A detailed survey about fountain codes and their applications can be found at. A different approach to distributed storage using fountain codes has been proposed in Liquid Cloud Storage. Liquid Cloud Storage is based on using a large erasure code such as the RaptorQ code specified in
IETF The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster or requirements an ...
RFC 6330 (which provides significantly better data protection than other systems), using a background repair process (which significantly reduces the repair bandwidth requirements compared to other systems), and using a stream data organization (which allows fast access to data even when not all encoded symbols are available).


See also

*
Online codes In computer science, online codes are an example of rateless erasure codes. These codes can encode a message into a number of symbols such that knowledge of any fraction of them allows one to recover the original message (with high probability). '' ...
*
Linear network coding In computer networking, linear network coding is a program in which intermediate nodes transmit data from source nodes to sink nodes by means of linear combinations. Linear network coding may be used to improve a network's throughput, efficiency, ...
*
Secret sharing Secret sharing (also called secret splitting) refers to methods for distributing a secret among a group, in such a way that no individual holds any intelligible information about the secret, but when a sufficient number of individuals combine t ...
* Tornado codes, the precursor to ''fountain codes''


Notes


References

* * * . * * * . * . {{DEFAULTSORT:Fountain Code Coding theory Capacity-approaching codes