HOME

TheInfoList




Golomb coding is a
lossless data compression Lossless compression is a class of data compression algorithms that allows the original data to be perfectly reconstructed from the compressed data. By contrast, lossy compression permits reconstruction only of an approximation of the original d ...
method using a family of
data compression In signal processing Signal processing is an electrical engineering Electrical engineering is an engineering discipline concerned with the study, design, and application of equipment, devices, and systems which use electricity, electro ...
codes invented by Solomon W. Golomb in the 1960s. Alphabets following a
geometric distribution In probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expres ...
will have a Golomb code as an optimal
prefix codeA prefix code is a type of code system distinguished by its possession of the "prefix property", which requires that there is no whole code word in the system that is a prefix (computer science), prefix (initial segment) of any other code word in the ...
, making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.


Rice coding

Rice coding (invented by Robert F. Rice) denotes using a subset of the family of Golomb codes to produce a simpler (but possibly suboptimal) prefix code. Rice used this set of codes in an
adaptive coding Adaptive coding refers to variants of entropy encoding In information theory an entropy encoding is a Lossless compression , lossless data compression scheme that is independent of the specific characteristics of the medium. One of the main types ...
scheme; "Rice coding" can refer either to that adaptive scheme or to using that subset of Golomb codes. Whereas a Golomb code has a tunable parameter that can be any positive integer value, Rice codes are those in which the tunable parameter is a power of two. This makes Rice codes convenient for use on a computer since multiplication and division by 2 can be implemented more efficiently in
binary arithmetic In mathematics and digital electronics Digital electronics is a field of electronics Electronics comprises the physics, engineering, technology and applications that deal with the emission, flow and control of electrons in vacuum and matter ...
. Rice was motivated to propose this simpler subset due to the fact that geometric distributions are often varying with time, not precisely known, or both, so selecting the seemingly optimal code might not be very advantageous. Rice coding is used as the
entropy encoding In information theory, an entropy coding (or entropy encoding) is a lossless compression , lossless data compression scheme that is independent of the specific characteristics of the medium. One of the main types of entropy coding creates and assi ...
stage in a number of lossless
image compression Image compression is a type of data compression In signal processing Signal processing is an electrical engineering subfield that focuses on analysing, modifying, and synthesizing signals such as audio signal processing, sound, image proc ...
and
audio data compression In signal processing Signal processing is an electrical engineering Electrical engineering is an engineering discipline concerned with the study, design, and application of equipment, devices, and systems which use electricity, electro ...
methods.


Overview


Construction of codes

Golomb coding uses a tunable parameter to divide an input value into two parts: , the result of a division by , and , the remainder. The quotient is sent in
unary codingUnary coding, or the unary numeral system The unary numeral system is the simplest numeral system to represent natural numbers: to represent a number ''N'', a symbol representing 1 is repeated ''N'' times. In the unary system, the number 0 (zero) ...
, followed by the remainder in truncated binary encoding. When M=1, Golomb coding is equivalent to unary coding. Golomb–Rice codes can be thought of as codes that indicate a number by the position of the ''bin'' (), and the ''offset'' within the bin (). The example figure shows the position and offset for the encoding of integer using Golomb–Rice parameter , with source probabilities following a geometric distribution with . Formally, the two parts are given by the following expression, where is the nonnegative integer being encoded: :q = \left \lfloor \frac \right \rfloor and :r = x - qM. Both and will be encoded using variable numbers of bits: by a unary code, and by bits for Rice code, or a choice between and bits for Golomb code (i.e. is not a power of 2), with b = \lfloor\log_2(M)\rfloor. If r < 2^ - M, then use bits to encode ; otherwise, use +1 bits to encode . Clearly, b=\log_2(M) if is a power of 2 and we can encode all values of with bits. The integer treated by Golomb was the run length of a
Bernoulli process In probability and statistics Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is convent ...
, which has a
geometric distribution In probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expres ...
starting at 0. The best choice of parameter is a function of the corresponding Bernoulli process, which is parameterized by p = P(x=0) the probability of success in a given
Bernoulli trial In the theory of probability Probability is the branch of mathematics Mathematics (from Greek: ) includes the study of such topics as numbers ( and ), formulas and related structures (), shapes and spaces in which they are containe ...
. is either the median of the distribution or the median ±1. It can be determined by these inequalities: : (1-p)^M + (1-p)^ \leq 1 < (1-p)^ + (1-p)^M, which are solved by : M = \left\lceil -\frac\right\rceil. For the example with : : M = \left\lceil -\frac\right\rceil = \left\lceil 2.634 \right\rceil = 3. The Golomb code for this distribution is equivalent to the
Huffman code In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorit ...
for the same probabilities, if it were possible to compute the Huffman code for the infinite set of source values.


Use with signed integers

Golomb's scheme was designed to encode sequences of non-negative numbers. However it is easily extended to accept sequences containing negative numbers using an ''overlap and interleave'' scheme, in which all values are reassigned to some positive number in a unique and reversible way. The sequence begins: 0, -1, 1, -2, 2, -3, 3, -4, 4 ... The ''n''th negative value (i.e., ) is mapped to the ''n''th odd number (), and the ''m''th positive value is mapped to the ''m''th even number (). This may be expressed mathematically as follows: a positive value is mapped to (x^\prime=2, x, =2x, x\ge0), and a negative value is mapped to (y^\prime=2, y, -1=-2y-1, y<0). Such a code may be used for simplicity, even if suboptimal. Truly optimal codes for two-sided geometric distributions include multiple variants of the Golomb code, depending on the distribution parameters, including this one.


Simple algorithm

Note below that this is the Rice–Golomb encoding, where the remainder code uses simple truncated binary encoding, also named "Rice coding" (other varying-length binary encodings, like arithmetic or Huffman encodings, are possible for the remainder codes, if the statistic distribution of remainder codes is not flat, and notably when not all possible remainders after the division are used). In this algorithm, if the ''M'' parameter is a power of 2, it becomes equivalent to the simpler Rice encoding. # Fix the parameter ''M'' to an integer value. # For ''N'', the number to be encoded, find ## quotient = ''q'' = floor(''N''/''M'') ## remainder = ''r'' = ''N'' modulo ''M'' # Generate codeword ## The code format : , where ## Quotient code (in
unary codingUnary coding, or the unary numeral system The unary numeral system is the simplest numeral system to represent natural numbers: to represent a number ''N'', a symbol representing 1 is repeated ''N'' times. In the unary system, the number 0 (zero) ...
) ### Write a ''q''-length string of 1 bits (alternatively, of 0 bits) ### Write a 0 bit (respectively, a 1 bit) ## Remainder code (in truncated binary encoding) ### Let b = \lfloor\log_2(M)\rfloor #### If r < 2^-M code ''r'' in binary representation using ''b'' bits. #### If r \ge 2^-M code the number r+2^-M in binary representation using ''b'' + 1 bits.


Example

Set . Thus b = \lfloor\log_2(10)\rfloor = 3. The cutoff is 2^-M = 16-10 = 6. For example, with a Rice–Golomb encoding using parameter , the decimal number 42 would first be split into = 4 and = 2, and would be encoded as qcode(),rcode() = qcode(4),rcode(2) = 11110,010 (you don't need to encode the separating comma in the output stream, because the 0 at the end of the code is enough to say when ends and begins ; both the qcode and rcode are self-delimited).


Use for run-length encoding

:''Note that and are reversed in this section compared to the use in earlier sections.'' Given an alphabet of two symbols, or a set of two events, ''P'' and ''Q'', with probabilities ''p'' and () respectively, where , Golomb coding can be used to encode runs of zero or more ''P''′s separated by single ''Q''′s. In this application, the best setting of the parameter ''M'' is the nearest integer to - \frac. When ''p'' = 1/2, ''M'' = 1, and the Golomb code corresponds to unary ( ''P''′s followed by a ''Q'' is encoded as ''n'' ones followed by a zero). If a simpler code is desired, one can assign Golomb–Rice parameter (i.e., Golomb parameter M=2^b) to the integer nearest to - \log_2(-\log_2 p); although not always the best parameter, it is usually the best Rice parameter and its compression performance is quite close to the optimal Golomb code. (Rice himself proposed using various codes for the same data to figure out which was best. A later
JPL The Jet Propulsion Laboratory (JPL) is a federally funded research and development center and NASA The National Aeronautics and Space Administration (NASA; ) is an independent agencies of the United States government, independent agency o ...
researcher proposed various methods of optimizing or estimating the code parameter.) Consider using a Rice code with a binary portion having bits to run-length encode sequences where ''P'' has a probability . If \mathbb textk\text/math> is the probability that a bit will be part of an -bit run (k-1 ''P''s and one ''Q'') and (\textk\text) is the compression ratio of that run, then the expected compression ratio is :\begin \mathbb
text Text may refer to: Written word * Text (literary theory) In literary theory, a text is any object that can be "read", whether this object is a work of literature, a street sign, an arrangement of buildings on a city block, or styles of clothin ...

text
&= \sum_^\infty (\textk\text) \cdot \mathbb textk\text\\ &= \sum_^\infty \frac \cdot kp^ (1-p)^2 \\ &= (1-p)^2 \sum_^\infty (b+1+j) \cdot \sum_^ p^ \\ &= (1-p)^2 \sum_^\infty (b+1+j) \cdot \left(p^ - p^\right) \\ &= (1-p) \cdot \left (b + \sum_^\infty p^ \right ) \\ &= (1-p) \cdot \left(b + ^\right ) \\ \end Compression is often expressed in terms of 1-\mathbb
text Text may refer to: Written word * Text (literary theory) In literary theory, a text is any object that can be "read", whether this object is a work of literature, a street sign, an arrangement of buildings on a city block, or styles of clothin ...

text
/math>, the proportion compressed. For p \approx 1, the run-length coding approach results in compression ratios close to
entropy Entropy is a scientific concept as well as a measurable physical property that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynamic ...
. For example, using Rice code b=6 for p=0.99 yields compression, while the entropy limit is .


Adaptive run-length Golomb–Rice encoding

When a probability distribution for integers is not known, the optimal parameter for a Golomb–Rice encoder cannot be determined. Thus, in many applications, a two-pass approach is used: first, the block of data is scanned to estimate a probability density function (PDF) for the data. The Golomb–Rice parameter is then determined from that estimated PDF. A simpler variation of that approach is to assume that the PDF belongs to a parametrized family, estimate the PDF parameters from the data, and then compute the optimal Golomb–Rice parameter. That is the approach used in most of the applications discussed below. An alternative approach to efficiently encode integer data whose PDF is not known, or is varying, is to use a backwards-adaptive encoder. Th
run-length Golomb–Rice (RLGR) code
achieves that using a very simple algorithm that adjusts the Golomb–Rice parameter up or down, depending on the last encoded symbol. A decoder can follow the same rule to track the variation of the encoding parameters, so no side information needs to be transmitted, just the encoded data. Assuming a generalized Gaussian PDF, which covers a wide range of statistics seen in data such as prediction errors or transform coefficients in multimedia codecs, the RLGR encoding algorithm can perform very well in such applications.


Applications

Numerous signal codecs use a Rice code for
prediction Image:Old Farmer's Almanac 1793 cover.jpg, frame, ''The Old Farmer's Almanac'' is famous in the US for its (not necessarily accurate) long-range weather predictions. A prediction (Latin ''præ-'', "before," and ''dicere'', "to say"), or forecas ...
residues. In predictive algorithms, such residues tend to fall into a two-sided
geometric distribution In probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expres ...
, with small residues being more frequent than large residues, and the Rice code closely approximates the Huffman code for such a distribution without the overhead of having to transmit the Huffman table. One signal that does not match a geometric distribution is a
sine wave A sine wave or sinusoid is any of certain mathematical curves that describe a smooth periodic oscillation Oscillation is the repetitive variation, typically in time, of some measure about a central value (often a point of Mechanical equilib ...

sine wave
, because the differential residues create a sinusoidal signal whose values are not creating a geometric distribution (the highest and lowest residue values have similar high frequency of occurrences, only the median positive and negative residues occur less often). Several lossless
audio codecsAn audio codec is a device or computer program capable of encoding or decoding a digital data stream (a codec A codec is a device or computer program which encodes or Decoding methods, decodes a data stream or signal. ''Codec'' is a portmanteau of ...
, such as Shorten,
FLAC FLAC (; Free Lossless Audio Codec) is an audio coding format for lossless compression of digital audio, developed by the Xiph.Org Foundation, and is also the name of the free software project producing the FLAC tools, the reference software p ...
,FLAC documentation: format overview
/ref>
Apple Lossless Apple Lossless, also known as Apple Lossless Audio Codec (ALAC), or Apple Lossless Encoder (ALE), is an audio coding format, and its reference audio codec implementation, developed by Apple Inc. for lossless data compression of digital music. After ...
, and
MPEG-4 ALS MPEG-4 Audio Lossless Coding, also known as MPEG-4 ALS, is an extension to the MPEG-4 Part 3 audio standard to allow lossless audio compression. The extension was finalized in December 2005 and published as ISO/ IEC 14496-3:2005/Amd 2:2006 in 20 ...
, use a Rice code after the linear prediction step (called "adaptive FIR filter" in Apple Lossless). Rice coding is also used in the FELICS lossless image codec. The Golomb–Rice coder is used in the entropy coding stage of Rice algorithm based ''lossless image codecs''. One such experiment yields the compression ratio graph shown. The
JPEG-LSLossless JPEG is a 1993 addition to JPEG standard by the Joint Photographic Experts Group to enable lossless compression. However, the term may also be used to refer to all lossless compression schemes developed by the group, including JPEG 2000 and ...
scheme uses Rice–Golomb to encode the prediction residuals. Th
run-length Golomb–Rice (RLGR)
adaptive version of Golomb–Rice coding, mentioned above, is used for encoding screen content in virtual machines in th
RemoteFX
component of the Microsoft Remote Desktop Protocol.


See also

*
Elias delta coding Elias δ code or Elias delta code is a Universal code (data compression), universal code encoding the positive integers developed by Peter Elias. Encoding To code a number ''X'' ≥ 1: # Let ''N'' = ⌊log2 ''X''⌋; be the highest powe ...


References


Further reading

* Golomb, Solomon W. (1966)
Run-length encodings. IEEE Transactions on Information Theory, IT--12(3):399--401
* * Robert F. Rice (1979)
, "Some Practical Universal Noiseless Coding Techniques", Jet Propulsion Laboratory, Pasadena, California, JPL Publication 79—22, March 1979.
* Witten, Ian Moffat, Alistair Bell, Timothy. "Managing Gigabytes: Compressing and Indexing Documents and Images." Second Edition. Morgan Kaufmann Publishers, San Francisco CA. 1999 * David Salomon. "Data Compression",. * H. S. Malvar
Adaptive run-length/Golomb–Rice encoding of quantized generalized Gaussian sources with unknown statistics
Proc. Data Compression Conference, 2006.
RLGR Entropy Encoding
Microsoft MS-RDPRFX Open Specification, RemoteFX codec for Remote Desktop Protocol. * S. Büttcher, C. L. A. Clarke, and G. V. Cormack
Information Retrieval: Implementing and Evaluating Search Engines
MIT Press, Cambridge MA, 2010. {{DEFAULTSORT:Golomb Coding Lossless compression algorithms