HOME

TheInfoList



OR:

In information theory, an entropy coding (or entropy encoding) is any
lossless data compression Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistic ...
method that attempts to approach the lower bound declared by Shannon's
source coding theorem In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy. Named after Claude Shannon, the source coding theorem ...
, which states that any lossless data compression method must have expected code length greater or equal to the entropy of the source. More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies \mathbb E_ (d(x))\geq \mathbb E_ \log_b(P(x))/math>, where l is the number of symbols in a code word, d is the coding function, b is the number of symbols used to make output codes and P is the probability of the source symbol. An entropy coding attempts to approach this lower bound. Two of the most common entropy coding techniques are
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algo ...
and
arithmetic coding Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a string of characters is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic ...
. If the approximate entropy characteristics of a data stream are known in advance (especially for
signal compression Signal compression is the use of various techniques to increase the quality or quantity of signal parameters transmitted through a given telecommunications channel. Types of signal compression include: * Bandwidth compression *Data compression *Dy ...
), a simpler static code may be useful. These static codes include universal codes (such as
Elias gamma coding Elias γ code or Elias gamma code is a universal code encoding positive integers developed by Peter Elias. It is used most commonly when coding integers whose upper-bound cannot be determined beforehand. Encoding To code a number ''x'' � ...
or
Fibonacci coding In mathematics and computing, Fibonacci coding is a universal code which encodes positive integers into binary code words. It is one example of representations of integers based on Fibonacci numbers. Each code word ends with "11" and contains n ...
) and Golomb codes (such as
unary coding Unary coding, or the unary numeral system and also sometimes called thermometer code, is an entropy encoding that represents a natural number, ''n'', with a code of length ''n'' + 1 ( or ''n'' ), usually ''n'' ones followed by a zero (if ...
or
Rice coding Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb ...
). Since 2014, data compressors have started using the
asymmetric numeral systems Asymmetric numeral systems (ANS)J. Duda, K. Tahboub, N. J. Gadil, E. J. Delp''The use of asymmetric numeral systems as an accurate replacement for Huffman coding'' Picture Coding Symposium, 2015.J. Duda''Asymmetric numeral systems: entropy coding ...
family of entropy coding techniques, which allows combination of the compression ratio of
arithmetic coding Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a string of characters is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic ...
with a processing cost similar to
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algo ...
.


Entropy as a measure of similarity

Besides using entropy coding as a way to compress digital data, an entropy encoder can also be used to measure the amount of similarity between streams of data and already existing classes of data. This is done by generating an entropy coder/compressor for each class of data; unknown data is then classified by feeding the uncompressed data to each compressor and seeing which compressor yields the highest compression. The coder with the best compression is probably the coder trained on the data that was most similar to the unknown data.


See also

*
Arithmetic coding Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a string of characters is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic ...
*
Asymmetric numeral systems Asymmetric numeral systems (ANS)J. Duda, K. Tahboub, N. J. Gadil, E. J. Delp''The use of asymmetric numeral systems as an accurate replacement for Huffman coding'' Picture Coding Symposium, 2015.J. Duda''Asymmetric numeral systems: entropy coding ...
(ANS) *
Context-adaptive binary arithmetic coding Context-adaptive binary arithmetic coding (CABAC) is a form of entropy encoding used in the H.264/MPEG-4 AVC and High Efficiency Video Coding (HEVC) standards. It is a lossless compression technique, although the video coding standards in which it i ...
(CABAC) *
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algo ...
*
Range coding Range coding (or range encoding) is an entropy coding method defined by G. Nigel N. Martin in a 1979 paper,, 100000)) using the probability distribution . The encoder breaks down the range Information_Theory,_Inference,_and_Learning_Algorithms
',_by_ Information_Theory,_Inference,_and_Learning_Algorithms
',_by_David_MacKay_(scientist)">David_MacKay_(2003),_gives_an_introduction_to_Shannon_theory_and_data_compression,_including_the_Huffman_coding_ In_computer_science_and_information_theory,_a_Huffman_code_is_a_particular_type_of_optimal__prefix_code_that_is_commonly_used_for__lossless_data_compression._The_process_of_finding_or_using_such_a_code_proceeds_by_means_of_Huffman_coding,_an_algo_...
_and_arithmetic_coding_ Arithmetic_coding_(AC)_is_a_form_of__entropy_encoding_used_in__lossless_data_compression._Normally,_a__string_of_characters_is_represented_using_a_fixed_number_of__bits_per_character,_as_in_the_ASCII_code._When_a_string_is_converted_to_arithmetic_...
. *_
Source_Coding
''_by_Thomas_Wiegand.html" ;"title="David_MacKay_(scientist).html" ;"title=", 100000) into three subranges: A: [ ...


References


External links

*
Information Theory, Inference, and Learning Algorithms
', by David MacKay (scientist)">David MacKay (2003), gives an introduction to Shannon theory and data compression, including the
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algo ...
and
arithmetic coding Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a string of characters is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic ...
. *
Source Coding
'' by Thomas Wiegand">T. Wiegand and H. Schwarz (2011). {{Compression Methods Lossless compression algorithms Entropy and information