HOME

TheInfoList



OR:

Truncated binary encoding is an
entropy encoding In information theory, an entropy coding (or entropy encoding) is any lossless data compression method that attempts to approach the lower bound declared by Shannon's source coding theorem, which states that any lossless data compression method ...
typically used for uniform
probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
s with a finite alphabet. It is parameterized by an alphabet with total size of number ''n''. It is a slightly more general form of binary encoding when ''n'' is not a
power of two A power of two is a number of the form where is an integer, that is, the result of exponentiation with number two as the base and integer  as the exponent. In a context where only integers are considered, is restricted to non-negative ...
. If ''n'' is a power of two, then the coded value for 0 ≤ ''x'' < ''n'' is the simple binary code for ''x'' of length log2(''n''). Otherwise let ''k'' = floor(log2(''n'')), such that 2''k'' < ''n'' < 2''k''+1 and let ''u'' = 2''k''+1 − ''n''. Truncated binary encoding assigns the first ''u'' symbols codewords of length ''k'' and then assigns the remaining ''n'' − ''u'' symbols the last ''n'' − ''u'' codewords of length ''k'' + 1. Because all the codewords of length ''k'' + 1 consist of an unassigned codeword of length ''k'' with a "0" or "1" appended, the resulting code is a
prefix code A prefix code is a type of code system distinguished by its possession of the "prefix property", which requires that there is no whole code word in the system that is a prefix (initial segment) of any other code word in the system. It is trivially t ...
.


History

Used since at least 1984, phase-in codes, also known as economy codes, Job van der Zwan
"Phase-in Codes"
are also known as truncated binary encoding.


Example with ''n'' = 5

For example, for the alphabet , ''n'' = 5 and 22 ≤ ''n'' < 23, hence ''k'' = 2 and ''u'' = 23 − 5 = 3. Truncated binary encoding assigns the first ''u'' symbols the codewords 00, 01, and 10, all of length 2, then assigns the last ''n'' − ''u'' symbols the codewords 110 and 111, the last two codewords of length 3. For example, if ''n'' is 5, plain binary encoding and truncated binary encoding allocates the following
codewords In communication, a code word is an element of a standardized code or protocol. Each code word is assembled in accordance with the specific rules of the code and assigned a unique meaning. Code words are typically used for reasons of reliability, ...
. Digits shown struck are not transmitted in truncated binary. It takes 3 bits to encode ''n'' using straightforward binary encoding, hence 23 − ''n'' = 8 − 5 = 3 are unused. In numerical terms, to send a value ''x'', where 0 ≤ ''x'' < ''n'', and where there are 2''k'' ≤ ''n'' < 2''k''+1 symbols, there are ''u'' = 2''k''+1 − ''n'' unused entries when the alphabet size is rounded up to the nearest power of two. The process to encode the number ''x'' in truncated binary is: if ''x'' is less than ''u'', encode it in ''k'' binary bits; if ''x'' is greater than or equal to ''u'', encode the value ''x'' + ''u'' in ''k'' + 1 binary bits.


Example with ''n'' = 10

Another example, encoding an alphabet of size 10 (between 0 and 9) requires 4 bits, but there are 24 − 10 = 6 unused codes, so input values less than 6 have the first bit discarded, while input values greater than or equal to 6 are offset by 6 to the end of the binary space. (Unused patterns are not shown in this table.) To decode, read the first ''k'' bits. If they encode a value less than ''u'', decoding is complete. Otherwise, read an additional bit and subtract ''u'' from the result.


Example with ''n'' = 7

Here is a more extreme case: with ''n'' = 7 the next power of 2 is 8, so ''k'' = 2 and ''u'' = 23 − 7 = 1: This last example demonstrates that a leading zero bit does not always indicate a short code; if ''u'' < 2''k'', some long codes will begin with a zero bit.


Simple algorithm

Generate the truncated binary encoding for a value ''x'', 0 ≤ ''x'' < ''n'', where ''n'' > 0 is the size of the alphabet containing ''x''. ''n'' need not be a power of two. string TruncatedBinary (int x, int n) The routine Binary is expository; usually just the rightmost len bits of the variable ''x'' are desired. Here we simply output the binary code for ''x'' using len bits, padding with high-order 0s if necessary. string Binary (int x, int len)


On efficiency

If ''n'' is not a power of two, and ''k''-bit symbols are observed with probability ''p'', then (''k'' + 1)-bit symbols are observed with probability 1 − ''p''. We can calculate the expected number of bits per symbol b_e as : b_e = p k + (1 - p) (k + 1). Raw encoding of the symbol has b_u = k + 1 bits. Then relative space saving ''s'' (see
Data compression ratio Data compression ratio, also known as compression power, is a measurement of the relative reduction in size of data representation produced by a data compression algorithm. It is typically expressed as the division of uncompressed size by compresse ...
) of the encoding can be defined as : s = 1 - \frac = 1 - \frac. When simplified, this expression leads to : s = \frac = \frac. This indicates that relative efficiency of truncated binary encoding increases as probability ''p'' of ''k''-bit symbols increases, and the raw-encoding symbol bit-length b_u decreases.


See also

*
Benford's law Benford's law, also known as the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation that in many real-life sets of numerical data, the leading digit is likely to be small.Arno Berger and Theodore ...
*
Golomb coding Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb ...


References

{{DEFAULTSORT:Truncated Binary Encoding Lossless compression algorithms