HOME

TheInfoList




Lossless compression is a class of
data compression In signal processing Signal processing is an electrical engineering subfield that focuses on analysing, modifying, and synthesizing signals such as audio signal processing, sound, image processing, images, and scientific measurements. Signal ...
algorithms that allows the original data to be perfectly reconstructed from the compressed data. By contrast,
lossy compression In information technology, lossy compression or irreversible compression is the class of data compression, data encoding methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to r ...
permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates (and therefore reduced media sizes). By operation of the
pigeonhole principle In mathematics Mathematics (from Ancient Greek, Greek: ) includes the study of such topics as quantity (number theory), mathematical structure, structure (algebra), space (geometry), and calculus, change (mathematical analysis, analysis). It ...
, no lossless compression algorithm can efficiently compress all possible data. For this reason, many different algorithms exist that are designed either with a specific type of input data in mind or with specific assumptions about what kinds of redundancy the uncompressed data are likely to contain. Therefore, compression ratios tend to be stronger on human- and machine-readable documents and code in comparison to entropic binary data (random bytes). Lossless data compression is used in many applications. For example, it is used in the ZIP file format and in the
GNU GNU () is an extensive collection of free software Free software (or libre software) is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any ...

GNU
tool
gzip gzip is a and a used for . The program was created by and as a replacement for the program used in early systems, and intended for use by (the "g" is from "GNU"). Version 0.1 was first publicly released on 31 October 1992, and version 1 ...
. It is also often used as a component within lossy data compression technologies (e.g. lossless mid/side joint stereo preprocessing by
MP3 MP3 (formally MPEG-1 Audio Layer III or MPEG-2 Audio Layer III) is a coding format for digital audio Digital audio is a representation of sound recorded in, or converted into, Digital signal (signal processing), digital form. In digital a ...

MP3
encoders and other lossy audio encoders). Lossless compression is used in cases where it is important that the original and the decompressed data be identical, or where deviations from the original data would be unfavourable. Typical examples are executable programs, text documents, and source code. Some image file formats, like PNG or
GIF The Graphics Interchange Format (GIF; or , see pronunciation) is a bitmap In computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algor ...
, use only lossless compression, while others like
TIFF Tag Image File Format, abbreviated TIFF or TIF, is an for storing images, popular among graphic artists, the publishing industry, and photographers. TIFF is widely supported by , , , , image manipulation, , and page-layout applications. The fo ...

TIFF
and
MNG Multiple-image Network Graphics (MNG) is a graphics file format Image file formats are standardized means of organizing and storing digital image A digital image is an image An SAR radar imaging, radar image acquired by the SIR-C/ ...
may use either lossless or lossy methods.
Lossless audio In signal processing Signal processing is an electrical engineering subfield that focuses on analysing, modifying, and synthesizing signals such as audio signal processing, sound, image processing, images, and scientific measurements. Signal ...
formats are most often used for archiving or production purposes, while smaller lossy audio files are typically used on portable players and in other cases where storage space is limited or exact replication of the audio is unnecessary.


Lossless compression techniques

Most lossless compression programs do two things in sequence: the first step generates a ''statistical model'' for the input data, and the second step uses this model to map input data to bit sequences in such a way that "probable" (e.g. frequently encountered) data will produce shorter output than "improbable" data. The primary encoding algorithms used to produce bit sequences are
Huffman coding In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of Algor ...
(also used by the deflate algorithm) and
arithmetic coding Arithmetic coding is a form of entropy encoding In information theory an entropy encoding is a Lossless compression , lossless data compression scheme that is independent of the specific characteristics of the medium. One of the main types of en ...
. Arithmetic coding achieves compression rates close to the best possible for a particular statistical model, which is given by the
information entropy In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent in the variable's possible outcomes. The concept of information entropy was introduced by Claude Shannon in hi ...
, whereas Huffman compression is simpler and faster but produces poor results for models that deal with symbol probabilities close to 1. There are two primary ways of constructing statistical models: in a ''static'' model, the data is analyzed and a model is constructed, then this model is stored with the compressed data. This approach is simple and modular, but has the disadvantage that the model itself can be expensive to store, and also that it forces using a single model for all data being compressed, and so performs poorly on files that contain heterogeneous data. ''Adaptive'' models dynamically update the model as the data is compressed. Both the encoder and decoder begin with a trivial model, yielding poor compression of initial data, but as they learn more about the data, performance improves. Most popular types of compression used in practice now use adaptive coders. Lossless compression methods may be categorized according to the type of data they are designed to compress. While, in principle, any general-purpose lossless compression algorithm (''general-purpose'' meaning that they can accept any bitstring) can be used on any type of data, many are unable to achieve significant compression on data that are not of the form for which they were designed to compress. Many of the lossless compression techniques used for text also work reasonably well for
indexed image In computing, indexed color is a technique to manage digital image A digital image is an image File:TEIDE.JPG, An Synthetic aperture radar, SAR radar imaging, radar image acquired by the SIR-C/X-SAR radar on board the Space Shuttle Endeav ...
s.


Multimedia

These techniques take advantage of the specific characteristics of images such as the common phenomenon of contiguous 2-D areas of similar tones. Every pixel but the first is replaced by the difference to its left neighbor. This leads to small values having a much higher probability than large values. This is often also applied to sound files, and can compress files that contain mostly low frequencies and low volumes. For images, this step can be repeated by taking the difference to the top pixel, and then in videos, the difference to the pixel in the next frame can be taken. A hierarchical version of this technique takes neighboring pairs of data points, stores their difference and sum, and on a higher level with lower resolution continues with the sums. This is called
discrete wavelet transform . The original image is high-pass filtered, yielding the three large images, each describing local changes in brightness (details) in the original image. It is then low-pass filtered and downscaled, yielding an approximation image; this image is hig ...
.
JPEG2000 JPEG 2000 (JP2) is an image compression standard and coding system. It was developed from 1997 to 2000 by a Joint Photographic Experts Group committee chaired by Touradj Ebrahimi (later the JPEG president), with the intention of superseding their ...
additionally uses data points from other pairs and multiplication factors to mix them into the difference. These factors must be integers, so that the result is an integer under all circumstances. So the values are increased, increasing file size, but hopefully the distribution of values is more peaked. The adaptive encoding uses the probabilities from the previous sample in sound encoding, from the left and upper pixel in image encoding, and additionally from the previous frame in video encoding. In the wavelet transformation, the probabilities are also passed through the hierarchy.


Historical legal issues

Many of these methods are implemented in open-source and proprietary tools, particularly LZW and its variants. Some algorithms are patented in the
United States The United States of America (U.S.A. or USA), commonly known as the United States (U.S. or US) or America, is a country Continental United States, primarily located in North America. It consists of 50 U.S. state, states, a Washington, D.C., ...

United States
and other countries and their legal usage requires licensing by the patent holder. Because of patents on certain kinds of LZW compression, and in particular licensing practices by patent holder Unisys that many developers considered abusive, some open source proponents encouraged people to avoid using the
Graphics Interchange Format The Graphics Interchange Format (GIF; or , #Pronunciation, see pronunciation) is a Raster graphics, bitmap Image file formats, image format that was developed by a team at the online services provider CompuServe led by American computer scient ...
(GIF) for compressing still image files in favor of
Portable Network Graphics Portable Network Graphics (PNG, officially pronounced , sometimes pronounced ) is a raster-graphics file format that supports lossless data compression Lossless compression is a class of data compression In signal processing Signa ...
(PNG), which combines the
LZ77 LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations including ...
-based deflate algorithm with a selection of domain-specific prediction filters. However, the patents on LZW expired on June 20, 2003. Many of the lossless compression techniques used for text also work reasonably well for
indexed image In computing, indexed color is a technique to manage digital image A digital image is an image File:TEIDE.JPG, An Synthetic aperture radar, SAR radar imaging, radar image acquired by the SIR-C/X-SAR radar on board the Space Shuttle Endeav ...
s, but there are other techniques that do not work for typical text that are useful for some images (particularly simple bitmaps), and other techniques that take advantage of the specific characteristics of images (such as the common phenomenon of contiguous 2-D areas of similar tones, and the fact that color images usually have a preponderance of a limited range of colors out of those representable in the color space). As mentioned previously, lossless sound compression is a somewhat specialized area. Lossless sound compression algorithms can take advantage of the repeating patterns shown by the wave-like nature of the data – essentially using
autoregressive In statistics Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with ...
models to predict the "next" value and encoding the (hopefully small) difference between the expected value and the actual data. If the difference between the predicted and the actual data (called the ''error'') tends to be small, then certain difference values (like 0, +1, −1 etc. on sample values) become very frequent, which can be exploited by encoding them in few output bits. It is sometimes beneficial to compress only the differences between two versions of a file (or, in
video compression In signal processing, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either Lossy compression, lossy or Lossless comp ...
, of successive images within a sequence). This is called
delta encoding Delta encoding is a way of storing or transmitting data Data are units of information Information can be thought of as the resolution of uncertainty; it answers the question of "What an entity is" and thus defines both its essence and t ...
(from the Greek letter Δ, which in mathematics, denotes a difference), but the term is typically only used if both versions are meaningful outside compression and decompression. For example, while the process of compressing the error in the above-mentioned lossless audio compression scheme could be described as delta encoding from the approximated sound wave to the original sound wave, the approximated version of the sound wave is not meaningful in any other context.


Lossless compression methods

No lossless compression algorithm can efficiently compress all possible data (see the section
LimitationsLimitation may refer to: *A disclaimer for research done in an experiment An experiment is a procedure carried out to support, refute, or validate a hypothesis. Experiments provide insight into Causality, cause-and-effect by demonstrating what ...
below for details). For this reason, many different algorithms exist that are designed either with a specific type of input data in mind or with specific assumptions about what kinds of redundancy the uncompressed data are likely to contain. Some of the most common lossless compression algorithms are listed below.


General purpose

*
Run-length encoding Run-length encoding (RLE) is a form of lossless data compression Lossless compression is a class of data compression algorithms that allows the original data to be perfectly reconstructed from the compressed data. By contrast, lossy compression ...
(RLE) – Simple scheme that provides good compression of data containing many runs of the same value *
Huffman coding In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of Algor ...
– Entropy encoding, pairs well with other algorithms *
Arithmetic coding Arithmetic coding is a form of entropy encoding In information theory, an entropy coding (or entropy encoding) is a lossless compression , lossless data compression scheme that is independent of the specific characteristics of the medium. One of ...
– Entropy encoding *
ANS Ans may refer to: *Ans Ans may refer to: *Ans, a Belgian municipality in the province of Liège *Ans, Denmark Ans is a town in Silkeborg Municipality, Denmark. References Cities and towns in the Central Denmark Region Silkeborg Municipality ...
– Entropy encoding, used by
LZFSE LZFSE (Lempel–Ziv Finite State Entropy) is an open source lossless data compression Lossless compression is a class of data compression algorithms that allows the original data to be perfectly reconstructed from the compressed data. By contras ...
and
Zstandard Zstandard (or zstd) is a lossless data compression In signal processing, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compres ...
* Lempel-Ziv compression (LZ77 and LZ78) – Dictionary-based algorithm that forms the basis for many other algorithms ** Lempel–Ziv–Storer–Szymanski (LZSS) – Used by
WinRAR WinRAR is a trialware file archiver utility for Windows, developed by Eugene Roshal of win.rar GmbH. It can create and view archives in RAR (file format), RAR or Zip (file format), ZIP file formats, and unpack numerous archive file formats. To en ...
in tandem with Huffman coding *** Deflate – Combines LZSS compression with Huffman coding, used by ZIP,
gzip gzip is a and a used for . The program was created by and as a replacement for the program used in early systems, and intended for use by (the "g" is from "GNU"). Version 0.1 was first publicly released on 31 October 1992, and version 1 ...
, and PNG images **
Lempel–Ziv–Welch Lempel–Ziv–Welch (LZW) is a universal lossless data compression Lossless compression is a class of data compression In signal processing Signal processing is an electrical engineering Electrical engineering is an engin ...
(LZW) – Used by
GIF The Graphics Interchange Format (GIF; or , see pronunciation) is a bitmap In computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algor ...

GIF
images and Unix's
compress compress is a Unix shell A Unix shell is a command-line interpreter or shell Shell may refer to: Architecture and design * Shell (structure)A shell is a type of structural element which is characterized by its geometry, being a three-di ...
utility **
Lempel–Ziv–Markov chain algorithm The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm In and , an algorithm () is a finite sequence of , computer-implementable instructions, typically to solve a class of problems or to perform a computation. Algorithms are al ...
(LZMA) – Very high compression ratio, used by
7zip 7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip uses its own 7z archive format, but can ...
and xz *
Burrows–Wheeler transform The Burrows–Wheeler transform (BWT, also called block-sorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated ch ...
reversible transform for making textual data more compressible, used by
bzip2 bzip2 is a free and open-source Free and open-source software (FOSS) is software that is both free software and open-source software where anyone is free software license, freely licensed to use, copy, study, and change the software in an ...
*
Prediction by partial matching Prediction by partial matching (PPM) is an adaptive statistics, statistical data compression technique based on context modeling and prediction. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol i ...
(PPM) – Optimized for compressing
plain text In computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes and development of both computer hardware , hardware and sof ...

plain text


Audio

*
Apple Lossless Apple Lossless, also known as Apple Lossless Audio Codec (ALAC), or Apple Lossless Encoder (ALE), is an audio coding format, and its reference audio codec implementation, developed by Apple Inc. for lossless data compression of digital music. After ...
(ALAC – Apple Lossless Audio Codec) *
Adaptive Transform Acoustic Coding Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony is a Japanese Multinational corporation, multinational conglomerate (company), conglomerate corporation headquartered in ...
(ATRAC) *
Audio Lossless Coding MPEG-4 Audio Lossless Coding, also known as MPEG-4 ALS, is an extension to the MPEG-4 Part 3 MPEG-4 Part 3 or MPEG-4 Audio (formally ISO The International Organization for Standardization (ISO; ) is an international standard are technical sta ...
(also known as MPEG-4 ALS) *
Direct Stream Transfer Super Audio CD (SACD) is a read-only optical disc drive. (CD-R), showing characteristic iridescence. In computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the st ...

Direct Stream Transfer
(DST) *
Dolby TrueHD 180px, Dolby TrueHD logo Dolby TrueHD is a lossless compression, lossless, Surround sound, multi-channel audio codec developed by Dolby Laboratories for home video, used principally in Blu-ray Disc and compatible hardware. Dolby TrueHD, along with ...

Dolby TrueHD
*
DTS-HD Master Audio DTS-HD Master Audio (DTS-HD MA; known as DTS++ before 2004) is a Surround sound, multi-channel, Lossless data compression, lossless audio codec developed by DTS (sound system), DTS as an extension of the Lossy compression, lossy DTS (sound system)# ...
*
Free Lossless Audio Codec FLAC (; Free Lossless Audio Codec) is an audio coding format An audio coding format (or sometimes audio compression format) is a content representation format for storage or transmission of digital audio Digital audio is a representatio ...
(FLAC) *
Meridian Lossless Packing 200px, The Advanced Resolution logo Meridian Lossless Packing, also known as Packed PCM (PPCM), is a lossless compression technique for compressing PCM audio data developed by Meridian Audio, Ltd. MLP is the standard lossless compression method f ...
(MLP) *
Monkey's Audio Monkey's Audio is an algorithm and file format ogg-file: 154 kilobytes. A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage ...
(Monkey's Audio APE) *
MPEG-4 SLS MPEG-4 SLS, or MPEG-4 Scalable to Lossless as per ISO/ IEC 14496-3:2005/Amd 3:2006 (Scalable Lossless Coding), is an extension to the MPEG-4 Part 3 (MPEG-4 MPEG-4 is a method of defining compression of audio and visual (AV) digital data. It wa ...
(also known as HD-AAC) *
OptimFROG OptimFROG is a proprietary {{Short pages monitor