Image compression is a type of data compression applied to digital images, to reduce their cost for storage or transmission. Algorithms may take advantage of visual perception and the statistical properties of image data to provide superior results compared with generic data compression methods which are used for other digital data.


Lossy and lossless image compression

Image compression may be lossy or lossless. Lossless compression is preferred for archival purposes and often for medical imaging, technical drawings, clip art, or comics. Lossy compression methods, especially when used at low bit rates, introduce compression artifacts. Lossy methods are especially suitable for natural images such as photographs in applications where minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate. Lossy compression that produces negligible differences may be called visually lossless.

Methods for lossy compression:
* Transform coding – This is the most commonly used method (see the sketch after this list).
** Discrete cosine transform (DCT) – The most widely used form of lossy compression. It is a type of Fourier-related transform, and was originally developed by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974. The DCT is sometimes referred to as "DCT-II" in the context of a family of discrete cosine transforms. It is generally the most efficient form of image compression.
*** DCT is used in JPEG, the most popular lossy format, and the more recent HEIF.
** The more recently developed wavelet transform is also used extensively, followed by quantization and entropy coding.
* Color quantization – Reducing the color space to a few "representative" colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette. This method can be combined with dithering to avoid posterization.
** Whole-image palette, typically 256 colors, used in GIF and PNG file formats.
** Block palette, typically 2 or 4 colors for each block of 4×4 pixels, used in BTC, CCC, S2TC, and S3TC.
* Chroma subsampling – This takes advantage of the fact that the human eye perceives spatial changes of brightness more sharply than those of color, by averaging or dropping some of the chrominance information in the image.
* Fractal compression.
* More recently, methods based on machine learning have been applied, using multilayer perceptrons, convolutional neural networks, generative adversarial networks and diffusion models. Implementations are available in OpenCV, TensorFlow, MATLAB's Image Processing Toolbox (IPT), and the High-Fidelity Generative Image Compression (HiFiC) open source project.
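To illustrate the transform coding idea above, the following is a minimal sketch of DCT-based compression of a single 8×8 block in Python, assuming NumPy and SciPy are available. The uniform quantization step (q = 16) and the example block are illustrative assumptions, not a standard JPEG quantization table.

```python
# Minimal sketch of DCT-based transform coding on one 8x8 block.
# Assumes NumPy and SciPy; the quantization step size is illustrative,
# not a standard JPEG quantization table.
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, q=16):
    """Transform an 8x8 block, quantize the coefficients, and reconstruct."""
    coeffs = dctn(block.astype(float) - 128, norm="ortho")  # level shift + 2-D DCT-II
    quantized = np.round(coeffs / q)                        # coarse uniform quantization
    reconstructed = idctn(quantized * q, norm="ortho") + 128
    return quantized, np.clip(reconstructed, 0, 255)

# Example: a smooth gradient block compresses to only a few nonzero coefficients.
block = np.tile(np.arange(0, 64, 8), (8, 1))
quantized, recon = compress_block(block)
print(np.count_nonzero(quantized), "nonzero coefficients out of 64")
```

In a complete codec such as JPEG, the quantized coefficients would additionally be reordered (zig-zag scanned) and entropy coded, and the quantization step would vary per coefficient.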
Methods for lossless compression:
* Run-length encoding – used as the default method in PCX and as one of several possible methods in BMP, TGA and TIFF (see the sketch after this list)
* Predictive coding – used in DPCM
* Entropy encoding – the two most common entropy encoding techniques are arithmetic coding and Huffman coding
* Adaptive dictionary algorithms such as LZW – used in GIF and TIFF
* DEFLATE – used in PNG, MNG, and TIFF
* Chain codes
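As a sketch of the run-length encoding idea referenced in the list above, the following Python functions encode a row of pixel values as (value, count) pairs. The pair representation is an illustrative assumption; actual formats such as PCX or TGA define their own packet layouts.

```python
# Minimal run-length encoding sketch for a row of pixel values.
def rle_encode(row):
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(v, c) for v, c in runs]

def rle_decode(runs):
    return [v for v, c in runs for _ in range(c)]

row = [255, 255, 255, 255, 0, 0, 255, 255, 255]
encoded = rle_encode(row)
assert rle_decode(encoded) == row     # lossless round trip
print(encoded)                        # [(255, 4), (0, 2), (255, 3)]
```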


Other properties

The best image quality at a given compression rate (or bit rate) is the main goal of image compression; however, there are other important properties of image compression schemes:

Scalability generally refers to a quality reduction achieved by manipulation of the bitstream or file (without decompression and re-compression). Other names for scalability are ''progressive coding'' or ''embedded bitstreams''. Despite its contrary nature, scalability may also be found in lossless codecs, usually in the form of coarse-to-fine pixel scans. Scalability is especially useful for previewing images while downloading them (e.g., in a web browser) or for providing variable-quality access to, e.g., databases. There are several types of scalability:
* Quality progressive or layer progressive: The bitstream successively refines the reconstructed image.
* Resolution progressive: First encode a lower image resolution; then encode the difference to higher resolutions.
* Component progressive: First encode the grey-scale version; then add full color.

Region of interest coding. Certain parts of the image are encoded with higher quality than others. This may be combined with scalability (encode these parts first, others later).

Meta information. Compressed data may contain information about the image which may be used to categorize, search, or browse images. Such information may include color and texture statistics, small preview images, and author or copyright information.

Processing power. Compression algorithms require different amounts of processing power to encode and decode. Some high-compression algorithms require high processing power.

The quality of a compression method is often measured by the peak signal-to-noise ratio (PSNR). It measures the amount of noise introduced through a lossy compression of the image; however, the subjective judgment of the viewer is also regarded as an important measure, perhaps being the most important measure.
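A minimal sketch of the PSNR computation, assuming 8-bit images stored as NumPy arrays of the same shape (the example arrays are illustrative):

```python
# Peak signal-to-noise ratio (PSNR) for 8-bit images.
import numpy as np

def psnr(original, compressed, max_value=255.0):
    mse = np.mean((original.astype(float) - compressed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")           # identical images
    return 10 * np.log10(max_value ** 2 / mse)

a = np.full((4, 4), 100, dtype=np.uint8)
b = a.copy()
b[0, 0] = 110                         # one pixel differs by 10
print(round(psnr(a, b), 2))           # MSE = 100/16 = 6.25 -> about 40.17 dB
```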


History

Entropy coding started in the late 1940s with the introduction of Shannon–Fano coding, the basis for Huffman coding which was published in 1952. Transform coding dates back to the late 1960s, with the introduction of fast Fourier transform (FFT) coding in 1968 and the Hadamard transform in 1969.

An important development in image data compression was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed, T. Natarajan and K. R. Rao in 1973. JPEG was introduced by the Joint Photographic Experts Group (JPEG) in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format. JPEG was largely responsible for the wide proliferation of digital images and digital photos, with several billion JPEG images produced every day as of 2015.

Lempel–Ziv–Welch (LZW) is a lossless compression algorithm developed by Abraham Lempel, Jacob Ziv and Terry Welch in 1984. It is used in the GIF format, introduced in 1987. DEFLATE, a lossless compression algorithm developed by Phil Katz and specified in 1996, is used in the Portable Network Graphics (PNG) format.

The JPEG 2000 standard was developed from 1997 to 2000 by a JPEG committee chaired by Touradj Ebrahimi (later the JPEG president). In contrast to the DCT algorithm used by the original JPEG format, JPEG 2000 instead uses discrete wavelet transform (DWT) algorithms. It uses the CDF 9/7 wavelet transform (developed by Ingrid Daubechies in 1992) for its lossy compression algorithm, and the Le Gall–Tabatabai (LGT) 5/3 wavelet transform (developed by Didier Le Gall and Ali J. Tabatabai in 1988) for its lossless compression algorithm. JPEG 2000 technology, which includes the Motion JPEG 2000 extension, was selected as the video coding standard for digital cinema in 2004.

The evolution of image compression technologies has led to continuous improvements in both efficiency and quality. From the early developments in entropy coding and transform coding to the introduction of JPEG and JPEG 2000, these innovations have significantly impacted the way digital images are stored, transmitted, and processed. Modern compression methods allow users to optimize image files for faster loading times and better storage utilization, while maintaining high image quality. As compression technologies advance, these methods continue to play a crucial role in various fields, including web development, digital media, and content management.


Huffman Coding

Huffman coding is a fundamental technique used in image compression algorithms to achieve efficient data representation. Named after its inventor David A. Huffman, this method is widely employed in various image compression standards such as JPEG and PNG.


Principle of Huffman Coding

Huffman coding is a form of entropy encoding that assigns variable-length codes to input symbols based on their frequencies of occurrence. The basic principle is to assign shorter codes to more frequently occurring symbols and longer codes to less frequent symbols, thereby reducing the average code length compared to fixed-length codes.
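As an illustrative example (with hypothetical symbol probabilities rather than data from a real image): if four symbols occur with probabilities 0.5, 0.25, 0.125 and 0.125, a Huffman code may assign them the codewords 0, 10, 110 and 111. The average code length is then 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits per symbol, compared with 2 bits per symbol for a fixed-length code over four symbols.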


Application in Image Compression

In image compression, Huffman coding is typically applied after other transformations such as the discrete cosine transform (DCT) and quantization, as in JPEG compression. After the image data has been transformed into a frequency-domain representation and quantized, Huffman coding is used to encode the resulting coefficients efficiently.


Steps in Huffman Coding for Image Compression

# Frequency Analysis: Calculate the frequency of occurrence of each symbol or symbol combination in the transformed image data.
# Constructing the Huffman Tree: Build a Huffman tree based on the symbol frequencies. The tree is constructed recursively by combining the nodes with the lowest frequencies until a single root node is formed.
# Assigning Codewords: Traverse the Huffman tree to assign variable-length codewords to each symbol, with shorter codewords assigned to more frequent symbols.
# Encoding: Replace the original symbols in the image data with their corresponding Huffman codewords to generate the compressed data stream.
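The following is a minimal Python sketch of these four steps, using the standard heapq and collections modules. The example symbols and the tie-breaking rule are illustrative assumptions; production encoders such as JPEG's use canonical, table-driven Huffman codes rather than an explicit tree.

```python
# Minimal Huffman coding sketch: frequency analysis, tree construction,
# codeword assignment, and encoding.
import heapq
from collections import Counter

def huffman_codes(symbols):
    freqs = Counter(symbols)                      # 1. frequency analysis
    # Heap entries: [frequency, tie-break id, symbol, left child, right child]
    heap = [[f, i, s, None, None] for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                          # 2. build the Huffman tree
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        heapq.heappush(heap, [lo[0] + hi[0], count, None, lo, hi])
        count += 1
    codes = {}
    def assign(node, prefix):                     # 3. assign codewords
        _, _, symbol, left, right = node
        if symbol is not None:
            codes[symbol] = prefix or "0"         # single-symbol edge case
        else:
            assign(left, prefix + "0")
            assign(right, prefix + "1")
    assign(heap[0], "")
    return codes

symbols = [0, 0, 0, 0, 1, 1, 2, 3]                # e.g., quantized coefficients
codes = huffman_codes(symbols)
encoded = "".join(codes[s] for s in symbols)      # 4. encode the data stream
print(codes, len(encoded), "bits")
```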


Benefits of Huffman Coding in Image Compression

* Lossless Compression: Huffman coding is itself lossless, and it is used within both lossy and lossless image compression pipelines, providing flexibility in balancing compression ratio against image quality.
* Efficiency: By assigning shorter codes to frequently occurring symbols, Huffman coding reduces the average code length, resulting in efficient data representation and reduced storage requirements.
* Compatibility: Huffman coding is widely supported and can be seamlessly integrated into existing image compression standards and algorithms.


Conclusion

Huffman coding plays a crucial role in image compression by efficiently encoding image data into a compact representation. Its ability to adaptively assign variable-length codewords based on symbol frequencies makes it an essential component in modern image compression techniques, contributing to the reduction of storage space and transmission bandwidth while maintaining image quality.

