Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all lossy compression algorithms.
The difference between an input value and its quantized value (such as round-off error) is referred to as quantization error. A device or algorithmic function that performs quantization is called a quantizer. An analog-to-digital converter is an example of a quantizer.
Example

For example, rounding a real number $x$ to the nearest integer value forms a very basic type of quantizer – a ''uniform'' one. A typical (''mid-tread'') uniform quantizer with a quantization ''step size'' equal to some value $\Delta$ can be expressed as
:$Q(x) = \Delta \cdot \left\lfloor \frac{x}{\Delta} + \frac{1}{2} \right\rfloor$,
where the notation $\lfloor \ \rfloor$ denotes the floor function.
The essential property of a quantizer is having a countable set of possible output values that has fewer members than the set of possible input values. The members of the set of output values may have integer, rational, or real values. For simple rounding to the nearest integer, the step size $\Delta$ is equal to 1. With $\Delta = 1$ or with $\Delta$ equal to any other integer value, this quantizer has real-valued inputs and integer-valued outputs.
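The mid-tread formula above can be tried out directly. The following is a minimal Python sketch; the function name `quantize_mid_tread` is chosen here for illustration and is not part of any library. Note that `floor(x / delta + 1/2)` rounds exact halves upward, which differs from Python's built-in `round` (which rounds halves to the nearest even integer).

```python
import math

def quantize_mid_tread(x: float, delta: float) -> float:
    """Mid-tread uniform quantizer: Q(x) = delta * floor(x / delta + 1/2)."""
    return delta * math.floor(x / delta + 0.5)

# With delta = 1 the quantizer rounds to the nearest integer (halves go up).
print([quantize_mid_tread(x, 1.0) for x in (-1.2, -0.5, 0.4, 0.5, 2.7)])
# -> [-1.0, 0.0, 0.0, 1.0, 3.0]

# With delta = 0.25 every output is a multiple of 0.25.
print(quantize_mid_tread(0.6, 0.25))
# -> 0.5
```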
When the quantization step size (Δ) is small relative to the variation in the signal being quantized, it is relatively simple to show that the mean squared error produced by such a rounding operation will be approximately $\Delta^2/12$ (W. R. Bennett, "Spectra of Quantized Signals", ''Bell System Technical Journal'', Vol. 27, pp. 446–472, July 1948; Seymour Stein and J. Jay Jones, ''Modern Communication Principles'', McGraw–Hill, 1967, p. 196). Mean squared error is also called the quantization ''noise power''. Adding one bit to the quantizer halves the value of Δ, which reduces the noise power by the factor ¼. In terms of decibels, the noise power change is $10\cdot \log_{10}(1/4)\ \approx\ -6\ \mathrm{dB}$.
Because the set of possible output values of a quantizer is countable, any quantizer can be decomposed into two distinct stages, which can be referred to as the ''classification'' stage (or ''forward quantization'' stage) and the ''reconstruction'' stage (or ''inverse quantization'' stage), where the classification stage maps the input value to an integer ''quantization index'' $k$ and the reconstruction stage maps the index $k$ to the ''reconstruction value'' $y_k$ that is the output approximation of the input value. For the example uniform quantizer described above, the forward quantization stage can be expressed as
:$k = \left\lfloor \frac{x}{\Delta} + \frac{1}{2} \right\rfloor$,
and the reconstruction stage for this example quantizer is simply
:$y_k = k \cdot \Delta$.
This decomposition is useful for the design and analysis of quantization behavior, and it illustrates how the quantized data can be communicated over a communication channel – a ''source encoder'' can perform the forward quantization stage and send the index information through a communication channel, and a ''decoder'' can perform the reconstruction stage to produce the output approximation of the original input data. In general, the forward quantization stage may use any function that maps the input data to the integer space of the quantization index data, and the inverse quantization stage can conceptually (or literally) be a table look-up operation to map each quantization index to a corresponding reconstruction value.
This two-stage decomposition applies equally well to vector as well as scalar quantizers.
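As an illustration of the two-stage view, the sketch below splits the example uniform quantizer into a forward stage and an inverse stage, and then estimates the noise power empirically so that it can be compared with $\Delta^2/12$. The function names, the uniformly distributed test signal on [-10, 10], the sample count and the seed are all assumptions made for this example only.

```python
import math
import random

def classify(x: float, delta: float) -> int:
    """Forward quantization: k = floor(x / delta + 1/2)."""
    return math.floor(x / delta + 0.5)

def reconstruct(k: int, delta: float) -> float:
    """Inverse quantization: y_k = k * delta."""
    return k * delta

def empirical_noise_power(delta: float, n: int = 200_000, seed: int = 0) -> float:
    """Mean squared quantization error over n random test inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-10.0, 10.0)  # assumed test signal, wide relative to delta
        y = reconstruct(classify(x, delta), delta)
        total += (x - y) ** 2
    return total / n

delta = 0.1
mse = empirical_noise_power(delta)
print(mse, delta ** 2 / 12)                    # the two values should be close
print(empirical_noise_power(delta / 2) / mse)  # halving delta: ratio near 1/4
```

Only the integer index `k` would need to cross a channel in this arrangement; the decoder needs just `delta` (or, more generally, a look-up table) to recover `y_k`.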
Mathematical properties

Because quantization is a many-to-few mapping, it is an inherently non-linear and irreversible process (i.e., because the same output value is shared by multiple input values, it is impossible, in general, to recover the exact input value when given only the output value). The set of possible input values may be infinitely large, and may possibly be continuous and therefore uncountable (such as the set of all real numbers, or all real numbers within some limited range). The set of possible output values may be finite or countably infinite. The input and output sets involved in quantization can be defined in a rather general way. For example, vector quantization is the application of quantization to multi-dimensional (vector-valued) input data.
Types

Analog-to-digital converter

An analog-to-digital converter (ADC) can be modeled as two processes: sampling and quantization. Sampling converts a time-varying voltage signal into a discrete-time signal, a sequence of real numbers. Quantization replaces each real number with an approximation from a finite set of discrete values. Most commonly, these discrete values are represented as fixed-point words. Though any number of quantization levels is possible, common word-lengths are 8-bit (256 levels), 16-bit (65,536 levels) and 24-bit (16.8 million levels). Quantizing a sequence of numbers produces a sequence of quantization errors which is sometimes modeled as an additive random signal called quantization noise because of its stochastic behavior. The more levels a quantizer uses, the lower is its quantization noise power.
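The sampling-plus-quantization model can be sketched in a few lines of Python. This is an idealized illustration, not a description of any real converter: the function name `adc`, the signed code range, the full-scale convention of [-1, 1) and the clipping of out-of-range inputs are all assumptions of this example.

```python
import math

def adc(sample_times, signal, bits, full_scale=1.0):
    """Idealized ADC: sample `signal` at the given times, then quantize
    each sample to a signed integer code of the given word length."""
    levels = 2 ** bits                       # e.g. 256 levels for 8 bits
    delta = 2.0 * full_scale / levels        # step size over [-full_scale, full_scale)
    lo, hi = -levels // 2, levels // 2 - 1   # assumed signed code range
    codes = []
    for t in sample_times:
        x = signal(t)                        # sampling
        k = math.floor(x / delta + 0.5)      # quantization (rounding)
        codes.append(max(lo, min(hi, k)))    # clip inputs outside the range
    return codes, delta

# A 1 kHz sine sampled at 8 kHz and quantized to 8 bits.
fs = 8000.0
times = [n / fs for n in range(8)]
codes, delta = adc(times, lambda t: math.sin(2 * math.pi * 1000.0 * t), bits=8)
print(codes)
```

The positive peak of the sine maps to the largest available code because the assumed code range is not symmetric about zero.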
Rate–distortion optimization

''Rate–distortion optimized'' quantization is encountered in source coding for lossy data compression algorithms, where the purpose is to manage distortion within the limits of the bit rate supported by a communication channel or storage medium. The analysis of quantization in this context involves studying the amount of data (typically measured in digits or bits or bit ''rate'') that is used to represent the output of the quantizer, and studying the loss of precision that is introduced by the quantization process (which is referred to as the ''distortion'').
Mid-riser and mid-tread uniform quantizers

Most uniform quantizers for signed input data can be classified as being of one of two types: ''mid-riser'' and ''mid-tread''. The terminology is based on what happens in the region around the value 0, and uses the analogy of viewing the input-output function of the quantizer as a stairway. Mid-tread quantizers have a zero-valued reconstruction level (corresponding to a ''tread'' of a stairway), while mid-riser quantizers have a zero-valued classification threshold (corresponding to a ''riser'' of a stairway).
Mid-tread quantization involves rounding. The formulas for mid-tread uniform quantization are provided in the previous section.
Mid-riser quantization involves truncation. The input-output formula for a mid-riser uniform quantizer is given by:
:$Q(x) = \Delta \cdot \left( \left\lfloor \frac{x}{\Delta} \right\rfloor + \frac{1}{2} \right)$,
where the classification rule is given by
:$k = \left\lfloor \frac{x}{\Delta} \right\rfloor$
and the reconstruction rule is
:$y_k = \Delta \cdot \left( k + \tfrac{1}{2} \right)$.
Note that mid-riser uniform quantizers do not have a zero output value – their minimum output magnitude is half the step size. In contrast, mid-tread quantizers do have a zero output level. For some applications, having a zero output signal representation may be a necessity.
In general, a mid-riser or mid-tread quantizer may not actually be a ''uniform'' quantizer – i.e., the size of the quantizer's classification intervals may not all be the same, or the spacing between its possible output values may not all be the same. The distinguishing characteristic of a mid-riser quantizer is that it has a classification threshold value that is exactly zero, and the distinguishing characteristic of a mid-tread quantizer is that it has a reconstruction value that is exactly zero.
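For the uniform case, the mid-riser rule can be sketched as follows. The function name is illustrative only; the code simply applies the floor-based classification and the half-step reconstruction offset described for the uniform mid-riser quantizer.

```python
import math

def quantize_mid_riser(x: float, delta: float) -> float:
    """Uniform mid-riser quantizer: classify with floor, reconstruct at k + 1/2."""
    k = math.floor(x / delta)    # classification: threshold exactly at zero
    return delta * (k + 0.5)     # reconstruction: never exactly zero

# Inputs just either side of zero land on +delta/2 and -delta/2.
print([quantize_mid_riser(x, 1.0) for x in (-0.2, 0.0, 0.2, 1.7)])
# -> [-0.5, 0.5, 0.5, 1.5]
```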
Dead-zone quantizers

A dead-zone quantizer is a type of mid-tread quantizer with symmetric behavior around 0. The region around the zero output value of such a quantizer is referred to as the ''dead zone'' or ''deadband''. The dead zone can sometimes serve the same purpose as a noise gate or squelch function. Especially for compression applications, the dead-zone may be given a different width than that for the other steps. For an otherwise-uniform quantizer, the dead-zone width can be set to any value $w$ by using the forward quantization rule
:$k = \operatorname{sgn}(x) \cdot \max\left(0, \left\lfloor \frac{\left| x \right| - w/2}{\Delta} + 1 \right\rfloor\right)$,
where the function $\operatorname{sgn}$( ) is the sign function (also known as the ''signum'' function). The general reconstruction rule for such a dead-zone quantizer is given by
:$y_k = \operatorname{sgn}(k) \cdot \left( \frac{w}{2} + \Delta \cdot (|k| - 1 + r_k) \right)$,
where $r_k$ is a reconstruction offset value in the range of 0 to 1 as a fraction of the step size. Ordinarily, $0 \le r_k \le \tfrac{1}{2}$ when quantizing input data with a typical probability density function (PDF) that is symmetric around zero and reaches its peak value at zero (such as a Gaussian, Laplacian, or generalized Gaussian PDF). Although $r_k$ may depend on $k$ in general, and can be chosen to fulfill the optimality condition described below, it is often simply set to a constant, such as $\tfrac{1}{2}$. (Note that in this definition, $y_0 = 0$ due to the definition of the $\operatorname{sgn}$( ) function, so $r_0$ has no effect.)
A very commonly used special case (e.g., the scheme typically used in financial accounting and elementary mathematics) is to set $w = \Delta$ and $r_k = \tfrac{1}{2}$ for all $k$. In this case, the dead-zone quantizer is also a uniform quantizer, since the central dead-zone of this quantizer has the same width as all of its other steps, and all of its reconstruction values are equally spaced as well.
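The dead-zone rules can be sketched in Python as below. The function names are illustrative, and a single constant offset `r` is assumed for every index rather than a per-index $r_k$. With $w = \Delta$ and $r = 1/2$ the pair behaves like rounding with halves going away from zero; a wider `w` sends more small inputs to zero.

```python
import math

def sgn(v) -> int:
    """Sign function: -1, 0 or +1."""
    return (v > 0) - (v < 0)

def deadzone_classify(x: float, delta: float, w: float) -> int:
    """Forward rule: k = sgn(x) * max(0, floor((|x| - w/2) / delta + 1))."""
    return sgn(x) * max(0, math.floor((abs(x) - w / 2) / delta + 1))

def deadzone_reconstruct(k: int, delta: float, w: float, r: float = 0.5) -> float:
    """Inverse rule: y_k = sgn(k) * (w/2 + delta * (|k| - 1 + r))."""
    return sgn(k) * (w / 2 + delta * (abs(k) - 1 + r))

def deadzone_quantize(x, delta, w, r=0.5):
    return deadzone_reconstruct(deadzone_classify(x, delta, w), delta, w, r)

# Special case w = delta, r = 1/2: ordinary rounding, halves away from zero.
print([deadzone_quantize(x, 1.0, 1.0) for x in (0.4, 0.6, -1.7, 2.5, -2.5)])
# Dead zone twice as wide as the other steps: 0.9 now maps to zero.
print([deadzone_quantize(x, 1.0, 2.0) for x in (0.9, 1.2)])
```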
Noise and error characteristics

Additive noise model

A common assumption for the analysis of quantization error is that it affects a signal processing system in a similar manner to that of additive white noise – having negligible correlation with the signal and an approximately flat power spectral density (Bernard Widrow, "Statistical analysis of amplitude quantized sampled data systems", ''Trans. AIEE Pt. II: Appl. Ind.'', Vol. 79, pp. 555–568, Jan. 1961). The additive noise model is commonly used for the analysis of quantization error effects in digital filtering systems, and it can be very useful in such analysis. It has been shown to be a valid model in cases of high resolution quantization (small $\Delta$ relative to the signal strength) with smooth PDFs.
Additive noise behavior is not always a valid assumption. Quantization error (for quantizers defined as described here) is deterministically related to the signal and not entirely independent of it. Thus, periodic signals can create periodic quantization noise, and in some cases this can even cause limit cycles to appear in digital signal processing systems. One way to ensure effective independence of the quantization error from the source signal is to perform ''dithered quantization'' (sometimes with ''noise shaping''), which involves adding random (or pseudo-random) noise to the signal prior to quantization.
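The effect of dither can be seen in a small experiment. The sketch below assumes a particular dither, namely uniformly distributed noise one step wide added before a rounding quantizer; the text above does not prescribe this choice, and the function names, seed and sample counts are likewise illustrative. A constant input of 0.3 always quantizes to 0 without dither, so the error is fully determined by the signal; with dither the output varies, but its average tracks the input.

```python
import math
import random

def quantize(x: float, delta: float = 1.0) -> float:
    """Mid-tread uniform quantizer used for this experiment."""
    return delta * math.floor(x / delta + 0.5)

def mean_output(x: float, n: int, dither: bool, seed: int = 0) -> float:
    """Average quantizer output for a constant input x over n trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Assumed dither: uniform noise spanning one quantization step.
        d = rng.uniform(-0.5, 0.5) if dither else 0.0
        total += quantize(x + d)
    return total / n

print(mean_output(0.3, 1000, dither=False))    # always the same output level
print(mean_output(0.3, 100_000, dither=True))  # average lies close to the input
```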
Quantization error models

In the typical case, the original signal is much larger than one least significant bit (LSB). When this is the case, the quantization error is not significantly correlated with the signal, and has an approximately uniform distribution. When rounding is used to quantize, the quantization error has a mean of zero and the root mean square (RMS) value is the standard deviation of this distribution, given by $\tfrac{1}{\sqrt{12}}\,\mathrm{LSB}\ \approx\ 0.289\,\mathrm{LSB}$. When truncation is used, the error has a non-zero mean of $\tfrac{1}{2}\,\mathrm{LSB}$ and the RMS value is $\tfrac{1}{\sqrt{3}}\,\mathrm{LSB}$. Although rounding yields less RMS error than truncation, the difference is only due to the static (DC) term of $\tfrac{1}{2}\,\mathrm{LSB}$. The RMS values of the AC error are exactly the same in both cases, so there is no special advantage of rounding over truncation in situations where the DC term of the error can be ignored (such as in AC coupled systems). In either case, the standard deviation, as a percentage of the full signal range, changes by a factor of 2 for each 1-bit change in the number of quantization bits. The potential signal-to-quantization-noise power ratio therefore changes by 4, or $10\cdot \log_{10}(4)$, approximately 6 dB per bit.
At lower amplitudes the quantization error becomes dependent on the input signal, resulting in distortion. This distortion is created after the anti-aliasing filter, and if these distortions are above 1/2 the sample rate they will alias back into the band of interest. In order to make the quantization error independent of the input signal, the signal is dithered by adding noise to the signal. This slightly reduces signal to noise ratio, but can completely eliminate the distortion.
Quantization noise model

Quantization noise is a model of quantization error introduced by quantization in the ADC. It is a rounding error between the analog input voltage to the ADC and the output digitized value. The noise is non-linear and signal-dependent. It can be modelled in several different ways.
In an ideal ADC, where the quantization error is uniformly distributed between −1/2 LSB and +1/2 LSB, and the signal has a uniform distribution covering all quantization levels, the signal-to-quantization-noise ratio (SQNR) can be calculated from
:$\mathrm{SQNR} = 20 \log_{10}(2^Q) \approx 6.02 \cdot Q\ \mathrm{dB}$
where Q is the number of quantization bits. The most common test signals that fulfill this are full-amplitude triangle waves and sawtooth waves.
For example, a 16-bit ADC has a maximum signal-to-quantization-noise ratio of 6.02 × 16 = 96.3 dB.
When the input signal is a full-amplitude sine wave the distribution of the signal is no longer uniform, and the corresponding equation is instead
:$\mathrm{SQNR} \approx 1.761 + 6.02 \cdot Q\ \mathrm{dB}$
Here, the quantization noise is once again ''assumed'' to be uniformly distributed. When the input signal has a high amplitude and a wide frequency spectrum this is the case. In this case a 16-bit ADC has a maximum signal-to-noise ratio of 98.09 dB. The 1.761 dB difference in signal-to-noise ratio arises only because the signal is a full-scale sine wave instead of a triangle or sawtooth.
For complex signals in high-resolution ADCs this is an accurate model. For low-resolution ADCs, low-level signals in high-resolution ADCs, and for simple waveforms the quantization noise is not uniformly distributed, making this model inaccurate. In these cases the quantization noise distribution is strongly affected by the exact amplitude of the signal.
The calculations are relative to full-scale input. For smaller signals, the relative quantization distortion can be very large. To circumvent this issue, analog companding can be used, but this can introduce distortion.
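The two SQNR figures quoted in this section for a 16-bit ADC can be reproduced numerically. In the sketch below the function names are illustrative, and the sine-wave offset is written as $10\log_{10}(1.5)$, which evaluates to about 1.761 dB; that closed form is supplied here as an assumption about where the constant comes from (a full-scale sine carries 1.5 times the power of a full-scale uniformly distributed signal).

```python
import math

def sqnr_uniform_db(bits: int) -> float:
    """SQNR for a full-scale uniformly distributed signal: 20*log10(2^Q)."""
    return 20 * math.log10(2 ** bits)

def sqnr_sine_db(bits: int) -> float:
    """SQNR for a full-scale sine wave: about 1.761 dB above the uniform case."""
    return sqnr_uniform_db(bits) + 10 * math.log10(1.5)

for q in (8, 16, 24):
    print(q, round(sqnr_uniform_db(q), 2), round(sqnr_sine_db(q), 2))
```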
Design

Granular distortion and overload distortion

Often the design of a quantizer involves supporting only a limited range of possible output values and performing clipping to limit the output to this range whenever the input exceeds the supported range. The error introduced by this clipping is referred to as ''overload'' distortion. Within the extreme limits of the supported range, the amount of spacing between the selectable output values of a quantizer is referred to as its ''granularity'', and the error introduced by this spacing is referred to as ''granular'' distortion. It is common for the design of a quantizer to involve determining the proper balance between granular distortion and overload distortion. For a given supported number of possible output values, reducing the average granular distortion may involve increasing the average overload distortion, and vice versa. A technique for controlling the amplitude of the signal (or, equivalently, the quantization step size $\Delta$) to achieve the appropriate balance is the use of ''automatic gain control'' (AGC). However, in some quantizer designs, the concepts of granular error and overload error may not apply (e.g., for a quantizer with a limited range of input data or with a countably infinite set of selectable output values).
Rate–distortion quantizer design

A scalar quantizer, which performs a quantization operation, can ordinarily be decomposed into two stages: ;Classification :A process that classifies the input signal range into $M$ non-overlapping '' intervals'' $\backslash \_^$, by defining $M-1$ ''decision boundary'' values $\backslash \_^$, such that $I\_k\; =\; [b\_~,~b\_k)$ for $k\; =\; 1,2,\backslash ldots,M$, with the extreme limits defined by $b\_0\; =\; -\backslash infty$ and $b\_M\; =\; \backslash infty$. All the inputs $x$ that fall in a given interval range $I\_k$ are associated with the same quantization index $k$. ;Reconstruction :Each interval $I\_k$ is represented by a ''reconstruction value'' $y\_k$ which implements the mapping $x\; \backslash in\; I\_k\; \backslash Rightarrow\; y\; =\; y\_k$. These two stages together comprise the mathematical operation of $y\; =\; Q(x)$. Entropy coding techniques can be applied to communicate the quantization indices from a source encoder that performs the classification stage to a decoder that performs the reconstruction stage. One way to do this is to associate each quantization index $k$ with a binary codeword $c\_k$. An important consideration is the number of bits used for each codeword, denoted here by $\backslash mathrm(c\_k)$. As a result, the design of an $M$-level quantizer and an associated set of codewords for communicating its index values requires finding the values of $\backslash \_^$, $\backslash \_^$ and $\backslash \_^$ which optimally satisfy a selected set of design constraints such as the ''bit rate'' $R$ and ''distortion'' $D$. Assuming that an information source $S$ produces random variables $X$ with an associated PDF $f(x)$, the probability $p\_k$ that the random variable falls within a particular quantization interval $I\_k$ is given by: :$p\_k\; =\; P;\; href="/html/ALL/l/\_\backslash in\_I\_k.html"\; ;"title="\; \backslash in\; I\_k">\; \backslash in\; I\_k$. 
The resulting bit rate $R$, in units of average bits per quantized value, for this quantizer can be derived as follows:
:$R = \sum_{k=1}^{M} p_k \cdot \mathrm{length}(c_k) = \sum_{k=1}^{M} \mathrm{length}(c_k) \int_{b_{k-1}}^{b_k} f(x)\,dx$.
If it is assumed that distortion is measured by mean squared error, the distortion $D$ is given by:
:$D = E[(x-Q(x))^2]$.
A key observation is that rate $R$ depends on the decision boundaries $\{b_k\}_{k=1}^{M-1}$ and the codeword lengths $\{\mathrm{length}(c_k)\}_{k=1}^{M}$, whereas the distortion $D$ depends on the decision boundaries $\{b_k\}_{k=1}^{M-1}$ and the reconstruction levels $\{y_k\}_{k=1}^{M}$.
After defining these two performance metrics for the quantizer, a typical rate–distortion formulation for a quantizer design problem can be expressed in one of two ways:
# Given a maximum distortion constraint $D \le D_\max$, minimize the bit rate $R$
# Given a maximum bit rate constraint $R \le R_\max$, minimize the distortion $D$
Often the solution to these problems can be equivalently (or approximately) expressed and solved by converting the formulation to the unconstrained problem $\min\left\{ D + \lambda \cdot R \right\}$ where the Lagrange multiplier $\lambda$ is a non-negative constant that establishes the appropriate balance between rate and distortion. Solving the unconstrained problem is equivalent to finding a point on the convex hull of the family of solutions to an equivalent constrained formulation of the problem. However, finding a solution – especially a closed-form solution – to any of these three problem formulations can be difficult. Solutions that do not require multi-dimensional iterative optimization techniques have been published for only three PDFs: the uniform, exponential, and Laplacian distributions. Iterative optimization approaches can be used to find solutions in other cases.
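As a rough numerical illustration of the two metrics and the Lagrangian cost, the following Python sketch evaluates $R$, $D$ and $D + \lambda R$ for a 4-level quantizer applied to a source assumed to be uniform on $[0, 1)$. The helper names (`integrate`, `rate_and_distortion`, `lagrangian_cost`), the midpoint-rule integration, and the example boundaries, levels and codeword lengths are all illustrative assumptions rather than part of the formulation itself.

```python
def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def rate_and_distortion(f, edges, levels, lengths):
    """Return (R, D) for a scalar quantizer applied to a source with PDF f.

    edges   : interval edges b_0 < b_1 < ... < b_M (finite here for simplicity)
    levels  : reconstruction values y_1 .. y_M
    lengths : codeword lengths length(c_k) in bits
    """
    R = 0.0
    D = 0.0
    for k, y in enumerate(levels):
        lo, hi = edges[k], edges[k + 1]
        p_k = integrate(f, lo, hi)                      # probability of cell k
        R += p_k * lengths[k]                           # rate contribution
        D += integrate(lambda x: (x - y) ** 2 * f(x), lo, hi)  # MSE contribution
    return R, D


def lagrangian_cost(R, D, lam):
    """Unconstrained objective D + lambda * R."""
    return D + lam * R


# Assumed example: uniform source on [0, 1), four equal cells, 2-bit codewords.
f = lambda x: 1.0
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
levels = [0.125, 0.375, 0.625, 0.875]
lengths = [2, 2, 2, 2]

R, D = rate_and_distortion(f, edges, levels, lengths)
J = lagrangian_cost(R, D, lam=0.01)
print(R, D, J)
```

For this particular setup the distortion should come out close to $\Delta^2/12$ with $\Delta = 0.25$, which gives a quick sanity check on the integration.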
Note that the reconstruction values $\{y_k\}_{k=1}^{M}$ affect only the distortion – they do not affect the bit rate – and that each individual $y_k$ makes a separate contribution $d_k$ to the total distortion as shown below:
:$D = \sum_{k=1}^{M} d_k$
where
:$d_k = \int_{b_{k-1}}^{b_k} (x-y_k)^2 f(x)\,dx$
This observation can be used to ease the analysis – given the set of $\{b_k\}_{k=1}^{M-1}$ values, the value of each $y_k$ can be optimized separately to minimize its contribution to the distortion $D$.
For the mean-square error distortion criterion, it can be easily shown that the optimal set of reconstruction values $\{y^*_k\}_{k=1}^{M}$ is given by setting the reconstruction value $y_k$ within each interval $I_k$ to the conditional expected value (also referred to as the ''centroid'') within the interval, as given by:
:$y^*_k = \frac{1}{p_k} \int_{b_{k-1}}^{b_k} x f(x)\,dx$.
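The centroid rule can be checked numerically. The sketch below is a minimal example under assumed conditions: a triangular PDF $f(x) = 2x$ on $[0, 1]$, a single interval $[0, 0.5)$, and hypothetical helper names (`integrate`, `centroid`, `interval_distortion`). It compares the distortion contribution $d_k$ obtained with the centroid against the one obtained with the interval midpoint.

```python
def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def centroid(f, a, b):
    """Conditional expected value of x given that x lies in [a, b)."""
    p_k = integrate(f, a, b)
    return integrate(lambda x: x * f(x), a, b) / p_k


def interval_distortion(f, a, b, y):
    """Contribution d_k of one interval to the mean squared error."""
    return integrate(lambda x: (x - y) ** 2 * f(x), a, b)


# Assumed example PDF: f(x) = 2x on [0, 1] (not taken from the text above).
f = lambda x: 2.0 * x
a, b = 0.0, 0.5

y_star = centroid(f, a, b)          # analytically 1/3 for this PDF and interval
y_mid = (a + b) / 2                 # naive choice: interval midpoint

d_star = interval_distortion(f, a, b, y_star)
d_mid = interval_distortion(f, a, b, y_mid)
print(y_star, d_star, d_mid)
```

Because the PDF places more mass near the upper end of the interval, the centroid lies above the midpoint and gives the smaller contribution.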
The use of sufficiently well-designed entropy coding techniques can result in the use of a bit rate that is close to the true information content of the indices $\{k\}_{k=1}^{M}$, such that effectively
:$\mathrm{length}(c_k) \approx -\log_2\left(p_k\right)$
and therefore
:$R = \sum_{k=1}^{M} -p_k \cdot \log_2\left(p_k\right)$.
The use of this approximation can allow the entropy coding design problem to be separated from the design of the quantizer itself. Modern entropy coding techniques such as arithmetic coding can achieve bit rates that are very close to the true entropy of a source, given a set of known (or adaptively estimated) probabilities $\{p_k\}_{k=1}^{M}$.
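To make the entropy approximation concrete, the following sketch computes the ideal codeword lengths $-\log_2(p_k)$ and the resulting rate for an assumed set of four index probabilities, and compares it with the rate of a fixed-length code. The probabilities and the function names `entropy_bits` and `ideal_lengths` are hypothetical and chosen only for illustration.

```python
import math


def entropy_bits(probs):
    """Entropy in bits: the rate reached when length(c_k) = -log2(p_k)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def ideal_lengths(probs):
    """Idealized (possibly fractional) codeword lengths -log2(p_k)."""
    return [-math.log2(p) for p in probs]


# Assumed index probabilities for a 4-level quantizer.
probs = [0.5, 0.25, 0.125, 0.125]

R_entropy = entropy_bits(probs)                 # 1.75 bits per index
R_fixed = math.ceil(math.log2(len(probs)))      # 2 bits per index
print(ideal_lengths(probs), R_entropy, R_fixed)
```

In this example the entropy-coded rate is 1.75 bits per index against 2 bits for the fixed-length code; the gap vanishes when all indices are equally likely.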
In some designs, rather than optimizing for a particular number of classification regions $M$, the quantizer design problem may include optimization of the value of $M$ as well. For some probabilistic source models, the best performance may be achieved when $M$ approaches infinity.
Neglecting the entropy constraint: Lloyd–Max quantization

In the above formulation, if the bit rate constraint is neglected by setting $\lambda$ equal to 0, or equivalently if it is assumed that a fixed-length code (FLC) will be used to represent the quantized data instead of a variable-length code (or some other entropy coding technology such as arithmetic coding that is better than an FLC in the rate–distortion sense), the optimization problem reduces to minimization of distortion $D$ alone.
The indices produced by an $M$-level quantizer can be coded with a fixed-length code using $R = \lceil \log_2 M \rceil$ bits/symbol. For example, when $M = 256$ levels, the FLC bit rate $R$ is 8 bits/symbol. For this reason, such a quantizer has sometimes been called an 8-bit quantizer. However, using an FLC eliminates the compression improvement that can be obtained by use of better entropy coding.
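The fixed-length rate $\lceil \log_2 M \rceil$ can be computed exactly with integer arithmetic. The helper name `flc_bits` below is a hypothetical choice; the sketch uses Python's `int.bit_length` to avoid floating-point rounding in the logarithm.

```python
def flc_bits(M):
    """Bits per symbol of a fixed-length code for M levels: ceil(log2(M))."""
    if M < 1:
        raise ValueError("M must be a positive integer")
    # (M - 1).bit_length() equals ceil(log2(M)) for every integer M >= 1.
    return (M - 1).bit_length()


print(flc_bits(256))   # 8
print(flc_bits(5))     # 3
```

When $M$ is not a power of two, some fixed-length codewords go unused, which is one reason an FLC can be less efficient than entropy coding.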
Assuming an FLC with $M$ levels, the rate–distortion minimization problem can be reduced to distortion minimization alone. The reduced problem can be stated as follows: given a source $X$ with PDF $f(x)$ and the constraint that the quantizer must use only $M$ classification regions, find the decision boundaries $\{b_k\}_{k=1}^{M-1}$ and reconstruction levels $\{y_k\}_{k=1}^{M}$ to minimize the resulting distortion
:$D = E[(x-Q(x))^2]$.
Finding an optimal solution to the above problem results in a quantizer sometimes called an MMSQE (minimum mean-square quantization error) solution, and the resulting PDF-optimized (non-uniform) quantizer is referred to as a ''Lloyd–Max'' quantizer, named after two people who independently developed iterative methods to solve the two sets of simultaneous equations resulting from $\partial D / \partial b_k = 0$ and $\partial D / \partial y_k = 0$, as follows:
:$\frac{\partial D}{\partial b_k} = 0 \Rightarrow b_k = \frac{y_k + y_{k+1}}{2}$,
which places each threshold at the midpoint between each pair of reconstruction values, and
:$\frac{\partial D}{\partial y_k} = 0 \Rightarrow y_k = \frac{\int_{b_{k-1}}^{b_k} x f(x)\,dx}{\int_{b_{k-1}}^{b_k} f(x)\,dx} = \frac{1}{p_k} \int_{b_{k-1}}^{b_k} x f(x)\,dx$
which places each reconstruction value at the centroid (conditional expected value) of its associated classification interval.
Lloyd's Method I algorithm, originally described in 1957, can be generalized in a straightforward way for application to vector data. This generalization results in the Linde–Buzo–Gray (LBG) or k-means classifier optimization methods. Moreover, the technique can be further generalized in a straightforward way to also include an entropy constraint for vector data.
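The two optimality conditions suggest a simple alternating iteration. The following Python sketch is one possible numerical rendering, not a reproduction of the published algorithms: it assumes a PDF with finite support $[lo, hi]$, uses midpoint-rule integration, and runs a fixed number of iterations. The names `integrate` and `lloyd_max`, the starting levels, and the iteration count are illustrative assumptions.

```python
def integrate(g, a, b, n=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def midpoints(levels):
    """Thresholds halfway between neighbouring reconstruction levels."""
    return [(a + b) / 2 for a, b in zip(levels, levels[1:])]


def lloyd_max(f, lo, hi, levels, iterations=200):
    """Alternate the midpoint and centroid conditions for a PDF f on [lo, hi].

    Returns (edges, levels), where edges = [lo, b_1, ..., b_{M-1}, hi].
    """
    levels = sorted(levels)
    for _ in range(iterations):
        # Condition 1: thresholds at midpoints between reconstruction levels.
        edges = [lo] + midpoints(levels) + [hi]
        # Condition 2: each level moves to the centroid of its interval.
        new_levels = []
        for k, y in enumerate(levels):
            a, b = edges[k], edges[k + 1]
            p_k = integrate(f, a, b)
            if p_k > 0:
                y = integrate(lambda x: x * f(x), a, b) / p_k
            new_levels.append(y)   # keep the old level if the cell is empty
        levels = new_levels
    return [lo] + midpoints(levels) + [hi], levels


# Assumed test case: a uniform source on [0, 1] with a poor initial guess.
edges, levels = lloyd_max(lambda x: 1.0, 0.0, 1.0, [0.1, 0.2, 0.3, 0.9])
print(edges)
print(levels)
```

For the uniform test case the levels should settle near 0.125, 0.375, 0.625 and 0.875, that is, a uniform quantizer. For other PDFs the iteration generally converges to a non-uniform set of levels, and the result can depend on the starting point.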
Uniform quantization and the 6 dB/bit approximation

The Lloyd–Max quantizer is actually a uniform quantizer when the input PDF is uniformly distributed over the range $[y_1-\Delta/2,~y_M+\Delta/2)$. However, for a source that does not have a uniform distribution, the minimum-distortion quantizer may not be a uniform quantizer.
The analysis of a uniform quantizer applied to a uniformly distributed source can be summarized as follows. A symmetric source $X$ can be modelled with $f(x) = \tfrac{1}{2X_{\max}}$, for $x \in [-X_{\max}, X_{\max}]$ and 0 elsewhere. The step size is $\Delta = \tfrac{2X_{\max}}{M}$ and the ''signal to quantization noise ratio'' (SQNR) of the quantizer is
:${\rm SQNR} = 10\log_{10}\frac{\sigma_x^2}{\sigma_q^2} = 10\log_{10}\frac{(M\Delta)^2/12}{\Delta^2/12} = 10\log_{10}M^2 = 20\log_{10}M$.
For a fixed-length code using $N$ bits, $M=2^N$, resulting in
:${\rm SQNR} = 20\log_{10}{2^N} = N\cdot(20\log_{10}2) = N\cdot 6.0206\,\rm{dB}$,
or approximately 6 dB per bit. For example, for $N=8$ bits, $M=256$ levels and SQNR ≈ 8×6 = 48 dB; and for $N=16$ bits, $M=65536$ and SQNR ≈ 16×6 = 96 dB.
The property of 6 dB improvement in SQNR for each extra bit used in quantization is a well-known figure of merit. However, it must be used with care: this derivation is only for a uniform quantizer applied to a uniform source. For other source PDFs and other quantizer designs, the SQNR may be somewhat different from that predicted by 6 dB/bit, depending on the type of PDF, the type of source, the type of quantizer, and the bit rate range of operation. However, it is common to assume that for many sources, the slope of a quantizer SQNR function can be approximated as 6 dB/bit when operating at a sufficiently high bit rate.
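The closed-form SQNR can be compared against a direct measurement. The sketch below assumes a uniform source on $[-X_{\max}, X_{\max})$ represented by a dense, evenly spaced grid of sample points, and a uniform quantizer that reconstructs at cell centres; the function names `sqnr_db` and `measured_sqnr_db` and the grid density are illustrative assumptions.

```python
import math


def sqnr_db(n_bits):
    """Closed-form SQNR, 20*log10(M) with M = 2**n_bits, in dB."""
    return 20 * math.log10(2 ** n_bits)


def measured_sqnr_db(n_bits, x_max=1.0, points_per_cell=100):
    """SQNR measured on a dense grid standing in for a uniform source."""
    M = 2 ** n_bits
    delta = 2 * x_max / M                 # step size
    K = M * points_per_cell               # total number of grid points
    signal = 0.0
    noise = 0.0
    for i in range(K):
        x = -x_max + (i + 0.5) * (2 * x_max / K)
        k = math.floor((x + x_max) / delta)   # classification: cell index
        y = -x_max + (k + 0.5) * delta        # reconstruction: cell centre
        signal += x * x
        noise += (x - y) ** 2
    return 10 * math.log10(signal / noise)


for n in (4, 8):
    print(n, round(sqnr_db(n), 4), round(measured_sqnr_db(n), 4))
```

The measured values should agree with the formula to within a small fraction of a decibel; the residual difference comes from replacing the continuous source with a finite grid.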
At asymptotically high bit rates, cutting the step size in half increases the bit rate by approximately 1 bit per sample (because 1 bit is needed to indicate whether the value is in the left or right half of the prior double-sized interval) and reduces the mean squared error by a factor of 4 (i.e., 6 dB) based on the $\Delta^2/12$ approximation. At asymptotically high bit rates, the 6 dB/bit approximation is supported for many source PDFs by rigorous theoretical analysis. Moreover, the structure of the optimal scalar quantizer (in the rate–distortion sense) approaches that of a uniform quantizer under these conditions.
In other fields

Many physical quantities are actually quantized by physical entities. Examples of fields where this limitation applies include electronics (due to electrons), optics (due to photons), biology (due to DNA), physics (due to Planck limits) and chemistry (due to molecules).
See also

* Beta encoder
* Color quantization
* Data binning
* Discretization
* Discretization error
* Posterization
* Pulse-code modulation
* Quantile
* Quantization (image processing)
* Regression dilution – a bias in parameter estimates caused by errors such as quantization in the explanatory or independent variable