In digital audio using pulse-code modulation (PCM), bit depth is the number of bits of information in each sample, and it directly corresponds to the resolution of each sample. Examples of bit depth include Compact Disc Digital Audio, which uses 16 bits per sample, and DVD-Audio and Blu-ray Disc, which can support up to 24 bits per sample.
In basic implementations, variations in bit depth primarily affect the noise level from quantization error, and thus the signal-to-noise ratio (SNR) and dynamic range. However, techniques such as dithering, noise shaping, and oversampling can mitigate these effects without changing the bit depth. Bit depth also affects bit rate and file size.
Bit depth is only meaningful in reference to a PCM digital signal. Non-PCM formats, such as lossy compression formats, do not have associated bit depths.
Binary representation
A PCM signal is a sequence of digital audio samples containing the data needed to reconstruct the original analog signal. Each sample represents the amplitude of the signal at a specific point in time, and the samples are uniformly spaced in time. The amplitude is the only information explicitly stored in the sample, and it is typically stored as either an integer or a floating-point number, encoded as a binary number with a fixed number of digits: the sample's ''bit depth'', also referred to as word length or word size.
The resolution indicates the number of discrete values that can be represented over the range of analog values. The resolution of binary integers increases exponentially as the word length increases. Adding one bit doubles the resolution, adding two quadruples it, and so on. The number of possible values that can be represented by an integer bit depth can be calculated as $2^n$, where ''n'' is the bit depth. Thus, a 16-bit system has a resolution of 65,536 ($2^{16}$) possible values.
Integer PCM audio data is typically stored as signed numbers in two's complement format.
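As a small illustration of the two statements above, the following sketch computes the number of representable values and the signed two's complement range for a few bit depths. The function names are chosen for this example only.

```python
def pcm_levels(bit_depth: int) -> int:
    """Number of distinct values an integer sample of this bit depth can hold."""
    return 2 ** bit_depth


def twos_complement_range(bit_depth: int) -> tuple[int, int]:
    """Smallest and largest signed values representable in two's complement."""
    half = 2 ** (bit_depth - 1)
    return -half, half - 1


for bits in (8, 16, 24):
    low, high = twos_complement_range(bits)
    print(f"{bits}-bit: {pcm_levels(bits):,} levels, range {low:,} to {high:,}")
```

Note that the range is asymmetric: there is one more negative value than positive values, which is a property of two's complement itself.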
Today, most audio file formats and digital audio workstations (DAWs) support PCM formats with samples represented by floating-point numbers. Both the WAV file format and the AIFF file format support floating-point representations.
Unlike integers, whose bit pattern is a single series of bits, a floating-point number is composed of separate fields whose mathematical relation forms a number. The most common standard is IEEE 754, in which a number is composed of three fields: a sign bit, which represents whether the number is positive or negative; an exponent; and a mantissa, which is scaled by a power of two given by the exponent. The mantissa is expressed as a binary fraction in IEEE base-two floating-point formats.
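The three fields can be inspected directly. The sketch below assumes the IEEE 754 single-precision (32-bit) layout of 1 sign bit, 8 exponent bits and 23 fraction bits, and uses Python's standard `struct` module; the helper name `float32_fields` is hypothetical.

```python
import struct


def float32_fields(value: float) -> tuple[int, int, int]:
    """Split a value, rounded to IEEE 754 single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit: 0 = positive, 1 = negative
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 explicitly stored fraction bits
    return sign, exponent, fraction


print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-0.5))   # (1, 126, 0)
print(float32_fields(0.75))   # (0, 126, 4194304), since 0.75 = 1.5 * 2**-1
```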
Quantization
The bit depth limits the signal-to-noise ratio (SNR) of the reconstructed signal to a maximum level determined by quantization error. The bit depth has no impact on the frequency response, which is constrained by the sample rate.
Quantization error introduced during analog-to-digital conversion (ADC) can be modeled as quantization noise. It is a rounding error between the analog input voltage to the ADC and the output digitized value. The noise is nonlinear and signal-dependent.
In an ideal ADC, where the quantization error is uniformly distributed between ±1/2 least significant bit (LSB) and where the signal has a uniform distribution covering all quantization levels, the signal-to-quantization-noise ratio (SQNR) can be calculated from
: $\mathrm{SQNR} = 20 \log_{10}\left(\sqrt{1.5} \cdot 2^{b}\right) \approx (1.76 + 6.02\,b)\ \mathrm{dB}$
where ''b'' is the number of quantization bits and the result is measured in decibels (dB).
Therefore, 16-bit digital audio found on CDs has a theoretical maximum SNR of 98 dB, and professional 24-bit digital audio tops out at 146 dB. Digital audio converter technology is limited to an SNR of about 123 dB (effectively 21 bits) because of real-world limitations in integrated circuit design. Still, this approximately matches the performance of the human auditory system. Multiple converters can be used to cover different ranges of the same signal and combined to record a wider dynamic range in the long term, while still being limited by a single converter's dynamic range in the short term; this is called ''dynamic range extension''.
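The theoretical figures quoted above can be checked numerically. This sketch assumes the commonly quoted relation SQNR = 20·log10(√1.5 · 2^b) ≈ 1.76 + 6.02·b dB, which reproduces the 98 dB and 146 dB values; the function name is illustrative.

```python
import math


def ideal_sqnr_db(bits: int) -> float:
    """Theoretical SQNR in dB, assuming SQNR = 20*log10(sqrt(1.5) * 2**bits)."""
    return 20 * math.log10(math.sqrt(1.5) * 2 ** bits)


for bits in (16, 24):
    print(f"{bits}-bit: {ideal_sqnr_db(bits):.2f} dB")
```

Each additional bit contributes about 6.02 dB, so the 8 extra bits of 24-bit audio account for the roughly 48 dB difference between the two figures.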
Floating point
The resolution of floating-point samples is less straightforward than integer samples because floating-point values are not evenly spaced. In floating-point representation, the space between any two adjacent values is in proportion to the value. This greatly increases the SNR compared to an integer system because the accuracy of a high-level signal will be the same as the accuracy of an identical signal at a lower level.
The trade-off between floating point and integers is that the space between large floating-point values is greater than the space between large integer values of the same bit depth. Rounding a large floating-point number results in a greater error than rounding a small floating-point number, whereas rounding an integer will always result in the same level of error. In other words, integers have uniform round-off, always rounding the LSB to 0 or 1, whereas floating point has uniform SNR: the quantization noise level is always a fixed proportion of the signal level.
A floating-point noise floor will rise as the signal rises and fall as the signal falls, resulting in audible variance if the bit depth is low enough.
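The difference in spacing can be made concrete with a short sketch. It uses Python's built-in floats, which are 64-bit doubles rather than the 32-bit floats common in audio files, so the absolute numbers are only illustrative; the pattern (a fixed step for integers, a step proportional to the value for floats) is the point. `INT16_STEP` and `float_step` are names invented for this example.

```python
import math

INT16_STEP = 1 / 32768  # one LSB of a 16-bit integer sample, relative to full scale


def float_step(value: float) -> float:
    """Gap between `value` and the next larger representable double."""
    return math.ulp(value)


for level in (0.001, 0.1, 1.0):
    print(
        f"level {level}: integer step is {INT16_STEP / level:.3e} of the signal, "
        f"float step is {float_step(level) / level:.3e} of the signal"
    )
```

The integer step becomes a larger fraction of the signal as the level drops, while the floating-point step stays within a factor of two of the same fraction at every level.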
Audio processing
Most processing operations on digital audio involve the re-quantization of samples and thus introduce additional rounding error analogous to the original quantization error introduced during analog-to-digital conversion. To prevent rounding error larger than the implicit error during ADC, calculations during processing must be performed at higher precisions than the input samples.
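A minimal sketch of this effect, assuming a hypothetical gain stage applied to a single 16-bit sample: halving and then doubling the sample loses information when each step is rounded back to an integer, but not when the intermediate result is kept at higher precision.

```python
def apply_gain_int16(sample: int, gain: float) -> int:
    """Scale a sample and re-quantize the result to a 16-bit integer."""
    scaled = round(sample * gain)
    return max(-32768, min(32767, scaled))


original = 12345

# Re-quantizing after every step: 6172.5 is rounded before the second gain.
low_precision = apply_gain_int16(apply_gain_int16(original, 0.5), 2.0)

# Keeping the intermediate value in floating point and quantizing once at the end.
high_precision = round(original * 0.5 * 2.0)

print(low_precision, high_precision)  # 12344 12345
```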
Digital signal processing (DSP) operations can be performed in either fixed-point or floating-point precision. In either case, the precision of each operation is determined by the precision of the hardware operations used to perform each step of the processing and not the resolution of the input data. For example, on x86 processors, floating-point operations are performed with single or double precision and fixed-point operations at 16-, 32- or 64-bit resolution. Consequently, all processing performed on Intel-based hardware will be performed with these constraints regardless of the source format.
Fixed-point digital signal processors often support specific word lengths in order to support specific signal resolutions. For example, the Motorola 56000 DSP chip uses 24-bit multipliers and 56-bit accumulators to perform multiply-accumulate operations on two 24-bit samples without overflow or truncation. On devices that do not support large accumulators, fixed-point results may be truncated, reducing precision. Errors compound through multiple stages of DSP at a rate that depends on the operations being performed. For uncorrelated processing steps on audio data without a DC offset, errors are assumed to be random with zero mean. Under this assumption, the standard deviation of the distribution represents the error signal, and quantization error scales with the square root of the number of operations. High levels of precision are necessary for algorithms that involve repeated processing, such as convolution.
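The value of a wide accumulator can be sketched with plain integer arithmetic. The example below only borrows the 24-bit and 56-bit word lengths mentioned above; it is not a model of any real chip's instruction set, and the sample values and function names are hypothetical.

```python
SAMPLE_BITS = 24                  # word length of each operand
ACC_BITS = 56                     # word length of the accumulator
FRACTION_BITS = SAMPLE_BITS - 1   # operands treated as signed fractions

largest_product = (2 ** (SAMPLE_BITS - 1)) ** 2   # (-2**23) * (-2**23) = 2**46
accumulator_max = 2 ** (ACC_BITS - 1) - 1
worst_case_terms = accumulator_max // largest_product


def mac_wide(pairs):
    """Accumulate full-width products, then scale back to sample precision once."""
    return sum(a * b for a, b in pairs) >> FRACTION_BITS


def mac_truncating(pairs):
    """Truncate every product to sample precision before accumulating."""
    return sum((a * b) >> FRACTION_BITS for a, b in pairs)


pairs = [(3000, 3000)] * 1000
print(worst_case_terms)        # 511 worst-case products fit without overflow
print(mac_wide(pairs))         # 1072
print(mac_truncating(pairs))   # 1000
```

Truncating each product discards a small remainder every time, and those remainders add up over the 1000 terms; the wide accumulator keeps them until the single final rounding.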
High levels of precision are also necessary in recursive algorithms, such as infinite impulse response (IIR) filters. In the particular case of IIR filters, rounding error can degrade frequency response and cause instability.
Dither
The noise introduced by quantization error, including rounding errors and loss of precision introduced during audio processing, can be mitigated by adding a small amount of random noise, called dither, to the signal prior to quantizing. Dithering eliminates non-linear quantization error behavior, giving very low distortion, but at the expense of a slightly raised noise floor. Recommended dither for 16-bit digital audio measured using ITU-R 468 noise weighting is about 66 dB below alignment level, or 84 dB below digital full scale, which is comparable to microphone and room noise level, and hence of little consequence in 16-bit audio.
24-bit and 32-bit audio do not require dithering, as the noise level of the digital converter is always louder than the required level of any dither that might be applied. 24-bit audio could theoretically encode 144 dB of dynamic range, and 32-bit audio can achieve 192 dB, but this is almost impossible to achieve in the real world, as even the best sensors and microphones rarely exceed 130 dB.
Dither can also be used to increase the effective dynamic range. The ''perceived'' dynamic range of 16-bit audio can be 120 dB or more with noise-shaped dither, taking advantage of the frequency response of the human ear.
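The effect of dither on a signal smaller than one quantization step can be sketched as follows. The example assumes triangular-PDF (TPDF) dither spanning ±1 LSB, one common choice, and a sine wave with a peak of 0.4 LSB; the random seed and all names are arbitrary choices for this illustration.

```python
import math
import random


def quantize(x: float) -> int:
    """Round to the nearest integer step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)


def tpdf_dither(rng: random.Random) -> float:
    """Triangular-PDF dither spanning +/-1 LSB: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


rng = random.Random(0)
period = 50
signal = [0.4 * math.sin(2 * math.pi * n / period) for n in range(period * 400)]

plain = [quantize(x) for x in signal]
dithered = [quantize(x + tpdf_dither(rng)) for x in signal]


def recovered_gain(output):
    """Least-squares estimate of how much of the signal survives in the output."""
    return sum(o * s for o, s in zip(output, signal)) / sum(s * s for s in signal)


print(recovered_gain(plain))     # 0.0: the undithered signal vanishes entirely
print(recovered_gain(dithered))  # close to 1: the signal survives, buried in noise
```

Without dither every sample rounds to zero, so the signal is lost. With dither the output is noisy, but on average it follows the input, which is why a dithered system can carry signals below its nominal resolution.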
Dynamic range and headroom
Dynamic range is the difference between the largest and smallest signal a system can record or reproduce. Without dither, the dynamic range correlates to the quantization noise floor. For example, 16-bit integer resolution allows for a dynamic range of about 96 dB. With the proper application of dither, digital systems can reproduce signals with levels lower than their resolution would normally allow, extending the effective dynamic range beyond the limit imposed by the resolution. The use of techniques such as oversampling and noise shaping can further extend the dynamic range of sampled audio by moving quantization error out of the frequency band of interest.
If the signal's maximum level is lower than that allowed by the bit depth, the recording has headroom. Using higher bit depths during studio recording can make headroom available while maintaining the same dynamic range. This reduces the risk of clipping without increasing quantization errors at low volumes.
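A rough calculation shows how the two quantities relate. The sketch assumes the undithered dynamic range is 20·log10(2^n) dB and measures headroom as the ratio of digital full scale to the recorded peak; the peak value used is a made-up example.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Undithered dynamic range of an integer format, as 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)


def headroom_db(peak: int, bits: int) -> float:
    """Distance in dB between a recorded peak and digital full scale."""
    full_scale = 2 ** (bits - 1)
    return 20 * math.log10(full_scale / peak)


print(f"{dynamic_range_db(16):.1f} dB")          # 96.3 dB for 16-bit
print(f"{headroom_db(2 ** 20, 24):.1f} dB")      # 18.1 dB of headroom in 24-bit
print(f"{dynamic_range_db(24) - headroom_db(2 ** 20, 24):.1f} dB")  # 126.4 dB left
```

In this hypothetical case a 24-bit recording that peaks 18 dB below full scale still has more range below its peak than a 16-bit recording that uses every bit.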
Oversampling
Oversampling is an alternative method to increase the dynamic range of PCM audio without changing the number of bits per sample. In oversampling, audio samples are acquired at a multiple of the desired sample rate. Because quantization error is assumed to be uniformly distributed with frequency, much of the quantization error is shifted to ultrasonic frequencies and can be removed by the digital-to-analog converter during playback.
For an increase equivalent to ''n'' additional bits of resolution, a signal must be oversampled by
: $\text{number of samples} = (2^n)^2 = 2^{2n}$
For example, a 14-bit ADC can produce 16-bit 48 kHz audio if operated at 16× oversampling, or 768 kHz. Oversampled PCM, therefore, exchanges fewer bits per sample for more samples in order to obtain the same resolution.
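The 14-bit example can be reproduced with a one-line calculation, assuming the oversampling ratio for ''n'' extra bits is 2^(2n) as the 16× figure implies. The function name is illustrative.

```python
def oversampling_ratio(extra_bits: int) -> int:
    """Oversampling factor needed to gain `extra_bits` of resolution: 2**(2n)."""
    return 2 ** (2 * extra_bits)


adc_bits, target_bits, target_rate_hz = 14, 16, 48_000
ratio = oversampling_ratio(target_bits - adc_bits)

print(ratio)                    # 16
print(ratio * target_rate_hz)   # 768000
```

Because the ratio grows by a factor of four per bit, plain oversampling quickly becomes expensive: four extra bits would already require 256× the target rate.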
Dynamic range can also be enhanced with oversampling at signal reconstruction, absent oversampling at the source. Consider 16× oversampling at reconstruction. Each sample at reconstruction would be unique in that for each of the original sample points sixteen are inserted, all having been calculated by a digital reconstruction filter. The mechanism of increased effective bit depth is as previously discussed, that is, quantization noise power has not been reduced, but the noise spectrum has been spread over 16× the audio bandwidth.
Historical note: The compact disc standard was developed by a collaboration between Sony and Philips. The first Sony consumer unit featured a 16-bit DAC; the first Philips units used dual 14-bit DACs. This caused confusion in the marketplace and even in professional circles, because 14-bit PCM allows for 84 dB SNR, 12 dB less than 16-bit PCM. Philips had implemented 4× oversampling with first-order noise shaping, which theoretically realized the full 96 dB dynamic range of the CD format. In practice the Philips CD100 was rated at 90 dB SNR in the audio band of 20 Hz–20 kHz, the same as Sony's CDP-101.
Noise shaping
Oversampling a signal results in equal quantization noise per unit of bandwidth at all frequencies and a dynamic range that improves with only the square root of the oversampling ratio. Noise shaping is a technique that adds additional noise at higher frequencies which cancels out some error at lower frequencies, resulting in a larger increase in dynamic range when oversampling. For ''n''th-order noise shaping, the dynamic range of an oversampled signal is improved by roughly an additional 6''n'' dB for each doubling of the oversampling ratio, relative to oversampling without noise shaping. For example, for 20 kHz analog audio sampled at 4× oversampling with second-order noise shaping, the dynamic range is increased by 30 dB: each of the two doublings contributes about 3 dB from oversampling alone plus 12 dB from the noise shaping. Therefore, a 16-bit signal sampled at 176 kHz would have a bit depth equal to a 21-bit signal sampled at 44.1 kHz without noise shaping.
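One simple way to realize noise shaping is first-order error feedback, in which the quantization error of each sample is subtracted from the next input. The sketch below is an illustration of that general idea under simplified assumptions (a constant input, steps of 1 LSB), not a description of any particular converter.

```python
import math


def quantize_plain(samples):
    """Round each sample independently to the nearest integer step."""
    return [math.floor(x + 0.5) for x in samples]


def quantize_first_order(samples):
    """First-order error feedback: carry each quantization error into the next sample."""
    out = []
    error = 0.0
    for x in samples:
        target = x - error              # compensate for the previous error
        y = math.floor(target + 0.5)
        error = y - target              # error made on this sample
        out.append(y)
    return out


samples = [0.3] * 1000                  # a constant level below half an LSB
plain = quantize_plain(samples)
shaped = quantize_first_order(samples)

print(sum(plain))    # 0: the level is lost completely
print(sum(shaped))   # 300: the long-term average is preserved
```

The shaped output alternates between 0 and 1, so its error is concentrated in rapid sample-to-sample changes (high frequencies) while the slowly varying part of the signal is reproduced accurately. An oversampled system can then filter out the high-frequency error.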
Noise shaping is commonly implemented with delta-sigma modulation. Using delta-sigma modulation, Direct Stream Digital achieves a theoretical 120 dB SNR at audio frequencies using 1-bit audio with 64× oversampling.
Applications
Bit depth is a fundamental property of digital audio implementations. Depending on application requirements and equipment capabilities, different bit depths are used for different applications.
Bit rate and file size
Bit depth affects bit rate and file size. Bits are the basic unit of data used in computing and digital communications. Bit rate refers to the amount of data, specifically bits, transmitted or received per second. In MP3 and other lossy compressed audio formats, bit rate describes the amount of information used to encode an audio signal. It is usually measured in kb/s.
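For uncompressed PCM, the relationship can be sketched as bit rate = sample rate × bit depth × number of channels. The example assumes CD-style parameters (44.1 kHz, 16 bits, two channels) and ignores any container or header overhead; the function names are illustrative.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bit rate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bit_depth * channels


def pcm_size_bytes(bit_rate: int, seconds: int) -> int:
    """Size of the raw audio data in bytes, excluding file headers."""
    return bit_rate * seconds // 8


cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)                       # 1411200 bit/s
print(pcm_size_bytes(cd_rate, 60))   # 10584000 bytes per minute
```

Raising the bit depth from 16 to 24 at the same sample rate and channel count increases both figures by exactly 50 percent.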
See also
* Audio system measurements
* Color depth, corresponding concept for digital images
* Effective number of bits