The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid
artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used lossy compression technique in audio data compression. It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), High-Definition Coding (HDC), LDAC, Dolby AC-4, and MPEG-H 3D Audio, as well as speech coding standards such as AAC-LD (LD-MDCT), G.722.1, G.729.1, CELT (see Presentation of the CELT codec by Timothy B. Terriberry; 65 minutes of video, see also presentation slides in PDF), and Opus.
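The block overlap described above can be illustrated with a minimal Python sketch. The function name `overlapped_blocks` and the choice to drop trailing samples that do not fill a whole block are assumptions made for this illustration; real codecs typically pad the signal and apply a window function as well.

```python
import numpy as np

def overlapped_blocks(signal, N):
    """Split a 1-D signal into blocks of length 2N with hop size N.

    Consecutive blocks overlap by N samples: the last half of block i
    is the first half of block i + 1. Trailing samples that do not fill
    a whole block are dropped (a simplification for this sketch).
    """
    signal = np.asarray(signal, dtype=float)
    if N <= 0:
        raise ValueError("N must be positive")
    if len(signal) < 2 * N:
        n_blocks = 0
    else:
        n_blocks = (len(signal) - 2 * N) // N + 1
    blocks = [signal[i * N : i * N + 2 * N] for i in range(n_blocks)]
    return np.array(blocks).reshape(n_blocks, 2 * N)

# Eight samples with N = 2 give three blocks of four samples each.
blocks = overlapped_blocks(np.arange(8), N=2)
```

Each row of `blocks` would then be passed to the transform; because the hop size is N and each block yields N coefficients, the total number of coefficients roughly matches the number of input samples.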
The discrete cosine transform (DCT) was first proposed by Nasir Ahmed in 1972, and demonstrated by Ahmed with T. Natarajan and K. R. Rao in 1974. The MDCT was later proposed by John P. Princen, A. W. Johnson and Alan B. Bradley at the University of Surrey in 1987, following earlier work by Princen and Bradley (1986) to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC), described below. (There also exists an analogous transform, the MDST, based on the discrete sine transform, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.)
In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter (PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF bank. Such a combination of a filter bank with an MDCT is called a ''hybrid'' filter bank or a ''subband'' MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used) MPEG-4 AAC-SSR variant (by Sony) uses a four-band PQF bank followed by an MDCT. Similar to MP3, ATRAC uses stacked quadrature mirror filters (QMF) followed by an MDCT.
Definition
As a lapped transform, the MDCT is a bit unusual compared to other Fourier-related transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a linear function $F\colon \mathbf{R}^{2N} \to \mathbf{R}^{N}$ (where $\mathbf{R}$ denotes the set of real numbers). The $2N$ real numbers $x_0, \ldots, x_{2N-1}$ are transformed into the $N$ real numbers $X_0, \ldots, X_{N-1}$ according to the formula:
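: $X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, \ldots, N-1$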
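As a rough illustration, the transform from 2N inputs to N outputs can be evaluated directly with NumPy. This is a sketch only: it assumes the common unnormalized MDCT convention spelled out in the docstring (other normalizations are also in use), the function name `mdct` is chosen for this example, and the direct O(N²) matrix product is far slower than the FFT-based algorithms used in practice.

```python
import numpy as np

def mdct(x):
    """Direct O(N^2) MDCT of a block of 2N real samples, giving N coefficients.

    Assumed (unnormalized) convention:
        X[k] = sum_{n=0}^{2N-1} x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    for k = 0, ..., N-1.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or len(x) == 0 or len(x) % 2 != 0:
        raise ValueError("input must be a 1-D block of even, nonzero length 2N")
    N = len(x) // 2
    n = np.arange(2 * N)  # input sample index
    k = np.arange(N)      # output coefficient index
    # N x 2N matrix of cosine basis values.
    basis = np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))
    return basis @ x

# A block of 8 samples (N = 4) produces 4 coefficients.
X = mdct(np.arange(8.0))
```

Because the transform is a plain matrix product, linearity is immediate: `mdct(a + b)` equals `mdct(a) + mdct(b)` up to floating-point rounding.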