MPEG-4 Part 3 or MPEG-4 Audio (formally ISO/IEC 14496-3) is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.

MPEG-4 Part 3 consists of a variety of audio coding technologies, ranging from lossy speech coding (HVXC, CELP) and general audio coding (AAC, TwinVQ, BSAC) to lossless audio compression (MPEG-4 SLS, Audio Lossless Coding, MPEG-4 DST), a Text-To-Speech Interface (TTSI), Structured Audio (using SAOL, SASL, MIDI) and many additional audio synthesis and coding techniques.

MPEG-4 Audio does not target a single application such as real-time telephony or high-quality audio compression. It applies to every application that requires the use of advanced sound compression, synthesis, manipulation, or playback. MPEG-4 Audio is a new type of audio standard that integrates numerous different types of audio coding: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content.

Versions

Subparts

MPEG-4 Part 3 contains the following subparts:
* Subpart 1: Main (list of Audio Object Types, Profiles, Levels, interface to ISO/IEC 14496-1, MPEG-4 Audio transport stream, etc.)
* Subpart 2: Speech coding – HVXC (Harmonic Vector eXcitation Coding)
* Subpart 3: Speech coding – CELP (Code Excited Linear Prediction)
* Subpart 4: General Audio Coding (GA) (Time/Frequency Coding) – AAC, TwinVQ, BSAC
* Subpart 5: Structured Audio (SA)
* Subpart 6: Text to Speech Interface (TTSI)
* Subpart 7: Parametric Audio Coding – HILN (Harmonic and Individual Line plus Noise)
* Subpart 8: Technical description of parametric coding for high quality audio (SSC, Parametric Stereo)
* Subpart 9: MPEG-1/MPEG-2 Audio in MPEG-4
* Subpart 10: Technical description of lossless coding of oversampled audio (MPEG-4 DST – Direct Stream Transfer)
* Subpart 11: Audio Lossless Coding (ALS)
* Subpart 12: Scalable Lossless Coding (SLS)

MPEG-4 Audio Object Types

MPEG-4 Audio includes a system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type to represent it. The object type distinguishes between different coding methods and directly determines the MPEG-4 tool subset required to decode a specific object. The MPEG-4 profiles are based on the object types, and each profile supports a different list of object types.
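To illustrate how an object type identifies the coding method in practice, the sketch below reads the leading fields of an MPEG-4 audio decoder configuration (commonly called AudioSpecificConfig, a name not used in the text above). It assumes the usual layout of a 5-bit object type with a 6-bit escape extension, a 4-bit sampling frequency index and a 4-bit channel configuration. The helper names and the shortened object type table are illustrative only, and later fields of the configuration are ignored.

```python
# Partial table of Audio Object Type identifiers (not the full list).
AUDIO_OBJECT_TYPES = {
    1: "AAC Main", 2: "AAC LC", 3: "AAC SSR", 5: "SBR", 7: "TwinVQ",
    8: "CELP", 9: "HVXC", 12: "TTSI", 22: "ER BSAC", 29: "PS",
    35: "DST", 36: "ALS", 37: "SLS",
}

# Sampling rates addressed by the 4-bit sampling frequency index.
SAMPLING_FREQUENCIES = [
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350,
]


class BitReader:
    """Reads big-endian bit fields from a byte string."""

    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def read(self, num_bits: int) -> int:
        if num_bits > self.remaining:
            raise ValueError("not enough bits left")
        self.remaining -= num_bits
        return (self.value >> self.remaining) & ((1 << num_bits) - 1)


def read_audio_object_type(reader: BitReader) -> int:
    """5-bit object type; the value 31 escapes to 32 + a 6-bit extension."""
    object_type = reader.read(5)
    if object_type == 31:
        object_type = 32 + reader.read(6)
    return object_type


def parse_config_header(data: bytes) -> dict:
    """Parse only the first three fields of the decoder configuration."""
    reader = BitReader(data)
    object_type = read_audio_object_type(reader)
    index = reader.read(4)
    if index == 15:
        rate = reader.read(24)  # explicit 24-bit sampling rate
    elif index < len(SAMPLING_FREQUENCIES):
        rate = SAMPLING_FREQUENCIES[index]
    else:
        rate = None  # reserved index
    return {
        "audio_object_type": object_type,
        "name": AUDIO_OBJECT_TYPES.get(object_type, "unknown"),
        "sampling_frequency_hz": rate,
        "channel_configuration": reader.read(4),
    }


if __name__ == "__main__":
    # 0x12 0x10 is a frequently seen AAC LC, 44.1 kHz, stereo configuration.
    print(parse_config_header(bytes([0x12, 0x10])))
```

A decoder would typically use the resulting object type to select which tools to instantiate before touching any audio payload.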

Audio Profiles

The MPEG-4 Audio standard defines several profiles. These profiles are based on the object types, and each profile supports a different list of object types. Each profile may also have several levels, which limit some parameters of the tools present in a profile. These parameters are usually the sampling rate and the number of audio channels decoded at the same time.
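The relationship between profiles, object types and levels can be sketched as a simple capability check. The profile contents below follow the commonly cited composition of the AAC and High Efficiency AAC profiles. The `Level` class, the `can_decode` function and the numeric limits in the usage example are hypothetical placeholders chosen for illustration, not values taken from the standard.

```python
from dataclasses import dataclass

# Object types supported by a few profiles (partial list, for illustration).
PROFILE_OBJECT_TYPES = {
    "AAC Profile": {"AAC LC"},
    "High Efficiency AAC Profile": {"AAC LC", "SBR"},
    "High Efficiency AAC v2 Profile": {"AAC LC", "SBR", "PS"},
}


@dataclass(frozen=True)
class Level:
    """Hypothetical level: caps the sampling rate and the channel count."""
    max_sampling_rate_hz: int
    max_channels: int


def can_decode(profile: str, level: Level, object_types: set,
               sampling_rate_hz: int, channels: int) -> bool:
    """True if a decoder conforming to (profile, level) can handle the stream."""
    supported = PROFILE_OBJECT_TYPES[profile]
    return (
        set(object_types) <= supported
        and sampling_rate_hz <= level.max_sampling_rate_hz
        and channels <= level.max_channels
    )


if __name__ == "__main__":
    stereo_48k = Level(max_sampling_rate_hz=48000, max_channels=2)  # placeholder limits
    print(can_decode("AAC Profile", stereo_48k, {"AAC LC"}, 44100, 2))
    print(can_decode("AAC Profile", stereo_48k, {"AAC LC", "SBR"}, 44100, 2))
```

The second call fails because SBR is not in the object type list of the plain AAC Profile, which is the kind of mismatch a profile declaration is meant to expose before decoding starts.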

Audio storage and transport

There is no standard for transport of elementary streams over a channel, because the broad range of MPEG-4 applications has delivery requirements that are too wide to easily characterize with a single solution. The capabilities of a transport layer and the communication between transport, multiplex, and demultiplex functions are described in the Delivery Multimedia Integration Framework (DMIF) in ISO/IEC 14496-6. A wide variety of delivery mechanisms exist below this interface, e.g., MPEG transport stream, Real-time Transport Protocol (RTP), etc.

Transport in Real-time Transport Protocol is defined in RFC 3016 (RTP Payload Format for MPEG-4 Audio/Visual Streams), RFC 3640 (RTP Payload Format for Transport of MPEG-4 Elementary Streams), RFC 4281 (The Codecs Parameter for "Bucket" Media Types) and RFC 4337 (MIME Type Registration for MPEG-4). LATM and LOAS were defined for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems.
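As an illustration of the codecs parameter mentioned above, the sketch below splits a string such as `mp4a.40.2` into its parts. It assumes the widely used convention in which the second element is a hexadecimal object type indication (0x40 for MPEG-4 Audio) and the third element is the decimal Audio Object Type. The function name is made up for this example, and the sketch does not validate the value against any registry.

```python
def parse_mp4a_codec(codec: str) -> tuple:
    """Split an 'mp4a.OO.A' codecs string into (object_type_indication, audio_object_type).

    OO is read as hexadecimal and A as decimal; both conventions are
    assumptions stated in the surrounding text.
    """
    parts = codec.split(".")
    if len(parts) != 3 or parts[0] != "mp4a":
        raise ValueError(f"not an mp4a codecs string: {codec!r}")
    object_type_indication = int(parts[1], 16)
    audio_object_type = int(parts[2], 10)
    return object_type_indication, audio_object_type


if __name__ == "__main__":
    for value in ("mp4a.40.2", "mp4a.40.5", "mp4a.40.29"):
        print(value, parse_mp4a_codec(value))
```

Under this convention `mp4a.40.2` refers to AAC LC, while `mp4a.40.5` and `mp4a.40.29` signal the SBR and PS object types used by HE-AAC.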

Bifurcation in the AAC technical standard

The Advanced Audio Coding in MPEG-4 Part 3 (MPEG-4 Audio) Subpart 4 was enhanced relative to the previous standard, MPEG-2 Part 7 (Advanced Audio Coding), in order to provide better sound quality for a given encoding bitrate. It is assumed that any Part 3 and Part 7 differences will be ironed out by the ISO standards body in the near future to avoid the possibility of future bitstream incompatibilities. At present there are no known player or codec incompatibilities due to the newness of the standard.

The MPEG-2 Part 7 standard (Advanced Audio Coding) was first published in 1997 and offers three default profiles: Low Complexity profile (LC), Main profile and Scalable Sampling Rate profile (SSR). MPEG-4 Part 3 Subpart 4 (General Audio Coding) combined the profiles from MPEG-2 Part 7 with Perceptual Noise Substitution (PNS) and defined them as Audio Object Types (AAC LC, AAC Main, AAC SSR).

HE-AAC

High-Efficiency Advanced Audio Coding (HE-AAC) is an extension of AAC LC using spectral band replication (SBR) and Parametric Stereo (PS). It is designed to increase coding efficiency at low bitrates by using a partial parametric representation of audio.

AAC-SSR

AAC Scalable Sample Rate was introduced by Sony to the MPEG-2 Part 7 and MPEG-4 Part 3 standards. It was first published in ISO/IEC 13818-7, Part 7: Advanced Audio Coding (AAC) in 1997. The audio signal is first split into 4 bands using a 4-band polyphase quadrature filter (PQF) bank. These 4 bands are then further split using MDCTs with a size k of 32 or 256 samples. This is similar to normal AAC LC, which uses MDCTs with a size k of 128 or 1024 directly on the audio signal.

The advantage of this technique is that short block switching can be done separately for every PQF band. High frequencies can thus be encoded using a short block to enhance temporal resolution, while low frequencies can still be encoded with high spectral resolution. However, due to aliasing between the 4 PQF bands, coding efficiency around (1,2,3) * fs/8 is worse than with normal MPEG-4 AAC LC.

MPEG-4 AAC-SSR is very similar to ATRAC and ATRAC-3.
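The band layout described in this section can be worked through numerically. The sketch below assumes four equal-width PQF bands covering 0 to fs/2, which is consistent with the (1,2,3) * fs/8 boundaries quoted above. The function names are illustrative and nothing here performs actual filtering.

```python
def pqf_band_edges(sample_rate_hz: float, num_bands: int = 4) -> list:
    """Return (low, high) edges in Hz of equal-width bands covering 0..fs/2."""
    width = sample_rate_hz / (2 * num_bands)
    return [(i * width, (i + 1) * width) for i in range(num_bands)]


def aliasing_boundaries(sample_rate_hz: float, num_bands: int = 4) -> list:
    """Frequencies between adjacent bands, where aliasing reduces coding efficiency."""
    width = sample_rate_hz / (2 * num_bands)
    return [i * width for i in range(1, num_bands)]


def total_coefficients(mdct_size_per_band: int, num_bands: int = 4) -> int:
    """Spectral coefficients per block summed over all bands."""
    return mdct_size_per_band * num_bands


if __name__ == "__main__":
    fs = 48000
    print(pqf_band_edges(fs))       # four 6 kHz wide bands
    print(aliasing_boundaries(fs))  # fs/8, 2*fs/8, 3*fs/8
    print(total_coefficients(32), total_coefficients(256))
```

With 4 bands, the per-band MDCT sizes of 32 and 256 add up to 128 and 1024 coefficients per block, the same totals as the short and long AAC LC transforms.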

Why AAC-SSR was introduced

The idea behind AAC-SSR was not only the advantage listed above, but also the possibility of reducing the data rate by removing 1, 2 or 3 of the upper PQF bands. A very simple bitstream splitter can remove these bands and thus reduce the bitrate and sample rate.

Example:
* 4 subbands: bitrate = 128 kbit/s, sample rate = 48 kHz, f_lowpass = 20 kHz
* 3 subbands: bitrate ~ 120 kbit/s, sample rate = 48 kHz, f_lowpass = 18 kHz
* 2 subbands: bitrate ~ 100 kbit/s, sample rate = 24 kHz, f_lowpass = 12 kHz
* 1 subband: bitrate ~ 65 kbit/s, sample rate = 12 kHz, f_lowpass = 6 kHz

Note: although possible, the resulting quality is much worse than typical for this bitrate. For normal 64 kbit/s AAC LC, a bandwidth of 14–16 kHz is achieved by using intensity stereo and reduced NMRs. This degrades audible quality less than transmitting 6 kHz bandwidth with perfect quality.
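The trade-off in the example can be made concrete by computing how many kbit/s are spent per kHz of delivered bandwidth. The figures below are copied from the example (the bitrates marked with ~ are approximate), and the function name is illustrative.

```python
# (subbands kept, bitrate in kbit/s, lowpass in kHz), taken from the example above.
SSR_EXAMPLE = [
    (4, 128, 20),
    (3, 120, 18),
    (2, 100, 12),
    (1, 65, 6),
]


def kbps_per_khz(bitrate_kbps: float, bandwidth_khz: float) -> float:
    """Bitrate spent per kHz of audio bandwidth."""
    return bitrate_kbps / bandwidth_khz


if __name__ == "__main__":
    for subbands, bitrate, lowpass in SSR_EXAMPLE:
        cost = kbps_per_khz(bitrate, lowpass)
        print(f"{subbands} subband(s): {bitrate} kbit/s over {lowpass} kHz "
              f"= {cost:.2f} kbit/s per kHz")
```

In these figures, dropping the top three bands roughly halves the bitrate but cuts the bandwidth by more than two thirds, so the bitrate falls much more slowly than the bandwidth. This is consistent with the note that the stripped stream sounds worse than a codec tuned for that bitrate.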

BSAC

Bit Sliced Arithmetic Coding is an MPEG-4 standard (ISO/IEC 14496-3 subpart 4) for scalable audio coding. BSAC uses an alternative noiseless coding to AAC, with the rest of the processing being identical to AAC. This support for scalability allows for nearly transparent sound quality at 64 kbit/s and graceful degradation at lower bit rates. BSAC coding is best performed in the range of 40 kbit/s to 64 kbit/s, though it operates in the range of 16 kbit/s to 64 kbit/s. The AAC-BSAC codec is used in Digital Multimedia Broadcasting (DMB) applications.
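The general idea of bit slicing can be shown with a toy example. This is not the BSAC algorithm: it omits arithmetic coding, sign handling and everything specific to the standard, and all names are made up for illustration. It only shows that when quantized values are sent most significant bit plane first, a truncated stream still yields a coarser version of every value, which is one way to picture graceful degradation.

```python
def to_bit_planes(values: list, num_bits: int) -> list:
    """Slice non-negative integers into bit planes, most significant plane first."""
    return [[(v >> bit) & 1 for v in values]
            for bit in range(num_bits - 1, -1, -1)]


def from_bit_planes(planes: list, num_bits: int, count: int) -> list:
    """Rebuild values from the planes received so far; missing planes count as 0."""
    values = [0] * count
    for plane_index, plane in enumerate(planes):
        shift = num_bits - 1 - plane_index
        for i, bit in enumerate(plane):
            values[i] |= bit << shift
    return values


if __name__ == "__main__":
    quantized = [13, 6, 3, 9]  # toy 4-bit quantized values
    planes = to_bit_planes(quantized, num_bits=4)
    for kept in range(1, 5):
        # Keeping fewer planes models cutting the stream to a lower bitrate.
        print(kept, from_bit_planes(planes[:kept], num_bits=4, count=len(quantized)))
```

Each additional plane refines all values at once, so a receiver can stop reading at any plane boundary and still reconstruct a usable approximation.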

Licensing

In 2002, the MPEG-4 Audio Licensing Committee selected the Via Licensing Corporation as the Licensing Administrator for the MPEG-4 Audio patent pool.
