History
The predecessor of MPEG-1 for video coding was the H.261 standard, developed by the CCITT (now the ITU-T).
Patents

Due to its age, MPEG-1 is no longer covered by any essential patents and can thus be used without obtaining a licence or paying any fees. The ISO patent database lists one patent for ISO 11172, US 4,472,747, which expired in 2003. The near-complete draft of the MPEG-1 standard was publicly available as ISO CD 11172 (Committee Draft of Standard ISO/IEC 11172) by December 6, 1991.

Neither the July 2008 Kuro5hin article "Patent Status of MPEG-1, H.261 and MPEG-2", nor an August 2008 thread on the gstreamer-devel mailing list, was able to identify a single unexpired MPEG-1 Video or MPEG-1 Audio Layer I/II patent. A May 2009 discussion on the whatwg mailing list mentioned the US 5,214,678 patent as possibly covering MPEG-1 Audio Layer II; filed in 1990 and published in 1993, this patent is now expired.

A full MPEG-1 decoder and encoder, with "Layer III audio", could not be implemented royalty-free, since there were companies that required patent fees for implementations of MPEG-1 Audio Layer III.
Former patent holders

The following corporations filed declarations with ISO saying they held patents for the MPEG-1 Video (ISO/IEC-11172-2) format, although all such patents have since expired.
Applications

*Most popular software for video playback includes MPEG-1 decoding, in addition to any other supported formats.
Part 1: Systems

Part 1 of the MPEG-1 standard covers ''systems'', and is defined in ISO/IEC-11172-1. MPEG-1 Systems specifies the logical layout and methods used to store the encoded audio, video, and other data in a standard bitstream, and to maintain synchronization between the different contents.
Elementary streams, packets, and clock references

*Elementary Streams (ES) are the raw bitstreams of MPEG-1 audio and video encoded data (output from an encoder). These files can be distributed on their own, as is the case with MP3 files.
*Packetized Elementary Streams (PES) are elementary streams divided into packets of variable length, each preceded by a packet header carrying a stream identifier and optional timestamps, as illustrated below.
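As an illustration of how these layers are delimited in practice, the sketch below scans a buffer for the MPEG start-code prefix (0x00 0x00 0x01) and classifies the byte that follows it (0xBA pack header, 0xBB system header, 0xC0–0xDF audio streams, 0xE0–0xEF video streams). The buffer at the end is synthetic; a real demultiplexer would also parse packet lengths and header fields.

```python
# Minimal start-code scanner for an MPEG-1 system stream (illustrative only).
STREAM_TYPES = {0xBA: "pack header", 0xBB: "system header", 0xB9: "program end"}

def describe_stream_id(sid: int) -> str:
    if 0xC0 <= sid <= 0xDF:
        return f"audio PES (stream {sid - 0xC0})"
    if 0xE0 <= sid <= 0xEF:
        return f"video PES (stream {sid - 0xE0})"
    return STREAM_TYPES.get(sid, f"other (0x{sid:02X})")

def scan_start_codes(data: bytes):
    """Yield (offset, description) for every 0x000001 start code found."""
    pos = 0
    while True:
        pos = data.find(b"\x00\x00\x01", pos)
        if pos < 0 or pos + 3 >= len(data):
            return
        yield pos, describe_stream_id(data[pos + 3])
        pos += 4

# Synthetic buffer: a pack header followed by one audio PES start code.
buf = b"\x00\x00\x01\xBA" + b"\x00" * 8 + b"\x00\x00\x01\xC0" + b"\x00" * 4
for offset, kind in scan_start_codes(buf):
    print(offset, kind)   # 0 pack header / 12 audio PES (stream 0)
```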
Program streams

Program Streams (PS) are concerned with combining multiple packetized elementary streams (usually just one audio and one video PES) into a single stream, ensuring simultaneous delivery and maintaining synchronization. The PS structure is known as a multiplex, or a container format.
Multiplexing

To generate the PS, the multiplexer interleaves the (two or more) packetized elementary streams. This is done so the packets of the simultaneous streams can be transferred over the same channel, and are guaranteed to arrive at the decoder at the right times.
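A minimal sketch of that interleaving, assuming hypothetical Packet objects that carry presentation timestamps in 90 kHz ticks (the real packet syntax is far richer): the multiplexer emits packets from both streams in timestamp order, so neither stream starves the other.

```python
# Interleave two already-sorted packet queues by presentation timestamp.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Packet:
    pts: int                          # presentation timestamp (90 kHz ticks)
    kind: str = field(compare=False)  # "audio" or "video"
    payload: bytes = field(compare=False, default=b"")

def multiplex(audio: list[Packet], video: list[Packet]):
    """Yield packets from both streams in non-decreasing PTS order."""
    yield from heapq.merge(audio, video)

audio = [Packet(0, "audio"), Packet(3003, "audio")]
video = [Packet(0, "video"), Packet(3003, "video")]
for pkt in multiplex(audio, video):
    print(pkt.pts, pkt.kind)
```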
Part 2: Video

Part 2 of the MPEG-1 standard covers video and is defined in ISO/IEC-11172-2. The design was heavily influenced by H.261.
Color space

Before encoding video to MPEG-1, the color space is transformed to Y′CbCr (Y′ = luma, Cb = chroma blue, Cr = chroma red).
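For illustration, a common RGB to Y′CbCr conversion using the BT.601 luma coefficients and 8-bit studio-range scaling is sketched below; exact matrix and range conventions vary by source format, so treat this as an example rather than the normative MPEG-1 definition.

```python
# Illustrative BT.601-style RGB -> Y'CbCr conversion (8-bit studio range).
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """rgb: float array in [0, 1], shape (..., 3). Returns 8-bit Y'CbCr."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b    # luma (Y')
    cb = (b - y) / 1.772                     # blue-difference chroma
    cr = (r - y) / 1.402                     # red-difference chroma
    # Scale to studio-range levels: Y' in [16, 235], Cb/Cr in [16, 240].
    out = np.stack([16 + 219 * y, 128 + 224 * cb, 128 + 224 * cr], axis=-1)
    return np.round(out).astype(np.uint8)

print(rgb_to_ycbcr(np.array([[1.0, 0.0, 0.0]])))   # pure red -> [[ 81  90 240]]
```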
Resolution/bitrate

MPEG-1 supports resolutions up to 4095×4095 (12 bits), and bit rates up to 100 Mbit/s. MPEG-1 video is most commonly seen using Source Input Format (SIF) resolutions: 352×240, 352×288, or 320×240.
Frame/picture/block types

MPEG-1 has several frame/picture types that serve different purposes. The most important, yet simplest, is the I-frame.
"I-frame" is an abbreviation for "P-frames
"P-frame" is an abbreviation for "Predicted-frame". They may also be called forward-predicted frames or inter-frames (B-frames are also inter-frames). P-frames exist to improve compression by exploiting the temporal (over time) redundancy in a video. P-frames store only the ''difference'' in image from the frame (either an I-frame or P-frame) immediately preceding it (this reference frame is also called the ''B-frames
"B-frame" stands for "bidirectional-frame" or "bipredictive frame". They may also be known as backwards-predicted frames or B-pictures. B-frames are quite similar to P-frames, except they can make predictions using both the previous and future frames (i.e. two anchor frames). It is therefore necessary for the player to first decode the next I- or P- anchor frame sequentially after the B-frame, before the B-frame can be decoded and displayed. This means decoding B-frames requires largerD-frames
D-frames

MPEG-1 has a unique frame type not found in later video standards. "D-frames" or DC-pictures are independently coded images (intra-frames) that have been encoded using DC transform coefficients only (AC coefficients are removed when encoding D-frames; see DCT below) and hence are very low quality. D-frames are never referenced by I-, P-, or B-frames. D-frames are only used for fast previews of video, for instance when seeking through a video at high speed.

Given moderately higher-performance decoding equipment, fast preview can be accomplished by decoding I-frames instead of D-frames. This provides higher-quality previews, since I-frames contain AC coefficients as well as DC coefficients. If the encoder can assume that rapid I-frame decoding capability is available in decoders, it can save bits by not sending D-frames (thus improving compression of the video content). For this reason, D-frames are seldom actually used in MPEG-1 video encoding, and the D-frame feature has not been included in any later video coding standards.
Macroblocks

MPEG-1 operates on video in a series of 8×8 blocks for quantization. However, to reduce the bit rate needed for motion vectors, and because chroma (color) is subsampled by a factor of 4, each pair of (red and blue) chroma blocks corresponds to 4 different luma blocks. This set of 6 blocks, with a resolution of 16×16, is processed together and called a ''macroblock''. A macroblock is the smallest independent unit of (color) video. Motion vectors (see below) operate solely at the macroblock level. If the height or width of the video is not an exact multiple of 16, full rows and full columns of macroblocks must still be encoded and decoded to fill out the picture (though the extra decoded pixels are not displayed).
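For example, the sketch below computes the coded dimensions and macroblock count for an arbitrary frame size, rounding each dimension up to a multiple of 16 as described above.

```python
# Coded (padded) dimensions and macroblock count for a given frame size.
import math

def macroblock_layout(width: int, height: int) -> tuple[int, int, int]:
    """Return (coded_width, coded_height, macroblock_count)."""
    mb_cols = math.ceil(width / 16)    # full columns of macroblocks
    mb_rows = math.ceil(height / 16)   # full rows of macroblocks
    return mb_cols * 16, mb_rows * 16, mb_cols * mb_rows

# A 350x240 video is coded as 352x240 (22 x 15 = 330 macroblocks), with
# 2 extra pixel columns encoded but never displayed.
print(macroblock_layout(350, 240))    # -> (352, 240, 330)
```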
Motion vectors

To decrease the amount of temporal redundancy in a video, only blocks that change are updated (up to the maximum GOP size). This is known as conditional replenishment. However, this is not very effective by itself. Movement of the objects, and/or of the camera, may result in large portions of the frame needing to be updated, even though only the position of the previously encoded objects has changed. Through motion estimation, the encoder can compensate for this movement and remove a large amount of redundant information. The encoder compares the current frame with adjacent parts of the video from the anchor frame (previous I- or P- frame), in a diamond pattern, up to a (encoder-specific) predefined radius limit around the macroblock.
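A minimal sketch of block matching, using an exhaustive full search and the sum of absolute differences (SAD) as the matching cost; real encoders use faster search patterns (such as the diamond pattern mentioned above) and also support half-pixel precision, both omitted here.

```python
# Full-search block-matching motion estimation for one 16x16 macroblock.
import numpy as np

def find_motion_vector(anchor, current, mb_y, mb_x, search=8, size=16):
    """Return (dy, dx) minimizing SAD for the macroblock at (mb_y, mb_x)."""
    block = current[mb_y:mb_y + size, mb_x:mb_x + size].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = mb_y + dy, mb_x + dx
            if y < 0 or x < 0 or y + size > anchor.shape[0] or x + size > anchor.shape[1]:
                continue   # candidate block falls outside the anchor frame
            cand = anchor[y:y + size, x:x + size].astype(np.int32)
            sad = int(np.abs(block - cand).sum())
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

# Toy example: the current frame is the anchor shifted right by 3 pixels.
anchor = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
current = np.roll(anchor, 3, axis=1)
print(find_motion_vector(anchor, current, 16, 16))   # -> (0, -3)
```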
DCT

Each 8×8 block is encoded by first applying a ''forward'' discrete cosine transform (FDCT) and then a quantization process. The FDCT process (by itself) is theoretically lossless, and can be reversed by applying an ''inverse'' DCT (IDCT) to reproduce the original values (in the absence of any quantization and rounding errors). In reality, there are some (sometimes large) rounding errors introduced both by quantization in the encoder (as described in the next section) and by IDCT approximation error in the decoder. The minimum allowed accuracy of a decoder IDCT approximation is defined by ISO/IEC 23002-1. (Prior to 2006, it was specified by IEEE 1180-1990.)

The FDCT process converts the 8×8 block of uncompressed pixel values (brightness or color-difference values) into an 8×8 indexed array of ''frequency coefficient'' values. One of these is the (statistically high in variance) "DC coefficient", which represents the average value of the entire 8×8 block. The other 63 coefficients are the statistically smaller "AC coefficients", which have positive or negative values, each representing sinusoidal deviations from the flat block value represented by the DC coefficient. Since the DC coefficient value is statistically correlated from one block to the next, it is compressed using DPCM: only the difference between each DC value and the one in the block to its left needs to be encoded.
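A sketch of the 8×8 forward DCT in its orthonormal type-II form (production codecs use scaled integer approximations, but the structure is the same): a flat block yields a single DC coefficient and no AC energy.

```python
# Orthonormal 8x8 forward DCT (type-II), applied to level-shifted pixels.
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """DCT-II basis matrix C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n)
    c = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2 / n)
    c[0, :] = np.sqrt(1 / n)    # flat (DC) basis row
    return c

C = dct_matrix(8)

def fdct(block: np.ndarray) -> np.ndarray:
    """8x8 pixel values (0..255) -> 8x8 frequency coefficients."""
    return C @ (block.astype(np.float64) - 128.0) @ C.T

flat = np.full((8, 8), 200)
coeffs = fdct(flat)
print(round(coeffs[0, 0], 1))    # DC coefficient: 8 * (200 - 128) = 576.0
ac = coeffs.copy()
ac[0, 0] = 0.0
print(np.abs(ac).max() < 1e-9)   # True: a flat block has no AC energy
```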
Quantization

Quantization is, essentially, the process of reducing the accuracy of a signal by dividing it by some larger step size and rounding to an integer value (i.e. finding the nearest multiple, and discarding the remainder).

The frame-level quantizer is a number from 1 to 31 (although encoders will usually omit/disable some of the extreme values) which determines how much information will be removed from a given frame. The frame-level quantizer is typically either dynamically selected by the encoder to maintain a certain user-specified bitrate, or (much less commonly) directly specified by the user.

A "quantization matrix" is a string of 64 numbers (ranging from 0 to 255) which tells the encoder how relatively important or unimportant each piece of visual information is. Each number in the matrix corresponds to a certain frequency component of the video image.

Quantization is performed by taking each of the 64 ''frequency'' values of the DCT block, dividing them by the frame-level quantizer, then dividing them by their corresponding values in the quantization matrix. Finally, the result is rounded down. This significantly reduces, or completely eliminates, the information in some frequency components of the picture. Typically, high-frequency information is less visually important, and so high frequencies are much more ''strongly quantized'' (drastically reduced). MPEG-1 actually uses two separate quantization matrices, one for intra-blocks (I-blocks) and one for inter-blocks (P- and B- blocks), so quantization of the different block types can be done independently, and so, more effectively. This quantization process usually reduces a significant number of the ''AC coefficients'' to zero (known as sparse data), which can then be more efficiently compressed by entropy coding (lossless compression) in the next step.

Quantization eliminates a large amount of data, and is the main lossy processing step in MPEG-1 video encoding. This is also the primary source of most MPEG-1 video compression artifacts.
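The sketch below follows the simplified description above (divide by the frame-level quantizer, divide by the matrix entry, truncate); the flat matrix of 16s is purely illustrative, not one of MPEG-1's default matrices.

```python
# Quantize an 8x8 block of DCT coefficients as described in the text.
import numpy as np

def quantize(coeffs: np.ndarray, quantizer: int, matrix: np.ndarray) -> np.ndarray:
    """64 frequency values -> small integers (many AC values become 0)."""
    return np.trunc(coeffs / quantizer / matrix).astype(np.int32)

matrix = np.full((8, 8), 16.0)                   # illustrative flat matrix
coeffs = np.zeros((8, 8))
coeffs[0, 0], coeffs[0, 1], coeffs[7, 7] = 576.0, -40.0, 10.0
print(quantize(coeffs, quantizer=2, matrix=matrix))
# DC 576 -> 18; the -40 AC value -> -1; the small high-frequency 10 -> 0
```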
Entropy coding

Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed upon decoding to produce exactly the same (original) values. Since these lossless data-compression steps don't add noise into, or otherwise change, the contents (unlike quantization), this is sometimes referred to as noiseless coding. Since lossless compression aims to remove as much redundancy as possible, it is known as entropy coding in the field of information theory.
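In MPEG-1 video, this entropy-coding stage includes a zig-zag scan of each quantized block and run-length coding of the zero runs, followed by variable-length (Huffman-style) codes. The sketch below shows the first two of those steps; the final bit-level coding is omitted.

```python
# Zig-zag scan plus run-length coding of a quantized 8x8 block.
import numpy as np

def zigzag_order(n: int = 8):
    """Yield (row, col) indices in zig-zag order for an n x n block."""
    for s in range(2 * n - 1):
        coords = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        yield from coords if s % 2 else reversed(coords)

def run_length(block: np.ndarray):
    """Quantized block -> list of (run_of_zeros, nonzero_value) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(block.shape[0]):
        v = int(block[r, c])
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs   # trailing zeros are signalled by an end-of-block code

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0] = 18, -1, 2
print(run_length(block))   # -> [(0, 18), (0, -1), (1, 2)]
```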
GOP configurations for specific applications

I-frames store complete frame information within the frame and are therefore suited for random access. P-frames provide compression using motion vectors relative to the previous frame (I or P). B-frames provide maximum compression, but require the previous as well as the next frame for computation. Therefore, processing of B-frames requires more buffering on the decoder side. A configuration of the group of pictures (GOP) is chosen based on these factors; a small pattern generator is sketched below.
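The generator below builds display-order GOP patterns from the conventional parameters N (GOP length, the distance between I-frames) and M (the distance between consecutive anchors, so M - 1 B-frames sit between them); these parameter names are encoder conventions, not terms from the text above.

```python
# Build a display-order GOP pattern from GOP length N and anchor spacing M.
def gop_pattern(n: int = 12, m: int = 3) -> str:
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")    # the GOP starts with an I-frame
        elif i % m == 0:
            frames.append("P")    # an anchor every m frames
        else:
            frames.append("B")    # B-frames between anchors
    return "".join(frames)

print(gop_pattern(12, 3))   # -> IBBPBBPBBPBB (a typical pattern)
print(gop_pattern(6, 1))    # -> IPPPPP (no B-frames: low delay, less compression)
```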
Part 3: Audio

Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC-11172-3. MPEG-1 Audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream.
Layer I

MPEG-1 Audio Layer I is a simplified version of MPEG-1 Audio Layer II. Layer I uses a smaller 384-sample frame size for very low delay and finer resolution. This is advantageous for applications like teleconferencing and studio editing. It has lower complexity than Layer II to facilitate real-time encoding on the hardware available around 1990.
Layer II

MPEG-1 Audio Layer II (the first version of MP2, often informally called MUSICAM) is a lossy audio format designed to provide high quality at about 192 kbit/s for stereo sound.
History/MUSICAM

MPEG-1 Audio Layer II was derived from the MUSICAM (''Masking pattern adapted Universal Subband Integrated Coding And Multiplexing'') audio codec, developed by Centre commun d'études de télévision et télécommunications (CCETT), Philips, and Institut für Rundfunktechnik (IRT) as part of the EUREKA 147 pan-European digital audio broadcasting (DAB) research program.
Technical details

MP2 is a time-domain encoder. It uses a low-delay 32 sub-band polyphase filter bank for time-frequency mapping.
Quality

Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256 kbit/s for 16-bit 44.1 kHz CD audio.
Layer III

MPEG-1 Audio Layer III (the first version of MP3) is a lossy audio format designed to provide acceptable quality at about 64 kbit/s for monaural audio over single-channel (BRI) ISDN links, and 128 kbit/s for stereo sound.
History/ASPEC

MPEG-1 Audio Layer III was derived from the ''Adaptive Spectral Perceptual Entropy Coding'' (ASPEC) codec developed by Fraunhofer as part of the EUREKA 147 pan-European digital audio broadcasting (DAB) research program.
Technical details

MP3 is a frequency-domain audio transform encoder. Even though it utilizes some of the lower-layer functions, MP3 is quite different from MP2. MP3 works on 1152 samples like MP2, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of bits per frame, using a bit reservoir to enable this variable-bitrate (VBR) encoding while maintaining 1152-sample output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place.

MP3 does not benefit from the 32 sub-band polyphase filter bank in the same way; instead it applies an 18-point MDCT transformation to each sub-band output, splitting the data into 576 frequency components and processing it in the frequency domain. This extra granularity allows MP3 to have a much finer psychoacoustic model and to apply quantization more selectively to each band.
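A minimal sketch of the MDCT at the heart of this frequency-domain processing: a block of 2N windowed time samples maps to N coefficients, with 50% overlap between consecutive blocks (N = 18 for MP3's long blocks). Windowing and the surrounding hybrid filter bank are omitted.

```python
# MDCT: 2N time samples -> N frequency coefficients (here N = 18).
import numpy as np

def mdct(x: np.ndarray) -> np.ndarray:
    """Apply the MDCT to one block of 2N samples."""
    n = len(x) // 2   # number of output coefficients
    k = np.arange(n)[:, None]
    t = np.arange(2 * n)[None, :]
    basis = np.cos(np.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
    return basis @ x

block = np.sin(2 * np.pi * np.arange(36) / 36)   # one 36-sample long block
print(mdct(block).shape)                         # -> (18,)
```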
Quality

MP3's more fine-grained and selective quantization proves notably superior to MP2 at lower bitrates. It is able to provide nearly equivalent audio quality to Layer II at an approximately 15% lower bitrate. 128 kbit/s is considered the "sweet spot" for MP3, meaning it provides generally acceptable-quality stereo sound on most music, with diminishing quality improvements from increasing the bitrate further. MP3 is also regarded as exhibiting artifacts that are less annoying than Layer II's, when both are used at bitrates that are too low to possibly provide faithful reproduction. Layer III audio files use the extension ".mp3".
MPEG-2 audio extensions

The MPEG-2 standard includes several extensions to MPEG-1 Audio.
Part 4: Conformance testing

Part 4 of the MPEG-1 standard covers conformance testing, and is defined in ISO/IEC-11172-4 (''Conformance: Procedures for testing conformance''). It provides two sets of guidelines and reference bitstreams for testing the conformance of MPEG-1 audio and video decoders, as well as the bitstreams produced by an encoder.
Part 5: Reference software

Part 5 of the MPEG-1 standard includes reference software, and is defined in ISO/IEC TR 11172-5 (''Simulation: Reference software''). It provides C reference code for encoding and decoding of audio and video, as well as multiplexing and demultiplexing. This includes the ''ISO Dist10'' audio encoder code, which LAME and TooLAME were originally based upon.
File extension

.mpg is one of a number of file extensions for MPEG-1 or MPEG-2 audio and video compression.