SMPTE ST 2117-1, informally known as VC-6, is a video coding format.

Overview

The VC-6 codec is optimized for intermediate, mezzanine or contribution coding applications. Typically, these applications involve compressing finished compositions for editing, contribution, primary distribution, archiving and other uses where image quality must be preserved as close to the original as possible while reducing bitrates and optimizing processing, power and storage requirements. VC-6, like other codecs in this category, uses only intra-frame compression, where each frame is stored independently and can be decoded with no dependencies on any other frame. The codec implements lossless and lossy compression, depending on the encoding parameters that have been selected. It was standardized in 2020. Earlier variants of the codec have been deployed by V-Nova since 2015 under the trade name Perseus.

The codec is based on hierarchical data structures called s-trees, and does not involve DCT or wavelet transform compression. The compression mechanism is independent of the data being compressed, and can be applied to pixels as well as other non-image data.

Unlike DCT-based codecs, VC-6 is based on hierarchical, repeatable s-tree structures that are similar to modified quadtrees. These simple structures provide intrinsic capabilities, such as massive parallelism and the ability to choose the type of filtering used to reconstruct higher-resolution images from lower-resolution images. The VC-6 standard provides an up-sampler developed with an in-loop Convolutional Neural Network to optimize the detail in the reconstructed image without requiring a large computational overhead. The ability to navigate spatially within the VC-6 bitstream at multiple levels also lets decoding devices apply more resources to different regions of the image, so that Region-of-Interest applications can operate on compressed bitstreams without decoding the full-resolution image.

History

At the NAB Show in 2015, V-Nova claimed "2x–3x average compression gains, at all quality levels, under practical real-time operating scenarios versus H.264, HEVC and JPEG2000". Making this announcement on 1 April, before a major trade show, attracted the attention of many compression experts. Since then, V-Nova has deployed and licensed the technology, known at the time as Perseus, in both contribution and distribution applications around the world, including Sky Italia, Fast Filmz, Harmonic Inc, and others. A variant of the technology optimized for enhancing distribution codecs will soon be standardized as MPEG-5 Part-2 LCEVC.

Core concepts

Planes

The standard describes a compression algorithm that is applied to independent planes of data. These planes might be RGB or RGBA pixels originating in a camera, YCbCr pixels from a conventional TV-centric video source, or some other planes of data. There may be up to 255 independent planes of data, and each plane can have a grid of data values of dimensions up to 65535 x 65535. The SMPTE ST 2117-1 standard focuses on compressing planes of data values, typically pixels. To compress and decompress the data in each plane, VC-6 uses hierarchical representations of small tree-like structures that carry metadata used to predict other trees. There are three fundamental structures repeated in each plane.
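The plane limits quoted above can be captured in a small validity check. This is only an illustrative sketch: the `Plane` class, the `validate_planes` helper and the requirement of at least one plane are assumptions made for the example, not names or rules taken from the standard.

```python
from dataclasses import dataclass

MAX_PLANES = 255        # upper bound on independent planes, as quoted above
MAX_DIMENSION = 65535   # upper bound on plane width and height, as quoted above


@dataclass(frozen=True)
class Plane:
    """Hypothetical description of one plane: a label and its grid size."""
    name: str
    width: int
    height: int


def validate_planes(planes):
    """Return True if the plane set respects the limits quoted above.

    Requiring at least one plane is an assumption of this sketch.
    """
    if not 1 <= len(planes) <= MAX_PLANES:
        return False
    return all(
        1 <= p.width <= MAX_DIMENSION and 1 <= p.height <= MAX_DIMENSION
        for p in planes
    )


# A YCbCr 4:2:0 picture: the chroma planes have half the luma resolution.
ycbcr = [Plane("Y", 3840, 2160), Plane("Cb", 1920, 1080), Plane("Cr", 1920, 1080)]
print(validate_planes(ycbcr))
```

Because each plane is described independently, planes of different sizes (as in subsampled chroma) fit the same model without special handling.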

S-tree

The core compression structure in VC-6 is the s-tree. It is similar to the quadtree structure common in other schemes. An s-tree comprises nodes arranged in a tree structure, where each node links to 4 nodes in the next layer. The total number of layers above the root node is known as the rise of the s-tree. Compression is achieved in an s-tree by using metadata to signal whether levels can be predicted, with enhancement data carried selectively in the bitstream. The more data that can be predicted, the less information is sent, and the better the compression ratio.
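The idea of a four-way tree whose metadata marks predicted branches can be sketched as follows. The `Node` layout, the `predicted` flag and the helper names are hypothetical and chosen for illustration; the actual node contents and signalling are defined by the standard and are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical s-tree node: either predicted or carrying data."""
    predicted: bool = True            # metadata flag: nothing is sent below here
    value: Optional[int] = None       # enhancement data, held by leaves only
    children: List["Node"] = field(default_factory=list)  # empty or exactly 4


def rise(node):
    """Number of layers above the given root node."""
    if not node.children:
        return 0
    return 1 + max(rise(child) for child in node.children)


def transmitted_values(node):
    """Count the leaf values that would need to be carried in the bitstream."""
    if node.predicted:
        return 0                      # whole subtree is predicted, nothing sent
    if not node.children:
        return 1
    return sum(transmitted_values(child) for child in node.children)


def leaf(value):
    return Node(predicted=False, value=value)


# Root with four children; only one quadrant carries enhancement data.
detailed = Node(predicted=False, children=[leaf(3), leaf(-1), leaf(0), leaf(2)])
root = Node(predicted=False, children=[detailed, Node(), Node(), Node()])

print(rise(root))                # 2
print(transmitted_values(root))  # 4, out of 16 leaves in a full tree of rise 2
```

In this toy tree three of the four quadrants are flagged as predicted, so only a quarter of the possible leaf values would be sent, which is the effect the paragraph above describes.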

Tableau

The standard defines a tableau as the root node, or the highest layer of an s-tree, that contains nodes for another s-tree. Like the generic s-trees from which they are constructed, tableaux are arranged in layers with metadata in the nodes indicating whether or not higher layers are predicted or transmitted in the bitstream.

Echelon

The hierarchical s-tree and tableau structures in the standard are used to carry enhancements (called resid-vals) and other metadata to reduce the amount of raw data that needs to be carried in the bitstream payload. The final hierarchical tool is the ability to arrange the tableaux so that data from each plane (e.g. pixels) can be dequantized at different resolutions and used as predictors for higher resolutions. Each of these resolutions is defined by the standard as an echelon. Each echelon within a plane is identified by an index, where a more negative index indicates a lower resolution and a larger, more positive index indicates a higher resolution.
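The relationship between echelon index and resolution can be illustrated with a short calculation. The factor of two per echelon step and the rounding rule below are assumptions made for this sketch, as is the `echelon_size` helper; the text above only states that more negative indices mean lower resolution. The sketch also models only indices at or below 0.

```python
def echelon_size(width, height, index):
    """Grid size of the echelon at `index` (0 = full resolution).

    Assumes, for illustration only, that each step down in index halves
    both dimensions, rounding up so odd sizes are not truncated.
    """
    if index > 0:
        raise ValueError("this sketch only models indices at or below 0")
    scale = 1 << -index
    return (-(-width // scale), -(-height // scale))  # ceiling division


for idx in range(-3, 1):
    print(idx, echelon_size(1920, 1080, idx))
```

Under these assumptions a 1920 x 1080 plane would have echelons of 240 x 135, 480 x 270, 960 x 540 and 1920 x 1080 at indices -3 to 0.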

Bitstream overview

VC-6 is an example of intra-frame coding, where each picture is coded without referencing other pictures. It is also intra-plane, where no information from one plane is used to predict another plane. As a result, the VC-6 bitstream contains all of the information for all of the planes of a single image. An image sequence is created by concatenating the bitstreams for multiple images, or by packaging them in a container such as MXF, QuickTime or Matroska.

The VC-6 bitstream is defined in the standard by pseudocode, and a reference decoder has been demonstrated based on that definition. The primary header is the only fixed structure defined by the standard. The secondary header contains marker and sizing information depending on the values in the primary header. The tertiary header is entirely calculated, and the payload structure is then derived from the parameters calculated during header decoding.

Decoding overview

The standard defines a process called plane reconstruction for decoding images from a bitstream. The process starts with the echelon having the lowest index. No predictions are used for this echelon. First, the bitstream rules are used to reconstruct residuals. Next, desparsification and entropy decoding processes are performed to fill the grid with data values at each coordinate. These values are then dequantised to create full-range values that can be used as predictions for the echelon with the next highest index. Each subsequent echelon uses the upsampler specified in the header to create a predicted plane from the echelon below; this prediction is added to the residual grid of the current echelon, and the result can in turn be upsampled as a prediction for the next echelon. The final, full-resolution echelon, defined by the standard, is at index 0, and its results are displayed rather than used as a prediction for another echelon.
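The predict-and-correct loop described above can be sketched in a few lines. This is a simplified illustration, not the standard's procedure: it starts from already entropy-decoded residual grids, assumes a uniform quantisation step and a factor-of-two nearest-neighbour upsampler, and all function names (`upsample_nearest`, `dequantise`, `reconstruct_plane`) are hypothetical.

```python
def upsample_nearest(grid):
    """Double a 2-D grid in both dimensions by repeating each value."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def dequantise(grid, step):
    """Scale decoded values back to full range (uniform step is an assumption)."""
    return [[v * step for v in row] for row in grid]


def reconstruct_plane(echelons, step=1, upsample=upsample_nearest):
    """Rebuild a plane from residual grids ordered lowest index first."""
    # The lowest echelon uses no prediction: its values are used directly.
    current = dequantise(echelons[0], step)
    for residuals in echelons[1:]:
        prediction = upsample(current)        # predicted plane from echelon below
        resid = dequantise(residuals, step)
        current = [
            [p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, resid)
        ]
    return current


echelons = [
    [[10]],               # index -1: a single value
    [[0, 1], [-1, 2]],    # index 0: corrections at full resolution
]
print(reconstruct_plane(echelons))  # [[10, 11], [9, 12]]
```

Passing the upsampler as a parameter mirrors the point that the bitstream header selects which upsampler is used; any function that maps a grid to one of twice the size could be substituted in this sketch.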

Upsampler options

Basic options

The standard defines a number of basic upsamplers to create higher-resolution predictions from lower-resolution echelons. There are two linear upsamplers, bicubic and sharp, and a nearest-neighbour upsampler.

Convolutional Neural Network Upsampler

Six different non-linear upsamplers are defined by a set of processes and coefficients that are provided in JSON format. These coefficients were generated using Convolutional Neural Network techniques.
