MPEG-G

MPEG-G (ISO/IEC 23092) is an ISO/IEC standard for genomic information representation, developed jointly by ISO/IEC JTC 1/SC 29/WG 9 (MPEG) and ISO TC 276 "Biotechnology" Work Group 5. The goal of the standard is to provide interoperable solutions for data storage, access, and protection across different implementations, covering the data generated by high-throughput sequencing machines and by their subsequent processing and analysis. The standard is composed of several parts, each addressing a specific aspect, such as compression, metadata association, application programming interfaces (APIs), and reference software for data decoding. Alongside the reference decoder software, commercial and open source implementations started to become available in 2019, covering progressively more of the published parts of the standard.


Background

The advent of high-throughput sequencing (HTS) technologies has revolutionized the field of quantitative biology. The availability of large collections of genomic information has entered everyday practice and has become a cornerstone of a number of disciplines, ranging from biological research to personalized medicine in the clinic. At the moment, genomic information is mostly exchanged through a variety of data formats, such as FASTA/FASTQ for unaligned sequencing reads and SAM/BAM/CRAM for aligned reads. The ISO/IEC 23092 (MPEG-G) standard aims to provide a unified format for the efficient representation and compression of such diverse data, both for file storage and for data transport. To that end, the standard is divided into several parts.


Structure of the standard

The MPEG-G standard uses technology and data representation architectures previously validated in the field of digital media. They make it possible to compress and transport genome sequencing data even in complex scenarios, for instance when access is needed to large amounts of possibly distributed data, or when part of the data needs to be encrypted for privacy reasons. Conceptually, such requirements lead to the definition of a number of interrelated mechanisms, which are summarized in the following list:
* Data format and compression
* Data streaming
* Compressed file concatenation
* Incremental update of sequencing data and metadata
* Selective access to compressed data, e.g. fast queries by genomic range
* Metadata association
* Enforcement of privacy rules
* Selective encryption of data and metadata
* Annotation and linkage of genomic segments

In turn, some of these topics have been grouped together in order to make the standard easier to understand and implement. As a result, the ISO/IEC 23092 standard is physically structured as a series of separate documents, as follows:


ISO/IEC 23092-1 MPEG-G Part 1

ISO/IEC 23092-1 specifies how genomic data is organized within MPEG-G structures for transport (i.e., streaming) and storage. This part defines the formats of the genomic record, the reference record, the MPEG-G file, and the transport stream. It introduces the Access Unit as the container of the compressed genomic data and provides a reference conversion process among different formats.
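The idea of a container that can be decoded on its own, and that therefore allows selective access by genomic range, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `AccessUnit` fields, the tab-separated payload, the use of zlib as a stand-in codec, and the restriction to a single reference sequence are not taken from ISO/IEC 23092-1 and do not reflect the actual MPEG-G syntax.

```python
import zlib
from dataclasses import dataclass


@dataclass
class AccessUnit:
    """Hypothetical container: one independently compressed block of records."""
    start: int      # smallest record position in the block (inclusive)
    end: int        # largest record position in the block (inclusive)
    payload: bytes  # compressed records (zlib is only a stand-in codec)


def build_access_units(records, block_size=2):
    """Group position-sorted (pos, read) records into access units."""
    units = []
    for i in range(0, len(records), block_size):
        block = records[i:i + block_size]
        text = "\n".join(f"{pos}\t{read}" for pos, read in block)
        units.append(AccessUnit(block[0][0], block[-1][0],
                                zlib.compress(text.encode())))
    return units


def query(units, start, end):
    """Return records whose position lies in [start, end].

    Only units whose range overlaps the query are decompressed.
    """
    hits = []
    for unit in units:
        if unit.end < start or unit.start > end:
            continue  # no overlap: this unit is skipped without decoding
        for line in zlib.decompress(unit.payload).decode().splitlines():
            pos, read = line.split("\t")
            if start <= int(pos) <= end:
                hits.append((int(pos), read))
    return hits
```

Because each unit carries its own range and is compressed independently, a range query touches only the overlapping units, which is the property that makes fast queries on compressed data feasible in this toy model.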


ISO/IEC 23092-2 MPEG-G Part 2


ISO/IEC 23092-2 specifies the syntax and methods for MPEG-G lossless compression of sequencing data and lossy compression of the associated quality scores. As is typical for MPEG standards, MPEG-G specifies only the decoding process, while the encoding process is left open to algorithmic and implementation-specific innovation. All conforming MPEG-G decoders produce identical outputs from the multiplexed bitstreams included in MPEG-G files and from the data streams in streaming scenarios. The inputs of the encoder are genomic records or metadata, with optional reference data, while its output is an MPEG-G file or a transport stream.
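The difference between lossless coding of sequences and lossy coding of quality scores can be shown with a small sketch. It is not the coding scheme defined in ISO/IEC 23092-2: the 2-bit packing, the function names, and the bin boundaries and representative values are all assumptions chosen only to make the two notions concrete.

```python
def pack_bases(seq):
    """Losslessly pack an A/C/G/T string into 2 bits per base."""
    codes = {"A": 0, "C": 1, "G": 2, "T": 3}
    value = 0
    for base in seq:
        value = (value << 2) | codes[base]
    # The length is returned too, since leading "A" bases encode as zero bits.
    return len(seq), value.to_bytes((2 * len(seq) + 7) // 8, "big")


def unpack_bases(length, data):
    """Invert pack_bases exactly."""
    value = int.from_bytes(data, "big")
    return "".join("ACGT"[(value >> (2 * (length - 1 - i))) & 3]
                   for i in range(length))


# Hypothetical bins: (lowest score, highest score, representative value).
DEFAULT_BINS = ((0, 9, 6), (10, 19, 15), (20, 29, 25), (30, 93, 37))


def quantize_qualities(quals, bins=DEFAULT_BINS):
    """Lossy step: replace each Phred score by its bin's representative."""
    out = []
    for q in quals:
        for low, high, rep in bins:
            if low <= q <= high:
                out.append(rep)
                break
        else:
            raise ValueError(f"quality score out of range: {q}")
    return out
```

The sequence round-trips exactly through `pack_bases` and `unpack_bases`, whereas `quantize_qualities` deliberately discards information: the original scores cannot be recovered, but the reduced alphabet is easier to compress.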


ISO/IEC 23092-3 MPEG-G Part 3


ISO/IEC 23092-3 specifies a metadata format and provides genomic data representation APIs to support interoperability among existing tools and systems. Part 3 specifies how an MPEG-G compliant bitstream can be integrated with metadata, as well as mechanisms for access control, integrity verification, authentication, and authorization. This part also contains an informative section devoted to the mapping between SAM and MPEG-G data structures, including backward compatibility with existing SAM content. It defines:


ISO/IEC 23092-4 MPEG-G Part 4


ISO/IEC 23092-4 specifies genomic information representation reference software, referred to as the genomic model (GM). It consists of two components: the reference encoder software and the reference decoder software. While the reference decoder software is provided to assess conformance to the requirements of ISO/IEC 23092-2, the reference encoder software serves as a guide for implementing the standard. The reference encoder software, called Genie, is open source software developed by a group of individuals from multiple universities and companies around the world. It features the following components:


ISO/IEC 23092-5 MPEG-G Part 5


ISO/IEC 23092-5 specifies conformance of the coding of genomic information. Part 5 provides a means to test and validate the correct implementation of the MPEG-G technology in different devices and applications, in order to ensure interoperability among all systems. It specifies a normative procedure to assess conformity to the standard on an exhaustive set of compressed data.
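Since conforming decoders are expected to produce identical outputs, one plausible way to picture a conformance check is to run a decoder over a set of compressed test inputs and compare a digest of each output with an expected value. The sketch below is only an illustration of that idea: the `check_conformance` helper, the structure of `cases`, and the use of SHA-256 are assumptions, not the normative procedure of ISO/IEC 23092-5.

```python
import hashlib


def check_conformance(decode, cases):
    """Run `decode` on each compressed input and compare output digests.

    `cases` maps a case name to (compressed_bytes, expected_sha256_hex).
    Returns the list of failing case names (empty if all cases pass).
    """
    failures = []
    for name, (bitstream, expected) in cases.items():
        try:
            digest = hashlib.sha256(decode(bitstream)).hexdigest()
        except Exception:
            digest = None  # a decoder error counts as a failure
        if digest != expected:
            failures.append(name)
    return failures
```

Any callable that maps compressed bytes to decoded bytes can be passed as `decode`; for experimentation, `zlib.decompress` with zlib-compressed test data can stand in for a real decoder.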


MIME Type and Filename extensions

No MIME type (an IANA media type registered under RFC 6838) is currently defined for MPEG-G files, and no conventional filename extensions are defined either.


See also

* MPEG
* ISO/IEC JTC 1/SC 29


External links


mpeg-g.org

MPEG web site











