Mass Spectrometry Data Format

Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas or liquid chromatography and has found widespread adoption in analytical chemistry and biochemistry, where it can be used to identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires that computers be used for data storage and processing. Over the years, different manufacturers of mass spectrometers have developed various proprietary data formats for handling such data, which makes it difficult for academic scientists to manipulate their data directly. To address this limitation, several open, XML-based data formats have recently been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.


Open formats


JCAMP-DX

This format was one of the earliest attempts to supply a standardized file format for data exchange in mass spectrometry. JCAMP-DX was initially developed for infrared spectroscopy. It is an ASCII-based format and is therefore not very compact, even though it includes standards for file compression. JCAMP was officially released in 1988. Together with the American Society for Mass Spectrometry, a JCAMP-DX format for mass spectrometry was developed with the aim of preserving legacy data.
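To illustrate why an ASCII encoding is not very compact, the sketch below stores the same synthetic peak list two ways: as text lines of the form `m/z, intensity` (a simplification, not a complete JCAMP-DX file) and as packed 64-bit binary floats. Both encodings round-trip the values exactly, so the byte counts compare like for like. The function names and the data are hypothetical, chosen only for illustration.

```python
import struct

def make_peaks(n):
    """Hypothetical (m/z, intensity) pairs; synthetic values, not real data."""
    return [(100.0 + i / 7.0, 1000.0 / (i + 3)) for i in range(n)]

def to_text(pairs):
    """One 'm/z, intensity' line per peak; repr() keeps values exactly round-trippable.

    A simplification: real JCAMP-DX files also contain '##'-prefixed
    labelled header records and may use compressed notations.
    """
    return "".join(f"{mz!r}, {inten!r}\n" for mz, inten in pairs).encode("ascii")

def from_text(blob):
    """Parse the text encoding back into (m/z, intensity) tuples."""
    pairs = []
    for line in blob.decode("ascii").splitlines():
        mz, inten = line.split(",")
        pairs.append((float(mz), float(inten)))
    return pairs

def to_binary(pairs):
    """Pack every value as a little-endian 64-bit float (8 bytes each)."""
    flat = [v for pair in pairs for v in pair]
    return struct.pack(f"<{len(flat)}d", *flat)

def from_binary(blob):
    """Unpack the binary encoding back into (m/z, intensity) tuples."""
    flat = struct.unpack(f"<{len(blob) // 8}d", blob)
    return list(zip(flat[0::2], flat[1::2]))

peaks = make_peaks(100)
text_blob, binary_blob = to_text(peaks), to_binary(peaks)
print(f"text: {len(text_blob)} bytes, binary: {len(binary_blob)} bytes")
```

Full-precision decimal text typically needs well over 8 characters per value, whereas the binary form always needs exactly 8 bytes, which is the gap that compression schemes for text formats try to close.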


ANDI-MS or netCDF

The Analytical Data Interchange Format for Mass Spectrometry (ANDI-MS) is a format for exchanging data. Many mass spectrometry software packages can read or write ANDI files. ANDI is specified in the ASTM E1947 standard and is based on netCDF, a software library for writing and reading data files. ANDI was initially developed for chromatography-MS data and was therefore not used in the proteomics gold rush, during which new XML-based formats were developed.


AnIML

AnIML is a joint effort of IUPAC and ASTM International to create an XML-based standard that covers a wide variety of analytical techniques, including mass spectrometry.


mzData

mzData was the first attempt by the Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) to create a standardized format for mass spectrometry data. This format is now deprecated and has been replaced by mzML.


mzXML

mzXML is an XML (eXtensible Markup Language)-based common file format for proteomics mass spectrometric data. This format was developed at the Seattle Proteome Center/Institute for Systems Biology while HUPO-PSI was trying to specify the standardized mzData format, and it is still in use in the proteomics community.


YAFMS

Yet Another Format for Mass Spectrometry (YAFMS) is a proposal to store data in a four-table relational, server-less database schema, with data extraction and appending performed through SQL queries.
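The general idea of a server-less relational layout, in which spectra are appended and extracted with plain SQL, can be sketched with Python's built-in `sqlite3` module. SQLite serves here only as an example of a server-less engine. The four tables (`run`, `spectrum`, `peaks`, `metadata`) and their columns are hypothetical and are not taken from the YAFMS proposal. Peak arrays are stored as packed little-endian doubles in BLOB columns, which is one possible choice among several.

```python
import sqlite3
import struct

# Hypothetical four-table layout; the real YAFMS tables and columns may differ.
SCHEMA = """
CREATE TABLE run      (run_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE spectrum (spectrum_id INTEGER PRIMARY KEY,
                       run_id INTEGER NOT NULL REFERENCES run(run_id),
                       ms_level INTEGER NOT NULL, retention_time REAL NOT NULL);
CREATE TABLE peaks    (spectrum_id INTEGER PRIMARY KEY REFERENCES spectrum(spectrum_id),
                       mz BLOB NOT NULL, intensity BLOB NOT NULL);
CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT);
"""

def pack(values):
    """Encode a list of floats as little-endian doubles."""
    return struct.pack(f"<{len(values)}d", *values)

def unpack(blob):
    """Decode little-endian doubles back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))

def open_db(path=":memory:"):
    """Open a server-less SQLite file (or in-memory database) and create the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def append_spectrum(conn, run_id, ms_level, rt, mz, intensity):
    """Append one spectrum with SQL INSERTs and return its new id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO spectrum (run_id, ms_level, retention_time) VALUES (?, ?, ?)",
            (run_id, ms_level, rt))
        spectrum_id = cur.lastrowid
        conn.execute("INSERT INTO peaks VALUES (?, ?, ?)",
                     (spectrum_id, pack(mz), pack(intensity)))
    return spectrum_id

def spectra_in_window(conn, rt_min, rt_max, ms_level=1):
    """Extract (retention_time, mz list, intensity list) tuples with a SQL query."""
    rows = conn.execute(
        """SELECT s.retention_time, p.mz, p.intensity
           FROM spectrum AS s JOIN peaks AS p USING (spectrum_id)
           WHERE s.ms_level = ? AND s.retention_time BETWEEN ? AND ?
           ORDER BY s.retention_time""",
        (ms_level, rt_min, rt_max))
    return [(rt, unpack(mz), unpack(inten)) for rt, mz, inten in rows]
```

For example, after `conn = open_db()` and inserting a row into `run`, each call to `append_spectrum` adds one spectrum, and `spectra_in_window(conn, 10.0, 20.0)` returns only the MS1 spectra acquired in that retention-time window.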


mzML

Because having two formats (mzData and mzXML) to represent the same information was undesirable, HUPO-PSI, the SPC/ISB and instrument vendors set up a joint effort to create a unified standard that borrows the best aspects of both mzData and mzXML and is intended to replace them. Originally called dataXML, it was officially announced as mzML. The first specification was published in June 2008. The format was officially released at the 2008 American Society for Mass Spectrometry Meeting and has since remained relatively stable, with very few updates. On 1 June 2009, mzML 1.1.0 was released. As of 2013, no further changes were planned.


mzAPI

Instead of defining new file formats and writing converters for proprietary vendor formats, a group of scientists proposed defining a common application programming interface (API) that would shift the burden of standards compliance to the instrument manufacturers' existing data access libraries.
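The architectural idea can be sketched as an abstract interface that each vendor would implement on top of its own data access library, so that client code is written once against the common API. Everything below is hypothetical: `SpectrumSource`, `Scan`, `FakeVendorReader` and their methods are illustrative names, not the actual mzAPI, and the fake reader merely stands in for a vendor library that stores retention times in seconds.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class Scan:
    """Vendor-neutral view of one spectrum (hypothetical minimal fields)."""
    scan_number: int
    retention_time: float  # minutes
    ms_level: int
    mz: tuple
    intensity: tuple

class SpectrumSource(ABC):
    """Hypothetical common API that each vendor would implement."""

    @abstractmethod
    def scan_numbers(self):
        """Return the available scan numbers in acquisition order."""

    @abstractmethod
    def scan(self, scan_number):
        """Return one Scan, decoded by the vendor's own library."""

class FakeVendorReader(SpectrumSource):
    """Stand-in adapter; a real one would call the vendor's data access library."""

    def __init__(self, raw_records):
        # raw_records mimics a vendor-specific layout (retention time in seconds).
        self._records = {rec["id"]: rec for rec in raw_records}

    def scan_numbers(self):
        return sorted(self._records)

    def scan(self, scan_number):
        rec = self._records[scan_number]
        return Scan(scan_number, rec["rt_s"] / 60.0, rec["level"],
                    tuple(rec["masses"]), tuple(rec["counts"]))

def total_ion_current(source: SpectrumSource):
    """Client code written once against the common API, usable with any vendor."""
    return [(s.retention_time, sum(s.intensity))
            for s in map(source.scan, source.scan_numbers())]
```

The design choice being illustrated is that no intermediate file is produced: the conversion from vendor units and layouts happens inside each adapter, at read time.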


mz5

The mz5 format addresses the performance problems of the previous XML-based formats. It uses the mzML ontology but saves the data using the HDF5 backend, which reduces storage space requirements and improves read/write speed.


imzML

The imzML standard was proposed to exchange data from mass spectrometry imaging in a standardized XML file based on the mzML ontology. It splits the data into two files: the experimental metadata are stored in XML, and the spectral data in a binary file. The two files are linked by a universally unique identifier (UUID).
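The split into an XML metadata part and a binary data part linked by a UUID can be sketched with the standard library alone. This is a simplified illustration: the element and attribute names (`imagingRun`, `spectrum`, `array`, `offset`, `length`) are hypothetical and do not follow the imzML schema, and placing the 16 UUID bytes at the start of the binary part is an assumption made for this sketch. The XML records where each array lives in the binary part, and the reader refuses to combine files whose UUIDs differ.

```python
import struct
import uuid
import xml.etree.ElementTree as ET

def write_imaging_pair(spectra):
    """Split pixel spectra into XML metadata plus one binary blob.

    spectra: list of (x, y, mz_list, intensity_list).
    Element/attribute names are hypothetical, not the imzML schema.
    """
    file_id = uuid.uuid4()
    blob = bytearray(file_id.bytes)  # in this sketch the binary part starts with the UUID
    root = ET.Element("imagingRun", uuid=str(file_id))
    for x, y, mz, intensity in spectra:
        node = ET.SubElement(root, "spectrum", x=str(x), y=str(y))
        for name, values in (("mz", mz), ("intensity", intensity)):
            # Record where this array starts in the binary part and how many values it has.
            ET.SubElement(node, "array", name=name, offset=str(len(blob)),
                          length=str(len(values)))
            blob += struct.pack(f"<{len(values)}f", *values)  # 32-bit floats
    return ET.tostring(root, encoding="unicode"), bytes(blob)

def read_pixel(xml_text, blob, x, y):
    """Return (mz, intensity) for pixel (x, y), checking that both parts belong together."""
    root = ET.fromstring(xml_text)
    if uuid.UUID(root.get("uuid")).bytes != blob[:16]:
        raise ValueError("XML metadata and binary data have different UUIDs")
    for node in root.iter("spectrum"):
        if (int(node.get("x")), int(node.get("y"))) == (x, y):
            arrays = {}
            for arr in node.iter("array"):
                offset, length = int(arr.get("offset")), int(arr.get("length"))
                arrays[arr.get("name")] = list(
                    struct.unpack_from(f"<{length}f", blob, offset))
            return arrays["mz"], arrays["intensity"]
    raise KeyError((x, y))
```

Keeping the bulky arrays out of the XML lets a reader parse the small metadata file quickly and then seek directly to the byte offsets of the pixels it needs.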


mzDB

mzDB stores data in an SQLite database to reduce storage space and improve access times, since the data points can be queried directly from a relational database.
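Querying data points from a relational database can be illustrated with an extracted ion chromatogram: select all points in a retention-time and m/z window and sum the intensities per scan. The schema here (one `point` row per data point plus a composite index) is a hypothetical simplification chosen for clarity; the published mzDB schema is more elaborate and differs from it.

```python
import sqlite3

def build_point_db(spectra):
    """spectra: list of (retention_time, [(mz, intensity), ...]).

    Hypothetical simplified schema: one row per data point plus an index, so a
    retention-time/m-z window can be answered by SQLite instead of a full file scan.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE point (rt REAL NOT NULL, mz REAL NOT NULL, intensity REAL NOT NULL);
        CREATE INDEX point_rt_mz ON point (rt, mz);
    """)
    with conn:
        conn.executemany(
            "INSERT INTO point VALUES (?, ?, ?)",
            ((rt, mz, inten) for rt, peaks in spectra for mz, inten in peaks))
    return conn

def extracted_ion_chromatogram(conn, target_mz, tol, rt_min, rt_max):
    """Sum intensities per retention time for m/z within target_mz +/- tol."""
    return conn.execute(
        """SELECT rt, SUM(intensity) FROM point
           WHERE rt BETWEEN ? AND ? AND mz BETWEEN ? AND ?
           GROUP BY rt ORDER BY rt""",
        (rt_min, rt_max, target_mz - tol, target_mz + tol)).fetchall()
```

The index on `(rt, mz)` lets SQLite narrow the search by retention time before filtering on m/z, which is the kind of access pattern a flat XML file cannot offer without reading everything.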


Toffee

Toffee is an open lossless file format for data-independent acquisition mass spectrometry. It leverages HDF5 and aims to achieve file sizes similar to those from the proprietary and closed vendor formats.


mzMLb

mzMLb is another approach that uses an HDF5 backend for performant raw data storage. Unlike mz5, however, it preserves the mzML XML data structure and stays compliant with the existing standard.


Proprietary formats

Below is a table of different file format extensions.

(*) Note that the RAW formats of each vendor are not interchangeable; software from one vendor cannot handle the RAW files from another.
(**) Micromass was acquired by Waters in 1997.
(***) Finnigan is a division of Thermo.


Software


Viewers

There are several viewers for mzXML, mzML and mzData: MZmine, PEAKS, Insilicos, MS-Spectre, TOPPView (mzXML, mzML and mzData), Spectra Viewer, SeeMS, msInspect, jmzML, Mascot Distiller and Elsci Peaksel. There is also a viewer for ITA images. ITA and ITM images can be parsed with the pySPM Python library.


Converters

Known converters for mzData to mzXML:
* Hermes: a Java converter between mzData, mzXML and mzML in all directions; publicly available, runs with a graphical user interface, by the Institute of Molecular Systems Biology, ETH Zurich
* FileConverter: a command-line tool that converts to/from various mass spectrometry formats, part of TOPP

Known converters for mzXML:
* The Institute for Systems Biology maintains a list of converters.

Known converters for mzML:
* msConvert: a command-line tool converting to/from various mass spectrometry formats. A GUI is also available for Windows users.
* ReAdW: the Institute for Systems Biology command-line converter for Thermo RAW files, part of the Trans-Proteomic Pipeline. The latest update of this tool was made in September 2009. The TPP development team now redirects users to msConvert (see above).
* FileConverter: a command-line tool that converts to/from various mass spectrometry formats, part of TOPP

Converters for proprietary formats:
* msConvert: a command-line tool converting to/from various mass spectrometry formats, including multiple proprietary formats. A GUI is also available for Windows users.
* CompassXport: Bruker's free tool generating mzXML (and now mzData) files for many of their native file formats (.baf)
* MASSTransit: software for converting data between proprietary formats, by Palisade Corporation and distributed by Scientific Instrument Services, Inc. and PerkinElmer. It was purchased from Palisade by John Wiley and Sons in 2020 and incorporated into the KnowItAll Spectroscopy software.
* Aston: native support for several Agilent ChemStation, Agilent MassHunter and Thermo Isodat file formats
* unfinnigan: native support for Finnigan (*.RAW) file formats
* OpenChrom: open-source software that can convert various native file formats, including its own open .ocb format for storing chromatograms, peaks and identification results

Currently available converters are:
* MassWolf, for the Micromass MassLynx .Raw format
* mzStar, for the SCIEX/ABI Analyst format
* wiff2dta, for converting the SCIEX/ABI Analyst format to mzXML, DTA, MGF and PMF


See also

* Mass spectrometry software

