Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas or liquid chromatography and has found widespread adoption in the fields of analytical chemistry and biochemistry, where it can be used to identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires that computers be used for data storage and processing. Over the years, different manufacturers of mass spectrometers have developed various proprietary data formats for handling such data, which makes it difficult for academic scientists to directly manipulate their data. To address this limitation, several open, XML-based data formats have recently been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.
Open formats
JCAMP-DX
This format was one of the earliest attempts to supply a standardized file format for data exchange in mass spectrometry. JCAMP-DX was initially developed for infrared spectrometry. JCAMP-DX is an ASCII-based format and therefore not very compact, even though it includes standards for file compression. JCAMP was officially released in 1988. Together with the American Society for Mass Spectrometry, a JCAMP-DX format for mass spectrometry was developed with the aim of preserving legacy data.
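As an illustration of the text-based layout, the following minimal Python sketch reads a simplified JCAMP-DX file. It assumes the common conventions that each labelled data record starts with `##LABEL=` and that `$$` starts a comment. The sample data are made up, the function name is illustrative, and compressed encodings of the data table are not handled.

```python
import re

# Made-up sample; real files carry many more labelled data records.
SAMPLE = """\
##TITLE= example spectrum (made up)
##JCAMP-DX= 4.24
##DATA TYPE= MASS SPECTRUM
##NPOINTS= 3
##PEAK TABLE=(XY..XY)
41,12 43,100
57,35
##END=
"""


def parse_jcamp(text):
    """Return (header dict, list of (x, y) peaks) from a simple JCAMP-DX text."""
    header = {}
    peaks = []
    in_table = False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("##"):
            # A labelled data record: ##LABEL= value
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            header[label] = value.strip()
            in_table = label == "PEAK TABLE"
        elif in_table:
            # Data lines hold "x,y" pairs separated by whitespace or semicolons.
            for pair in re.split(r"[;\s]+", line):
                if pair:
                    x, y = pair.split(",")
                    peaks.append((float(x), float(y)))
    return header, peaks


header, peaks = parse_jcamp(SAMPLE)
print(header["DATA TYPE"], peaks)
```

Because every value is written out as text, even this three-peak example takes far more bytes than the same numbers would in a binary encoding, which is the compactness trade-off noted above.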
ANDI-MS or netCDF
The Analytical Data Interchange Format for Mass Spectrometry (ANDI-MS) is a format for exchanging data. Many mass spectrometry software packages can read or write ANDI files. ANDI is specified in the ASTM E1947 standard. ANDI is based on netCDF, which is a software library for writing and reading data files. ANDI was initially developed for chromatography-MS data and therefore was not used in the proteomics gold rush, when new formats based on XML were developed.
AnIML
The Analytical Information Markup Language (AnIML) is a joint effort of IUPAC and ASTM International to create an XML-based standard that covers a wide variety of analytical techniques, including mass spectrometry.
mzData
mzData was the first attempt by the Proteomics Standards Initiative (PSI) from the Human Proteome Organization (HUPO) to create a standardized format for mass spectrometry data. This format is now deprecated and has been replaced by mzML.
mzXML
mzXML is an XML (eXtensible Markup Language) based common file format for proteomics mass spectrometric data. This format was developed at the Seattle Proteome Center/Institute for Systems Biology while the HUPO-PSI was trying to specify the standardized mzData format, and is still in use in the proteomics community.
YAFMS
Yet Another Format for Mass Spectrometry (YAFMS) is a proposal to store data in a four-table, server-less relational database schema, with data extraction and appending performed using SQL queries.
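The idea of keeping spectra in a server-less relational database and using SQL for both appending and extraction can be sketched with Python's built-in `sqlite3` module. The two tables and their columns below are hypothetical and much simpler than the actual four-table YAFMS schema.

```python
import sqlite3

# An in-memory database stands in for a single server-less database file.
conn = sqlite3.connect(":memory:")

# Hypothetical tables for illustration only (not the real YAFMS schema).
conn.execute(
    "CREATE TABLE spectrum ("
    "id INTEGER PRIMARY KEY, ms_level INTEGER, retention_time REAL)"
)
conn.execute(
    "CREATE TABLE peak ("
    "spectrum_id INTEGER REFERENCES spectrum(id), mz REAL, intensity REAL)"
)

# Appending data is an ordinary INSERT.
conn.execute("INSERT INTO spectrum VALUES (?, ?, ?)", (1, 1, 12.5))
conn.executemany(
    "INSERT INTO peak VALUES (?, ?, ?)",
    [(1, 100.1, 500.0), (1, 200.2, 1500.0), (1, 300.3, 50.0)],
)
conn.commit()

# Extraction is an ordinary SELECT, here restricted to an m/z window.
rows = conn.execute(
    "SELECT mz, intensity FROM peak "
    "WHERE spectrum_id = ? AND mz BETWEEN ? AND ? ORDER BY mz",
    (1, 150.0, 350.0),
).fetchall()
print(rows)
```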
mzML
Having two formats (mzData and mzXML) that represent the same information was undesirable, so HUPO-PSI, the SPC/ISB and instrument vendors launched a joint effort to create a unified standard that borrowed the best aspects of both mzData and mzXML and was intended to replace them. Originally called dataXML, it was officially announced as mzML. The first specification was published in June 2008. This format was officially released at the 2008 American Society for Mass Spectrometry Meeting and has since been relatively stable, with very few updates. On 1 June 2009, mzML 1.1.0 was released. There are no planned further changes as of 2013.
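A minimal Python sketch of reading the numeric arrays of one spectrum follows. It assumes the usual mzML convention that each array is stored as base64 text holding little-endian floats, optionally zlib-compressed, and that `cvParam` names describe the encoding. The fragment is a made-up, simplified stand-in (no namespace, accessions or index), not a complete mzML document, and the helper names are illustrative.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib


def encode(values):
    """Pack floats as little-endian 64-bit, zlib-compress, then base64-encode."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(zlib.compress(packed)).decode("ascii")


# Made-up, simplified fragment; real files add a namespace, accessions and more.
FRAGMENT = f"""
<spectrum id="scan=1" defaultArrayLength="3">
  <binaryDataArrayList count="2">
    <binaryDataArray>
      <cvParam name="64-bit float"/>
      <cvParam name="zlib compression"/>
      <cvParam name="m/z array"/>
      <binary>{encode([100.0, 200.5, 300.25])}</binary>
    </binaryDataArray>
    <binaryDataArray>
      <cvParam name="64-bit float"/>
      <cvParam name="zlib compression"/>
      <cvParam name="intensity array"/>
      <binary>{encode([10.0, 250.0, 5.0])}</binary>
    </binaryDataArray>
  </binaryDataArrayList>
</spectrum>
"""


def read_arrays(spectrum_xml):
    """Decode every binaryDataArray of one spectrum into a list of floats."""
    arrays = {}
    root = ET.fromstring(spectrum_xml.strip())
    for bda in root.iter("binaryDataArray"):
        names = {cv.get("name") for cv in bda.iter("cvParam")}
        raw = base64.b64decode(bda.findtext("binary"))
        if "zlib compression" in names:
            raw = zlib.decompress(raw)
        code, size = ("d", 8) if "64-bit float" in names else ("f", 4)
        values = struct.unpack(f"<{len(raw) // size}{code}", raw)
        kind = "m/z" if "m/z array" in names else "intensity"
        arrays[kind] = list(values)
    return arrays


arrays = read_arrays(FRAGMENT)
print(arrays)
```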
mzAPI
Instead of defining new file formats and writing converters for proprietary vendor formats, a group of scientists proposed defining a common application programming interface to shift the burden of standards compliance to the instrument manufacturers' existing data access libraries.
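The approach can be illustrated with a small Python sketch in which analysis code depends only on an abstract interface. The class and method names (`SpectrumSource`, `scan_count`, `get_scan`) are hypothetical and are not the actual mzAPI definitions; a vendor library would supply its own implementation in place of the in-memory one shown here.

```python
from abc import ABC, abstractmethod


class SpectrumSource(ABC):
    """Hypothetical common interface; not the real mzAPI definition."""

    @abstractmethod
    def scan_count(self):
        """Return the number of scans available."""

    @abstractmethod
    def get_scan(self, index):
        """Return one scan as a list of (m/z, intensity) tuples."""


class InMemorySource(SpectrumSource):
    """Stand-in for a vendor data access library wrapped behind the interface."""

    def __init__(self, scans):
        self._scans = scans

    def scan_count(self):
        return len(self._scans)

    def get_scan(self, index):
        return self._scans[index]


def base_peaks(source):
    """Most intense peak of every scan, written only against the interface."""
    return [
        max(source.get_scan(i), key=lambda peak: peak[1])
        for i in range(source.scan_count())
    ]


source = InMemorySource([[(100.0, 5.0), (200.0, 900.0)], [(150.0, 42.0)]])
print(base_peaks(source))
```

With this design, `base_peaks` never touches a file format: supporting another instrument means providing another `SpectrumSource` implementation rather than another converter.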
mz5
The mz5 format addresses the performance problems of the previous XML-based formats. It uses the mzML ontology, but saves the data using the HDF5 backend for reduced storage space requirements and improved read/write speed.
imzML
The imzML standard was proposed to exchange data from mass spectrometry imaging in a standardized XML file based on the mzML ontology. It splits the data into two files: experimental metadata in an XML file and spectral data in a binary file. Both files are linked by a universally unique identifier.
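The check that the two files belong together can be sketched in Python as follows. The sketch assumes that the binary file begins with the 16 raw bytes of the same identifier that the XML file records as text; the function name and the sample data are illustrative only.

```python
import struct
import uuid


def binary_matches_xml(xml_uuid_text, binary_bytes):
    """True if the binary data start with the UUID recorded in the XML file."""
    if len(binary_bytes) < 16:
        return False
    expected = uuid.UUID(xml_uuid_text)  # accepts hyphens and braces
    return uuid.UUID(bytes=binary_bytes[:16]) == expected


# Made-up example: an identifier followed by three 32-bit float values.
file_id = uuid.uuid4()
binary_file = file_id.bytes + struct.pack("<3f", 100.0, 200.0, 300.0)

print(binary_matches_xml(str(file_id), binary_file))
print(binary_matches_xml(str(uuid.uuid4()), binary_file))
```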
mzDB
mzDB saves data in an SQLite database to save on storage space and improve access times, as the data points can be queried from a relational database.
Toffee
Toffee is an open lossless file format for data-independent acquisition mass spectrometry. It leverages HDF5 and aims to achieve file sizes similar to those from the proprietary and closed vendor formats.
mzMLb
mzMLb is another take on using an HDF5 backend for performant raw data saving. It, however, preserves the mzML XML data structure and stays compliant with the existing standard.
Proprietary formats
Below is a table of different file format extensions.
(*) Note that the RAW formats of each vendor are not interchangeable; software from one cannot handle the RAW files from another.
(**) Micromass was acquired by Waters in 1997
(***) Finnigan is a division of Thermo
Software
Viewers
There are several viewers for mzXML, mzML and mzData: MZmine, PEAKS, Insilicos, MS-Spectre, TOPPView (mzXML, mzML and mzData), Spectra Viewer, SeeMS, msInspect, jmzML, Mascot Distiller, Elsci Peaksel.
There is a viewer for ITA images. ITA and ITM images can be parsed with the pySPM Python library.
Converters
Known converters for mzData to mzXML:
:Hermes: A Java converter between mzData, mzXML and mzML in all directions: publicly available, runs with a graphical user interface, by the Institute of Molecular Systems Biology, ETH Zurich
:FileConverter: A command line tool that converts to/from various mass spectrometry formats, part of TOPP
Known converters for mzXML:
: The Institute for Systems Biology maintains a list of converters
Known converters for mzML:
:msConvert: A command line tool converting to/from various mass spectrometry formats. A GUI is also available for Windows users.
:ReAdW: The Institute for Systems Biology command line converter for Thermo RAW files, part of the Trans-Proteomic Pipeline. The latest update of this tool was made in September 2009. Users are now redirected by the TPP development team to use the msConvert software (see above).
:FileConverter: A command line tool that converts to/from various mass spectrometry formats, part of TOPP
Converters for proprietary formats:
:msConvert: A command line tool converting to/from various mass spectrometry formats including multiple proprietary formats. A GUI is also available for Windows users.
:CompassXport: Bruker's free tool generating mzXML (and now mzData) files for many of their native file formats (.baf).
:MASSTransit: Software for converting data between proprietary formats, by Palisade Corporation and distributed by Scientific Instrument Services, Inc and PerkinElmer. Purchased from Palisade by John Wiley and Sons in 2020 and incorporated into KnowItAll Spectroscopy software (list of file formats supported).
:Aston: native support for several Agilent Chemstation, Agilent Masshunter and Thermo Isodat file formats
:unfinnigan: native support for Finnigan (*.RAW) file formats
:OpenChrom: an open source software with support to convert various native file formats including its own open .ocb format to store chromatograms, peaks and identification results
Currently available converters are:
:: MassWolf, for Micromass MassLynx .Raw format
:: mzStar, for SCIEX/ABI SCIEX/ABI Analyst format
:: wiff2dta, for SCIEX/ABI SCIEX/ABI Analyst format to mzXML, DTA, MGF and PMF
See also
*Mass spectrometry software