Mass spectrometry data format
Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas or liquid chromatography and is widely used in analytical chemistry and biochemistry, where it can identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires computers for data storage and processing. Over the years, manufacturers of mass spectrometers have developed various proprietary data formats for handling such data, which makes it difficult for academic scientists to manipulate their data directly. To address this limitation, several open, XML-based data formats have been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.
Open formats
JCAMP-DX
This format was one of the earliest attempts to supply a standardized file format for data exchange in mass spectrometry. JCAMP-DX was initially developed for infrared spectrometry. JCAMP-DX is an ASCII-based format and therefore not very compact, even though it includes standards for file compression. JCAMP was officially released in 1988. Together with the American Society for Mass Spectrometry, a JCAMP-DX format for mass spectrometry was developed with the aim of preserving legacy data.
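Because JCAMP-DX is plain ASCII, its header can be read with ordinary text handling. JCAMP-DX files consist of labelled data records of the form `##LABEL=value`. The following is a minimal sketch of collecting those records, assuming a small hand-written sample; the `parse_jcamp_header` helper is hypothetical, and the sketch ignores the data table, multi-line records and the compression schemes mentioned above.

```python
def parse_jcamp_header(text):
    """Collect ##LABEL=value records from JCAMP-DX style text (labels only)."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("##"):
            continue  # skip data rows and blank lines
        label, sep, value = line[2:].partition("=")
        if not sep:
            continue  # not a labelled record
        # Simplification: normalise only case and spaces in the label.
        key = label.strip().upper().replace(" ", "")
        records[key] = value.strip()
    return records


# Hand-written sample for illustration only, not taken from a real instrument.
SAMPLE = """##TITLE=Example spectrum
##JCAMP-DX=5.00
##DATA TYPE=MASS SPECTRUM
##NPOINTS=3
##PEAK TABLE=(XY..XY)
50,10
51,100
52,35
##END=
"""

header = parse_jcamp_header(SAMPLE)
print(header["TITLE"], header["NPOINTS"])
```

The values are kept as strings; a real reader would convert numeric fields and then parse the peak table according to the declared layout.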
ANDI-MS or netCDF
The Analytical Data Interchange Format for Mass Spectrometry (ANDI-MS) is a format for exchanging data. Many mass spectrometry software packages can read or write ANDI files. ANDI is specified in the ASTM E1947 Standard. ANDI is based on netCDF, a software library for writing and reading data files. ANDI was initially developed for chromatography-MS data and was therefore not used in the proteomics gold rush, during which new XML-based formats were developed.
AnIML
AnIML is a joint effort of IUPAC and ASTM International to create an XML-based standard that covers a wide variety of analytical techniques, including mass spectrometry.
mzData
mzData was the first attempt by the Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) to create a standardized format for mass spectrometry data. This format is now deprecated and has been replaced by mzML.
mzXML
mzXML is an XML (eXtensible Markup Language) based common file format for proteomics mass spectrometric data. This format was developed at the Seattle Proteome Center/Institute for Systems Biology while the HUPO-PSI was trying to specify the standardized mzData format, and is still in use in the proteomics community.
YAFMS
Yet Another Format for Mass Spectrometry (YAFMS) is a proposal to save data in a four-table, serverless relational database schema, with data extraction and appending performed using SQL queries.
mzML
Because having two formats (mzData and mzXML) for representing the same information is undesirable, a joint effort was set up by HUPO-PSI, the SPC/ISB and instrument vendors to create a unified standard that borrows the best aspects of both mzData and mzXML and is intended to replace them. Originally called dataXML, it was officially announced as mzML. The first specification was published in June 2008. The format was officially released at the 2008 American Society for Mass Spectrometry Meeting and has since been relatively stable, with very few updates. On 1 June 2009, mzML 1.1.0 was released. As of 2013, no further changes were planned.
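In mzML, the m/z and intensity arrays of a spectrum are stored inside the XML as base64-encoded binary, optionally zlib-compressed. The sketch below round-trips one such array using only the Python standard library. It assumes 64-bit little-endian floats and a bare `binaryDataArray` element without the controlled-vocabulary parameters that a real file uses to declare precision and compression; `encode_array` and `decode_array` are hypothetical helper names, not part of any mzML library.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib


def encode_array(values, compress=True):
    """Pack floats as little-endian doubles, optionally zlib-compress, then base64."""
    raw = struct.pack("<%dd" % len(values), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, compressed=True):
    """Reverse encode_array: base64-decode, optionally decompress, unpack doubles."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


mz = [100.0, 200.5, 300.25]

# Simplified stand-in for one array of an mzML spectrum (assumption: no cvParams).
xml_text = "<binaryDataArray><binary>%s</binary></binaryDataArray>" % encode_array(mz)

element = ET.fromstring(xml_text)
decoded = decode_array(element.findtext("binary"))
print(decoded)
```

Embedding the numbers as encoded binary keeps the file a single XML document, at the cost of the decoding step shown here on every read.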
mzAPI
Instead of defining new file formats and writing converters for proprietary vendor formats, a group of scientists proposed defining a common application programming interface, which shifts the burden of standards compliance to the instrument manufacturers' existing data access libraries.
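The idea of a shared interface can be illustrated with an abstract base class that each vendor library would implement. This is only a sketch of the general pattern: the class and method names (`SpectrumSource`, `scan_count`, `get_scan`) are hypothetical and are not taken from the mzAPI proposal, and the in-memory class stands in for a vendor's data access library.

```python
from abc import ABC, abstractmethod


class SpectrumSource(ABC):
    """Hypothetical common interface; analysis code depends only on this."""

    @abstractmethod
    def scan_count(self):
        """Return the number of scans available."""

    @abstractmethod
    def get_scan(self, index):
        """Return one scan as a list of (m/z, intensity) pairs."""


class InMemoryVendorFile(SpectrumSource):
    """Stand-in for a vendor library wrapped behind the common interface."""

    def __init__(self, scans):
        self._scans = list(scans)

    def scan_count(self):
        return len(self._scans)

    def get_scan(self, index):
        return self._scans[index]


def base_peak(source, index):
    """Return the (m/z, intensity) pair with the highest intensity in one scan."""
    return max(source.get_scan(index), key=lambda peak: peak[1])


source = InMemoryVendorFile([
    [(100.0, 5.0), (150.0, 20.0)],
    [(200.0, 1.0)],
])
print(source.scan_count(), base_peak(source, 0))
```

Analysis functions such as `base_peak` never touch a file format directly, so supporting another instrument means adding another implementation of the interface rather than another converter.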
mz5
The mz5 format addresses the performance problems of the previous XML-based formats. It uses the mzML ontology, but saves the data using the HDF5 backend for reduced storage space requirements and improved read/write speed.
imzML
The imzML standard was proposed to exchange data from mass spectrometry imaging in a standardized XML file based on the mzML ontology. It stores the experimental details in an XML file and the spectral data in a separate binary file. Both files are linked by a universally unique identifier.
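The split into a descriptive XML part and a binary part tied together by a shared identifier can be sketched as follows. The element and attribute names (`dataset`, `spectrum`, `offset`, `length`) are simplified stand-ins rather than the real imzML schema, and the layout, with the identifier written at the start of the binary data and 32-bit floats after it, is an assumption made for this illustration.

```python
import struct
import uuid
import xml.etree.ElementTree as ET


def write_pair(intensities):
    """Return (xml_text, binary_blob) that share one identifier."""
    file_id = uuid.uuid4()
    blob = file_id.bytes  # 16-byte identifier at the start of the binary part
    offset = len(blob)
    blob += struct.pack("<%df" % len(intensities), *intensities)

    root = ET.Element("dataset", uuid=file_id.hex)
    ET.SubElement(root, "spectrum", offset=str(offset), length=str(len(intensities)))
    return ET.tostring(root, encoding="unicode"), blob


def read_pair(xml_text, blob):
    """Check that both parts carry the same identifier, then read the spectrum."""
    root = ET.fromstring(xml_text)
    if uuid.UUID(root.get("uuid")).bytes != blob[:16]:
        raise ValueError("XML and binary parts do not belong together")
    spectrum = root.find("spectrum")
    offset = int(spectrum.get("offset"))
    length = int(spectrum.get("length"))
    return list(struct.unpack_from("<%df" % length, blob, offset))


xml_text, blob = write_pair([1.0, 2.5, 4.0])
print(read_pair(xml_text, blob))
```

Checking the identifier before reading guards against pairing an XML file with the binary file of a different acquisition, which offsets alone would not detect.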
mzDB
mzDB saves data in an SQLite database to save on storage space and improve access times as the data points can be queried from a relational database.
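Querying spectra from a relational store can be sketched with Python's built-in `sqlite3` module. The single-table schema below (`spectrum` with `rt_seconds`, `ms_level` and a `peaks` blob) is a hypothetical simplification and not the actual mzDB schema; the helper names are likewise assumptions made for illustration.

```python
import sqlite3
import struct


def pack_peaks(peaks):
    """Serialise (m/z, intensity) pairs as interleaved little-endian doubles."""
    flat = [value for peak in peaks for value in peak]
    return struct.pack("<%dd" % len(flat), *flat)


def unpack_peaks(blob):
    """Reverse pack_peaks into a list of (m/z, intensity) pairs."""
    flat = struct.unpack("<%dd" % (len(blob) // 8), blob)
    return list(zip(flat[0::2], flat[1::2]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spectrum ("
    "id INTEGER PRIMARY KEY, rt_seconds REAL, ms_level INTEGER, peaks BLOB)"
)
conn.executemany(
    "INSERT INTO spectrum VALUES (?, ?, ?, ?)",
    [
        (1, 10.0, 1, pack_peaks([(100.0, 5.0)])),
        (2, 20.0, 1, pack_peaks([(100.0, 7.0), (150.0, 2.0)])),
        (3, 30.0, 2, pack_peaks([(75.0, 1.0)])),
        (4, 40.0, 1, pack_peaks([(100.0, 3.0)])),
    ],
)


def spectra_in_window(connection, rt_min, rt_max):
    """Return (id, peaks) for spectra whose retention time lies in the window."""
    cursor = connection.execute(
        "SELECT id, peaks FROM spectrum "
        "WHERE rt_seconds BETWEEN ? AND ? ORDER BY rt_seconds",
        (rt_min, rt_max),
    )
    return [(spectrum_id, unpack_peaks(blob)) for spectrum_id, blob in cursor]


for spectrum_id, peaks in spectra_in_window(conn, 15.0, 35.0):
    print(spectrum_id, peaks)
```

Only the rows matching the retention-time window are read and decoded, which is the kind of selective access that a sequential XML file makes harder.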
Toffee
Toffee is an open lossless file format for data-independent acquisition mass spectrometry. It leverages HDF5 and aims to achieve file sizes similar to those from the proprietary and closed vendor formats.
mzMLb
mzMLb is another take on using an HDF5 backend for performant raw data storage. Unlike mz5, however, it preserves the mzML XML data structure and stays compliant with the existing standard.
Allotrope
The Allotrope Foundation curates an HDF5 and triplestore-based file format named Allotrope Data Format (ADF) and a flat JSON representation named Allotrope Simple Model (ASM). Both are based on the Allotrope Foundation Ontologies (AFO) and contain schemas for mass spectrometry and chromatography coupled with MS detectors.
Proprietary formats
Below is a table of different file format extensions.
Company | Extension | File type
Notes:
The RAW formats of each vendor are not interchangeable; software from one cannot handle the RAW files from another.
Micromass was acquired by Waters in 1997.
Finnigan is a division of Thermo.
Software
Viewers
There are several viewers for mzXML, mzML and mzData. These viewers are of two types: free open-source software (FOSS) or proprietary. The FOSS viewers include MZmine, mineXpert2 (mzXML, mzML, native timsTOF, xy, MGF, BafAscii), MS-Spectre, TOPPView (mzXML, mzML and mzData), Spectra Viewer, SeeMS, msInspect and jmzML. The proprietary viewers include PEAKS, Insilicos, Mascot Distiller and Elsci Peaksel. There is a viewer for ITA images. ITA and ITM images can be parsed with the pySPM Python library.
Converters
Converters are available for mzData to mzXML, for mzXML, for mzML, and for proprietary formats.
This article is derived from Wikipedia and licensed under CC BY-SA 4.0.