<<

correspondence METLIN MS2 molecular standards database: a broad chemical and biological resource

To the Editor — Tandem mass spectrometry the METLIN MS2 database of molecular developed to aid in the identification of (MS2) data provide high-confidence standards offers an opportunity to quantify unknowns and the discovery of novel molecular identification of known this improved confidence (Fig. 1). METLIN molecules4 and operates by using fragment and preliminary characterization of novel, now hosts over 850,000 molecular standards ion data to help align an unknown unknown molecules (unknowns). However, with MS2 data generated in both positive to compounds with similar fragmentation for databases to be an effective resource, and negative ionization modes at multiple data within a database to help further broad chemical space coverage is necessary. collision energies, collectively containing identify and characterize it4. METLIN’s Consequently, we have created METLIN over 4,000,000 curated high-resolution fragment ion similarity searching and (http://metlin.scripps.edu) a highly annotated tandem mass spectra. Thus, the size of neutral loss similarity searching are applied and structurally diverse database of over METLIN makes molecular annotation and in identifying endogenous metabolites, 850,000 molecular standards. METLIN’s identification more feasible. In comparison, drugs and drug metabolites, as well as tandem mass spectral library numerically the National Institute of Standards and biotransformation products of xenobiotics5. covers almost 1% of PubChem’s 93 million Technology (NIST) MS2 database, the METLIN facilitates both endogenous and compounds, essentially a number that can be next largest molecular standards database, exogenous compound identification. For characterized as the known chemical space. contains 15,000 standards (Fig. 1a). example, it contains MS2 data for over The utility of MS2 data acquisition is The combination of METLIN’s 60,000 analogs and over 6,000 especially advantageous when coupled with molecular standards and systematically analogs (Fig. 1b), among others. liquid chromatography–mass spectrometry acquired experimental data allows for the The expanded METLIN database will (LC–MS) analysis of complex samples1,2. examination of the impact that MS2 data has enable new types of analyses. First, we Such datasets typically have tens of on the identification of known molecules. expect that a MS2 database of this size can thousands of features3, and while accurate For example, when METLIN is searched substantially reduce the magnitude of false single-stage MS measurements of precursor against precursor m/z values at varying parts positives that molecular identification based molecular ions can provide putative per million (ppm) errors, the number of solely on molecular ion values can generate. identifications, these measurements alone hits typically ranges from tens to hundreds Second, while very high accuracy MS2 data are not sufficient to structurally characterize of compounds. However, with the addition are useful, they do not substantially enhance the number of compounds having identical of MS2 data, the false positive rate can identification confidence. Therefore, or similar molecular weights. Therefore, be minimized to only a few compounds. low-resolution instrumentation can be more characterizing every feature in an LC–MS Beyond providing molecular identification broadly used for relatively sophisticated dataset is challenging and currently not through its multiple search capabilities (for experiments by chemists and biologists who possible. However, implementation of MS2 example, MS2, batch, name and elemental do not have access to high-end equipment6. provides structural information that greatly composition), METLIN’s expansion And finally, METLIN can be applied for increases the confidence of molecular will facilitate similarity searching. The identification of unknown compounds via identifications2. The recent expansion of similarity searching algorithm was originally fragment ion and neutral loss similarity

a Tandem MS databases b Database endogenous and exogenous chemical substructures

900K 850K METLIN mzCloud

METLIN spectra 50K 2 HMDB MoNA 25K Pyrimidine

Pyrazole GNPS data 2 Purine 400K 2K with MS 50K

No. of molecular standards 1K NIST MoNA mzCloud GNPS HMDB

15K 10K 9K 6K 2K 0 No. of compounds with experimental MS 0 Database Substructure frequency within databases

Fig. 1 | The METLIN MS2 database. a, Comparison of databases containing MS2 data from molecular standards9,10. b, MS2 database comparison of commonly observed endogenous substructures and drug scaffolds.

Nature Methods | VOL 17 | October 2020 | 953–954 | www.nature.com/naturemethods 953 correspondence

searching. For example, synthetic chemists Jingchuan Xue1,4, Carlos Guijas 1,4, 8. Clayton, T. A., Baker, D., Lindon, J. C., Everett, J. R. & Nicholson, J. K. Proc. Natl Acad. Sci. USA 106, 14728–14733 (2009). 1 2 can apply METLIN to the structural H. Paul Benton , Benedikt Warth and 9. Wang, M. et al. Nat. Biotechnol. 38, 23–26 (2020). elucidation of unexpected products, while Gary Siuzdak 1,3 ✉ 10. Wishart, D. S. et al. Nucleic Acids Res. 46, D608–D617 biochemists can use it to identify the 1Scripps Center for Metabolomics and Mass (2018). D1. plethora of bacterial and human metabolites Spectrometry, Scripps Research Institute, La Jolla, Acknowledgements 2 in microbiome and exposome studies, and CA, USA. Department of Food and This research was partially funded by US National it has unexplored potential in the chemical, Toxicology, Faculty of Chemistry, University of Institutes of Health grants R35 GM130385 (G.S.), P30 7,8 toxicological and pharmaceutical sciences . Vienna, Vienna, Austria. 3Department of Molecular MH062261 (G.S.), P01 DA026146 (G.S.) and U01 CA235493 (G.S.) and by Ecosystems and Networks and Computational Biology, Scripps Research Integrated with Genes and Molecular Assemblies 4 Data availability Institute, La Jolla, CA, USA. Tese authors (ENIGMA), a Scientific Focus Area Program at Lawrence The data in the METLIN database are contributed equally: Jingchuan Xue, Carlos Guijas. Berkeley National Laboratory for the US Department available at http://metlin.scripps.edu. The ✉e-mail: [email protected] of Energy, Office of Science, Office of Biological and data in other databases mentioned in this Environmental Research, under contract number study were obtained from their websites Published online: 24 August 2020 DE-AC02-05CH11231 (G.S.). (accessed in February 2020) or published https://doi.org/10.1038/s41592-020-0942-5 Author contributions papers: MoNA (https://mona.fiehnlab. J.X., C.G., H.P.B., B.W. and G.S. contributed to data ucdavis.edu/), mzCloud (https://www. References collection, analysis, and manuscript writing. J.X. and C.G. mzcloud.org/), GNPS (https://gnps.ucsd. 1. Guijas, C. et al. Anal. Chem. 90, 3156–3164 (2018). share first authorship. 2. Tautenhahn, R. et al. Nat. Biotechnol. 30, edu/)9, HMDB (https://hmdb.ca/)10 and 826–828 (2012). Competing interests NIST 17 (https://chemdata.nist.gov/). ❐ 3. Kafader, J. O. et al. Nat. Methods 17, 391–394 (2020). 4. Benton, H. P., Wong, D. M., Trauger, S. A. & Siuzdak, G. The authors declare no competing interests. Anal. Chem. 80, 6382–6389 (2008). 5. Flasch, M. et al. ACS Chem. Biol. 15, 970–981 (2020). Additional information Editorial note: This article has been peer 6. Xue, J. et al. Anal. Chem. 92, 6051–6059 (2020). Supplementary information is available for this paper at reviewed. 7. Quinn, R. A. et al. Nature 579, 123–129 (2020). https://doi.org/10.1038/s41592-020-0942-5.

954 Nature Methods | VOL 17 | October 2020 | 953–954 | www.nature.com/naturemethods