Predicting in Silico Electron Ionization Mass Spectra Using Quantum Chemistry

UC Davis Previously Published Works (eScholarship.org, powered by the California Digital Library, University of California)
Permalink: https://escholarship.org/uc/item/5n80507v
Journal: Journal of Cheminformatics, 12(1); ISSN 1758-2946
Authors: Wang, Shunyang; Kind, Tobias; Tantillo, Dean J; et al.
Publication Date: 2020-10-20
DOI: 10.1186/s13321-020-00470-3
Peer reviewed

Wang et al. J Cheminform (2020) 12:63
https://doi.org/10.1186/s13321-020-00470-3
RESEARCH ARTICLE, Open Access

Shunyang Wang1,2, Tobias Kind1, Dean J. Tantillo2 and Oliver Fiehn1*

Abstract

Compound identification by mass spectrometry needs reference mass spectra. While there are over 102 million compounds in PubChem, fewer than 300,000 curated electron ionization (EI) mass spectra are available from NIST or MoNA mass spectral databases. Here, we test quantum chemistry methods (QCEIMS) to generate in silico EI mass spectra (MS) by combining molecular dynamics (MD) with statistical methods. To test the accuracy of predictions, in silico mass spectra of 451 small molecules were generated and compared to experimental spectra from the NIST 17 mass spectral library. The compounds covered 43 chemical classes, ranging up to 358 Da. Organic oxygen compounds had a lower matching accuracy, while computation time exponentially increased with molecular size. The parameter space was probed to increase prediction accuracy including initial temperatures, the number of MD trajectories and impact excess energy (IEE). Conformational flexibility was not correlated to the accuracy of predictions. Overall, QCEIMS can predict 70 eV electron ionization spectra of chemicals from first principles.
Improved methods to calculate potential energy surfaces (PES) are still needed before QCEIMS mass spectra of novel molecules can be generated at large scale.

Keywords: Quantum chemistry, Similarity score, Mass spectra, QCEIMS

Introduction

Mass spectrometry is the most important analytical technique to detect and analyze small molecules. Gas chromatography coupled to mass spectrometry (GC/MS) is frequently used for such molecules and has been standardized with electron ionization (EI) at 70 eV more than 50 years ago [1]. Yet, current mass spectral libraries are still insufficient in breadth and scope to identify all chemicals detected: there are only 306,622 EI-MS compound spectra in the NIST 17 mass spectral database [2], while PubChem has recorded 102 million known chemical compounds of which 14 million are commercially available. That means there is a large discrepancy between compounds and associated reference mass spectra [3]. For example, less than 30% of all detected peaks can be identified in GC–MS based metabolomics [4]. To solve this problem, the size and complexity of MS libraries must be increased. Several approaches have been developed to compute 70 eV mass spectra, including machine learning [5, 6], reaction rule-based methods [7] and a method based on physical principles, the recently developed quantum chemical software Quantum Chemical Electron Ionization Mass Spectrometry (QCEIMS) [8]. While empirical and machine learning methods depend on experimental mass spectral data for development, quantum chemical methods only consider physical laws. Thus, in principle, QCEIMS can compute spectra for any given compound structure. Yet, approximations and parameter estimations are needed to allow predictions in a timely manner, reducing the accuracy of QCEIMS predictions.
QCEIMS uses Born–Oppenheimer molecular dynamics (MD) to calculate fragment ions within picosecond reaction times with femtosecond intervals for the MD trajectories. A statistical sampling process is used to count the number of observed fragments and to derive the peak abundances for each observed ion [9] (Fig. 1).

It is unclear how reliable QCEIMS predictions are because the methods have not yet been tested on hundreds of compounds. MS matching accuracy is neither easily predictable nor quantifiable, because theoretical and experimental EI mass spectra have not been compared on a large scale. To test how structural constraints affect prediction accuracies, we utilized the QCEIMS method to predict spectra of 451 compounds with different molecular flexibility, sizes and chemical classes.

*Correspondence: 1 West Coast Metabolomics Center, UC Davis Genome Center, University of California, 451 Health Sciences Drive, Davis, CA 95616, USA. Full list of author information is available at the end of the article.

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Fig. 1 Workflow of QCEIMS. (1) generating conformers by equilibrium molecular dynamics; (2) ionizing each neutral starting structure by assigning impact excess energy (IEE) to kinetic energy; (3) generating EI fragments by parallel molecular dynamics; (4) assigning charges on each fragment using ionization potential (IP) energies and peak intensity counts, then assembling fragments to obtain summary spectra

Methods

Molecular structure preparation

We used ChemAxon’s [10] MarvinView and MarvinSketch (v18.23) to manipulate structures. First, small molecules were manually chosen from the NIST 17 mass spectral database. 3-D coordinates were generated using the Merck Molecular Force Field (MMFF94) [11] with Avogadro (v1.2.0) [12] in Molfiles (*.mol) format. We used OpenBabel (v2.3.90) [13] to convert structures to the TurboMole format (*.tmol) as required by the QCEIMS (v2.16) program. We used the QCEIMS plotms program to export JCAMP-DX mass spectra. External additional conformers were generated independently by conformational search packages, including GMMX from Gaussian [14], the conformer generator in ChemAxon’s MarvinSketch and by using RDKit [15] (v2019.03.1).

Parallel cluster calculation with QCEIMS

We utilized the QCEIMS program for in silico fragmentation with the following parameters: 70 eV ionization energy, 500 K initial temperature and 0.5 femtosecond (fs) time steps. For molecular dynamics, we used the semiempirical OM2 method [16] (Quantum-Chemical-Orthogonalization-Corrected Method) using the MNDO99 (v2013) [17] software. The impact excess energy (IEE) satisfied the Poisson type distribution. The Orca software (3.0.0) [18] was employed to calculate the vertical SCF ionization potential at the PBE0 [19]-D3 [20]/SV(p) [21] level.

We conducted QCEIMS calculations on cluster nodes equipped with two Intel Xeon E5-2699Av4 CPUs, 44 cores and 88 threads in total, operated at 2.40 GHz. Each node was equipped with 128 GByte RAM and a 240 GByte Intel DCS3500 datacenter grade SSD. In order to conduct and monitor the calculation process, we developed a SLURM job script to submit batch jobs. While the initial ground state molecular dynamics simulation is only single-threaded, all subsequent calculations were massively parallelized. Because QCEIMS executes multiple trajectory calculations at once, we oversubscribed the parallel number of CPU threads to be used to 66 (instead of 44) during QCEIMS production runs. Such a CPU oversubscription is possible, because molecular dynamics (OM2 with MNDO99) and density functional theory (DFT) calculations are executed in a heterogeneous way by different programs [8]. The speed advantage of using more threads than CPU cores available was confirmed with benchmarks.

Similarity score evaluation

QCEIMS generated several outputs and logging files, including the in silico mass spectrum in JCAMP exchange format (*.jdx), structures of fragments (*.xyz) and molecular dynamics trajectories (*.xyz). We then used experimental mass spectra from the NIST17 database as references to compare with our computational results. In GC–MS, mass spectral similarity scores (0 to 1000) describe how well experimental spectra match recorded library spectra [22, 23]. Here we used the same principle for QCEIMS-generated spectra as input. We used two different kinds of similarity scores (see Eqs. 1–3):

$$\mathrm{Cos} = \frac{\left(\sum_{m/z} I_U I_L\right)^2}{\sum I_L^2 \, \sum I_U^2} \tag{1}$$

$$\mathrm{Dot} = \frac{\left(\sum W_U W_L\right)^2}{\sum W_L^2 \, \sum W_U^2} \tag{2}$$

$$W = [\text{Peak intensity}]^m \, [\text{Mass}]^n \tag{3}$$

molecule, 3-cyclobutene-1,2-dione (Fig. 2). The observed fragment
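
As a rough illustration of Eqs. 1–3, the following Python sketch computes both scores for two centroided spectra stored as m/z-to-intensity dictionaries. It is not the scoring code used in the study. The function names, the dictionary representation, the integer m/z binning and the exponent values in the usage lines are assumptions made for illustration, and U and L are taken here to be the query spectrum and the library spectrum.

```python
from typing import Dict

# Assumed representation: integer m/z -> peak intensity (hypothetical choice).
Spectrum = Dict[int, float]


def cosine_score(u: Spectrum, l: Spectrum) -> float:
    """Eq. 1: (sum I_U * I_L)^2 / (sum I_L^2 * sum I_U^2).

    Peaks present in only one spectrum contribute nothing to the numerator
    but still count in the denominator.
    """
    shared = set(u) & set(l)
    numerator = sum(u[mz] * l[mz] for mz in shared) ** 2
    denominator = sum(v * v for v in l.values()) * sum(v * v for v in u.values())
    # Guard against empty or all-zero spectra.
    return numerator / denominator if denominator else 0.0


def weight(spectrum: Spectrum, m: float, n: float) -> Spectrum:
    """Eq. 3: W = [peak intensity]^m * [mass]^n, applied peak by peak."""
    return {mz: (inten ** m) * (mz ** n) for mz, inten in spectrum.items()}


def dot_score(u: Spectrum, l: Spectrum, m: float, n: float) -> float:
    """Eq. 2: same ratio as Eq. 1, evaluated on the weighted peaks W."""
    return cosine_score(weight(u, m, n), weight(l, m, n))


# Toy spectra, invented for this example only.
predicted = {10: 1.0, 20: 1.0}
reference = {10: 1.0}

print(cosine_score(predicted, reference))  # 0.5
# The exponents below are illustrative values, not taken from this section.
print(dot_score(predicted, reference, m=0.6, n=3.0))
```

Both functions return a value between 0 and 1. Multiplying by 1000 would map the result onto the 0 to 1000 range mentioned above, assuming that scaling convention applies.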
