Using Mass Spectrometry As a Reaction Detector
bioRxiv preprint doi: https://doi.org/10.1101/855148; this version posted June 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Reactomics: using mass spectrometry as a reaction detector

Miao Yu1, Lauren Petrick1,2*

1 Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, United States
2 Institute for Exposomic Research, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, United States
*Corresponding author: Email: [email protected] Phone: +1-212-241-7351. Fax: +1-646-537-9654.

Abstract

Untargeted metabolomics analysis captures chemical reactions among small molecules. Common mass spectrometry-based metabolomics workflows first identify the small molecules significantly associated with the outcome of interest, then begin exploring their biochemical relationships to understand biological fate (environmental studies) or biological impact (physiological response). We suggest an alternative by which biochemical relationships can be directly retrieved through untargeted high-resolution paired mass distance (PMD) analysis without a priori knowledge of the identities of participating compounds. Retrieval is done using high resolution mass spectrometry as a chemical reaction detector, where PMDs calculated from the mass spectrometry data are linked to biochemical reactions obtained via data mining of small molecule and reaction databases, i.e. ‘Reactomics’.
We demonstrate applications of reactomics including PMD network analysis, source appointment of unknown compounds, and biomarker reaction discovery as a complement to compound discovery analyses used in traditional untargeted workflows. An R implementation of reactomics analysis and the reaction/PMD databases are available as the pmd package (https://yufree.github.io/pmd/).

Introduction

Untargeted metabolomics or non-targeted analysis using high resolution mass spectrometry (HRMS) is one of the most popular analysis methods for unbiased measurement of organic compounds 1,2. A typical metabolomics sample analysis workflow follows a detection, annotation, MS/MS validation and/or standards validation process, after which the relationships between the annotated or identified compounds can be linked to, for example, biological pathways or disease development. However, difficulty annotating or identifying unknown compounds always limits the interpretation of findings 3. One practical solution is to match experimentally obtained fragment ions to a mass spectral database 4, but many compounds remain unreported or absent, which prevents annotation. Rule-based or data mining-based prediction of in silico fragment ions is successful in many applications 2,5, but these approaches are prone to overfitting the known compounds, leading to false positives. Ultimately, such workflows require final validation with commercially available or synthetically generated analytical standards, which may not be available, for unequivocal identification.

Potential molecular structures could be discerned using biochemical knowledge, through the integration of known relationships between biochemical reactions (e.g. pathway analysis) 3. Such methods are readily used to annotate compounds by chemical class. For example, the Referenced Kendrick Mass Defect (RKMD) was able to predict lipid class using specific mass distances for lipids and heteroatoms 6, and isotope patterns in combination with specific mass distances characteristic of halogenated compounds such as +Cl/-H, +Br/-H were used to screen halogenated chemical compounds in environmental samples 7. For these examples, known relationships among compounds were used to annotate unknown compounds, as a complementary approach to obtaining compound identifications.

The most common relationships among compounds are chemical reactions. Substrate-product pairs in a reaction form by exchanging functional groups or atoms. Almost all organic compounds originate from biochemical processes, such as carbon fixation 8,9. Like base pairing in DNA 10, organic compounds follow biochemical reaction rules, resulting in characteristic mass differences between the paired substrates and their products. We present a new concept, paired mass distance (PMD), that reflects such reaction rules by calculating the mass differences between two compounds or charged ions. PMD can be used to extract biological inference without identifying unknown compounds.

Exploiting mass differences for compound identification is not new.
Mass distances have been used to reveal isotopologue information when peaks show a PMD of 1 Da 11, adducts from a single compound 12 such as PMD 21.98 Da between adducts [M+Na]+ and [M+H]+, or adducts formed via complex in-source reactions 13 from mass spectrometry data. Such between-compound information has also been used to make annotations of unknown compounds 4,14, to classify compounds 15 or to perform pathway-independent metabolomic network analysis 16. However, these calculations of PMD were used to identify compounds or pathways to ultimately facilitate interpretation of the relationships between important compounds. Here, we propose that PMD can be used directly, skipping the annotation or identification of individual compounds, to aggregate information at the reaction level, an approach we call ‘Reactomics’.

HRMS can directly measure PMDs with the mass accuracy needed to provide reaction level specificity. Therefore, HRMS has the potential to be used as a reaction detector to enable reaction-level investigations. Here, we use multiple databases and experimental data to provide a proof-of-concept for using mass spectrometry in ‘reactomics’. We also discuss potential applications such as PMD network analysis, biomarker reaction discovery, and source appointment of unknown compounds. We envision that these applications will reveal measurable reaction level changes without the need to assign molecular structures to unknown compounds.

Results and Discussion

Definitions

We first define a reaction PMD ($\mathrm{PMD}_R$) using a theoretical framework. Then we demonstrate how a $\mathrm{PMD}_R$ can be calculated using KEGG reaction R00025 as an example (see Equation 1). There are three KEGG reaction classes (RC00126, RC02541, and RC02759) associated with this reaction, which is catalyzed by enzyme 1.13.12.16.

$$\text{Ethylnitronate} + \text{Oxygen} + \text{Reduced FMN} \rightleftharpoons \text{Acetaldehyde} + \text{Nitrite} + \text{FMN} + \text{Water} \tag{1}$$

In general, we write a chemical reaction as follows:

$$S_1 + S_2 + \dots + S_n \rightleftharpoons P_1 + P_2 + \dots + P_m \quad (n \ge 1,\ m \ge 1)$$

where S denotes substrates, P denotes products, and n and m are the numbers of substrates and products, respectively. A PMD matrix [M1] for this reaction is generated:

$$
\begin{array}{c|cccc}
 & S_1 & S_2 & \cdots & S_n \\ \hline
P_1 & |S_1 - P_1| & |S_2 - P_1| & \cdots & |S_n - P_1| \\
P_2 & |S_1 - P_2| & |S_2 - P_2| & \cdots & |S_n - P_2| \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
P_m & |S_1 - P_m| & |S_2 - P_m| & \cdots & |S_n - P_m|
\end{array}
\tag{M1}
$$

For each substrate $S_k$ and each product $P_i$, we calculate a PMD, $|S_k - P_i|$. Assuming that the substrate-product pair with the minimum PMD shares a similar structure or molecular framework, we select the minimum PMD for each substrate as the substrate PMD ($\mathrm{PMD}_{S_k}$) of the reaction (Eq. 2).

$$\mathrm{PMD}_{S_k} = \min\left(|S_k - P_1|, |S_k - P_2|, \dots, |S_k - P_m|\right) \quad (1 \le k \le n) \tag{2}$$

Then, the $\mathrm{PMD}_R$, or overall reaction PMD, is defined as the set of the substrates' PMDs (Eq. 3):

$$\mathrm{PMD}_R = \{\mathrm{PMD}_{S_1}, \mathrm{PMD}_{S_2}, \dots, \mathrm{PMD}_{S_n}\} \tag{3}$$

For KEGG reaction R00025, $S_1$ is Ethylnitronate, $S_2$ is Oxygen, $S_3$ is Reduced FMN, $P_1$ is Acetaldehyde, $P_2$ is Nitrite, $P_3$ is FMN, $P_4$ is Water, n = 3, and m = 4. The PMD matrix [M2] for this reaction is shown below, from which we obtain $\mathrm{PMD}_{\text{Ethylnitronate}}$ = 27.023 Da, $\mathrm{PMD}_{\text{Oxygen}}$ = 12.036 Da and $\mathrm{PMD}_{\text{Reduced FMN}}$ = 2.016 Da.

$$
\begin{array}{l|ccc}
 & \text{Ethylnitronate} & \text{Oxygen} & \text{Reduced FMN} \\ \hline
\text{Acetaldehyde} & 29.998\ \text{Da} & 12.036\ \text{Da} & 414.094\ \text{Da} \\
\text{Nitrite} & 27.023\ \text{Da} & 15.011\ \text{Da} & 411.120\ \text{Da} \\
\text{FMN} & 382.080\ \text{Da} & 424.115\ \text{Da} & 2.016\ \text{Da} \\
\text{H}_2\text{O} & 56.014\ \text{Da} & 13.979\ \text{Da} & 440.110\ \text{Da}
\end{array}
\tag{M2}
$$

In our example, $\mathrm{PMD}_R$ contains three values, one from each substrate PMD: 27.023 Da, which corresponds to a difference of two carbon atoms and three hydrogen atoms; 12.036 Da, for the addition of two carbon atoms and four hydrogen atoms and the loss of one oxygen atom; and 2.016 Da, for the addition of two hydrogen atoms. In other reactions, however, multiple $\mathrm{PMD}_S$ may generate the same $\mathrm{PMD}_R$ value, such as in certain combination reactions or replacement reactions.
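
The definitions above can be illustrated with a minimal Python sketch that rebuilds the PMD matrix for KEGG reaction R00025 and takes the per-substrate minimum (Eq. 2) to obtain the reaction PMD set (Eq. 3). This is not the pmd package's implementation. The elemental compositions, the function names (`monoisotopic_mass`, `pmd_matrix`, `substrate_pmds`) and the use of neutral monoisotopic masses are assumptions made for illustration; the compositions were chosen because they appear to reproduce the values in matrix [M2].

```python
# Monoisotopic atomic masses in Da (standard values, rounded to 7 decimals).
ATOMIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}

# Assumed elemental compositions (not stated in the text).
SUBSTRATES = {
    "Ethylnitronate": {"C": 2, "H": 4, "N": 1, "O": 2},
    "Oxygen": {"O": 2},
    "Reduced FMN": {"C": 17, "H": 23, "N": 4, "O": 9, "P": 1},
}
PRODUCTS = {
    "Acetaldehyde": {"C": 2, "H": 4, "O": 1},
    "Nitrite": {"H": 1, "N": 1, "O": 2},
    "FMN": {"C": 17, "H": 21, "N": 4, "O": 9, "P": 1},
    "Water": {"H": 2, "O": 1},
}


def monoisotopic_mass(formula):
    """Sum atomic masses weighted by atom counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def pmd_matrix(substrates, products):
    """Absolute mass differences; rows are products, columns are substrates."""
    return {
        p_name: {
            s_name: abs(monoisotopic_mass(s_formula) - monoisotopic_mass(p_formula))
            for s_name, s_formula in substrates.items()
        }
        for p_name, p_formula in products.items()
    }


def substrate_pmds(matrix):
    """Eq. 2: for each substrate, the minimum PMD over all products."""
    substrate_names = next(iter(matrix.values())).keys()
    return {s: min(row[s] for row in matrix.values()) for s in substrate_names}


matrix = pmd_matrix(SUBSTRATES, PRODUCTS)
pmd_s = substrate_pmds(matrix)
# Eq. 3: the reaction PMD is the set of substrate PMDs (rounded for display).
pmd_r = {round(value, 3) for value in pmd_s.values()}

for name, value in pmd_s.items():
    print(f"{name}: {value:.3f} Da")
```

Under these assumed compositions, the printed minima agree with the three substrate PMDs given in the text (27.023, 12.036 and 2.016 Da). Storing the reaction PMD as a set mirrors Eq. 3, so substrates that share the same PMD value collapse into a single entry.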