Expanding the Coverage of the Metabolic Landscape in Cultivated Rice with Integrated Computational Approaches
bioRxiv preprint doi: https://doi.org/10.1101/2020.03.04.976266; this version posted March 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Xuetong Li1,4,#,a, Hongxia Zhou1,4,#,b, Ning Xiao2,c, Xueting Wu1,d, Yuanhong Shan1,e, Longxian Chen1,4,f, Cuiting Wang1,g, Zixuan Wang1,h, Jirong Huang3,*,i, Aihong Li2,*,j, and Xuan Li1,4,*,k

1 Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai 200032, China
2 Lixiahe Agricultural Research Institute of Jiangsu Province, Yangzhou 225007, China
3 Department of Biology, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
4 University of Chinese Academy of Sciences, Beijing 100039, China

# Equal contribution.
* Corresponding authors.
Email: [email protected] (Li X), [email protected] (Li A), [email protected] (Huang J)

Running title: Li X et al / Expanding the Coverage of the Metabolic Landscape
a ORCID: 0000-0003-0029-2296
b ORCID: 0000-0001-9206-2580
c ORCID: 0000-0001-6181-2684
d ORCID: 0000-0002-8644-124X
e ORCID: 0000-0002-2169-7308
f ORCID: 0000-0002-1209-1945
g ORCID: 0000-0002-8251-5774
h ORCID: 0000-0002-4198-7230
i ORCID: 0000-0002-4032-4566
j ORCID: 0000-0001-6161-9796
k ORCID: 0000-0002-7909-7241

Total word counts (from “Introduction” to “Conclusions” or “Materials and methods”): 4856
Total references: 71
Total figures: 5
Total tables: 0
Total supplementary figures: 10
Total supplementary tables: 13
Total supplementary files: 1
Total letters in the article title: 96
Total letters in running title: 51
Total word counts in abstract: 205

Abstract
Genome-scale metabolomics analysis is increasingly used for pathway and function discovery in the post-genomics era. The great potential of mass spectrometry (MS)-based technology has been hindered because only a small portion of detected metabolites has been identifiable so far. To address the critical issue of low identification coverage in metabolomics, we adopted a deep metabolomics analysis strategy that integrates advanced algorithms and expanded reference databases. Both experimental and in silico reference spectra were used to facilitate structural annotation. To further characterize the structures of metabolites, two approaches were incorporated into our strategy: structural motif search combined with neutral loss scanning, and a metabolite association network.
An untargeted metabolomics analysis was performed on 150 rice cultivars using an Ultra Performance Liquid Chromatography (UPLC)-Quadrupole (Q)-Orbitrap mass spectrometer. Of the 4491 metabolite features in the MS/MS spectral tag (MS2T) library, 1939 were annotated, extending annotation coverage in rice by an order of magnitude. Differential accumulation patterns of flavonoids between indica and japonica cultivars were revealed, especially for O-sulfated flavonoids. A series of closely related flavonolignans was characterized, adding further evidence for the crucial role of tricin-oligolignols in lignification. Our study provides a template for exploring phytochemical diversity in more plant species.

KEYWORDS: Untargeted metabolomics; MS/MS spectral tag; Structural characterization; Phytochemical diversity; Flavonoid derivatives

Introduction
It is estimated that green plants produce between 200,000 and 1,000,000 metabolites, underlying their broad chemical diversity and metabolic complexity [1]. Genome-scale metabolomics analysis has become a powerful tool for elucidating the functional genes and pathways behind diverse phytochemicals [2-5]. Recent progress in UPLC coupled with high-resolution MS allows metabolites to be detected with unparalleled sensitivity, resolution, accuracy, and throughput [6]. However, the power of advanced liquid-phase separation and mass spectrometry technology has been limited because the vast majority of metabolite features detected in plants remain unidentified [7, 8].
It is a major challenge to detect and identify the massive number of heterogeneous phytochemicals, which span a wide dynamic range of concentrations and differ in their chemical and physical properties and structures. The lag in identifying metabolites from plant sources can be attributed to various factors, e.g., the insufficient performance of early MS-based platforms, the structural complexity of diverse metabolites, the limited availability of reference mass spectra from standard compounds, and the low throughput of mass spectral data processing and structure elucidation [9-12]. Metabolomics data must be handled and resolved efficiently to bridge the gap between technological advances and the demands of plant metabolomics research. In recent years, progress has been made in improving metabolite annotation coverage by collecting reference mass spectra from more standard compounds [13-16] and by developing computer-assisted approaches to facilitate the structure elucidation of metabolites [17-20].

Rice (Oryza sativa L.) is one of the major staple foods worldwide, and exploring its chemical composition and metabolic traits is critical for enhancing grain quality and nutritional value [21, 22]. The two major subspecies of cultivated rice, indica and japonica, formed during domestication and display distinct features in morphology and physiology [23-25]. In recent years, a series of studies on rice metabolomics has been performed, providing a foundation for understanding the metabolic components of rice [2, 5, 26, 27]. However, many metabolite features in these studies remain unknown, and the metabolic diversity of rice requires further exploration.
Other studies focused on phytochemical genomics to dissect the underlying genetic basis of the biosynthesis and physiological function of metabolites during the evolution and adaptation of plants [28]. Metabolic quantitative trait loci (mQTL) mapping and metabolic genome-wide association studies (mGWAS) were used to reveal the genetic polymorphisms and candidate genes that affect metabolic traits in rice [2, 5, 27, 29].

Our current study was designed to address a key issue in plant metabolomics: the low identification coverage of metabolites. We sought to expand annotation coverage with computational approaches by adopting a deep metabolomics analysis strategy that combines experimental and in silico reference mass spectral libraries with advanced algorithms. Structural motif search combined with neutral loss scanning, and metabolite association network methods, were integrated into our strategy to facilitate the characterization of the structure and potential function of novel metabolites that have no reference in the above libraries. As a proof-of-concept study, using a state-of-the-art UPLC-Q-Orbitrap mass spectrometer platform, we performed an untargeted metabolomics analysis on a core collection of 150 indica and japonica cultivars grown in northeastern and southeastern China. An MS2T library for rice grains was constructed containing 4491 metabolite features, of which 1939 (about 43%) were annotated. The annotation coverage of the rice metabolome was significantly improved through our strategy.
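The idea behind neutral loss scanning can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes singly charged ions, so that the difference between precursor and fragment m/z equals the mass of the lost neutral. The function name, the tolerance, and the example m/z values are hypothetical and chosen only for illustration; the two loss masses are standard monoisotopic values for SO3 and an anhydrohexose unit.

```python
# Illustrative sketch of neutral loss scanning (assumed workflow, not the
# authors' code). Assumes singly charged ions.

# Monoisotopic masses (Da) of two common neutral losses.
NEUTRAL_LOSSES = {
    "SO3 (sulfate ester)": 79.95682,
    "C6H10O5 (hexose)": 162.05282,
}


def scan_neutral_losses(precursor_mz, fragment_mzs, tol_da=0.005):
    """Return (loss_name, fragment_mz) pairs whose precursor-fragment
    difference matches a known neutral loss within tol_da."""
    hits = []
    for frag in fragment_mzs:
        delta = precursor_mz - frag
        for name, mass in NEUTRAL_LOSSES.items():
            if abs(delta - mass) <= tol_da:
                hits.append((name, frag))
    return hits


if __name__ == "__main__":
    # Hypothetical spectrum: values computed for a sulfated flavone [M-H]-,
    # not measured data from the study.
    print(scan_neutral_losses(409.0235, [329.0667, 314.0432, 96.9601]))
```

A feature whose MS/MS spectrum shows the SO3 loss would be a candidate O-sulfated compound; confirming the structure would still require the motif search and reference spectra described in the text.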
Further, our analyses revealed systematic differences between the metabolomes of the indica and japonica subspecies and major differential accumulation patterns of flavonoid derivatives, especially O-sulfated flavonoids. A group of closely related flavonolignans was newly uncovered in rice, providing further evidence for the crucial role of tricin-oligolignols in the lignification of monocots. Our deep metabolomics analysis strategy expanded our understanding of phytochemical diversity and function in rice, which has profound implications for improving the quality and nutritional