Priya et al. Plant Methods (2018) 14:4
https://doi.org/10.1186/s13007-017-0269-0

RESEARCH Open Access

Terzyme: a tool for identification and analysis of the plant terpenome

Piyush Priya1, Archana Yadav1, Jyoti Chand1 and Gitanjali Yadav1,2*

Abstract

Background: Terpenoid hydrocarbons represent the largest and most ancient group of phytochemicals, such that the entire chemical library of a plant is often referred to as its ‘terpenome’. Besides having numerous pharmacological properties, terpenes contribute to the scent of the rose, the flavors of cinnamon and the yellow of sunflowers. Rapidly increasing -omics datasets provide an unprecedented opportunity for terpenome detection, paving the way for automated web resources dedicated to phytochemical predictions in genomic data.

Results: We have developed Terzyme, a predictive algorithm for identification, classification and assignment of broad substrate unit to terpene synthase (TPS) and prenyl transferase (PT) enzymes, known to generate the enormous structural and functional diversity of terpenoid compounds across the plant kingdom. Terzyme uses sequence information, plant taxonomy and machine learning methods for predicting TPSs and PTs in genome and proteome datasets. We demonstrate a significant enrichment of the currently identified terpenome by running Terzyme on more than 40 plants.

Conclusions: Terzyme is the result of a rigorous analysis of evolutionary relationships between hundreds of characterized sequences of TPSs and PTs with known specificities, followed by analysis of genome-wide gene distribution patterns, ontology based clustering and optimization of various parameters for building accurate profile Hidden Markov Models. The predictive webserver and database is freely available at http://nipgr.res.in/terzyme.html and would serve as a useful tool for deciphering the species-specific phytochemical potential of plant genomes.

Keywords: Terpenome, Terpene synthase (TPS), Prenyl transferase (PT), Hidden Markov Models (HMM), GO clustering, Pathway mapping, Phytochemicals

*Correspondence: [email protected]; [email protected]
1 Computational Biology Laboratory, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi 110067, India
Full list of author information is available at the end of the article

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Modern plants have adapted to the sessile nature of life on land by evolving mechanisms for chemical communication and defence, mediated via low molecular weight compounds, often with complex structures, which have the ability to function in diverse physiological, developmental and evolutionary processes [1]. These phytochemicals, grouped together as plant secondary metabolites, have diversified in both structure and function via gene duplications followed by sub-functionalisation and positive selection for metabolite expansion, such that each species has its unique arsenal of secondary metabolites, many of which are of great significance to humans [2, 3].

Isoprenoids or ‘terpenoids’ represent the largest, most ancient group of phytochemicals, and the entire chemical library of a plant is often referred to as the ‘terpenome’ [4]. Well-known terpenoids include citral, menthol, camphor, cannabinoids and the curcuminoids found in turmeric and mustard seeds. Biosynthesis of terpenes requires the condensation of universal precursor C5 isoprene units to form C15 or C20 prenyl diphosphates (PDPs), catalyzed by short chain prenyl transferase (PT) enzymes, followed by multi-step cyclization reactions catalysed by a huge family of unique enzymes called the terpene synthases (TPSs) [5, 6]. TPSs catalyze one of the most complex reactions known to chemistry and biology, wherein hundreds of regio- and stereo-specific products can be made from a single substrate by binding and steering polyisoprene substrates through a precise, multistep cyclization cascade that is initiated by the propagation of a highly reactive carbocation [7, 8]. Both PTs and TPSs have a distinct ‘terpene fold’ composed largely of inert amino acids (aa) lining a central active site [9]. They can exhibit very high specificity in product formation with remarkable stereochemical precision [10], as well as huge chemical promiscuity [11]. Molecular investigations of TPSs are an active area of research from the perspective of metabolic engineering. TPSs have been identified and characterized in model plant species of commercial and agronomic value such as Arabidopsis thaliana [12], Citrus [13], Vitis vinifera [14] and Solanum lycopersicum [15], as well as in various gymnosperms [16–18].

The TPS gene family in plants reveals functional diversification, with members showing clear divergence in different lineages despite similar sequences and structures [19]. This makes it quite challenging to assign substrate specificity to a newly annotated TPS sequence [20]. TPSs have been classified according to two major classification schemes: one is based on their functional roles and product formation, whereas the other is based entirely on sequence homology. As per the former scheme, TPSs are divided into three subclasses, namely, monoterpene synthases (Mono-TPSs), sesquiterpene synthases (Sesqui-TPSs), and diterpene synthases (Di-TPSs), depending upon the number of isoprene units condensed by the enzyme, which may be two, three or four, respectively. In terms of protein length, Di-TPSs are the longest (> 850 aa) as compared to monoterpene synthases (ranging from 600 to 650 aa) and sesquiterpene synthases (between 550 and 580 aa long), and this difference arises from an interspersed sequence element in Di-TPSs, conserved both in location and amino acid composition [21]. According to the second TPS classification scheme, seven families are recognized currently, from TPSa to TPSg, with the original clades of TPSe and TPSf merged into a single TPS-e/f subfamily [19, 20, 22]. Of these, the TPSc clade is proposed to be the most ancient, and contains mono- and bifunctional copalyl diphosphate synthase (CPS) proteins from gymnosperms as well as angiosperms. The TPSd clade is specific to gymnosperms while the TPSa, b and g subfamilies are angiosperm-specific. TPS-e/f combines the sister subclade-e (representing kaurene synthase B) and its derivative subclade-f that contains linalool synthases, hypothesized to be dicot-specific [20, 22]. From a physiological viewpoint, TPSe and TPSc subfamily members including the (−)-CPS

In this work, we present a comprehensive attempt to identify and classify the PT and TPS gene families in 42 plant species for which nuclear genome sequence data is available in the public domain, leading to the development of Terzyme, an interactive online webserver and database for predictive identification and analysis of the plant terpenome. We also present a detailed computational analysis that was undertaken to assess TPS gene distribution patterns, domain organization and potential functional roles, in order to understand the evolution of novel biochemical functions in different lineages, and to unravel the complexity of the plant terpenome. Assessment of genome-wide distribution patterns, as well as clustering among the genes of the identified TPSs, is important because plants are well known for the occurrence of both genic and chromosomal duplications that have resulted in the widespread existence of gene families in this kingdom, apart from being associated with subsequent evolutionary divergence via sub-functionalization or neo-functionalization [23]. We hope that Terzyme will provide insights into lineage-specific expansion in the PT and TPS families in various plant species, together with their functional roles, and will help in understanding the evolution of terpene biosynthetic machinery in plants.

Results

Annotated terpenome data

The curated terpenome data was compiled as described in methods, involving retrieval of sequences from the NCBI Protein Database via keyword-specific search for prenyl transferases (PTs), all TPS functional classes (mono-, di- and sesqui-TPSs) and the sequence homology based gene-family classes, namely TPSa to TPSg [19]. The function-based (FB) dataset consisted of 401 representative sequences, including 154 monoterpene synthases, 71 diterpene synthases and 176 sesquiterpene synthases, as shown in Table 1. These sequences represent diverse taxonomic classes of green plants, including land plants, which further include seed plants, with the exception of chlorophytes. For prenyl transferases, a total of 301 PT sequences were compiled as shown in the last column of Table 1, and this data was called the PT dataset. Mosses have not been reported to have any prenyl transferases at all. Additional file 1: Table 1 provides a detailed list of accession IDs for each sequence used in the FB and PT datasets. The gene-family based GB dataset was also compiled as described in