Genome Sequencing of Turmeric Provides Evolutionary Insights Into Its Medicinal Properties
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Title: Genome sequencing of turmeric provides evolutionary insights into its medicinal properties

Authors: Abhisek Chakraborty, Shruti Mahajan, Shubham K. Jaiswal, Vineet K. Sharma*

Affiliation:
MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal

*Corresponding Author email:
Vineet K. Sharma - [email protected]

E-mail addresses of authors:
Abhisek Chakraborty - [email protected], Shruti Mahajan - [email protected], Shubham K. Jaiswal - [email protected], Vineet K. Sharma - [email protected]

ABSTRACT
Curcuma longa, or turmeric, is traditionally known for its immense medicinal properties and has diverse therapeutic applications. However, the absence of a reference genome sequence is a limiting factor in understanding the genomic basis of the origin of its medicinal properties. In this study, we present the draft genome sequence of Curcuma longa, the first species sequenced from the Zingiberaceae plant family, constructed using 10x Genomics linked reads. For comprehensive gene set prediction and for insights into its gene expression, transcriptome sequencing of leaf tissue was also performed. The draft genome assembly had a size of 1.24 Gbp with ~74% repetitive sequences, and contained 56,036 coding gene sequences.
The phylogenetic position of Curcuma longa was resolved through a comprehensive genome-wide phylogenetic analysis with 16 other plant species. Using 5,294 orthogroups, the comparative evolutionary analysis performed across 17 species including Curcuma longa revealed evolution in genes associated with secondary metabolism, phytohormone signaling, and various biotic and abiotic stress tolerance responses. These mechanisms are crucial for perennial and rhizomatous plants such as Curcuma longa for defense and environmental stress tolerance via production of secondary metabolites, which are associated with the wide range of medicinal properties in Curcuma longa.

INTRODUCTION
Turmeric, the common name for Curcuma longa, has been used as a herb and spice in Southern Asia for about 4,000 years [1]. It has a long history of use in medicinal applications, as an edible dye, as a preservative in many food materials, and in religious ceremonies, and is now widely used in cosmetics throughout the world [1], [2]. It is a perennial rhizomatous monocot herb belonging to the genus Curcuma, which contains ~70 species. The genus belongs to the family Zingiberaceae, which comprises more than 1,300 species that are widely distributed in tropical Africa, Asia and America [3], [4]. This family is enriched in rhizomatous and aromatic plants with a variety of bioactive compounds. Zingiberaceae family members are known to be associated with endophytes, which produce various secondary metabolites in species of this family and in turn confer medicinal properties [4].

Secondary metabolism is one of the key adaptations in plants to cope with environmental conditions through the production of a wide range of common or plant-specific secondary metabolites [5], [6].
These secondary metabolites also play a key role in plant defense mechanisms, and several of them have numerous pharmacological applications in herbal medicines and phytotherapy [7]. The pathways for biosynthesis of various secondary metabolites, including phenylpropanoids, flavonoids (such as curcuminoids), terpenoids, and alkaloids, are found in Curcuma longa [8]–[12], which also contains other constituents such as volatile oils, proteins, resins and sugars [13]. Flavonoids are known to have anti-inflammatory, antioxidant, and anti-cancer activities [14], [15]. Phenylpropanoids are also of great importance because of their antioxidant, anti-cancer, anti-microbial, anti-inflammatory, and wound-healing activities [16]. Another class of secondary metabolites, the terpenoids, is also known to possess anti-cancer and anti-malarial properties [17].

The three curcuminoids, namely curcumin, demethoxycurcumin, and bisdemethoxycurcumin, are responsible for the yellow colour of turmeric [13]. Among these, the primary bioactive component of turmeric is curcumin, which is a polyphenol-derived flavonoid compound and is also known as diferuloylmethane [13]. Curcumin is known to show broad-spectrum antimicrobial properties against bacteria, fungi and viruses [18], and also possesses anti-diabetic, anti-inflammatory, antifertility, anti-coagulant, hepatoprotective, and hypertension-protective properties [13], [19].
Because curcumin is an excellent scavenger of reactive oxygen species (ROS) and reactive nitrogen species [20], its antioxidant activity also limits the DNA damage caused by free-radical-mediated lipid peroxidation, which gives it anti-carcinogenic properties [20], [21]. Due to these medicinal properties, turmeric has been of interest to scientists for many decades.

Several studies have been carried out on the secondary metabolites and medicinal properties of this plant [22], [23]. Recently, transcriptome profiling and analysis of Curcuma longa rhizome samples has been carried out to identify the secondary metabolite pathways and associated transcripts [11], [17], [24], [25]. However, a reference genome sequence is not yet available, although it is much needed to understand the genomic and molecular basis of the evolution of the unique characteristics of Curcuma longa. According to the Plant DNA C-values database, the genome is polyploid with 2n=63 chromosomes and has an estimated size of 1.33 Gbp [26].

Therefore, we performed the first draft genome sequencing and assembly of Curcuma longa using 10x Genomics linked reads generated on an Illumina platform. The transcriptome of rhizome tissue of this plant is known from previous studies, but the transcriptome of leaf tissue has not yet been characterized [11], [17], [24], [25]. Thus, we carried out the first transcriptome sequencing of leaf tissue and, together with the available RNA-Seq data from rhizome, performed a comprehensive transcriptome analysis that also helped in the gene set construction. We also constructed a genome-wide phylogeny of Curcuma longa with other available monocot genomes. The comparative analysis of Curcuma longa with other monocot genomes revealed adaptive evolution in genes associated with plant defense and secondary metabolism, and provided genomic insights into the medicinal properties of this species.
METHODS

Sample collection, library preparation and sequencing
The plant sample was collected from an agricultural farm (23.2280252°N 77.2088987°E) located in Bhopal, Madhya Pradesh, India. The leaves were homogenized in liquid nitrogen for DNA extraction using Carlson lysis buffer. Species identification was performed by PCR amplification of a nuclear gene (Internal Transcribed Spacer, ITS) and a chloroplast gene (Maturase K), followed by Sanger sequencing at the in-house facility. The library was constructed from the extracted DNA on the Chromium Controller instrument (10x Genomics) using the Chromium™ Genome Library & Gel Bead Kit v2, following the manufacturer’s instructions. RNA was extracted from the powdered leaves using TRIzol reagent (Invitrogen, USA). The transcriptomic library was prepared with the TruSeq Stranded Total RNA Library Preparation kit, following the manufacturer’s protocol with the Ribo-Zero workflow (Illumina, Inc., USA). The quality of the libraries was evaluated on an Agilent 2200 TapeStation using High Sensitivity D1000 ScreenTape (Agilent, Santa Clara, CA) prior to sequencing. The prepared genomic and transcriptomic libraries were sequenced on a NovaSeq 6000 (Illumina, Inc., USA), generating 150 bp paired-end reads. The detailed DNA and RNA extraction steps and other methodologies are described in Supplementary Text S1.

Genomic data processing and assembly
The barcode sequences were trimmed from raw 10x Genomics linked reads using a set of Python scripts (https://github.com/ucdavis-bioinformatics/proc10xG).
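As a rough illustration of what barcode trimming involves (this is not the proc10xG implementation), the sketch below assumes the commonly described 10x Chromium Genome read layout, in which read 1 begins with a 16-base barcode followed by 7 bases of low-quality spacer sequence. Those two lengths, the function name `trim_linked_read`, and the example read are assumptions made for illustration; none of them is stated in the manuscript.

```python
from typing import Optional, Tuple

# Assumed read-1 layout for 10x linked reads (not taken from the manuscript):
# [16-base barcode][7-base spacer][genomic insert]
BARCODE_LEN = 16
SPACER_LEN = 7


def trim_linked_read(
    seq: str,
    qual: str,
    barcode_len: int = BARCODE_LEN,
    spacer_len: int = SPACER_LEN,
) -> Optional[Tuple[str, str, str]]:
    """Split read 1 into (barcode, trimmed sequence, trimmed qualities).

    Returns None when the read is too short to contain any genomic insert.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    cut = barcode_len + spacer_len
    if len(seq) <= cut:
        return None
    # The barcode is kept separately so the read can still be linked
    # to its molecule of origin after trimming.
    return seq[:barcode_len], seq[cut:], qual[cut:]


# Hypothetical read: 16-base barcode + 7-base spacer + 7-base insert.
example_seq = "ACGTACGTACGTACGT" + "NNNNNNN" + "GATTACA"
example_qual = "I" * len(example_seq)
print(trim_linked_read(example_seq, example_qual))
```

Returning the barcode alongside the trimmed read, rather than discarding it, reflects the fact that linked-read assemblers rely on the barcode to group reads from the same long DNA molecule.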
The genome size of Curcuma longa was estimated using a k-mer count distribution method implemented in SGA-preqc (Supplementary Text S1) [27]. A total of 631.11 million raw 10x Genomics linked reads corresponding to ~71X coverage were used