<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Title: Genome sequencing of provides evolutionary insights into its medicinal properties

2 Authors: Abhisek Chakraborty, Shruti Mahajan, Shubham K. Jaiswal, Vineet K. Sharma*

3

4 Affiliation:

5 MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and 6 Research Bhopal 7 8 *Corresponding Author email: 9 Vineet K. Sharma - [email protected]

10

11 E-mail addresses of authors:

12 Abhisek Chakraborty - [email protected], Shruti Mahajan - [email protected], Shubham K. 13 Jaiswal - [email protected], Vineet K. Sharma - [email protected] bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

14 ABSTRACT

15 longa, or turmeric, is traditionally known for its immense medicinal properties and has 16 diverse therapeutic applications. However, the absence of a reference genome sequence is a limiting 17 factor in understanding the genomic basis of the origin of its medicinal properties. In this study, we 18 present the draft genome sequence of Curcuma longa, the first sequenced from 19 family, constructed using 10x Genomics linked reads. For comprehensive gene 20 set prediction and for insights into its gene expression, the transcriptome sequencing of leaf tissue 21 was also performed. The draft genome assembly had a size of 1.24 Gbp with ~74% repetitive 22 sequences, and contained 56,036 coding gene sequences. The phylogenetic position of Curcuma 23 longa was resolved through a comprehensive genome-wide phylogenetic analysis with 16 other 24 plant species. Using 5,294 orthogroups, the comparative evolutionary analysis performed across 17 25 species including Curcuma longa revealed evolution in genes associated with secondary metabolism, 26 plant phytohormones signaling, and various biotic and abiotic stress tolerance responses. These 27 mechanisms are crucial for perennial and rhizomatous such as Curcuma longa for defense and 28 environmental stress tolerance via production of secondary metabolites, which are associated with 29 the wide range of medicinal properties in Curcuma longa.

30

31 INTRODUCTION

32 Turmeric, a common name for Curcuma longa, is traditionally used as a herb and since 4,000 33 years in Southern Asia [1]. It has a long history of usage in medicinal applications, as an edible dye, 34 as a preservative in many food materials, in religious ceremonies, and is now widely used in 35 cosmetics throughout the world [1], [2]. It is a perennial rhizomatous monocot herb belonging to the 36 Curcuma that contains ~70 species belonging to the family Zingiberaceae that comprises of 37 more than 1,300 species, which are known to be widely distributed in tropical , Asia and 38 America [3], [4]. This family is enriched in rhizomatous and aromatic plants with a variety of 39 bioactive compounds. Zingiberaceae family members are known to be associated with endophytes, 40 which produce various secondary metabolites in the species of Zingiberaceae family, thus in turn 41 confer medicinal properties [4].

42 Secondary metabolism is one of the key adaptations in plants to cope with the environmental 43 conditions through the production of a wide range of common or plant-specific secondary 44 metabolites [5], [6]. These secondary metabolites also play a key role in plant defense mechanisms, 45 and several of these metabolites have numerous pharmacological applications in herbal medicines 46 and phytotherapy [7]. The pathways for biosynthesis of various secondary metabolites including bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

47 phenylpropanoids, flavonoids (such as ), terpenoids, and alkaloids are found in 48 Curcuma longa [8]–[12], and also possesses other constituents such as volatile oils, proteins, resins 49 and sugars [13]. Flavonoids are known to have anti-inflammatory, , and anti-cancer 50 activities [14], [15]. Phenylpropanoids are also of great importance because of their antioxidant, 51 anti-cancer, anti-microbial, anti-inflammatory, and wound-healing activities [16]. The other class of 52 secondary metabolites, terpenoids, are also known to possess anti-cancer and anti-malarial 53 properties [17].

54 The three curcuminoids namely , demethoxycurcumin, and bisdemethoxycurcumin, are 55 responsible for the yellow colour of turmeric [13]. Among these, the primary bioactive component of 56 turmeric is curcumin, which is a polyphenol-derived flavonoid compound and also known as 57 diferuloylmethane [13]. Curcumin has been known to show broad-spectrum antimicrobial properties 58 against bacteria, fungi and viruses [18], and also possesses anti-diabetic, anti-inflammatory, 59 antifertility, anti-coagulant, hepatoprotective, and hypertension protective properties [13], [19]. 60 Being an excellent scavenger of reactive oxygen species (ROS) and reactive nitrogen species [20], its 61 antioxidant activity also controls DNA damage by lipid peroxidation mediated by free-radicals, and 62 thus provides it with anti-carcinogenic properties [20], [21]. Due to these medicinal properties, 63 turmeric has been of interest for scientists from many decades.

64 Several studies have been carried out to study the secondary metabolites and medicinal properties 65 of this plant [22], [23]. Recently, the transcriptome profiling and analysis of Curcuma longa using 66 samples have been carried out to identify the secondary metabolite pathways and 67 associated transcripts [11], [17], [24], [25]. However, its reference genome sequence is not yet 68 available, which is much needed to understand the genomic and molecular basis of the evolution of 69 the unique characteristics of Curcuma longa. According to the Plant DNA c-value database, the 70 genome is polyploid with 2n=63 chromosomes and has an estimated size of 1.33 Gbp [26].

71 Therefore, we performed the first draft genome sequencing and assembly of Curcuma longa using 72 10x Genomics linked reads generated on Illumina platform. The transcriptome of rhizome tissue for 73 this plant has been known from previous studies but the transcriptome of leaf tissue is yet unknown 74 [11], [17], [24], [25]. Thus, we carried out the first transcriptome sequencing of leaf tissue and along 75 with the available RNA-Seq data of rhizome, performed a comprehensive transcriptome analysis that 76 also helped in the gene set construction. We also constructed a genome-wide phylogeny of Curcuma 77 longa with other available monocot genomes. The comparative analysis of Curcuma longa with 78 other monocot genomes revealed adaptive evolution in genes associated with plant defense and 79 secondary metabolism, and provided genomic insights into the medicinal properties of this species. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

80 METHODS

81 Sample collection, library preparation and Sequencing

82 The plant sample was collected from an agricultural farm (23.2280252°N 77.2088987°E) located in 83 Bhopal, Madhya Pradesh, India. The leaves were homogenized in liquid nitrogen for DNA extraction 84 using Carlson lysis buffer. Species identification was performed by PCR amplification of a nuclear 85 gene (Internal Transcribed Spacer ITS) and a chloroplast gene (Maturase K), followed by Sanger 86 sequencing at the in-house facility. The library construction from the extracted DNA was done 87 with the help of Chromium Controller instrument (10x Genomics) using Chromium™ Genome Library 88 & Gel Bead Kit v2 by following the manufacturer’s instructions. RNA extraction was carried out for 89 the powdered leaves using TriZol reagent (Invitrogen, USA). The transcriptomic library was prepared 90 with TruSeq Stranded Total RNA Library Preparation kit by following the manufacturer’s protocol 91 with Ribo-Zero Workflow (Illumina, Inc., USA). The quality of libraries was evaluated on Agilent 2200 92 TapeStation using High Sensitivity D1000 ScreenTape (Agilent, Santa Clara, CA) prior to sequencing. 93 The prepared genomic and transcriptomic libraries were sequenced on Novaseq 6000 (Illumina, Inc., 94 USA) generating 150 bp paired-end reads. The detailed DNA and RNA extraction steps and other 95 methodologies are mentioned in Supplementary Text S1.

96 Genomic data processing and assembly

97 The barcode sequences were trimmed from raw 10x Genomics linked reads using a set of python 98 scripts (https://github.com/ucdavis-bioinformatics/proc10xG). The genome size of Curcuma longa 99 was estimated using a k-mer count distribution method implemented in SGA-preqc (Supplementary 100 Text S1) [27]. A total of 631.11 million raw 10x Genomics linked reads corresponding to ~71X 101 coverage were used for generating a de novo assembly using Supernova assembler v2.1.1 with 102 maxreads=all option and other defaults settings [28]. The haplotype-phased assembled genome was 103 generated using Supernova mkoutput in ‘pseudohap’ style.

104 The 10x Genomics linked reads were run through Longranger basic v2.2.2 105 (https://support.10xgenomics.com/genome-exome/software/pipelines/latest/installation) for 106 barcode processing and were used to detect and correct mis-assemblies in Supernova assembled 107 genome using Tigmint v1.1.2 [29]. The first round of scaffolding was carried out using ARCS v1.1.1 108 (default parameters) to generate more contiguous assembly using 10x Genomics linked reads [30]. 109 Further scaffolding was performed to improve the contiguity using AGOUTI v0.3.3 with the quality 110 filtered paired-end RNA-Seq reads, which were also used in de novo transcriptome assembly [31]. 111 Gap-closing was performed with barcode-processed linked reads using Sealer v2.1.5 with k-mer bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

112 value from 30 to 120 with an interval of 10 bp using a Bloom filter-based approach [32]. Finally, the 113 assembly quality was improved by Pilon v1.23 using barcode-processed linked reads to fix small 114 indels, individual base errors, or local mis-assemblies that could be introduced by the previous 115 scaffolding steps [33]. BUSCO v4.0.4 was used to assess the genome assembly completeness using 116 embryophyta_odb10 database [34]. The other details about the genome assembly post-processing 117 are mentioned in Supplementary Text S2.

118 Transcriptome assembly

119 The RNA-Seq data from our study and from previously available transcriptome studies of Curcuma 120 longa were used for de novo transcriptome assembly [11], [17], [24], [25], [35]. Trimmomatic v0.38 121 was used for adapter removal and quality filtration of raw Illumina sequence data (Supplementary 122 Text S1) [36]. Finally, Trinity v2.9.1 was used with default parameters to perform de novo 123 transcriptome assembly of quality-filtered paired-end and single-end reads [37]. Assembly statistics 124 was calculated using a Perl script available in Trinity software package. The longest isoforms for all 125 genes were extracted from the assembled transcripts. These sequences were clustered to identify 126 the unigenes using CD-HIT-EST v4.8.1 (90% sequence identity, 8 bp seed size) [38].

127 Genome annotation

128 The final polished assembly (length ≥500 bp) was used for genome annotation. RepeatModeler 129 v2.0.1 was used to generate a de novo repeat library for this genome [39]. The resultant repeat 130 sequences were further clustered using CD-HIT-EST v4.8.1 (90% sequence identity, seed size = 8 bp) 131 [38]. This repeat library was used to soft-mask Curcuma longa genome using RepeatMasker v4.1.0 132 (http://www.repeatmasker.org) and was further used for gene set construction. Genome annotation 133 was carried out using MAKER pipeline to predict the final gene models using ab initio gene 134 prediction programs and evidence-based approaches [40]. The transcriptome assembly of Curcuma 135 longa, and protein sequences of Curcuma longa along with its closest species Musa acuminata were 136 used as empirical evidence in MAKER pipeline. AUGUSTUS v3.2.3, BLAST and Exonerate v2.2.0 were 137 used in MAKER pipeline for ab initio gene prediction, evidence alignments and alignments polishing, 138 respectively [41], [42]. Tandem Repeat Finder (TRF) v4.09 was used to detect the tandem repeats 139 present in Curcuma longa genome [43]. Additionally, miRBase database and tRNAscan-SE v2.0.5 140 were used for homology-based identification of miRNAs and de novo prediction of tRNAs, 141 respectively [44], [45]. The other details about genome annotation are mentioned in Supplementary 142 Text S3.

143 Construction of gene set bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

144 The coding genes were predicted by TransDecoder v5.5.0 from the unigenes identified in 145 transcriptome assembly, using Uniref90 and Pfam databases for homology-based searching as ORF 146 retention criteria (https://github.com/TransDecoder/TransDecoder) [46]–[49]. Also, the MAKER 147 pipeline-based gene models were clustered using CD-HIT-EST v4.8.1 (95% sequence identity, 8 bp 148 seed size) [38]. The TransDecoder pipeline derived genes (≥300 bp) were aligned against the MAKER 149 derived gene set (≥300 bp) using BLASTN, and the genes that did not match with identity ≥50%, 150 query coverage ≥50% and e-value 10-9 were added to the MAKER gene set to prepare the final gene 151 set of Curcuma longa [50].

152 Orthogroups identification

153 Representative species from all 15 monocot genus available in Ensembl plants release 47, and model 154 organism Arabidopsis thaliana as an outgroup species were selected for orthogroups identification 155 [51]. To construct the orthogroups, the protein sequences of Curcuma longa obtained from 156 TransDecoder and proteome files for other 16 selected species i.e., Aegilops tauschii, Ananas 157 comosus, Brachypodium distachyon, Dioscorea rotundata, Eragrostis tef, Hordeum vulgare, Leersia 158 perrieri, Musa acuminata, Oryza sativa, Panicum hallii fil2, Saccharum spontaneum, Setaria italica, 159 Sorghum bicolor, Triticum aestivum, Zea mays, and Arabidopsis thaliana obtained from Ensembl 160 release 47, were used. The longest isoforms for all proteins were extracted for all selected species to 161 construct the orthogroups using OrthoFinder v2.3.9 [52].

162 Construction of orthologous gene set and phylogenetic tree

163 Only those orthogroups that contained genes from all 17 species were extracted from all the 164 identified orthogroups. The fuzzy one-to-one orthogroups containing genes from all 17 species were 165 identified from these orthogroups, and extracted using KinFin v1.0 [53]. For cases where the 166 orthologous gene sets comprised of multiple genes for a species, the longest gene was extracted. 167 The fuzzy one-to-one orthogroups were further aligned individually using MAFFT v7.467 for species 168 phylogenetic tree construction [54]. BeforePhylo v0.9.0 (https://github.com/qiyunzhu/BeforePhylo) 169 was used to trim the multiple sequence alignments to remove empty sites and to concatenate the 170 multiple sequence alignments of all fuzzy one-to-one orthologous gene sets across 17 species. This 171 concatenated alignment was used by the rapid hill climbing algorithm-based RAxML v8.2.12 for 172 construction of maximum likelihood species phylogenetic tree with ‘PROTGAMMAGTR’ amino acid 173 substitution model using 100 bootstrap values [55].

174 Identification of genes with higher nucleotide divergence bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

175 Protein sequences of all the orthogroups across 17 species were aligned individually using MAFFT 176 v7.467 [54], and individual maximum likelihood-based phylogenetic trees were built using these 177 alignments by RAxML v8.2.12 (‘PROTGAMMAGTR’ amino acid substitution model, 100 bootstrap 178 values) [55]. The ‘adephylo’ package in R was used to calculate root-to-tip branch length distance 179 values for each species [56]. Curcuma longa genes that showed comparatively higher root-to-tip 180 branch length with respect to the other selected species were extracted, and were considered as the 181 genes with higher nucleotide divergence or higher rate of evolution.

182 Identification of genes with unique substitution having functional impact

183 The unique substitutions in genes that have impact on protein function can identify species-specific 184 amino acid substitutions and are considered as a site-specific evolutionary signature. However, the 185 inclusion of phylogenetically distant species in this analysis may erroneously increase the number of 186 uniquely substituted genes, therefore we restricted this analysis by only considering the monocot 187 species (available on the Ensembl plant release 47) for reliable results. The amino acid positions that 188 were identical across the other 16 species in the individual multiple sequence alignments of all 189 orthogroups but different in Curcuma longa were considered as the uniquely substituted amino acid 190 positions.

191 For identification of uniquely substituted sites, an in-house python script was used. Any gap and ten 192 amino acid sites around any gap in the alignments were not considered in this analysis. The impact 193 of these unique amino acid substitutions on protein function was identified using Sorting Intolerant 194 From Tolerant (SIFT), by utilizing UniProt as the reference database [57], [58].

195 Identification of positively selected genes

196 SATé v2.2.7 was used to generate individual protein alignments for all orthogroups across 17 species 197 [59]. For its iterative alignment approach, SATé uses PRANK, MUSCLE and RAxML for sequence 198 alignment, merging the alignment, and tree estimation, respectively [55], [60], [61]. The ambiguous 199 sites present in the codons were filtered and protein alignment-based nucleotide alignments were 200 performed for protein-coding nucleotide sequences of all orthogroups across 17 species using 201 ‘tranalign’ program of EMBOSS v6.6.0 package [62]. ‘codeml’ from PAML package v4.9a that uses a 202 branch-site model was used to identify positively selected genes using nucleotide alignments of all 203 the orthologs in phylip format and the species phylogenetic tree generated in previous steps [63]. 204 Log-likelihood values were used to perform likelihood ratio tests and chi-square analysis-based p- 205 values were calculated. The genes that qualified against the null model (fixed omega) (FDR-corrected 206 p-values <0.05) were identified as positively selected genes. All codon sites showing greater than bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

207 95% probability for foreground lineage based on Bayes Empirical Bayes (BEB) analysis were termed 208 as positively selected sites.

209 Genes with multiple signs of adaptive evolution (MSA)

210 Among the three signs of adaptive evolution– higher nucleotide divergence, unique substitution 211 having functional impact and positive selection, the Curcuma longa genes that showed at least two 212 of these signs were termed as genes with multiple signs of adaptive evolution or MSA genes [64].

213 Functional annotation

214 KAAS genome annotation server v2.1 was used to assign KEGG Orthology (KO) identifiers and KEGG 215 pathways to the genes [65]. eggNOG-mapper v2 was used for functional annotation of genes using 216 precomputed orthologous groups from eggNOG clusters [66]. WebGeStalt web server was used for 217 GO enrichment analysis, and only the GO categories showing p-values <0.05 in over-representation 218 enrichment analysis were considered further [67]. The assignment of genes into functional 219 categories was manually curated.

220 biosynthesis pathway

221 Coding sequences of four key genes involved in curcuminoid biosynthesis pathway, namely curcumin 222 synthase 1 (CURS1, NCBI accession number BAH56226), curcumin synthase 2 (CURS2, NCBI accession 223 number AB506762), curcumin synthase 3 (CURS3, NCBI accession number AB506763), and diketide- 224 CoA synthase (DCS, NCBI accession number BAH56225) were retrieved [68]. The sequences of these 225 four genes were mapped to the gene set derived from de novo transcriptome assembly generated in 226 this study using BLASTN with query coverage ≥50% and e-value 10-9 [42]. These sequences were also 227 aligned to de novo genome assembly of Curcuma longa constructed in this study, using Exonerate 228 v2.2.0 (https://github.com/nathanweeks/exonerate) with 95% of maximal alignment score and 95% 229 quality threshold, and the best hits were selected to construct the gene structures.

230

231 RESULTS

232 Sequencing of genome and transcriptome

233 A total of 94.8 Gbp of 10x Genomics linked read data and 7.7 Gbp of RNA-Seq data was generated 234 from leaf tissue (Supplementary Tables S1-S2). The total genomic data corresponded to ~71X 235 coverage based on the estimated genome size of 1.33 Gbp [26]. To carry out the transcriptome 236 analysis, RNA-Seq data from this study and from the previous studies was used, which together 237 constituted a total of 43.8 Gbp of RNA-Seq data [11], [17], [24], [25], [35]. All the paired-end RNA- bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

238 Seq reads were trimmed and quality filtered using Trimmomatic v0.38 and used for de novo 239 transcriptome assembly. The detailed workflow for genome and transcriptome analysis is shown in 240 Figure 1.

241 Assembly of Curcuma longa genome and transcriptome

242 The genome size of Curcuma longa was estimated to be 1.15 Gbp using SGA-preqc with barcode- 243 filtered 10x Genomics linked reads, which is close to the previously estimated genome size of 1.33 244 Gbp [26]. Curcuma longa genome assembled using Supernova v2.1.1 had a total size of 1.3 Gbp. 245 After correction of mis-assemblies, scaffolding, gap-closing and polishing, the final draft genome 246 assembly (≥500 bp) of Curcuma longa had the total size of 1.24 Gbp that comprised of 1,84,614 247 scaffolds with N50 value of 18.8 Kbp and GC-content of 38.85% (Supplementary Table S3). BUSCO 248 completeness for Supernova assembled genome was 74% (Complete BUSCOs), which was improved 249 to 81.8% (Complete BUSCOs) in the final polished draft Curcuma longa genome assembly 250 (Supplementary Table S5).

251 The de novo transcriptome assembly of Curcuma longa using Trinity v2.9.1 had a total size of 252 3,45,957,265 bp, with a total of 3,63,662 predicted transcripts corresponding to 1,67,582 genes. The 253 complete assembly had an N50 value of 1,605 bp, an average transcript length of 951 bp and GC- 254 content of 42.88% (Supplementary Table S4). A total of 1,58,054 unigenes were identified after 255 clustering using CD-HIT-EST v4.8.1 to remove the redundant gene sequences. The coding sequence 256 (CDS) prediction from these unigenes using TransDecoder v5.5.0 resulted in 42,233 coding genes.

257 Genome annotation and gene set construction

258 For repeat identification, a de novo custom repeat library was constructed using the final polished 259 Curcuma longa genome by RepeatModeler v2.0.1, which resulted in a total of 2,284 repeat families. 260 The repeat families were clustered into 1,821 representative sequences. These were used to soft- 261 mask the genome assembly using RepeatMasker v4.1.0, which predicted 69.92% of Curcuma longa 262 genome as repetitive sequences, of which 67.58% was identified as interspersed repeats (30.03% 263 unclassified, 35.41% retroelements and 2.15% DNA transposons). Retroelements consisted of 34.7% 264 LTR repeats (21.04% Ty1/Copia and 13.30% Gypsy/DIRS1 elements) (Supplementary Table S6). 265 Additionally, 6.33% of Curcuma longa genome was identified as simple repeats using TRF v4.09. 266 Thus, ~74% of the genome was predicted to be constituted of simple and interspersed repetitive 267 sequences. Among the non-coding RNAs, 3,225 standard amino acid specific tRNAs and 357 hairpin 268 miRNAs were predicted in Curcuma longa genome. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

269 MAKER genome annotation pipeline and TransDecoder v5.5.0 predicted a total of 66,396 and 42,233 270 coding sequences from genome and transcriptome assemblies, respectively. Length-based filtering 271 criteria (≥300 bp) of the above two coding gene sets resulted in 50,744 and 41,785 coding sequences 272 from genome and transcriptome assemblies, respectively. MAKER derived filtered coding sequences 273 were further clustered at 95% sequence identity, which resulted in 40,401 non-redundant coding 274 sequences. These two coding gene sets were merged using BLASTN, resulting in the final Curcuma 275 longa gene set comprising of 56,036 coding gene sequences.

276 Resolving the phylogenetic position of Curcuma longa

277 From the selected 17 plant species, a total of 1,07,510 orthogroups were identified using protein 278 sequences by OrthoFinder v2.3.9. Among these 1,07,510 orthogroups, 5,294 contained protein 279 sequences from all 17 species, and were used for evolutionary analysis. Further, KinFin v1.0 280 predicted a total of 1,151 fuzzy one-to-one orthogroups containing protein sequences from all 17 281 selected species, which were used to construct the maximum likelihood-based phylogenetic tree of 282 Curcuma longa with 15 other monocot species and Arabidopsis thaliana as an outgroup.

283 All 1,151 fuzzy one-to-one orthogroups were aligned, concatenated and filtered for undetermined or 284 missing values. This filtered alignment data consisting of 947,855 alignment positions was used to 285 construct the maximum likelihood-based species phylogenetic tree of Curcuma longa with 15 other 286 monocot species available on Ensembl plants release 47 and Arabidopsis thaliana as the outgroup 287 (Figure 2). Position of Curcuma longa and the other selected monocots in this genome-wide 288 phylogeny were supported by previously reported phylogenies [69]–[73]. In our phylogeny, Ananas 289 comosus showed an earlier divergence among the monocots from Poales order, which was also 290 supported by previously reported studies [69], [72]. Among all selected monocot species in our 291 study, species from Dioscoreales order showed the earliest divergence, supported by the previously 292 reported studies [71], [72]. From our genome-wide phylogeny constructed using the selected 293 monocots, it is apparent that Musa acuminata was comparatively closer to Curcuma longa. 294 Belonging to the same phylogenetic order , Curcuma longa and Musa acuminata shared 295 the same clade in the species phylogenetic tree. Species from Zingiberales order are closer to species 296 from Poales order with respect to the species from Dioscoreales order.

297 Genes with signatures of adaptive evolution

298 Genes with site-specific signatures of adaptive evolution were identified in Curcuma longa. 3,242 299 genes showed unique amino acid substitution with respect to the other selected species. Among 300 these 3,242 genes, 2,475 genes were identified to have functional impacts using Sorting Intolerant bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

301 From Tolerant (SIFT), and were considered further. Further, 1,756 genes were found to contain 302 positively selected codon sites with greater than 95% probability. In addition to these site-specific 303 signatures of evolution, 47 genes showed higher rate of nucleotide divergence, and 175 genes 304 showed positive selection in Curcuma longa with FDR-corrected p-values <0.05. These positively 305 selected genes had positively selected codon sites with greater than 95% probability.

306 A total of 109 genes were identified containing more than one of the signatures of adaptive 307 evolution namely positive selection, unique amino acid substitution with functional impact and 308 higher rate of nucleotide divergence. A total of 102 out of 109 MSA genes were associated with 309 categories essential for plant secondary metabolism and defense responses against environmental 310 stress conditions i.e., biotic stress and abiotic stress (Figure 3).

311 Adaptive evolution of plant defense associated genes

312 Several genes known to provide plant immunity against pathogen infection or disease were found in 313 MSA genes. Since plants lack adaptive immune response, the innate immune response in plants is 314 provided by PAMP-triggered (PTI) and effector-triggered (ETI) immunity with the help of two plant 315 stress hormones, salicylic acid (SA) and jasmonic acid (JA) signaling pathway [74]. Among these MSA 316 genes, ‘IAR3’ is important for production of 12-Hydroxyjasmonic acid, ‘LOX’ gene helps in oxylipins 317 such as JA production by oxidation of fatty acids, ‘LOB’ gene regulates metabolism and 318 components of JA signaling pathways and is modulated by methyl jasmonate treatment, 319 ‘AT4G32690’ regulates nitric oxide mediated JA signaling and JA accumulation in plants, and ‘AOC3’ 320 gene gets upregulated by methyl jasmonate treatment [75]–[79]. Also, ‘WRKY22’ transcription factor 321 is induced by pathogen attack and is involved in SA-signaling pathway mediated plant immunity, 322 ‘PAT14’ gene negatively regulates SA-mediated defense response, and L,L-diaminopimelate 323 aminotransferase negatively regulates SA levels and PR gene expression [74], [80], [81]. Jasmonic 324 acid and salicylic acid elicit the production and accumulation of secondary metabolites such as 325 phenolics, terpenoids, alkaloids, and glycosides, in [82].

326 Three O-fucosyltransferase family proteins - ‘AT1G76270’, ‘AT1G62330’, and ‘AT1G04910’ were 327 found in MSA genes category, and are involved in plant immunity. Previously it has been shown that 328 the lack of fucosylation of genes led to increased disease susceptibility in Arabidopsis sp. by affecting 329 PTI, ETI as well as stomatal and apoplastic defense [83]. Ubiquitin-conjugating enzyme ‘AT2G18600’ 330 and three E3 ubiquitin-protein ligase genes – ‘AT2G17020’, ‘AT5G04460’, ‘SPL2’ also showed 331 multiple signs of adaptive evolution. Ubiquitination-related proteins affect hypersensitive-response 332 (HR) and phytohormone signaling mediated pathogen defense by targeting proteins for proteasomal 333 degradation [84]–[87]. These ubiquitin ligase proteins also regulate various abiotic stress responses bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

334 e.g., drought, salinity, temperature [87]. Among the MSA genes, six ribosomal subunit proteins are 335 regulated by signaling molecules such as methyl jasmonate, salicylic acid and environmental stress 336 e.g., cold, heat, UV, drought, salinity [88], [89]. Two cell cycle related MSA genes – cyclin-A and 337 ‘APC10’ are involved in plant immunity, disease resistance as well as abiotic stress responses [90], 338 [91].

339 Three genes related to phosphoinositide signaling were found to be among the MSA genes. 340 Phosphoinositide signaling, which is modulated in response to mechanical wounding in Arabidopsis 341 thaliana, contribute to phytohormones mediated defense responses [92]. This signaling cascade also 342 plays a role in abiotic stress response such as water stress [93]. Two ERD family proteins – 343 ‘AT4G02900’ and ‘AT1G30360’ were identified as MSA genes. ERD genes are known to provide 344 abiotic stress tolerance via drought tolerance and salicylic acid-dependent plant defense [94]. Six 345 different repeat family genes belonging to WD-family repeats, TPR-like superfamily repeats, ARM 346 superfamily repeats and Galactose oxidase/kelch superfamily repeats were identified as MSA genes. 347 These repeat family proteins are involved in plant innate immunity as well as abiotic stress tolerance 348 [95]–[98].

349 Abundance of secondary metabolism pathways in Curcuma longa genome

350 Genes associated with secondary metabolite biosynthesis and secondary metabolism were found to 351 be abundant (64 out of 109 genes) among the MSA genes in Curcuma longa. Of these, ‘DWF4’ is 352 involved in brassinosteroid biosynthesis, ‘PGT1’ has a role in phenylpropanoid biosynthesis, 353 ‘AT4G14420’ aids in glucosinolate biosynthesis [99]–[101]. Methyltransferases e.g., ‘AT1G33170’, 354 ‘OMT’, ‘AT1G50000’ are also involved in secondary metabolites biosynthesis and responsible for 355 methylation of secondary metabolites, which in turn assist in disease resistance and stress tolerance 356 in plants [102], [103]. These secondary metabolites possess various medicinal applications such as 357 anti-inflammatory, antimicrobial, antioxidant, and anti-carcinogenic properties [7], [16], [104], [105]. 358 Transporter genes e.g., ‘MRS2’ and ‘AT1G29830’ magnesium transporters, ‘NRT’ a nitrate or nitrite 359 transporter, ‘IREG3’ an iron-regulated transporter, ‘TC.APA’ a polyamine transporter, and 360 ‘AT1G51610’ a cation efflux family protein also showed multiple signs of evolution. The evolution of 361 these transporter genes appears significant since the alteration in concentration of magnesium and 362 iron metal ions regulate accumulation and translocation of secondary metabolites [106]–[110].

363 Polyketides, such as curcuminoids, represent a diverse class of secondary metabolites, which are 364 crucial for plants survival under environmental challenges [111]. Curcuminoid biosynthesis pathway 365 is the most important secondary metabolism pathway in C. longa where phenylalanine is converted 366 to coumaroyl-CoA via cinnamic acid and coumaric acid [112]. Notably, the ‘OMT’ gene, which is bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

367 involved in conversion of coumaroyl-CoA to feruloyl-CoA in the curcuminoid biosynthesis pathway 368 [113], showed positive selection and unique amino acid substitution in this study.

369 Further, coumaroyl-CoA and feruloyl-CoA are used for the production of curcumin, 370 demethoxycurcumin, and bisdemethoxycurcumin via coumaroyl-diketide-CoA and feruloyl-diketide- 371 CoA, catalyzed by four enzymes - ‘CURS1’, ‘CURS2’, ‘CURS3’, and ‘DCS’ [112]. To identify these 372 enzymes in the genome and transcriptome assemblies constructed in this study, we mapped the 373 coding gene sequences of ‘CURS1’, ‘CURS2’, ‘CURS3’, and ‘DCS’ genes on these assemblies. All four 374 enzymes were found to be present in the de novo genome assembly and in the gene set derived 375 from de novo transcriptome assembly of Curcuma longa (Supplementary Table S7). Using Exonerate 376 tool, we further constructed the gene structures for these four major curcuminoid biosynthesis 377 genes. Each of the ‘CURS1’, ‘CURS2’ and ‘CURS3’ genes consisted of two exons and one intron. ‘DCS’ 378 gene consisted of three exons and two introns. ‘DCS’ and ‘CURS’ genes are members of chalcone 379 synthase (CHS) family [114], and the genes from CHS family generally consist of two exons and one 380 intron [115], which is consistent with and also further supported by our findings.

381

382 DISCUSSION

383 Curcuma longa is a monocot species from Zingiberaceae plant family and is widely known for its 384 medicinal properties and therapeutic applications [19], [116]. In this study, we carried out the whole 385 genome sequencing and reported the draft genome sequence of Curcuma longa. This is the first 386 genome sequenced from Zingiberaceae plant family, which comprises more than 1300 species, and 387 thus will act as a valuable reference for studying the members of this family including those of 388 Curcuma genus. The application of 10x Genomics linked read sequencing that has the potential to 389 resolve complex polyploid genomes [117], helped in successfully constructing the Curcuma longa 390 draft genome of 1.24 Gbp with a decent N50 of 18.8 Kbp. After assembly correction, scaffolding, 391 gap-closing and polishing, the BUSCO completeness of Supernova v2.1.1 derived Curcuma longa 392 genome improved to 81.8%, which is similar to other plant genomes, thus indicating the usefulness 393 of post-assembly processing [118].

394 Since the construction of a comprehensive gene set was essential to explore the genetic basis of its 395 medicinal properties, both genome and transcriptome assemblies, and an integrated approach using 396 de novo and homology-based methods were used resulting in the final set of 56,036 genes. The 397 identification of all Type III polyketide synthase genes ‘CURS1’, ‘CURS2’, ‘CURS3’, and ‘DCS’, involved 398 in the biosynthesis of the three most important secondary metabolites (curcuminoids) – curcumin, bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

399 demethoxycurcumin and bisdemethoxycurcumin, in both genome and transcriptome assemblies 400 attests to the quality and comprehensiveness of our genome and transcriptome assembly. Further, 401 the revelation of complete gene structures of the above four biosynthesis genes of curcuminoid 402 pathway from the draft genome of this plant is likely to help further studies and for better 403 commercial exploitation of these curcuminoids that find wide applications as coloring agents, food 404 additives and possess antioxidant, anti-inflammatory, anti-microbial, neuroprotective, anti-cancer 405 and many other medicinal properties [119].

406 Repetitive sequence prediction revealed that ~74% of the genome consisted of repeat elements, 407 which is similar to other plant genomes [120]. Notably among the LTR repeat elements, Ty1/Copia 408 elements (21.04%) were more abundant than Gypsy/DIRS1 elements (13.30%), which corroborates 409 with the observations made in the case of Musa acuminata species from the same Zingiberales plant 410 order, and thus appears to be a specific signature of repeat elements in Zingiberales order [121].

411 The genome-wide phylogenetic analysis of Curcuma longa with 15 other representative monocot 412 species available on Ensembl plants revealed the relative position of Curcuma longa, which was 413 supported by previously reported phylogenies using 1,685 gene partitions, and using phytocystatin 414 gene ‘CypCl’ [69], [70]. Ren et al. (2018) also showed similar phylogenetic position of Curcuma longa 415 with other selected monocots – Dioscorea sp., Musa acuminata, Brachypodium distachyon, Oryza 416 sativa, Panicum sp., Setaria italica, Zea mays, Sorghum bicolor using genome and transcriptome data 417 of 105 angiosperms [71]. Also, an updated megaphylogeny for vascular plants showed similar 418 relative phylogenetic position of Curcuma longa with respect to the selected monocots [72]. Further, 419 selected species from Poales order also showed similar relative positions with respect to each other 420 [72], [73]. Absence of any polytomy in the phylogenetic tree is because of large number of genomic 421 loci and a high bootstrap value, or no multiple speciation events took place at the same time. Taken 422 together, the genome-wide phylogenetic analysis of Curcuma longa confirmed its phylogenetic 423 position and will be a useful reference for further studies.

424 Analysis of genes with signatures of adaptive evolution using 5,294 orthologous gene sets revealed 425 that a large proportion (~94%) of the genes with multiple signs of adaptive evolution (MSA) were 426 associated with plant defense mechanisms against biotic and abiotic stress responses, and 427 secondary metabolism. Notable ones among these are the genes associated with Jasmonic acid and 428 salicylic acid signaling pathways. These two pathways are an important components of plant innate 429 immune response [74], and also affect plant secondary metabolism by regulating the production of 430 secondary metabolites [82], thus play a crucial role in plant-pathogen interaction. Jasmonic acid is 431 also reported to have a role in induction and growth of rhizome in vitro through its interaction with bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

432 ethylene, which is important for a rhizomatous plant like C. longa [122]. Further, one of the genes 433 (‘OMT’ gene) for the enzymes involved in curcuminoid biosynthesis pathway was also found to have 434 signatures of adaptive evolution, which is an important observation because curcuminoid is the most 435 important secondary metabolite of C. longa. The genes for the other four key enzymes (CURS1, 436 CURS2, CURS3, DCS) of this pathway could not be found in the list of MSA genes since these genes 437 are unique to Curcuma genus and were absent in the other species considered for the evolutionary 438 analysis. It is important to mention here that the biosynthesis of secondary metabolites such as 439 polyketides (curcuminoids), which are crucial for plants survival under environmental challenges, are 440 regulated by biotic and abiotic stress responses [123]–[125]. Also, it is known that secondary 441 metabolites in plants are primarily produced in response to environmental stress and for plant 442 defense for better survival under various environmental conditions [126], and several of these 443 secondary metabolites also possess medicinal value. This also seems to be the case with C. longa 444 where the abundance of adaptively evolved genes associated with plant defense mechanisms and 445 secondary metabolism makes it tempting to speculate that these genes gradually evolved to confer 446 resistance and environmental adaptation for a perennial rhizomatous plant like C. longa. Further, 447 several of the metabolites produced in the above processes for conferring resistance and 448 environmental adaptation to C. longa possess diverse medicinal properties, and provides turmeric 449 with its medicinal characteristics and traditional significance.

450

451 CONCLUSION

452 Due to the enormous medicinal and traditional value of Curcuma longa, the revelation of its genome 453 sequence was much needed and was achieved in this study. Further, this genome is the first genome 454 sequenced so far from the Zingiberaceae plant family. The genomic analysis revealed multiple signs 455 of adaptive evolution in genes associated with plant defense responses against biotic and abiotic 456 challenges and secondary metabolism, which perhaps are evolved for its perennial persistence and 457 also forms the basis of its medicinal properties. Thus, the availability of draft genome and 458 transcriptome, gene set and associated genomic information, and the comprehensive evolutionary 459 analysis with other monocot species will be a valuable reference to gain further insights into the 460 evolution of diverse medicinal properties of this plant and other members of Zingiberaceae plant 461 family.

462

463 LIST OF ABBREVIATIONS bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

464 PCR Polymerase Chain Reaction

465 BLAST Basic Local Alignment Search Tool

466 miRNA micro RNA

467 tRNA transfer RNA

468 ORF Open Reading Frame

469 LTR Long Terminal Repeat

470 FDR False discovery rate

471 MSA Multiple signs of adaptive evolution

472 DWF4 Dwarf 4

473 PGT1 Phloretin 2’-O-glucosyltransferase

474 OMT O-methyltransferase

475 MRS2 Mitochondrial RNA Splicing 2

476 NRT Nitrate Transporter

477 IREG3 Iron Regulated 3

478 TC.APA basic amino acid/polyamine antiporter

479 IAR3 IAA-Alanine Resistant 3

480 LOX Lipoxygenase

481 LOB LATERAL ORGAN BOUNDARIES

482 AOC3 Amine Oxidase Copper Containing 3

483 PAT14 Protein Acyltransferase 14

484 PR Pathogenesis-related

485 PTI PAMP-triggered immunity

486 ETI Effector-triggered immunity

487 SPL2 SP1-Like 2

488 UV Ultraviolet

489 APC10 Anaphase Promoting Complex subunit 10 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

490 ERD Early-Responsive to Dehydration stress

491 TPR Tetratrico Peptide Repeat

492 ARM Armadillo

493 BUSCO Benchmarking Universal Single-Copy Orthologs

494 KEGG Kyoto Encyclopedia of Genes and Genomes

495 GO Gene ontology

496 COG Clusters of Orthologous Groups

497 DCS diketide-CoA synthase

498 CURS1 Curcumin synthase 1

499 CURS2 Curcumin synthase 2

500 CURS3 Curcumin synthase 3

501

502 COMPETING INTERESTS

503 The authors declare no competing financial and non-financial interest.

504

505 AUTHORS' CONTRIBUTIONS

506 VKS conceived and coordinated the project. SM prepared the DNA and RNA samples, prepared the 507 samples for sequencing, and performed the species identification assay. AC and VKS designed the 508 computational framework of the study. AC performed all the computational analysis presented in 509 the study. AC, SM, and SKJ performed the functional annotation of gene sets. AC, SM, SKJ and VKS 510 analysed the data and interpreted the results. AC constructed the figures. AC, SM, SKJ and VKS wrote 511 the manuscript. All the authors have read and approved the final version of the manuscript.

512

513 ACKNOWLEDGEMENTS

514 AC and SM thank Council of Scientific and Industrial Research (CSIR) for fellowship. SKJ thanks 515 Department of Science and Technology for the DST-INSPIRE fellowship. The authors thank the 516 intramural research funds provided by IISER Bhopal.

517 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

518 DATA AVAILABILITY

519 The raw genome and transcriptome reads of Curcuma longa have been deposited in National Center 520 for Biotechnology Information (NCBI). bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

521 FIGURES

522 523 Figure 1. The complete workflow of the genome and transcriptome analysis of Curcuma longa bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

524

525 Figure 2. Phylogenetic tree of Curcuma longa with 15 other selected species and Arabidopsis 526 thaliana as an outgroup species. The values mentioned at the nodes correspond to the bootstrap 527 values.

528

529 Figure 3. Functional association of MSA genes with medicinal properties of Curcuma longa bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

530 REFERENCES

531 [1] S. Prasad and B. Aggarwal, “Turmeric, the Golden Spice,” 2011.

532 [2] N. H. Al-bahtiti, “A Study of Preservative Effects of Sesame Oil ( Sesamum indicum L . ) On 533 Mashed Potatoes,” vol. 2, no. 11, pp. 6–10, 2015.

534 [3] K. Ewon and A. S. Bhagya, “A review on golden species of Zingiberaceae family around the 535 world: Genus Curcuma,” African J. Agric. Res., 2019, doi: 10.5897/ajar2018.13755.

536 [4] A. Chakraborty, S. Kundu, S. Mukherjee, and B. Ghosh, “Endophytism in Zingiberaceae: 537 Elucidation of Beneficial Impact,” 2019.

538 [5] J. Kroymann, “Natural diversity and adaptation in plant secondary metabolism,” Current 539 Opinion in Plant Biology. 2011, doi: 10.1016/j.pbi.2011.03.021.

540 [6] J. L. Berini et al., “Combinations of abiotic factors differentially alter production of plant 541 secondary metabolites in five woody plant species in the boreal-temperate transition zone,” 542 Front. Plant Sci., 2018, doi: 10.3389/fpls.2018.01257.

543 [7] M. Wink, “Modes of Action of Herbal Medicines and Plant Secondary Metabolites,” 544 Medicines, 2015, doi: 10.3390/medicines2030251.

545 [8] T. Kita, S. Imai, H. Sawada, H. Kumagai, and H. Seto, “The biosynthetic pathway of 546 curcuminoid in turmeric (Curcuma longa) as revealed by 13C-labeled precursors,” Biosci. 547 Biotechnol. Biochem., 2008, doi: 10.1271/bbb.80075.

548 [9] A. Afzal, G. Oriqat, M. Akram Khan, J. Jose, and M. Afzal, “Chemistry and Biochemistry of 549 Terpenoids from Curcuma and Related Species,” J. Biol. Act. Prod. from Nat., 2013, doi: 550 10.1080/22311866.2013.782757.

551 [10] H. J. Koo and D. R. Gang, “Suites of Terpene Synthases Explain Differential Terpenoid 552 Production in and Turmeric Tissues,” PLoS One, 2012, doi: 553 10.1371/journal.pone.0051481.

554 [11] T. E. Sheeja, K. Deepa, R. Santhi, and B. Sasikumar, “Comparative Transcriptome Analysis of 555 Two Species of Curcuma Contrasting in a High-Value Compound Curcumin: Insights into 556 Genetic Basis and Regulation of Biosynthesis,” Plant Mol. Biol. Report., 2015, doi: 557 10.1007/s11105-015-0878-6.

558 [12] N. Singh and A. Sharma, “Turmeric (Curcuma longa): miRNAs and their regulating targets are 559 involved in development and secondary metabolite pathways,” Comptes Rendus - Biol., 2017, bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

560 doi: 10.1016/j.crvi.2017.09.009.

561 [13] J. S. Jurenka, “Anti-inflammatory properties of curcumin, a major constituent of Curcuma 562 longa: A review of preclinical and clinical research,” Alternative Medicine Review. 2009.

563 [14] A. Gupta et al., “ Association of Flavonifractor plautii , a Flavonoid-Degrading Bacterium, with 564 the Gut Microbiome of Colorectal Cancer Patients in India ,” mSystems, 2019, doi: 565 10.1128/msystems.00438-19.

566 [15] “ROLE OF FLAVONOIDS IN HUMAN NUTRITION AS HEALTH PROMOTING NATURAL 567 CHEMICALS - A REVIEW,” J. Appl. Pharm., 2014.

568 [16] L. G. Korkina, “Phenylpropanoids as naturally occurring : From plant defense to 569 human health,” Cellular and Molecular Biology. 2007, doi: 10.1170/T772.

570 [17] R. S. Annadurai et al., “De Novo Transcriptome Assembly (NGS) of Curcuma longa L. Rhizome 571 Reveals Novel Transcripts Related to Anticancer and Antimalarial Terpenoids,” PLoS One, 572 2013, doi: 10.1371/journal.pone.0056217.

573 [18] S. Zorofchian Moghadamtousi, H. Abdul Kadir, P. Hassandarvish, H. Tajik, S. Abubakar, and K. 574 Zandi, “A review on antibacterial, antiviral, and antifungal activity of curcumin,” BioMed 575 Research International. 2014, doi: 10.1155/2014/186864.

576 [19] I. Chattopadhyay, K. Biswas, U. Bandyopadhyay, and R. K. Banerjee, “Turmeric and curcumin: 577 Biological actions and medicinal applications,” Current Science. 2004.

578 [20] A. Rahmani, M. Alsahli, S. Aly, M. Khan, and Y. Aldebasi, “Role of Curcumin in Disease 579 Prevention and Treatment,” Adv. Biomed. Res., 2018, doi: 10.4103/abr.abr_147_16.

580 [21] Sreejayan and M. N. A. Rao, “Nitric oxide scavenging by curcuminoids,” J. Pharm. Pharmacol., 581 1997, doi: 10.1111/j.2042-7158.1997.tb06761.x.

582 [22] R. S. Bhardwaj, K. S. Bhardwaj, D. Ranjeet, and N. Ganesh, “Curcuma longa leaves exhibits a 583 potential antioxidant, antibacterial and immunomodulating properties,” Int. J. Phytomedicine, 584 2011.

585 [23] B. Dutta, “Study of secondary metabolite constituents and curcumin contents of six different 586 species of genus Curcuma,” J. Med. Plants Stud., 2015.

587 [24] A. Sahoo, S. Jena, S. Sahoo, S. Nayak, and B. Kar, “Resequencing of Curcuma longa L. cv. 588 Kedaram through transcriptome profiling reveals various novel transcripts,” Genomics Data, 589 2016, doi: 10.1016/j.gdata.2016.08.010. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

590 [25] A. Sahoo, B. Kar, S. Sahoo, A. Ray, and S. Nayak, “Transcriptome profiling of Curcuma longa L. 591 cv. Suvarna,” Genomics Data, 2016, doi: 10.1016/j.gdata.2016.09.001.

592 [26] J. Pellicer and I. J. Leitch, “The Plant DNA C-values database (release 7.1): an updated online 593 repository of plant genome size data for comparative studies,” New Phytologist. 2020, doi: 594 10.1111/nph.16261.

595 [27] J. T. Simpson and R. Durbin, “Efficient de novo assembly of large genomes using compressed 596 data structures,” Genome Res., 2012, doi: 10.1101/gr.126953.111.

597 [28] N. I. Weisenfeld, V. Kumar, P. Shah, D. M. Church, and D. B. Jaffe, “Direct determination of 598 diploid genome sequences,” Genome Res., 2017, doi: 10.1101/gr.214874.116.

599 [29] S. D. Jackman et al., “Tigmint: Correcting assembly errors using linked reads from large 600 molecules,” BMC Bioinformatics, 2018, doi: 10.1186/s12859-018-2425-6.

601 [30] S. Yeo, L. Coombe, R. L. Warren, J. Chu, and I. Birol, “ARCS: Scaffolding genome drafts with 602 linked reads,” Bioinformatics, 2018, doi: 10.1093/bioinformatics/btx675.

603 [31] S. V. Zhang, L. Zhuo, and M. W. Hahn, “AGOUTI: Improving genome assembly and annotation 604 using transcriptome data,” Gigascience, 2016, doi: 10.1186/s13742-016-0136-3.

605 [32] D. Paulino, R. L. Warren, B. P. Vandervalk, A. Raymond, S. D. Jackman, and I. Birol, “Sealer: A 606 scalable gap-closing application for finishing draft genomes,” BMC Bioinformatics, 2015, doi: 607 10.1186/s12859-015-0663-4.

608 [33] B. J. Walker et al., “Pilon: An integrated tool for comprehensive microbial variant detection 609 and genome assembly improvement,” PLoS One, 2014, doi: 10.1371/journal.pone.0112963.

610 [34] F. A. Simão, R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, and E. M. Zdobnov, “BUSCO: 611 Assessing genome assembly and annotation completeness with single-copy orthologs,” 612 Bioinformatics, 2015, doi: 10.1093/bioinformatics/btv351.

613 [35] N. Matasci et al., “Data access for the 1,000 Plants (1KP) project,” GigaScience. 2014, doi: 614 10.1186/2047-217X-3-17.

615 [36] A. M. Bolger, M. Lohse, and B. Usadel, “Trimmomatic: A flexible trimmer for Illumina 616 sequence data,” Bioinformatics, 2014, doi: 10.1093/bioinformatics/btu170.

617 [37] B. J. Haas et al., “De novo transcript sequence reconstruction from RNA-seq using the Trinity 618 platform for reference generation and analysis,” Nat. Protoc., 2013, doi: 619 10.1038/nprot.2013.084. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

620 [38] L. Fu, B. Niu, Z. Zhu, S. Wu, and W. Li, “CD-HIT: Accelerated for clustering the next-generation 621 sequencing data,” Bioinformatics, 2012, doi: 10.1093/bioinformatics/bts565.

622 [39] J. M. Flynn et al., “RepeatModeler2 for automated genomic discovery of transposable 623 element families,” Proc. Natl. Acad. Sci. U. S. A., 2020, doi: 10.1073/pnas.1921046117.

624 [40] M. S. Campbell, C. Holt, B. Moore, and M. Yandell, “Genome Annotation and Curation Using 625 MAKER and MAKER-P,” Curr. Protoc. Bioinforma., 2014, doi: 10.1002/0471250953.bi0411s48.

626 [41] M. Stanke, O. Keller, I. Gunduz, A. Hayes, S. Waack, and B. Morgenstern, “AUGUSTUS: A b 627 initio prediction of alternative transcripts,” Nucleic Acids Res., 2006, doi: 10.1093/nar/gkl200.

628 [42] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search 629 tool,” J. Mol. Biol., 1990, doi: 10.1016/S0022-2836(05)80360-2.

630 [43] G. Benson, “Tandem repeats finder: A program to analyze DNA sequences,” Nucleic Acids 631 Res., 1999, doi: 10.1093/nar/27.2.573.

632 [44] S. Griffiths-Jones, H. K. Saini, S. Van Dongen, and A. J. Enright, “miRBase: Tools for microRNA 633 genomics,” Nucleic Acids Res., 2008, doi: 10.1093/nar/gkm952.

634 [45] P. P. Chan and T. M. Lowe, “tRNAscan-SE: Searching for tRNA genes in genomic sequences,” 635 in Methods in Molecular Biology, 2019.

636 [46] B. E. Suzek, Y. Wang, H. Huang, P. B. McGarvey, and C. H. Wu, “UniRef clusters: A 637 comprehensive and scalable alternative for improving sequence similarity searches,” 638 Bioinformatics, 2015, doi: 10.1093/bioinformatics/btu739.

639 [47] A. Bateman, “The Pfam protein families database,” Nucleic Acids Res., 2004, doi: 640 10.1093/nar/gkh121.

641 [48] R. D. Finn, J. Clements, and S. R. Eddy, “HMMER web server: Interactive sequence similarity 642 searching,” Nucleic Acids Res., 2011, doi: 10.1093/nar/gkr367.

643 [49] B. Buchfink, C. Xie, and D. H. Huson, “Fast and sensitive protein alignment using DIAMOND,” 644 Nature Methods. 2014, doi: 10.1038/nmeth.3176.

645 [50] S. K. Jaiswal et al., “Genome Sequence of Peacock Reveals the Peculiar Case of a Glittering 646 Bird,” Front. Genet., 2018, doi: 10.3389/fgene.2018.00392.

647 [51] D. Bolser, D. M. Staines, E. Pritchard, and P. Kersey, “Ensembl plants: Integrating tools for 648 visualizing, mining, and analyzing plant genomics data,” in Methods in Molecular Biology, bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

649 2016.

650 [52] D. M. Emms and S. Kelly, “OrthoFinder: Phylogenetic orthology inference for comparative 651 genomics,” Genome Biol., 2019, doi: 10.1186/s13059-019-1832-y.

652 [53] D. R. Laetsch and M. L. Blaxter, “KinFin: Software for taxon-aware analysis of clustered 653 protein sequences,” G3 Genes, Genomes, Genet., 2017, doi: 10.1534/g3.117.300233.

654 [54] K. Katoh and D. M. Standley, “MAFFT multiple sequence alignment software version 7: 655 Improvements in performance and usability,” Mol. Biol. Evol., 2013, doi: 656 10.1093/molbev/mst010.

657 [55] A. Stamatakis, “RAxML version 8: A tool for phylogenetic analysis and post-analysis of large 658 phylogenies,” Bioinformatics, 2014, doi: 10.1093/bioinformatics/btu033.

659 [56] T. Jombart and S. Dray, “Adephylo: Exploratory Analyses for the Phylogenetic Comparative 660 Method,” Bioinformatics, 2010, doi: 10.1093/bioinformatics/btq292.

661 [57] P. C. Ng and S. Henikoff, “SIFT: Predicting amino acid changes that affect protein function,” 662 Nucleic Acids Res., 2003, doi: 10.1093/nar/gkg509.

663 [58] A. Bateman et al., “UniProt: A hub for protein information,” Nucleic Acids Res., 2015, doi: 664 10.1093/nar/gku989.

665 [59] K. Liu et al., “SATé-II: Very fast and accurate simultaneous estimation of multiple sequence 666 alignments and phylogenetic trees,” Syst. Biol., 2012, doi: 10.1093/sysbio/syr095.

667 [60] A. Löytynoja, “Phylogeny-aware alignment with PRANK,” Methods Mol. Biol., 2014, doi: 668 10.1007/978-1-62703-646-7_10.

669 [61] R. C. Edgar, “MUSCLE: A multiple sequence alignment method with reduced time and space 670 complexity,” BMC Bioinformatics, 2004, doi: 10.1186/1471-2105-5-113.

671 [62] P. Rice, L. Longden, and A. Bleasby, “EMBOSS: The European Molecular Biology Open 672 Software Suite,” Trends in Genetics. 2000, doi: 10.1016/S0168-9525(00)02024-2.

673 [63] Z. Yang, “PAML 4: Phylogenetic analysis by maximum likelihood,” Mol. Biol. Evol., 2007, doi: 674 10.1093/molbev/msm088.

675 [64] P. Mittal, S. K. Jaiswal, N. Vijay, R. Saxena, and V. K. Sharma, “Comparative analysis of 676 corrected tiger genome provides clues to its neuronal evolution,” Sci. Rep., 2019, doi: 677 10.1038/s41598-019-54838-z. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

678 [65] Y. Moriya, M. Itoh, S. Okuda, A. C. Yoshizawa, and M. Kanehisa, “KAAS: An automatic genome 679 annotation and pathway reconstruction server,” Nucleic Acids Res., 2007, doi: 680 10.1093/nar/gkm321.

681 [66] J. Huerta-Cepas et al., “Fast genome-wide functional annotation through orthology 682 assignment by eggNOG-mapper,” Mol. Biol. Evol., 2017, doi: 10.1093/molbev/msx148.

683 [67] Y. Liao, J. Wang, E. J. Jaehnig, Z. Shi, and B. Zhang, “WebGestalt 2019: gene set analysis toolkit 684 with revamped UIs and APIs,” Nucleic Acids Res., 2019, doi: 10.1093/nar/gkz401.

685 [68] Y. Katsuyama, T. Kita, and S. Horinouchi, “Identification and characterization of multiple 686 curcumin synthases from the herb Curcuma longa,” FEBS Lett., 2009, doi: 687 10.1016/j.febslet.2009.07.029.

688 [69] R. Singh et al., “Oil palm genome sequence reveals divergence of interfertile species in Old 689 and New worlds,” Nature, 2013, doi: 10.1038/nature12309.

690 [70] S. N. Chan, N. Abu Bakar, M. Mahmood, C. L. Ho, and N. A. Shaharuddin, “Molecular Cloning 691 and Characterization of Novel Phytocystatin Gene from Turmeric, Curcuma longa,” Biomed 692 Res. Int., 2014, doi: 10.1155/2014/973790.

693 [71] R. Ren et al., “Widespread Whole Genome Duplications Contribute to Genome Complexity 694 and Species Diversity in Angiosperms,” Mol. Plant, 2018, doi: 10.1016/j.molp.2018.01.002.

695 [72] H. Qian and Y. Jin, “An updated megaphylogeny of plants, a tool for generating plant 696 phylogenies and an analysis of phylogenetic community structure,” J. Plant Ecol., 2016, doi: 697 10.1093/jpe/rtv047.

698 [73] K. Silvera, K. Winter, B. Leticia Rodriguez, R. L. Albion, and J. C. Cushman, “Multiple isoforms 699 of phosphoenolpyruvate carboxylase in the Orchidaceae (subtribe Oncidiinae): Implications 700 for the evolution of crassulacean acid metabolism,” J. Exp. Bot., 2014, doi: 701 10.1093/jxb/eru234.

702 [74] J. Zeier, “New insights into the regulation of plant immunity by amino acid metabolic 703 pathways,” Plant, Cell and Environment. 2013, doi: 10.1111/pce.12122.

704 [75] E. Widemann et al., “The amidohydrolases IAR3 and ILL6 contribute to jasmonoyl-isoleucine 705 hormone turnover and generate 12-hydroxyjasmonic acid upon wounding in arabidopsis 706 leaves,” J. Biol. Chem., 2013, doi: 10.1074/jbc.M113.499228.

707 [76] T. Vellosillo et al., “Defense activated by 9-lipoxygenase-derived oxylipins requires specific bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

708 mitochondrial proteins,” Plant Physiol., 2013, doi: 10.1104/pp.112.207514.

709 [77] J. Grimplet, D. Pimentel, P. Agudelo-Romero, J. M. Martinez-Zapater, and A. M. Fortes, “The 710 LATERAL ORGAN BOUNDARIES Domain gene family in grapevine: Genome-wide 711 characterization and expression analyses during developmental processes and stress 712 responses,” Sci. Rep., 2017, doi: 10.1038/s41598-017-16240-5.

713 [78] O. S. D. Wally, M. M. Mira, R. D. Hill, and C. Stasolla, “Hemoglobin regulation of plant 714 embryogenesis and plant pathogen interaction,” Plant Signaling and Behavior. 2013, doi: 715 10.4161/psb.25264.

716 [79] R. F. Benevenuto, T. Seldal, S. J. Hegland, C. Rodriguez-Saona, J. Kawash, and J. Polashock, 717 “Transcriptional profiling of methyl jasmonate-induced defense responses in bilberry 718 (Vaccinium myrtillus L.),” BMC Plant Biol., 2019, doi: 10.1186/s12870-019-1650-0.

719 [80] X. Zhou, Y. Jiang, and D. Yu, “WRKY22 transcription factor mediates dark-induced leaf 720 senescence in Arabidopsis,” Mol. Cells, 2011, doi: 10.1007/s10059-011-0047-1.

721 [81] Y. Li, R. Scott, J. Doughty, M. Grant, and B. Qi, “Protein S-Acyltransferase 14: A specific role 722 for palmitoylation in leaf senescence in arabidopsis1[OPEN],” Plant Physiol., 2016, doi: 723 10.1104/pp.15.00448.

724 [82] A. Singh, P. Dwivedi, and C. Padmanabh Dwivedi, “Methyl-jasmonate and salicylic acid as 725 potent elicitors for secondary metabolite production in medicinal plants: A review,” J. 726 Pharmacogn. Phytochem., 2018.

727 [83] L. Zhang, B. C. Paasch, J. Chen, B. Day, and S. Y. He, “An important role of l-fucose 728 biosynthesis and protein fucosylation genes in Arabidopsis immunity,” New Phytol., 2019, 729 doi: 10.1111/nph.15639.

730 [84] W. Wang, G. Liu, H. Niu, M. P. Timko, and H. Zhang, “The F-box protein COI1 functions 731 upstream of MYB305 to regulate primary carbohydrate metabolism in tobacco (Nicotiana 732 tabacum L. cv. TN90),” J. Exp. Bot., 2014, doi: 10.1093/jxb/eru084.

733 [85] K. Kojo et al., “Regulatory mechanisms of ROI generation are affected by rice spl mutations,” 734 Plant Cell Physiol., 2006, doi: 10.1093/pcp/pcj074.

735 [86] L. R. Zeng, M. E. Vega-Sánchez, T. Zhu, and G. L. Wang, “Ubiquitination-mediated protein 736 degradation and modification: An emerging theme in plant-microbe interactions,” 2006, doi: 737 10.1038/sj.cr.7310053. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

738 [87] D. Yee and D. R. Goring, “The diversity of plant U-box E3 ubiquitin ligases: From upstream 739 activators to downstream target substrates,” 2009, doi: 10.1093/jxb/ern369.

740 [88] M. Moin, A. Bakshi, A. Saha, M. Dutta, S. M. Madhav, and P. B. Kirti, “Rice ribosomal protein 741 large subunit genes and their spatio-temporal and stress regulation,” Front. Plant Sci., 2016, 742 doi: 10.3389/fpls.2016.01284.

743 [89] S. Nagaraj, M. Senthil-Kumar, V. S. Ramu, K. Wang, and K. S. Mysore, “Plant ribosomal 744 proteins, RPL12 and RPL19, play a role in nonhost disease resistance against bacterial 745 pathogens,” Front. Plant Sci., 2016, doi: 10.3389/fpls.2015.01192.

746 [90] F. Qi and F. Zhang, “Cell Cycle Regulation in the Plant Response to Stress,” Frontiers in Plant 747 Science. 2020, doi: 10.3389/fpls.2019.01765.

748 [91] Z. Bao and J. Hua, “Interaction of CPR5 with cell cycle regulators UVI4 and OSD1 in 749 Arabidopsis,” PLoS One, 2014, doi: 10.1371/journal.pone.0100347.

750 [92] A. Mosblech, S. König, I. Stenzel, P. Grzeganek, I. Feussner, and I. Heilmann, 751 “Phosphoinositide and inositolpolyphosphate signalling in defense responses of Arabidopsis 752 thaliana challenged by mechanical wounding,” Mol. Plant, 2008, doi: 10.1093/mp/ssm028.

753 [93] K. Mikami, T. Katagiri, S. Luchi, K. Yamaguchi-Shinozaki, and K. Shinozaki, “A gene encoding 754 phosphatidylinositol-4-phosphate 5-kinase is induced by water stress and abscisic acid in 755 Arabidopsis thaliana,” Plant J., 1998, doi: 10.1046/j.1365-313X.1998.00227.x.

756 [94] M. Siqueira and L. Gomes, “Functional Diversity of Early Responsive to Dehydration (ERD) 757 Genes in Soybean,” in A Comprehensive Survey of International Soybean Research - Genetics, 758 Physiology, Agronomy and Nitrogen Relationships, 2013.

759 [95] J. C. Miller, W. R. Chezem, and N. K. Clay, “Ternary WD40 repeat-containing protein 760 complexes: Evolution, composition and roles in plant immunity,” Frontiers in Plant Science. 761 2016, doi: 10.3389/fpls.2015.01108.

762 [96] M. Sharma and G. K. Pandey, “Expansion and function of repeat domain proteins during 763 stress and development in plants,” Frontiers in Plant Science. 2016, doi: 764 10.3389/fpls.2015.01218.

765 [97] Y. Kawahara, T. Hashimoto, H. Nakayama, and Y. Kitamura, “Galactose oxidase/kelch repeat- 766 containing protein is involved in the iron deficiency stress response in the roots of 767 hyoscyamus albus,” Plant Root, 2017, doi: 10.3117/plantroot.11.58. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

768 [98] W. C. Liu, Y. H. Li, H. M. Yuan, B. L. Zhang, S. Zhai, and Y. T. Lu, “WD40-REPEAT 5a functions in 769 drought stress tolerance by regulating nitric oxide accumulation in Arabidopsis,” Plant Cell 770 Environ., 2017, doi: 10.1111/pce.12723.

771 [99] S. Choe, B. P. Dilkes, S. Fujioka, S. Takatsuto, A. Sakurai, and K. A. Feldmann, “The DWF4 gene 772 of Arabidopsis encodes a cytochrome P450 that mediates multiple 22α-hydroxylation steps in 773 brassinosteroid biosynthesis,” Plant Cell, 1998, doi: 10.1105/tpc.10.2.231.

774 [100] C. Gosch, H. Halbwirth, and K. Stich, “Phloridzin: Biosynthesis, distribution and physiological 775 relevance in plants,” Phytochemistry. 2010, doi: 10.1016/j.phytochem.2010.03.003.

776 [101] S. J. Nintemann et al., “Unravelling protein-protein interaction networks linked to aliphatic 777 and indole glucosinolate biosynthetic pathways in Arabidopsis,” Front. Plant Sci., 2017, doi: 778 10.3389/fpls.2017.02028.

779 [102] B. A. Moffatt and E. A. Weretilnyk, “Sustaining S-adenosyl-L-methionine-dependent 780 methyltransferase activity in plant cells,” Physiologia Plantarum. 2001, doi: 10.1034/j.1399- 781 3054.2001.1130401.x.

782 [103] T. Bureau, K. C. Lam, R. K. Ibrahim, B. Behdad, and S. Dayanandan, “Structure, function, and 783 evolution of plant O-methyltransferases,” Genome, 2007, doi: 10.1139/G07-077.

784 [104] S. K. Kohli et al., “Therapeutic potential of brassinosteroids in biomedical and clinical 785 research,” Biomolecules. 2020, doi: 10.3390/biom10040572.

786 [105] I. T. Johnson, “Glucosinolates: Bioavailability and importance to health,” 2002, doi: 787 10.1024/0300-9831.72.1.26.

788 [106] W. Guo, H. Nazim, Z. Liang, and D. Yang, “Magnesium deficiency in plants: An urgent 789 problem,” Crop Journal. 2016, doi: 10.1016/j.cj.2015.11.003.

790 [107] H. H. Nour-Eldin et al., “NRT/PTR transporters are essential for translocation of glucosinolate 791 defence compounds to seeds,” Nature, 2012, doi: 10.1038/nature11285.

792 [108] H. D. Pham et al., “The developmental and iron nutritional pattern of PIC1 and NiCo does not 793 support their interdependent and exclusive collaboration in chloroplast iron transport in 794 Brassica napus,” Planta, 2020, doi: 10.1007/s00425-020-03388-0.

795 [109] S. Ahmed et al., “Altered expression of polyamine transporters reveals a role for spermidine 796 in the timing of flowering and other developmental response pathways,” Plant Sci., 2017, doi: 797 10.1016/j.plantsci.2016.12.002. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

798 [110] L. Chen et al., “Identification and expression analysis of MATE genes involved in flavonoid 799 transport in blueberry plants,” PLoS One, 2015, doi: 10.1371/journal.pone.0118578.

800 [111] X. Wang et al., “Identification and functional characterization of three type III polyketide 801 synthases from Aquilaria sinensis calli,” Biochem. Biophys. Res. Commun., 2017, doi: 802 10.1016/j.bbrc.2017.03.159.

803 [112] J. L. Rodrigues, K. L. J. Prather, L. D. Kluskens, and L. R. Rodrigues, “Heterologous Production 804 of Curcuminoids,” Microbiol. Mol. Biol. Rev., 2015, doi: 10.1128/mmbr.00031-14.

805 [113] I. S. Sandeep et al., “Differential expression of CURS gene during various growth stages, 806 climatic condition and soil nutrients in turmeric (Curcuma longa): Towards site specific 807 cultivation for high curcumin yield,” Plant Physiol. Biochem., 2017, doi: 808 10.1016/j.plaphy.2017.07.001.

809 [114] S. Wannapinpong, K. Srikulnath, A. Thongpan, K. Choowongkomon, and S. Peyachoknagul, 810 “Molecular cloning and characterization of the CHS gene family in turmeric (Curcuma longa 811 Linn.),” J. Plant Biochem. Biotechnol., 2013, doi: 10.1007/s13562-013-0232-8.

812 [115] Y. Pang et al., “Characterization and expression of chalcone synthase gene from Ginkgo 813 biloba,” Plant Sci., 2005, doi: 10.1016/j.plantsci.2005.02.003.

814 [116] X. Xue et al., “Curcumin as a multidrug resistance modulator - A quick review,” Biomedicine 815 and Preventive Nutrition. 2013, doi: 10.1016/j.bionut.2012.12.001.

816 [117] A. Ott et al., “Linked read technology for assembling large complex and polyploid genomes,” 817 BMC Genomics, 2018, doi: 10.1186/s12864-018-5040-z.

818 [118] C. Q. Xu et al., “Genome sequence of Malania oleifera, a tree with great value for nervonic 819 acid production,” Gigascience, 2019, doi: 10.1093/gigascience/giy164.

820 [119] A. Amalraj, A. Pius, S. Gopi, and S. Gopi, “Biological activities of curcuminoids, other 821 biomolecules from turmeric and their derivatives – A review,” Journal of Traditional and 822 Complementary Medicine. 2017, doi: 10.1016/j.jtcme.2016.05.005.

823 [120] H. Q. Ling et al., “Genome sequence of the progenitor of wheat A subgenome Triticum 824 urartu,” Nature, 2018, doi: 10.1038/s41586-018-0108-0.

825 [121] A. D’hont et al., “The banana (Musa acuminata) genome and the evolution of 826 monocotyledonous plants,” Nature, 2012, doi: 10.1038/nature11241.

827 [122] U. P. Rayirath, R. R. Lada, C. D. Caldwell, S. K. Asiedu, and K. J. Sibley, “Role of ethylene and bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.286245; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

828 jasmonic acid on rhizome induction and growth in rhubarb (Rheum rhabarbarum L.),” Plant 829 Cell. Tissue Organ Cult., 2011, doi: 10.1007/s11240-010-9861-y.

830 [123] T. T. H. Dao, H. J. M. Linthorst, and R. Verpoorte, “Chalcone synthase and its functions in 831 plant resistance,” Phytochem. Rev., 2011, doi: 10.1007/s11101-011-9211-7.

832 [124] S. A. Pandith et al., “Functional promiscuity of two divergent paralogs of type III plant 833 polyketide synthases,” Plant Physiol., 2016, doi: 10.1104/pp.16.00003.

834 [125] A. Ramakrishna and G. A. Ravishankar, “Influence of abiotic stress signals on secondary 835 metabolites in plants,” Plant Signaling and Behavior. 2011, doi: 10.4161/psb.6.11.17613.

836 [126] T. Isah, “Stress and defense responses in plant secondary metabolites production,” Biological 837 research. 2019, doi: 10.1186/s40659-019-0246-3.

838

839