Planta DOI 10.1007/s00425-015-2304-6

ORIGINAL ARTICLE

Next-generation sequencing (NGS) transcriptomes reveal association of multiple genes and pathways contributing to secondary metabolites accumulation in tuberous roots of Aconitum heterophyllum Wall.

Tarun Pal¹ • Nikhil Malhotra¹ • Sree Krishna Chanumolu¹ • Rajinder Singh Chauhan¹

Received: 13 February 2015 / Accepted: 10 April 2015 © Springer-Verlag Berlin Heidelberg 2015

Abstract
Main conclusion The transcriptomes of Aconitum heterophyllum were assembled and characterized for the first time to decipher molecular components contributing to biosynthesis and accumulation of metabolites in tuberous roots.

Aconitum heterophyllum Wall., popularly known as Atis, is a high-value medicinal herb of North-Western Himalayas. No information exists as of today on genetic factors contributing to the biosynthesis of secondary metabolites accumulating in tuberous roots, thereby limiting genetic interventions towards genetic improvement of A. heterophyllum. Illumina paired-end sequencing followed by de novo assembly yielded 75,548 transcripts for root transcriptome and 39,100 transcripts for shoot transcriptome with minimum length of 200 bp. Biological role analysis of root versus shoot transcriptomes assigned 27,596 and 16,604 root transcripts; 12,340 and 9398 shoot transcripts into gene ontology and clusters of orthologous group, respectively. KEGG pathway mapping assigned 37 and 31 transcripts onto starch–sucrose metabolism while 329 and 341 KEGG orthologies associated with transcripts were found to be involved in biosynthesis of various secondary metabolites for root and shoot transcriptomes, respectively. In silico expression profiling of the mevalonate/2-C-methyl-D-erythritol 4-phosphate (non-mevalonate) pathway genes for aconites biosynthesis revealed 4 genes HMGR (3-hydroxy-3-methylglutaryl-CoA reductase), MVK (mevalonate kinase), MVDD (mevalonate diphosphate decarboxylase) and HDS (1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase) with higher expression in root transcriptome compared to shoot transcriptome suggesting their key role in biosynthesis of aconite alkaloids. Five genes, GMPase (geranyl diphosphate mannose pyrophosphorylase), SHAGGY, RBX1 (RING-box protein 1), SRF receptor kinases and β-amylase, implicated in tuberous root formation in other species showed higher levels of expression in tuberous roots compared to shoots. A total of 15,487 transcription factors belonging to bHLH, MYB, bZIP families and 399 ABC transporters which regulate biosynthesis and accumulation of bioactive compounds were identified in root and shoot transcriptomes. The expression of 5 ABC transporters involved in tuberous root development was validated by quantitative PCR analysis. Network connectivity diagrams were drawn for starch–sucrose metabolism and isoquinoline alkaloid biosynthesis associated with tuberous root growth and secondary metabolism, respectively, in root transcriptome of A. heterophyllum. The current endeavor will be of practical importance in planning a suitable genetic intervention strategy for the improvement of A. heterophyllum.

Tarun Pal and Nikhil Malhotra contributed equally to this work.

Keywords Aconitum · Network connectivity diagrams · Pathway mapping · RNA-seq · Transcript abundance · Transcriptome analysis · Tuberous roots

Electronic supplementary material The online version of this article (doi:10.1007/s00425-015-2304-6) contains supplementary material, which is available to authorized users.

✉ Rajinder Singh Chauhan [email protected]

1 Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Waknaghat 173234, Himachal Pradesh, India

Abbreviations
AHSR Root transcriptome
AHSS Shoot transcriptome
COG Cluster of orthologous group

GO Gene ontology
KO KEGG orthology
MEP 2-C-Methyl-D-erythritol 4-phosphate
MVA Mevalonate
NGS Next-generation sequencing
FPKM Fragments per kilobase of transcript per million fragments mapped
SSR Simple sequence repeat

Introduction

Herbs are staging a comeback and 'herbal revolution' is happening all over the globe. The North-Western Himalayas are a reservoir of important medicinal plant species that are used either in the preparation of herbal drugs or for the isolation of highly valuable phytochemicals/metabolites. Atis (Aconitum heterophyllum Wall.) is an important medicinal plant species belonging to family Ranunculaceae. It is a biennial herb distributed in sub-alpine and alpine regions of the North-Western Himalayas found at altitudes of 2400–3600 m. The tuberous roots of A. heterophyllum contain non-toxic alkaloids like atisine, aconitine, hetidine and heterophyllinine (Pelletier et al. 1968; Zhaohong et al. 2006) which are widely used in the treatment of diarrhea, vomiting, cough, cold, etc. (Thatte et al. 1993; Mitra et al. 2001). A. heterophyllum has been listed as a 'critically endangered species' by the International Union for Conservation of Nature and Natural Resources (Nautiyal et al. 2002; Srivastava et al. 2011). In order to meet the ever increasing industrial demands due to its vast medicinal properties and healthcare needs, A. heterophyllum needs to be grown commercially. No information exists as of today on genetic improvement of A. heterophyllum either towards increased production of secondary metabolites or in the biomass (tuberous roots) accumulating those metabolites. Moreover, a genetic intervention strategy would require information on candidate genes contributing to the formation of tuberous roots. Generating whole genome transcriptome of A. heterophyllum would, therefore, be an ideal starting point to capture genetic components contributing to a trait of economic importance (Hussain et al. 2012).

Tuberous roots are storage organs that store nutrients required as a source of energy for regeneration. Their formation and development constitute complex biological processes involving morphogenesis as well as dry matter accumulation. Several studies involving anatomical and physiological functions have been carried out on the tuberous root formation process of Ipomoea batatas and Manihot esculenta (Indira and Kurian 1977; Wang et al. 2005). The genes/proteins like SRF, GIGANTEA, MADS-box and NAM-like have been implicated in the tuberous root development in these plant species (You et al. 2003; Tanaka et al. 2005; Sheffield et al. 2006). The activity of AGPase and associated enzymes in tuberous roots has been found to regulate biomass yield in M. esculenta (Ihemere et al. 2006). Moreover, the underlying genetic mechanism that controls tuberous root formation in Rehmannia glutinosa has been studied to unravel the role of tuberous root development genes (Sun et al. 2010). However, no molecular information exists on the tuberous root development in A. heterophyllum.

The process of tuberous root development in this plant species is simple, yet interesting. The roots grow immediately after seed germination from the radicle and those primary roots transform into tuberous roots, rather than being formed from adventitious roots due to secondary growth of vascular cambium as in other plant species. Morphologically distinct developmental stages of tuberous root formation in A. heterophyllum are depicted in Fig. 1. The proliferation and swelling of primary tuberous root results in further branching and bulking of storage organ with the passage of time.

The availability of whole transcriptome data can be used not only to discover candidate genes involved in tuberous root development and secondary metabolites production, but also for understanding molecular basis of various biological processes (Sigurdsson et al. 2010; Xie et al. 2012). Further, the whole transcriptomes offer an opportunity to explore microsatellites in expressed genes which may enrich the number of molecular markers to assist DNA fingerprinting, gene mapping and marker-assisted breeding in A. heterophyllum. The next-generation sequencing (NGS) technique has been successfully utilized in analyzing non-model plant species including I. batatas (Wang et al. 2010c), Hevea brasiliensis (Xia et al. 2011), Jatropha curcas (Costa et al. 2010), Sesamum indicum (Wei et al. 2011), Picrorhiza kurroa (Gahlan et al. 2012), Boehmeria nivea (Liu et al. 2013) and so on. As a first step to gain insight into tuberous root development, we utilized Illumina paired-end sequencing technology to characterize molecular components that are possibly involved in tuberous root formation in A. heterophyllum. Fragments per kilobase of transcript per million mapped fragment (FPKM) based comparative expression profiling study was done to systematically characterize the RNAs to identify differentially regulated genes involved in tuberous root formation. Network connectivity diagrams were drawn for the molecules that interact in various developmental and regulatory processes for biosynthesis of aconites and tuberous root development in A. heterophyllum.

Fig. 1 Schematic representation of root developmental stages in A. heterophyllum showing morphological changes occurring during tuberous root development. Tuberous roots of 6-month and 12-month-old age group represent young stage (a, b); 18-month and 24-month-old age group represent intermediate stage (c, d); 36-month-old represents mature stage (e) having fully developed tuberous roots

Materials and methods

The stepwise detailed execution of methodology followed for the transcriptome analysis of two tissues is shown in Fig. 2.

Plant material

Seeds of A. heterophyllum were germinated under natural conditions in the nursery of Himalayan Forest Research Institute at Shilaru, Himachal Pradesh (2450 m altitude, 31°23′N, 77°44′E, India). Since the plant starts developing tuberous roots immediately after seed germination, roots and shoots of different age groups (6–18 months) were collected, pooled together, frozen immediately in liquid nitrogen, and stored at −80 °C until used in the isolation of RNA and preparation of NGS libraries.

RNA isolation and library construction for transcriptome analysis

Total RNA of roots for analyses of root transcriptome (AHSR) and shoots for shoot transcriptome (AHSS) was isolated by using RaFlex™ total RNA isolation kit as per manufacturer's instruction. The quality of RNA was analyzed on 1 % denaturing agarose gel. Quantification was done using NanoDrop 8000 spectrophotometer. The pair-end cDNA sequencing libraries were prepared for both samples, separately using Illumina TruSeq RNA Library Preparation Kit as per protocol. Library preparation was started with mRNA fragmentation followed by reverse transcription, second-strand synthesis, pair-end adapter ligation, and finally ended by index PCR amplification of adaptor-ligated library. Library quantification and qualification was performed on Caliper Lab Chip GX using HT DNA High Sensitivity Assay Kit.

De novo assembly and sequence clustering

Paired-end sequencing allows the template fragments to be sequenced in both the forward and reverse directions. Cluster generation was carried out by hybridization of template DNA molecules onto the oligonucleotide-coated surface of the flow cell. Immobilized DNA template copies were amplified by bridge amplification to generate clonal DNA clusters. This process of cluster generation was performed on cBOT using TruSeq PE Cluster kit v3-cBot-HS. The kit reagents were used in binding of samples to complementary adapter oligos on paired-end flow cells. The adapters were designed to allow selective cleavage of the forward DNA strand after resynthesis of the reverse strand during sequencing. The copied reverse strand was then used to sequence from the opposite end of the fragment. TruSeq SBS v3-HS kit was used to sequence DNA of each cluster on a flow cell using sequencing by synthesis technology on the HiSeq 2000.

Assembled sequenced reads were filtered with the quality assessment software, Trimmomatic v0.32 at quality threshold of 25. After adaptor trimming and quality filtration, the high-quality reads for both samples were


Fig. 2 Work flow of de novo whole transcriptome analysis for root and shoot transcriptomes of A. heterophyllum
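One step of this workflow selects a best k-mer assembly per sample using N50 and the transcriptome length covered. The following is a minimal sketch of how N50 can be computed and used to rank candidate assemblies. The function names, the ranking order (N50 first, total length second) and the toy lengths are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of transcript lengths (bp).

    N50 is the length L such that transcripts of length >= L together
    cover at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


def pick_best_kmer(assemblies):
    """Pick a k-mer size from a dict mapping k -> list of transcript lengths.

    Assumption: rank by N50 first and by total assembled length second.
    """
    return max(assemblies, key=lambda k: (n50(assemblies[k]), sum(assemblies[k])))


# Toy transcript lengths, invented for illustration only.
toy_assemblies = {
    43: [200, 300, 400, 500],
    51: [200, 300, 900, 1200],
}
best_k = pick_best_kmer(toy_assemblies)
```

In practice the lengths would be read from each assembly's transcript FASTA file, one list per k-mer run.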

assembled with Velvet pipeline (using de Bruijn graph algorithm) for different k-mer length (k-mer 43, 45, 47, 49, 51, 53, 55). Best k-mer assembly was selected for each sample based on N50 and transcriptome length covered and its respective transcripts were used for downstream analysis.

The assembled transcripts were used for GENSCAN gene prediction based on Arabidopsis model matrix for the identification of coding sequence (CDS), exons and peptides. The raw reads and related material including annotation, CDS, exons, gene ontology (GO), peptide datasets, etc., for AHSR and AHSS are available at the URL: http://14.139.240.55/NGS/download.php.

Functional annotation and analysis

Functional annotation of transcripts was performed by aligning contigs to non-redundant database of NCBI using BLASTX program. The transcripts were also scanned against the enzyme sequences (with significant E value <1e-5) to identify their role in MVA and MEP pathways. To further annotate assembled transcripts, the BLAST2GO program (Conesa et al. 2005) was used to retrieve GO annotation which differentiates according to molecular function, biological process and cellular component ontologies (http://www.geneontology.org/). In-house developed scripts were used to align the assembled sequences to

the COG database to predict and classify possible functions. The high-quality transcripts were searched against the COG database using BLAST program (with significant E value <1e-5) and the top hit observed for each given transcript was selected based on best E value which was further mapped to their respective COG IDs.

Fragment mapping and transcript abundance measurement

Transcript quantification for the de novo assembled sequences was carried out using RSEM approach which assesses the transcript abundances based on the mapping of RNA-Seq reads to the assembled transcriptome. It uses directed graphical model where paired end reads, length of fragment, probability of that read's sequence are modeled as well as lengths of read can vary (Li and Dewey 2011). RSEM calculates maximum likelihood abundance estimates as well as posterior mean estimates and 95 % credibility intervals for genes/isoforms. Additionally, FPKM (fragments per kilobase per million) level measurement was used which is a sensitive approach to detect expression level and measures expression of even poorly expressed transcripts using fragment count. Optimized rsem-prepare-reference and rsem-calculate-expression commands were used to prepare the reference sequences and to calculate the expression values using raw data, respectively. Expression data for all the transcripts from both the samples were collected.

Pfam domains search from transcripts

All the assembled protein sequences were scanned against the functional domain database Pfam using HMMER-3.1b1. Perl program, pfam_scan.pl and pfam library of HMMs for protein families were downloaded from Pfam website (http://pfam.janelia.org/) for identification of domains in protein sequences.

Identification of transcription factors and ABC transporters involved in secondary metabolites production

The transcription factors were predicted according to protein sequences obtained from CDS prediction. For the identification of transcription factors in AHSR and AHSS, these transcriptomes were subjected to BLAST hit with PlantTFDB with a significant cut-off E value <1e-5. On the basis of earlier reports, master list (MADS, C3H, PHD, AP2/ERF, WRKY, bHLH, MYB-related and bZIP) was prepared for transcription factors (TFs) involved in secondary metabolites production (Bhattacharyya et al. 2013). The top BLAST hits were mined using master list which gave us insight for those transcripts which are known to be involved in secondary metabolites production. The probable set of TFs for both the datasets was compared for unique and common TFs for further assessment. Domain architecture present in protein can be used to predict its functional class (Jaiswal et al. 2013); hence, for identification of ABC transporters, the protein sequences from both the samples were scanned for their domain architecture using Pfam domain database.

Expression analysis of ABC transporters through quantitative PCR

cDNA was prepared from 5 µg of RNA (RNA was treated with 2 U of DNase I), reverse transcribed by using M-MuLV reverse transcriptase (GeNei™) with oligo-dT primer. Equal sample quantities were verified by measuring the amount of RNA with a spectrophotometer. The cDNA was then separated by electrophoresis, stained with ethidium bromide to further verify equal concentrations. The reaction was performed in triplicate on a CFX96 system (Bio-Rad Laboratories, Hercules, CA, USA) with the iScript one step RT PCR kit (Bio-Rad). The PCR protocol was as follows: denaturation for 5 min at 94 °C, followed by 40 cycles each of denaturation for 20 s at 94 °C, annealing for 30 s at 51–59 °C, followed by one elongation step for 20 s at 72 °C (Suppl. Table S1). 26S rRNA was used as internal control for calculating transcript abundance. The significant differences between treatments were statistically evaluated by standard deviation.

Identification of SSR markers in root and shoot transcriptomes

Microsatellites for all the transcripts were detected in both the tissues with Simple Sequence Repeat Identification Tool program (SSRIT, http://www.gramene.org/db/markers/ssrtool) (Li et al. 2012b). In this study, the parameters were adjusted to identify motifs with two to nine nucleotides in size and having a minimum of 3 contiguous repeating units.

Identification of common and unique genes in A. heterophyllum root versus shoot transcriptomes

For comparative analysis, the root and shoot transcriptomes of A. heterophyllum were aligned in order to analyze the common and unique genes between them. BLASTN was used for finding similarity among transcripts of AHSR and AHSS with a cut-off E value threshold 1e-5. In order to identify common and unique genes in root, the genes from shoot were taken as database whereas the genes from root were taken as query sequence; on the other hand, to identify common and unique genes in shoot the genes from root were taken as database, whereas the genes from shoot were taken as query sequence. In-house perl script was used to perform BLASTN to extract results with mentioned cutoff; further results were also checked manually.

Mining of starch biosynthesis genes of A. heterophyllum in M. esculenta transcriptome

To identify starch biosynthesis genes in A. heterophyllum root transcriptome, transcripts were compared with the predicted cassava proteins for sequence similarity. Since starch biosynthesis pathway includes three processes, viz. sucrose synthesis, storage starch biosynthesis and Calvin cycle, their role was checked in roots of A. heterophyllum. Standalone TBLASTN was used for homology search (with cut-off E value threshold 1e-5) against identified starch-related proteins in cassava. Further extracted results with the mentioned cutoff were also checked manually.

Identification of genes involved in tuberous root formation and development

In order to identify genes involved in growth and development of tuberous roots, literature was mined and 10 genes (GMPase, SHAGGY, NOP10, expansin, early nodulin, calmodulin, RBX1, MAP kinase, SRF, β-amylase) were selected. All known sequences till date for these were mined from non-redundant (nr) NCBI protein database. Further, for similarity search, the assembled transcript sequences were scanned against nr sequences through BLASTX with the E value threshold of 1e-5. In-house perl script was used to extract results into tabular comma separated file.

Pathway mapping using KEGG

The KEGG automatic annotation server (KAAS) was used for representing the transcripts by BLAST comparing them with manually curated ortholog groups (KEGG genes) (Moriya et al. 2007). KEGG gene database has an edge over other databases, as it acts as a single resource for cross-species representation by assigning KEGG orthology (KO) to all available genomes. Based on BLAST scores and bi-directional hit between query and KEGG gene database, homologs above a particular cutoff were chosen as orthologs. These ortholog candidates were grouped by KO, and each KO group was assigned score. Finally, KO was ranked in accordance with the highest calculated scores. Based on this KAAS methodology, transcripts were assigned with KO identifiers which were further assigned enzyme commission (EC) number with default BLAST bit score threshold 60. These assigned unique EC numbers were further mapped onto their respective KEGG pathways.

Functional classification and network connectivity diagrams

In-house developed scripts were used to optically map the assembled transcripts to the known molecules that interact in a biological system using NCBI BioSystems database which incorporates recent updated records from several source databases like BioCyc, KEGG, Reactome, Pathway Interaction Database, Wikipathways and Gene Ontology. All the gene IDs mentioned in the annotation file were mined for all the transcripts sequences. Retrieved gene IDs were mapped onto their respective biological processes in NCBI BioSystems database. The transcripts with significant matches were assigned to 5 main categories. After mapping and classifying gene sequences onto their respective BioSystem, corresponding pathway ID was assigned to them in accordance with NCBI BioSystem linked file. For the entire interactions detected using BioSystems, network connectivity diagrams were drawn for starch-sucrose metabolism and isoquinoline alkaloids biosynthesis using perl-tk. Perl package manager GD arrow was downloaded from perl ActiveState and installed to draw arrows with aligned arrowheads.

Website development and platform used

The Website for free accessibility of detailed results corresponding to the analysis done was developed using HTML and PHP in combination with custom Java programs and Perl scripts. A computing cluster [each having 4 AMD Opteron (TM) Processor 6276 cores, 24 GB of memory] was used for allowing the simultaneous analysis and parallel processing of up to four different datasets. The operating systems are CentOS based 64-bit systems and programming languages are Perl and Perl-CGI.

Results

cDNA synthesis and quantification

In order to obtain a global overview of the transcriptome and gene activity, total RNA was purified from roots (AHSR) and shoots (AHSS) of A. heterophyllum. The quality of the RNA was determined by agarose gel electrophoresis with OD260/OD280 ratio (2.4 and 2.17) and OD260/OD230 ratio (2.05 and 2.16) for AHSR and AHSS.


Illumina paired-end sequencing and de novo sequence assembly

Using Illumina paired-end sequencing technology, each sequenced sample yielded 2 × 100-bp independent reads from each end of a cDNA fragment. In this study, a total of 49,131,411 (producing 23.8 GB paired-end data) and 30,641,740 (producing 14.8 GB paired-end data) raw sequencing reads were generated for AHSR and AHSS, respectively. After stringent quality checking and data filtering 46,612,687 (producing 22.1 GB paired-end data) and 28,777,415 (producing 13.6 GB paired-end data) high-quality reads were obtained for AHSR and AHSS, respectively. The high-quality reads for both the samples were assembled using Velvet assembler with optimized parameters. Velvet was run at different k-mer sizes ranging between 31 and 63 mers, in order to select the best k-mer based on N50 and transcriptome length covered. K-mer size 51 emerged best for AHSR and 43 for AHSS, respectively. Based on the assembler statistics, AHSR generated 75,548 transcripts with the length ranging from 200 to 12,376 bp, N50 of 1059 bp, GC content 42 % and average transcript length being 696 bp. Similarly, AHSS generated 39,100 transcripts with the length ranging from 200 to 19,757 bp, N50 of 1059 bp, GC content 42 % and average transcript length being 884 bp (Table 1).

Table 1 Transcriptome assembly statistics for roots and shoots samples of A. heterophyllum
Description                 Root transcriptome   Shoot transcriptome
Best k-mer                  k-mer-51             k-mer-43
Number of transcripts       75,548               39,100
Total transcript length     52,568,225           34,586,788
Transcript N50              1059                 1,239
Max transcript size (bp)    12,376               19,757
Min transcript size (bp)    200                  200
GC content (%)              42                   42

Functional annotation of transcripts

For validation and annotation of the assembled transcripts, sequence similarity search was conducted. The transcripts were searched against NCBI non-redundant (nr) and UniprotKB/Swissprot protein database using BLASTX program with an E value threshold of 1e-5. From the total pooled assembled high-quality transcript sequences (75,548), 46,850 sequences had significant BLAST hits with nr protein database while no hits were found for 28,698 sequences in AHSR, whereas for AHSS, total pooled transcripts were 39,100 sequences, out of which 36,326 sequences showed significant BLAST hit with nr database while no hits were found for 2,774 transcripts. While searching against UniprotKB/Swissprot database, of all the transcripts, 36,217 showed significant BLAST hit while no hits were found for 39,331 in AHSR, whereas in AHSS, 21,927 sequences showed significant hit and 17,173 sequences had no significant hit (at E value cutoff 1e-5). Maximum percentage of transcripts showed significant similarity mainly with Vitis vinifera species for both the samples (Figs. 3, 4). We applied GenScan based on Arabidopsis model matrix parameter and predicted 34,424 CDS, 41,700 exons and 34,424 peptides for AHSR and 23,149 CDS, 27,906 exons and 23,149 peptides for AHSS (Table 2).

Functional classification by GO and COG

Gene ontology (GO) is a standardized gene functional classification system whose terms are derived from dynamic controlled vocabularies or ontologies that can be used to describe the function of genes and their products in any organism. GO sequence distributions help in specifying all the annotated nodes comprising GO functional groups. Transcripts associated with similar functions are assigned to the same GO functional group. GO database provides three ontologies: molecular function, cellular component and biological process, which form the backbone of the GO annotation. The annotated transcripts for both the samples were then mapped in GO database. On the basis of non-redundant annotation, the BLAST2GO program (Conesa et al. 2005) was used to obtain GO annotation for the transcripts annotated by the non-redundant database. Transcripts with BLAST matches to known proteins were assigned to GO classes with 27,596 and 12,340 functional terms for AHSR and AHSS, respectively. As shown in Figs. 5 and 6, the assignments to the molecular function made up the majority (11,996, 43.47 % for AHSR; 5665, 45.90 % for AHSS), followed by the biological process (9688, 35.10 % for AHSR; 4228, 34.26 % for AHSS) and cellular component (5912, 21.42 % for AHSR; 2447, 19.82 % for AHSS).

The Clusters of Orthologous Groups (COG) is a database to phylogenetically classify the complete complement of proteins encoded in a genome. Every protein in the COG database is assumed to be evolved from an ancestor protein. All transcripts were aligned to the COG database to predict and classify plausible functions. A total of 16,604 and 9398 transcripts of AHSR and AHSS, respectively, were assigned to COG classifications. Among the 24 COG IDs, general function prediction was the largest group in both tissue samples (2870, 17.28 % in


Fig. 3 Transcripts similarity distribution for A. heterophyllum root transcriptome against different plant species

Fig. 4 Transcripts similarity distribution for A. heterophyllum shoot transcriptome against different plant species

AHSR and 1786, 19.00 % in AHSS) (Figs. 7, 8). More importantly, 563 (3.39 %) and 385 (4.09 %) transcripts were classified in the group of secondary metabolites biosynthesis in A. heterophyllum from AHSR and AHSS, respectively, which might unravel the mechanism behind transport and catabolism.

Identification of genes involved in MVA and MEP pathways for the production of aconites in A. heterophyllum using transcriptome mining

All the key genes involved in MVA and MEP pathways that are involved in the biosynthesis of aconites in A.


heterophyllum (Fig. 9) were identified and computationally mined using in-house developed perl script. We identified 15 enzymes from in-house generated transcriptomes, namely ACTH, HMGS, HMGR, MVK, PMK, MVDD, IPP, GDPS, DXPS, DXPR, ISPD, ISPE, MECPS, HDS, ISPH (enzymes listed in Fig. 9) for MVA and MEP pathways in AHSR and AHSS from nr database using BLASTX (Tables 3, 4). Gene expression in the two tissues was estimated using the FPKM approach, where fragment-counts of a particular transcript represent its expression level for a particular gene (Suppl. Fig. S1). Tables 3 and 4 contain detailed information on expected transcript abundance (expression level) in two tissues.

Table 2 Prediction summary of CDS, exons and peptides in root and shoot transcriptomes of A. heterophyllum
Description           Root transcriptome   Shoot transcriptome
Number of CDS         34,424               23,149
Number of exons       41,700               27,906
Number of peptides    34,424               23,149

Identification of transcription factors regulating secondary metabolism

Transcription factors (TFs) play important role in plant development and stress response by acting temporally and spatially on target genes (Jin et al. 2013). For identification of TFs in A. heterophyllum, transcripts from both the samples having significant cut-off value 1e-5 were subjected to BLAST against PlantTFDB. Literature mining was used to prepare master list (MADS, C3H, PHD, AP2/ERF, WRKY, bHLH, MYB-related and bZIP) for the TFs involved in secondary metabolites production. Transcriptome sequences from AHSR and AHSS were compared for unique TFs for their involvement in secondary metabolites production. For AHSR and AHSS, total transcripts having match in the database were 9169 and 6318, respectively. Top hits of TFs involved in secondary metabolites (match with 8 classes) were 3691 and unique TFs from these were

Fig. 5 GO distribution for A. heterophyllum root transcriptome


Fig. 6 GO distribution for A. heterophyllum shoot transcriptome
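The GO distributions in Figs. 5 and 6 correspond to the category counts reported in the text for the root (AHSR) and shoot (AHSS) transcriptomes. The short sketch below recomputes each category's share from those counts; the variable and function names are illustrative assumptions, and small differences in the last decimal from the printed percentages can arise from rounding.

```python
# GO category counts as reported in the text for AHSR (root) and AHSS (shoot).
go_counts = {
    "AHSR": {
        "molecular function": 11996,
        "biological process": 9688,
        "cellular component": 5912,
    },
    "AHSS": {
        "molecular function": 5665,
        "biological process": 4228,
        "cellular component": 2447,
    },
}


def category_shares(counts):
    """Return each category's percentage of the sample total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


shares = {sample: category_shares(counts) for sample, counts in go_counts.items()}
```

The per-sample totals recovered this way (27,596 and 12,340) match the numbers of GO assignments stated for AHSR and AHSS.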

Fig. 7 COG classification of A. heterophyllum root transcriptome


Fig. 8 COG classification of A. heterophyllum shoot transcriptome

Fig. 9 Common MVA/MEP pathways for diterpene alkaloids biosynthesis (adapted from Malhotra et al. 2014). MVA pathway: ACTH acetoacetyl-CoA thiolase, HMGS 3-hydroxy-3-methylglutaryl-CoA synthase, HMGR 3-hydroxy-3-methylglutaryl-CoA reductase, MVK mevalonate kinase, PMK phosphomevalonate kinase, MVDD mevalonate diphosphate decarboxylase, IPPI isopentenylpyrophosphate isomerase, GDPS geranyl diphosphate synthase. MEP pathway: DXPS 1-deoxy-D-xylulose 5-phosphate synthase, DXPR 1-deoxy-D-xylulose 5-phosphate reductoisomerase, ISPD 2-C-methylerythritol 4-phosphate cytidyl transferase, ISPE 4-(cytidine-5-diphospho)-2-C-methylerythritol kinase, MECPS 2-C-methylerythritol-2,4-cyclophosphate synthase, HDS 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase, ISPH 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase


Table 3 Transcript abundance of MVA and MEP pathways genes identified in A. heterophyllum shoot transcriptome
Gene    Gene ID(s)   Accession number   Transcript ID(s)    Length   Effective length   Expected count   TPM      FPKM
DXPS    2.2.1.7      AAB88295.1         Transcript_1017     2450     2297.53            5154.87          143.67   176.67
DXPR    1.1.1.267    CAB43344.1         Transcript_7612     1887     1734.53            2247.25          82.96    102.02
ISPD    2.7.7.60     2YCM               Transcript_27344    1394     1241.53            1528.46          78.83    96.94
ISPE    2.7.1.148    NP_180261.1        Transcript_11462    1549     1396.53            687              31.5     38.73
MECPS   4.6.1.12     2PMP               Transcript_3605     959      806.53             4130.23          327.92   403.23
HDS     1.17.7.1     BAB09833.1         Transcript_7021     2445     2292.53            2624             73.29    90.12
ISPH    1.17.1.2     BAD94833.1         Transcript_4360     1693     1540.53            5796.86          240.96   296.29
ACTH    2.3.1.9      NP_568694.2        Transcript_11170    1722     1569.53            443.01           18.07    22.22
HMGS    2.3.3.10     NP_849361.1        Transcript_5417     1652     1499.53            1614.7           68.95    84.79
HMGR    1.1.1.34     AAA33358.1         Transcript_2381     2037     1884.53            1155.12          39.25    48.26
MVK     2.7.1.36     AAD45421.1         Transcript_14392    1418     1265.53            30.83            1.56     1.92
PMK     2.7.4.2      AAH28659.1         Transcript_4131     3303     3150.53            1061.79          21.58    26.54
MVDD    4.1.1.33     NP_001068892.1     Transcript_33574    972      819.53             127.79           9.99     12.28
IPP     5.3.3.2      AAL57687.1         Transcript_1345     1273     1120.53            2202.97          125.89   154.8
GDPS    2.5.1.84     ABA97399.2         Transcript_5178     2059     1906.53            1606             53.94    66.33
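The FPKM column of Table 3 can be related to the expected count and effective length columns. The sketch below assumes the usual FPKM definition with the effective length in the denominator; the function names are illustrative, and the total number of mapped fragments is not stated in the text, so it is back-calculated here from one row and then checked against a second row.

```python
def fpkm(expected_count, effective_length, total_fragments):
    """FPKM = fragments per kilobase of (effective) transcript length per million fragments."""
    return expected_count * 1e9 / (effective_length * total_fragments)


def implied_total_fragments(expected_count, effective_length, fpkm_value):
    """Invert the FPKM formula to estimate the library-wide fragment total."""
    return expected_count * 1e9 / (effective_length * fpkm_value)


# (expected count, effective length, FPKM) taken from the DXPS and DXPR rows of Table 3.
dxps = (5154.87, 2297.53, 176.67)
dxpr = (2247.25, 1734.53, 102.02)

# Assumption: a single fragment total applies to every row of the shoot table.
total = implied_total_fragments(*dxps)
dxpr_recomputed = fpkm(dxpr[0], dxpr[1], total)
```

Under these assumptions the DXPR value recomputed from the DXPS-derived total agrees with the tabulated FPKM to within rounding, which suggests the columns are internally consistent.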

Table 4 Transcript abundance of MVA and MEP pathways genes identified in A. heterophyllum root transcriptome

Gene    Gene ID(s)   Accession number   Transcript ID(s)    Length   Effective length   Expected count   TPM      FPKM
DXPS    2.2.1.7      AAB88295.1         Transcript_5409     2872     2721.47            4830.96          46.18    57.68
DXPR    1.1.1.267    CAB43344.1         Transcript_1136     2116     1965.47            2936.23          38.87    48.54
ISPD    2.7.7.60     2YCM               Transcript_10174    1508     1357.47            990.23           18.98    23.7
ISPE    2.7.1.148    NP_180261.1        Transcript_64620    1521     1370.47            807              15.32    19.13
MECPS   4.6.1.12     2PMP               Transcript_1824     1259     1108.47            3927.36          92.18    115.13
HDS     1.17.7.1     BAB09833.1         Transcript_8398     2807     2656.47            8631             84.53    105.58
ISPH    1.17.1.2     BAD94833.1         Transcript_30192    2172     2021.47            1250.1           16.09    20.09
ACTH    2.3.1.9      NP_568694.2        Transcript_51038    1784     1633.47            1077.07          17.15    21.43
HMGS    2.3.3.10     CAA58763.1         Transcript_5690     1574     1423.47            711.88           13.01    16.25
HMGR    1.1.1.34     P29058.1           Transcript_3795     2434     2283.47            10052.89         114.54   143.06
MVK     2.7.1.36     AAD45421.1         Transcript_19736    1519     1368.47            594.59           11.3     14.12
PMK     2.7.4.2      NP_593421.3        Transcript_6636     2508     2357.47            312.56           3.45     4.31
MVDD    4.1.1.33     NP_001068892.1     Transcript_63336    1720     1569.47            691              11.45    14.31
IPP     5.3.3.2      AAL57687.1         Transcript_10661    1328     1177.47            4063.74          89.79    112.15
GDPS    2.5.1.84     ABA97399.2         Transcript_52592    872      721.53             188              6.78     8.47
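The root versus shoot comparison behind Tables 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' in-house script: the FPKM values are transcribed from the two tables, while the variable and function names are hypothetical. With a single library per tissue, the log2 ratio is only a descriptive measure and not a statistical test of differential expression.

```python
import math

# FPKM values transcribed from Table 3 (shoot, AHSS) and Table 4 (root, AHSR).
SHOOT_FPKM = {
    "DXPS": 176.67, "DXPR": 102.02, "ISPD": 96.94, "ISPE": 38.73,
    "MECPS": 403.23, "HDS": 90.12, "ISPH": 296.29, "ACTH": 22.22,
    "HMGS": 84.79, "HMGR": 48.26, "MVK": 1.92, "PMK": 26.54,
    "MVDD": 12.28, "IPP": 154.8, "GDPS": 66.33,
}
ROOT_FPKM = {
    "DXPS": 57.68, "DXPR": 48.54, "ISPD": 23.7, "ISPE": 19.13,
    "MECPS": 115.13, "HDS": 105.58, "ISPH": 20.09, "ACTH": 21.43,
    "HMGS": 16.25, "HMGR": 143.06, "MVK": 14.12, "PMK": 4.31,
    "MVDD": 14.31, "IPP": 112.15, "GDPS": 8.47,
}


def log2_ratio(root_fpkm: float, shoot_fpkm: float) -> float:
    """Return log2(root / shoot); positive values mean higher abundance in root."""
    return math.log2(root_fpkm / shoot_fpkm)


# Hypothetical helper structures: per-gene log2 ratio and the root-enriched subset.
LOG2_ROOT_VS_SHOOT = {
    gene: log2_ratio(ROOT_FPKM[gene], SHOOT_FPKM[gene]) for gene in ROOT_FPKM
}
HIGHER_IN_ROOT = sorted(g for g, v in LOG2_ROOT_VS_SHOOT.items() if v > 0)

if __name__ == "__main__":
    ranked = sorted(LOG2_ROOT_VS_SHOOT.items(), key=lambda kv: kv[1], reverse=True)
    for gene, value in ranked:
        print(f"{gene:6s} {value:+.2f}")
    print("Higher in root:", ", ".join(HIGHER_IN_ROOT))
```

On the tabulated values, the genes with a positive log2 ratio are the same four genes the article reports as more abundant in the root transcriptome.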

236 in AHSR, whereas in AHSS, top hits were 2520 and unique TFs were 105. TFs common to both AHSR and AHSS were found to be 448 (Suppl. Table S2a–c).

Identification of ABC transporters and their validation by quantitative PCR

ATP-binding cassette (ABC) transporters constitute the largest class of protein families implicated in the transport of various metabolites (Yazaki 2005). A total of 234 and 165 protein sequences having ABC-type transporter domains (including ABC_membrane_2 domains, PDR_assoc and Cytochrom_C_asm family) were identified in AHSR and AHSS, respectively, using the Pfam domain database (Suppl. Table S3 and Suppl. Table S4). For 13 protein sequences, no clan was predicted, and the remaining sequences were grouped into the CL0023, CL0241 and CL0328 clans. The comparative analysis of root versus shoot transcriptomes for ABC transporters resulted in the identification of 9 transcripts specific to tuberous root development in the A. heterophyllum root transcriptome. Five out of 9 transcripts were experimentally validated through comparative


quantitative PCR analysis. The expression profiles of the 5 ABC transporters tested (out of 9) correlated positively with the bioinformatics analysis. All 5 transcripts showed higher transcript abundance in AHSR compared to AHSS, giving a first insight into the molecular mechanisms occurring at the tissue level for the formation of tuberous roots in A. heterophyllum (Fig. 10).

Fig. 10 Expression status of ABC transporters in root versus shoot transcriptomes. Error bars correspond to statistical evaluation by standard deviation, where the reported margin of error is typically about twice the standard deviation (the half-width of a 95 % confidence interval). [Bar chart: relative expression (y-axis) of ABC 1 to ABC 5 (x-axis, ABC transporters) in AHSR and AHSS]

Identification of simple sequence repeats (SSRs) from transcriptome data

Marker development has been revolutionized with the advent of next-generation sequencing, and SSRs are important polymorphic markers in ecology and evolution (Rico et al. 2013). In order to identify new microsatellite markers, the transcriptome sequences generated for both samples were mined for potential SSRs with di- to deca-nucleotide motifs and a minimum of three repeats. Using the Simple Sequence Repeat Identification Tool (SSRIT), a total of 177,438 potential simple sequence repeats (SSRs) were identified in 56,692 transcripts for AHSR and a total of 118,814 potential SSRs were identified in 32,719 transcripts for AHSS. Of the 56,692 transcripts, 18,194 contained one SSR and 38,498 contained more than one SSR for AHSR. Furthermore, of the 32,719 transcripts, 7584 contained one SSR and 25,135 contained more than one SSR for AHSS (Table 5).

The di-nucleotide repeats were most abundant (127,937, 72.10 % in AHSR; 86,859, 73.10 % in AHSS), followed by tri- (44,523, 25.09 % in AHSR; 28,872, 24.30 % in AHSS), tetra- (2525, 1.42 % in AHSR; 1539, 1.29 % in AHSS), hexa- (1272, 0.71 % in AHSR; 914, 0.76 % in AHSS) and penta-nucleotide (801, 0.45 % in AHSR; 522, 0.43 % in AHSS) repeats (Table 5). Di- to deca-nucleotide motifs were further summarized by the number of repeat units. The most frequent number of repeat units was 3 (88.26 %, 156,609), followed by 4 (9.30 %, 16,513) and 5 (1.52 %, 2709) in AHSR (Table 6). A total of 55 potential SSRs contained 13 or more repeat units. Similar results were obtained for AHSS, where the most frequent number of repeat units was 3 (88.17 %, 104,768), followed by 4 (9.25 %, 11,000) and 5 (1.58 %, 1878). A total of 33 potential SSRs contained 13 or more repeat units (Table 7).

Comparative analysis of A. heterophyllum root versus shoot transcriptomes

One of the major objectives of transcriptome sequencing was to compare the transcriptomes for identifying differentially

Table 5 SSRs identified in the transcriptomes of A. heterophyllum

Exploratory item                              Root transcriptome   Shoot transcriptome
Total number of sequences examined            75,548               39,100
Total size of examined sequences (bp)         52,568,225           34,586,788
Total number of identified SSRs               177,438              118,814
Number of SSRs containing sequences           56,692               32,719
Number of sequences containing >1 SSRs        38,498               25,135
Di-nucleotide                                 127,937              86,859
Tri-nucleotide                                44,523               28,872
Tetra-nucleotide                              2525                 1539
Penta-nucleotide                              801                  522
Hexa-nucleotide                               1272                 914
Hepta-nucleotide                              168                  83
Octa-nucleotide                               47                   14
Nona-nucleotide                               163                  11
Deca-nucleotide                               2                    0
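The mining criteria stated in the text (di- to deca-nucleotide motifs with at least three tandem repeats) can be illustrated with a short Python sketch. It is not SSRIT and is not expected to reproduce the counts in Table 5; the function names `is_primitive` and `find_ssrs` and the example sequence are hypothetical, and the choice to skip motifs that are themselves repeats of a shorter unit (so that an AC repeat is not also reported as an ACAC repeat) is an assumption made here for clarity.

```python
def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ACAC')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 10, min_repeats: int = 3):
    """Scan a nucleotide sequence for perfect tandem repeats.

    Returns a list of (start, unit_length, motif, repeat_count) tuples,
    grouped by unit length from min_unit to max_unit.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        i = 0
        # A hit needs room for at least min_repeats copies of a k-mer.
        while i + k * min_repeats <= len(seq):
            motif = seq[i:i + k]
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_repeats:
                hits.append((i, k, motif, n))
                i += n * k  # jump past the repeat tract just reported
            else:
                i += 1
    return hits


if __name__ == "__main__":
    example = "TTGACACACACGGATCATCATCATCC"  # made-up sequence for illustration
    for start, unit, motif, count in find_ssrs(example):
        print(f"pos={start} unit={unit} motif={motif} repeats={count}")
```

Applied to each assembled transcript in turn, a scanner of this kind yields per-transcript SSR lists that can be tallied by motif length and repeat number in the manner of Tables 5 to 7.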


Table 6 Repeat unit distribution of SSRs in root transcriptome

No. of repeat unit   Di        Tri      Tetra   Penta   Hexa   Hepta   Octa   Nona   Deca   Total
3                    114,423   37,972   2227    700     997    119     40     129    2      156,609
4                    11,272    4702     231     78      184    18      4      24     0      16,513
5                    1373      1190     53      10      56     17      1      9      0      2709
6                    379       372      6       3       13     4       1      1      0      779
7                    209       153      4       6       3      8       1      0      0      384
8                    114       71       0       2       15     2       0      0      0      204
9                    65        17       1       1       4      0       0      0      0      88
10                   32        21       0       0       0      0       0      0      0      53
11                   21        8        0       1       0      0       0      0      0      30
12                   11        2        1       0       0      0       0      0      0      14
≥13                  38        15       2       0       0      0       0      0      0      55

Table 7 Repeat unit distribution of SSRs based on the number of repeat units in shoot transcriptome

No. of repeat unit   Di       Tri      Tetra   Penta   Hexa   Hepta   Octa   Nona   Deca   Total
3                    77,664   24,451   1391    456     718    65      12     11     0      104,768
4                    7662     3018     128     52      129    9       2      0      0      11,000
5                    939      860      12      9       53     5       0      0      0      1878
6                    249      294      5       3       2      4       0      0      0      557
7                    134      142      1       2       7      0       0      0      0      286
8                    63       56       0       0       5      0       0      0      0      124
9                    65       14       0       0       0      0       0      0      0      79
10                   30       11       0       0       0      0       0      0      0      41
11                   17       12       2       0       0      0       0      0      0      31
12                   16       1        0       0       0      0       0      0      0      17
≥13                  20       13       0       0       0      0       0      0      0      33

expressed transcripts in roots versus shoots. Of the total 75,548 transcripts in AHSR, significant similarity was found for 47,572 transcripts and 27,976 transcripts were unique to AHSR, whereas for AHSS, 36,212 out of 39,100 transcripts showed significant similarity with AHSR. Additionally, 2888 transcripts were uniquely present in AHSS (absent in AHSR) (Suppl. Table S5a, b).

Identification of starch sucrose biosynthetic pathways genes in A. heterophyllum vis-à-vis M. esculenta

Aconitum heterophyllum roots are made up of starch, analogous to the roots of M. esculenta. Therefore, all 270 proteins involved in starch biosynthesis in cassava (Saithong et al. 2013) were used to mine the AHSR transcriptome. Out of 129 cassava proteins having a function in the Calvin cycle, 127 showed high sequence similarity with AHSR, whereas all 141 cassava proteins involved in sucrose and starch metabolism were found to have significant sequence similarity with AHSR (at E value cutoff 1e-5) (Suppl. Table S6a, b). Furthermore, 85 transcripts involved in the Calvin cycle and 104 transcripts in the sucrose and starch pathway showed sequence similarity with E values of 0 for AHSR, the transcripts from tuberous roots of A. heterophyllum. The result file includes EC number (Phytozome), function (Phytozome), score, E value, identities, positives and gaps parameters for genes producing significant hits.

Possible key genes contributing to tuberous root formation and development

To identify possible candidate genes contributing to tuberous root formation and development in A. heterophyllum, 10 genes/proteins (GMPase, SHAGGY, NOP10, expansin, early nodulin, calmodulin, RBX1, MAP kinase, SRF, β-amylase) reported for tuberous root development in different plant species were mined against the AHSR transcriptome. Out of the 10 genes, 9 showed significant similarity; the exception was calmodulin. Here again, the FPKM approach was used to calculate the transcript abundance for each gene (Table 8).

Pathway mapping of transcripts using KEGG

KEGG automatic annotation server (KAAS) was used for ortholog assignment and mapping of transcripts onto their


Table 8 Transcript abundance of tuberous root development genes in A. heterophyllum root transcriptome

Gene            Gene ID(s)             Accession number   Transcript ID       Length   Effective length   Expected count   TPM      FPKM
GMPase          2.7.7.13               CAC35355.1         Transcript_5758     1435     1284.47            2790.79          56.53    70.6
SHAGGY          2.7.11.26              NP_001125343.1     Transcript_23205    2259     2108.47            7848.56          96.84    120.96
NOP10           2.20.28.40/5.4.99.25   YP_379759.1        Transcript_25378    970      819.48             136.18           4.32     5.4
Expansin        4.2.2.10               XP_001820360.2     Transcript_16536    1583     1432.47            398              7.23     9.03
Early nodulin   3.2.1.147              NP_197972.2        Transcript_5623     1675     1524.47            424.32           7.24     9.04
Calmodulin      2.1.1.60               –                  No Hits Found       –        –                  –                –        –
RBX1            6.3.2.19               NP_565192.1        Transcript_3637     2434     2283.47            13069.95         148.91   185.99
MAPkinase       2.7.11.24/2.7.12.2     XP_758504.1        Transcript_37759    1326     1175.47            143              3.16     3.95
SRF             1.14.11.19             O04274.1           Transcript_19751    1385     1234.47            440              9.27     11.58
β-amylase       3.2.1.2                O65015.1           Transcript_62668    1340     1189.47            1621.76          35.47    44.3
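The relationship between the Expected count, Effective length and FPKM columns of Table 8 can be checked with a short worked example. The sketch below assumes the standard FPKM definition, FPKM = 10^9 × count / (effective length × N), where N is the total number of mapped fragments in the library; N is not reported in this section, so it is back-calculated here from each row. The function and variable names are hypothetical and the rows are transcribed from Table 8.

```python
# (gene, expected count, effective length in bp, FPKM) transcribed from Table 8.
TABLE8_ROWS = [
    ("GMPase", 2790.79, 1284.47, 70.6),
    ("SHAGGY", 7848.56, 2108.47, 120.96),
    ("RBX1", 13069.95, 2283.47, 185.99),
    ("b-amylase", 1621.76, 1189.47, 44.3),
]


def implied_total_fragments(expected_count: float, effective_length: float, fpkm: float) -> float:
    """Solve FPKM = 1e9 * count / (effective_length * N) for N (assumed definition)."""
    return expected_count * 1e9 / (effective_length * fpkm)


# One estimate of N per gene; they should agree if the columns are internally consistent.
ESTIMATES = {
    gene: implied_total_fragments(count, eff_len, fpkm)
    for gene, count, eff_len, fpkm in TABLE8_ROWS
}
MEAN_ESTIMATE = sum(ESTIMATES.values()) / len(ESTIMATES)

if __name__ == "__main__":
    for gene, n in ESTIMATES.items():
        print(f"{gene:10s} implied N = {n / 1e6:.2f} million fragments")
```

If the rows give nearly the same N, the three columns are mutually consistent under the assumed definition, which is a quick sanity check on a transcribed abundance table.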

biological pathways. Homologs were identified between the query sequence and the reference sequence set (KEGG gene database) using a BLAST bit score cut-off value of 60 (default). The bi-directional best hit method was used for the analysis, and genes whose BHR was greater than 0.95 (default) were selected. A score was assigned to each ortholog group to allocate the best K number to the query genes. Out of the total transcripts, KAAS assigned EC numbers to 3487 sequences from AHSR and 3177 sequences from AHSS (Suppl. Table S7a–d). These transcripts were further mapped to 337 unique pathways in the root transcriptome and 333 unique pathways in the shoot transcriptome. They represented metabolic pathways for major biomolecules such as amino acids, carbohydrates, vitamins, nucleotides, etc. Because the medicinal value of the plant is attributed to its secondary metabolites, the mapped KOs were examined for this class: 341 mapped KOs for AHSR and 329 for AHSS were found to be involved in the biosynthesis of secondary metabolites (Suppl. Table S8 and Suppl. Table S9). As this plant is also a rich source of starch, pathways associated with starch were also mined. Carbon fixation in photosynthetic organisms was represented by 22 and 25 mapped KOs for AHSR and AHSS, respectively, whereas 37 (AHSR) and 31 (AHSS) transcripts were involved in starch and sucrose metabolism. BRITE categorized all the transcripts into a hierarchical order starting from metabolism, genetic information processing (transcription factors, translation factors, DNA replication), and signaling and cellular processes (secretion system proteins, ion channels, GTP-binding proteins) (Suppl. Table S10a, b).

Functional classification using NCBI BioSystems

A biosystem is a group of molecules that interact in a biological system. The NCBI BioSystems database centralizes and cross-links existing biological systems databases like KEGG, BioCyc, Reactome and others. It integrates their pathways and systems into NCBI resources, which allows quick categorization of proteins, genes and small molecules by metabolic pathway, disease state or other biosystem type (Geer et al. 2009). To further analyze the AHSR and AHSS transcriptomes, all transcripts were tested in the NCBI BioSystems database. The transcripts with significant matches in the database were assigned to 5 main categories, out of which genetic information processing was the biggest category (1821, 48.07 %), followed by metabolism (1638, 43.24 %), cellular processes (170, 4.48 %), organismal systems (113, 2.98 %) and environmental information processing (46, 1.21 %) in AHSR. On the other hand, metabolism was the biggest category (415, 48.25 %), followed by genetic information processing (359, 41.74 %), cellular processes (38, 4.41 %), organismal systems (32, 3.72 %) and environmental information processing (16, 1.86 %) in AHSS (Suppl. Fig. S2). These results indicated that active genetic information processing and metabolic processes were underway in the roots and shoots of A. heterophyllum.

For AHSR, 1821 transcripts were assigned to genetic information processing, including transcription, translation, replication and repair, etc. In addition, 1638 transcripts were classified into metabolism, comprising carbohydrate metabolism, energy metabolism, lipid metabolism, etc. A total of 170, 113 and 46 transcripts were classified into cellular processes, organismal systems and environmental information processing, respectively. Similarly, for AHSS, 415, 359, 38, 32 and 16 transcripts were assigned to the metabolism, genetic information processing, cellular processes, organismal systems and environmental information processing categories, respectively (Suppl. Table S11a, b). The functional classification of NCBI BioSystems provided a valuable resource for investigating specific processes,

functions and pathways involved in root versus shoot transcriptomes.

Construction of network connectivity diagrams

Transcriptome-based network connectivity diagrams can contribute to deciphering the interactions between expressed genes. As this plant is a rich source of starch, carbon fixation in photosynthetic organisms and starch and sucrose metabolism can be explored to increase the level of starch content. To gain an insight into such developmental mechanisms, network connectivity diagrams were constructed for isoquinoline alkaloid biosynthesis and starch–sucrose metabolism, associated with the biosynthesis of secondary metabolites and the development of tuberous roots, respectively, in the root transcriptome of A. heterophyllum using in-house developed scripts (Fig. 11 and Suppl. Fig. S3).

Discussion

In this work, we report the transcriptomes of A. heterophyllum roots and shoots, providing valuable resources for new gene discovery, pathway mapping, transcriptional regulation and SSR marker development, which will certainly accelerate research progress in the molecular biology of this plant species. The substantial number of transcripts obtained is expected to assist in understanding growth and development mechanisms, along with providing a strong basis for future genomic research. To the best of our knowledge, this is the first attempt to assemble and characterize the transcriptome of A. heterophyllum using the Illumina paired-end sequencing method. The biosynthetic pathway genes, TFs, ABC transporters and SSRs were predicted and characterized based on the transcriptome assembly.

Illumina paired end sequencing and assembly

The transcriptome represents the expressed portion of the genome and is often described as a prototype of transcribed genes. With the development of sequencing technology, many plants such as Arabidopsis, rice and maize have had their complete genomes sequenced (Initiative TAG 2000; Goff et al. 2002; Schnable et al. 2009). However, for some non-model plants, it is not feasible to execute whole genome sequencing because of the cost. Next-generation sequencing (NGS) provides an opportunity to mine the genes and deepen our understanding of growth and

development in non-model plants. The development of various NGS technologies like the Solexa/Illumina platform in the past decade has made it possible to perform de novo transcriptome sequencing (Hudson 2008; Wang et al. 2009). These technologies have been successfully utilized for non-model organisms (Wang et al. 2010a, b; Garg et al. 2011; Xie et al. 2012). This has expanded our understanding of gene expression, regulation and networks of important traits of the corresponding plants. The results from this research suggest that short reads from Illumina sequencing can be effectively assembled and used for gene identification and SSR marker development in a non-model plant species.

Fig. 11 Network connectivity diagram for isoquinoline alkaloids biosynthesis in root transcriptome of A. heterophyllum

In our study, the transcriptome sequencing was done using the Illumina Genome Analyser system platform HiSeq 2000. The HiSeq 2000 can provide sequences with a read length of 90 bp, which is longer than that provided by other sequencing platforms, such as GAII. In addition, the raw reads were stringently filtered before the de novo assembly. Overall, 31,472 genes were discovered in A. heterophyllum. In this study, more than 70 million high-quality reads were used to assemble the A. heterophyllum transcriptome. Sequence orientation of the transcripts discovered in this study was ascertained, along with the annotation of their function by searching against public databases. Sequence similarities presented new clues for determining the phylogenetic relationship among A. heterophyllum, V. vinifera, Ricinus communis and Populus trichocarpa. Significant sequence similarities of 39.96, 12.09 and 11.67 % between A. heterophyllum and V. vinifera, R. communis and P. trichocarpa, respectively, were observed for AHSR. Similarly, sequence similarities of 28.88 and 9.14 % between A. heterophyllum and V. vinifera and Theobroma cacao, respectively, were observed for AHSS in the present study. These plants are commonly placed in a monophyletic clade in the plant classification system (Endress 2002).

Functional annotation of transcripts

Since there is no reference genome for this plant species, it was difficult to estimate the number of genes and the level of transcript coverage. We indirectly evaluated the transcriptome coverage breadth by estimating the number of unique genes using the BLAST algorithm. A large number of transcripts could be matched with unique known proteins in public databases, which implied that the Illumina sequencing yielded a substantial fraction of unique genes from A. heterophyllum. The functions of the transcripts were classified by GO and COG, and the metabolic pathways were ascertained using the KEGG database. For the identification of genes involved in the MVA/MEP pathways present in A. heterophyllum, we used in-house developed scripts. The FPKM values for all 15 genes were predicted in both root and shoot transcriptomes. Genes such as HMGR, MVDD, MVK and HDS showed higher transcript abundance in AHSR compared to AHSS. HMGR is known to regulate the MVA pathway of isoprenogenesis for phytosterol biosynthesis (Nogués et al. 2006) and is also involved in shikonin plastidial monoterpenes biosynthesis in Arnebia euchroma (Singh et al. 2010), while genes like MVK and HDS are responsible for terpenoid-indole alkaloid production in Catharanthus roseus (Schulte et al. 2000; Ginis et al. 2012). These results are in agreement with the involvement of multiple genes of the MVA/MEP pathways in aconites biosynthesis in A. heterophyllum (Malhotra et al. 2014). The remaining genes of these pathways showed higher FPKM in AHSS, which may be due to their involvement in some other biological processes; this needs to be validated. These results are expected to help in exploring major genes for important agronomic traits in A. heterophyllum and in understanding their regulatory mechanisms, especially for the production of medicinally important secondary metabolites.

Role of transcription factors and ABC transporters in plant biological processes

The importance of TFs and ABC transporters in almost all plant species pertaining to secondary metabolite biosynthesis cannot be underrated, since these are known to regulate metabolite biosynthesis and accumulation in a spatial and temporal manner (Yazaki 2005). They regulate transportation, organ growth, plant nutrition, plant development, response to abiotic stress and secondary metabolite production. Knowledge of transporters provides a platform to gain insight into the molecular mechanisms behind the transportation and accumulation of a metabolite in different compartments of plants. Previous reports on Thalictrum minus and Coptis japonica have demonstrated the importance of ABC transporters in alkaloid accumulation, which may also be true for A. heterophyllum since it contains diterpene alkaloids (Zhaohong et al. 2006). Similarly, regulation of genes involved in anthocyanin (Mol et al. 1998), terpenoid-indole alkaloid (Memelink et al. 2001) and rutin biosynthesis (Gupta et al. 2011) is controlled by distinct classes of transcription factors, including bHLH, MYB, bZIP, etc., that control secondary metabolism. The regulatory role of the TFs identified can be further investigated for aconites biosynthesis in A. heterophyllum.

One of the major findings of the current study was the identification of ABC transporters involved in tuberous root formation in the A. heterophyllum root transcriptome. In silico analysis in conjunction with quantitative PCR implied the importance of ABC transporters in tuberous root development in this plant species. Five ABC transporters implicated in tuberous root formation showed higher transcript abundance in AHSR compared to AHSS.

Identification and characterization of SSR markers

It is well established that SSR markers are important in comparative genomics, marker-assisted selection breeding, assessment of genetic diversity, development of genetic maps, etc. No SSR markers have been developed until now, which limits the application of SSRs in A. heterophyllum. The transcriptome sequencing provided numerous SSR markers in A. heterophyllum. In total, 177,438 potential SSRs were identified in 56,692 transcripts of AHSR and 118,814 potential SSRs were identified in 32,719 transcripts of AHSS. These potential SSRs will provide a wealth of resources for developing SSR markers in this medicinally important plant species. Based on these identified SSRs, we will assess the polymorphism in A. heterophyllum and related species. They will assist in dissecting the genetic background of A. heterophyllum vis-à-vis distinct genotypes, in addition to genetic linkage map construction and association studies. This information will also help in developing metabolic engineering strategies to increase the production of atisine (aconites).

Transcriptome data identified starch biosynthesis, common and unique genes in A. heterophyllum

Starch is a major product of photosynthesis and is the most abundant storage carbohydrate after cellulose in many plant species (Umemoto et al. 2002). The starch biosynthesis pathway includes three main processes: the Calvin cycle, sucrose synthesis, and storage starch biosynthesis. However, the genes involved in the starch biosynthesis pathway in A. heterophyllum have not yet been identified. The 270 starch-related proteins identified in Manihot esculenta were searched for in A. heterophyllum. Out of these, only 2 Calvin cycle transcripts, cassava4.1_017330m and cassava4.1_023305m, involved in ribulose bisphosphate carboxylase, had no significant hit above cutoff in AHSR. Transcriptome data were used to identify and mine common and unique transcripts in AHSR and AHSS. Tissue-specific RNA transcriptome generated 32,237 root-specific and 3961 shoot-specific transcripts which can be explored and employed to disentangle missing links in the biosynthetic pathways of A. heterophyllum. The computationally identified potential transcripts can be further explored for their role in starch biosynthesis and to decipher molecular components unravelling the molecular response of A. heterophyllum at tissue level.

Identification of candidate genes involved in tuberous root formation and development

The transcriptome represents the complete set of transcribed genes from the expressed part of the genome. AHSR and AHSS transcriptome data were used to identify possible genes involved in tuberous root formation and development. Of the 10 genes involved in tuberous root formation, all showed significant sequence similarity to AHSR except calmodulin. Five out of 10 genes coding for enzymes/proteins, namely GMPase, SHAGGY, RBX1, SRF and β-amylase, showed a higher level of expression in the root transcriptome as compared to the shoot transcriptome. GMPase and AGPase (ADP-glucose pyrophosphorylase) play a primary role in the biosynthesis of cell wall carbohydrates (Conklin et al. 1999) and starch biosynthesis (Li et al. 2012a), respectively, and are thereby required for tuberous root development. The latter has been reported to accumulate in tuberous roots of I. batatas (Wang et al. 2005) and M. esculenta (Ihemere et al. 2006). Similarly, RBX1 helps in ubiquitination activities which are active in tuberous roots (Dreher and Callis 2007). Proteins like SHAGGY and SRF regulate plant development processes (Li et al. 2001; Tanaka et al. 2005). These results indicate that these genes might be involved in tuberous root development in A. heterophyllum. However, their functional validation needs to be done through gene function approaches.

Functional classification and network analysis

Networks have been used to represent important biological processes. Mapping genes (proteins) in a network shows their interconnections within the same biological process and their probable function in that process (Ideker and Krogen 2012). The transcripts with significant hits in the NCBI BioSystems database were assigned to 5 main categories, out of which the genetic information processing and metabolism categories were represented by more transcripts in AHSR (1821, 48.07 % genetic information processing; 1638, 43.24 % metabolism) than in AHSS (359, 41.74 % genetic information processing; 415, 48.25 % metabolism). The predicted network connectivity diagrams for starch–sucrose metabolism and isoquinoline alkaloid biosynthesis can be helpful to discover key interactions in the root transcriptome of A. heterophyllum. Since roots are the site of aconites biosynthesis in A. heterophyllum (Malhotra et al. 2014) and are majorly associated with its growth and development, the results confirmed these findings in AHSR compared to AHSS. We found a higher proportion of genes assigned to the corresponding NCBI BioSystems pathways, and it was inferred that the transcriptome sequence generated in this study will be valuable for further research on A. heterophyllum.
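The category shares quoted for the NCBI BioSystems classification can be recomputed from the transcript counts given in the text. The sketch below is only a worked arithmetic check under that assumption; the dictionary and function names are hypothetical, and the counts are those reported for AHSR and AHSS.

```python
# Transcript counts per NCBI BioSystems category, as reported in the text.
AHSR_COUNTS = {
    "genetic information processing": 1821,
    "metabolism": 1638,
    "cellular processes": 170,
    "organismal systems": 113,
    "environmental information processing": 46,
}
AHSS_COUNTS = {
    "genetic information processing": 359,
    "metabolism": 415,
    "cellular processes": 38,
    "organismal systems": 32,
    "environmental information processing": 16,
}


def category_shares(counts: dict) -> dict:
    """Return each category's percentage of the total number of classified transcripts."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


AHSR_SHARES = category_shares(AHSR_COUNTS)
AHSS_SHARES = category_shares(AHSS_COUNTS)

if __name__ == "__main__":
    for name in AHSR_COUNTS:
        print(f"{name:38s} AHSR {AHSR_SHARES[name]:6.2f} %   AHSS {AHSS_SHARES[name]:6.2f} %")
```

Computed this way, AHSR has more transcripts than AHSS in every category, while the percentage share of metabolism is larger in AHSS; counts and shares therefore need to be compared separately.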

In conclusion, although the knowledge available on atisine (aconites) biosynthesis and on the developmental pattern of tuberous roots (the storehouse of starch and key to biomass enhancement) is limited or incomplete, the results obtained from A. heterophyllum root and shoot transcriptomes will be of immense value in future. The root versus shoot data corresponding to genes involved in tuberous root development, MVA/MEP pathways, TFs, ABC transporters and SSRs which are uniquely present or abundant in root and shoot transcriptomes can be used to plan a genetic intervention strategy. The substantial number of transcripts obtained will certainly accelerate the understanding of the plant growth and development mechanism, along with providing new insights to increase the biomass yield. Furthermore, the construction of network connectivity diagrams will be the next big step to create a pipeline to investigate the regulation of various biological processes in A. heterophyllum. All the generated data sets need to be experimentally validated to ascertain the biological functions assigned through computational annotations. To the best of our knowledge, this is the first attempt to assemble and characterize the transcriptomes of A. heterophyllum using NGS technique.

Author contribution statement TP, NM and RSC conceived and designed research. TP and SKC conducted computational analysis. TP, NM and SKC analyzed the data. NM conducted quantitative expression analysis. TP designed and developed the Website. TP and NM wrote the manuscript. All authors read and approved the manuscript.

Acknowledgments The authors are thankful to the Department of Biotechnology, Ministry of Science and Technology, Government of India, for providing funds to RSC in the form of a program support on high-value medicinal plants. The authentic plant material provided by Dr. Sandeep Sharma, HFRI, Panthaghati, Shimla (HP), is also acknowledged.

Conflict of interest The authors declare that they have no conflict of interest.

References

Bhattacharyya D, Sinha R, Hazra S, Datta R, Chattopadhyay S (2013) De novo transcriptome analysis using 454 pyrosequencing of the Himalayan Mayapple, Podophyllum hexandrum. BMC Genom 14:748
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674–3676
Conklin PL, Norris SR, Wheeler GL, Williams EH, Smirnoff N, Last RL (1999) Genetic evidence for the role of GDP-mannose in plant ascorbic acid (vitamin C) biosynthesis. Proc Natl Acad Sci USA 96:4198–4203
Costa GGL, Cardoso KC, Del Bem LEV, Lima AC, Cunha MAS, de Campos-Leite L, Vicentini R, Papes F, Moreira RC, Yunes JA, Campos FAP, Da Silva MJ (2010) Transcriptome analysis of the oil-rich seed of the bioenergy crop Jatropha curcas L. BMC Genom 11:462
Dreher K, Callis J (2007) Ubiquitin, hormones and biotic stress in plants. Ann Bot 99:787–822
Endress PK (2002) Morphology and angiosperm systematics in the molecular era. Bot Rev 68:545–570
Gahlan P, Singh RH, Shankar R, Sharma N, Kumari A, Chawla V, Ahuja PS, Kumar S (2012) De novo sequencing and characterization of Picrorhiza kurrooa transcriptome at two temperatures showed major transcriptome adjustments. BMC Genom 13:126
Garg R, Patel RK, Tyagi AK, Jain M (2011) De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res 18:53–63
Geer LY, Bauer-Marchler A, Geer RC, Han L, He J, He S, Liu C, Shi W, Bryant SH (2009) The NCBI biosystems database. Nucleic Acids Res 38:492–496
Ginis O, Courdavault V, Melin C, Lanoue A, Giglioli-Guivarc'h N, St-Pierre B, Courtois M, Oudin A (2012) Molecular cloning and functional characterization of Catharanthus roseus hydroxymethylbutenyl 4-diphosphate synthase gene promoter from the methyl erythritol phosphate pathway. Mol Biol Rep 39:5433–5447
Goff S, Ricke D, Lan T, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92–100
Gupta N, Sharma SK, Rana JC, Chauhan RS (2011) Expression of flavonoid biosynthesis genes vis-à-vis rutin content variation in different growth stages of Fagopyrum species. J Plant Physiol 168:2117–2123
Hudson HE (2008) Sequencing breakthroughs for genomic ecology and evolutionary biology. Mol Ecol Resour 8:3–17
Hussain MS, Fareed S, Saba Ansari M, Rahman A, Ahmad IZ, Saeed M (2012) Current approaches toward production of secondary plant metabolites. J Pharmacy Bioallied Sci 4:10–20
Ideker T, Krogen NJ (2012) Differential network biology. Mol Syst Biol 8:565
Ihemere U, Arias-Garzon D, Lawrence S, Sayre R (2006) Genetic modification of cassava for enhanced starch production. Plant Biotechnol J 4:453–465
Indira P, Kurian T (1977) A study on the comparative anatomical changes undergoing tuberization in the roots of cassava and sweet potato. J Root Crops 3:29–32
Initiative TAG (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
Jaiswal V, Chanumolu SK, Gupta A, Chauhan RS, Rout C (2013) Jenner-predict server: prediction of protein vaccine candidates (PVCs) in bacteria based on host-pathogen interactions. BMC Bioinformatics 14:211
Jin J, Zhang H, Kong L, Gao G, Luo J (2013) PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Res 42:D1182–D1187
Li B, Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323
Li J, Nam KH, Vafeados D, Chory J (2001) BIN2, a new brassinosteroid insensitive locus in Arabidopsis. Plant Physiol 127:14–22
Li C, Li QG, Dunwell JM, Zhang YM (2012a) Divergent evolutionary pattern of starch biosynthetic pathway genes in grasses and dicots. Mol Biol Evol 29:3227–3236
Li DJ, Xia ZH, Deng Z, Liu XH, Dong JM, Feng FY (2012b) Development and characterization of intron-flanking EST-PCR markers in rubber tree (Hevea brasiliensis Muell. Arg.). Mol Biotechnol 51:148–159
