<<

Gene 4 (2015) 78–82

Contents lists available at ScienceDirect

Plant Gene

journal homepage: www.elsevier.com/locate/plant-gene

Insights into balbisiana and species divergence and development of genic microsatellites by transcriptomics approach

Ravishankar Kundapura Venkataramana a,⁎, Megha Hastantram Sampangi-Ramaiah a, Rekha Ajitha b, Ganesh N. Khadke a,VeerrajuChellama a Division of Biotechnology, ICAR-Indian Institute of Horticultural Research, Hessaraghatta Lake Post, Bengaluru 560089, b Division of Crops, ICAR-Indian Institute of Horticultural Research, Hessaraghatta Lake Post, Bengaluru 560089, India article info abstract

Article history: The species, Colla, and Musa acuminata are progenitors of cultivated banana and belong to Received 19 September 2014 the . M. balbisiana, a wild banana species harbors many useful important traits such as disease re- Received in revised form 15 September 2015 sistance and abiotic stress tolerance and represents an important genetic resource for banana improvement pro- Accepted 23 September 2015 grams. Therefore, in our study, we for the first time have produced transcriptome data from 12 different tissues Available online 8 October 2015 of Musa balbisiana collection ‘Bee Hee Kela’ and Musa acuminata ssp. burmannicoides —‘Calcutta-4’ separately using Illumina GA II X technology. We have done a comparative analysis of the transcriptome of these two species. Keywords: Musa balbisiana The high quality reads were assembled and mapped to the DH-Pahang reference genome. A total of 9857 sequences Musa acuminata from BB genome and a total of 4424 sequences from the AA genome had SSRs and validation was done for a few Next generation sequencing selected SSRs. We have analyzed effective number of codons, calculated Ka/Ks ratio, and overall GC and GC3 con- Transcriptome tents in the two species. In the present study a large number of gene based SSR markers developed for both Codon usage bias Musa species will facilitate molecular marker breeding strategies, development of a genetic linkage map and QTL EST-SSRs analysis. © 2015 Published by Elsevier B.V. All rights reserved. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Banana (Musa spp.) is the fourth most important crop in the world molecular tools to identify genes and QTLs associated with these stress after rice, maize and corn. In many countries of Asia and Africa, it is a tolerance. The information on B genome will enhance for Marker assisted major staple food crop (Heslop-Harrison and Schwarzacher, 2007). breeding efforts. Before the emergence of next generation sequencers, a The cultivated edible banana and plantains are derived from the pro- study showed that the existence of microsynteny among Musa, rice and genitor species Musa acuminata (genome designated as ‘AA’)and Arabidopsis (Lescot et al., 2008). Next-generation sequencing has tremen- Musa balbisiana (genome designated as ‘BB’). The majority of edible cul- dously reduced sequencing costs and increased the transcript coverage, tivars are allopolyploid triploids with a genome constitution of AAA which has in turn enhanced our ability to detect novel and rare tran- (dessert banana), AAB (plantains) and ABB (cooking ). scripts, novel alternative splice isoforms and direct measurement of tran- M. balbisiana harbors many traits for abiotic and biotic stress toler- script abundance and identify a large number of SSRs and SNPs which can ance. Genotypes with a ‘B’ genome such as Sukali Ndizi and Kayinja be used to develop markers. Among the DNA markers, simple sequence (also known as ) are more tolerant to banana weevils than repeat (SSR or microsatellite) markers are useful as they are highly infor- genotypes with only ‘A’ genomes (D'Ocan et al., 2008). M. balbisiana ge- mative, codominant, adaptable to high-throughput genotyping, and are nome is believed to be drought tolerant as they originated from a drier simple to maintain and exchange between laboratories. SSRs are pre- ecologythancultivarwiththeM. acuminata genome. with an ferred over other DNA markers for the characterization and assessment AAB or ABB genome constitution are said to be more drought tolerant of genetic variability, as they are highly polymorphic, detect high levels and hardy due to the presence of the B genome (Simmonds, 1962; of allelic diversity, reproducible, easy to interpret, can be assayed by Thomas and Turner, 2001; Robinson and Sauco, 2010). The genomic in- PCR (Venter et al., 2004), and are amenable to automation (Weber and formation on ‘B’ genome will also enhance our knowledge on under- May, 1989). standing biotic and abiotic stress tolerance mechanism operating in The draft sequence of the 523-megabase genome of a doubled- banana. Hence, there is an urgent need to develop high throughput haploid M. acuminata genotype (D'Hont et al., 2012) and also its refer- ence assembled M. balbisiana [Pisang Klutuk Wulung — B genome] (Davey et al., 2013) has recently become publicly available. Here we re- ⁎ Corresponding author. port an analysis of the transcriptome from the whole plant (12 different E-mail address: [email protected] (R. Kundapura Venkataramana). tissues) of M. balbisiana collection ‘Bee Hee Kela’ and M. acuminata ssp.

http://dx.doi.org/10.1016/j.plgene.2015.09.007 2352-4073/© 2015 Published by Elsevier B.V. All rights reserved. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). R. Kundapura Venkataramana et al. / Plant Gene 4 (2015) 78–82 79 burmannicoides —‘Calcutta-4’, separately obtained using the Illumina was examined by 1.0% (w/v) agarose gel electrophoresis, and visualized sequencing technology. The analysis led to the identification of the ad- under UV transilluminator. ditional genes that were not predicted from the genome sequencing Initially, a total of 168 (for BB) and 79 (for AA) SSR primer pairs were project along with the genic SSR markers with its probable biological screened separately using DNA from each five genotypes belonging to the function. We have also attempted to compare selected genes of the same genomic group. Each PCR amplification reaction (20 μl) contained two species M. balbisiana and M. acuminata to understand the diversifi- 2 mM MgCl2, 16 mM (NH4)2SO4, 67 mM Tris HCl, pH 8.8, 0.01% (v/v) cation of selected genes in these species. The comparative analysis of ‘A’ Tween-20, 0.1 mM of each dNTP, 3 ρmole of the forward and 5 ρmole re- and ‘B’ transcriptomes can throw light on diversification and evolution verse primers and 5 ρmole of Probe (FAM, PET, NED, VIC), (Schuelke, of these species. 2000; Ju et al., 1996), 50 ng genomic DNA, and 0.3 Units of Taq DNA po- Banana genotypes are maintained at ICAR-Indian Institute of Horti- lymerase (Bangalore Genei, Bangalore, India). PCR reactions were carried cultural Research (IIHR), Bengaluru. The tissues for the RNA isolation out in a thermal cycler using the following two amplification conditions: was collected from these healthy of M. balbisiana ‘Bee hee kela’ A: SSR55: An initial denaturation at 94 °C for 2 min, followed by 35 cycles (BB) and M. acuminata ssp. burmannicoides —‘Calcutta-4’ (AA). The of94°Cfor30s,55°Cfor30s,and72°Cfor1min,withafinal extension gemplasm of M. balbisiana ‘Bee hee kela’ (BB) was collected from step of 5 min at 72 °C. The primer pairs which were not able to amplify Cachar and Jaintia hills, Assam, India whereas M. acuminata ssp. DNA properly in the above conditions, were subjected to touchdown burmannicoides —‘Calcutta-4’ (AA) was collected from Banana Research PCR. B: Touchdown : Initial denaturation at 94 °C (5 min), then 30 cycles Station, Thrissur, Kerala, India. The 12 different tissues used for the RNA at 94 °C (30 s)/60 °C (45 s)/72 °C (45 s), followed by 8 cycles of 94 °C isolation 2 of them were below the ground tissues (root and ) and (30 s)/55 °C (45 s)/72 °C (45 s), and a final extension at 72 °C for the remaining 10 tissues were above the ground, which can be classified 10 min. The amplified PCR products were separated on a normal agarose into 3 classes — foliar tissues (lamina, midrib, petiole); reproductive tis- gel. Primers which have amplified in the expected sizes were multiplexed sues [male flower, female flower, fruit pulp, peel, male inflorescence, and sent for fragment size analysis using ABI 3730 Automated DNA se- ]; pseudostem. Total RNA was isolated using the pine method quencer (Applied Biosystems, California, USA) at ICRISAT (International (Chang et al., 1993). The quality of RNA was analyzed using a Nanodrop Crops Research Institute for the Semi-Arid Tropics, Hyderabad) facility. spectrophotometer and integrity was checked using agarose gel electro- The raw data were analyzed using the software peak scanner (Applied phoresis. The 10 μg of RNA from each tissue sample for M. balbisiana and Biosystems, California, USA) (Simko, 2009) to estimate the exact frag- M. acuminata were pooled separately, and they were sent for the RNA- ment size in basepair. The genetic analysis was done to calculate the ob- seq analysis using Illumina GA II X platform at M/s Genotypic Technolo- served heterozygosity (Ho), the expected heterozygosity (He), and the gy Private Limited, Bengaluru, India. polymorphic information content (PIC) using the fragment size data gen- Paired end (PE) cDNA library was generated from the pooled total erated for each SSR locus with the help of Cervus 3.0 software RNA from various tissues of both species separately. The library construc- (Kalinowski et al., 2007). tion and sequencing was performed following manufactures instruction The number of synonymous substitutions (Ks) and non- synonymous

(Genotypic Technology Private Limited, Bangalore, India). Various quality substitutions (Ka) per site were calculated using PAL2NAL program controls, removal of reads containing primer/adapter sequences were which uses CODEML algorithm of PAML package (Nei and Gojobori, done using an in-house tool kit. The sequence data for M. balbisiana and 1986). All the annotated transcripts were screened for internal stop co- M. acuminata has been deposited at NCBI (SRX501830 and SRX501837). dons using INCA v 2.1 software (Supek and Vlahovicek, 2004). Transcripts The high quality reads (Q N 20; N70% bases in a read) were assembled with no internal stop codons were used for further analysis. A pair of se- using Oases version 0.1.8 (http://www.ebi.ac.uk/~Zerbino/oases/; quence files was made with same biological functions (N1e−15) be- Schulz et al., 2012). Further, the transcripts were mapped to the tween M. balbisiana and M. acuminata. The nucleotide sequence file was M. acuminata ssp. malaccensis var. Pahang (DH Pahang) whole genome translated into protein file using Bioedit software (Hall, 1999). The ob- sequence (http://banana-genome.cirad.fr) using the genomic mapping tained protein sequence was aligned using ClustalW in Bioedit and it and alignment tool bowtie2-2.0.5 (Langmead and Salzberg, 2012). The was used to guide nucleic acid coding sequence alignments with transcripts which did not show with the M. acuminata (DH Pa- PAL2NAL (http://www.bork.embl.de/pal2nal/)(D'Hont et al., 2012; hang) CDS were subjected to BLAST homology search against nr protein Suyama et al., 2006). Finally, we obtained 77 sets of sequences with no se- database (NCBI) using BLASTX software using the criteria of transcripts quencing and alignment error and Ka/Ks values were calculated. The total with more than 70% identity and 70% query coverage are taken for further GC content of M. balbisiana and M. acuminata coding sequences was com- analysis. pared with each other using Emoss: GeeCee Explorer software (Elhaik The Perl Script software program MISA (http://pgrc.ipk-gatersleben. and Tatarinova, 2012)(http://emboss.sourceforge.net/apps/cvs/emboss/ de/misa/;Thiel et al., 2003) was used for identification of SSRs. This apps/ geecee.html). The histogram was generated using statistical soft- unigene set was used for mining SSR and later primers were designed ware PAST version 2.14 (Oyvind et al., 2001). Further, codon usage bias using Batch Primer3 v1.0 software (http://probes.pw.usda.gov/cgi-bin/ of the aligned transcripts was measured as the GC content in silent sites batchprimer3/ batchprimer3.cgi; You et al., 2008). In this study, the and the effective number of codons was computed using the formula SSR loci containing perfect repeat units of 2–6 nucleotides only were used in the previous studies (Novembre, 2002) and it was calculated considered. The primers were tailed with M13 sequences for further using INCA 2.1 software (Supek and Vlahovicek, 2004). PCR analysis (Schuelke, 2000; Ju et al., 1996). In the present transcriptome study, a total of 38.03 million bp DNA from ten Musa genotypes belonged to different genomic groups (38,032,087 = 8.44 GB) in M. balbisiana and 35.48 million bp (i.e., five genotypes from AA genome and five genotypes from the BB ge- (35,477,971 = 7.8 GB) in M. acuminata, 72 base paired end reads were nomic group) were used for SSR primer validation. The genotypes belong- generated by the Illumina GA II X Sequencer. The raw reads were subject- ing to the AA genome used in the present study are — M. acuminata ssp. ed to quality control using SeqQC-V2.0. The final set comprising of 33.62 malaccensis, Pisang lilin, Tongat, Calcutta-4, and Erachi vazhai. The geno- million bp in M. balbisiana and 33.16 million bp in M. acuminata with high types belonging to the BB genome are — Bhimaithia, Musa balbisiana quality reads was used for further assembly and analysis. The high quality tani, Musa balbisiana (1), Amturkela and Bee Hee Kela. All Musa acces- reads were assembled into transcripts using transcriptome assembly sions used in this study were maintained at the Banana Germplasm Col- software — Oases v 0.1.8 (http://www.ebi.ac.uk/~Zerbino/oases/) lection of the ICAR — Indian Institute of Horticultural Research (IIHR), (Schulz et al., 2012). Transcripts which were shorter than or equal to Bengaluru, India. DNA isolation was made from fresh, young tissue 200 bps were filtered out and other parameters were kept at default set- (2.0 g) using a modified CTAB method (Ravishankar et al., 2013). DNA tings, resulting in 72,697 transcripts in the case of Musa balbisiana and was quantified using a UV Spectrophotometer at 260 nm and, its integrity 47,150 transcripts in the case of Musa acuminata. Further mapping of 80 R. Kundapura Venkataramana et al. / Plant Gene 4 (2015) 78–82

Musa balbisiana and Musa acuminata transcripts to the M. acuminata ssp. malaccensis var. Pahang (DH Pahang) whole genome sequence (http://banana-genome.cirad.fr) revealed that 75.55% (54,922) and 87.74% (41,370) transcripts aligned with the reference genome respec- tively. The lengths of the assembled transcripts are represented by a his- togram (Table 1). The average length of the transcripts was found to be 910.8 ± 830.6 bp and 678.2 ± 565.9 bp respectively. The N50 value of Musa balbisiana was found to be 1375 bp and 982 bp in the case of Musa acuminata. The percentage of the M. acuminata (DH-Pahang) CDS covered (Reference sequence length (bp) — 37,955,432) was 53.34% (sequence length (bp) — 20,245,356) for M. balbisiana and 45.66% (sequence length (bp) — 17,332,143) for M. acuminata. The remain- ing unaligned transcripts were used for homology search using BLASTX program against a nr protein database (NCBI). By this pro- cess we obtained 1796 and 1133 transcripts showing homology Fig. 1. Venn diagram showing the common annotated transcripts and the unique tran- with other plant genes (Supplementary material file 3 3; scripts in the two Musa species. Figs. S1a,b). Since the data was generated using RNA from 12 differ- ent tissues there were redundant transcripts showing homology with the same protein (Supplementary material file 2). Hence, to ob- For these 55 polymorphic primers, the PIC values ranged from 0.165 to tain unigenes, the redundancy was removed at 95% identity. Finally, 0.758. (Supplementary material file 4). we obtained 26,688 unigenes in case of M. balbisiana and 22,913 in In to understand, the influence of the stabilizing or diversifying case of M. acuminata. Among the annotated transcripts 19,922 were selection between M. balbisiana and M. acuminata species, the compar- common between the two Musa species and 4918 were unique in ison of synonymous base substitutions (Ks) and non-synonymous base thecaseofM. balbisiana and 1856 were unique to M. acuminata substitutions (Ka) was calculated using 77 randomly selected tran- (Fig. 1). The unaligned 1796 and 1133 unigenes may be specificto scripts without a stop codon (Hurst, 2002). The distribution of the calcu- the Musa species. lated Ka/Ks ratios for the selected transcripts is represented in (Fig. 2). Here in this study, we have examined 72,697 and 47,150 sequences of Ka/Ks values greater than 2.5 were discarded during the analysis assum- M. balbisiana and M. acuminata respectively for SSRs. Out of 72,697 tran- ing that such high values likely to be resulted from alignment errors scripts examined by Perl Script MISA software, 9857 transcripts (Prabin et al., 2011). Following a method, similar to Novaes et al. contained simple sequence repeats (SSRs); in these transcripts a total (2008), we further classified genes with Ka/Ks b 0.15 values to be number of 12,231 SSRs were identified. Whereas in the case of under stabilizing selection and Ka/Ks between 0.50 and 2.50 for diversi- M. acuminata, from 47,150 transcripts, only 4424 had SSRs and a total fying selection, and compared the Ka/Ks distribution with the gene an- of 4718 SSRs were identified (Table 2). The majority of SSRs were di- notation results. By the above analysis, we obtained 29 genes which repeats in case of M. balbisiana (38%), whereas in case of M. acuminata, contributed mainly to the divergence of these two Musa species (Sup- tri-repeats dominated the other type of repeats (44%). A small fraction plementary file 5). These genes are mostly involved in stress tolerance of the number that is 160 and 57 for tetra, 40 and 17 for penta, 51 and and fruit quality. The Effective Number of Codons (ENC) and GC3 con- 23 for hexa nucleotide SSRs were identified in M. balbisiana and tent in M. balbisiana was 57.34; 0.034 and that of M. acuminata was M. acuminata transcripts respectively. We have synthesized 168 BB (for 48.72; 0.039 respectively. The patterns of codon usage in these genes, M. balbisiana) SSR primers, and 79 AA (for M. acuminata) SSR primers studied as the effective number of codons (Nc′) showed differences, with M13 tail. These SSRs were annotated and are provided with the bi- whereas the GC content in synonymous sites (GC3), were similar in ological function (CDS-ID) Supplementary file 1. These primers were both M. balbisiana and M. acuminata. Thus the codons were more ran- employed to amplify five AA and five BB genotypes of banana separately. domly used in case of M. balbisiana compared to M. acuminata. Also Out of the 168 BB SSR primer pairs examined, 139 SSR primers(82.7%) overall GC content in the transcripts of M. balbisiana and M. acuminata amplified the fragments from the BB genotypes. Among 139 amplified was estimated. Highest number of transcripts in M. balbisiana and SSR primers 86 showed polymorphism. For the 86 polymorphic primers M. acuminata showed 40% to 48% of the GC content. Both M. balbisiana the PIC value ranged from 0.164 to 0.768. Similarly, 79 AA SSR primers as well as M. acuminata showed an intermediate distribution, between were examined; only 68 primers (86.07%) amplified PCR fragments in unimodal and bimodal types of distribution. AA genotypes. Among the 68 amplified primers, 55 were polymorphic.

Table 1 De novo assembly program statistics. Table 2 Statistics of SSRs identified in banana transcriptome. Sample Musa balbisiana Musa acuminata Sample Musa Musa Transcripts generated 72,697 47,150 balbisiana acuminata Maximum transcript length 12,432 5644 Minimum transcript length 200 200 Total number of sequences examined 72,697 47,150 Average transcript length 910.8 ± 830.6 678.2 ± 565.9 Total size of examined sequences (bp) 66,210,554 31,979,411 Median transcript length 618 453 Total number of identified SSRs 12,231 4718 Total transcripts length 66,210,554 31,979,411 Number of SSR containing sequences 9857 4424 Total Number of non-ATGC characters 2329 340 Number of sequences containing more than 1 SSR 1118 262 Percentage of non-ATGC characters 0.004 0.001 Number of SSRs present in compound formation 847 270 Transcripts ≧ 100 bp 72,697 47,150 Mono repeats 3201 884 Transcripts ≧ 200 bp 72,697 47,150 Di repeats 4269 1503 Transcripts ≧ 500 bp 42,146 21,630 Tri repeats 3663 1964 Transcripts ≧ 1 kbp 23,114 9955 Tetra repeats 160 57 Transcripts ≧ 10 kbp 2 0 Penta repeats 40 17 N50 value 1375 982 Hexa repeats 51 23 R. Kundapura Venkataramana et al. / Plant Gene 4 (2015) 78–82 81

Fig. 2. The Ka/Ks distribution of orthologs between M. balbisiana and M. acuminata.

Bananas and plantains (Musa spp.) are one of the most important M. balbisiana (B genome) is known to be more tolerant to biotic and tropical fruit crops. With the whole genome sequence of double- abiotic stress, whereas M. acuminata (A genome) is known for its fruit haploid M. acuminata [DH-Pahang] (A genome) and also its reference quality characters (Price, 1995). Thus, to understand the possible selec- assembled M. balbisiana [Pisang Klutuk Wulung] (B genome) tion pressure resulting in the maintenance of divergence, we (D'Hont et al., 2012; Davey et al., 2013) available, our knowledge of calculated the ratio of synonymous and non-synonymous mutations be- banana genome has drastically improved. Here in the present tween M. balbisiana and M. acuminata (Arunyawat et al., 2007; Bamshad study, we utilize Illumina sequencing technology to generate and Wooding, 2003; Ford, 2002). By the above analysis, we shortlisted RNASeq data from pooled 12 tissues of M. balbisiana and 29 genes which showed divergence among the two Musa species. M. acuminata separately. Since the data is obtained from both Their BLAST analysis for homology of sequences suggests that they above the ground and below the ground tissues, we have sequenced play a role in signaling pathways between phytohormones and abiotic most of the tissue specific genes, thus we can term the RNASeq data stresses (SNF2 family genes), female gametogenesis, lipid binding pro- astranscriptomedataofthewholeplant.Afterthequalityfiltering, tein domains which play an important role in protein targeting, signal assembly and reference mapping we obtained 72,697 and 47,150 transduction, lipid transport, lipid biosynthesis, lipid metabolism and transcripts which were used for further analysis. The obtained se- maintenance of cellular compartments and membranes, pollen tube quences are more than expected since the data is generated using growth (Li et al., 2011; Saito et al., 2007). A formal investigation of mo- 12 different tissues and due to lots of redundancy mainly because lecular evolution within these species (with a proper outgroup) would of many genes were overexpressed (Supplementary material file help in discerning selection pressure operated in these species diver- 2). Further, annotation of these transcripts to the M. acuminata ssp. gence (Novembre, 2002). malaccensis var. Pahang (DH Pahang) whole genome sequence re- Also, it is reported that high GC3 content provides more targets for vealed that 75.55% of M. balbisiana transcripts were similar to the modulation. In many studies the correlation between methylation and DH-Pahang gene models, whereas 87.74% of the M. acuminata were GC3 is demonstrated. The positive correlation between GC3 and vari- similar to the reference gene models. This shows that M. balbisiana ability of gene expression has previously been reported (Fuglsang, is less similar to the reference compared to the M. acuminata. Fur- 2004). Hence the GC content at the third codon position was estimated ther, to know the function of the unannotated M. balbisiana and M. but it was found to be similar in both Musa species, whereas the effec- acuminata transcripts, we did a homology search using BLAST soft- tive number of codons were different indicating that the codons were ware against the NCBI nr database. 1796 and 1133 transcripts more randomly used in case of M. balbisiana compared to M acuminata. showed homology with other plant species, of which Vitis vinifera Other than the GC content at the third codon position overall GC content dominated with highest homology of 23% and 22%, respectively in the transcriptome data of both species was estimated, both of which (Supplementary material file 3; Figs. S1a,b). After removing the re- showed intermediate distribution between unimodal and bimodal dundancy we obtained 26,688 and 22,913 unigenes in M. balbisiana types of distribution. According to previous studies Oryza sativa has and M. acuminata respectively, out of which 19,922 unigenes were shown bimodal type of distribution, whereas other showed common and remaining were unique for the species (Supplementary intermediate to unimodal and bimodal type of distribution (D'Hont material file 3; Fig. 1). et al., 2012). This confirms the preliminary evidence that Musa is closely We mined SSRs using the transcriptome data. These genic SSRs are related to Zingiberales than O. sativa. very informative for study on quantitative trait loci for different traits, In this study, we report NGS-based generation of pooled as they are located on genes (Thiel et al., 2003). Here in the present transcriptome data from twelve different tissues of two species, study, we were able to mine 12,231 SSRs and 4718 SSRs from M. acuminata and M. balbisiana, that are progenitors for cultivating ba- M. balbisiana and M. acuminata respectively. We also validated the ran- nana. The results provide useful information for banana genetic studies domly selected SSRs to confirm the computational results. We obtained and helps in understanding their diversification. Genes were character- 82.7% and 86.07% amplification of SSR primers in the BB and AA genome ized based on BLAST annotation, and protein database. The data repre- genotypes respectively (Supplementary material file 1). We also calcu- sent a global transcriptome resource for Musa with identification of lated PIC values for each locus, since polymorphism is the main criteria candidate genes expressed in different tissues for two distinct species. for genotyping and analyzing the segregation during linkage map con- Our analysis of synonymous codon usage, Ka/Ks ratio as well as overall struction (Hwang et al., 2009). The highest PIC value obtained was GC and GC3 content in these two species would help in understanding 0.768 and 0.758 in M. balbisiana and M. acuminata respectively (Supple- diversification in these two species and their use in breeding programs. mentary file 4). The information generated can be used for genetic engineering biotic 82 R. Kundapura Venkataramana et al. / Plant Gene 4 (2015) 78–82 and abiotic stresses tolerant plants. Compared to the genomic SSR Kalinowski, S.T., Taper, M.L., TC, 2007. Revising how the computer program CERVUS ac- commodates genotyping error increases success in paternity assignment. Mol. Ecol. markers, the genic SSR markers developed from the transcriptome se- 16, 1099–1106. quencing technique in the present study will help to identify and devel- Langmead, B., Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nat. op candidate gene functional SSR markers. Further, this would increase Methods 9, 357–359. fi Lescot, M.P., Ciampi, A.Y.P., Ruiz, M., Blanc, G., et al., 2008. Insights into the Musa genome: the ef ciency of Marker Assisted breeding of Musa species. It will also syntenic relationships to rice and between Musa species. BMC Genomics 9, 58. help in the enrichment of genetic map for M. acuminata, and develop- Li, X.Y., Wang, C., Nie, P.P., et al., 2011. Characterization and expression analysis of the ment of genetic/linkage map for M. balbisiana. SNF2 family genes in response to phytohormones and abiotic stresses in rice. Biol. – We thank the Indian Council of Agricultural Research, New Delhi for Plant. 55, 625 633. Nei, M., Gojobori, 1986. Simple methods for estimating the numbers of synonymous and the financial assistance through the ICAR Network Project on Trans- nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426. genics in Crops: Functional Genomics-Fusarium wilt and drought toler- Novaes, E., Drost, D.R., Farmerie, W.G., Pappas, G.J., Grattapaglia, D., Sederoff, R.R., Kirst, M., ance in Banana (3011). M/s Genotypic Pvt. Ltd. Bengaluru for 2008. High-throughput gene and SNP discovery in Eucalyptus grandis,an uncharacterized genome. BMC Genomics 9, 312. sequencing and bioinformatic assistance. Novembre, J.A., 2002. Accounting for background nucleotide composition when measur- ing codon usage bias. Mol. Biol. Evol. 19, 1390–1394. Appendix A. Supplementary data Oyvind, H., David, A.T.H., Paul, D.R., 2001. PAST: paleontological statistics software pack- age for education and data analysis. Palaeontol. Electron. 4, 9. Prabin, B., Bryce, A.R., Jared, C.P., Richard, C.C., Joshua, A.U., 2011. Transcriptome charac- Supplementary data to this article can be found online at http://dx. terization and polymorphism detection between subspecies of big sagebrush doi.org/10.1016/j.plgene.2015.09.007. (Artemisia tridentata). BMC Genomics 12, 370. Price, N.S., 1995. The origin and development of banana and plantain cultivation. Bananas and Plantains.Chapman and Hall, London, pp. 1–13 References Ravishankar, K.V., Raghavendra, K.P., et al., 2013. Development and characterization of microsatellite markers for wild banana (Musa balbisiana). J. Hortic. Sci. Biotechnol. Arunyawat, U., Stephan, W., Stadler, T., 2007. Using multilocus sequence data to assess 88, 605–609. population structure, natural selection, and linkage disequilibrium in wild tomatoes. Robinson, J.C., Sauco, V.G., 2010. Bananas and Plantains. CABI Publishing, Wallingford. Mol. Biol. Evol. 24, 2310–2322. Saito, K., Tautz, L., Mustelin, T., 2007. The lipid-binding SEC14 domain. Biochim. Biophys. Bamshad, M., Wooding, S.P., 2003. Signatures of natural selection in the human genome. Acta 1771, 719–726. Nat. Rev. Genet. 4, 99–111. Schuelke, M., 2000. An economic method for the fluorescent labeling of PCR fragments. Chang, S., Puryear, J., Cairney, J.A., 1993. Simple and efficient method for isolating RNA Nat. Biotechnol. 18, 233–234. from pine . Plant Mol. Biol. Rep. 11, 113–116. Schulz, M.H., Zerbino, D.R., et al., 2012. Oases: robust de novo RNA-seq assembly across D'Ocan, Mukasa, H., et al., 2008. Effects of banana weevil damage on plant growth and the dynamic range of expression levels. Bioinformatics 28, 1086–1092. yield of East African Musa genotypes. J. Appl. Biol. Sci. 9, 407–415. Simko, I., 2009. Development of EST-SSR markers for the study of population structure in D'Hont, A., Denoeud, F., Aury, J.M., et al., 2012. The banana (Musa acuminata)genomeand lettuce (Lactuca sativa L.). J. Hered. 100, 256–262. the evolution of monocotyledonous plants. Nature 488 (7410), 213–217. Simmonds, N.W., 1962. The evolution of the bananas. Longmans, London. Davey, M.W., Gudimella, R., et al., 2013. AdraftMusa balbisiana genome sequence for mo- Supek, F., Vlahovicek, K., 2004. INCA: synonymous codon usage analysis and clustering by lecular genetics in polyploid, inter-and intra-specific Musa hybrids. BMC Genomics means of self-organizing map. Bioinformatics 20, 2329–2330. 14, 683. Suyama, M., Torrents, D., Bork, P., 2006. PAL2NAL: robust conversion of protein sequence Elhaik, E., Tatarinova, T., 2012. GC3 Biology in Eukaryotes and Prokaryotes (arXiv, alignments into the corresponding codon alignments. Nucleic Acids Res. 34, 609–612. 12033929). Thiel, T., Michalek, W., Varshney, R., et al., 2003. Exploiting EST databases for the development Ford, M.J., 2002. Applications of selective neutrality tests to molecular ecology. Mol. Ecol. and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. 11, 1245–1262. Appl. Genet. 106, 411–422. Fuglsang, A., 2004. The 'effective number of codons' revisited. Biochem. Biophys. Res. Thomas, D.S., Turner, D.W., 2001. Banana (Musa sp.) leaf gas exchange and chlorophyll Commun. 317, 957–964. fluorescence in response to soil drought, shading and lamina folding. Sci. Hortic. 90, Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis 93–108. program for windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98. Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., Eisen, J.A., Wu, D., Heslop-Harrison, J.S., Schwarzacher, T., 2007. Domestication, genomics and the future for Paulsen, I., Nelson, K.E., Nelson, W., 2004. Environmental genome shotgun sequenc- banana. Ann. Bot. 100, 1073–1084. ing of the Sargasso Sea. Science 304, 66–74. Hurst, L.D., 2002. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Weber, J.L., May, P.E., 1989. Abundant class of human DNA polymorphisms which can be Genet. 18, 486. typed using the polymerase chain reaction. Am. J. Hum. Genet. 44, 388–396. Hwang, T.Y., Sayama, T., Takahashi, M., et al., 2009. High-density integrated linkage map You, F.M., Huo, N., Gu, Y.Q., et al., 2008. BatchPrimer3: a high throughput web application based on SSR markers in soybean. DNA Res. 16 (4), 213–225. for PCR and sequencing primer design. BMC Bioinf. 9, 253. Ju, J., Glazer, A.N., Mathies, R.A., 1996. Energy transfer primers: a new fluorescence label- ing paradigm for DNA sequencing and analysis. Nat. Med. 2, 246–249.