Mol Breeding DOI 10.1007/s11032-014-0134-z

Characterization and cross-species transferability of EST–SSR markers developed from the transcriptome of Dysosma versipellis (Berberidaceae) and their application to population genetic studies

Rui Guo • Yun-Rui Mao • Jin-Rui Cai • Jin-Yang Wang • Jie Wu • Ying-Xiong Qiu

Received: 12 March 2014 / Accepted: 9 June 2014
© Springer Science+Business Media Dordrecht 2014

Abstract Dysosma (Berberidaceae), which comprises seven herbaceous perennial species, has long been used as the main source of a traditional Chinese medicine, “Guijiu.” Despite its ecological and economic importance, molecular research on Dysosma has lagged behind because of a shortage of molecular markers. In this study, a cDNA library of D. versipellis leaves was sequenced using the Illumina HiSeq™ 2000 sequencing system. A total of 44,855 nonredundant unigenes were assembled from 57.6 million reads, and 5,167 expressed sequence tag–simple sequence repeats (EST–SSRs) were identified in 4,536 unigenes. Trinucleotide motifs were the most common type, with a frequency of 43.7 % (2,260). Among the 5,167 EST–SSRs, 1,050 primer pairs were successfully designed. After selecting 80 of these pairs at random for further validation, 19 pairs were identified as true-to-type SSR loci, and 14 of those could reliably amplify polymorphic bands from 12 individuals of D. versipellis. These 14 EST–SSR markers showed high average genetic diversity (e.g., NA = 6.29; HE = 0.528) when surveyed across four D. versipellis populations, and were also transferable to almost all other Dysosma species, with the exception of one marker (four instead of six species). Finally, 11 polymorphic markers were chosen to provide insights into the population structure of D. versipellis and its presumed sister species, D. pleiantha. Both genetic distance and STRUCTURE analyses identified two genetic clusters largely congruent with the current species classification. These findings indicate that the EST–SSRs examined can be used with confidence in future population genetic studies of both D. versipellis and D. pleiantha.

Rui Guo and Yun-Rui Mao have contributed equally to this work.

Electronic supplementary material The online version of this article (doi:10.1007/s11032-014-0134-z) contains supplementary material, which is available to authorized users.

Keywords Dysosma • EST–SSRs • Transcriptome • Transferability • Population genetics

R. Guo • Y.-R. Mao • J.-R. Cai • J.-Y. Wang • J. Wu • Y.-X. Qiu (corresponding author)
Key Laboratory of Conservation Biology for Endangered Wildlife of the Ministry of Education, and College of Life Sciences, Zhejiang University, Hangzhou 310058, China
e-mail: [email protected]

R. Guo • Y.-R. Mao • Y.-X. Qiu
Laboratory of Systematic and Evolutionary Botany and Biodiversity, Institute of Ecology, and Conservation Center for Gene Resources of Endangered Wildlife, Zhejiang University, Hangzhou, China

Introduction

“Guijiu”, a traditional Chinese herbal medicine, is extracted from the rhizomes or roots of various taxa in Podophylloideae Eaton (Berberidaceae), including Sinopodophyllum hexandrum (Royle) Ying and species of Dysosma Woodson (Shang et al. 1994). The active constituent of “Guijiu” is podophyllotoxin, which has been used for semi-synthesis of various powerful and extensively employed cancer-treating drugs, e.g., teniposide (VM-26), etopophos (VP-16-213), and etoposide (Roy et al. 1992; Giri and Narasu 2000; Lee 2012). The latter are topoisomerase II inhibitors that are widely used for treating lung and testicular cancers, among others. As the main source of “Guijiu”, the genus Dysosma consists of seven species, all of which are geographically restricted to mainland China and Taiwan (Ying et al. 1993). Dysosma versipellis (Hance) M. Cheng ex Ying is the most widespread species and is presently known from about 30 populations, scattered in isolated stands across Central-Southeast China. Other Dysosma species are endemic to particular regions. Of these seven species, four (i.e., D. versipellis, D. veitchii, D. tsayuensis, and D. aurantiocaulis) have been considered “endangered” or “vulnerable” by both the China Species Red List (Wang and Xie 2004) and the IUCN (2013) because of their small ranges, few populations, and small population sizes.

Despite its ecological and economic importance, however, molecular research on Dysosma has lagged behind. So far, only a limited number of genetic markers have been used for investigating levels of diversity and population structure (only D. versipellis/D. pleiantha), including allozymes (Qiu et al. 2005), inter-simple sequence repeats (ISSRs) (Qiu et al. 2006; Zong et al. 2008), chloroplast (cp) DNA sequences (Qiu et al. 2009), and amplified fragment length polymorphisms (AFLPs) (Guan et al. 2010). Of the few simple sequence repeats (SSRs) identified from random genomic sequences of D. versipellis (9) and D. pleiantha (14), most proved to be nontransferable to other species (Guan et al. 2008, 2011). Consequently, more markers are needed for an in-depth understanding of the genetic diversity and population genetic structure of Dysosma species, with implications for their reproductive ecology and conservation genetics.

Due to their codominant and highly polymorphic nature, SSRs have been extremely useful as genetic markers, especially for the purposes of investigating genetic variation and relatedness and performing linkage mapping and evolutionary studies (Guichoux et al. 2011). Two categories of SSRs can be defined according to the original sequences used for development of SSRs: genomic SSRs and those derived from expressed sequence tags (EST–SSRs). Genomic SSRs have traditionally been isolated by hybridizing repeat-enriched molecular probes in genomic libraries (Rajora et al. 2000; Hodgetts et al. 2001; Scotti et al. 2002), but this method has low efficiency and can be very time-consuming, costly, and labor-intensive (Zane et al. 2002; Parchman et al. 2010). As whole genome-level sequence data are still scarce for most groups of organisms, large collections of ESTs currently offer the most promising source of information for the discovery and characterization of SSRs (Gupta et al. 2003; Ellis and Burke 2007). EST–SSR markers specifically designed for single species frequently display a high degree of operational transferability to related species because of their location in conserved genic regions (Gupta et al. 2003; Pashley et al. 2006; Zheng et al. 2013). As EST–SSRs are potentially tightly linked with functional genes controlling phenotype, they are also especially attractive for comparative genetic mapping and studies of adaptive evolution (Varshney et al. 2005a; Kumari et al. 2013).

To our knowledge, no study to date has developed EST–SSR markers in Dysosma. Here, we report our work on EST–SSRs derived from a large expressed sequence dataset based on Illumina HiSeq™ 2000 sequencing data from young leaves of D. versipellis. The aims of this study were to (1) determine the frequency and distribution of EST–SSRs in the transcriptome of this species, (2) establish EST–SSR markers and validate their level of polymorphism, (3) evaluate the transferability of the polymorphic EST–SSRs to other related species, (4) compare levels of genetic diversity and population structure between D. versipellis and its presumed sister species, D. pleiantha, and (5) elucidate genetic relationships among populations of the two species.

Materials and methods

Plant materials and genomic DNA extraction

Twelve individuals from three populations (DJ, SN, TT) of D. versipellis were used for the initial screening of EST–SSRs (Table 1). These primer pairs, which were successfully amplified and found to be

Table 1 Sampling details of seven Dysosma species from China and Taiwan used in the present study for EST–SSR development and characterization, tests of cross-species transferability, and/or population genetic surveys (see text)

Species            Population code  Location                               Latitude (N)  Longitude (E)  Altitude (m)  Sample size
D. versipellis     DJ               Dujiangyan, Sichuan Province           31°06′        103°38′        1,290         20
                   SN               Shennongjia, Hubei Province            31°28′        110°22′        2,000–2,700   33
                   TT               Tiantangzhai, Anhui Province           31°14′        115°74′        900–1,200     37
                   HS               Mount Huang, Anhui Province            30°08′        118°18′        800–1,000     16
D. pleiantha       DY               Mount Daiyun, Fujian Province          25°38′        118°11′        300           24
                   TM               Mount Tianmu, Zhejiang Province        30°19′        119°26′        400           26
                   YL               County Yilan, Taiwan                   24°35′        121°26′        1,594–1,699   13
D. veitchii        AL               Mount Ailao, Yunnan Province           24°32′        101°01′        2,450         1
                   CEM              Mount Emei, Sichuan Province           29°35′        103°22′        2,124         1
                   ZX               County Zhenxiong, Yunnan Province      27°23′        104°38′        1,749         1
                   KD               County Kangding, Sichuan Province      30°01′        101°57′        2,700         1
                   TC               County Tengchong, Yunnan Province      25°21′        98°08′         1,820         1
D. majorensis      YJ               County Yinjiang, Guizhou Province      27°56′        108°36′        942           1
                   GEM              Mount Emei, Sichuan Province           29°35′        103°22′        2,200         1
                   JF               Mount Jinfo, Chongqing City            29°00′        107°11′        1,823         1
                   ML               County Malipo, Yunnan Province         23°11′        104°45′        1,805         1
D. difformis       HPS              Mount Huping, Hunan Province           30°01′        110°32′        925           1
                   LS               County Longsheng, Guangxi Province     25°38′        109°54′        700           1
                   BD               Mount Badagong, Hunan Province         29°45′        110°03′        1,513         1
                   SH               Mount Shunhuang, Hunan Province        26°26′        110°58′        993           1
D. tsayuensis      LZ               County Linzhi, Tibet Autonomous Region 29°43′        94°43′         3,841         2
D. aurantiocaulis  CS               Mount Cang, Yunnan Province            25°46′        100°17′        2,646         3

polymorphic in initial screening tests, were then tested across 106 individuals of D. versipellis from two populations each in (west)central (DJ, SN) and (south)eastern (TT, HS) China (Table 2). The transferability of EST–SSR markers was evaluated among six other Dysosma species (D. aurantiocaulis, D. difformis, D. majorensis, D. pleiantha, D. tsayuensis, and D. veitchii) (Table 1). Finally, to assess levels of genetic diversity, population structure, and genetic relationships between populations of D. versipellis and D. pleiantha, 63 individuals of the latter species were collected in (south)east China (TM, DY) and Taiwan (YL) (Table 1). Voucher specimens representative of all sampled populations of D. versipellis and D. pleiantha, as well as five other Dysosma species, are stored at the Herbarium of Zhejiang University (HZU). Total genomic DNA was extracted from the dried leaf tissue using a DNeasy plant tissue kit (Qiagen).

Table 2 Characteristics of 14 EST–SSR primer pairs validated in a survey of four Dysosma versipellis populations

Locus   Primer pair (5′–3′)          Repeat motif  Size range (bp)  NA  HO     HE     PIC    HWE P value  Accession number
EDV-11  F: GAAGACGAAGACGAAGACGTATC   (GAA)7        119–140          7   0.143  0.555  0.519  0            KJ000288
        R: TTGTTTCGGAGGATGTTGTCTAA
EDV-28  F: GTTGTTTGGGTTTTTGAATGGTA   (TGG)7        123–141          7   0.192  0.707  0.656  0            KJ000289
        R: CAAACTCCACCAAACTAGTGTCC
EDV-30  F: CTGGATTCTTCACAGACCAAGAC   (GAA)7        128–152          10  0.565  0.750  0.713  0.0059       KJ000290
        R: GTGACCGTCTTTCCATTCTATCA
EDV-37  F: GAGAACGGACTTGATGGTTTTC    (GTTG)5       134–158          7   0.407  0.724  0.675  0.6288       KJ000291
        R: TATTGTCTCTTTCCGTGATTCGT
EDV-40  F: GTCGTAAGATAAGCGATTTCTGC   (GGAT)5       104–128          7   0.483  0.733  0.685  0.4668       KJ000292
        R: TTGCAGCTGTATTCATCATCAAC
EDV-46  F: GAACATTAGAGGTGGAAATGCTG   (GATG)5       152–176          6   0.304  0.568  0.531  0.0292       KJ000293
        R: CCATTGTTTTTACAAACCTACCG
EDV-52  F: ACAATTTCGTTACAGCGTTGTTT   (AATGG)4      125–140          4   0.086  0.286  0.256  1.0000       KJ000294
        R: CTCCGCTTTTCTAGGTTTTCTTC
EDV-53  F: GGCAGTAATTCTTCGGAGTCTTT   (TGG)5        109–145          7   0.519  0.558  0.527  0.2632       KJ000295
        R: GATTCTGATATCAAGTGTTCGCC
EDV-54  F: AAGGGTCTCATCTGTTTTTGGTT   (TAA)6        130–142          5   0.510  0.693  0.651  0.0357       KJ000296
        R: GATGGGTGATTCATGGGTATAGA
EDV-59  F: CTAGGAGGTGGTGGACATACAAG   (TTC)6        126–159          10  0.373  0.717  0.682  0.1209       KJ000297
        R: GAAGGAGAACCACAAGGAAAGTT
EDV-60  F: GCGTATCGCAACAGATAAAAATC   (AAG)6        138–171          8   0.327  0.435  0.415  0.2925       KJ000298
        R: GAAGAAGAAGAGGCTGGTGTTG
EDV-67  F: ATGTGAGGATGGTGACTATGGTT   (GTG)5        161–173          4   0.216  0.392  0.345  0.0553       KJ000299
        R: GGGCGGTAAGTAGTAGAGGAAGA
EDV-74  F: CCATAACTAACCCGCTTCTATGA   (AAT)6        119–125          2   0.010  0.010  0.009  1.0000       KJ000300
        R: GACGAGAGAAGAATCTGCTTGAA
EDV-80  F: AGTATGACCACGACCAGCACT     (ACC)5        119–137          4   0.010  0.298  0.273  0            KJ000300
        R: GTCAGGGTTATGGTAATCATGGA

NA number of alleles per locus, HO observed heterozygosity, HE expected heterozygosity, PIC polymorphism information content, and HWE Hardy–Weinberg equilibrium

Transcriptome sequencing, de novo assembly, identification of EST–SSRs, and primer design

Total RNA was isolated from young leaves of D. versipellis using the TRIzol reagent according to the manufacturer’s instructions (Invitrogen Life Technologies, USA). The integrity of the RNA was evaluated by agarose gel electrophoresis, and the concentration was determined using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The cDNA library was constructed following the methods described in Liu et al. (2013a, b). After validation with an Agilent 2100 Bioanalyzer and ABI StepOnePlus real-time PCR system, the cDNA library was sequenced on the Illumina sequencing platform (Illumina HiSeq™ 2000) of the Beijing Genomics Institute (BGI) at Shenzhen, China, using the paired-end technology in a single run. The initial 3-step data processing (i.e., image analysis, base-calling, quality value calculations) was performed by the Illumina GA Pipeline version 1.6, from which 90-bp paired-end reads were obtained. The raw Illumina sequencing data for D. versipellis (accession number: SRX528842) were then submitted to the NCBI Sequence Read Archive (SRA) database.

Before assembly, the 90-bp raw paired-end reads were filtered to obtain high-quality clean reads by removing adaptors, low-quality sequences (reads with unknown bases “N”), and reads with more than 20 % low-quality bases (quality value ≤10). The high-quality clean reads were de novo assembled using TRINITY (Release-2012-06-08; Grabherr et al. 2011). Briefly, clean reads with a certain overlap length were combined to form longer fragments without “N’s” (gaps), which are called contigs. The individual reads were assigned to the respective contigs by paired-end mapping. TRINITY was used to detect contigs from the same transcript, determine the distances between these contigs, and connect them to obtain sequences that could not be extended at either end. Such sequences were defined as unigenes. These unigenes were further processed by sequence splicing and redundancy removal using the TGICL software (version 2.1; Pertea et al. 2003) to obtain nonredundant unigenes. This nonredundant unigene dataset was used for detecting SSR loci in MicroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa) (Thiel et al. 2003), a Perl script that is able to detect perfect as well as compound SSRs in nucleotide sequences. The classification of motifs was carried out according to the method of Jurka and Pethiyagoda (1995). To gain insights into their putative functions, SSR-containing sequences were assigned to euKaryotic Orthologous Groups (KOGs) according to the results of BLASTX searches against amino acid sequences in the KOG dataset (http://www.ncbi.nlm.nih.gov/COG/) (Tatusov et al. 2003). Sequence similarities were judged to be significant when the e value was less than 1E-5.

Primer pairs flanking repeats with a minimum length of 20 bp were designed using PRIMER (version 3.0; Rozen and Skaletsky 1999). Only those EST–SSR loci were considered that contained motifs two to six nucleotides in size. The minimum number of repeat units was defined as six for dinucleotides, five for trinucleotides, and four for all the higher order motifs, including tetra-, penta-, and hexanucleotides. Mononucleotide repeats were excluded. Parameters for designing the primers were set as follows: primer length ranging from 18 to 28 bases with 23 as the optimum, PCR product size ranging from 80 to 160 bp, melting temperature between 57 and 63 °C with 60 °C as the optimum annealing temperature, and GC content from 40 to 60 % with 50 % as the optimum. The primers designed were also subjected to OLIGO (version 6.67; Molecular Biology Insights, Inc., Cascade, CO, USA) to check against potential primer dimers, hairpin structures, and the occurrence of mismatches. The specificity of primer pairs was checked by blasting their sequences against the EST sequences.

EST–SSR analysis

The EST–SSR markers were initially tested for amplification using DNA from 12 D. versipellis individuals to optimize the annealing temperature. The PCR amplification reactions were performed using a thermal cycler GeneAmp PCR System 9700 (Applied Biosystems, Foster City, USA) and conducted in a total volume of 10 μl containing 1 μl of genomic DNA (50 ng), 0.25 unit Taq DNA polymerase (TaKaRa, Dalian, Liaoning, China), 1× PCR buffer, 1 μl of 2.5 mM MgCl2, 1 μl of 2.5 mM dNTPs, 0.1 μl bovine serum albumin (BSA) (TaKaRa, Dalian, Liaoning, China), and 0.5 μl of each 10 μM primer. PCR amplification conditions were as follows: initial denaturation at 95 °C for 10 min, followed by 35 cycles of 94 °C for 45 s, 60 °C for 2 min, 72 °C for 1 min, and a final extension at 72 °C for 10 min. The successful PCR products were further resolved on 12 % nondenaturing polyacrylamide gels using a 50-bp ladder (TaKaRa) as reference and visualized by silver staining. Primers that were not successful for amplification or produced multiple bands were reanalyzed using the touchdown PCR method with 1 °C increments. The optimized SSR primers were used to amplify DNA from 106 individuals of D. versipellis and 63 individuals of D. pleiantha for genetic diversity analysis. To accurately screen for population-level variation, the 5′ end of the forward primer of each pair was labeled with a fluorescent dye (6-FAM, HEX, or TAMRA). PCR amplification followed the protocol above. Fragments were separated on a 3730xl DNA Analyzer (Applied Biosystems) with GeneScan 500 LIZ as an internal size standard and scored and compiled using GENEMARKER (version 2.2.0; SoftGenetics, Pennsylvania, USA).

Cross-species amplification

To assess the transferability of EST–SSR markers, we tested their amplification in other Dysosma species, using PCR reactions as described above.

Population genetic analysis and selective neutrality tests

To test the level of polymorphism at each EST–SSR locus (and overall) in D. versipellis, the number of observed alleles (NA), observed (HO) and expected (HE) heterozygosities, and polymorphism information content (PIC) values were calculated for each locus using CERVUS (version 3.0.3; Kalinowski et al. 2007). The significance of departures from Hardy–Weinberg equilibrium (HWE) assuming heterozygote deficit was tested by GENEPOP (version 4.0.7; Rousset 2008). Frequencies of null alleles were estimated in FREENA (Chapuis and Estoup 2007) following the Expectation Maximization (EM) method described by Dempster et al. (1977). Potential signatures of selection were tested in LOSITAN (Antao et al. 2008), with 200,000 simulations following the method of Beaumont and Nichols (1996). This method identifies loci under selection based on the joint distributions of HE and FST obtained from simulations under an island model of migration. Loci outside the 99 and 1 % confidence areas were identified as candidates affected by positive and balancing selection, respectively.

For each population of D. versipellis and D. pleiantha, we used FSTAT (version 2.9.3; Goudet 2001) to estimate the following genetic diversity and inbreeding parameters across all loci: NA, HE, allele richness (RS, standardized for 13 individuals using rarefaction), and the average inbreeding coefficient (FIS). The significance of departures from HWE, given by the deviations of FIS values from zero, was tested by GENEPOP. Unbiased FST values between populations within species were estimated following the Excluding Null Alleles (ENA) method implemented in FREENA (Chapuis and Estoup 2007), which corrects for the presence of null alleles. To infer the most likely number of population genetic clusters K, we used a Bayesian approach, as implemented in STRUCTURE (version 2.2; Pritchard et al. 2000). Assuming a population admixture model, we ran 10,000 burn-ins and 10,000 Markov chain Monte Carlo (MCMC) replicates for K varying from 1 to 7. Ten independent runs were performed for each K. The most likely number of clusters K was then inferred by estimating ln P(D) and delta K (Evanno et al. 2005). Individuals with a membership coefficient (Qind) of less than 0.90 for each cluster (admixed individuals) were assigned to more than one cluster, whereas individuals with Qind > 0.90 were assigned to only one cluster. The genetic relationships among populations were evaluated by generating a neighbor-joining (NJ) network based on DA distances (Nei et al. 1983) between populations using POPTREE (version 2; Takezaki et al. 2009). Bootstrap values were calculated from 100,000 replications of resampled loci.

Results

Assembly of Dysosma versipellis transcriptome data from Illumina sequencing

After a rigorous quality check and data filtering, Illumina HiSeq™ 2000 sequencing produced 57,578,354 reads for D. versipellis with 97.40 % Q20 bases (base qualities that are larger than 20). The total length of the reads was about 11.9 gigabases (Gb). N percentage (percentage of ambiguous “N” bases) and GC percentage for the raw reads were 0 and 46.43 %, respectively. Using the TRINITY assembler software, short-read sequences from D. versipellis were assembled into 90,328 contigs, with a mean size of 409 bp and an N50 of 938 bp (Figure S1).

With paired-end reads, contigs can be identified from the same transcript and the distance between these contigs can be estimated. TRINITY can be used to map the reads back to the contigs, and to connect the contigs into unigenes that cannot be further extended at either end. As a result, 44,855 unigenes were obtained, with an N50 value of 1,226 bp. Among the 44,855 unigenes with a total length of approximately 34.9 megabases (Mb), the length of 22,754 unigenes (50.73 %) ranged from 300 to 500 bp; 14,119 unigenes (31.48 %) ranged from 500 to 1,000 bp; and 724 unigenes (1.61 %) were more than 3,000 bp in length.

Frequency and distribution of EST–SSRs in the transcriptome

From the 44,855 nonredundant unigenes, a total of 5,167 EST–SSRs (4,906 with simple repeats and 261 with compound formation) were identified in 4,536 unigene sequences, with 543 unigene sequences containing more than one EST–SSR locus. The frequency of EST–SSRs observed in the combined unigenes was 10.11 %, and the distribution density was one EST–SSR locus per 2.46 kilobases (kb). Trinucleotide motifs were the most common type, with a frequency of 43.7 % (2,260), followed by dinucleotide (28.1 %, 1,453), mononucleotide (19.4 %, 1,002), hexanucleotide (4.7 %, 245), pentanucleotide (2.6 %, 136), and tetranucleotide (1.4 %, 71) motifs (Figure S2). Mono- to hexanucleotide repeat motifs were further characterized by repeat number and EST–SSR length (Table S1). For example, among the 2,260 trinucleotide motifs, five tandem repeats were the most common repeat number (24.3 %, 1,257) followed by six tandem repeats (11.5 %, 596), seven tandem repeats (7.2 %, 373), and eight tandem repeats (0.7 %, 34). The average repeat number varied from 4 in hexanucleotide motifs to 15.1 in mononucleotide motifs. The length of EST–SSRs ranged from 12 to 25 bp, with 15 bp being the most frequent (26.3 %, 1,358). Among the 5,167 EST–SSR loci, 179 different motifs were identified. The frequency of 16 different types of SSR motifs (mono-, di- and tri-nucleotide repeats) is shown in Figure S3. Among the mononucleotide motifs, A/T was the most common (19.22 %, 993), followed by C/G (0.17 %, 9). The dominant motif among dinucleotide SSRs was AG/CT (24.10 %, 1,200), followed by AC/GT (3.39 %, 169), AT/AT (1.67 %, 83), and CG/CG (0.02 %, 1). Among the trinucleotide repeats, AAG/CTT was the most abundant motif (12.01 %, 598), followed by ACC/GGT (10.53 %, 544), AGG/CCT (5.32 %, 275), ATC/ATG (5.01 %, 259), AAC/GTT (4.95 %, 256), AGC/CTG (3.72 %, 192), ACG/CGT (1.12 %, 58), AAT/ATT (0.93 %, 48), ACT/AGT (0.33 %, 17), and CCG/CGG (0.25 %, 13). Regarding tetra-, penta-, and hexanucleotide motifs, the following were the most common: AAAT/ATTT (0.33 %, 17), AGAGG/CCTCT (0.41 %, 21), and ACCATC/ATGGTG (0.17 %, 9), respectively.

Of the 4,536 unigene sequences, only 1,269 showed similarities to KOG sequences with functional classifications, and a large number of the contigs were assigned to more than one sub-category. The distribution of nonredundant EST sequences assigned to KOG functional categories is shown in Figure S4. Among them, the cluster for general function prediction only (663, 52.25 %) represented the largest group, followed by transcription (471, 37.12 %), replication/recombination/repair (306, 24.11 %), posttranslational modification/protein turnover/chaperones (302, 23.80 %), carbohydrate transport and metabolism (291, 22.93 %), and signal transduction mechanisms (276, 21.75 %). Only a few unigenes were assigned to nucleotide transport and metabolism, defense mechanisms, extracellular structures, and nuclear structure.

Polymorphism and transferability of EST–SSR markers

Of the 5,167 EST–SSR loci, primer pairs could be designed for 1,050 loci (20.3 %) using PRIMER (Table S2). Primer pairs for the remaining 4,117 loci could not be designed successfully because sequences flanking the SSR loci were either too short or not appropriate for designing primers. To investigate whether the potential EST–SSR loci mined were true-to-type ones for use in population genetics, 80 out of 1,050 primer pairs were randomly selected and synthesized. Of these primer pairs, 37 successfully amplified genomic DNA of D. versipellis, and 34 of those yielded amplification products of the expected size; the 43 remaining primer pairs either did not generate amplification products or produced amplified bands that were too weak, most likely due to polymorphisms or deletions in primer binding sites. With 12 D. versipellis individuals used as PCR templates, 19 of the 34 primer pairs were found to be polymorphic, and 15 were monomorphic (see Table S3). The latter can still become useful in a much larger sample and/or in different related taxa. Among the polymorphic primer pairs, five were excluded from further analysis because their amplified fragments exhibited an excess of stutter bands preventing the unambiguous identification of alleles. As detailed below, 14 polymorphic primer pairs (Table 2) were finally used to evaluate polymorphisms in 106 individuals from four natural populations of D. versipellis, and to test cross-species transferability for population genetic studies in other Dysosma species.

A total of 88 alleles were detected at the 14 polymorphic loci in the 106 individuals genotyped. Across these four populations, NA ranged from 2 to 10 with an average of 6.29 ± 2.30. Estimates of HO ranged from 0.010 to 0.565 with an average of 0.279, and those of HE from 0.010 to 0.750 with an average of 0.528. The PIC ranged from 0.009 for EDV74 to 0.713 for EDV30 (Table 2). Of the 14 loci, three (EDV11, EDV28, EDV80) displayed significant deviation from HWE after Bonferroni adjustment (P < 0.0036; Table 2). Such deviations are expected when there is a lumping together of separate gene pools (Wahlund effect), nonrandom mating within the species, or null alleles. A high frequency of null alleles (null allele frequency >5 %) was observed at two loci (EDV11, EDV28) for each population and at EDV80 for two populations, suggesting that deviations from Hardy–Weinberg expectations at these loci were most likely due to the high frequency of null alleles observed. Of the 14 loci, only EDV80 (outside the 99 % confidence areas) was affected by (positive) selection (data not shown). Thirteen loci were successfully amplified in all the other six species of Dysosma, while EDV28 produced PCR fragments in only four species (D. pleiantha, D. majorensis, D. difformis, D. veitchii) to the exclusion of D. tsayuensis and D. aurantiocaulis (Table S4).
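As an illustration of the per-locus summary statistics reported in Table 2, the following minimal Python sketch computes NA, HO, HE and PIC from diploid genotypes, together with the Bonferroni threshold for 14 loci. It is not the CERVUS implementation: the function names, the toy genotypes, and the choice of Nei's unbiased estimator for HE and the Botstein et al. formula for PIC are assumptions made here for illustration only.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summarize one locus from a list of diploid genotypes (allele_a, allele_b).

    Hypothetical helper: HE uses Nei's unbiased estimator and PIC the
    Botstein et al. formula; the software used in the study may differ.
    """
    n = len(genotypes)  # number of genotyped individuals
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [count / (2 * n) for count in allele_counts.values()]
    sum_sq = sum(p * p for p in freqs)

    # Observed heterozygosity: share of individuals carrying two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity with small-sample correction
    he = (2 * n / (2 * n - 1)) * (1 - sum_sq)
    # Polymorphism information content
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = 1 - sum_sq - cross
    return {"NA": len(allele_counts), "HO": ho, "HE": he, "PIC": pic}


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_tests


# Toy data (fragment sizes in bp), invented for this example
example = [(120, 120), (120, 123), (123, 126), (126, 126)]
stats = locus_stats(example)
threshold = bonferroni_threshold(0.05, 14)  # 14 loci, as in Table 2
```

With 14 loci tested at a nominal level of 0.05, the adjusted threshold is 0.05/14, which rounds to the P < 0.0036 quoted in the text.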

Table 3 Population genetic diversity and inbreeding parameters for Dysosma versipellis and D. pleiantha

Population code (number of individuals)  NA  RS     HE     FIS    HWE P value  FIS(1)  HWE P value(1)
D. versipellis
  DJ (20)                                46  3.167  0.481  0.012  0.003        -0.103  0.633
  SN (33)                                52  3.114  0.388  0.324  0.000        0.236   0.000
  TT (37)                                51  2.933  0.318  0.118  0.000        -0.064  0.013
  HS (16)                                37  2.553  0.281  0.203  0.001        0.018   0.396
D. pleiantha
  DY (24)                                54  3.366  0.391  0.158  0.000        0.119   0.000
  TM (26)                                57  3.499  0.435  0.268  0.000        0.239   0.000
  YL (13)                                36  2.571  0.337  0.299  0.000        0.058   0.100

(1) Calculated after excluding three loci with a high frequency of null alleles (EDV11, EDV28, EDV80) or signs of selection (only EDV80)
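The allelic richness values (RS) in Table 3 are standardized by rarefaction to the smallest sample (13 individuals, i.e., 26 gene copies). A minimal sketch of the rarefaction idea for a single locus in a single population is given below. It uses the standard expected-number-of-alleles formula, and it is an assumption here that this matches the FSTAT calculation in detail; the function name and the toy counts are hypothetical.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: observed number of copies of each allele at one locus
    g: standardized sample size in gene copies (2 x individuals for diploids)
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("subsample size exceeds the number of gene copies")
    # For each allele: 1 minus the probability that it is absent from the subsample
    return sum(1 - comb(n - count, g) / comb(n, g) for count in allele_counts)


# Toy example: 4 gene copies, one allele seen 3 times and one seen once
rs_small = rarefied_allelic_richness([3, 1], 2)
rs_full = rarefied_allelic_richness([3, 1], 4)
```

Averaging such per-locus values over loci would give a per-population figure comparable across samples of different size, which is the purpose of the standardization described in the Methods.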

Comparison of population genetic diversity, comprised three populations of D. versipellis (DJ, structure and assignment between D. versipellis SN, TT), and cluster II contained all populations of D. and D. pleiantha pleiantha (TM, DY, YL); as an exception, population HS of D. versipellis was assigned to both clusters Across the four D. versipellis populations surveyed, (Fig. 1b). The population-based NJ network (Fig. 2) the range was 37–52 for NA, 2.553–3.167 for RS, and generally agreed with the output of STRUCTURE and 0.281–0.481 for HE. Similar values were found among grouped the HS population into cluster I (i.e., D. the three populations of D. pleiantha, where the range versipellis), albeit without significant bootstrap was 36–57 for NA, 2.571–3.499 for RS, and support. 0.337–0.435 for HE. However, populations of D. pleiantha had, on average, slightly higher levels of diversity (HE = 0.388) than those of D. versipellis Discussion (HE = 0.367). FIS values in all populations of D. versipellis and D. pleiantha were significantly posi- Frequency and distribution of EST–SSRs tive, suggesting heterozygosity deficit in these popu- lations (Table 3). After excluding the three loci with a In this study, we identified a total of 5,167 EST–SSRs high frequency of null alleles (EDV11, EDV28, from 4,536 (out of 44,855 nonredundant) unigene EDV80) or signs of selection (only EDV80), the FIS sequences derived from the transcriptome of D. values ranged from -0.103 to 0.239 with an average versipellis. Considering the density of EST–SSRs of 0.072, and only three populations (SN of D. and the abundance of their different motifs, it is well versipellis, DY and TM of D. pleiantha) still showed known that these characteristics are highly dependent significant deviations from HWE. Levels of popula- on the size of the databases, the SSR search criteria, tion differentiation were higher in D. versipellis and the mining tools used (Parchman et al. 2010; (FST = 0.375) than in D. 
pleiantha (FST = 0.250). Varshney et al. 2005b; Koilkonda et al. 2012). Despite In the STRUCTURE analysis, the true number of all this, we observed that about 10.1 % of the clusters K in the data was difficult to determine transcriptome sequences of D. versipellis possess following Pritchard et al. (2000), because ln single sequence repeats, which falls well within the P(D) increased progressively as K increased (Figure range of values (2.60–16.82 %) reported for other S5a). The delta K statistic of Evanno et al. (2005), dicotyledonous species (Kumpatla et al. 2005). By however, permitted detection of a rate change in ln contrast, the distribution density of EST–SSRs in D. P(D) corresponding to K = 2 (Figure S5b). The versipellis is one locus per 2.46 kb, which is higher assignment of individuals to these groups (or ‘clus- than that in rice (3.4 kb), wheat (5.4 kb), soybean ters’) largely followed their : Cluster I (7.4 kb), Medicago sativa (12.06 kb), Arabidopsis 123 Mol Breeding

Fig. 1 a Geographic distribution of four populations of Dysosma versipellis (DJ, SN, TT, HS) and three populations of D. pleiantha (DY, TM, YL) from mainland China and Taiwan. The colors of histograms correspond to two genetic clusters identified by the program STRUCTURE. b Histogram of the STRUCTURE analysis for the model with K = 2 (showing the highest ΔK). The smallest vertical bar represents one individual. The assignment proportion of each individual into cluster I versus II is shown along the y-axis
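The ΔK criterion of Evanno et al. (2005) referred to in the caption can be illustrated with a minimal Python sketch. It uses a common simplified form, the absolute second difference of the replicate means of ln P(D) divided by the standard deviation of ln P(D) at that K. The function name `delta_k` and all ln P(D) values below are hypothetical and serve illustration only; they are not the values behind Figure S5.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} from replicate ln P(D) values.

    lnp_by_k maps each tested K to a list of ln P(D) values from
    replicate runs. Delta K is only defined for K values that have
    both neighbours (K - 1 and K + 1) and non-zero variance.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    sds = {k: stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            # Absolute second-order rate of change of mean ln P(D)
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Hypothetical replicate runs: ln P(D) keeps rising with K,
# but the steepest change in slope occurs at K = 2.
runs = {
    1: [-3000.0, -3002.0, -2998.0],
    2: [-2600.0, -2601.0, -2599.0],
    3: [-2550.0, -2560.0, -2540.0],
    4: [-2510.0, -2530.0, -2490.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this toy input ln P(D) never peaks, so the raw likelihood alone would not single out a K, whereas ΔK is largest where the increase levels off.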


Fig. 2 Neighbor-joining network of the seven populations of Dysosma versipellis and D. pleiantha based on DA distances (Nei et al. 1983). Numbers below branches indicate bootstrap values as a percentage of 100,000 replicates
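For orientation, the DA distance of Nei et al. (1983) on which this network is based is D_A = 1 − (1/r) Σ_j Σ_i √(x_ij · y_ij), where x_ij and y_ij are the frequencies of allele i at locus j in the two populations and r is the number of loci. The following Python sketch computes it for a pair of populations; the function name `da_distance` and the allele frequencies are hypothetical and do not come from the Dysosma data set or from any particular program.

```python
from math import sqrt


def da_distance(pop_x, pop_y):
    """Nei et al. (1983) DA distance between two populations.

    pop_x and pop_y are lists with one dict per locus, each dict
    mapping an allele label to its frequency in that population.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same, non-zero number of loci")
    total = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        # Only alleles present in both populations contribute
        shared = set(freq_x) & set(freq_y)
        total += sum(sqrt(freq_x[a] * freq_y[a]) for a in shared)
    return 1.0 - total / len(pop_x)


# Hypothetical two-locus example: identical frequencies at locus 1,
# no shared alleles at locus 2.
pop_a = [{"150": 0.5, "154": 0.5}, {"200": 1.0}]
pop_b = [{"150": 0.5, "154": 0.5}, {"204": 1.0}]
print(da_distance(pop_a, pop_b))
print(da_distance(pop_a, pop_a))
```

A pairwise matrix of such distances among all seven populations would be the input to a neighbor-joining algorithm; that step is not shown here.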

thaliana (14 kb), Chrysanthemum nankingense (14.7 kb), and cotton (20 kb) (Cardle et al. 2000; Peng et al. 2005; Liu et al. 2013a, b; Wang et al. 2013). This result, however, might be due to the less stringent SSR search criteria used in the present study (mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs with a minimum of 12, 6, 5, 4, 4, and 4 repeats, respectively).

In D. versipellis, the proportions of EST–SSRs were not evenly distributed among motif types (di-, tri-, tetra- repeats, etc.; see Figure S3). Among the mono- to hexanucleotide repeat types, trinucleotide repeats were the most frequent, representing 43.7 % of the SSR loci identified, followed by dinucleotides (28.1 %) and mononucleotides (19.4 %). By contrast, hexa-, penta-, and tetranucleotide repeat types were far less common (4.7, 2.6, and 1.4 %, respectively). Hence, the present results support earlier suggestions that trinucleotide repeats are the main EST–SSR repeat type in both mono- and dicots (Varshney et al. 2005a; Koilkonda et al. 2012; Li et al. 2012). This predominance might reflect the suppression of nontrimeric SSRs in coding regions, where changes in their repeat number cause frameshift mutations (Metzgar et al. 2000), which concurs with trimeric SSRs being mainly found in such regions (Li et al. 2004).

In D. versipellis, the most abundant di- and trinucleotide motifs were AG/CT (24.10 %) and AAG/CTT (12.01 %), respectively. By contrast, CCG/CGG (0.25 %) and CG/CG (0.02 %) motifs were very rare. These results are also consistent with previous studies (Toth et al. 2000; Kumpatla and Mukhopadhyay 2005; Wang et al. 2010), suggesting that both AG/CT and AAG/CTT are common motifs. One possible explanation is that codons containing AG/CT, i.e., GAG, AGA, UCU, and CUC, encode Glu, Arg, Ser, and Leu, respectively, of which Leu (10 %) is relatively common in proteins (Kantety et al. 2002). However, CCG/CGG and CG/CG are very rare motifs in dicots despite being common in monocots (Wang et al. 2011). This difference has been attributed to the high GC content of monocot (especially grass) genomes (Morgante et al. 2002).

Polymorphism and cross-species transferability of EST–SSRs

In this study, 80 designed primer pairs were used for validation of the EST–SSR markers, and 34 primer pairs (42.5 %) yielded amplicons of the expected size in D. versipellis. This rate is lower than the amplification success rates of 60–90 % reported in previous studies (Li et al. 2012; Wang et al. 2013; Zheng et al. 2013). Nonetheless, 19 of those 34 primer pairs (i.e., 56 %) proved to be polymorphic among 12 individuals of D. versipellis. Similarly high ratios of polymorphic to total validated primer pairs have been reported in other plant studies (e.g., Hordeum vulgare: Thiel et al. 2003; rubber tree: Li et al. 2012; Populus euphratica: Du et al. 2013; Amorphophallus konjac/A. bulbifer: Zheng et al. 2013).

Nine genomic microsatellite markers have previously been reported for D. versipellis (Guan et al. 2008) and 14 for D. pleiantha (Guan et al. 2011). Based on these markers, the average values of NA were 4.50 for D. versipellis and 6.67 for D. pleiantha. The corresponding values for EST–SSR markers reported in the present study are 3.32 and 3.50, respectively. This lower level of allelic diversity at EST–SSRs is most likely due to functional constraints in transcribed regions of the genome (Ellis and Burke 2007), but might also be further influenced by the type and number of repeat units and the gene region in which they occur (Varshney et al. 2005a; Metzgar et al. 2000; Bouck and Vision 2007).

Our tests of interspecies transferability of altogether 14 EST–SSRs from D. versipellis to other Dysosma species yielded rates of 100 % (13 markers) and 83.3 % (1 marker). This high transferability is consistent with previous reports from many other plant taxa, e.g., Medicago (Eujayl et al. 2003), Citrus (Luro et al. 2008), Epimedium (Zeng et al. 2010), Amorphophallus (Zheng et al. 2013), and Chrysanthemum nankingense (Wang et al. 2013).

EST–SSR population genetics of D. versipellis versus D. pleiantha

We detected null alleles at three (out of 14) loci and positive selection at one of them (EDV80), which may have biased our population genetic results. Even though FST and genetic distance values could be corrected in this regard, using the program FREENA (Chapuis and Estoup 2007), some caution in the use of our loci was needed. In consequence, we estimated FIS, deviations from HWE, and individual assignment proportions in our seven Dysosma populations (four of D. versipellis; three of D. pleiantha) before and after excluding the three ‘problematical’ loci. In fact, their

inclusion proved highly influential on estimates of FIS and HWE, whereas assignment proportions were only slightly affected (see also Chapuis and Estoup 2007; Carlsson 2008).

Given the lack of genomic SSR studies in Dysosma, we compared our EST–SSR population genetic data with previously published allozyme data for D. versipellis and D. pleiantha (Qiu et al. 2005). The present EST–SSR data, however, revealed much higher levels of average within-population diversity in D. versipellis (EST–SSR: HE = 0.367; allozyme: HE = 0.045) and D. pleiantha (EST–SSR: HE = 0.388; allozyme: HE = 0.208). Focusing on D. versipellis, estimates of FST based on EST–SSRs (0.375) were similar to those produced by allozymes (FST = 0.468). Moreover, these SSRs tend to cluster populations according to their geographical origin (Fig. 1b), which concurs with results from both allozymes (Qiu et al. 2005) and AFLPs (Guan et al. 2010). Allozymes are universally recognized as legitimate markers for population genetic studies because they usually qualify as neutral markers (Kim et al. 2008). Likewise, even though EST–SSRs are potentially exposed to selection, large-scale surveys (Tiffin and Hahn 2002; Clark et al. 2003) indicate that probably only a small percentage of potentially functional genes containing SSRs are under positive selection (see also Ellis and Burke 2007). Finally, our individual-based STRUCTURE analysis (Fig. 1b) and the population-based NJ network (Fig. 2) revealed that the seven Dysosma populations surveyed for EST–SSRs largely cluster according to their species status, i.e., D. versipellis versus D. pleiantha. However, using STRUCTURE, individuals of the HS population of D. versipellis from Mount Huangshan were almost equally assigned to the two clusters identified (Fig. 1b). Phylogeographic work based on cpDNA sequence variation indicates that both species are fixed for species-specific haplotypes (Mao et al. 2014). Considering further that D. versipellis and D. pleiantha have overlapping ranges in the Hengduan Mountain region (Ying et al. 1993), it seems plausible that the genetic admixture of the HS population reported here reflects (ongoing) interspecific hybridization, mediated solely by pollen. Taken together, we conclude that the polymorphic SSRs derived from ESTs behave as effectively neutral markers in Dysosma species and are suitable for inferring population genetic structure in these traditional Chinese medicine plants.

Conclusions

This work presents a de novo transcriptome sequencing analysis of a cDNA library of D. versipellis leaves. A total of 44,855 nonredundant unigenes were assembled, and 5,167 EST–SSRs were identified in 4,536 unigenes. A total of 1,050 primer pairs were successfully designed and characterized as potential molecular markers. Of these pairs, we selected 80 at random for further validation, and 19 pairs were identified as true-to-type SSR loci that reliably amplified polymorphic bands in 12 individuals of D. versipellis. Of these 19 polymorphic primer pairs, 14 exhibited moderate levels of gene diversity in 106 individuals from four populations of D. versipellis, and 13 showed a high rate of transferability (100 %) to related Dysosma species. After excluding three loci (EDV11, EDV28, EDV80) with a high frequency of null alleles and/or signs of positive selection, the remaining 11 polymorphic markers were used to estimate genetic diversity and give insights into the population structure of D. versipellis and its presumed sister species, D. pleiantha. Both phylogenetic and STRUCTURE analyses identified two genetic clusters largely congruent with the current species classification. These findings indicate that the EST–SSRs examined can be used with confidence in future population genetic studies of both D. versipellis and D. pleiantha. Moreover, of the 1,050 EST–SSR primer pairs designed in the present study, more than the 80 evaluated here should be tested for amplification, levels of polymorphism, and cross-species transferability, and ultimately for their utility in studies of population genetics, phylogeography, and breeding system within and among species of Dysosma.

Acknowledgments This research was supported by the National Science Foundation of China (Nos. 31170200, 30900082), the Zhejiang Provincial Funds for Distinguished Young Scientists (No. LR02001), the Fundamental Research Funds for the Central Universities (No. 2011QNA6013), and the Qianjiang Talent Project from the Bureau of Science and Technology of Zhejiang Province, China (No. 2010R10090). We are grateful to Hans-Peter Comes (University of Salzburg) and three anonymous reviewers for their insightful comments and suggestions to improve the manuscript.

References

Antao T, Lopes A, Lopes RJ, Beja-Pereira A, Luikart G (2008) LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method. BMC Bioinform 9:323

Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of population structure. Proc R Soc Lond B Biol Sci 263:1619–1626
Bouck A, Vision T (2007) The molecular ecologist’s guide to expressed sequence tags. Mol Ecol 16:907–924
Cardle L, Ramsay L, Milbourne D, Macaulay M, Marshall D, Waugh R (2000) Computational and experimental characterization of physically clustered simple sequence repeats in plants. Genetics 156:847–854
Carlsson J (2008) Effects of microsatellite null alleles on assignment testing. J Hered 99:616–623
Chapuis MP, Estoup A (2007) Microsatellite null alleles and estimation of population differentiation. Mol Biol Evol 24:621–631
Clark AG, Glanowski S, Nielsen R, Thomas PD, Kejariwal A, Todd MA, Tanenbaum DM, Civello D, Lu F, Murphy B, Ferriera S, Wang G, Zheng X, White TJ, Sninsky JJ, Adams MD, Cargill M (2003) Inferring non-neutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960–1963
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39:1–38
Du FK, Xu F, Qu H, Feng SS, Tang JJ, Wu RL (2013) Exploiting the transcriptome of Euphrates Poplar, Populus euphratica (Salicaceae) to develop and characterize new EST–SSR markers and construct an EST–SSR database. PLoS ONE 8:e61337
Ellis JR, Burke JM (2007) EST–SSRs as a resource for population genetic analyses. Heredity 99:125–132
Eujayl I, Sledge MK, Wang L, May GD, Chekhovskiy K, Zwonitzer JC, Mian MAR (2003) Medicago truncatula EST–SSRs reveal cross-species genetic markers for Medicago spp. Theor Appl Genet 108:414–422
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611–2620
Giri A, Narasu ML (2000) Production of podophyllotoxin from Podophyllum hexandrum: a potential natural product for clinically useful anticancer drugs. Cytotechnology 34:17–26
Goudet J (2001) Fstat, a program to estimate and test gene diversities and fixation indices. Version 2.9.3. http://www2.unil.ch/popgen/softwares/fstat.htm
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng QD, Chen ZH, Mauceli E, Hacohen N, Gnirke A, Rhind N, Palma FD, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A (2011) Full-length transcriptome assembly from RNA-Seq data without reference genome. Nat Biotechnol 29:644–652
Guan BC, Qiu YX, Fu CX (2008) Isolation and characterization of microsatellite markers in Dysosma versipellis (Berberidaceae), a rare endemic from China. Conserv Genet 9:783–785
Guan BC, Fu CX, Qiu YX, Zhou SL, Comes HP (2010) Genetic structure and breeding system of a rare understory herb, Dysosma versipellis (Berberidaceae), from temperate deciduous forests in China. Am J Bot 97:111–122
Guan BC, Gong X, Zhou SL (2011) Development and characterization of polymorphic microsatellite markers in Dysosma pleiantha (Berberidaceae). Am J Bot 9:210–212
Guichoux E, Lagache L, Wagner S, Chaumeil P, Léger P, Lepais O, Lepoittevin C, Malausa T, Revardel E, Salin F, Petit RJ (2011) Current trends in microsatellite genotyping. Mol Ecol Resour 11:591–611
Gupta PK, Rustgi S, Sharma S, Singh R, Kumar N, Balyan HS (2003) Transferable EST–SSR markers for the study of polymorphism and genetic diversity in bread wheat. Mol Genet Genomics 270:315–323
Hodgetts RB, Aleksiuk MA, Brown A, Clarke C, Macdonald E, Nadeem S, Khasa D, Macdonald E (2001) Development of microsatellite markers for white spruce (Picea glauca) and related species. Theor Appl Genet 102:1252–1258
IUCN (2013) IUCN red list of threatened species. Version 2013.2. www.iucnredlist.org
Jurka J, Pethiyagoda C (1995) Simple repetitive DNA sequences from primates: compilation and analysis. J Mol Evol 40:120–126
Kalinowski ST, Taper ML, Marshall TC (2007) Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol Ecol 16:1099–1106
Kantety RV, Rota ML, Matthews DE, Sorrells ME (2002) Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol Biol 48:501–510
Kim KS, Ratcliffe ST, French BW, Liu L, Sappington TW (2008) Utility of EST-derived SSRs as population genetics markers in a beetle. J Hered 99:112–124
Koilkonda P, Sato S, Tabata S, Shirasawa K, Hirakawa H, Sakai H, Sasamoto S, Watanabe A, Wada T, Kishida Y, Tsuruoka H, Fujishiro T, Yamada M, Kohara M, Suzuki S, Hasegawa M, Kiyoshima H, Isobe S (2012) Large-scale development of expressed sequence tag-derived simple sequence repeat markers and diversity analysis in Arachis spp. Mol Breed 30:125–138
Kumari K, Muthamilarasan M, Misra G, Gupta S, Subramanian A, Parida SK, Chattopadhyay D, Prasad M (2013) Development of eSSR-markers in Setaria italica and their applicability in studying genetic diversity, cross-transferability and comparative mapping in millet and non-millet species. PLoS ONE 8:e67742
Kumpatla SP, Mukhopadhyay S (2005) Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species. Genome 48:985–998
Lee C (2012) Monolignol and lignan biosynthetic studies: from reaction mechanisms to next generation sequencing. PhD thesis, Washington State University
Li YC, Korol AB, Fahima T, Nevo E (2004) Microsatellites within genes: structure, function, and evolution. Mol Biol Evol 21:991–1007
Li DJ, Deng Z, Qin B, Liu XH, Men ZH (2012) De novo assembly and characterization of bark transcriptome using Illumina sequencing and development of EST–SSR markers in rubber tree (Hevea brasiliensis Muell. Arg.). BMC Genom 13:192
Liu TM, Zhu SY, Tang QM, Chen P, Yu YT, Tang SW (2013a) De novo assembly and characterization of transcriptome using Illumina paired-end sequencing and identification of CesA gene in ramie (Boehmeria nivea L. Gaud). BMC Genom 14:125

Liu ZP, Chen TL, Ma LC, Zhao ZG, Zhao PX, Nan ZB, Wang YR (2013b) Global transcriptome sequencing using the Illumina platform and the development of EST–SSR markers in autotetraploid alfalfa. PLoS ONE 8:e83549
Luro FL, Costantino G, Terol J, Argout X, Allario T, Wincker P, Talon M, Ollitrault P, Morillon R (2008) Transferability of the EST–SSRs developed on Nules clementine (Citrus clementina Hort ex Tan) to other Citrus species and their effectiveness for genetic mapping. BMC Genom 9:287
Mao YR, Zhang YH, Nakamura K, Guan BC, Qiu YX (2014) Developing DNA barcodes for species identification in Podophylloideae (Berberidaceae). J Syst Evol. doi:10.1111/jse.12076
Metzgar D, Bytof J, Wills C (2000) Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res 10:72–80
Morgante M, Hanafey M, Powell W (2002) Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 30:194–200
Nei M, Tajima F, Tateno Y (1983) Accuracy of estimated phylogenetic trees from molecular data II. Gene frequency data. J Mol Evol 19:153–170
Parchman T, Geist K, Grahnen J, Benkman C, Buerkle CA (2010) Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genom 11:180
Pashley CH, Ellis JR, McCauley DE, Burke JM (2006) EST databases as a source for molecular markers: lessons from Helianthus. J Hered 97:381–388
Peng JH, Nora L, Lapitan V (2005) Characterization of EST-derived microsatellites in the wheat genome and development of eSSR markers. Funct Integ Genomics 5:80–96
Pertea G, Huang X, Liang F, Antonescu V, Sultana R et al (2003) TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19:651–652
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Qiu YX, Zhou XW, Fu CX, Gilbert CYS (2005) A preliminary study of genetic variation in the endangered, Chinese endemic species Dysosma versipellis (Berberidaceae). Bot Bull Acad Sin 46:61–69
Qiu YX, Li JH, Liu HL, Chen YY, Fu CX (2006) Population structure and genetic diversity of Dysosma versipellis (Berberidaceae), a rare endemic from China. Biochem Syst Ecol 34:745–752
Qiu YX, Guan BC, Fu CX, Comes HP (2009) Did glacials and/or interglacials promote allopatric incipient speciation in East Asian temperate plants? Phylogeographic and coalescent analyses on refugial isolation and divergence in Dysosma versipellis. Mol Phylogenet Evol 51:281–293
Rajora OP, Mosseler A, Major JE (2000) Indicators of population viability in red spruce, Picea rubens. II. Genetic diversity, population structure, and mating behaviour. Can J Bot 78:941–956
Rousset F (2008) Genepop’007: a complete re-implementation of the genepop software for Windows and Linux. Mol Ecol Resour 8:103–106
Roy C, Brown D, Little JE, Valentine BK, Walker PR, Sikorska M, Leblanc J, Caly N (1992) The topoisomerase II inhibitor teniposide (VM-26) induces apoptosis in unstimulated mature murine lymphocytes. Exp Cell Res 200:416
Rozen S, Skaletsky H (1999) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132:365–386
Scotti I, Paglia GP, Magni F, Morgante M (2002) Efficient development of dinucleotide microsatellite markers in Norway spruce (Picea abies Karst.) through dot-blot selection. Theor Appl Genet 104:1035–1041
Shang MY, Xu GJ, Xu LS, Li P (1994) Herbalogical study of Chinese drug guijiu and xiaoyelian. J Tradit Chin Med 19:451–453 (in Chinese with an English abstract)
Takezaki N, Nei M, Tamura K (2009) Poptree2: software for constructing population trees from allele frequency data and computing other population statistics with Windows-interface. http://www.med.kagawa-u.ac.jp/~genomelb/takezaki/poptree2/index.html
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN et al (2003) The COG database: an updated version includes eukaryotes. BMC Bioinform 4:41
Thiel T, Michalek W, Varshney RK, Graner A (2003) Exploiting EST databases for the development of cDNA derived microsatellite markers in barley (Hordeum vulgare L.). Theor Appl Genet 106:411–422
Tiffin P, Hahn MW (2002) Coding sequence divergence between two closely related plant species: Arabidopsis thaliana and Brassica rapa ssp. pekinensis. J Mol Evol 54:746–753
Toth G, Gaspari Z, Jurka J (2000) Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res 10:967–981
Varshney RK, Sigmund R, Börner A, Korzun V, Stein N, Sorrells ME, Langridge P, Graner A (2005a) Interspecific transferability and comparative mapping of barley EST–SSR markers in wheat, rye and rice. Plant Sci 168:195–202
Varshney RK, Graner A, Sorrells ME (2005b) Genic microsatellite markers in plants: features and applications. Trends Biotechnol 23:48–55
Wang S, Xie Y (2004) China species red list (vol 1 Red List). Higher Education Press, Beijing
Wang ZY, Fang BP, Chen JY, Zhang XJ, Luo ZX, Huang LF, Chen XL, Li YJ (2010) De novo assembly and characterization of root transcriptome using Illumina paired-end sequencing and development of cSSR markers in sweetpotato (Ipomoea batatas). BMC Genom 11:726
Wang ZY, Li J, Luo ZX, Huang LF, Chen XL, Fang BP, Li YJ, Chen JY, Zhang XJ (2011) Characterization and development of EST-derived SSR markers in cultivated sweetpotato (Ipomoea batatas). BMC Plant Biol 11:139
Wang HB, Jiang JF, Chen SM, Qi XY, Peng H, Li PR, Song AP, Guan ZY, Fang WM, Liao Y, Chen FD (2013) Next-generation sequencing of the Chrysanthemum nankingense (Asteraceae) transcriptome permits large-scale unigene assembly and SSR marker discovery. PLoS ONE 8:e62293
Ying TS, Zhang YL, Boufford DE (1993) The endemic genera of seed plants of China. Science Press, Beijing
Zane L, Bargelloni L, Patarnello T (2002) Strategies for microsatellite isolation: a review. Mol Ecol 11:1–16

Zeng SH, Xiao G, Guo J, Fei ZJ, Xu YQ, Roe BA, Wang Y (2010) Development of a EST dataset and characterization of EST–SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. et Zucc.) Maxim. BMC Genom 11:94
Zheng XF, Pan C, Diao Y, You YN, Yang CZ, Hu ZL (2013) Development of microsatellite markers by transcriptome sequencing in two species of Amorphophallus (Araceae). BMC Genom 14:490
Zong M, Liu HL, Qiu YX, Yang SZ, Zhao MS, Fu CX (2008) Genetic diversity and geographic differentiation in the threatened species Dysosma pleiantha in China as revealed by ISSR analysis. Biochem Genet 46:180–196
