
Journal of Systematics and Evolution (JSE)
doi: 10.1111/jse.12469
Research Article

Comparative and development of expressed sequence tag-simple sequence repeat markers for two closely related oak species

Jing-Jing Sun1, Tao Zhou2, Rui-Ting Zhang1, Yun Jia1, Yue-Mei Zhao3, Jia Yang1, and Gui-Fang Zhao1*

1Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi’an 710069, China
2School of Pharmacy, Xi’an Jiaotong University, Xi’an 710061, China
3College of Biopharmaceutical and Food Engineering, Shangluo University, Shangluo 726000, Shaanxi, China
*Author for correspondence. E-mail: [email protected]. Tel.: 86-29-88305264. Fax: 86-29-88303572.
Received 1 December 2017; Accepted 7 October 2018; Article first published online 11 November 2018

Abstract Quercus is one of the major genera in the family Fagaceae and its species are widely distributed in the Northern Hemisphere. Many Quercus species, including several endemics, are distributed in China. Genetic resources have been established for the important genera but few transcriptomes are available for Quercus species in China. In this study, we used Illumina paired-end sequencing to obtain the transcriptomes of two oak species, Q. liaotungensis Koidz. and Q. mongolica Fisch. ex Turcz. Approximately 24 million reads were generated for each species and a total of 103 618 unigenes were obtained after assembly for both species. Comparative analyses of both species identified a total of 12 981 orthologous contigs. The Ka/Ks estimation and enrichment analysis indicated that 1179 (9.08%) orthologs showed rapid evolution, and most of these orthologs were related to functions comprising “DNA repair”, “response to cold”, and “response to drought”. These findings could provide some insights into how these two closely related Quercus species adapted to extreme environments characterized by aridity and cold. The divergence time (approximately 4.27–5.93 Mya) between the two Quercus species was estimated according to the Ks distribution. Moreover, 16 608 simple sequence repeat loci were detected and 12 363 primer pairs were designed. Subsequently, 158 of the 12 363 primer pairs were randomly selected to test for polymorphism and 92 of these primer pairs amplified successfully in the two oak species. The resultant orthologs and simple sequence repeat markers are valuable for genetic differentiation analyses and evolutionary studies of Quercus.

Key words: de novo assembly, genetic differentiation, positive selection, Quercus, RNA sequencing.

1 Introduction

Quercus liaotungensis Koidz. (also known as Q. wutaishanica) and Q. mongolica Fisch. ex Ledeb. are dominant tree species in north China with high ecological and economic value. Quercus mongolica is a constructive species in northeast China and is also a main timber tree species. Its ecological value is reflected by good resistance to corrosion and water and soil conservation. Similarly, as a constructive species distributed in temperate and warm temperate forests, the seeds of Q. liaotungensis contain starch that can be used for brewing and its leaves can be used to feed tussah silkworms (Liu, 2012). Although Q. liaotungensis and Q. mongolica are mainly distributed in north and northeast China, their concentrated distribution areas are quite different. Quercus liaotungensis is mainly distributed in northern China, such as northern Shaanxi, Shanxi and Hebei Provinces, whereas Q. mongolica is mainly distributed in northeastern China, including Heilongjiang, Jilin, parts of Liaoning and eastern Inner Mongolia. Compared with the habitat of Q. liaotungensis, the habitat of Q. mongolica was colder and drier (Yang et al., 2016). Quercus mongolica and Q. liaotungensis are very similar species with only minor morphological differences, that is, a smaller number of lateral leaf veins and lobes as well as flat scales on the acorn cup in Q. liaotungensis compared with Q. mongolica (Yun et al., 1998). No previous studies have explained why these two closely related species with their concentrated distribution areas differed in cold resistance and drought resistance.

Transcriptome sequencing is a convenient method for rapidly obtaining information about expressed genomic regions and for resolving comparative genomic-level problems related to non-model organisms (Logacheva et al., 2011; Zhang et al., 2013b). Due to the rapid development of next-generation sequencing, RNA sequencing (RNA-Seq) has become more efficient and less expensive, and it is increasingly used to study the evolutionary origins and ecology of non-model plants (Hudson, 2008; Strickler et al., 2012).

© 2018 Institute of Botany, Chinese Academy of Sciences | September 2019 | Volume 57 | Issue 5 | 440–450

Simple sequence repeats (SSRs) are commonly used for analyzing genetic diversity and evolutionary studies because of their codominant and highly polymorphic nature (Song et al., 2003; Hao et al., 2006; Ali et al., 2008). The application of next-generation sequencing has allowed the development of large numbers of molecular markers for non-model species (Teacher et al., 2012; Zalapa et al., 2012). For instance, a large number of microsatellite markers or single-copy nuclear genes have been identified using RNA-Seq in Aspidistra saxicola Y. Wan (Huang et al., 2013), Benincasa hispida (Thunb.) Cogn. (Jiang et al., 2013), Dysosma versipellis (Hance) M. Cheng ex Ying (Guo et al., 2014), Medicago sativa L. (Wang et al., 2014), and Colocasia esculenta (L.) Schott (You et al., 2015). Many studies have developed SSR markers for population and quantitative trait loci studies of the genus Quercus (Isagi & Suhandono, 1997; Ueno et al., 2008; Chatwin et al., 2014; An et al., 2016).

In this study, we compared the transcriptomes of Q. liaotungensis and Q. mongolica, and investigated why Q. mongolica is better adapted than Q. liaotungensis to the cold and dry environment in northeast China. We also undertook pairwise comparisons of orthologous sequences to identify candidate genes that might be under positive selection and estimated the divergence time between the two oaks. A large number of genic SSR markers were developed from the transcriptome sequences and validated in other Quercus species. The transcriptomic resources and genic SSR markers obtained in this study might facilitate further evolutionary and genetic differentiation studies of other Quercus species.

2 Material and Methods

2.1 Plant materials

Samples of Quercus liaotungensis and Q. mongolica for transcriptome sequencing were collected from Tongchuan, Shaanxi Province (35°18′36″N, 108°55′12″E) and Changbai Mountain in Liaoning Province (42°2′24″N, 127°46′12″E), China, respectively. Mature seeds were collected and grown in the laboratory. Mixtures of fresh leaves from seven or eight individuals belonging to both oak species were sent directly to Biomarker Technologies (Beijing, China) for total RNA extraction and high-throughput sequencing. In addition, fresh leaves of 44 individuals from 11 Quercus species were dried with silica gel for DNA extraction, polymerase chain reaction (PCR) amplification, and validation of SSR markers. Detailed information about the materials is listed in Table S1.

2.2 RNA extraction, cDNA library construction, and Illumina sequencing

Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. Poly-A mRNA was then isolated from total RNA using poly-T oligo-attached magnetic beads (Illumina, San Diego, CA, USA). The integrity and quantity of the RNA were assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA, USA) and NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). RNA in equal volumes from each species was used to construct the cDNA libraries and for RNA-seq. The cDNA library used for transcriptome sequencing was prepared using a cDNA Synthesis Kit (Illumina) according to the manufacturer’s instructions. The cDNA library was then sequenced using a HiSeq 2000 (Illumina) to obtain short sequences from both ends of each cDNA.

2.3 De novo assembly and functional annotation of unigenes

The raw data were filtered to generate clean data by removing the adapter sequences, reads with unknown bases comprising greater than 20% (quality value 10), and low-quality sequences (reads with unknown bases “N”). The clean reads were then assembled using Trinity software with the default parameters (Grabherr et al., 2011). The predicted protein sequences (open reading frames) were extracted using the Perl script Transdecoder in the Trinity program package (Grabherr et al., 2011). All of the assembled unigenes were searched against the NCBI non-redundant protein (NR), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using the BlastX algorithm with a typical E-value cut-off of <1e-5. Based on the results obtained by protein database annotation, Blast2GO (Conesa et al., 2005) was used to obtain Gene Ontology (GO) annotations according to the molecular function, biological process, and cellular component ontologies. Based on sequence homology searches, the unigenes were aligned to the COG database to predict and classify possible functions; the KEGG database was also used to annotate the pathways for these unigenes with an E-value threshold of 1e-5 (Chen et al., 2011).

2.4 Identification of orthologous contigs and estimation of Ka/Ks between orthologous genes

The reciprocal best hits (RBH) algorithm is widely used for defining orthologous genes based on Blast. Usually, a pair of genes that belong to two different genomes are recognized as orthologs if their sequences are the best hits for each other (Tatusov et al., 1997; Bork et al., 1998; Moreno-Hagelsieb & Latimer, 2008). Reciprocal BlastN was executed using the unigenes from both species with an E-value cut-off of 1e-5. A Python script was then used to find the best hits based on the BlastN results. The predicted coding DNA sequence regions of the Q. liaotungensis and Q. mongolica transcriptomes were then used to identify orthologous groups between the two species. OrthoMCL version 2.0.9, based on a protein similarity graph method, was used to retrieve the groups of homologous protein coding genes with the default parameters (Li et al., 2003). A threshold of Ks > 0.1 was set to avoid identifying paralogs using RBH. The remaining orthologous pairs were categorized into two groups using a threshold of Ka/Ks = 1. Gene Ontology enrichment analysis was carried out between the reference dataset (Ka/Ks < 1) and the test dataset (Ka/Ks ≥ 1) using Fisher’s exact test with a custom R script. The Ka/Ks ratio measures the ratio of non-synonymous substitutions (Ka) relative to synonymous substitutions (Ks) between orthologous genes (Zhang et al., 2006).

2.5 Simple sequence repeat mining and primer design

The MISA program (Thiel et al., 2003) was used to identify and localize microsatellite motifs in a mixed unigene library

for the two Quercus species; the criteria for selecting the SSRs comprised a minimum of six repeats for dinucleotide motifs, five repeats for trinucleotide motifs, and four repeats for tetra-, penta-, and hexanucleotide motifs. Mononucleotide repeats and complex SSR types were ignored. The primers were designed using BatchPrimer3 version 1.0 software (You et al., 2008). The parameters for designing the primers were as follows: PCR product size range = 100–300 bp; primer length = 18–25 nt; GC content = 40–70% with 50% as the optimum; and annealing temperature = 50–70 °C with 55 °C as the optimum.

2.6 DNA isolation, PCR amplification, and SSR validation

Plant DNA was isolated using the dried leaves from 44 individuals belonging to different species with a Plant Genomic DNA Kit (TianGen Biotech, Beijing, China). Gel electrophoresis was carried out using a 1% agarose gel to check the DNA integrity. All of the SSR primers were amplified by using DNA from the 11 species. Polymerase chain reaction amplification was carried out in a reaction volume of 10 μL containing 5 μL 2× Taq PCR Master Mix, 0.2 μmol/L of each primer, 1 μL template DNA, and 3.6 μL double-distilled H2O. Amplification was undertaken using a SimpliAmp Thermal Cycler (Applied Biosystems, Carlsbad, CA, USA) as follows: denaturation at 94 °C for 5 min, followed by 31 cycles at 94 °C for 50 s, the specific annealing temperature (Tm) for 30 s, and 72 °C for 40 s, with a final extension at 72 °C for 5 min. The PCR products were resolved on 10% denaturing polyacrylamide gels and stained using the silver-staining protocol. The sizes of the DNA bands were determined with a PBR322 marker ladder (TianGen Biotech) and the alleles were scored using Quantity One Software version 4.6.2 (Bio-Rad Laboratories, Hercules, CA, USA).

2.7 Genetic differentiation analysis and data scoring

Genetic analyses were carried out for polymorphic loci to calculate parameters including the number of alleles (Na), effective number of alleles per locus (Ne), Shannon’s information index (I), observed heterozygosity (Ho), and expected heterozygosity (He) by using GenAlEx 6.501 (Peakall & Smouse, 2012). CERVUS version 3.0.7 (Kalinowski et al., 2007) was used to calculate the polymorphic information content (PIC) value for each SSR primer. Arlequin version 3.5 (Excoffier & Lischer, 2010) was used to test the Hardy–Weinberg equilibrium and linkage disequilibrium (LD) for all loci. The genetic structure of all the populations was estimated using a clustering method based on a Bayesian model with the software package structure (Pritchard et al., 2007). The dataset without prior population information was analyzed using an admixture model and based on independent allelic frequencies (Gao et al., 2016). In total, 20 independent simulations were run for K = 2 to 13 with 1 × 10^5 burn-in steps followed by 5 × 10^5 Monte Carlo Markov chain steps. The structure harvester program (Pritchard & Wen, 2004) was used to evaluate the most likely number (K) of genetic clusters using the delta K criterion (Earl & Vonholdt, 2012), in which the inferred clusters were drawn as colored box-plots using the DISTRUCT bio-software (Rosenberg, 2004).

3 Results

3.1 Illumina paired-end sequencing and sequence assembly

A total of 12.21 GB of clean data was obtained for the two oaks; the clean data of each sample reached at least 6.03 GB. After cleaning the raw reads, ca. 24 million reads were obtained for each of Quercus liaotungensis and Q. mongolica. We obtained 2 568 616 contigs with a mean length of 66.21 and an N50 value of 1180 for Q. liaotungensis, and 2 647 390 contigs with a mean length of 66.08 and an N50 value of 1189 for Q. mongolica (Table 1). The raw reads for Q. liaotungensis and Q. mongolica were deposited in the NCBI Sequence Read Archive under accession numbers SRR6125586 and SRR6125587, respectively. Little or no difference was found in the contig and unigene lengths of the two oak species (Fig. 1). Using paired end-joining, gap-filling, and Trinity, the contigs were assembled into scaffolds, which were further assembled into unigenes. Finally, we obtained 72 012 and 72 874 unigenes for Q. liaotungensis and Q. mongolica, respectively (Table 2).

Table 1 Summary of the assembly results obtained for Quercus liaotungensis and Q. mongolica

                          Q. liaotungensis   Q. mongolica
Total number of reads     23 927 717         24 559 095
GC content                44.43%             44.27%
Q30 bases                 85.63%             85.59%
Total number of contigs   2 568 616          2 647 390
Mean length of contigs    66.21              66.08
N50 of contigs            1180               1189

3.2 Functional annotation

Sequence similarity searches were carried out against the NR, Swiss-Prot, GO, COG, and KEGG databases using the BlastX algorithm with an E-value threshold of 1e-5. Among the annotated unigene sequences, 18 549 unigenes had lengths >1000 bp. In total, 36 371 unigenes were annotated for both species in the NR database, where the three top-hit species for NR annotations were Vitis vinifera L. (14.31% in Q. liaotungensis/14.68% in Q. mongolica), Theobroma cacao L. (9.65%/10.06%), and Prunus mume (Siebold) Siebold & Zucc. (8.93%/9.00%) (Fig. 2).

Based on the NR annotations, GO terms were assigned to 15 374 annotated sequences from Q. liaotungensis and 7746 annotated sequences from Q. mongolica. The GO annotated unigenes clustered into three ontologies comprising cellular component, molecular function, and biological process. There was a high degree of consistency in the GO analysis results for the two species (Fig. 3). In the cellular component category, “cell” was the dominant group, followed by “cell part” and “organelle”, and “catalytic activity” and “binding” were overrepresented in the molecular function category. “Metabolic process” and “cellular process” were most frequent in the biological process category. Furthermore, all of the unigenes were subjected to searches against the COG database to obtain functional predictions and classifications. We annotated 7075 and 6984 unigenes according to 25 categories in Q. liaotungensis and Q. mongolica, respectively (Fig. 4). The main annotation category for the two oak species was the

“general function prediction only” cluster, followed by “post-translational modification, protein turnover, chaperones” and “translation, ribosomal structure, and biogenesis.” No genes were found in the extracellular structures or nuclear structures category in either species.

Fig. 1. Length distribution of assembled contigs and unigenes in two oak species, Quercus liaotungensis and Q. mongolica. A, Frequency distribution of the contig sizes from two Quercus species. B, Size distribution of unigenes from two Quercus species.

All of the unigenes were analyzed based on the KEGG pathway database to further assess the transcriptomes of the two oak species. The results showed that 8439 unigenes from Q. liaotungensis mapped to 128 pathways and 8442 unigenes from Q. mongolica mapped to 128 pathways. Interestingly, the representative pathways were ribosome (276 genes in Q. liaotungensis/254 genes in Q. mongolica, ko03010), carbon metabolism (266 genes/277 genes, ko01200), and biosynthesis of amino acids (249 genes/256 genes, ko01230) in both species (Fig. 5).

3.3 Orthologous contigs, substitution rates, and molecular divergence time between Q. liaotungensis and Q. mongolica

We identified 12 981 pairs of orthologous unigenes in Q. liaotungensis and Q. mongolica using the OrthoMCL method. Taking a Ka/Ks ratio >1 as an indicator of strong positive selection, we found 1179 (9.08%) pairs with Ka/Ks values >1 and 8312 (64.03%) pairs with a Ka/Ks of 0.5–1, the latter indicating weak purifying selection (Fig. 6). We estimated the divergence time between Q. liaotungensis and Q. mongolica according to the peak of the Ks distribution. In this study, a peak in the Ks distribution between the two species was observed at 0.021344 ± 0.01958 (Fig. 7). We obtained a rough estimate of the divergence time (T) between the two oak species according to the simple formula T = K/2r, where r was the synonymous substitution rate, which was considered to be 1.8–2.5 × 10^-9 substitutions/synonymous site/year, and K was the genetic divergence expressed in terms of the mean number of synonymous substitutions between orthologs (K was taken as 0.021344, ignoring the error). We estimated that the molecular divergence time between Q. liaotungensis and Q. mongolica was 4.27–5.93 Mya, which ranged from the late Miocene to the middle Pliocene.

3.4 Functional genes under positive selection and implications for adaptive evolution of two oak species

The functions of the homologous genes in the pairs of orthologs where Ka/Ks >1 were mainly related to DNA repair, response to cold and drought, and stress resistance (Table S2). The expression levels were measured for some of the positively selected genes, and fragments per kilobase of transcript per million mapped reads (FPKM) values were generated by mapping the fragments for Q. liaotungensis and Q. mongolica (Table S2). Based on comparative analyses using FPKM values, the orthologs that had undergone positive selection provided insights into why Q. mongolica adapted to colder and more arid habitats than Q. liaotungensis (e.g., qm15026, qm20708, and qm16912).

3.5 Development and characterization of expressed sequence tag (EST)–SSR markers

In order to further evaluate the validity of markers, 16 608 potential SSRs were identified in 10 744 sequences. Among these 10 744 sequences, 6761 and 3983 sequences contained one and more than one SSR, respectively (Table 3). The most common repeat type was dinucleotide, followed by

Table 2 Summary of the unigenes identified in two oak species, Quercus liaotungensis and Q. mongolica

                           Q. liaotungensis   Q. mongolica   All
Total number of unigenes   72 012             72 874         103 618
Total length of unigenes   50 069 442         51 358 327     76 582 089
N50 of unigenes            1 180              1 189          1 307
Mean length of unigenes    695.29             704.76         739.08
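The N50 values reported in Tables 1 and 2 follow the usual definition: the length of the sequence at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly length. The short Python sketch below illustrates that definition only; it is not the script used in this study, and the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper (hypothetical name): sort lengths from longest to
    shortest and return the length at which the running total first reaches
    half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy unigene lengths in bp (made up for illustration, not data from the study).
toy_lengths = [200, 300, 500, 800, 1200, 2000]
print(n50(toy_lengths))  # 1200
```

Because N50 is weighted toward long sequences, it can exceed the mean length considerably, as it does for the unigenes in Table 2.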


trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide (Table 3). Based on the SSRs, 12 363 primer pairs could be designed using BatchPrimer3 version 1.0. Among the 12 363 primer pairs, 158 were randomly selected for validation using DNA from the six samples from each of the two species and 92 were successfully amplified by PCR. The remaining 66 primer pairs failed to generate PCR products. The 92 successful primer pairs in Q. liaotungensis and Q. mongolica were tested in the other nine species. Of the 92 working primer pairs, 77 yielded PCR products with the expected fragment size, three primer pairs amplified fragments larger than the expected size, and one primer pair amplified a shorter than expected fragment. These larger or shorter fragments were probably due to insertion or deletion mutations in the amplified region. In addition, six primer pairs were discarded due to introns or primer dimerization and the remaining five primer pairs obtained no PCR products. Therefore, the EST–SSR markers developed from Q. liaotungensis and Q. mongolica could be applied successfully to the other nine species, with a transferability rate of 83.70% (77/92).

Fig. 2. Similarity of Quercus liaotungensis and Q. mongolica sequences with those of other species.

3.6 Genetic differentiations and relatedness in the genus Quercus

We selected all the 77 primer pairs; the details of these primers are given in Table S3. Among the 77 SSR pairs, seven primer pairs (L4/L15/L23/L42/L58/L63/L65) amplified PCR products

Fig. 3. Comparison of Gene Ontology term distributions in two oak species, Quercus liaotungensis and Q. mongolica.
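The SSR selection criteria given in Section 2.5 (at least six repeats for dinucleotide motifs, five for trinucleotide motifs, and four for tetra-, penta-, and hexanucleotide motifs, with mononucleotide repeats ignored) can be illustrated with a small regular-expression scan. This is a simplified Python sketch under those stated thresholds, not the MISA program itself: it does not handle compound SSRs, and the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from Section 2.5.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit
    (rejects e.g. 'AA' or 'ATAT', so mononucleotide runs are ignored)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return (motif, repeat_count, start) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((motif, len(match.group(0)) // unit_len, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Toy sequence (made up): (AG)6, (AAG)5, and a run of 12 A's that is ignored.
toy = "TTC" + "AG" * 6 + "CCT" + "AAG" * 5 + "GT" + "A" * 12 + "C"
print(find_ssrs(toy))  # [('AG', 6, 3), ('AAG', 5, 18)]
```

The primitive-motif check is a design choice of this sketch: without it, a long dinucleotide run would also be reported as a tetra- or hexanucleotide repeat.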

and they detected polymorphic fragments in all 11 Quercus species. In order to evaluate the genetic differentiations of Quercus, 44 individuals (Table S1) from 11 different oak species were collected and analyzed using 77 primer pairs. Based on the 77 primer pairs, Na per locus ranged from one to six with an average of 3.29 and Ne varied from 1.47 to 5.14. He varied from 0.20 to 0.76 and Ho ranged from 0.02 to 0.89. I ranged from 0.32 to 1.63 with an average of 0.99; PIC ranged from 0.61 to 0.95 with an average of 0.83, except for L59, for which it was 0.49, thereby suggesting that the developed EST–SSRs were highly polymorphic (Table S3).

The structure analysis using the ΔK method showed that the optimal K value was K = 4 (Fig. 8), which indicated that the 11 species were clustered into four groups (Fig. 9). Species YN (Q. yunnanensis Franch.) and HS (Q. dentata Thunb.) were grouped into one cluster, DB (Q. serrata var. brevipetiolata (A. DC.) Nakai), DY (Q. griffithii Hook. f. & Thomson ex Miq.), and BAI (Q. fabri Hance) were grouped into another cluster, HL (Q. aliena Blume) and RC (Q. aliena var. acutiserrata Maxim. ex Wenz.) grouped into a third cluster, and LD (Q. liaotungensis) and MG (Q. mongolica) grouped into a fourth cluster. Species BL (Q. serrata Thunb.) and HSL (Q. stewardii Rehd.) showed interspecific hybridization. The results obtained using K = 11 were also determined in order to detect whether further subdivisions could be obtained in the genus. Figure 8 shows that there was only one group when using K = 11 between species YN and HS.

Fig. 4. Clusters of Orthologous Groups classifications obtained for the Quercus liaotungensis and Q. mongolica transcriptomes.

4 Discussion

4.1 Illumina paired-end sequencing

Transcriptome sequencing is an important tool for identifying expression patterns and gene discovery (Wei et al., 2011). We undertook a comprehensive study of the de novo assembly and characterization of the transcriptomes of Quercus liaotungensis and Q. mongolica using next-generation sequencing technology. Similar numbers of unigene sequences were generated for both species after assembly, partly because the same tissues were collected from both species for sequencing. A large number of Q. liaotungensis and Q. mongolica transcriptomic unigenes (72 012 in Q. liaotungensis/72 874 in Q. mongolica) were obtained using the Illumina HiSeq 2000 platform (Table 2). Generally, the assembly result of the non-reference transcriptome is assessed by the N50 value of the unigene; an N50 length greater than 1000 bp indicates

Fig. 5. Kyoto Encyclopedia of Genes and Genomes classification obtained for the Quercus liaotungensis and Q. mongolica transcriptomes.

Fig. 6. Distribution of Ka and Ks for 12 981 pairs of orthologs in Quercus liaotungensis and Q. mongolica.
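The Ka/Ks categories used in Section 3.3 can be expressed as a small classification rule. The Python sketch below is illustrative only and is not the authors' script: the function names are hypothetical, the label for ratios below 0.5 is an assumption (the text names only the >1 and 0.5–1 classes), and a ratio of exactly 1 is placed in the 0.5–1 class here.

```python
def selection_class(ka, ks):
    """Classify an ortholog pair by its Ka/Ks ratio (illustrative labels).

    >1      : "positive"        (strong positive selection, as in Section 3.3)
    0.5-1   : "weak_purifying"  (weak purifying selection, as in Section 3.3)
    <0.5    : "purifying"       (label assumed for this sketch)
    """
    if ks <= 0:
        return "undefined"  # the ratio cannot be computed without synonymous changes
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio >= 0.5:
        return "weak_purifying"
    return "purifying"


def percent(count, total):
    """Share of pairs in a class, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Counts reported in Section 3.3 for the 12 981 ortholog pairs.
print(percent(1179, 12981))  # 9.08
print(percent(8312, 12981))  # 64.03
```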

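The divergence-time range reported in Section 3.3 follows directly from T = K/2r with K = 0.021344 and r = 1.8–2.5 × 10^-9 substitutions/synonymous site/year. The Python sketch below simply reproduces that arithmetic; the function name `divergence_time_mya` is hypothetical.

```python
def divergence_time_mya(k, rate):
    """Divergence time in million years, T = K / (2r).

    k    : synonymous divergence between orthologs (peak of the Ks distribution)
    rate : synonymous substitution rate per site per year
    """
    return k / (2 * rate) / 1e6


K_PEAK = 0.021344                   # Ks peak reported in Section 3.3
RATE_LOW, RATE_HIGH = 1.8e-9, 2.5e-9

younger = divergence_time_mya(K_PEAK, RATE_HIGH)  # faster rate gives a younger split
older = divergence_time_mya(K_PEAK, RATE_LOW)     # slower rate gives an older split
print(f"{younger:.2f}–{older:.2f} Mya")  # 4.27–5.93 Mya
```

The factor of 2 in the denominator reflects that substitutions accumulate independently along both lineages after the split.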

a good assembly quality for the transcriptome. In our study, the N50 value of Q. liaotungensis was 1180 and that of Q. mongolica was 1189 (Table 2). These results were comparable to those obtained in previous transcriptomic analyses of other Quercus species, such as the 49 845 unigenes found in Q. austrocochinchinensis Hickel & A. Camus and 50 767 unigenes in Q. kerrii Craib (An et al., 2016). In a previous study, 95 800 unigenes were obtained for Q. liaotungensis with only 3.8 GB of clean data (at least 6.03 GB in our study) (Liu et al., 2014). This discrepancy could be explained by differences in the sequencing locations and the lengths of the unigenes.

Among the 36 881 unigenes with Blast matches in the NR, GO, COG, and KEGG databases, 18 549 unigenes were over 1000 bp. The remaining unigenes could not be functionally annotated because they had no Blast hits in the databases or they were matched to unknown proteins. The GO annotation results showed that many unigenes were involved with catalytic activity and metabolic process, suggesting the presence of numerous enzymes involved with primary and secondary metabolism. The KEGG predictions identified many unigenes associated with ribosome, carbon metabolism, and biosynthesis of amino acids. Thus, the KEGG annotations were mainly enriched for metabolism, possibly because the leaves were used as materials for the RNA-Seq analysis.

Fig. 7. Ks distribution for orthologs in two oak species, Quercus liaotungensis and Q. mongolica. SD, standard deviation.

Fig. 8. Bayesian inference analysis of microsatellite data to determine the most likely number of clusters (K) for 11 Quercus species.

4.2 Molecular divergence time between Q. liaotungensis and Q. mongolica

Peaks in the Ks value distribution of orthologs between closely related species often indicate speciation events (Wang & Hey, 2010) and this approach has been used successfully for

inferring these events (Blanc & Wolfe, 2004). According to the simple formula T = K/2r (Lloyd, 2000), the mean rate of synonymous substitutions in our study was considered to be 1.8–2.5 × 10^-9 substitutions/synonymous site/year for oaks (Cavender-Bares & Deacon, 2011). The divergence interval from the late Miocene to middle Pliocene (4.27–5.93 Mya) determined in our study was highly consistent with the potential divergence ranges based on chloroplast DNA and nuclear genes (Yang et al., 2016), as well as the inferred divergence ranges among the infragroup of section Quercus using multiple nuclear genes (Hubert et al., 2014), and the earliest and unequivocal fossils of white oaks discovered in north China (during the middle to late Pliocene) (Zhou, 1993).

Table 3 Summary of the simple sequence repeats (SSRs) identified in the transcriptomes of two oaks

Searching item                                          Number
Total number of sequences examined                      21 219
Total size of examined sequences (bp)                   44 035 869
Total number of identified SSRs                         16 608
Number of SSR-containing sequences                      10 744
Number of sequences containing more than one SSR        3983
Number of SSRs present in compound formation            1565
Dinucleotides                                           5573
Trinucleotides                                          3133
Tetranucleotides                                        177
Pentanucleotides                                        34
Hexanucleotides                                         29

Fig. 9. Estimated genetic clusters obtained with the structure program for 11 Quercus species based on simple sequence repeat data (K = 4 and 11). Black lines separate different populations. Species: BAI, Q. fabri; BL, Q. serrata; DB, Q. serrata var. brevipetiolata; DY, Q. griffithii; HL, Q. aliena; HS, Q. dentata; HSL, Q. stewardii; LD, Q. liaotungensis; MG, Q. mongolica; RC, Q. aliena var. acutiserrata; YN, Q. yunnanensis.

4.3 Orthologs under positive selection in two Quercus species

Rapidly evolved genes can be identified by assessing the Ka/Ks substitution ratio in orthologous genes with protein-coding functions (Miyata et al., 1979; Ellegren, 2008). A Ka/Ks ratio >1 for a gene pair indicates that the genes have undergone rapid evolution favored by natural selection (Zhang et al., 2013a). In our study, 1179 (9.08%) pairs of orthologs with Ka/Ks >1 had undergone rapid evolution with signs of strong positive selection. The functions of the homologous genes with Ka/Ks >1 were mainly related to “DNA repair”, “response to drought”, and “response to cold”. DNA repair is essential for maintaining genomic stability in all organisms. Thus, in our study, we identified qm9877 and qm20659, which were involved with DNA helicase and DNA excision repair. Some orthologs related to abiotic stress were also positively selected. For example, qm24806 was related to the activities of genes with roles in stress and resistance (Guo et al., 2016), and qm15026 was involved with seed dormancy and germination in multiple plant developmental stages and several abiotic stress responses (e.g., drought and high salinity) (Shu & Yang, 2017). In addition, four of the positively selected genes were closely related to abiotic stress resistance: qm26163 (Zhou et al., 2015), qm15102 (Tan et al., 2014), qm4239 (Sadhukhan et al., 2017), and qm17170 (Wilder et al., 2009). Moreover, we found that some of the pairs of orthologs have higher expression levels in response to drought, including qm9540 (Sinha et al., 2016), qm3901 (Scharf et al., 2012), qm27995 (Sun et al., 2016), qm20708 (Luo et al., 2006), qm8971 (Giarola et al., 2015), and qm13577 (Padma et al., 2016). In particular, one positively selected gene (qm16912), which is related to the response to cold (Chiok et al., 2013), showed higher expression levels in Q. mongolica compared with Q. liaotungensis. This could provide an insight into the origin of the differences between Q. mongolica and Q. liaotungensis. These findings might clarify the occurrence of the genetic differentiation between the two Quercus species as well as increasing our understanding of how plants in northern environments might adapt to different characteristic stresses, such as cold and drought.

4.4 Genic SSR distribution and frequency in Quercus transcriptome

Simple sequence repeats are widely used in genetic studies because they are reliable and cost-efficient molecular markers (Zalapa et al., 2012). Simple sequence repeats are typically co-dominant and highly polymorphic, and they are common sources of marker systems for genetic mapping, molecular breeding, gene mapping, comparative genomics, and population genetic analyses in various species (Senior, 1998; Chapman et al., 2009; Bushman et al., 2011). Transcriptome data are an excellent resource for microsatellite mining and SSR marker development, and they have been utilized in many species (Conesa et al., 2005; Wang et al., 2010; Dutta et al., 2011; Koivusalo, 2011; Zhang et al., 2012). In this study, 16 608 SSRs were detected in 10 744 unigene sequences and the percentage of SSRs in sequences was higher than that in Cajanus cajan (L.) Millsp. (Dutta et al., 2011), Apium graveolens L. (Fu et al., 2013), and Aspidistra saxicola (Huang et al., 2013). This difference in the frequency of SSRs might be explained mainly by the number of contigs and the parameters used in the tools for searching for microsatellite loci (Zhou et al., 2016). Dinucleotide repeats were the most abundant repeat type in this study, which was consistent with previous studies of species such as Sesamum indicum L. (Wei et al., 2011), Ipomoea batatas (L.) Lam. (Wang et al., 2010), and pigeon pea (Dutta et al., 2011). Morgante et al. (2002) suggested that a dominance of dinucleotide repeats such as that detected here may be caused by an over-representation of untranslated regions compared with open reading frames. However, in some other studies, the most abundant class of SSRs was trinucleotide, for example, in Glycine max (L.) Merr. (Hiroshi et al., 2008), Polygala tatarinowii Regel (Kaur et al., 2011), and Citrus sinensis Osbeck (Chen et al., 2006).

4.5 Polymorphic SSR markers and genetic differentiations

Genetic markers based on transcriptome sequences are effective for studying population structure, diversity, and species divergence (Chen et al., 2015). In this study, we designed 12 363 SSR markers and 158 primer pairs were randomly selected to evaluate the validity of these markers in Quercus, where 92 (58.23%) primer pairs yielded clear fragments in the two Quercus species and 77 (48.73%) in 11 Quercus species. The unsuccessful amplification of 15 primer pairs might have been due to the presence of introns, large insertions or repeat number variations, lack of specificity, or assembly errors (Wei et al., 2011). According to a previous study, a PIC value greater than 0.5 generally indicates a highly polymorphic state (Yadav et al., 2011). In our study, 76 of the selected primers had PIC values greater than 0.5, thus the polymorphic SSR markers obtained in this study might be highly effective in marker-assisted Quercus genetic studies. The genetic structure determined for Quercus based on the EST–SSR markers showed that the 11 species were divided into four groups with the best K = 4. When K = 4, we found that LD (Q. liaotungensis) and MG (Q. mongolica) grouped into one cluster and showed remarkable genetic differentiation from the other clades. This result suggests that genetic differentiation existed between the two oaks and their closely related species. Furthermore, the results obtained using K = 11 showed large genetic differentiation between Q. liaotungensis and Q. mongolica. We also found considerable genetic variation among other siblings of Q. liaotungensis and Q. mongolica based on the selected EST–SSR markers and limited samples. The structure analyses support the versatility of the EST–SSR markers derived from the transcriptome, and suggest the potential of the markers for genetic analyses in oak species. Further genetic study with a larger

number of individuals from different populations are required to test the genetic relationships among closely related oaks based on the EST–SSR markers developed in this study.

In this study, RNA-Seq and de novo transcriptome assembly were carried out for two closely related species, Q. liaotungensis and Q. mongolica. A total of 103 618 unigenes were obtained after assembly. The unigenes identified in the two species were functionally annotated based on the NR, COG, GO, and KEGG databases. The speciation event between the two oaks occurred approximately 4.27–5.93 Mya during the late Miocene to the middle Pliocene. The estimation of Ka/Ks and subsequent enrichment analysis identified 22 genes, most of which were related to DNA repair and the response to abiotic stresses such as cold and drought. In addition, 16 608 microsatellite loci were detected and 12 363 primer pairs were designed. We selected 158 primers for validation in 11 Quercus species, which yielded 77 polymorphic microsatellite markers. This large amount of information related to the transcriptomes and SSRs could be valuable for genetic differentiations and evolutionary studies in Quercus species.

Acknowledgements
This study was financially supported by the National Natural Science Foundation of China (31770229).

Data Archiving Statement
The sequence data obtained in this study were deposited in the NCBI Sequence Read Archive under accession numbers SRR6125586 and SRR6125587.

References
Ali ML, Rajewski JF, Baenziger PS, Gill KS, Eskridge KM, Dweikat I. 2008. Assessment of genetic diversity and relationship among a collection of US sweet sorghum germplasm by SSR markers. Molecular Breeding 21: 497–509.
An M, Deng M, Zheng SS, Song YG. 2016. De novo transcriptome assembly and development of SSR markers of oaks Quercus austrocochinchinensis and Q. kerrii (Fagaceae). Tree Genetics & Genomes 12: 103.
Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16: 1667–1678.
Bork P, Dandekar T, Diazlazcoz Y, Eisenhaber F, Huynen M, Yuan Y. 1998. Predicting function: From genes to genomes and back. Journal of Molecular Biology 283: 707–725.
Bushman BS, Larson SR, Tuna M, West MS, Hernandez AG, Vullaganti D, Gong G, Robins JG, Jensen KB, Thimmapuram J. 2011. Orchardgrass (Dactylis glomerata L.) EST and SSR marker development, annotation, and transferability. Theoretical & Applied Genetics 123: 119–129.
Cavender-Bares J, Deacon N. 2011. Phylogeography and climatic niche evolution in live oaks (Quercus series Virentes) from the tropics to the temperate zone. Journal of Biogeography 38: 962–981.
Chapman MA, Hvala J, Strever J, Matvienko M, Kozik A, Michelmore RW, Tang S, Knapp SJ, Burke JM. 2009. Development, polymorphism, and cross-taxon utility of EST-SSR markers from safflower (Carthamus tinctorius L.). Theoretical & Applied Genetics 120: 85–91.
Chatwin WB, Carpenter KK, Jimenez FR, Elzinga DB, Johnson LA, Maughan PJ. 2014. Microsatellite primer development for post oak, Quercus stellata (Fagaceae). Applications in Plant Sciences 2: 481–490.
Chen C, Zhou P, Choi YA, Huang S, Gmitter FG Jr. 2006. Mining and characterizing microsatellites from citrus ESTs. Theoretical & Applied Genetics 112: 1248–1257.
Chen S, Zhou R, Huang Y, Zhang M, Yang G, Zhong C, Shi S. 2011. Transcriptome sequencing of a highly salt tolerant mangrove species Sonneratia alba using Illumina platform. Marine Genomics 4: 129.
Chen W, Liu YX, Jiang GF. 2015. De novo assembly and characterization of the testis transcriptome and development of EST-SSR markers in the cockroach Periplaneta americana. Scientific Reports 5: 11144.
Chiok KL, Addwebi T, Guard J, Shah DH. 2013. Dimethyl adenosine transferase (KsgA) deficiency in Salmonella enterica serovar Enteritidis confers susceptibility to high osmolarity and virulence attenuation in chickens. Applied & Environmental Microbiology 79: 7857–7866.
Conesa A, Terol J, Robles M. 2005. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674–3676.
Dutta S, Kumawat G, Singh BP, Gupta DK, Singh S, Dogra V, Gaikwad K, Sharma TR, Raje RS, Bandhopadhya TK. 2011. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]. BMC Plant Biology 11: 17.
Earl DA, Vonholdt BM. 2012. STRUCTURE HARVESTER: A website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources 4: 359–361.
Ellegren H. 2008. Comparative genomics and the study of evolution by natural selection. Molecular Ecology 17: 4586–4596.
Excoffier L, Lischer HEL. 2010. Arlequin suite version 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources 10: 564–567.
Fu N, Wang Q, Shen HL. 2013. De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.). PLoS One 8: e57686.
Gao T, Han Z, Zhang X, Luo J, Yanagimoto T, Zhang H. 2016. Population genetic differentiation of the black rockfish Sebastes schlegelii revealed by microsatellites. Biochemical Systematics & Ecology 68: 170–177.
Giarola V, Krey S, Frerichs A, Bartels D. 2015. Taxonomically restricted genes of Craterostigma plantagineum are modulated in their expression during dehydration and rehydration. Planta 241: 193–208.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29: 644.
Guo DL, Xi FF, Yu YH, Zhang XY, Zhang GH, Zhong GY. 2016. Comparative RNA-Seq profiling of berry development between table grape ‘Kyoho’ and its early-ripening mutant ‘Fengzao’. BMC Genomics 17: 795.
Guo R, Mao YR, Cai JR, Wang JY, Wu J, Qiu YX. 2014. Characterization and cross-species transferability of EST-SSR markers developed from the transcriptome of Dysosma versipellis (Berberidaceae) and their application to population genetic studies. Molecular Breeding 34: 1733–1746.
Hao CY, Zhang XY, Wang LF, Dong YS, Shang XW, Jia JZ. 2006. Genetic diversity and core collection evaluations in common wheat

germplasm from the northwestern spring wheat region in China. Molecular Breeding 17: 69–77.
Hiroshi H, Shusei S, Sachiko I, Shigemi S, Tsuyuko W, Ai M, Tsunakazu F, Manabu Y, Shinobu N, Yasukazu N. 2008. Characterization of the soybean genome using EST-derived microsatellite markers. DNA Research 14: 271–281.
Huang D, Zhang Y, Jin M, Li H, Song Z, Wang Y, Chen J. 2013. Characterization and high cross-species transferability of microsatellite markers from the floral transcriptome of Aspidistra saxicola (Asparagaceae). Molecular Ecology Resources 14: 569–577.
Hubert F, Grimm GW, Jousselin E, Berry V. 2014. Multiple nuclear genes stabilize the phylogenetic backbone of the genus Quercus. Systematics & Biodiversity 12: 405–423.
Hudson ME. 2008. Sequencing breakthroughs for genomic ecology and evolutionary biology. Molecular Ecology Resources 8: 3–17.
Isagi Y, Suhandono S. 1997. PCR primers amplifying microsatellite loci of Quercus myrsinifolia Blume and their conservation between oak species. Molecular Ecology 6: 897.
Jiang B, Xie D, Liu W, Peng Q, He X. 2013. De novo assembly and characterization of the transcriptome, and development of SSR markers in wax gourd (Benicasa hispida). PLoS One 8: e71054.
Kalinowski ST, Taper ML, Marshall TC. 2007. Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Molecular Ecology 16: 1099–1106.
Kaur S, Cogan NO, Pembleton LW, Shinozuka M, Savin KW, Materne M, Forster JW. 2011. Transcriptome sequencing of lentil based on second-generation technology permits large-scale unigene assembly and SSR marker discovery. BMC Genomics 12: 265.
Koivusalo M. 2011. Characterization of transcriptome dynamics during watermelon fruit development: Sequencing, assembly, annotation and gene expression profiles. BMC Genomics 12: 454.
Li L, Stoeckert CJ Jr, Roos DS. 2003. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Research 13: 2178–2189.
Liu M. 2012. Research on genetic evolution relationships of Quercus mongolica and Quercus wutaishannica. Ph.D. Dissertation. Heilongjiang: Northeast Forestry University.
Liu Y, Li W, Zhang Z. 2014. Transcriptome analysis for Quercus liaotungensis Koidz. based on high-throughput sequencing technology. Biotechnology Bulletin 7: 119–124.
Lloyd A. 2000. Fundamentals of molecular evolution. Briefings in Bioinformatics 1: 202–204.
Logacheva MD, Kasianov AS, Vinogradov DV, Samigullin TH, Gelfand MS, Makeev VJ, Penin AA. 2011. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum). BMC Genomics 12: 30.
Luo J, Shen G, Yan J, He C, Zhang H. 2006. AtCHIP functions as an E3 ubiquitin ligase of protein phosphatase 2A subunits and alters plant response to abscisic acid treatment. Plant Journal 46: 649–657.
Miyata T, Miyazawa S, Yasunaga T. 1979. Two types of amino acid substitutions in protein evolution. Journal of Molecular Evolution 12: 219–236.
Moreno-Hagelsieb G, Latimer K. 2008. Choosing Blast options for better detection of orthologs as reciprocal best hits. Bioinformatics 24: 319–324.
Morgante M, Hanafey M, Powell W. 2002. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nature Genetics 30: 194–200.
Padma N, Tomason YR, Abburi VL, Alejandra A, Thangasamy S, Vajja VG, Germania S, Panicker GK, Amnon L, Wechter WP. 2016. Genome-wide differentiation of various melon horticultural groups for use in GWAS for fruit firmness and construction of a high resolution genetic map. Frontiers in Plant Science 7: 1437.
Peakall R, Smouse PE. 2012. GenAlEx 6.5. Bioinformatics 28: 2537–2539.
Pritchard JK, Stephens M, Donnelly P. 2007. Inference of population structure using multilocus genotype data. Molecular Ecology Resources 7: 574–578.
Pritchard JK, Wen W. 2004. Documentation for STRUCTURE software. Chicago: The University of Chicago Press.
Rosenberg NA. 2004. Distruct: A program for the graphical display of population structure. Molecular Ecology Resources 4: 137–138.
Sadhukhan A, Kobayashi Y, Nakano Y, Iuchi S, Kobayashi M, Sahoo L, Koyama H. 2017. Genome-wide association study reveals that the aquaporin NIP1;1 contributes to variation in hydrogen peroxide sensitivity in Arabidopsis thaliana. Molecular Plant 10: 1082–1094.
Scharf KD, Berberich T, Ebersberger I, Nover L. 2012. The plant heat stress factor (Hsf) family: Structure, function and evolution. Biochimica et Biophysica Acta 1819: 104–119.
Senior ML. 1998. Utility of SSRs for determining genetic similarities and relationships in maize using an agarose gel system. Crop Science 38: 1088–1098.
Shu K, Yang W. 2017. E3 ubiquitin ligases: Ubiquitous actors in plant development and abiotic stress responses. Plant & Cell Physiology 58: 1461–1476.
Sinha P, Pazhamala LT, Singh VK, Saxena RK, Krishnamurthy L, Azam S, Khan AW, Varshney RK. 2016. Identification and validation of selected universal stress protein domain containing drought-responsive genes in pigeonpea (Cajanus cajan L.). Frontiers in Plant Science 6: 1065.
Song ZP, Xu X, Wang B, Chen JK, Lu BR. 2003. Genetic diversity in the northernmost Oryza rufipogon populations estimated by SSR markers. Theoretical and Applied Genetics 107: 1492.
Strickler SR, Bombarely A, Mueller LA. 2012. Designing a transcriptome next-generation sequencing project for a nonmodel plant species. American Journal of Botany 99: 257–266.
Sun X, Sun C, Li Z, Hu Q, Han L, Luo H. 2016. AsHSP17, a creeping bentgrass small heat shock protein modulates plant photosynthesis and ABA-dependent and independent signalling to attenuate plant response to abiotic stress. Plant Cell & Environment 39: 1320.
Tan X, Yan S, Tan R, Zhang Z, Wang Z, Chen J. 2014. Characterization and expression of a GDSL-like lipase gene from Brassica napus in Nicotiana benthamiana. Protein Journal 33: 18–23.
Tatusov RL, Koonin EV, Lipman DJ. 1997. A genomic perspective on protein families. Science 278: 631.
Teacher AGF, Kähkönen K, Merilä J. 2012. Development of 61 new transcriptome-derived microsatellites for the Atlantic herring (Clupea harengus). Conservation Genetics Resources 4: 71–74.
Thiel T, Michalek W, Varshney RK, Graner A. 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theoretical & Applied Genetics 106: 411–422.
Ueno S, Taguchi Y, Tsumura Y. 2008. Microsatellite markers derived from Quercus mongolica var. crispula (Fagaceae) inner bark expressed sequence tags. Genes & Genetic Systems 83: 179.
Wang Y, Hey J. 2010. Estimating divergence parameters with small samples from a large number of loci. Genetics 184: 363–379.
Wang Z, Fang B, Chen J, Zhang X, Luo Z, Huang L, Chen X, Li Y. 2010. De novo assembly and characterization of root transcriptome


using Illumina paired-end sequencing and development of cSSR markers in sweetpotato (Ipomoea batatas). BMC Genomics 11: 726.
Wang Z, Yu G, Shi B, Wang X, Qiang H, Gao H. 2014. Development and characterization of simple sequence repeat (SSR) markers based on RNA-sequencing of Medicago sativa and in silico mapping onto the M. truncatula genome. PLoS One 9: e92029.
Wei W, Qi X, Wang L, Zhang Y, Hua W, Li D, Lv H, Zhang X. 2011. Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics 12: 451.
Wilder VV, Brouwer VD, Loizeau K, Gambonnet B, Albrieux C, Straeten DVD, Lambert WE, Douce R, Block MA, Rebeille F. 2009. C1 metabolism and chlorophyll synthesis: The mg-protoporphyrin IX methyltransferase activity is dependent on the folate status. New Phytologist 182: 137–145.
Yadav HK, Ranjan A, Asif MH, Mantri S, Sawant SV, Tuli R. 2011. EST-derived SSR markers in Jatropha curcas L.: Development, characterization, polymorphism, and transferability across the species/genera. Tree Genetics & Genomes 7: 207–219.
Yang J, Di X, Meng X, Feng L, Liu Z, Zhao G. 2016. Phylogeography and evolution of two closely related oak species (Quercus) from north and northeast China. Tree Genetics & Genomes 12: 89.
You FM, Huo N, Yong QG, Luo MC, Ma Y, Hane D, Lazo GR, Dvorak J, Anderson OD. 2008. BATCHPRIMER3: A high throughput web application for PCR and sequencing primer design. BMC Bioinformatics 9: 253.
You Y, Liu D, Liu H, Zheng X, Diao Y, Huang X, Hu Z. 2015. Development and characterisation of EST-SSR markers by transcriptome sequencing in taro (Colocasia esculenta (L.) Schoot). Molecular Breeding 35: 134.
Yun R, Wang H, Hu Z, Zhong M, Wei W, Qian Y. 1998. Genetic differentiation of Quercus mongolica and Q. liaotungensis based on morphological observation, isozyme and DNA analysis. Acta Botanica Sinica 40: 1040–1046.
Zalapa JE, Cuevas H, Zhu H, Steffan S, Senalik D, Zeldin E, Mccown B, Harbut R, Simon P. 2012. Using next-generation sequencing approaches to isolate simple sequence repeat (SSR) loci in the plant sciences. American Journal of Botany 99: 193.
Zhang J, Shan L, Duan J, Jin W, Chen S, Cheng Z, Qiang Z, Liang X, Li Y. 2012. De novo assembly and characterisation of the transcriptome during seed development, and generation of genic-SSR markers in peanut (Arachis hypogaea L.). BMC Genomics 13: 90.
Zhang J, Xie P, Lascoux M, Meagher TR, Liu J. 2013a. Rapidly evolving genes and stress adaptation of two desert poplars, Populus euphratica and P. pruinosa. PLoS One 8: e66370.
Zhang L, Yan HF, Wu W, Yu H, Ge XJ. 2013b. Comparative transcriptome analysis and marker development of two closely related Primrose species (Primula poissonii and Primula wilsonii). BMC Genomics 14: 329.
Zhang Z, Li J, Zhao XQ, Wang J, Wong KS, Yu J. 2006. KaKs_Calculator: Calculating Ka and Ks through model selection and model averaging. Genomics Proteomics & Bioinformatics 4: 259–263.
Zhou T, Li ZH, Bai GQ, Feng L, Chen C, Wei Y, Chang YX, Zhao GF. 2016. Transcriptome sequencing and development of genic SSR markers of an endangered Chinese endemic genus Dipteronia Oliver (Aceraceae). Molecules 21: 166.
Zhou XX, Yang LT, Qi YP, Guo P, Chen LS. 2015. Mechanisms on boron-induced alleviation of aluminum-toxicity in citrus grandis seedlings at a transcriptional level revealed by cDNA-AFLP analysis. PLoS One 10: e0115485.
Zhou Z. 1993. The fossil history of Quercus. Acta Botanica Yunnanica 15: 21–33.

Supplementary Material
The following supplementary material is available online for this article at http://onlinelibrary.wiley.com/doi/10.1111/jse.12464/suppinfo:
Table S1. Information of 11 Quercus species used in the genetic diversity analysis.
Table S2. Partial list of candidate orthologs under positive selection between Quercus liaotungensis and Q. mongolica.
Table S3. Characterization of expressed sequence tag-simple sequence repeats (EST-SSR).

J. Syst. Evol. 57(5): 440–450, 2019 www.jse.ac.cn