Comparative Transcriptomes and Development of Expressed Sequence Tag-Simple Sequence Repeat Markers for Two Closely Related Oak Species

Journal of Systematics and Evolution, doi: 10.1111/jse.12469
Research Article

Jing-Jing Sun¹, Tao Zhou², Rui-Ting Zhang¹, Yun Jia¹, Yue-Mei Zhao³, Jia Yang¹, and Gui-Fang Zhao¹*

¹Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi’an 710069, China
²School of Pharmacy, Xi’an Jiaotong University, Xi’an 710061, China
³College of Biopharmaceutical and Food Engineering, Shangluo University, Shangluo 726000, Shaanxi, China

*Author for correspondence. Tel.: 86-29-88305264. Fax: 86-29-88303572.

Received 1 December 2017; Accepted 7 October 2018; Article first published online 11 November 2018

Abstract
Quercus is one of the major genera in the family Fagaceae, and its species are widely distributed in the Northern Hemisphere. Many Quercus species, including several endemics, are distributed in China. Genetic resources have been established for important genera, but few transcriptomes are available for Quercus species in China. In this study, we used Illumina paired-end sequencing to obtain the transcriptomes of two oak species, Q. liaotungensis Koidz. and Q. mongolica Fisch. ex Turcz. Approximately 24 million reads were generated, and a total of 103 618 unigenes were obtained after assembly for the two species. Comparative transcriptome analysis of the two species identified a total of 12 981 orthologous contigs. Ka/Ks estimation and enrichment analysis indicated that 1179 (9.08%) orthologs showed rapid evolution, and most of these orthologs were related to functions including “DNA repair”, “response to cold”, and “response to drought”. These findings could provide insights into how these two closely related Quercus species adapted to extreme environments characterized by aridity and cold.
The divergence time between the two Quercus species was estimated from the Ks distribution to be approximately 4.27–5.93 Mya. Moreover, 16 608 simple sequence repeat loci were detected and 12 363 primer pairs were designed. Of these, 158 primer pairs were randomly selected to test for polymorphism, and 92 were successfully amplified in the two oak species. The resultant orthologs and simple sequence repeat markers are valuable for genetic differentiation analyses and evolutionary studies of Quercus.

Key words: de novo assembly, genetic differentiation, positive selection, Quercus, RNA sequencing.

© 2018 Institute of Botany, Chinese Academy of Sciences. September 2019 | Volume 57 | Issue 5 | 440–450

1 Introduction
Quercus liaotungensis Koidz. (also known as Q. wutaishanica) and Q. mongolica Fisch. ex Ledeb. are dominant tree species in north China with high ecological and economic value. Quercus mongolica is a constructive species in northeast China and is also a major timber species. Its ecological value is reflected in its good resistance to corrosion and its role in water and soil conservation. Similarly, Q. liaotungensis is a constructive species in temperate and warm temperate forests; its seeds contain starch that can be used for brewing, and its leaves can be used to feed tussah silkworms (Liu, 2012). Although Q. liaotungensis and Q. mongolica are both mainly distributed in north and northeast China, their core distribution areas are quite different. Quercus liaotungensis is concentrated in northern China, such as northern Shaanxi, Shanxi, and Hebei Provinces, whereas Q. mongolica is concentrated in northeastern China, including Heilongjiang, Jilin, parts of Liaoning, and eastern Inner Mongolia. The habitat of Q. mongolica is colder and drier than that of Q. liaotungensis (Yang et al., 2016). Quercus mongolica and Q. liaotungensis are very similar species that differ only in minor morphological traits: compared with Q. mongolica, Q. liaotungensis has fewer lateral leaf veins and lobes, and flat scales on the acorn cup (Yun et al., 1998). No previous study has explained why these two closely related species, with their distinct core distribution areas, differ in cold and drought resistance.

Transcriptome sequencing is a convenient method for rapidly obtaining information about expressed genomic regions and for resolving comparative genomic-level problems in non-model organisms (Logacheva et al., 2011; Zhang et al., 2013b). With the rapid development of next-generation sequencing, RNA sequencing (RNA-Seq) has become more efficient and less expensive, and it is increasingly used to study the evolutionary origins and ecology of non-model plants (Hudson, 2008; Strickler et al., 2012).

Simple sequence repeats (SSRs) are commonly used in genetic diversity analyses and evolutionary studies because of their codominant and highly polymorphic nature (Song et al., 2003; Hao et al., 2006; Ali et al., 2008). The application of next-generation sequencing has allowed the development of large numbers of molecular markers for non-model species (Teacher et al., 2012; Zalapa et al., 2012). For instance, large numbers of microsatellite markers or single-copy nuclear genes have been identified using RNA-Seq in Aspidistra saxicola Y. Wan (Huang et al., 2013), Benincasa hispida (Thunb.) Cogn. (Jiang et al., 2013), Dysosma versipellis (Hance) M. Cheng ex Ying (Guo et al., 2014), Medicago sativa L. (Wang et al., 2014), and Colocasia esculenta (L.) Schott (You et al., 2015). Many studies have developed SSR markers for population genetic and quantitative trait locus studies in the genus Quercus (Isagi & Suhandono, 1997; Ueno et al., 2008; Chatwin et al., 2014; An et al., 2016).

In this study, we compared the transcriptomes of Q. liaotungensis and Q. mongolica to investigate why Q. mongolica is better adapted than Q. liaotungensis to the cold and dry environment of northeast China. We also undertook pairwise comparisons of orthologous sequences to identify candidate genes that might be under positive selection and to estimate the divergence time between the two oaks. A large number of genic SSR markers were developed from the transcriptome sequences and validated in other Quercus species. The transcriptomic resources and genic SSR markers obtained in this study might facilitate further evolutionary and genetic differentiation studies of other Quercus species.

2 Material and Methods
2.1 Plant materials
Samples of Quercus liaotungensis and Q. mongolica for transcriptome sequencing were collected from Tongchuan, Shaanxi Province (35°18′36″N, 108°55′12″E) and Changbai Mountain in Liaoning Province (42°2′24″N, 127°46′12″E), China, respectively. Mature seeds were collected and grown in the laboratory. Pooled fresh leaves from seven or eight individuals of each oak species were sent directly to Biomarker Technologies (Beijing, China) for total RNA extraction and high-throughput sequencing. In addition, fresh leaves of 44 individuals from 11 Quercus species were dried with silica gel for DNA extraction, polymerase chain reaction (PCR) amplification, and validation of SSR markers. Detailed information about the materials is listed in Table S1.

2.2 RNA extraction, cDNA library construction, and Illumina sequencing
… cDNA libraries and for RNA-seq. The cDNA library used for transcriptome sequencing was prepared using a cDNA Synthesis Kit (Illumina) according to the manufacturer’s instructions. The library was then sequenced on a HiSeq 2000 (Illumina) to obtain short sequences from both ends of each cDNA.

2.3 De novo assembly and functional annotation of unigenes
The raw data were filtered to generate clean data by removing adapter sequences, reads containing unknown bases (“N”), and low-quality reads (reads in which bases with a quality value ≤ 10 made up more than 20% of the read). The clean reads were then assembled using Trinity with the default parameters (Grabherr et al., 2011). The predicted protein sequences (open reading frames) were extracted using the TransDecoder Perl script in the Trinity package (Grabherr et al., 2011). All assembled unigenes were searched against the NCBI non-redundant protein (NR), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using BlastX with an E-value cut-off of 1e-5. Based on the protein database annotation results, Blast2GO (Conesa et al., 2005) was used to obtain Gene Ontology (GO) annotations for the molecular function, biological process, and cellular component ontologies. Based on sequence homology searches, the unigenes were aligned to the COG database to predict and classify possible functions; the KEGG database was also used to annotate the pathways of these unigenes with an E-value threshold of 1e-5 (Chen et al., 2011).

2.4 Identification of orthologous contigs and estimation of Ka/Ks between orthologous genes
The reciprocal best hits (RBH) algorithm is widely used to define orthologous genes from BLAST searches. Typically, two genes from different genomes are recognized as orthologs if each is the other’s best hit (Tatusov et al., 1997; Bork et al., 1998; Moreno-Hagelsieb & Latimer, 2008). Reciprocal BlastN searches were run between the unigenes of the two species with an E-value cut-off of 1e-5, and a Python script was then used to find the best hits from the BlastN results. The predicted coding sequence regions of the Q. liaotungensis and Q. mongolica transcriptomes were then used to identify orthologous groups between the two species. OrthoMCL version 2.0.9, which is based on a protein similarity graph method, was used with the default parameters to retrieve groups of homologous protein-coding genes (Li et al., 2003). Pairs with Ks > 0.1 were excluded to avoid including paralogs identified by RBH. The remaining orthologous pairs were divided into two groups using a threshold of Ka/Ks = 1.
