Odorrana Margaretae)
Total Page:16
File Type:pdf, Size:1020Kb
Transcriptome Profile of the Green Odorous Frog (Odorrana margaretae) Liang Qiao1,2, Weizhao Yang2, Jinzhong Fu2,3*, Zhaobin Song1,4* 1 Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, China, 2 Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China, 3 Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada, 4 Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China Abstract Transcriptome profiles provide a practical and inexpensive alternative to explore genomic data in non-model organisms, particularly in amphibians where the genomes are very large and complex. The odorous frog Odorrana margaretae (Anura: Ranidae) is a dominant species in the mountain stream ecosystem of western China. Limited knowledge of its genetic background has hindered research on this species, despite its importance in the ecosystem and as biological resources. Here we report the transcriptome of O. margaretae in order to establish the foundation for genetic research. Using an Illumina sequencing platform, 62,321,166 raw reads were acquired. After a de novo assembly, 37,906 transcripts were obtained, and 18,933 transcripts were annotated to 14,628 genes. We functionally classified these transcripts by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). A total of 11,457 unique transcripts were assigned to 52 GO terms, and 1,438 transcripts were assigned to 128 KEGG pathways. Furthermore, we identified 27 potential antimicrobial peptides (AMPs), 50,351 single nucleotide polymorphism (SNP) sites, and 2,574 microsatellite DNA loci. The transcriptome profile of this species will shed more light on its genetic background and provide useful tools for future studies of this species, as well as other species in the genus Odorrana. It will also contribute to the accumulation of amphibian genomic data. Citation: Qiao L, Yang W, Fu J, Song Z (2013) Transcriptome Profile of the Green Odorous Frog (Odorrana margaretae). PLoS ONE 8(9): e75211. doi: 10.1371/journal.pone.0075211 Editor: Marc Robinson-Rechavi, University of Lausanne, Switzerland Received July 3, 2013; Accepted August 13, 2013; Published September 20, 2013 Copyright: © 2013 Qiao et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work is supported by a National Natural Science Foundation of China (http://www.nsfc.gov.cn/Portal0/default152.htm), grant number 31172061. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. * E-mail: [email protected] (JF); [email protected] (ZS) Introduction transcriptomes have been reported, including Xenopus tropicalis [9], Cyclorana alboguttata [10], Rana chensinensis Genomics has revolutionized many disciplines of biological [11,12], R. kukunoris [11], R. muscosa [13], R. sierra [13], Hyla science, and genomic data of non-model organisms are rapidly arborea [14], Notophthalmus viridescens [15,16], and accumulating [1-3]. Among vertebrates, for example, a total of Ambystoma mexicanum [17]. The accumulation of 61 annotated genomes are currently available from Ensembl transcriptome data will provide a ‘sneak peek’ of amphibian database (June 21, 2013). Nevertheless, amphibians, as a genome evolution. major transitional group in the evolutionary history of The green odorous frog Odorrana margaretae is a typical vertebrates, are currently represented by only one species anuran from the family Ranidae, and is an important species (Xenopus tropicalis) [4]. This is likely due to the fact that both ecologically and in terms of biological resources. It is a amphibians generally have very large and complex genomes dominant species in the stream ecosystem of the Hengduan [5,6]. As a small but important part of the genome, Mountain, a global diversity hotspot [18], and is an excellent transcriptome includes most protein coding genes. With environmental indicator species. Odorous frog species are advances of the Next Generation Sequencing (NGS) sensitive to environmental changes, and at least two of them technologies, transcriptome data offer an opportunity to deliver have been observed to shift their ranges northward possibly as fast, inexpensive, and accurate genome information for a response to global warming ([19], personal observation). genomic exploration in non-model organisms [7,8]. Additionally, odorous frogs are reservoirs for antimicrobial Transcriptome data are particularly useful for amphibians, peptides (AMPs), and may represent the most extreme AMPs because of their large genome sizes, which make obtaining diversity in nature [20]. AMPs are generally short peptides with whole genome data difficult. Currently, nine amphibian potent antibacterial and antifungal activity [21]. So far, 728 PLOS ONE | www.plosone.org 1 September 2013 | Volume 8 | Issue 9 | e75211 Transcriptome Profile of the Green Odorous Frog different AMPs have been identified from nine odorous frog Table 1. Summary of transcriptome data for Odorrana species, which account for approximately 30% of all AMPs margaretae. discovered [20]. A transcriptome profile of the green odorous frog will provide new molecular markers for ecological research of this as well Total number of raw reads 62,321,166 as other odorous frog species. Single nucleotide polymorphism Total number of clean reads 58,816,733 (SNP) and microsatellite DNA loci are excellent genetic Length of reads (bp) 101 markers, and are commonly employed in population genetic Total length of clean reads 5.88G and molecular ecological studies. They are abundant in Total length of assembly (bp) 54,362,822 transcriptomes. Furthermore, due to limitations of conventional Total number of transcripts 37,906 biochemical isolation methods, some peptides cannot be N50 length of assembly (bp) 1,870 isolated and purified. Transcriptome data will not suffer from Mean length of assembly (bp) 1,434 these limitations and provide an important alternative way to Median length of assembly (bp) 1,096 identify potential AMPs. Transcripts annotated 18,933 In this study, we construct the transcriptome profile of O. Number of unique genes represented 14,628 margaretae using an Illumina sequencing platform. Multiple bp = base pair tissues types from multiple individuals were pooled to maximize doi: 10.1371/journal.pone.0075211.t001 the chance of revealing as many genes as possible. After de novo assembly, we implemented a functional annotation using GO classification bioinformatic analysis. In addition, we made a genome wide search for cDNA encoding AMPs, microsatellite DNA loci, and Gene Ontology (GO) is widely used to standardize SNP sites from the transcriptome. Our data will serve as an representation of genes across species and provides a set of important step forward to establish the foundation for genomic structured and controlled vocabularies for annotating genes, research of amphibians, as well as to provide new markers for gene products, and sequences [24]. In total, 11,457 unique molecular ecological studies. transcripts were assigned to 52 level-2 GO terms, which were summarized under three main GO categories, including cellular Results component, molecular function, and biological process (Figure 3). Compared to the GO annotations of two other species of the same family, R. chensinensis and R. kukunoris, the GO Illumina sequencing, de novo assembly, and gene category distributions of the transcripts for the three ranid frogs annotation were highly similar (Figure S1). Within the GO category of Illumina sequencing of O. margaretae yielded a total of cellular components, 14 level-2 categories were identified, and 62,321,166 raw reads. Among them, 3,504,433 were first the terms cell, cell part, and organelle were the most abundant filtered out before assembly as low quality sequences or (>50%). Within the GO category of molecular function, 15 potential contaminations. Thirty combinations of multiple K-mer level-2 categories were identified, and the term binding was the lengths and coverage cut-off values were used to perform de most abundant (>50%). For biological process function, 23 novo assembly of the clean reads [22,23]. Finally, we merged level-2 categories were identified, and the terms of cellular the 30 raw assemblies by integrating sequence overlaps and process and metabolic process were the most abundant eliminating redundancies. The final assembly included a total of (>50%). 54.3 mega base pairs (Mb), and 37,906 transcripts were obtained with a N50 length of 1,870 base pairs (bps) and a KEGG analysis mean length of 1,434 bps. The sequencing information is presented in Table 1 and the length distribution of all Kyoto Encyclopedia of Genes and Genomes (KEGG) [25] transcripts is shown in Figure 1. All sequence reads are database was used to identify potential biological pathways deposited at NCBI (accession number SRA091981), and the represented in the O. margaretae transcriptome. A total of final assembly is presented as supporting information 1,438 transcripts were assigned to 128 KEGG pathways (Sequence S1 and S2). (Figure 4, Table S1). Among the pathways, purine metabolism, A reference