Download from the Three [17]
Total Page:16
File Type:pdf, Size:1020Kb
Chou et al. BMC Genomics 2018, 19(Suppl 2):103 DOI 10.1186/s12864-018-4463-x RESEARCH Open Access The aquatic animals’ transcriptome resource for comparative functional analysis Chih-Hung Chou1,2†, Hsi-Yuan Huang1,2†, Wei-Chih Huang1,2†, Sheng-Da Hsu1, Chung-Der Hsiao3, Chia-Yu Liu1, Yu-Hung Chen1, Yu-Chen Liu1,2, Wei-Yun Huang2, Meng-Lin Lee2, Yi-Chang Chen4 and Hsien-Da Huang1,2* From The Sixteenth Asia Pacific Bioinformatics Conference 2018 Yokohama, Japan. 15-17 January 2018 Abstract Background: Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, currently there is no comprehensive resource exist for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. This aquatic animal transcriptome map database dbATM provides de novo assembly of transcriptome, gene annotation and comparative analysis of more than twenty aquatic organisms without draft genome. Results: To improve the assembly quality, three computational tools (Trinity, Oases and SOAPdenovo-Trans) were employed to enhance individual transcriptome assembly, and CAP3 and CD-HIT-EST software were then used to merge these three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology and KEGG pathways. Furthermore, all aquatic animal genes are essential for comparative genomics tasks such as constructing homologous gene groups and blast databases and phylogenetic analysis. Conclusion: In conclusion, we establish a resource for non model organism aquatic animals, which is great economic and ecological importance and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is now publically accessible through the URL http://dbATM.mbc.nctu.edu.tw/. Background In most of the aquaculture industries, prophylactic Aquatic animals have significant economic benefits for antibiotics have been used to prevent bacterial and virus humans and also play key roles in the development of infections [4]. medical applications [1, 2]. Aquaculture transforms nat- To investigate the genomic resource of aquatic ural aquatic resources such as fish, shrimp, and mussels animals, next generation sequencing (NGS) approach into socially-valued commodities [3]. Chemicals derived has been adapted to uncover new genes and biological from aquatic organisms are used to develop pharmaceut- mechanisms. To effectively increase the our knowledge ical compounds with important clinical applications [2]. on aquatic animal gene discovery at mRNA level, RNA- seq has been applied to several important aquaculture * Correspondence: [email protected] species such as, Plecoglossus altivelis [5], Cyprinus carpio † Equal contributors [6], Penaeus monodon [7], Hyriopsis cumingii [8], 1Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan Mytilus galloprovincialis [9], and Anguilla anguilla [10]. 2Department of Biological Science and Technology, National Chiao Tung To gain insights into the immunogenetics and immune re- University, Hsinchu 300, Taiwan sponse system, RNA-seq has been successfully adapting to Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Chou et al. BMC Genomics 2018, 19(Suppl 2):103 Page 162 of 180 study the host defense gene activities against bacteria and pollution in ecosystems. Aquatic animals’ related studies viruses [11]. Clupea pallasii, Fundulus grandis, Pandalus have seen exponentially increased in recent years latirostri,etc.[12–15], have recently come under signifi- (Additional file 1: Figure S1). Many studies have used cant additional anthropogenic and environmental pres- NGS approaches to investigate genomic diversity among sure, and SNPs and a set of genes associated with immune aquatic animals at genomic and transcriptomic levels. function in these species have been identified as providing However, these studies largely focus on single species, high differentiation between two regions, and this insight and no databases exist so far to facilitate homologous can be applied in developing ecological monitoring gene mining among multiple aquatic species. In this systems. Evolutionary research on species including study, we aim to establish an integrated aquatic animal Sinocyclocheilus angustiporus (a comparative investigation transcriptomic database to facilitate studies in the fields between surface and cave species) revealed reduced of genomics, evolution, and phylogeny. We initially expression of a series of visual photo-transduction and performed whole organism RNA-seq on four aquatic an- retinal disease-related genes in cave-dwelling species [16]. imals and collected RNA-seq data for eighteen species Astyanax mexicanus was also used to clarify heritable gen- from the public domain. Reference genome free RNA- etic changes governing adaptation to cave environments seq data of aquatic animals download from the three [17]. Beyond NGS transcriptomic data analysis, the common NGS reads database, NCBI Sequence Read MitoFish database focuses on fish mitochondrial genome Archive (SRA), European Nucleotide Archive (ENA), and annotation to explore a new resource for resolving fish DDBJ Sequence Read Archive (DRA). Then, a computa- phylogenies and identifying new fish species [18]. tional pipeline was developed to annotate these NGS tran- NGS technology has emerged as a powerful tool to scriptomes. Finally, all data were summarized and provided illustrate the blueprints of novel species. Whole genome se- in the dbATM database. Figure 1 illustrates how the data- quencing (WGS) and RNA sequencing (RNA-seq) can be base can be used for mining transcriptomics, functions used to investigate the individual differences, evolution and such Gene Ontology and KEGG pathways and comparative gene function [19, 20]. However, whole genome sequencing analysis of aquatic animals. The dbATM collects transcrip- is more costly in terms of expense and computing re- tomes for twenty-two aquatic animals’ transcriptomes and sources than RNA-seq, and cannot obtain comprehensive provides an invaluable tool for homologous evolution and information for transcriptomes directly by gene prediction. evo-devo studies. A better strategy is using RNA-seq to elucidate the molecu- lar basis of biological functions [21]. Given the economical Results importance of these aquatic animals, a comprehensive re- Overview of dbATM content source of transcriptomes should be establsihed despite the Additional file 1: Table S1 displays all aquatic animal lack of draft genomes of these non model organisms. RNA-seq data sources and original data types. Following RNA-seq technology pave the way for transcriptome data systematic analysis (Fig. 2), the individual species infor- investigation. With the techniques, transcripts can not only mation and comparative analysis can be accessed from be detected in accordance with the reference genome but dbATM. Table 1 shows the species annotation of the also can be used for de novo the assembly of RNA-seq transcripts, unigenes, proteins, and homologous genes. reads to discover genes and profile their expression in or- To facilitate evolutionary studies and comparative ana- ganisms without a reference genome [22–24]. However, lysis across the different aquatic animal species, a even if these data are deposited in the NCBI Sequence taxonomic tree was constructed according to biological Read Archive (SRA) [25] which provides high-throughput classification (Fig. 3), for shellfish, shrimp, and fish from unassembled and un-annotated raw NGS reads for more invertebrate to vertebrate. than 1000 Terabase pairs, public access is not provided to allow for searches of the well-annotated data. To address Web interface this problem, we have created a queryable database from The dbATM provides various query interfaces and the annotated transcriptomic database, called dbATM graphical visualization pages to access to aquatic tran- (http://dbATM.mbc.nctu.edu.tw). A similar approach was scriptome data (Fig. 4). The summary table shows spe- reported for ASGARD [26], which collects the annotated cies information including scientific name (or common transcriptomes from three model arthropod species and name), followed by total sequencing read size (in bp), NHPRTR [21], focusing on non-human primate NGS sequencing platform, type of read sequences, number of transcriptomic sequencing and analysis. dbATM is an reads and a link to NCBI SRA and NCBI Taxonomy and easily-accessible