Annotation and SSR Discovery
Accepted Manuscript

Sequencing and De Novo assembly of visceral mass transcriptome of the Critically Endangered land snail Satsuma myomphala: Annotation and SSR Discovery

Se Won Kang, Bharat Bhusan Patnaik, Hee-Ju Hwang, So Young Park, Jong Min Chung, Dae Kwon Song, Hongray Howrelia Patnaik, Jae Bong Lee, Soonok Kim, Hong Seog Park, Seung-Hwan Park, Yeon Soo Han, Jun Sang Lee, Yong Seok Lee

PII: S1744-117X(16)30080-6
DOI: doi:10.1016/j.cbd.2016.10.004
Reference: CBD 432
To appear in: Comparative Biochemistry and Physiology - Part D: Genomics and Proteomics
Received date: 24 January 2016
Revised date: 24 October 2016
Accepted date: 26 October 2016

Please cite this article as: Kang, Se Won, Patnaik, Bharat Bhusan, Hwang, Hee-Ju, Park, So Young, Chung, Jong Min, Song, Dae Kwon, Patnaik, Hongray Howrelia, Lee, Jae Bong, Kim, Soonok, Park, Hong Seog, Park, Seung-Hwan, Han, Yeon Soo, Lee, Jun Sang, Lee, Yong Seok, Sequencing and De Novo assembly of visceral mass transcriptome of the Critically Endangered land snail Satsuma myomphala: Annotation and SSR Discovery, Comparative Biochemistry and Physiology - Part D: Genomics and Proteomics (2016), doi:10.1016/j.cbd.2016.10.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Sequencing and De Novo assembly of visceral mass transcriptome of the Critically Endangered land snail Satsuma myomphala: Annotation and SSR Discovery

Se Won Kang 1†, Bharat Bhusan Patnaik 1,2,†, Hee-Ju Hwang 1, So Young Park 3, Jong Min Chung 1, Dae Kwon Song 1, Hongray Howrelia Patnaik 1, Jae Bong Lee 4, Changmu Kim 5, Soonok Kim 5, Hong Seog Park 6, Seung-Hwan Park 7, Yeon Soo Han 8, Jun Sang Lee 9, Yong Seok Lee 1,*

1 Department of Life Science and Biotechnology, College of Natural Sciences, Soonchunhyang University, 22 Soonchunhyangro, Shinchang-myeon, Asan, Chungcheongnam-do 31538, Korea.
2 Trident School of Biotech Sciences, Trident Academy of Creative Technology (TACT), Chandaka Industrial Estate, Chandrasekharpur, Bhubaneswar, Odisha 751024, India.
3 Nakdonggang National Institute of Biological Resources, Biodiversity Conservation and Climate Change Division, 137, Donam-2-gil, Sangju-si, Gyeongsangbuk-do, 37242, Korea.
4 Korea Zoonosis Research Institute (KOZRI), Chonbuk National University, 820-120 Hana-ro, Iksan, Jeollabuk-do 54528, Korea.
5 National Institute of Biological Resources, 42, Hwangyeong-ro, Seo-gu, Incheon 22689, Korea.
6 Research Institute, GnC BIO Co., LTD. 621-6 Banseok-dong, Yuseong-gu, Daejeon 34069, Korea.
7 Biological Resource Centre, Korean Research Institute of Bioscience and Biotechnology (KRIBB), Jeongeup, Korea.
8 College of Agriculture and Life Science, Chonnam National University, 77 Yongbong-ro, Buk-gu, Gwangju 61186, Korea.
9 Institute of Environmental Research, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon-si, Gangwon-do 243341, Korea.

† These authors contributed equally to this work.

Running title: Transcriptome of Critically Endangered Satsuma myomphala.

Ms. has 76 pages, 9 figures, 6 tables, 2 suppl. files

* Corresponding author: Dr. Yong Seok Lee
E-Mail: [email protected]; Tel.: +82-10-4727-5524; Fax: +82-41-530-1256.
Abstract

Satsuma myomphala is critically endangered owing to loss of natural habitats, predation by natural enemies, and indiscriminate collection. It is a protected species in Korea but lacks the genomic resources needed to understand the varied functional processes that underlie its evolutionary success in natural habitats. To assess the genetic information of S. myomphala, we performed, for the first time, de novo transcriptome sequencing and functional annotation of expressed sequences using the Illumina Next-Generation Sequencing (NGS) platform and bioinformatics analysis. We identified 103,774 unigenes, of which 37,959, 12,890, and 17,699 were annotated in the PANM (Protostome DB), Unigene, and COG (Clusters of Orthologous Groups) databases, respectively. In addition, 14,451 unigenes were predicted under Gene Ontology functional categories, with 4581 assigned to a single category. Furthermore, 3369 sequences, 646 of them having Enzyme Commission (EC) numbers, were mapped to 122 pathways in the Kyoto Encyclopedia of Genes and Genomes Pathway database. The prominent protein domains included the Zinc finger (C2H2-like), Reverse Transcriptase, Thioredoxin-like fold, and RNA recognition motif domains. Many unigenes with homology to immunity, defense, and reproduction-related genes were screened in the transcriptome. We also detected 3120 putative simple sequence repeats (SSRs) encompassing dinucleotide to hexanucleotide repeat motifs from >1 kb unigene sequences. A list of PCR primers for SSR loci has been identified to study genetic polymorphisms. The transcriptome data represent a valuable resource for further investigations on the species' genome structure and biology. The unigene information and microsatellites would provide an indispensable tool for conservation of the species in natural and adaptive environments.
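To illustrate what detecting dinucleotide to hexanucleotide SSRs in >1 kb unigenes involves, the following is a minimal Python sketch and not the pipeline used in this study. The function names (`find_ssrs`, `scan_unigenes`) and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration; the thresholds actually applied are not stated in this section. The sketch reports perfect repeats only.

```python
import re

# Minimum number of motif copies per motif length.
# Illustrative assumptions, not thresholds taken from the manuscript.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_n in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least (min_n - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT"
            # is the dinucleotide "AT" and "AA" is a mononucleotide run.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


def scan_unigenes(unigenes, min_length=1000):
    """Scan only unigenes longer than min_length bases (the >1 kb filter)."""
    return {name: find_ssrs(seq)
            for name, seq in unigenes.items() if len(seq) > min_length}
```

Called on a mapping of unigene identifiers to sequences, `scan_unigenes` skips anything of 1000 bases or fewer and returns, for each remaining unigene, the start position (0-based), motif, and copy number of every SSR found. Compound or imperfect repeats would need additional logic.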
Keywords: Satsuma myomphala; Illumina sequencing; De novo annotation; Transcriptome; Functional annotation; Simple sequence repeats

1. Introduction

Among invertebrates, the Mollusca are the second most species-rich phylum, with over 100,000 described members in eight major lineages (Haszprunar et al., 2008). Molluscs are ecologically diverse and include some of the most conspicuous organisms; the majority inhabit the marine environment, from the intertidal zone to the deepest oceans. High diversity and abundance are also features of terrestrial molluscs, of which as many as 60-70 species may coexist in a single habitat. Molluscs belonging to class Gastropoda form the largest group, with more than 62,000 described living species. Gastropod molluscs have succeeded in occupying almost all ecological habitats owing to their diversity in body form, size, and shell morphology. These species have been used as models for investigating ecological, behavioral, biomechanical, physiological, and phylogenetic attributes of relevance (Kocot et al., 2011; Bourdeau et al., 2015). The land snails, in particular, show numerous special adaptations to particular habitats, and hence are one of the successful animal groups on Earth.

The land gastropod family Camaenidae currently comprises 87 genera, distributed mainly in the tropics of Eastern Asia and Australasia (Cuezzo, 2003; Kohler, 2010). The morphological attributes of Camaenidae snails, with reference to the shell and genital tract, have been found useful in understanding the taxonomic position of genera inhabiting Western Australia (Kohler and Criscione, 2015). Among the taxa of Camaenidae snails with main occurrence in South-East Asia, the genus Satsuma includes three species, namely Satsuma japonica, Satsuma myomphala, and Satsuma omphaloides. A karyotype analysis of S. japonica and S.
omphaloides has been reported, showing differences in chromosome sizes (Tatewaki and Kitada, 1987).

S. myomphala is a terrestrial pulmonate gastropod found in Japan. In Korea, however, the species is known only from Geoje Island in Gyeongsangnam-do, where it inhabits humid and bushy sub-tropical forests. The extent of occurrence and the area of occupancy of the species are very narrow, and a living individual has been observed only once. The living individuals of the species are threatened by trampling, indiscriminate collection, predation from natural enemies, and loss of natural habitats. The species has been assessed as Critically Endangered in Korea and is classified as a monitored species under the Korea Red List (KORED) Assessment. Owing to the lack of sample resources and genomic information, progress in studying the species' phylogenomics and phylogeography has been hindered. In addition to the limited genomic resources, phenotypic screens for measurable traits related to local adaptation, the immune system, and reproductive processes and pathways in S. myomphala remain poorly explored. With regard to National Centre for Biotechnology Information (NCBI) database entries, the mitochondrial cytochrome oxidase subunit I gene sequence is known for the species. Furthermore, the molecular markers relevant to specific traits, which are required to support breeding initiatives, have not been explored in S. myomphala. For the protection of the species in the wild and for successful marker-assisted breeding programmes, a comprehensive understanding of its genetic background seems necessary.

Transcriptomics is a fast, reliable, and cost-efficient method that can explore the genetic resource information of S. myomphala through annotation of candidate genes involved in metabolism and in immune and reproductive pathways. The method would also help to identify molecular markers in the gene sequences of S.
myomphala, and hence could be used as a tool to assess selective pressures on the species' genome due to anthropogenic activities. Next-generation sequencing (NGS) technologies have provided the whole-genome and transcriptome