Genome and Transcriptome Atlas of the Digestive Tract
Total Page:16
File Type:pdf, Size:1020Kb
DNA Research, 2018, 25(5), 547–560 doi: 10.1093/dnares/dsy024 Advance Access Publication Date: 25 July 2018 Full Paper Full Paper The yellowtail (Seriola quinqueradiata) genome and transcriptome atlas of the digestive tract Motoshige Yasuike1,*, Yuki Iwasaki1,†, Issei Nishiki1, Yoji Nakamura1, Aiko Matsuura1, Kazunori Yoshida2,‡, Tsutomu Noda2, Tadashi Andoh3, and Atushi Fujiwara1,* 1Research Center for Bioinformatics and Biosciences, National Research Institute of Fisheries Science, Japan Fisheries Research and Education Agency, Yokohama, Kanagawa 236-8648, Japan, 2Goto Laboratory, Stock Enhancement and Aquaculture Division, Seikai National Fisheries Research Institute Japan Fisheries Research and Education Agency, Tamanoura-cho, Goto, Nagasaki 853-0508, Japan, and 3Stock Enhancement and Aquaculture Division, Seikai National Fisheries Research Institute, Japan Fisheries Research and Education Agency, Nagasaki 851-2213, Japan *To whom correspondence should be addressed. Tel. þ81 045 788 7640. Fax. þ81 045 788 7640. Email: [email protected] (M.Y.); Tel. þ81 045 788 7691. Fax. þ81 045 788 7691. Email: [email protected] (A.F.) †Present address: Center for Information Biology, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan. ‡Present address: Kamiura Laboratory, Research Center for Aquatic Breeding, National Research Institute of Aquaculture, Japan Fisheries Research and Education Agency, Tsuiura, Kamiura, Saiki, Oita 879-2602, Japan. Edited by Dr. Osamu Ohara Received 8 November 2017; Editorial decision 26 June 2018; Accepted 28 June 2018 Abstract Seriola quinqueradiata (yellowtail) is the most widely farmed and economically important fish in aquaculture in Japan. In this study, we used the genome of haploid yellowtail fish larvae for de novo assembly of whole-genome sequences, and built a high-quality draft genome for the yellowtail. The total length of the assembled sequences was 627.3 Mb, consisting of 1,394 scaf- fold sequences (>2 kb) with an N50 length of 1.43 Mb. A total of 27,693 protein-coding genes were predicted for the draft genome, and among these, 25,832 predicted genes (93.3%) were functionally annotated. Given our lack of knowledge of the yellowtail digestive system, and us- ing the annotated draft genome as a reference, we conducted an RNA-Seq analysis of its three digestive organs (stomach, intestine and rectum). The RNA-Seq results highlighted the impor- tance of certain genes in encoding proteolytic enzymes necessary for digestion and absorption in the yellowtail gastrointestinal tract, and this finding will accelerate development of formu- lated feeds for this species. Since this study offers comprehensive annotation of predicted protein-coding genes, it has potential broad application to our understanding of yellowtail biol- ogy and aquaculture. Key words: Seriola quinqueradiata, genome sequencing, protein-coding genes, transcriptome analysis, digestive enzymes VC The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] 547 548 The haploid-based draft genome of the yellowtail 1. Introduction Proton (four runs) platforms (Life Technologies, Carlsbad, CA). The genus Seriola (family Carangidae) contains nine recognized spe- These haploid reads were assembled using Newbler 3.0 (Roche cies of carnivorous, marine fish, globally distributed in tropical, sub- Diagnostics, Basel, Switzerland). Four Illumina paired-end (PE) li- tropical and temperate ocean regions.1 Seriola species have a high braries (insert size: 240, 360, 480 and 720 bp) and three mate-pair market value and several species are farmed around the world.2 (MP) libraries (insert size: 3–5, 10 and 20 kb) were constructed from Japan is the biggest producer of Seriola species, farming yellowtail the diploid dam genomic DNA and sequenced on a NextSeq 500 se- (S. quinqueradiata), greater amberjack (S. dumerili) and yellowtail quencer (Illumina, San Diego, CA, USA) using 2 Â 150 bp chemistry. kingfish (S. lalandi). In 2016 the production of these three species To improve sequence accuracy of the haploid contig sequences gen- was 141,000 tons, accounting for over 56% of marine finfish aqua- erated by Ion PGM and Proton sequencers, the Illumina PE reads culture in Japan.3 Most of this production is accounted for by yel- were mapped against the haploid contigs using Burrows–Wheeler lowtail, which has the longest history of aquaculture, having started aligner-maximum exact matches (BWA-MEM),17 and nucleotide in 1927 and expanded rapidly in the 1960s.4 Although yellowtail mismatches or short indels were overridden by the Illumina read aquaculture technologies have continued to improve, several chal- sequences. We also carried out a de novo assembly of the four dip- lenges to the sustainable farming of this species remain to be loid Illumina PE libraries (all merged 240 bp, and 10 Gb of 360, 480 addressed. The price of fish meal used for yellowtail feed continues and 720 bp inserts) with Platanus 1.2.4,18 and performed a sequence to rise,5 necessitating a better understanding of yellowtail digestive comparison between the haploid and diploid contigs using pathways and nutritional requirements in order to develop more BLASTN19 (E value threshold of 1E – 20 and the bit score of the top cost-effective feeds. hit with a more than 2-fold difference over that of the second hit),20 To improve aquaculture production efficiency, product quality, and the diploid contigs absent in the haploid contigs were added to sustainability and profitability, recent efforts have focussed on the the haploid contigs. These merged contigs were subjected to scaffold- genetics and genomics of economically important aquaculture spe- ing using Platanus 1.2.418 together with all the Illumina PE cies, particularly following the development of high-throughput (unmerged 240, 360, 480 and 720 bp insert) and MP reads (3–5, 10 next-generation sequencing (NGS) technologies.6–8 Genetic maps9–11 and 20 kb insert), and gaps in the scaffolds were filled by the and a whole genome radiation hybrid panel12 for yellowtail have Illumina PE reads (unmerged 240, 360, 480 and 720 bp inserts) with been constructed, and the detection of quantitative trait loci (QTL) the gap-close step in Platanus. The resulting scaffolds were further in yellowtail has recently been initiated for resistance to a monoge- assembled using RNA-Seq data sets generated in this study (de- nean parasite, Benedenia seriolae,13 and sex-determination.14,15 scribed later), by L_RNA_scaffolder.21 However, the resources for genes and protein annotations for this To confirm ploidy level and estimate genome size, Jellyfish soft- species are limited. Only 257 protein sequences have been deposited ware22 was used to produce a k-mer distribution (19-mer) with the at NCBI Genbank, taxonomy ID: 8161 (as of 11 July 2018). A com- Ion Proton haploid reads and with the Illumina PE diploid reads (all prehensive annotation of the overall protein-coding genes for the yel- merged 240 bp, and 10 Gb of 360, 480 and 720 bp inserts). lowtail genome will help identify important genes and pathways involved in the regulation of economically important traits. In addi- tion, a well-annotated genome will provide a powerful tool for better 2.2. Linkage map construction and scaffold correction understanding yellowtail biology, and will enhance physiological To detect and correct scaffold misassemblies, we generated a double- studies for optimization of breeding technologies, including the de- digest restriction-site associated DNA (ddRAD)-based genetic link- velopment of artificial diets. age map of the yellowtail using a previously described method20 Recently, we demonstrated that use of the genome of a haploid with slight modifications. Briefly, genomic DNA samples were gynogenetic yellowtail larva leads to a significant improvement in de extracted from 161 full-sib progeny and their parents’ muscles using novo assembly because allelic variation does not impede contig ex- the DNeasy Blood & Tissue Kit (Qiagen, Venlo, Netherlands). For tension as it does for assembly of diploid genomes.16 Here, we pre- preparation of the ddRAD libraries, genomic DNA was digested us- sent a draft genome sequence for yellowtail based on a haploid ing two restriction enzymes (PstI and MspI) and ligated to Ion Plus genome assembly and its annotation of protein-coding genes. Fragment Library Adapters using an Ion Xpress Plus gDNA Furthermore, to better understand molecular mechanisms of feed di- Fragment Library kit (Life technologies). The ligation products from gestion and nutrient absorption in yellowtail, we performed RNA- 25 individuals were pooled in equimolar proportions and size- Seq analysis of the three major organs of the gastrointestinal (GI) selected within a range of 200–260 bp using BluePippin (Sage tract i.e. stomach, intestine and rectum, using the annotated draft Science, Beverly, MA, USA). After size selection, the ddRAD libraries yellowtail genome as a reference. The analysis suggested that certain were amplified by eight-cycle PCR, purified using Agencourt proteolytic digestive enzymes play key roles in digestion and absorp- AMPure XP beads (Beckman Coulter Inc., Brea, CA, USA), and se- tion, and could inform the choice of ingredients for the development quenced on the Ion Proton (Life technologies). Raw sequences were of yellowtail-specific feeds for use in aquaculture. filtered to discard those of low quality and demultiplexed using the process_radtags programme in Stacks.23 BWA-MEM17 was used to map the filtered ddRAD-Seq reads to the scaffold sequences, and the 2. Materials and methods output SAM file was filtered using custom Perl scripts based on the following criteria: mapping quality (MQ) of 30, length fraction of 2.1. Genome sequencing and assembly 0.95 and non-specific match handling of ignore.