DNA Research, 2018, 25(5), 547–560 doi: 10.1093/dnares/dsy024 Advance Access Publication Date: 25 July 2018 Full Paper

Full Paper The yellowtail ( quinqueradiata) genome and transcriptome atlas of the digestive tract Motoshige Yasuike1,*, Yuki Iwasaki1,†, Issei Nishiki1, Yoji Nakamura1, Aiko Matsuura1, Kazunori Yoshida2,‡, Tsutomu Noda2, Tadashi Andoh3, and Atushi Fujiwara1,*

1Research Center for Bioinformatics and Biosciences, National Research Institute of Fisheries Science, Japan Fisheries Research and Education Agency, Yokohama, Kanagawa 236-8648, Japan, 2Goto Laboratory, Stock Enhancement and Aquaculture Division, Seikai National Fisheries Research Institute Japan Fisheries Research and Education Agency, Tamanoura-cho, Goto, Nagasaki 853-0508, Japan, and 3Stock Enhancement and Aquaculture Division, Seikai National Fisheries Research Institute, Japan Fisheries Research and Education Agency, Nagasaki 851-2213, Japan

*To whom correspondence should be addressed. Tel. þ81 045 788 7640. Fax. þ81 045 788 7640. Email: [email protected] (M.Y.); Tel. þ81 045 788 7691. Fax. þ81 045 788 7691. Email: [email protected] (A.F.)

†Present address: Center for Information Biology, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan. ‡Present address: Kamiura Laboratory, Research Center for Aquatic Breeding, National Research Institute of Aquaculture, Japan Fisheries Research and Education Agency, Tsuiura, Kamiura, Saiki, Oita 879-2602, Japan.

Edited by Dr. Osamu Ohara

Received 8 November 2017; Editorial decision 26 June 2018; Accepted 28 June 2018

Abstract Seriola quinqueradiata (yellowtail) is the most widely farmed and economically important fish in aquaculture in Japan. In this study, we used the genome of haploid yellowtail fish larvae for de novo assembly of whole-genome sequences, and built a high-quality draft genome for the yellowtail. The total length of the assembled sequences was 627.3 Mb, consisting of 1,394 scaf- fold sequences (>2 kb) with an N50 length of 1.43 Mb. A total of 27,693 protein-coding genes were predicted for the draft genome, and among these, 25,832 predicted genes (93.3%) were functionally annotated. Given our lack of knowledge of the yellowtail digestive system, and us- ing the annotated draft genome as a reference, we conducted an RNA-Seq analysis of its three digestive organs (stomach, intestine and rectum). The RNA-Seq results highlighted the impor- tance of certain genes in encoding proteolytic enzymes necessary for digestion and absorption in the yellowtail gastrointestinal tract, and this finding will accelerate development of formu- lated feeds for this species. Since this study offers comprehensive annotation of predicted protein-coding genes, it has potential broad application to our understanding of yellowtail biol- ogy and aquaculture.

Key words: Seriola quinqueradiata, genome sequencing, protein-coding genes, transcriptome analysis, digestive enzymes

VC The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] 547 548 The haploid-based draft genome of the yellowtail

1. Introduction Proton (four runs) platforms (Life Technologies, Carlsbad, CA). The genus Seriola (family ) contains nine recognized spe- These haploid reads were assembled using Newbler 3.0 (Roche cies of carnivorous, marine fish, globally distributed in tropical, sub- Diagnostics, Basel, Switzerland). Four Illumina paired-end (PE) li- tropical and temperate ocean regions.1 Seriola species have a high braries (insert size: 240, 360, 480 and 720 bp) and three mate-pair market value and several species are farmed around the world.2 (MP) libraries (insert size: 3–5, 10 and 20 kb) were constructed from Japan is the biggest producer of Seriola species, farming yellowtail the diploid dam genomic DNA and sequenced on a NextSeq 500 se- (S. quinqueradiata), greater amberjack (S. dumerili) and yellowtail quencer (Illumina, San Diego, CA, USA) using 2 150 bp chemistry. kingfish (S. lalandi). In 2016 the production of these three species To improve sequence accuracy of the haploid contig sequences gen- was 141,000 tons, accounting for over 56% of marine finfish aqua- erated by Ion PGM and Proton sequencers, the Illumina PE reads culture in Japan.3 Most of this production is accounted for by yel- were mapped against the haploid contigs using Burrows–Wheeler lowtail, which has the longest history of aquaculture, having started aligner-maximum exact matches (BWA-MEM),17 and nucleotide in 1927 and expanded rapidly in the 1960s.4 Although yellowtail mismatches or short indels were overridden by the Illumina read aquaculture technologies have continued to improve, several chal- sequences. We also carried out a de novo assembly of the four dip- lenges to the sustainable farming of this species remain to be loid Illumina PE libraries (all merged 240 bp, and 10 Gb of 360, 480 addressed. The price of fish meal used for yellowtail feed continues and 720 bp inserts) with Platanus 1.2.4,18 and performed a sequence to rise,5 necessitating a better understanding of yellowtail digestive comparison between the haploid and diploid contigs using pathways and nutritional requirements in order to develop more BLASTN19 (E value threshold of 1E – 20 and the bit score of the top cost-effective feeds. hit with a more than 2-fold difference over that of the second hit),20 To improve aquaculture production efficiency, product quality, and the diploid contigs absent in the haploid contigs were added to sustainability and profitability, recent efforts have focussed on the the haploid contigs. These merged contigs were subjected to scaffold- genetics and genomics of economically important aquaculture spe- ing using Platanus 1.2.418 together with all the Illumina PE cies, particularly following the development of high-throughput (unmerged 240, 360, 480 and 720 bp insert) and MP reads (3–5, 10 next-generation sequencing (NGS) technologies.6–8 Genetic maps9–11 and 20 kb insert), and gaps in the scaffolds were filled by the and a whole genome radiation hybrid panel12 for yellowtail have Illumina PE reads (unmerged 240, 360, 480 and 720 bp inserts) with been constructed, and the detection of quantitative trait loci (QTL) the gap-close step in Platanus. The resulting scaffolds were further in yellowtail has recently been initiated for resistance to a monoge- assembled using RNA-Seq data sets generated in this study (de- nean parasite, Benedenia seriolae,13 and sex-determination.14,15 scribed later), by L_RNA_scaffolder.21 However, the resources for genes and protein annotations for this To confirm ploidy level and estimate genome size, Jellyfish soft- species are limited. Only 257 protein sequences have been deposited ware22 was used to produce a k-mer distribution (19-mer) with the at NCBI Genbank, taxonomy ID: 8161 (as of 11 July 2018). A com- Ion Proton haploid reads and with the Illumina PE diploid reads (all prehensive annotation of the overall protein-coding genes for the yel- merged 240 bp, and 10 Gb of 360, 480 and 720 bp inserts). lowtail genome will help identify important genes and pathways involved in the regulation of economically important traits. In addi- tion, a well-annotated genome will provide a powerful tool for better 2.2. Linkage map construction and scaffold correction understanding yellowtail biology, and will enhance physiological To detect and correct scaffold misassemblies, we generated a double- studies for optimization of breeding technologies, including the de- digest restriction-site associated DNA (ddRAD)-based genetic link- velopment of artificial diets. age map of the yellowtail using a previously described method20 Recently, we demonstrated that use of the genome of a haploid with slight modifications. Briefly, genomic DNA samples were gynogenetic yellowtail larva leads to a significant improvement in de extracted from 161 full-sib progeny and their parents’ muscles using novo assembly because allelic variation does not impede contig ex- the DNeasy Blood & Tissue Kit (Qiagen, Venlo, Netherlands). For tension as it does for assembly of diploid genomes.16 Here, we pre- preparation of the ddRAD libraries, genomic DNA was digested us- sent a draft genome sequence for yellowtail based on a haploid ing two restriction enzymes (PstI and MspI) and ligated to Ion Plus genome assembly and its annotation of protein-coding genes. Fragment Library Adapters using an Ion Xpress Plus gDNA Furthermore, to better understand molecular mechanisms of feed di- Fragment Library kit (Life technologies). The ligation products from gestion and nutrient absorption in yellowtail, we performed RNA- 25 individuals were pooled in equimolar proportions and size- Seq analysis of the three major organs of the gastrointestinal (GI) selected within a range of 200–260 bp using BluePippin (Sage tract i.e. stomach, intestine and rectum, using the annotated draft Science, Beverly, MA, USA). After size selection, the ddRAD libraries yellowtail genome as a reference. The analysis suggested that certain were amplified by eight-cycle PCR, purified using Agencourt proteolytic digestive enzymes play key roles in digestion and absorp- AMPure XP beads (Beckman Coulter Inc., Brea, CA, USA), and se- tion, and could inform the choice of ingredients for the development quenced on the Ion Proton (Life technologies). Raw sequences were of yellowtail-specific feeds for use in aquaculture. filtered to discard those of low quality and demultiplexed using the process_radtags programme in Stacks.23 BWA-MEM17 was used to map the filtered ddRAD-Seq reads to the scaffold sequences, and the 2. Materials and methods output SAM file was filtered using custom Perl scripts based on the following criteria: mapping quality (MQ) of 30, length fraction of 2.1. Genome sequencing and assembly 0.95 and non-specific match handling of ignore. After local realign- Preparation of genomic DNA from haploid gynogenetic yellowtail ment, single nucleotide polymorphism (SNP) calling was performed larvae induced by ultraviolet-irradiated sperm and the blood of its using the Unified Genotyper module in the Genome Analysis Toolkit dam (diploid genome) has been described in.16 For contig construc- (GATK)24 and the variants were filtered as follows: only bi-allelic tion, the genomic shotgun libraries were prepared from the haploid sites, genotype quality (GQ) of 30, either reference (REF) or alter- genomic DNA, and sequenced with the Ion PGM (two runs) and native (ALT) allele frequencies of 95% in homozygous sites and M. Yasuike et al. 549 both REF and ALT allele frequencies of 30% in heterozygous sites. 2.5. Gene prediction and annotation 25 The R/qtl package was employed for linkage analysis and linkage Protein-coding genes in the yellowtail genome were predicted using map construction with the same parameters as previously described AUGUSTUS (version 3.1).31 First, repeat regions in the yellowtail 20 in. Linkage group (LG) numbering and orientation followed that scaffold sequences (>2 kb) were masked by RepeatMasker32 accord- 10 of Fuji et al. To detect the possibility of misassembled chimeric ing to the yellowtail repeat database constructed by scaffolds, the linkage marker sequences were mapped to the scaffold RepeatModeler.33 Next, the RNA-Seq reads of yellowtail sequenced sequences with BLASTN using the same criteria as previously de- in this study were mapped to the scaffolds using TopHat34 and as- 20 scribed in ; the scaffolds attributed to two or more LGs were split sembled with Cufflinks.35 Then, the scaffold sequences were scanned into consistent scaffolds based on the locations of markers. Scaffolds with the Zebrafish model in AUGUSTUS, using the map information >2 kb in length were used for subsequent analysis. of RNA-Seq as hints. The predicted gene sequences (24,728 genes) were validated using BUSCO,29 and loci attributed to ‘complete’ 2.3. Evaluation of completeness of the final assembly (792 loci) were used to construct a yellowtail gene training model. A completeness assessment was performed with gVolante26 using the Moreover, protein sequences of 11 fish species: spotted gar pipeline CEGMA (Core Eukaryotic Genes Mapping Approach)27 (Lepisosteus oculatus), Mexican tetra (Astyanax mexicanus), zebra- and the reference dataset Core Vertebrate Genes (CVGs).28 We also fish (Danio rerio), Atlantic cod (Gadus morhua), Nile tilapia assessed the assembly quality by Benchmarking Universal Single- (Oreochromis niloticus), platyfish (Xiphophorus maculatus), Copy Orthologues (BUSCO)29 version 3.0.2 with an - Amazon molly (Poecilia formosa), medaka (Oryzias latipes), stickle- specific set of 4,584 single copy orthologs. back (Gasterosteus aculeatus), green puffer (Tetraodon nigroviridis) and fugu (Takifugu rubripes), were downloaded from the Ensembl database (Release 84),36 mapped to the yellowtail scaffolds by 2.4. cDNA sequencing TBLASTN37 with E-value < 105, and the map information was For gene predictions, normalized, full-length enriched cDNA libraries used as hints in AUGUSTUS. Finally, the predicted gene sequences were constructed as follows: Total RNAs were isolated from eighteen were analysed with InterProScan38 and those matched by any do- yellowtail tissues (gills, skin, fins, red muscle, white muscle, heart, kid- main or supported by any AUGUSTUS hint were collected as valid ney, spleen, stomach, intestine, pyloric caeca, liver, gallbladder, retina, protein-coding genes. cerebellum, optic lobe, olfactory lobe and ovary) using RNeasy Plus The predicted amino acid sequences were compared using Universal Mini Kit (Qiagen). The isolated total RNA samples were BLASTP19 (E value threshold of 1E 5) against those of well- treated with 2 U of TURBO DNase from the TURBO DNase-free Kit annotated model fishes, medaka and zebrafish, in the Ensembl data- (Life Technologies) at 37C for 30 min as recommended by the manu- base (Release 84),36 and against the NCBI non-redundant (nr) pro- facturer’s instructions. RNA integrity was assessed with an Agilent tein database. Functional annotation was performed using Bioanalyzer 2100 with the RNA 6000 Nano Kit (Agilent Blast2GO39 and the Kyoto Encyclopedia of Genes and Genomes Technologies, Santa Clara, CA, USA). Poly(A) þ RNA (mRNA) was (KEGG) Automatic Annotation Server.40 Orthologous gene cluster purified using the FastTrack MAG Micro mRNA Isolation Kit analysis of yellowtail, Pacific bluefin tuna (Thunnus orientalis),41 (Invitrogen, Carlsbad, CA, USA). Equal amounts of mRNA were croaker (Larimichthys crocea),42 Nile tilapia, medaka and zebrafish pooled from each tissue and 500 ng of the pooled mRNA was used to was performed using the OrthoVenn43 web server with default construct a full-length enriched cDNA library using the SMART PCR parameters (E-value 1e–5, inflation value 1.5). OrthoVenn was also cDNA Synthesis Kit (Clontech Laboratories, Inc. Palo Alto, CA, USA) used to assign significantly enriched GO terms of species-specific 30 with modifications as described by Meyer et al. The primer used for gene clusters based on a hypergeometric test (P-value < 0.05). The first-strand cDNA synthesis was a modified 3’ SMART CDS Primer II same analysis also applied to comparisons within the genus Seriola A (AAGCAGTGGTATCAACGCA including yellowtail, S. dumerili (Genbank: BDQW01000000) and S. GAGTCGCAGTCGGTACTTTTTTCTTTTTTV) including a recog- dorsalis (Genbank: PEQF01000000). nition site for the restriction enzyme BpmI,30 and the cDNA was am- plified using the 5’ PCR Primer II A (AAGCAGTGGTATCAA CGCAGAGT) with the KAPA library amplification kit (KAPA 2.6. RNA-Seq analysis of yellowtail GI tract Biosystems, Boston, MA, USA). Amplified cDNA was purified using The three major organs of the GI tract: stomach, intestine and rec- the QIAquick PCR Purification Kit (Qiagen) and normalized using a tum, were dissected from three healthy adult yellowtails (n ¼ 3) Trimmer-2 cDNA normalization kit (Evrogen, Moscow, Russia). The obtained from brood stock maintained at the Goto laboratory of the normalized cDNA was purified with the QIAquick PCR purification Seikai National Fisheries Research Institute, Nagasaki, Japan and kit, and the purified cDNA was digested with BpmIfor1hat37Cto immediately immersed in RNAlater stabilized solution (Thermo remove poly (A) tails. Finally, the digested cDNA was purified using a Fisher Scientific, Waltham, MA, USA). Total RNA was extracted us- QIAquick PCR purification kit. In addition to the 18-tissue normalized ing the Maxwell RSC simply RNA Kit (Promega, Madison, WI, cDNA library, additional full-length cDNA libraries for pituitary, tes- USA) and mRNA was purified from 3 mg total RNA using the Gene tis, ovary haploid cells and digestive organs from larval fish (8–10 Read Pure mRNA kit (Qiagen). Sequencing libraries were con- days post-hatch) were prepared individually with the SMART PCR structed from 5 ng of each mRNA sample using the Ion Total RNA- cDNA Synthesis Kit employing the same procedure as above. Among Seq Kit v2 and sequenced with Ion Proton (Life technologies). RNA- these, the cDNA libraries for the digestive organs from the larval fish Seq data analysis was carried out using CLC Genomics Workbench were normalized using the methods above. These cDNA libraries were 9.5.2 software (CLC Bio, Aarhus, Denmark) as follows: The low- sequenced with the 454 GS FLXþ (Roche Diagnostics), Ion PGM or quality sequences were filtered with default parameters (quality limit Proton platforms (Life Technologies). Detailed information on the ¼ 0.05 and ambiguous limit ¼ 2), and the filtered sequence reads cDNA libraries with corresponding sequencers and sequence statistics were mapped to the yellowtail reference genome constructed in this are summarized in Supplementary Table S2. study with default parameters (mismatch cost ¼ 2, insertion cost ¼ 3 550 The haploid-based draft genome of the yellowtail and deletion cost ¼ 3, length fraction ¼ 0.8 and similarity fraction ¼ allelic variations as biological starting material for genome assembly. 0.8). Subsequently, transcripts per millions (TPMs)44 values for each Zhang et al.46 reported a marked improvement in genome assembly gene were calculated. The differentially expressed genes (DEGs) were after employing doubled-haploid fugu individuals via mitotic gyno- considered significant for the following criteria: a 2-fold or greater genesis. Although doubled haploid fishes are useful in genomic analy- change in expression with a false discovery rate P-value < 0.05. sis, limited doubled haploid fish lines are available, and their Visualization of the DEGs was performed by constructing a hierar- establishment is labour-intensive, time-consuming and expensive. On chical heatmap with Euclidean distance and complete linkage in the the other hand, the use of haploid larvae is a convenient means of CLC Genomics Workbench. To compare the over-represented func- obtaining a non-allelic variation genome in fish if an artificial insemi- tional categories between the highly expressed genes (500 highest nation technique is established, and we have already demonstrated the TPM values) in the three tissues, a GO enrichment analysis was con- effectiveness of genomic DNA from a haploid gynogenetic yellowtail ducted using the functional annotation tool, PANTHER version larva in genome assembly.16 However, there are two concerns related 12.0 (released 7 Ocotber 2017)45 using the zebrafish gene list. The to the use of haploid fish larvae for the construction of reference ge- Bonferroni correction for multiple testing was applied with the cor- nome sequences. To overcome these concerns, we established an as- rected P-value < 0.05 considered significantly over-represented in sembling strategy (Fig. 1) as follows. First, defective regions may exist the PANTHER analysis. in the haploid genome sequence due to allelic variations. Taking this probability into account, we compared the haploid contigs with those generated from its dam (diploid), and then merged the diploid sequen- 3. Results and discussion ces absent in the haploid contigs (Fig. 1). Second, because haploid lar- vae die during early larval stages, the amount of DNA available from 3.1. Genome sequencing and assembly a haploid fish larva is limited, and not sufficient for the preparation of Recent advances in the speed, accuracy and cost effectiveness of NGS PE and MP libraries for scaffolding. To make up for this limitation, technologies have provided the opportunity for whole genome se- the contigs were constructed from limited amounts of DNA from a quencing of even non-model organisms with no available reference haploid fish, and scaffolding was performed using sufficiently avail- genomes. However, de novo assembly of diploid species, including tel- able DNA from its dam (Fig. 1). eosts, remains challenging because of allelic variation. Recently, sev- In this study, a total of 11.2 million reads (3.7 Gb) and 259.4 mil- eral teleost studies have used DNA samples with theoretically no lion reads (51.9 Gb) from the haploid genome were generated from

Figure 1. Strategy for de novo assembly of the yellowtail (S. quinqueradiata) genome using the genome of haploid larvae. More detailed information on the se- quencing libraries and the summary statistics of reads used in this study can be found in Supplementary Table S1, and contig and scaffold statistics are shown in Supplementary Tables S3 and S4, respectively. M. Yasuike et al. 551

Ion PGM and Ion Proton, respectively, and a total of 858 million Table 1. Assembly statistics of yellowtail genome reads (129.1 Gb) were obtained from a series of Illumina PE (240 bp Number of scaffolds (>2 kb) 1,394 insert: 27.8 Gb, 360 bp insert: 31.5 Gb, 480 bp insert: 25.3 Gb and Total size of scaffolds (bp) 627,268,966 720 bp insert: 9.5 Gb) and MP (3–5 kb insert: 10.8 Gb, 10 kb insert: Average scaffold length (bp) 449,978 12.2 Gb and 20 kb insert: 12.0 Gb) libraries of the diploid genome N50 scaffold length (bp) 1,433,177 (Supplementary Table S1). The genome size of the yellowtail has N50 scaffold number 127 10 been calculated to be 685 Mb based on the c-value, while the ge- N90 scaffold length (bp) 404,933 nome size based on k-mer frequency was estimated to be 603.8 and N90 scaffold number 453 602.7 Mb from the haploid and diploid reads datasets, respectively. Longest scaffold (bp) 6,617,595 Thus, the generated sequence data (184.7 Gb) has 270- to 306-fold GC (%) 40.9 coverage of its genome size. Figure 1 provides a flow diagram of the genome assembly procedure used for this study; information on the sequencing libraries and summary statistics of reads appears in Supplementary Table S1. Because our previous study demonstrated The completeness of the generated genome assembly was evalu- that the overlap-layout-consensus (OLC)-based de novo assembly ated using CEGMA analysis, which identified 97.86% complete and (i.e. Newbler) was the most efficient method for haploid reads gener- 98.28% partial genes from the 233 CVGs. This percentage of com- ated from the Ion Torrent sequencing platform,16 the haploid reads plete gene hits is higher than that of model fish genome assemblies (55.6 Gb) were assembled by Newbler and generated 90,827 contigs such as medaka (93.56%), zebrafish (93.13%), stickleback (>500 bp) with a total length with 611.9 Mb (Fig. 1, Contigs A and (92.27%) and spotted green puffer (90.56%), according to the Supplementary Table S3). On the other hand, a 52.1 Gb of diploid gVolante, Completeness Score Database (https://gvolante.riken.jp/ reads from the four Illumina PE libraries (Supplementary Table S1) script/database.cgi (11 July 2018, date last accessed)). Recently, two were assembled with Platanus, a de novo assembler specifically yellowtail genome assemblies (Squ_1.0 and Squ_2.0) have been de- designed to reconstruct highly heterozygous genomes,18 and the posited at DDBJ/EMBL/Genbank under accession numbers resulting assembly consists of 209,882 contigs (>500 bp) with a total BDJM01000000 (Squ_1.0) and BDMU00000000 (Squ_2.0). length with 596.7 Mb (Fig. 1, Contigs B and Supplementary Table According to assembly information at NCBI (https://www.ncbi.nlm. S3). Although different types of sequencers and assemblers were used nih.gov/assembly/organism/8161/all/ (11 July 2018, date last for the haploid and diploid reads, the assembly of the haploid ge- accessed)), both assemblies were generated from diploid DNA sam- nome significantly reduced the total number of contigs with longer ples with an Illumina Hiseq2500 platform and a long-read sequencer average and N50 contig lengths when compared with the diploid ge- PacBio RSII. Squ_2.0 is an updated version of Squ_1.0 and is repre- nome assembly (Supplementary Table S3). Furthermore, the k-mer sentative of the genome of this species at present. Although Squ_2.0 distribution of the diploid reads showed an additional small peak has fewer scaffolds (384) and a larger N50 scaffold size (5.6 Mb) which corresponds to heterozygous regions, i.e. a ‘hetero-peak’18 than our assembly, the assembly quality based on BUSCO and (Supplementary Fig. S1). These heterozygous regions may disrupt the CEGMA analysis showed that our assembly has higher scores than extension of contigs during the assembly process.16 Thus, these those of Squ_2.0 in both analyses. Our assembly captured 97.27% results support the argument for using the haploid genome for de (4,459 out of 4,584) of complete BUSCOs, while Squ_2.0 captured novo assembly. 96.81% (4,438 out of 4,584). Similarly, CEGMA evaluation showed After sequence error correction of the haploid contigs using 94.1 that the completeness of our assembly and that of Squ_2.0 was Gb of Illumina PE reads (Fig. 1, Contigs A’), the diploid contigs 97.85% (228 out of 233) and 97.42% (227 out of 233), respectively. (Fig. 1, Contigs B) absent in the haploid contigs (Fig. 1, Contigs A’) Despite using relatively short sequence reads for the genome assem- were added to the haploid contigs. This process could add 26,916 bly, we obtained a high level of completeness in BUSCO and contigs (4.5 Mb) to the haploid contigs, and generated 162,987 con- CEGMA analyses, suggesting the advantage of utilizing the haploid tigs (Fig. 1, Contigs C and Supplementary Table S3). These merged genome for de novo assembly. Thus, we concluded that our assembly contigs (Fig. 1, Contigs C) were subjected to scaffolding with 71.5 has a more complete gene space than Squ_2.0 and is more suitable Gb of Illumina PE and 35.0 Gb of Illumina MP reads for conducting gene-prediction and annotation of yellowtail. (Supplementary Table S1), and the resulting scaffolds were filled by 71.5 Gb of Illumina PE reads (Supplementary Table S1). As a result, 1,225 scaffolds (>2 kb) were obtained (Fig. 1, Scaffolds A and 3.2. Annotation of protein-coding genes Supplementary Table S4). Further scaffolding using RNA-Seq data- A total of 27,693 protein-coding genes were predicted for the yellow- sets generated in this study (Supplementary Table S2) was conducted tail draft genome, based on AUGUSTUS, the yellowtail training and 1,155 scaffolds (>2 kb) were constructed (Fig. 1, Scaffolds B model and the transcript þ protein hints. A summary of the func- and Supplementary Table S4). Finally, the linkage marker sequences tional annotation of the yellowtail predicted genes is shown in from the ddRAD-based genetic linkage map of the yellowtail gener- Table 2 and the results of the functional annotation of the complete ated in this study were used to correct the scaffold misassemblies set of predicted protein-coding genes are available in Supplementary (Fig 1, Final scaffolds). The female and male maps (consisting of 1, Table S7. Of the 27,693 predicted genes, 25,319 (91.4%) genes 163 and 1, 147 ddRAD markers for female and male maps, respec- matched (E value threshold of 1 E 5) at least one entry in the tively) for LGs (LG1–LG24) are shown in Supplementary Fig. S2, NCBI nr database, while 21,233 (76.7%) and 23,422 (84.6%) genes and the locations of genetic markers are listed in Supplementary matched with model fishes, medaka and zebrafish, respectively Table S5. The final assembly is composed of 1, 394 scaffolds (>2 kb) (Table 2). A total of 18,575 (67.1%) genes mapped to at least one and the total size of scaffolds is 627.3 Mb, with a G þ C content of GO term (biological process: 12,606 genes, cellular component: 40.9%, an N50 length of 1.43 Mb (number of N50 or longer: 127) 11,784 genes and molecular function: 13,094 genes). In addition, and a largest scaffold size of 6.62 Mb (Table 1). 13,880 (50.1%) genes were mapped to KEGG pathways, while 552 The haploid-based draft genome of the yellowtail

Table 2. Functional annotation of the 27,693 yelowtail predicted fish, due to their limited taurine biosynthesis abilities.49 It has been protein-coding genes reported that taurine synthesis is markedly low or negligible in yel- lowtail.50 Therefore, the absorption of taurine from feeds via TauT Database Number of the yellowtail Percentage of is arguably important for yellowtail. Further study of the two TauT genes mached annotated genes genes from the yellowtail-specific cluster will provide insight into NCBI non-redundant (nr) 25,319 91.4% taurine absorption in this species. databasea Draft genomes for greater amberjack (S. dumerili)51 and California a Medaka genes 21,233 76.7% yellowtail (S. dorsalis)52 have recently been published. We compared a Zebrafish genes 23,422 84.6% the predicted-protein sequences of yellowtail with those of these two Gene ontrogy (GO) 18,575 67.1% Seriola species. The protein sequences from yellowtail (27,693), Biological process 12,606 45.5% greater amberjack (22,005) and California yellowtail (25,789) were Cellular component 11,784 42.6% grouped into 20,499 clusters from which 17,197 (84%) clusters were Molecular function 13,094 47.3% InterPro 25,035 90.4% shared across all three species, and 20,058 (97.8%) clusters contained Enzyme 2,867 10.4% at least two species (Fig. 2B). We found 441 single-species clusters KEGG pathway 13,880 50.1% (2.2%) including 221 clusters (814 genes) from yellowtail, 22 clusters At least one functional 25,832 93.3% (53 genes) from greater amberjack and 198 clusters (416 genes) from annotation California yellowtail (Fig. 2B). As in the results of the comparisons Unannotated 1, 861 — with five teleosts, the most significant enriched GO terms in yellowtail- and greater amberjack-specific clusters were related to immune pro- aBLASTP hit of less than 1E5. cesses such as antigen binding (GO: 0003823, P ¼ 5.1E-4) in yellow- tail and positive regulation of interleukin-1 beta secretion (GO: 2,867 genes were identified to have at least one enzyme hit. The con- 0050718, P ¼ 3.9E-4) in greater amberjack (Supplementary Table served domain search (InterPro) showed that 25,035 (90.4%) genes S9). California yellowtail-specific clusters showed regulation of osteo- were identified to have at least one domain. Overall, 25,832 pre- clast differentiation (GO: 0045670, P ¼ 0.008) and regulation of dicted genes (93.3%) of yellowtail were functionally annotated by at RNA stability (GO: 0043487, P ¼ 0.008) as the most enriched clus- least one database, and only 1,861 predicted genes (6.7%) were ters, but the significance of these data is unclear (Supplementary Table unannotated (Table 2). This richness in functional annotation for S9). Together with the results of the comparison across six teleosts, yellowtail protein-coding genes will help us to better understand most of the species-specific clusters included immune-related GO their biology in downstream studies, such as whole-transcriptome terms, suggesting that lineage-specific expansion of immune-related analysis. genes is likely to play a key role in species diversification. This may be The genome-wide comparison of orthologous gene clusters was due to the remarkable evolutionary plasticity of the teleost immune performed to identify the degree of commonality across yellowtail system in response to habitat, specific environmental factors and and five other teleost species: Pacific bluefin tuna, croaker, tilapia, lineage-specific pathogens, and which is linked to the survival and ra- medaka and zebrafish. Based on sequence similarity of proteins, all diation of the teleost linnage.53,54 protein sequences from the six species were grouped into 21,658 clusters, from which 9,444 (43.6%) orthologous gene clusters were shared by all six species and 20,787 (96.0%) orthologous clusters 3.3. A snapshot of yellowtail GI tract transcriptome contained at least two species (Fig. 2A). On the other hand, We used our yellowtail reference genome for RNA-Seq analysis of 871 (4.0%) single-species clusters: 114 clusters (336 genes) for yel- three major organs of the GI tract: stomach, intestine and rectum as lowtail, 125 clusters (364 genes) for Pacific bluefin tuna, 105 clusters a first step toward understanding digestion and nutrient absorption (236 genes) for croaker, 109 clusters (326 genes) for tilapia, in yellowtail at the molecular level. This RNA-Seq generated 119 (490 genes) clusters for medaka and 299 (1,515 genes) clusters 10,692,067–27,326,877 reads (mean read length: 135–167 bp) per for zebrafish, were identified (Fig. 2A), suggesting a lineage-specific individual tissue type (Supplementary Table S6). expansion of these clusters. As seen earlier, the number of zebrafish- First, we characterized the highly expressed genes (top 500) in specific clusters was 2.4- to 2.7-fold higher than those for the other each tissue using a GO enrichment analysis, because the highly five teleosts, including yellowtail, which may explain the notable expressed genes can be considered to serve functionally important gene duplication and retention of recent duplicates in zebrafish after roles in the respective tissues that will help us to understand the a teleost-specific genome duplication event around 300 million years mode of operation of each tissue. This analysis showed that most of ago.47 A hypergeometric test for assessing enriched GO terms in the the enriched categories were shared between the intestine and single-species clusters was performed. Overall, many immune-related rectum, and included: ‘metallopeptidase activity’ (GO: 0008237, GO terms were found in the significantly enriched GO terms of nine genes each in intestine and rectum), ‘serine-type peptidase activ- species-specific clusters, except for those in madaka, which showed ity’ (GO: 0008236, 13 genes each in intestine and rectum), ‘peptidase that transposon-related GO terms were the most significant (P ¼ activity’ (GO: 0008233, 29 genes each in intestine and rectum), ‘hy- 2.58E-4) (Supplementary Table S8). drolase activity’ (GO: 0016787, 62 genes in intestine and 66 genes in Focussing on the yellowtail-specific clusters, one of the most sig- rectum) under the molecular function category, and ‘proteolysis’ nificant GO terms was taurine: sodium symporter activity (GO: (GO: 0006508, 31 genes in intestine and 33 genes in rectum) under 0005369, P ¼ 0.015), containing two taurine transporter (TauT/ the biological process category (Fig. 3). These categories are involved Slc6a6) genes, g27489 and g26790 (Supplementary Table S8). TauT in the hydrolysis of proteins into smaller polypeptides, suggesting mediates cellular uptake of taurine (2-aminoethane sulphonic acid) that both the intestine and rectum function to digest protein in the which is found in high concentrations in tissues.48 Taurine is yellowtail. We also found enriched GO terms common in the stom- a vital nutrient for many fish species, particularly for carnivorous ach and rectum that are related to proton transport activity, M. Yasuike et al. 553

Figure 2. Venn diagram showing the number of unique and shared gene clusters in (A) six teleosts (yellowtail, Pacific bluefin tuna, croaker, tilapia, medaka and zebrafish) and in (B) three Seriola species: yellowtail, greater amberjack (S. dumerili) and California yellowtail (S. dorsalis). Orthologous gene clusters were identified and visualized using OrthoVenn.43 The bar charts below the Venn diagrams represent the total number of clusters that are unique to a single species or shared between 2 and 6 species. including ‘proton-transporting ATP synthase activity, rotational these genes may play a role in osmoregulation.57 The stomach has mechanism’ (GO: 0046933, five genes in stomach and six genes in more non-overlapping enriched GO terms than other organs, and rectum), ‘hydrogen ion transmembrane transporter activity’ (GO: overexpressed ‘oxidoreductase activity’ (GO: 0016491, 24 genes), 0015078, seven genes in stomach and eight genes in rectum) under ‘isomerase activity’ (GO: 0016853, 11 genes) under the molecular the molecular function category, and ‘proton-transporting ATP syn- function category, ‘tricarboxylic acid cycle’ (GO: 0006099, five thase complex’ (GO: 0045259, five genes in stomach and six genes genes), ‘glycolysis’ (GO: 0006096, six genes), ‘generation of precur- in rectum) under the cellular component category (Fig. 3). These sor metabolites and energy’ (GO: 0006091, 19 genes) under the bio- genes probably contribute to the acidity of the stomach necessary for logical process category, and ‘mitochondrial inner membrane’ (GO: acid hydrolysis of feeds.55 In general, carnivorous fish have well de- 0005743, six genes) under the cellular component category (Fig. 3). veloped stomachs capable of an acid digestion phase, whereas in These GO terms suggest the role of the stomach in energy metabo- stomach-less fish such as carp, no such phase exists.56 In the rectum, lism. Although interpretation of the roles of these genes is unclear, 554 The haploid-based draft genome of the yellowtail

Figure 3. GO enrichment analysis of the top 500 highly expressed genes in the yellowtail stomach, intestine and rectum. This analysis was performed using PANTHER45 and the zebrafish gene list was selected. Values indicate fold enrichment (P-value < 0.05 after Bonferroni correction) and NS means not significant.

these results suggest that the digestive function of the stomach differs acting during the earliest stage of protein digestion in breaking down from that of the intestine and rectum in yellowtail. long-chain peptides.61,62 In particular, carnivorous fish have higher Next, we surveyed DEGs among the three organs to identify pepsin levels in their stomachs than omnivorous fish with stomachs tissue-specific gene expression patterns. The number of DEGs in the and stomach-less fish,63 such as zebrafish, which lack the pepsinogen stomach, when compared with those in the intestine and rectum gene.64 Another digestive enzyme, encoded by a chitinase gene were 6,001 and 6,040, respectively, indicating that the expression (g6038), was also found in the stomach-specific cluster (Fig. 4). pattern of the stomach was different from those of the intestine and It has been reported that the fish stomach exhibits chitinase activity, rectum, as expected by the GO enrichment analysis of the highly which is associated with the digestion of chitinous substances from expressed genes (Fig. 3). The comparison between the intestine and crustaceans.65,66 We found two stomach-specific genes, the Fc frag- the rectum also showed considerable differences in the gene expres- ment of IgG binding protein (FCGBP; g2075) and cysteine-rich secre- sion pattern, with 2,195 DEGs, suggesting that each part has a spe- tory protein 2 (CRISP2; g10639). FCGBP (g2075) is abundant in cialized function in digestive and absorption processes. Similarly, the mucus of humans and mice, and may be functionally related to over 1,900 DEGs (fold-change cutoff of 2) were observed between the gel-forming mucins,67 while CRISP2 is strongly expressed in the anterior-middle and posterior-intestinal segments (including the rec- mammalian male reproductive tract.68 Since the function of these tum) in a carnivorous fish, European sea bass (Dicentrarchus labrax) two genes in fish has not been elucidated, further studies are required using a custom oligo microarray.58 Using a cluster analysis of the to identify the role of these genes in the yellowtail stomach. DEGs, we extracted expression patterns that are characteristic of the As expected following the results of the GO enrichment analysis three tissue types (Supplementary Fig. S3). Figure 4 shows the heat (Fig. 3), proteolytic digestive enzyme genes, a trypsinogen (precursor map of 30 DEGs with the highest coefficient of variation among the of trypsin) gene (g15220), two chymotrypsinogen (precursor of chy- three tissues. motrypsin) genes (g21846 and g21847) and a carboxypeptidase B A gene encoding potassium-transporting ATPase alpha chain gene (g11916) were found to be specifically expressed in the intestine 1 (g269), which has a role in acid production in the mammalian and the rectum (Fig. 4). In vertebrates including fish, hydrolysis and stomach55,59 was specifically expressed in the stomach (Fig. 4). As absorption of proteins occurs primarily in the intestine.69 It has been mentioned above, this gene may facilitate the acidic environment of reported that trypsin activity in fish is significantly higher in the intes- the yellowtail stomach. In addition, four genes (g6958, g17761, tine than in the stomach, which showed very little or no activity,61,62 g17762 and g23012) encoding pepsinogens, and activated to the ac- and high trypsinogen mRNA expression levels were also found in the tive form pepsin under acidic conditions,60 were specifically intestine of Atlantic salmon (Salmo salar),70 Senegalese sole (Solea expressed in the stomach (Fig. 4). In vertebrates, including fish, pep- senegalensis)71 and pufferfish (Takifugu obscurus).72 Our RNA-Seq sin has been identified as the major acidic protease in the stomach, results suggest that both the intestine and rectum play major roles in M. Yasuike et al. 555

Figure 4. Heat map of 30 DEGs with the highest coefficient of variation among the yellowtail stomach (st), intestine (int) and rectum (rec). hydrolysis and absorption of proteins in yellowtail. It should be mammals, which use carbohydrates as their main energy source, noted that the gene expression of these proteolytic digestive enzymes most fish predominantly make use of lipids.77 Therefore, these apoli- was low in the intestine of one fish (Fig. 4). Calduch-Giner et al.58 poproteins and AFP4 may play important roles in lipid metabolism suggested that fish intestine transcriptomes are not static, but change in the yellowtail intestine. In addition to these lipid metabolism- spatially, seasonally and with diet. Since the samples of GI tract ex- related genes, an acidic mammalian chitinase (AMCase) gene amined were obtained from cultured yellowtail maintained under the (g23028) was found in the intestine-specific cluster (Fig. 4), and a same conditions and sampled at the same time, our study suggests higher expression of this gene in the anterior-middle intestinal area that the expression of proteolytic enzymes also varies between indi- as opposed to posterior segments in European seabass has been viduals, and that such differences may cause different physiological found.58 In the mouse, acidic mammalian AMCase can act in diges- conditions in individuals. tion of chitin polymers even in the presence of pepsin C, trypsin and Focussing on intestine-specific gene expression, three apolipopro- chymotrypsin.78 This enzyme also plays a critical role in the human tein genes, apolipo Eb (g15747), apolipo B-100 (g10622) and apo- immune response to pathogens.79 It should be noted that fish intes- lipo A-1 (g15231), were included in this cluster (Fig. 4). tines function as immune barriers to pathogens80,81 and that fish Apolipoproteins play a crucial role in lipid transport and uptake in apolipoproteins play a role in this immune function.82–84 Thus, vertebrates and are synthesized mainly in the intestine and liver of AMCase and apolipoproteins may have roles not only in digestion most teleosts.73–75 In addition, another intestine-specific gene, Type- and absorption but also in defense against intestinal pathogens. Four 4 ice-structuring protein LS-12 (AFP4; g15894), which was first iso- muscle-related genes, myosin-11 (g20983), desmin (g10938), actin lated as an antifreeze protein from the serum of the longhorn sculpin (g20707) and filamin A (g9142) were highly expressed in the intes- (Myoxocephalus octodecimspinosis),76 may also play a role in lipid tine (Fig. 4). The GO enrichment analysis (Fig. 3) also showed the metabolism because it has an apolipoprotein A-II domain (InterPro: over-representation of muscle-related GO terms in the intestine such IPR006801). It has also been reported that the expression of the as ‘cytoskeleton’ (GO: 0005856, 22 genes), ‘actin cytoskeleton’ (GO: AFP4 gene was higher in the anterior-middle intestinal segments 0015629, 13 genes) and ‘intermediate filament cytoskeleton’ (GO: than the posterior segment in European seabass.58 In contrast to 0045111, 6 genes). Recently, Abrams et al.85 reported that zebrafish 556 The haploid-based draft genome of the yellowtail myosin-11 (myh11) gene mutations affected intestinal motility. In expression patterns of proteolytic digestive enzymes in yellowtail. In addition, a collagen a-1(I) chain gene (g10729) was found in the the yellowtail genome, we found 75 candidate genes for proteolytic intestine-specific expression cluster (Fig. 4), and its role in responding digestive enzymes: five pepsinogens, 19 trypsinogens, four chymo- to stretching of foetal human intestinal smooth muscle cells has been trypsinogens, 11 elastases, seven carboxypeptidases (A and B), four investigated.86 Thus, these muscle-related genes of yellowtail may aminopeptidase Ns, 19 cathepsins and six collagenases (Fig. 5A). have functions related to intestinal motility. Grau et al.87 suggested The expressional percentages of these categories were expressed as the importance of intestinal motility for intestinal absorption in car- TPM values for each tissue (Fig. 5B). The distribution patterns of nivorous fish with irregular intake of large quantities of food. these enzymes varied among the three tissues. In the stomach, the ex- The rectum of yellowtail is separated from the posterior region of pressional percentage was almost entirely occupied by pepsinogen the intestine by the ileo-rectal valve. As observed in the rectum of the genes, which accounted for 99.9% of the total. Surprisingly, the greater amberjack,87 the mucosa are more deeply folded than in expression percentage of pepsinogens accounted for 45.6–65.0% of those of the intestine, and the rectum is also considered to play a role the overall stomach transcriptome in three individuals. In the intes- in absorption.87 The rectum-specific cluster included two cathepsin tine, the transcript levels of trypsinogens were very high (58.9%), fol- genes, cathepsin L1 (g24904) and cathepsin Z (g6016) (Fig. 4). lowed by chymotrypsinogens (15.4%), elastases (14.6%) and Cathepsin L1 gene also showed a higher expression in the anterior carboxypeptidases (6.4%). These gene expression patterns are corre- segments than in the posterior segment of the intestinal tract in lated with the intestinal protease activity in carnivorous fish as ob- zebrafish64 and European seabass.58 Cathepsins are essential lysoso- served in previous studies. Eshel et al.97 estimated that trypsin mal proteolytic enzymes and the activity of cathepsin D has been contributes to 40–50% of the overall protein digestion in carnivo- shown to be involved primarily with the posterior intestine in rain- rous fish intestines. Furthermore, it has been reported that trypsin ac- bow trout (Oncorhynchus mykiss).69 However, little is known about tivity in carnivorous fish is about four times (3.9 times) that of the function of cathepsins in the fish rectum and studies are needed chymotrypsin activity, a relationship that is reversed in omnivorous to elucidate their role in digestion and absorption processes. Two and herbivorous species.98 Similarly, the expressional percentage of multiligand endocytic receptor genes, cubilin (g20710) and megalin trypsinogen genes in yellowtail showed about four times (3.8 times) (g13916), were also found as rectum-specific genes (Fig. 4), and each that of chymotrypsinogen genes. Trypsinogen genes were also shown to mediate reabsorption of proteins and vitamins in mam- expressed at the highest levels in the rectum (36.6%), but the expres- mals. The megalin/cubilin complex delivers its ligands (proteins and sional percentage was lower than that of the intestine (58.9%). vitamins) to lysosomes, where all proteins are degraded, and the Alternatively, cathepsin genes showed a high transcript level in the amino acids and vitamins returned to circulation.88,89 Two rectum- rectum which accounted for 32.8%, the highest among the three specific genes, deleted in malignant brain tumours 1 (DMBT1; tissues. g13722) and macrophage mannose receptor 1 (g428), are possibly To determine whether there are differential gene usages within involved in defense against bacterial pathogens.90,91 The gene each category of proteolytic digestive enzymes, we further analysed (g9672) encoding a-N-acetylgalactosaminidase, a lysosomal glycohy- the expression levels of individual genes in the highly expressed cate- drolase,92 was also specifically expressed in the rectum (Fig. 4), al- gories in each tissue (Fig. 5B): pepsinogen genes in the stomach though the functions of this enzyme in the rectum are unknown. (Fig. 5C), trypsinogen genes in the intestine and rectum (Fig. 5D) and The results of GO enrichment analysis of highly expressed genes and cathepsin genes in the rectum (Fig. 5E). This analysis revealed that cluster analysis of the DEGs together showed that the acid secretion- individual gene expression levels are different for each enzyme. related genes and acid-activated protease (pepsinogen) genes were specif- Of the five pepsinogen genes in the stomach, g23012 occupied ically expressed in the stomach, while the proteolytic digestive enzymes, 72.1% of the total expression value, followed by g6958 which trypsin, chymotrypsin and carboxypeptidase B, were highly and accounted for 16.0% (Fig. 5C). In teleosts, two types of pepsinogen commonly expressed in the intestine and the rectum. In addition, the have been identified, pepsinogen A and pepsinogen C.99 Both RNA-Seq also revealed differences in gene expression patterns between g23012 and g6958 genes encode pepsinogen A, and the amino acid the intestine and the rectum. The possible lipid metabolism-related genes identity and similarity between these two sequences are 64.6 and (apolipoproteins and AFP4) were specifically expressed in the intestine, 78.1%, respectively (Supplementary Fig. S4A). Of 19 trypsinogen while the genes involved in lysosomal digestive and reabsorption pro- genes, only three, g15220, g121 and g15219, accounted for >70% cesses (cathepsins, cubilin and megalin), were specifically expressed in of the total in the intestine and rectum (Fig. 5D). Among them, the rectum. These transcriptional signatures exhibit the main features of g15220 showed about half of the total expression value in the intes- the digestive tract of carnivorous fishes, which possess higher proteolytic tine (46.9%) and rectum (51.7%). Based on amino acid sequence enzyme activities than herbivorous or omnivorous species,93,94 which in identities, most trypsins can be classified into three basic Groups I– general exhibit higher a-amylase (carbohydrate enzyme) activities.95 It III.100 g15220 and g15219 encode Trypsin I, and the amino acid should be noted that two a-amylase genes (g13238 and g13239) were identity and similarity between them was 84.3 and 92.6%, respec- found in the yellowtail, but their expression was low or undetected in tively. On the other hand, g121 encodes Trypsin III, and it has 69.0 the three GI tissues. Efficiency of food absorption and conversion can and 82.7% amino acid identity and similarity with g15220, and 73.9 depend on the availability of digestive enzymes.96 Thus, these findings and 86.7% amino acid identity and similarity with g15219 suggest that proteolytic digestive enzymes play a key role in digestion (Supplementary Fig. S4B). Of 19 cathepsin genes in the rectum, the and nutrient absorption in the yellowtail GI tract. transcript levels of g24904 (cathepsin L1) were highest (40.5%), followed by g6016 (cathepsin Z, 29.3%) and g6015 (cathepsin Z, 18.2%). The protein homology values among them were low or 3.4. Gene expression patterns of proteolytic digestive showed no significant similarity (Supplementary Fig. S4C). enzymes Overall, these results suggest that most proteolytic digestion in the Because the RNA-Seq results suggested the importance of proteolytic yellowtail is related to the expression of specific genes, despite the processes in the yellowtail GI tract, we further analysed the large number of proteolytic digestive enzyme genes in the genome M. Yasuike et al. 557

Figure 5. Transcript levels of proteolytic digestive enzyme genes in the yellowtail stomach, intestine and rectum. (A) The number of proteolytic digestive en- zyme genes identified in the yellowtail genome. (B) Expressional percentages based on TPM values for different categories of proteolytic enzyme genes. The expression levels of individual genes in the highly expressed categories in each tissue are also shown in (C) pepsinogen genes in stomach, (D) trypsinogen genes in intestine and rectum and (E) cathepsin genes in the rectum. The amino acid sequence identities and similarities between these highly expressed genes can be found in Supplementary Fig. S4. 558 The haploid-based draft genome of the yellowtail

(Fig. 5A). The highly expressed proteolytic digestive enzyme genes References identified in this study may play a key role in digestion in adult yel- 1. Swart, B.L., von der Heyden, S., Bester-van der Merwe, A. and Roodt- lowtail. This finding suggests that the identification of proteins di- Wilding, R. 2015, Molecular systematics and biogeography of the cir- gestible by the products of these genes may lead to a yellowtail feed cumglobally distributed genus Seriola (Pisces: carangidae), Mol. 96 formulation with enhanced digestibility. Lemeiux et al. reported Phylogenet. Evol., 93, 274–80. that among the activities of a number of enzymes, trypsin alone 2. Sicuro, B. and Luzzana, U. 2016, The State of Seriola spp. Other Than showed a significant correlation with food conversion efficiency and Yellowtail (S. quinqueradiata) Farming in the World, Rev. Fish Sci. growth rate in the carnivorous Atlantic cod. Furthermore, it has Aquac., 24, 314–25. been demonstrated that trypsin isoforms showed differing substrate 3. Ministry of Agriculture, Forestry and Fisheries of Japan. 2017, Annual preferences and distinct catalytic properties.100,101 Therefore, further Statistics of Fishery and Aquaculture Production. Statistics Department. study of the substrate preferences of the products of the three highly http://www.maff.go.jp/j/tokei/kouhyou/kaimen_gyosei/attach/pdf/index- expressed trypsinogen genes (g15220, g121 and g15219) may inform 7.pdf (in Japanese). 4. Dhirendra, P.T. 2005, Cultured Aquatic Species Information the makeup of yellowtail feed ingredients. Programme. Seriola quinqueradiata. In: Cultured Aquatic Species This study has focussed on adult yellowtail, and the gene expres- Information Programme. Rome: FAO Fisheries and Aquaculture sion patterns for digestive enzymes may be different for larval Department, pp. 1–11. 71 stages. Studies of gene expression patterns for digestive enzymes in 5. World Bank. 2013, Fish to 2030: Prospects for Fisheries and larval stages of yellowtail will inform the development of formulated Aquaculture. Agriculture and Environmental Services Discussion Paper, diets for larvae of this species. In addition, the digestive systems of No. 3. Washington DC: World Bank Group, pp. 1–102. fish show numerous structural and functional adaptations to their 6. Abdelrahman, H., ElHady, M., Alcivar-Warren, A., et al. 2017, feeding habits. Thus, the transcriptomes of digestive tissues in fish Aquaculture genomics, genetics and breeding in the United States: cur- are considered highly species-specific,58 suggesting that transcrip- rent status, challenges, and priorities for future research, BMC tome profiling of these tissues may be instructive for the development Genomics, 18, 191. 7. Macqueen, D.J., Primmer, C.R., Houston, R.D., et al. 2017, Functional of fish feed for species of interest in the aquaculture industry. Annotation of All Salmonid Genomes (FAASG): an international initia- tive supporting future salmonid research, conservation and aquaculture, BMC Genomics, 18, 484. Availability 8. Yue, G.H. and Wang, L. 2017, Current status of genome sequencing and The draft genome sequence of S. quinqueradiata was submitted to its applications in aquaculture, Aquaculture, 468, 337–47. DDBJ under accession no. BEWX01000001–BEWX01001394 9. Aoki, J.Y., Kai, W., Kawabata, Y., et al. 2015, Second generation physi- (1394 entries). The complete set of predicted protein-coding genes cal and linkage maps of yellowtail (Seriola quinqueradiata) and compari- son of synteny with four model fish, BMC Genomics.,16, 406. are accessible at http://nrifs.fra.affrc.go.jp/ResearchCenter/5_BB/ 10. Fuji, K., Koyama, T., Kai, W., et al. 2014, Construction of a genomes/Yellowtail_genes/index.html. The results of the functional high-coverage bacterial artificial chromosome library and comprehensive annotation of the predicted protein-coding genes are available in genetic linkage map of yellowtail Seriola quinqueradiata, BMC Res. Supplementary Table S7. Raw RNA-Seq datasets for the three diges- Notes, 7, 200. tive organs of yellowtail have been deposited at DDBJ Sequence 11. Ohara, E., Nishimura, T., Nagakura, Y., Sakamoto, T., Mushiake, K. Read Archive (DRA) under accession no. DRR107933–DRR107935 and Okamoto, N. 2005, Genetic linkage maps of two yellowtails (stomach), DRR107936–DRR107938 (intestine) and DRR107939– (Seriola quinqueradiata and Seriola lalandi), Aquaculture, 244, 41–8. DRR107941 (rectum). 12. Aoki, J.Y., Kai, W., Kawabata, Y., et al. 2014, Construction of a radiation hybrid panel and the first yellowtail (Seriola quinqueradiata) radiation hy- brid map using a nanofluidic dynamic array, BMC Genomics, 15,165. Acknowledgements 13. Ozaki, A., Yoshida, K., Fuji, K., et al. 2013, Quantitative trait loci (QTL) associated with resistance to a monogenean parasite (Benedenia This work was partly supported by an Adaptable and Seamless Technology seriolae) in yellowtail (Seriola quinqueradiata) through genome wide transfer Programme through Target-driven R&D (A-STEP: AS2815106U) analysis, PloS One, 8, e64987. from the Japan Science and Technology Agency. We thank Dr Shintaro 14. Fuji, K., Yoshida, K., Hattori, K., et al. 2010, Identification of the sex-linked Imamura (National Research Institute of Fisheries Science, Japan Fisheries locus in yellowtail, Seriola quinqueradiata, Aquaculture, 308, S51–5. Research and Education Agency) and Dr Susumu Uji (National Research 15. Koyama, T., Ozaki, A., Yoshida, K., et al. 2015, Identification of Institute of Aquaculture, Japan Fisheries Research and Education Agency) for sex-linked snps and sex-determining regions in the yellowtail genome, providing samples of larval yellowtail digestive organs and Dr Tsubasa Mar. Biotechnol. , 17, 502–10. Uchino (National Research Institute of Fisheries Science, Japan Fisheries 16. Iwasaki, Y., Nishiki, I., Nakamura, Y., et al. 2016, Effective de novo as- Research and Education Agency) for his help in writing Section 2.2. Linkage sembly of fish genome using haploid larvae, Gene, 576, 644–9. map construction and scaffold correction. We also would like to thank Dr 17. Li, H. 2013, Aligning sequence reads, clone sequences and assembly con- Kristian von Schalburg (Department of Molecular Biology and Biochemistry, tigs with BWA-MEM, arXiv Preprint, arXiv, 1303, 3997. S.F.U., Vancouver, BC, Canada) and Editage (www.editage.jp) for English lan- 18. Kajitani, R., Toshimoto, K., Noguchi, H., et al. 2014, Efficient de novo guage editing. assembly of highly heterozygous genomes from whole-genome shotgun short reads, Genome Res., 24, 1384–95. 19. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. Conflict of interest 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403–10. 20. Kai, W., Nomura, K., Fujiwara, A., et al. 2014, A ddRAD-based genetic None declared. map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the Supplementary data teleost-specific genome duplication, BMC Genomics, 15, 233. 21. Xue, W., Li, J.T., Zhu, Y.P., et al. 2013, L_RNA_scaffolder: scaffolding Supplementary data are available at DNARES online. genomes with transcripts, BMC Genomics., 14, 604. M. Yasuike et al. 559

22. Marcais, G. and Kingsford, C. 2011, A fast, lock-free approach for effi- 45. Mi, H., Muruganujan, A., Casagrande, J.T. and Thomas, P.D. 2013, cient parallel counting of occurrences of k-mers, Bioinformatics, 27, Large-scale gene function analysis with the PANTHER classification sys- 764–70. tem, Nat. Protoc., 8, 1551–66. 23. Catchen, J.M., Amores, A., Hohenlohe, P., Cresko, W. and Postlethwait, 46. Zhang, H., Tan, E., Suzuki, Y., et al. 2015, Dramatic improvement in ge- J.H. 2011, Stacks: building and genotyping Loci de novo from nome assembly achieved using doubled-haploid genomes, Sci. Rep., 4, short-read sequences, G3 (Bethesda), 1, 171–82. 6780. 24. DePristo, M.A., Banks, E., Poplin, R., et al. 2011, A framework for vari- 47. Lu, J., Peatman, E., Tang, H., Lewis, J. and Liu, Z. 2012, Profiling of ation discovery and genotyping using next-generation DNA sequencing gene duplication patterns of sequenced teleost genomes: evidence for data, Nat. Genet., 43, 491–8. rapid lineage-specific genome expansion mediated by recent tandem 25. Broman, K.W., Wu, H., Sen, S. and Churchill, G.A. 2003, R/qtl: qTL duplications, BMC Genomics, 13, 246. mapping in experimental crosses, Bioinformatics, 19, 889–90. 48. Pinto, W., Ronnestad, I., Jordal, A.E., Gomes, A.S., Dinis, M.T. and 26. Nishimura, O., Hara, Y. and Kuraku, S. 2017, gVolante for standardiz- Aragao, C. 2012, Cloning, tissue and ontogenetic expression of the tau- ing completeness assessment of genome and transcriptome assemblies, rine transporter in the flatfish Senegalese sole (Solea senegalensis), Bioinformatics, 33, 3635–37. Amino Acids, 42, 1317–27. 27. Parra, G., Bradnam, K., Ning, Z., Keane, T. and Korf, I. 2009, Assessing 49. Takeuchi, T. 2014, Progress on larval and juvenile nutrition to improve the gene space in draft genomes, Nucleic Acids Res., 37, 289–97. the quality and health of seawater fish: a review, Fish. Sci., 80, 389–403. 28. Hara, Y., Tatsumi, K., Yoshida, M., Kajikawa, E., Kiyonari, H. and 50. Takagi, S., Murata, H., Goto, T., et al. 2005, The green liver syndrome Kuraku, S. 2015, Optimizing and benchmarking de novo transcriptome is caused by taurine deficiency in yellowtail, Seriola quinqueradiata Fed sequencing: from library preparation to assembly evaluation, BMC diets without fishmeal, Aquacult. Sci., 53, 279–90. Genomics, 16, 977. 51. Sarropoulou, E., Sundaram, A.Y.M., Kaitetzidou, E., et al. 2017, Full ge- 29. Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. and nome survey and dynamics of gene expression in the greater amberjack Zdobnov, E.M. 2015, BUSCO: assessing genome assembly and annota- Seriola dumerili, GigaScience, 6, 1–13. tion completeness with single-copy orthologs, Bioinformatics, 31, 52. Purcell, C.M., Seetharam, A.S., Snodgrass, O., Ortega-Garcia, S., Hyde, 3210–2. J.R. and Severin, A.J. 2018, Insights into teleost sex determination from 30. Meyer, E., Aglyamova, G.V., Wang, S., et al. 2009, Sequencing and de the Seriola dorsalis genome assembly, BMC Genomics, 19, 31. novo analysis of a coral larval transcriptome using 454 GSFlx, BMC 53. Malmstrom, M., Matschiner, M., Torresen, O.K., et al. 2016, Evolution Genomics., 10, 219. of the immune system influences speciation rates in teleost fishes, Nat. 31. Stanke, M., Diekhans, M., Baertsch, R. and Haussler, D. 2008, Using Genet., 48, 1204–10. native and syntenically mapped cDNA alignments to improve de novo 54. Solbakken, M.H., Voje, K.L., Jakobsen, K.S. and Jentoft, S. 2017, gene finding, Bioinformatics , 24, 637–44. Linking species habitat and past palaeoclimatic events to evolution of 32. Smit, A., Hubley, R. and Green, P. 1996–2010, RepeatMasker the teleost innate immune system, Proc. R Soc. B, 284, 20162810. Open-3.0. http://www.repeatmasker.org. 55. Maeda, M., Oshiman, K., Tamura, S. and Futai, M. 1990, Human gas- 33. Smit, A. and Hubley, R. 2008–2010, RepeatModeler Open-1.0. http:// tric (HþþKþ)-ATPase gene. Similarity to (NaþþKþ)-ATPase genes www.repeatmasker.org. in exon/intron organization but difference in control region, J. Biol. 34. Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R. and Salzberg, Chem., 265, 9027–32. S.L. 2013, TopHat2: accurate alignment of transcriptomes in the pres- 56. Ogino, C., Takeuchi, L., Takeda, H. and Watanabe, T. 1979, ence of insertions, deletions and gene fusions, Genome Biol., 14, R36. Availability of dietary phosphorus in carp and rainbow trout, Nippon 35. Trapnell, C., Williams, B.A., Pertea, G., et al. 2010, Transcript assembly Suisan Gakkaishi, 45, 1527–32. and quantification by RNA-Seq reveals unannotated transcripts and iso- 57. Guffey, S., Esbaugh, A. and Grosell, M. 2011, Regulation of apical form switching during cell differentiation, Nat. Biotechnol., 28, 511–5. H(þ)-ATPase activity and intestinal HCO(3)(-) secretion in marine fish 36. Yates, A., Akanni, W., Amode, M.R., et al. 2016, Ensembl 2016, osmoregulation, Am. J. Physiol. Regul. Integr. Comp. Physiol., 301, Nucleic Acids Res., 44, D710–6. R1682–91. 37. Altschul, S.F., Madden, T.L., Schaffer, A.A., et al. 1997, Gapped BLAST 58. Calduch-Giner, J.A., Sitja-Bobadilla, A. and Perez-Sanchez, J. 2016, and PSI-BLAST: a new generation of protein database search programs, Gene expression profiling reveals functional specialization along the in- Nucleic Acids Res., 25, 3389–402. testinal tract of a carnivorous teleostean fish (Dicentrarchus labrax), 38. Jones, P., Binns, D., Chang, H.Y., et al. 2014, InterProScan 5: Front. Physiol., 7, 359. genome-scale protein function classification, Bioinformatics, 30, 59. Sachs, G., Chang, H.H., Rabon, E., Schackman, R., Lewin, M. and 1236–40. Saccomani, G. 1976, A nonelectrogenic Hþ pump in plasma membranes 39. Gotz, S., Garcia-Gomez, J.M., Terol, J., et al. 2008, High-throughput of hog stomach, J. Biol. Chem., 251, 7690–8. functional annotation and data mining with the Blast2GO suite, Nucleic 60. Foltmann, B. 1981, Gastric proteinases–structure, function, evolution Acids Res., 36, 3420–35. and mechanism of action, Essays Biochem., 17, 52–84. 40. Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C. and Kanehisa, M. 61. Chong, A.S.C., Hashim, R., Chow-Yang, L. and Ali, A.B. 2002, Partial 2007, KAAS: an automatic genome annotation and pathway reconstruc- characterization and activities of proteases from the digestive tract of dis- tion server, Nucleic Acids Res., 35, W182–5. cus fish (Symphysodon aequifasciata), Aquaculture, 203, 321–33. 41. Nakamura, Y., Mori, K., Saitoh, K., et al. 2013, Evolutionary changes 62. Deguara, S., Jauncey, K. and Agius, C. 2003, Enzyme activities and pH varia- of multiple visual pigment genes in the complete genome of Pacific blue- tions in the digestive tract of gilthead sea bream, J. Fish Biol., 62, 1033–43. fin tuna, Proc. Natl. Acad. Sci. U. S. A., 110, 11061–6. 63. Haard, N.F. 1995, Digestibility and in vitro evaluation of vegetable pro- 42. Xu, T., Xu, G., Che, R., et al. 2016, The genome of the miiuy croaker teins for salmonid feed. In: Lim, C. and Sessa, D.J. (eds), Nutrition and reveals well-developed innate immune and sensory systems, Sci. Rep., 6, Utilization Technology in Aqualture, Champaign, IL: AOCS Press, pp. 21902. 199–219. 43. Wang, Y., Coleman-Derr, D., Chen, G. and Gu, Y.Q. 2015, OrthoVenn: 64. Wang, Z., Du, J., Lam, S.H., Mathavan, S., Matsudaira, P. and Gong, a web server for genome wide comparison and annotation of ortholo- Z. 2010, Morphological and molecular evidence for functional organiza- gous clusters across multiple species, Nucleic Acids Res., 43, W78–84. tion along the rostrocaudal axis of the adult zebrafish intestine, BMC 44. Wagner, G.P., Kin, K. and Lynch, V.J. 2012, Measurement of mRNA Genomics, 11, 392. abundance using RNA-seq data: rPKM measure is inconsistent among 65. Ikeda, M., Kakizaki, H. and Matsumiya, M. 2017, Biochemistry of fish samples, Theory Biosci., 131, 281–5. stomach chitinase, Int. J. Biol. Macromol., 104, 1672–81. 560 The haploid-based draft genome of the yellowtail

66. Matsumiya, M. and Mochizuki, A. 1996, Distribution of chitinase and mykiss: evaluation of its expression in primary defence barriers and b-N-acetylhexosaminidase in the organs of several fishes, Fish. Sci., 62, plasma levels in sick and healthy fish, Fish Shellfish Immunol., 23, 150–1. 197–209. 67. Lang, T., Klasson, S., Larsson, E., Johansson, M.E., Hansson, G.C. and 85. Abrams, J., Einhorn, Z., Seiler, C., Zong, A.B., Sweeney, H.L. and Pack, Samuelsson, T. 2016, Searching the evolutionary origin of epithelial mucus M. 2016, Graded effects of unregulated smooth muscle myosin on intes- protein components-mucins and FCGBP, Mol. Biol. Evol., 33, 1921–36. tinal architecture, intestinal motility and vascular function in zebrafish, 68. Koppers, A.J., Reddy, T. and O’Bryan, M.K. 2011, The role of Dis. Model. Mech., 9, 529–40. cysteine-rich secretory proteins in male fertility, Asian J. Androl., 13, 86. Gutierrez, J.A. and Perr, H.A. 1999, Mechanical stretch modulates 111–7. TGF-b1 and a1(I) collagen expression in fetal human intestinal smooth 69. Sire, M.F. and Vernier, J.-M. 1992, Intestinal absorption of protein in muscle cells, Am. J. Physiol. Gastrointest. Liver Physiol., 277, teleost fish, Comp. Biochem. Physiol. A Physiol., 103, 771–81. G1074–80. 70. Lilleeng, E., Froystad, M.K., Ostby, G.C., Valen, E.C. and Krogdahl, A. 87. Grau, A., Crespo, S., Sarasquete, M.C. and de Canales, M.L.G. 1992, 2007, Effects of diets containing soybean meal on trypsin mRNA expres- The digestive tract of the amberjack Seriola dumerili, Risso: a light and sion and activity in Atlantic salmon (Salmo salar L), Comp. Biochem. scanning electron microscope study, J. Fish Biol., 41, 287–303. Physiol. A Mol. Integr. Physiol., 147, 25–36. 88. Christensen, E.I., Birn, H., Storm, T., Weyer, K. and Nielsen, R. 2012, 71. Manchado, M., Infante, C., Asensio, E., Crespo, A., Zuasti, E. and Endocytic receptors in the renal proximal tubule, Physiology (Bethesda), Canavate,~ J.P. 2008, Molecular characterization and gene expression of 27, 223–36. six trypsinogens in the flatfish Senegalese sole (Solea senegalensis Kaup) 89. Haraldsson, B. 2010, Tubular reabsorption of albumin: it’s all about during larval development and in tissues, Comp. Biochem. Physiol. B, cubilin, J. Am. Soc. Nephrol., 21, 1810–2. Biochem. Mol. Biol., 149, 334–44. 90. Rodrıguez, A., Esteban, M.A. and Meseguer, J. 2003, A mannose-receptor 72. Zhong, G., Qian, X., Hua, X. and Zhou, H. 2011, Effects of feeding is possibly involved in the phagocytosis of Saccharomyces cerevisiae by with corn gluten meal on trypsin activity and mRNA expression in Fugu seabream (Sparus aurata L.) leucocytes, Fish Shellfish Immunol., 14, obscurus, Fish Physiol. Biochem., 37, 453–60. 375–88. 73. Choudhury, M., Oku, T., Yamada, S., et al. 2011, Isolation and charac- 91. Rosenstiel, P., Sina, C., End, C., et al. 2007, Regulation of DMBT1 via terization of some novel genes of the apolipoprotein A-I family in NOD2 and TLR4 in intestinal epithelial cells modulates bacterial recog- Japanese eel, Anguilla Japonica, Cent. Eur. J. Biol., 6, 545. nition and invasion, J. Immunol., 178, 8203–11. 74. Kim, K.Y., Cho, Y.S., Bang, I.C. and Nam, Y.K. 2009, Isolation and 92. Wang, A.M. and Desnick, R.J. 1991, Structural organization and com- characterization of the apolipoprotein multigene family in Hemibarbus plete sequence of the human a-N-acetylgalactosaminidase gene: homol- mylodon (Teleostei: cypriniformes), Comp. Biochem. Physiol. B ogy with the a-galactosidase A gene provides evidence for evolution Biochem. Mol. Biol., 152, 38–46. from a common ancestral gene, Genomics, 10, 133–42. 75. Qu, M., Huang, X., Zhang, X., Liu, Q. and Ding, S. 2014, Cloning and 93. Sabapathy, U. and Teo, L.H. 1993, A quantitative study of some diges- expression analysis of apolipoprotein A-I (ApoA-I) in the Hong Kong tive enzymes in the rabbitfish, Siganus canaliculatus and the sea bass, grouper (Epinephelus akaara), Aquaculture, 432, 85–96. Lates calcarifer, J. Fish Biol., 42, 595–602. 76. Deng, G. and Laursen, R.A. 1998, Isolation and characterization of an 94. Kapoor, B.G., Smit, H. and Verighina, I.A. 1976, The alimentary canal antifreeze protein from the longhorn sculpin, Myoxocephalus octode- and digestion in teleosts. In: Russell, F.S. and Yonge, M. (eds), Advances cimspinosis, Biochim. Biophys. Acta, 1388, 305–14. in Marine Biology, London: Academic Press, pp. 109–239. 77. Luquet, P. and Watanabe, T. 1986, Lipid nutrition in fish, Fish Physiol. 95. Hidalgo, M.C., Urea, E. and Sanz, A. 1999, Comparative study of diges- Biochem., 2, 121., tive enzymes in fish with different nutritional habits. Proteolytic and am- 78. Ohno, M., Kimura, M., Miyazaki, H., et al. 2016, Acidic mammalian ylase activities, Aquaculture, 170, 267–83. chitinase is a proteases-resistant glycosidase in mouse digestive system, 96. Lemieux, H., Blier, P. and Dutil, J.D. 1999, Do digestive enzymes set a Sci. Rep., 6, 37756. physiological limit on growth rate and food conversion efficiency in the 79. Vannella, K.M., Ramalingam, T.R., Hart, K.M., et al. 2016, Acidic chiti- Atlantic cod (Gadus morhua)?, Fish Physiol. Biochem., 20, 293–303. nase primes the protective immune response to gastrointestinal nemato- 97. Eshel, A., Lindner, P., Smirnoff, P., Newton, S. and Harpaz, S. 1993, des, Nat. Immunol., 17, 538–44. Comparative study of proteolytic enzymes in the digestive tracts of the 80. Martin, S.A.M., Dehler, C.E. and Kro´ l, E. 2016, Transcriptomic european sea bass and hybrid striped bass reared in freshwater, Comp. responses in the fish intestine, Dev. Comp. Immunol., 64, 103–17. Biochem. Physiol. A Physiol., 106, 627–34. 81. Rombout, J.H.W.M., Yang, G. and Kiron, V. 2014, Adaptive immune 98. Jo´na´s, E., Ra´gyanszki, M., Ola´h, J. and Boross, L. 1983, Proteolytic di- responses at mucosal surfaces of teleost fish, Fish Shellfish Immunol., 40, gestive enzymes of carnivorous (Silurus glanis L.), herbivorous 634–43. (Hypophthalmichthys molitrix Val.) and omnivorous (Cyprinus carpio 82. Concha, M.I., Molina, S., Oyarzun, C., Villanueva, J. and Amthauer, R. L.) fishes, Aquaculture, 30, 145–54. 2003, Local expression of apolipoprotein A-I gene and a possible role 99. Chi, L., Xu, S., Xiao, Z., et al. 2013, Pepsinogen A and C genes in turbot for HDL in primary defence in the carp skin, Fish Shellfish Immunol., (Scophthalmus maximus): characterization and expression in early devel- 14, 259–73. opment, Comp. Biochem. Physiol. B, Biochem. Mol. Biol., 165, 58–65. 83. Concha, M.I., Smith, V.J., Castro, K., Bastias, A., Romero, A. and 100. Gudmundsdottir, A. and Palsdottir, H.M. 2005, Atlantic cod trypsins: Amthauer, R.J. 2004, Apolipoproteins A-I and A-II are potentially im- from basic research to practical applications, Mar. Biotechnol., 7, portant effectors of innate immunity in the teleost fish, Cyprinus Carpio. 77–88. Eur. J. Biochem., 271, 2984–90. 101. Fletcher, T.S., Alhadeff, M., Craik, C.S. and Largman, C. 1987, 84. Villarroel, F., Bastias, A., Casado, A., Amthauer, R. and Concha, M.I. Isolation and characterization of a cDNA encoding rat cationic trypsino- 2007, Apolipoprotein A-I, an antimicrobial protein in Oncorhynchus gen, Biochemistry, 26, 3081–6.