A Chromosomal-Level Genome Assembly and the Diet Habit-Speci C
Total Page:16
File Type:pdf, Size:1020Kb
A chromosomal-level genome assembly and the diet habit-specic amino acid mutation identication of the Cyprinidae sh Ancherythroculter nigrocauda Yanhong Sun Fisheries Research Institute, Wuhan Academy of Agricultural Sciences Guiying Wang Fisheries Research Institute, Wuhan Academy of Agricultural Sciences Jianfang Gui State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences Jian Chen Fisheries Research Institute, Wuhan Academy of Agricultural Sciences Pei Li Fisheries Research Institute, Wuhan Academy of Agricultural Sciences Dongmei Zhu College of Fishery, Huazhong Agricultural University Yingwu Liu Fisheries Research Institute, Wuhan Academy of Agricultural Sciences Zongqun Zhang Wuhan Xianfeng Aquaculture Technology Co. Ltd Wei Li Wuhan Xianfeng Aquaculture Technology Co. Ltd Qing Li ( [email protected] ) Fisheries Research Institute, Wuhan Academy of Agricultural Sciences Research Article Keywords: Ancherythroculter nigrocauda, Cyprinidae, genome assembly, genetic breeding, amino acid mutation Posted Date: December 9th, 2020 DOI: https://doi.org/10.21203/rs.3.rs-116303/v1 Page 1/22 License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License Page 2/22 Abstract Background Ancherythroculter nigrocauda is an endemic Cyprinidae sh in China, it has many desirable traits for genetic breeding, including strong disease resistance, unusual stress tolerance and high eciency in nutrition update, which have made it an emerging commercial aquaculture sh. With the publication of its close-related species’ genome sequence, we can study the diet-specic genomic mutations within Cyprinidae. Results Here we report whole genome assembly of a female A. nigrocauda individual constructed using the single molecule DNA sequencing platform PacBio Sequel. With the help of Hi-C anchoring, we successfully placed contigs to chromosome level (2n = 48), yielding a genome size of 1054.05 Mb with contig N50 of 3.40 Mb and scaffold N50 of 42.68 Mb. This genome assembly, which has reached a high base-level accuracy of 99.999%, harboring 33,606 annotated protein-coding genes. We also found 582 genes hold diet-specic amino acid mutation between herbivorous and carnivorous shes and 26 of them showed signicant different expression patterns in liver tissue of these two types of shes. Conclusions The availability of the chromosome-level genome assembly of A. nigrocauda provides valuable resources for future in-depth comparative genomics studies and applications including genetic breeding. The diet- specic amino acid mutation can be used in breeding of new strains of carnivorous shes which feed on herbivorous fodder. 1. Introduction Ancherythroculter nigrocauda (NCBI Taxonomy ID: 263419) is a Cyprinidae sh endemic to the upper Yangtze River in China [1]. This species frequently inhabits in main channels and tributaries of the upper Yangtze River in Sichuan province and Chongqing city in southwest China. In recent decades, the natural resource of this species has suffered severely reduction both in population size and in genetic diversity due to habitat disruption caused by pollution from agriculture, chemical industry, overshing, and damming [2]. As a result, there is a loss of genetic diversity of it [3]. Nevertheless, A. nigrocauda is a popular delicacy, and it can be easily kept alive and fresh because of its strong tolerance to hypoxia and stressful conditions, which make it a promising aquaculture sh in China. It has also been used as new stock in genetic breeding that has bred two new aquaculture varieties, such as the hybrid “Culter Xianfeng NO. 1” and “Bofang Xianfeng NO. 2”. The hybrid “Culter Xianfeng NO. 1” is the F1 offspring of Culter alburnus x A. nigrocauda and “Bofang Xianfeng NO. 2”is the selected variety from Megalobrama Page 3/22 amblycephala x A. nigrocauda through fourth-generation hybrid and selection. The excellent traits of A. nigrocauda are presented in both two new varieties. Despite its high economic value and popularity as a resource in aquaculture breeding, genetic and genomic resources for this sh are still limited. Only a few of genomic nucleotide sequences [4, 5] and single nucleotide polymorphism (SNP) markers are publicly available [6]. Such a lack of resources seriously limited the progress of its genetic breeding and application. And the lack of a reference genome also caused some problems in studying gene expression using RNA-Seq approaches [7]. With the development of sequencing technology, the genome of more than 17 shes in Cyprinidae have been deposited in NCBI database. However, this number is still faraway from enough compared with the 3,000 species in Cyprinidae. Nowadays, single-molecule real-time sequencing has been used more and more frequently in sh whole genome sequencing [8, 9]. Super long reads from it provided a much better assembly when compared with short reads from Illumina sequencing [10]. Besides, high-throughput chromatin conformation capture (Hi-C) also greatly foster the assembly quality of whole genome by translating genome-wide 3D proximity maps to contigs’ clustering, ordering and pseudomolecule construction [8, 11]. Therefore, combination of long-reads sequencing and Hi-C has been used in many genome sequencing projects and proved its power in high-quality genome assembly [8, 12]. Here, we presented a highly contiguous reference genome for A. nigrocauda that used a series of technologies including the single-molecule sequencing platform PacBio Sequel and Hi-C analysis-based chromosome construction. The reference genome assembly of A. nigrocauda will be valuable for both comparative genomics research and genetic breeding research. With the help of this high-quality genome, we also inspected the amino acid mutation associated with diet for A. nigrocauda and its closely related species. 2. Method And Materials 2.1. Karyotype analyses The karyotype analysis was conducted using the method as described in a previous study [13]. Phytohemagglutinin (PHA) was rst injected near the base of the pectoral n, followed by injecting colchicine in the kidney. Kidney tissues were taken and placed in a beaker, cut into small pieces, followed by mixing with saline. After settling for 15 min, cells in the supernatant were collected via centrifuge. These cells were resuspended using 0.075 M KCl solution for 35 min and then collected via centrifuge. The cells were xed using Carnoy's Fluid and treated with Giemsa stain before observing using a microscope. 2.2. Samples and Illumina sequencing As higher heterozygosity always causes diculties in genome assembly, we want to choose one sample with the lowest heterozygosity in genome sequencing. Therefore, three female A. nigrocauda individuals Page 4/22 were chosen as candidates for genome sequencing because female samples always have lower heterozygosity than male ones for XY/XX sex determination system in many Cyprinidae shes. To obtain enough high-quality genomic DNA molecules, the adult female sh individual was dissected and fresh muscle tissues were used for DNA extraction using the traditional phenol/chloroform extraction method [14] and the extracted DNA was quality checked using agarose gel electrophoresis. A single band corresponding to high molecular weight was observed, indicating high integrity of DNA molecules for library construction for the Illumina HiSeq X Ten (Illumina Inc., San Diego, CA, USA) and the PacBio Sequel (Pacic Biosciences of California, Menlo Park, CA, USA) sequencing platforms. A library with the insert size of 350 bp was constructed for Illumina HiSeq X Ten sequencing platform according to the manufacturer’s protocol. To get high-quality Illumina reads for following analysis, ambiguous bases and low-quality reads were rst trimmed and ltered using the HTQC package [15]. First, reads with adaptors or generated via duplication were discarded. Reads whose size of Ns or low- quality ( < = 5) bases reached 10% of the total read length were discarded. Second, reads were removed from further analysis if the average quality were smaller than 20 or the read length was shorter than 75 bp. Third, the mate reads were also removed if the corresponding reads had been removed. The processed reads were used for genome assessment analysis. With the K-mer analysis approach GCE software [16], we calculated the number of each 17-mer from the sequencing data to estimate the genome size and heterozygosity. The sample with the lowest heterozygosity was chosen to further PACBIO sequencing. 2.3. PacBio long reads sequencing and genome assembly Eight libraries with 20Kb insert size were constructed for PacBio Sequel sequencing. PacBio data was assembled using the FALCON v2.0 assembler, which implements a hierarchical assembly approach [17]. The initial step is to error-correct long reads by aligning all reads to a subset of the longest reads. For the full raw data set, only reads longer than 5 kb (length_cut_off = 5 kb) were corrected for generating error- corrected reads. To identify overlaps between raw sequences, we used "p-read" (pre-assembled reads from error corrected reads) longer than 5 kb (length_cut_off_pr = 5 kb) for String Graph assembly. Then FALCON-unzip [17] operates from a completed FALCON job directory, it uses the phasing information of heterozygous positions within individual reads to group the reads into different phasing blocks and haplotypes within each block. In this study, we used primary contigs for further analysis. 2.4 In situ Hi-C library construction and chromosome assembly using Hi-C data Liver sample taken from the same female A. nigrocauda individual with lowest heterozygosity was used for library construction for Hi-C analysis as described previously [18, 19]. The library was sequenced with 150 bp paired-end mode on the Illumina HiSeq X Ten platform (San Diego, CA, USA). The reads were mapped to assembled genome with Bowtie [20] with two ends of paired reads being mapped to the genome separately. To increase the interactive Hi-C reads ratio, an iterative mapping strategy was Page 5/22 performed as done previously [18, 19]. Only read pairs with both ends uniquely mapped were used for further analysis.