Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat
Total Page:16
File Type:pdf, Size:1020Kb
INVESTIGATION Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat Huajing Teng,*,†,‡ Yaohua Zhang,* Chengmin Shi,*,§ Fengbiao Mao,‡ Lingling Hou,‡ Hongling Guo,*,† Zhongsheng Sun,‡ and Jianxu Zhang*,1 *The State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 100101 Beijing, China, †University of Chinese Academy of Sciences, 100049 Beijing, China, § ‡ Beijing Institutes of Life Science, Chinese Academy of Sciences, 100101 Beijing, China, and Beijing Institute of Genomics, Chinese Academy of Sciences, 100101 Beijing, China ABSTRACT Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, KEYWORDS which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human genetic pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, landscape little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single- next-generation nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinfor- sequencing matics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Rattus tanezumi Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity- single-nucleotide related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. polymorphisms A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole- structural genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species variations and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. Geneticvariationsofthelaboratoryrathaveservedasavaluable resource 2011) is widely distributed in Asia, and has recently invaded the South in biomedical and behavioral research for nearly 200 yr (Baker et al. Pacific, African countries, and the United States (Wilson and Reeder 1979; Jacob 1999). However, the extant strains of laboratory rat orig- 2005; Bastos et al. 2011; Lack et al. 2012; Conroy et al. 2013; Nakayama inate from limited Rattus norvegicus founder populations (Baker et al. et al. 2013). As an agricultural pest (Singleton et al. 2007; Huang et al. 1979; Krinke 2000; Smits et al. 2005; Saar et al. 2008). Extensive genetic 2013), the AHR demonstrates superior adaptive potential, compared to variation is required to facilitate correlation of genotypes with complex other rodents, with higher resistance to commonly used rodenticides behavioral or ecologically relevant traits. Wild-derived rat strains or (Zheng 1981; Tian et al. 2006; Huang et al. 2012). The AHR can also species can provide novel genome resources to decipher the genetic serve as a reservoir of a wide range of pathogens associated with human mechanisms underlying such complex phenotypes (Smits et al. 2005). disease (Plyusnina et al. 2009; Jonsson et al. 2010; Yin et al. 2011; As a diploid species of rodent in the genus Rattus, the Asian house rat Kocher et al. 2015). However, although the AHR represents a tremen- (AHR, R. tanezumi; 2n = 42) (Hamta et al. 2006; Badenhorst et al. dous threat to agriculture and human health, little is known about the genetic basis underlying its remarkable biological traits. A comprehen- sive genetic inventory of the AHR will not only be essential for the Copyright © 2016 Teng et al. doi: 10.1534/g3.116.029504 development of effective pest control programs, but will also advance Manuscript received March 22, 2016; accepted for publication April 27, 2016; our understanding of the diversity of rodent-borne zoonosis. published Early Online April 27, 2016. Rapid advances in next-generation sequencing (NGS) technology This is an open-access article distributed under the terms of the Creative and reverse ecology hold great promise for deciphering the genetic basis Commons Attribution 4.0 International License (http://creativecommons.org/ licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction underlying the functional variation of natural organisms (Ellison et al. in any medium, provided the original work is properly cited. 2011; Liu et al. 2014). In the present study, we used an NGS-based Supplemental material is available online at www.g3journal.org/lookup/suppl/ pooled sequencing strategy to analyze genome-wide variation, includ- doi:10.1534/g3.116.029504/-/DC1 1 ing single-nucleotide polymorphisms (SNPs) and structural variations Corresponding author: The State Key Laboratory of Integrated Management of (SVs), across the AHR genome. Our data provide a resource for future Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beichen West Road, Chao Yang District, Beijing 100101, China. E-mail: zhangjx@ population genetic research on the AHR, and advance our understanding ioz.ac.cn of the molecular adaptations of rats with respect to their ecological niches. Volume 6 | July 2016 | 1969 MATERIALS AND METHODS Phylogenetic analysis of the AHR based on mitochondrial genomes To estimate the phylogenetic position of the AHR, published Rattus genus mitochondrial genome sequences (accession numbers listed in Figure 1) were downloaded from GenBank. The sequences of 13 mito- chondrial protein-coding genes were joined and aligned using MUSCLE (Edgar 2004). The best mutational model (GTR+I+G) was then selected using jModelTest v2.1.7 (Darriba et al. 2012). Mitochondrial genome phylogenetic analyses were conducted using MrBayes v3.22 (Ronquist et al. 2012), with two independent analyses running in parallel, each with four Markov Chain Monte Carlo (MCMC) chains. The final tree topol- ogy was recovered after a total of 10,000,000 generations, sampling 1 in every 10,000 generations after discarding the first 1000 as burn-in. Samples, DNA extraction, and library preparation In 2014, 42 individual AHRs, including 24 from Dujiangyan (31°01’N/ 103°40’E), Sichuan Province, China and 18 from Taiyuan (37°43’N/ 112°29’E), Shanxi Province, China, were trapped and identified using mitochondrial cytochrome oxidase subunit I barcode sequences. Ge- nomic DNA samples were extracted from small pieces of tail using a TailGen DNA extraction Kit (CWBIO, Beijing, China). The quality and integrity of the extracted DNA was checked by measuring the A260/ A280 ratio using a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA) and by agarose gel electrophore- sis. Then, equivalent amounts of DNA extracted from all individuals were pooled for library construction. A library with an insert size of approximately 300 bp was prepared and sequenced on either an Illu- mina HiSequation 2000 or an X Ten instrument, with 100 or 150 bp paired-end reads, respectively. After filtering out of raw sequencing reads containing adapters and reads of low quality, the remaining clean reads were aligned to the reference genome of R. norvegicus (RGSC5.0, Ensembl release 70), a close relative of AHR, using the Burrows- Wheeler Alignment tool (BWA v0.7.5a-r405) with default parameters (Li and Durbin 2009). Genome size was estimated on the basis of k-mer frequency distribution using KmerGenie with default parameters (Chikhi and Medvedev 2014). Duplicate reads were removed using the rmdup function in SAMtools v 0.1.19-44428cd (Li et al. 2009). In addition, un- mapped reads were assembled de novo using SOAPdenovo v2.04 (Luo et al. 2012) and gene predictions were performed using AUGUSTUS (Hoff and Stanke 2013). Figure 1 Phylogenetic tree of Rattus spp., estimated using mitochon- drial genomes. Published mitochondrial genome sequences of the Small variant calling and genetic diversity estimation Rattus genus were downloaded from GenBank and Accession num- bers are indicated. After performing the alignment and removing duplicate reads, two different variant calling tools, SNVer v0.5.3 (Wei et al. 2011) and VarScan v2.3.7 (Koboldt et al. 2009; Koboldt et al. 2012), were used to detect SNPs with a window size of 100 kb and a step size of 20 kb, and the following and small insertions and deletions (InDels) in the pooled samples. For parameters: –min-coverage 20, –max-coverage 500, –min-qual 20, SNVer, we set the number of haploids to 84, and used the RGSC v5.0 and –pool-size 42. reference assembly with the Single-Nucleotide Polymorphism database (dbSNP) build 136. Prior to running VarScan, a pileup file was created Structural variation detection using the mpileup command in SAMtools. Then, variants were called Following the removal of duplicate reads, large InDels and inversions were using the VarScan subcommands mpileup2snp and mpileup2indel detected using MetaSV v0.5 (Mohiyuddin et al. 2015) on the alignment with the following parameters: –min-coverage 20, –min-avg-qual to the RGSC v5.0 reference genome. The integrative structural-variant 20, and –p-value 1e-02. In our further analyses, we considered only caller MetaSV merges results obtained by multiple detection methods, those SNPs and small InDels that were obtained using both pipelines. including BreakDancer (Chen et al. 2009), LUMPY (Layer et al. 2014), Two summary statistics, nucleotide diversity (p) (Nei 1987) and and Pindel (Ye et al. 2009). BreakDancer can detect a cluster of reads ’ Watterson stheta(uW) (Watterson 1975), are commonly used for with abnormal