Insights Into Platypus Population Structure and History from Whole-Genome Sequencing
bioRxiv preprint doi: https://doi.org/10.1101/221481; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Hilary C. Martin1,2,†, Elizabeth M. Batty1,†, Julie Hussin1,†, Portia Westall3, Tasman Daish4, Stephen Kolomyjec5, Paolo Piazza1,6, Rory Bowden1, Margaret Hawkins7, Tom Grant8, Craig Moritz9, Frank Grutzner4, Jaime Gongora3,*, Peter Donnelly1,10,*

1. Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX1 7BN, UK.
2. Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
3. The University of Sydney, Sydney School of Veterinary Science, New South Wales 2006, Australia.
4. University of Adelaide, Adelaide, Australia.
5. School of Biological Sciences, Lake Superior State University, Sault Sainte Marie, MI 49783.
6. Faculty of Medicine, Department of Medicine, Imperial College, London, UK.
7. Taronga Zoo, Mosman, NSW 2088, Australia.
8. University of New South Wales, Sydney, NSW 2052, Australia.
9. Research School of Biology and Centre for Biodiversity Analysis, The Australian National University, Acton, Australian Capital Territory 2601, Australia.
10. Department of Statistics, University of Oxford, Oxford, OX1 3TG, UK.

† Contributed equally.
* To whom correspondence should be addressed: [email protected], [email protected]

Abstract

The platypus is an egg-laying mammal which, alongside the echidna, occupies a unique place in the mammalian phylogenetic tree. Despite widespread interest in its unusual biology, little is known about its population structure or recent evolutionary history.
To provide new insights into the dispersal and demographic history of this iconic species, we sequenced the genomes of 57 platypuses from across the whole species range in eastern mainland Australia and Tasmania. Using a highly improved reference genome, we called over 6.7M SNPs, providing an informative genetic data set for population analyses. Our results show very strong population structure in the platypus, with our sampling locations corresponding to discrete groupings between which there is no evidence for recent gene flow. Genome-wide data allowed us to establish that 28 of the 57 sampled individuals had at least a third-degree relative amongst other samples from the same river, often taken at different times. Taking advantage of a sampled family quartet, we estimated the de novo mutation rate in the platypus at $7.0 \times 10^{-9}$/bp/generation (95% CI $4.1 \times 10^{-9}$–$1.2 \times 10^{-8}$/bp/generation). We estimated effective population sizes of ancestral populations and haplotype sharing between current groupings, and found evidence for bottlenecks and long-term population decline in multiple regions, and early divergence between populations in different regions. This study demonstrates the power of whole-genome sequencing for studying natural populations of an evolutionarily important species.

Introduction

Next-generation sequencing technologies have greatly facilitated studies into the diversity and population structure of non-model organisms.
For example, whole-genome sequencing (WGS) has been applied to investigate demographic history and levels of inbreeding in primates, with implications for conservation (Locke et al., 2011; Prado-Martinez et al., 2013; McManus et al., 2015; Xue et al., 2015; Abascal et al., 2016). It has also been used to study domesticated species such as pigs (Li et al., 2013; Bosse et al., 2014), dogs (Freedman et al., 2014), maize (Hufford et al., 2012) and bees (Wallberg et al., 2014), to infer the origins of domestication, its effect on effective population size ($N_e$) and nucleotide diversity, and to identify genes under selection during this process. Some studies have identified signatures of introgression (Bosse et al., 2014) or admixture (Miller et al., 2012; Lamichhaney et al., 2015) between species, which is important to inform inference of past $N_e$. Others have used WGS data to identify particular genomic regions contributing to evolutionarily important traits, such as beak shape in Darwin’s finches (Lamichhaney et al., 2015), mate choice in cichlid fish (Malinsky et al., 2015), and migratory behaviour in butterflies (Zhan et al., 2014).

Here, we describe a population resequencing study of the platypus (Ornithorhynchus anatinus), which is one of the largest such studies of non-human mammals, and the first for a non-placental mammal. In addition to laying eggs, platypuses have a unique set of characteristics (Grant, 2007), including webbed feet, a venomous spur (only in males), and a large bill that contains electroreceptors used for sensing their prey. Their karyotype is 2n = 52 (Bick and Sharman, 1975), and they have five different male-specific chromosomes (named Y chromosomes), and five different chromosomes present in one copy in males and two copies in females (X chromosomes), which form a multivalent chain in male meiosis (Grutzner et al., 2004).
Though apparently secure across much of its eastern Australian range, the platypus has the highest conservation priority ranking among mammals when considering phylogenetic distinctiveness (Isaac et al., 2007). Given concerns about the impact of climate change (Klamt et al., 2011), disease (Gust et al., 2009) and other factors on platypus populations, there is a need to better understand past responses of platypus populations to climate change, and the extent of connectivity across the species range.

The first platypus genome assembly (ornAna1) was generated using established whole-genome shotgun methods (Warren et al., 2008) from a female from the Barnard River in New South Wales (NSW) (see Figure 1). This assembly was highly fragmented and did not contain any sequence from the Y chromosomes. The initial genome paper included only a limited analysis of inter-individual variation and population structure based on 57 polymorphic retrotransposon loci. Subsequently, several other studies have investigated diversity and population structure using microsatellites or mitochondrial DNA (mtDNA) (Kolomyjec et al., 2009; Gongora et al., 2012; Furlan et al., 2013; Kolomyjec et al., 2013) (Table 1). They reported much stronger differences between than within river systems, but found some evidence of migration between rivers that were close together, implying limited overland dispersal.

Using only a small number of markers gives limited information about the underlying population history of a species, and, because genealogies are stochastic given a particular demographic model, a genealogy built from a single locus, such as the mitochondrial control region (Gongora et al., 2012), may not reflect the historical relationships between populations under study (Novembre and Ramachandran, 2011).
We anticipated that more could be learned about population structure and dynamics from whole-genome sequencing data, which contains considerably more information than the maternally inherited mitochondria or than a few highly mutable microsatellites.

We sequenced the genomes of 57 platypuses from Queensland (QLD), New South Wales (NSW) and Tasmania (TAS) (Figure 1; Tables S1 and S2), in order to gain insights into the population genetics of the species. We investigated the differentiation between subpopulations, the relative historical population sizes and structure, and the extent of relatedness between the individuals sampled, which could be informative about the extent of individual platypus dispersal.

Results

Genome reassembly and SNP calling

We sequenced 57 platypus samples at 12-21X coverage, one in duplicate. We used the improved genome assembly ornAna2 (which will be made available no later than the time of publication of this paper) for all analyses, and ran standard software to jointly call variants across all samples (PLATYPUS (Rimmer et al., 2014)). The variant calls were filtered to produce a set of 6.7M stringently filtered SNPs across 54 autosomal scaffolds comprising 965Mb of the assembly.

Data Quality

We undertook two different approaches to assess the quality of our SNP callset. In the first approach, two separate DNA samples from a single individual were sequenced. These were processed in identical fashion to all the other sequence data, with the processing blind to the fact that they were duplicates. After processing and SNP calling, we then compared the genotypes in the two samples from this individual.
The rate of discordant genotypes between the two duplicate samples was very low ($2.20 \times 10^{-3}$ per SNP; $1.62 \times 10^{-5}$ per bp; Table S3). Because an error in either duplicate could lead to discordant genotypes, this implies an estimated error rate of $1.10 \times 10^{-3}$ per SNP and $8.1 \times 10^{-6}$ per bp. During our analyses we also discovered we had sampled a family quartet of two parents and two offspring (Table S4). This allowed us to use a second approach to assess the quality of our SNP callset by using the rate of Mendelian errors. We found a Mendelian error rate of $1.10 \times 10^{-3}$ per SNP. In some configurations, an error in any one of the four genotypes would result in a Mendelian error; in others, an error would not be detected. Both approaches suggest an error rate of order 0.001 per genotype, indicating that the dataset used in the analyses is of high quality.
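
As a rough illustration of the two quality checks described above, the following Python sketch halves a duplicate-discordance rate to approximate a per-genotype error rate and tests quartet genotypes for Mendelian consistency. It is not the authors' pipeline: the function names, the coding of genotypes as alternate-allele counts (0, 1 or 2) at biallelic autosomal SNPs, and the assumption that errors are rare and independent are all introduced here for illustration only.

```python
# Illustrative sketch only; names and genotype coding are assumptions.

# Alleles a parent can transmit, with genotypes coded as the number of
# alternate alleles (0, 1 or 2) at a biallelic autosomal SNP.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def per_genotype_error_from_discordance(discordance_rate):
    """Halve the duplicate-discordance rate to approximate the error rate.

    Assumes errors are rare and independent, so a discordant site almost
    always reflects a single error in one of the two duplicate calls.
    """
    return discordance_rate / 2.0


def is_mendelian_consistent(mother, father, child):
    """Return True if the child genotype can arise from the parental genotypes."""
    return any(
        m + f == child
        for m in TRANSMITTABLE[mother]
        for f in TRANSMITTABLE[father]
    )


def mendelian_error_rate(quartet_sites):
    """Fraction of sites where at least one offspring is inconsistent.

    quartet_sites: iterable of (mother, father, child1, child2) tuples.
    """
    sites = list(quartet_sites)
    if not sites:
        raise ValueError("at least one site is required")
    errors = sum(
        not (is_mendelian_consistent(m, f, c1)
             and is_mendelian_consistent(m, f, c2))
        for m, f, c1, c2 in sites
    )
    return errors / len(sites)


# Duplicate-based estimate using the per-SNP discordance reported in the text.
print(per_genotype_error_from_discordance(2.20e-3))

# Two heterozygous parents are compatible with every child genotype, so a
# genotyping error in an offspring at such a site cannot be detected.
print(all(is_mendelian_consistent(1, 1, c) for c in (0, 1, 2)))

# Two homozygous-reference parents expose any non-reference child call.
print(is_mendelian_consistent(0, 0, 1))
```

The last two checks show why the Mendelian error rate is only an approximate guide to the per-genotype error rate: whether an erroneous call is detected depends on the genotype configuration of the family at that site.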