Received: 2 October 2019 | Revised: 6 March 2020 | Accepted: 30 March 2020
DOI: 10.1111/1755-0998.13168

SPECIAL ISSUE

Hybridization ddRAD-sequencing for population genomics of nonmodel plants using highly degraded historical specimen DNA

Patricia L. M. Lang 1,2 | Clemens L. Weiß 1,3 | Sonja Kersten 4 | Sergio M. Latorre 1 | Sarah Nagel 5 | Birgit Nickel 5 | Matthias Meyer 5 | Hernán A. Burbano 1,6

1 Research Group for Ancient Genomics and Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
2 Department of Biology, Stanford University, Stanford, CA, USA
3 Department of Genetics, Stanford University, Stanford, CA, USA
4 Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
5 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
6 Centre for Life’s Origins and Evolution, Department of Genetics, Evolution, and Environment, University College London, London, UK

Correspondence
Hernán A. Burbano, Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.
Email: [email protected]

Funding information
This work was supported by the German Research Foundation (DFG; project 324876998 of SPP1374) and by the Presidential Innovation Fund of the Max Planck Society.

Abstract
Species’ responses at the genetic level are key to understanding the long-term consequences of anthropogenic global change. Herbaria document such responses, and, with contemporary sampling, provide high-resolution time-series of plant evolutionary change. Characterizing genetic diversity is straightforward for model species with small genomes and a reference sequence. For nonmodel species (with small or large genomes), diversity is traditionally assessed using restriction-enzyme-based sequencing. However, age-related DNA damage and fragmentation preclude the use of this approach for ancient herbarium DNA. Here, we combine reduced-representation sequencing and hybridization-capture to overcome this challenge and efficiently compare contemporary and historical specimens. Specifically, we describe how homemade DNA baits can be produced from reduced-representation libraries of fresh samples, and used to efficiently enrich historical libraries for the same fraction of the genome to produce compatible sets of sequence data from both types of material. Applying this approach to both Arabidopsis thaliana and the nonmodel plant Cardamine bulbifera, we discovered polymorphisms de novo in an unbiased, reference-free manner. We show that the recovered genetic variation recapitulates known genetic diversity in A. thaliana, and recovers geographical origin in both species and over time, independent of bait diversity. Hence, our method enables fast, cost-efficient, large-scale integration of contemporary and historical specimens for assessment of genome-wide genetic trends over time, independent of genome size and presence of a reference genome.

KEYWORDS
ancient DNA, capture, herbarium, hybridization double-digest RADseq, nonmodel species

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
© 2020 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.
Mol Ecol Resour. 2020;00:1–20. wileyonlinelibrary.com/journal/men

1 | INTRODUCTION

Evolutionary studies have over recent years moved from focusing on the effects of various evolutionary forces on genetic variation at single loci (McDonald & Kreitman, 1991) to investigating whole genome sequencing data (Mackay et al., 2012). With the continuous development of high-throughput next-generation sequencing (NGS) technologies (e.g., short-read Illumina sequencing: HiSeq4000, NovaSeq [Bentley et al., 2008]), such questions can now in principle be addressed at the population scale, covering large geographical distributions (The 1001 Genomes Consortium 2016), or densely sampled phylogenetic space (Zhang et al., 2014). A limiting factor especially for phylogenetic studies, both in terms of sequencing cost and regarding downstream analysis, are species that lack reference genomes, have large genomes, or both. However, this is true for the majority of species, excluding a few well-studied model organisms such as Arabidopsis thaliana (Arabidopsis Genome Initiative, 2000) or the genus Drosophila (Drosophila 12 Genomes Consortium et al., 2007). Most population-scale studies in molecular ecology or evolutionary and conservation genomics circumvent this bottleneck using a variety of reduced-representation approaches such as restriction-enzyme associated DNA sequencing (RADseq) (Andrews, Good, Miller, Luikart, & Hohenlohe, 2016; Baird et al., 2008; Catchen et al., 2017; Miller, Dunham, Amores, Cresko, & Johnson, 2007; Peterson, Weber, Kay, Fisher, & Hoekstra, 2012; Puritz et al., 2014) or exome sequencing (De Wit, Pespeni, & Palumbi, 2015). This trades large amounts of shallowly sequenced genomes, which are difficult to analyse without a reference genome, for sequence data of higher quality and depth, which can be readily analysed with dedicated bioinformatics pipelines (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011), independent of a reference genome.

Despite their reduced view on the genome, these approaches serve to infer evolutionary processes based on contemporary sequence variation (Andrews et al., 2016). With the advent of ancient DNA sequencing, however, we now have the opportunity to study evolution in real time (Gutaker & Burbano, 2017; Shapiro & Hofreiter, 2014). This is particularly relevant in the context of anthropogenic global change, which has been affecting the environment at a rapid pace for the last +200 years (Lang, Willems, Scheepens, Burbano, & Bossdorf, 2018). To date, largely uncharacterized species responses to this selective force are key to understanding the long-term consequences of global change, and to promoting species survival (Aitken & Bemmels, 2016), a key challenge of our time. In the case of plants, dense time-series that document plant responses to environmental change are stored in herbaria. This largely untapped resource provides a global collection of specimens that, especially combined with contemporary sampling, allows for studying plant morphological and molecular change over the last

as they do not allow the use of RADseq. The most limiting characteristic in the context of reduced-representation methods is the age-related breakdown of aDNA fragments to median lengths of 50–80 bp (Sawyer, Krause, Guschanski, Savolainen, & Pääbo, 2012; Weiß et al., 2016). Enzymatic restriction used in RADseq approaches would further shorten these fragments, thereby reducing their mappability (Figure S1) and thus the overall information content historical samples can provide. In addition, fragmentation is likely to reduce the number of available RAD sites over time, thereby also reducing the information that can be retrieved, and the overlap between time-series samples. These problems would be even more pronounced in double-digest RADseq (ddRADseq), which uses two restriction enzymes with different cutting sequences (Peterson et al., 2012).

The combination of historical and modern samples is thus difficult when RADseq approaches are the only feasible option, for example when working with large genome sizes, or population-scale sampling. Joint analyses of the different sample types require high sequence overlap, which in this situation cannot be achieved by employing the same method across samples. For historical samples, deep whole genome sequencing can be used to retrieve the sites recovered with RADseq of modern samples, a costly and unrealistic solution for large genomes and sample sizes, especially considering the lower quality and metagenomic nature of aDNA (Gutaker & Burbano, 2017; Poinar et al., 2006). To enrich historical samples for specific genomic subsets, many studies therefore employ hybridization-based captures where biotinylated baits target particular regions of the genome. The resulting complexes are immobilized on streptavidin-coated beads, and washing steps remove unassociated “background” DNA prior to sequencing of the thus enriched targeted DNA. These protocols often use commercially synthesized baits (Gnirke et al., 2009). Because such baits need to be designed in silico, which requires genomic resources, this is both time-intense and bioinformatically demanding, particularly in nonmodel species. In addition, commercial bait synthesis is very expensive, especially for large sample sizes.

Protocols for home-made baits derived from RNA, DNA- or exome-based RAD libraries try to address these issues (e.g., hybridization RADseq or hyRAD, and exome-based hyRAD-x; Suchan et al., 2016; Schmid et al., 2017; Sánchez Barreiro et al., 2017; Linck, Hanna, Sellas, & Dumbacher, 2017), but do not explicitly address the challenge of combining modern and historical samples at large scale for joint population genetics analyses. Furthermore, current protocols depend on enzymatic removal of sequencing adapters from bait-pools to avoid mix-ups between baits and sequencing libraries. They produce only a limited, and as result of adapter-removal not amplifiable amount of bait, and rely on com-