The Study of Human Y Chromosome Variation Through Ancient DNA
Total Page:16
File Type:pdf, Size:1020Kb
Hum Genet DOI 10.1007/s00439-017-1773-z REVIEW The study of human Y chromosome variation through ancient DNA Toomas Kivisild1,2 Received: 12 January 2017 / Accepted: 24 February 2017 © The Author(s) 2017. This article is published with open access at Springerlink.com Abstract High throughput sequencing methods have com- accumulate and get fixed over time. This review considers pletely transformed the study of human Y chromosome genome-scale evidence on ancient Y chromosome diver- variation by offering a genome-scale view on genetic vari- sity that has recently started to accumulate in geographic ation retrieved from ancient human remains in context of areas favourable to DNA preservation. More specifically a growing number of high coverage whole Y chromosome the review focuses on examples of regional continuity and sequence data from living populations from across the change of the Y chromosome haplogroups in North Eurasia world. The ancient Y chromosome sequences are providing and in the New World. us the first exciting glimpses into the past variation of male- specific compartment of the genome and the opportunity to evaluate models based on previously made inferences from Background patterns of genetic variation in living populations. Analy- ses of the ancient Y chromosome sequences are challenging Until recently, ancient DNA studies of human remain not only because of issues generally related to ancient DNA focused primarily on variation embedded in mitochon- work, such as DNA damage-induced mutations and low drial DNA (mtDNA). For decades, mtDNA had been a content of endogenous DNA in most human remains, but target of choice in population genetic studies because of also because of specific properties of the Y chromosome, its high mutation rate and high density of polymorphic such as its highly repetitive nature and high homology with markers (Wilson et al. 1985). Also, because for any unique the X chromosome. Shotgun sequencing of uniquely map- sequence in the autosomal, X or Y chromosome locus there ping regions of the Y chromosomes to sufficiently high are many hundreds or even thousands of copies of mtDNA coverage is still challenging and costly in poorly preserved means that this maternally inherited locus was more likely samples. To increase the coverage of specific target SNPs to work in cases where only a very small number of mol- capture-based methods have been developed and used in ecules had survived (Krings et al. 1997). HTS technologies recent years to generate Y chromosome sequence data from have significantly increased the rate at which sequence data hundreds of prehistoric skeletal remains. Besides the pros- can be generated and make accessible surviving chunks of pects of testing directly as how much genetic change in a DNA that are shorter than the size of a PCR (polymerase given time period has accompanied changes in material chain reaction) amplicon (Orlando et al. 2015). They have culture the sequencing of ancient Y chromosomes allows reduced the costs of sequencing and opened the prospects of us also to better understand the rate at which mutations assessing the variation of human populations, both modern and ancient, at the scale of entire genomes (Genomes Pro- ject et al. 2012; Gilbert et al. 2008; Green et al. 2010; Ras- * Toomas Kivisild mussen et al. 2010). These technological advances (Orlando [email protected] et al. 2015) have also made it possible now to study ancient 1 Department of Archaeology and Anthropology, University Y chromosome (aY) variation in human populations at the of Cambridge, Cambridge CB2 1QH, UK scale of the entire accessible length of the male-specific and 2 Estonian Biocentre, 51010 Tartu, Estonia non-recombining regions of human Y chromosome. 1 3 Hum Genet Ancient DNA presents us the opportunity to directly of recently emerged Y chromosome evidence from ancient examine which Y chromosome single nucleotide poly- DNA studies. morphisms (SNPs) and haplotypes were present at dif- ferent time periods in regions that support long-term survival of ancient DNA. It is perhaps not surprising that Methodological issues in dealing with aY archaeological sites from high latitude areas (Hofreiter et al. 2015) have yielded the largest number of success- Two different approaches, shotgun and hybridization cap- fully sequenced samples in recent genome-scale studies ture-based sequencing, have been used in recent ancient that have reported on aY. These studies, as reviewed in DNA studies to assess the variation of ancient Y chromo- further detail below, have provided us the first glimpses somes. In all genome-scale studies, shotgun sequencing of the dynamics of aY haplogroup composition and fre- is typically used firstly in the screening phase. In case of quency changes in transects of time and allow us to test high percentage (~>10%) of reads mapping uniquely to hypotheses based on earlier phylogeographic inferences the human reference genome combined with low clonal- made from the Y chromosome data of presently living ity of the reads, further shotgun sequencing of the sample populations. In this review, ‘haplogroups’ and ‘clades’ to a higher coverage can be generally considered to be an are terms that are interchangeably used to refer to groups efficient and relatively cost-effective strategy. However, in of closely-related Y chromosome sequences that share cases where the content of human genome mapping reads a common ancestor. While there is a general agreement is low (~1 to 10%), enrichment methods that target specifi- in the definition of the basic haplogroups (A, B, C, etc.) cally human-specific DNA have been considered to be a multiple parallel nomenclatures are in use for sub-clades practical solution for getting improved coverage of SNPs (e.g. see http://www.phylotree.org/Y/, http://isogg. that are considered informative in downstream analyses org/tree/, https://www.yfull.com/tree/). For clarity, the (Haak et al. 2015; Lazaridis et al. 2016; Mathieson et al. names of sub-clades used here are suffixed by the defin- 2015). Having sufficient coverage of the Y chromosome ing SNP-marker name, as suggested previously (Karmin SNP data in ancient samples is important for mapping them et al. 2015; van Oven et al. 2014). Similarly to the extent correctly to the phylogenetic tree and for examining the to which our earlier views on peopling of continental relationship among multiple ancient samples in parallel. regions such as Europe based on inferences made from First, low coverage of data can have an undesirable effect extant mtDNA variation have changed in the light of new on the inferences of the positioning of ancient samples in ancient mtDNA evidence (Bollongino et al. 2013; Bra- a tree in relation to high coverage modern Y chromosome manti et al. 2009; Brandt et al. 2013; Haak et al. 2015; sequence data, and, in particular, for the estimation of Posth et al. 2016; Thomas et al. 2013), it can be seen that branch lengths of the phylogeny as exemplified in Fig. 1. models linking the spread of specific Y chromosome hap- Besides the coverage (a measure of how many sites of the logroups with the spread of material culture may require reference genome are covered by data in a given sample), substantial revision in the light of new aY evidence. it is also the sequencing depth (the number of reads cover- In this review, methodological aspects of ancient Y chro- ing a site) at each given site that contributes to the branch mosome work will be first discussed with a focus on recent length estimates as sites covered by only one or two reads HTS research based on shotgun and capture approaches. tend to have a high rate of false positive sequencing errors. Challenges common to all ancient DNA studies include Second, when pooling from many ancient samples, data those related to calling human polymorphic variants from analysis methods that rely on allele frequency comparisons short and damaged sequence reads that are derived from a of individual SNPs require reasonable number of overlap- mix of different organisms. Highly repetitive nature of the ping sites covered by data in multiple samples. Predictably, Y chromosome combined with its high sequence homol- the more low coverage samples are included in the analyses ogy with the X chromosome (Skaletsky et al. 2003) fur- the less overlapping SNPs remain available for downstream ther complicates variant calling from short ancient DNA data analyses. sequence reads. These issues, together with the pater- A number of Y chromosome sequences covered in nal inheritance of Y chromosome, its haploid nature and this review have been generated with shotgun sequenc- high linkage between physically distant SNPs, impact on ing approach. These include the four oldest genomes of choices of the bioinformatics methods of downstream data individuals dated to late Pleistocene as well as a number analysis. They define the restrictions and specifics how the of remains from Europe and Americas dated to the Holo- aY sequence data should be processed and what are the cene period. A substantial proportion of the European and limitations for the interpretations made from such data. Middle Eastern Neolithic and Bronze Age remains have Finally, the demographic histories European and North been sequenced, however, with hybridization-based cap- American populations will be briefly reviewed in the light ture technique. Capture-based methods focus on specific 1 3 Hum Genet Fig. 1 Effect of low coverage of the data and post-mortem True tree Observed tree damage on the inferences of 1 missing data in A1 branch lengths and on the phy- 2 missing data in A2 logenetic mapping of mutations B missing data in A1 and A2 to the tree. A general example damage and sequencing errors of phylogenetic relationships between one high coverage 100 96 modern (M1) and two low 4 B coverage ancient (A1 and A2) samples is shown.