Genetics: Early Online, published on June 13, 2019 as 10.1534/genetics.119.302368 A Rare Deep-Rooting D0 African Y-chromosomal and its Implications for the Expansion of Modern Humans Out of

Marc Haber,* Abigail L. Jones,† Bruce A. Connell,‡ Asan,§ Elena Arciero,* Huanming Yang, §,** Mark G. Thomas,†† Yali Xue,* and Chris Tyler-Smith* *The Wellcome Sanger Institute, Hinxton, Cambs. CB10 1SA, UK, †Liverpool Women's Hospital, Liverpool L8 7SS, UK, ‡Glendon College, York University, Toronto, Ontario M4N 3N6, Canada, §BGI-Shenzhen, Shenzhen 518083, China, **James D. Watson Institute of Genome Science, 310008 Hangzhou, China, ††Research Department of Genetics, Evolution and Environment and UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK. Corresponding author: Chris Tyler-Smith, The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambs. CB10 1SA, UK. Tel: [+44] [0] 1223 495376. Email: [email protected]

ABSTRACT Present-day humans outside Africa descend mainly from a single expansion out ~50,000-70,000 years ago, but many details of this expansion remain unclear, including the history of the male-specific at this time. Here, we re-investigate a rare deep-rooting African Y-chromosomal lineage by sequencing the whole genomes of three Nigerian men described in 2003 as carrying haplogroup DE* Y-chromosomes, and analyzing them in the context of a calibrated worldwide Y-chromosomal phylogeny. We confirm that these three chromosomes do represent a deep-rooting DE lineage, branching close to the DE bifurcation, but place them on the D branch as an outgroup to all other known D chromosomes, and designate the new lineage D0. We consider three models for the expansion of Y lineages out of Africa ~50,000-100,000 years ago, incorporating migration back to Africa where necessary to explain present-day Y-lineage distributions. Considering both the Y-chromosomal phylogenetic structure incorporating the D0 lineage, and published evidence for modern humans outside Africa, the most favored model involves an origin of the DE lineage within Africa with D0 and E remaining there, and migration out of the three lineages (C, D and FT) that now form the vast majority of non-African Y chromosomes. The exit took place 50,300-81,000 years ago (latest date for FT lineage expansion outside Africa - earliest date for the D/D0 lineage split inside Africa), and most likely 50,300-59,400 years ago (considering Neanderthal admixture). This work resolves a long-running debate about Y-chromosomal out- of-Africa/back-to-Africa migrations, and provides insights into the out-of-Africa expansion more generally.

umans outside Africa derive most of their genetic extant non-African lineages were in existence (plus H ancestry from a single migration event 50,000- additional African lineages). The complexity arises 70,000 years ago, according to the current model because three of these four early lineages (C-M130, D- supported by genetic data from genome-wide (Mallick et M174 and FT-M89) are exclusively non-African, apart al. 2016; Pagani et al. 2016), mtDNA (Van Oven and from recent gene flow; while the fourth lineage (E-M96) Kayser 2009) and Y-chromosomal (Wei et al. 2013; is largely African, where it constitutes the major lineage Hallast et al. 2015; Karmin et al. 2015; Poznik et al. 2016) in most African populations. The debate began in the analyses. The migrating population carried only a small absence of reliable calibration, and these distributions subset of African genetic diversity, particularly strikingly were interpreted as arising in two contrasting ways: (1) for the non-recombining mtDNA and Y chromosome an Asian origin of DE-M145 (also known as the YAP+ where robust calibrated high-resolution phylogenies can lineage), implying migration of CT-M168 out of Africa be constructed, and in each case all non-African lineages followed by divergence into the four lineages outside descend from a single African lineage, L3 for mtDNA or Africa and then migration of E-M96 back to Africa CT-M168 for the Y chromosome. Yet there has been a (Altheide and Hammer 1997; Hammer et al. 1998; Bravi long-running debate about the early spread of Y- et al. 2000), or (2) an African origin of DE-M145, implying chromosomal lineages because their current divergence of CT-M168 within Africa followed by distributions do not fit a simple phylogeographical migration of C-M130, D-M174 and FT-M89 out (Underhill model. The CT-M168 branch diverged within a short time et al. 2001; Underhill and Roseman 2001). The first interval into three lineages (C-M130, DE-M145 and FT- scenario requires two inter-continental lineage M89), and just a few thousand years later the lineage DE- migrations, while the second requires three and is thus M145 further split into D-M174 and E-M96 (Poznik et al. slightly less parsimonious. 2016), illustrated in Figure S1. Thus around the time of An additional very rare haplogroup, DE*, carrying the expansion out of Africa, between one (CT-M168) and variants that define DE but none of those that define D four (C-M130, D-M174, E-M96 and F-M89) of the known 1

Copyright 2019. or E individually, added to this complexity. First identified Data analysis in five out of 1247 Nigerians within a worldwide study of Y chromosome genotypes were called jointly from all 180 >8000 men (Weale et al. 2003), DE* chromosomes were samples using FreeBayes v1.2.0 (Garrison and Marth subsequently reported in a single man among 282 from 2012) with the arguments “--report-monomorphic” and Guinea-Bissau in (Rosa et al. 2007) and in “--ploidy 1”. Calling was restricted to 10.3 Mb of the Y two out of 722 Tibetans within a study of 5783 East chromosome previously determined to be accessible to Asians (Shi et al. 2008). While the phylogeographic short-read sequencing (Poznik et al. 2013). Then sites significance of these rare lineages was immediately with depth across all samples <1900 or >11500 recognized, their interpretation was hindered by the (corresponding to DP/2 or DP*3), or missing in more than incomplete resolution of the phylogenetic branching 20% of the samples, were filtered. In individuals, alleles pattern and the possibility that they might originate from with DP<5 or GQ<30 were excluded and if multiple alleles back-mutations at the small numbers of variants used to were observed at a position, the fraction of reads define the key D and E , or genotyping errors supporting the called allele was required to be >0.8. rather than representing deeply divergent lineages, plus the lack of a robust timescale. Large-scale sequencing of Genome-wide genotypes from the Nigerian samples Y chromosomes has now provided both the full were called using BCFtools version 1.6 (bcftools mpileup phylogenetic resolution and the timescale needed (Wei -C50 -q30 -Q30 | bcftools call -c), then merged with data et al. 2013; Hallast et al. 2015; Karmin et al. 2015; Poznik from ~2,500 people genotyped on the Affymetrix Human et al. 2016), so we have therefore reinvestigated the Origins array (Patterson et al. 2012; Lazaridis et al. 2016). original Nigerian DE* chromosomes using whole- Principal Component Analysis (PCA) using genome-wide genome sequencing to clarify their phylogenetic SNPs was performed using EIGENSOFT v7.2.1 (Patterson position. We then consider the implications for the out- et al. 2006) and plotted using R (R Core Team 2017). of-Africa/back-to-Africa debate related to Y- We inferred a maximum likelihood phylogeny of Y chromosomal lineages, and the expansion out of Africa chromosomes using RAxML v8.2.10 (Stamatakis 2014) more generally. with the arguments “-m ASC_GTRGAMMA” and “--asc- corr=stamatakis”, using only variable sites with QUAL ≥1, METHODS and selecting the tree with the best likelihood from 100 Samples and sequencing runs, then replicating the tree 1000 times for bootstrap values. The tree was plotted using Interactive Tree Of Life We analyzed five DE* samples described previously (iTOL) v3 (Letunic and Bork 2016) and annotated with (Weale et al. 2003), in the context of published haplogroup names assigned using yHaplo (Poznik 2016) worldwide Y-chromosomal sequences including from SNPs reported by the International Society of Japanese D and many E Y chromosomes (Mallick et al. Genetic Genealogy (ISOGG v11.01). 2016). We also included four haplogroup D samples from (Xue et al. 2006), which were newly sequenced for The ages of the internal nodes in the tree were estimated this study; the Japanese and Tibetan D chromosomes using the ρ statistic (Forster et al. 1996), the standard represent the deepest known split within D, since approach for the Y chromosome. We defined the Andamanese D chromosomes lie on the same branch as ancestral state of a site by assigning alleles as ancestral the Japanese (Mondal et al. 2017). when they were monomorphic in the nine samples Sequencing of the Nigerian samples was carried out at belonging to the A and B haplogroups in our dataset. We the Wellcome Sanger institute on the Illumina HiSeq X then determined the age of a node as follows: Having an Ten platform (paired-end read length 150bp) to a Y ancestral node leading to two clades, we select one chromosome mean coverage of ~16X. Sequences were sample from each clade and divide the number of processed using biobambam version 2.0.79 to remove derived variants found in the first sample but absent adapters, mark duplicates, and sort reads. bwa-mem from the second, by the total number of sites having the version 0.7.16a was used to map the reads to the hs37d5 ancestral state in both samples. We compare all possible reference genome. We found that two pairs of pairs under a node and report the average value of individuals were likely duplicates (Figure S2) and thus divergence times in units of years by applying a point one of each pair was removed, leaving three Nigerian mutation rate of 0.76 × 10−9 mutations per site per year individuals for further analysis. The four individuals from (Fu et al. 2014). We report 95% confidence intervals of Tibet were sequenced in the same way to a Y the divergence times based on the 95% highest posterior chromosome mean coverage of ~18X. density (HPD) when estimating the mutation rate (0.67- 0.86 × 10-9) (Fu et al. 2014). This model assumes that For comparative data from other haplogroups, we mutations accumulated on the chromosomes in the obtained Y chromosome bam files for 173 males different lineages at similar rates, and thus expects all representing worldwide populations from the Simons individuals in our dataset to have comparable branch Genome Diversity Project (SGDP) (Mallick et al. 2016). 2 lengths from the AB root. But we found considerable The Nigerian chromosomes shared seven SNPs with differences among individuals in the number of their other D chromosomes, one SNP with E chromosomes, derived mutations from the root. This heterogeneity in one SNP with C1b2a chromosomes, and one SNP with an the accumulation of mutations has been previously F2 chromosome (Table S1). The reads overlapping these reported (Scozzari et al. 2014; Barbieri et al. 2016) and SNPs were visually investigated using the Integrative appears to be haplogroup-specific (Figure S3), and Genomics Viewer (IGV) version 2.4.10 and seen to therefore in our divergence time estimates, we calibrate support the calls. We consider sharing of a single SNP as all lineages to have identical branch length from the root, a recurrent mutation in different lineages and interpret equal to the average branch length estimated from all the Nigerian chromosomes as lying on the D lineage, individuals in our data set. We first calculated the diverging from other D chromosomes at 71,400 years average number of mutations which accumulated on the ago (Figure 1C), very soon after its divergence from E at branches of all individuals in our dataset and found 73,200 years ago. We name the lineage formed by the 768.59 derived mutations on average from the root Nigerian samples D0, in order to reflect its position on (corresponding to ~100,000 years). We then derived a the tree and avoid the need to rename all the other D calibration coefficient α for each individual by dividing lineages. 768.59 by the normalized (in 10,000,000 bp) number of The three D0 chromosomes are distinguishable from one derived mutations an individual has accumulated from another, and have a coalescence time of ~2,500 years the root. And thus for calibrating the branches length (Figure 1C), consistent with their collection from between any two samples when calculating the split different villages, languages, ethnic backgrounds, and times, we multiply α by the number of derived variants paternal birthplaces (Weale et al. 2003). The autosomal found in the first sample but absent from the second. genomes of these individuals confirm their genetic Data Availability Statement ancestry as West Africans (Figure S2). New sequence data from the Nigerian samples are Models for the expansion of Y-chromosomal lineages available through the European Genome-phenome out of Africa Archive (EGA) under study accession number EGAS00001002674, and for Tibetan samples under study The updated phylogeny including the D0 lineage adds accession number EGAS00001003500. two key pieces of information to the debate about the phylogeography of the Y lineages ~50,000-100,000 years Three supplementary figures and two supplementary ago and the mode of expansion out of Africa. First, it tables accompany this paper: increases the number of relevant lineages at this early time period from four to five, and second, it provides a Figure S1 Y-chromosomal phylogeny as understood reliable timescale for the branching times of these before the current study. lineages, and thus for the lineages in existence at any particular time point. Figure S2 PCA of worldwide populations. In the phylogeny (Figure 1A), the DE lineage now Figure S3 Number of mutations from the AB root. contains three, rather than two early sub-lineages: one exclusively African (D0), one mainly African (E), and one Table S1. SNPs defining the D0 haplogroup. exclusively non-African (D). We therefore consider the implications of this revised structure for interpreting the Table S2. Split-time estimates using the ρ statistic. present-day Y phylogeography as the result of male movements at different times between 28,000 and RESULTS 100,000 years ago (Figure 2). In order to do this, we need Construction of a calibrated Y-chromosomal phylogeny to calibrate the phylogeny, and for this use the ancient- DNA-based mutation rate (Fu et al. 2014), which has We constructed a series of phylogenetic trees based on been widely adopted (e.g. Poznik et al. 2016); we all the Y-chromosomal sequences in our dataset, or consider in the Discussion the implications of alternative subsets of them. All showed a consistent structure, in mutation rates and some of the other simplifying which the Nigerian DE* chromosomes formed a clade assumptions we make here. branching from the DE lineage close to the divergence of D and E chromosomes (Figure 1A) in comparison with a We consider three scenarios based on our split-time set of Y chromosomes representing most of the world estimates of the Y-chromosomal lineages (Table S2). (Figure 1B). The Nigerian chromosomes had 489 derived First, between 101,000 years ago (divergence of the B SNPs exclusive to their branch in addition to a large and CT lineages) and 77,000 years ago (divergence of the deletion spanning ~118,000 bp (Y:28,457,736- DE and CF lineages) only one lineage with present-day 28,576,276). All DE-M154 chromosomes shared 29 SNPs. non-African descendants is present in the phylogeny (CT; 3 Figure 2A), so present-day Y lineage distributions could include the mutation rate uncertainty. We also assume be explained by migration of the single lineage CT out of that the mutation rate is constant over time and does not Africa, followed by back-migration of the D0 and E differ between lineages. The first assumption is very lineages between 71,000 years ago (origin of D0) and reasonable for the time period of most interest here, 59,000 years ago (divergence of E within Africa) (Figure 50,000-60,000 years, when the mutation rate averaged 2B). This and all other scenarios require migration out of over 45,000 years (Fu et al. 2014) is used. A flexible E-M35 after 47,000 years ago (its origin) and before mutation rate that assumed a real increase in recent 28,500 years ago (its divergence) in order to explain its times would not influence these estimates since the Fu presence outside Africa (Figure 2B-D). Second, between et al. rate already includes the last few centuries. 76,000 years ago (divergence between C and FT) and Differences in mutation rate between lineages need 73,000 years ago (divergence between D and E), three further investigation, but would not be sufficient to relevant lineages are present (the C, DE and FT lineages, affect the scenarios presented in Figure 2. For these Figure 2A), so migration out of these three followed by reasons, we believe that the Fu et al rate, averaged over back-migration of D0 and E as above (Figure 2C) would 45,000 years, is the appropriate one to use for the times explain the distributions. Third, between 71,000 years of interest here. ago (split of D and of D0) and 57,000 years ago These genetic times can be compared with dates from (divergence within FT), five relevant lineages are present, non-genetic sources for modern humans outside Africa. and migration out of three of these (C, D and FT) would The 45,000-year-old Siberian fossil (Fu et al. 2014) was explain the present-day distributions without requiring reliably dated using carbon-14, while a ~43,000-year-old back-migration (Figure 2D). For simplicity, we do not fragment of human maxilla from the Kent’s Cavern site include the short intervals between these three in the UK was dated using Bayesian modelling of scenarios of 500 years and 1,800 years (Figure 2A, Table stratigraphic, chronological and archaeological data S2). (Higham et al. 2011). Archaeological deposits at Boodie Cave in Australia were dated to ~50,000 years ago using DISCUSSION optically-stimulated luminescence (Veth et al. 2017). The new D0 data presented in this work are based on just Thus there is strong support for the widespread presence three Y chromosomes, but have far-ranging implications of modern humans outside Africa 45,000-50,000 years for the structure of the Y-chromosomal phylogeny and ago. Earlier dates have also been reported, for example hence male movements and migration out of Africa the Madjedbebe rock shelter in northern Australia dated more generally. Our phylogenetic results are consistent by optically-stimulated luminescence to at least 65,000 with three scenarios (Figure 2B-D), and we now consider years ago (Clarkson et al. 2017), a modern human some of the complexities associated with these, and how cranium from Tam Pa Ling, Laos was dated by Uranium- they fit with non-genetic data. Thorium to ~63,000 years ago (Demeter et al. 2012) and Complexities arise because although the phylogenetic 80 teeth from Fuyan Cave in southern China dated using structure, including the branching order, is very robust the same method to 80,000-120,000 years ago (Liu et al. (Wei et al. 2013; Hallast et al. 2015; Karmin et al. 2015; 2015), raising the possibility of a substantially earlier exit Poznik et al. 2016), its calibration depends entirely on the (Bae et al. 2017). Such early archaeological dates also, mutation rate used. The mutation rate chosen above, however, raise the question of whether or not the based on the number of mutations ‘missing’ in a 45,000- humans associated with them contributed genetically to year-old Siberian Y chromosome (Fu et al. 2014), has present-day populations (Mallick et al. 2016; Pagani et been widely adopted (Poznik et al. 2016; Balanovsky al. 2016). Archaeological data alone therefore do not 2017), but a large-scale study of Icelandic pedigrees provide an unequivocal date for the migration of the encompassing the last few centuries suggested a rate ancestors of present-day humans out of Africa. about 14% faster (Helgason et al. 2015). This faster All non-Africans carry around 2% Neanderthal DNA in mutation rate would translate directly into 14% more their genomes (Green et al. 2010), and Neanderthal recent time estimates so that, for example, the fossils have only been reported outside Africa. The migrations out of Africa in the three scenarios presented geographical distribution of Neanderthals thus suggests above would be 87,000-66,000, 65,000-63,000 and that the mixing probably occurred outside Africa, and the 61,000-49,000 years ago, respectively. These differences ubiquitous presence of Neanderthal DNA in present-day between mutation rates inferred in different ways may non-Africans is most easily explained if the mixing took be seen within the context of a wider debate about place once, soon after the migration out. This mixing has human mutation rates, previously based largely on been dated with some precision using the length of the autosomal data (Scally and Durbin 2012). Each mutation introgressed segments in the 45,000-year-old (43,210- rate is also accompanied by its own uncertainty, leading 46,880 years) Siberian (Ust’-Ishim) to 232-430 to the 95% confidence intervals in Table S2, which generations before he lived, i.e. 49,900-59,400 years ago 4 assuming a generation time of 29 years (Fu et al. 2014). Eurasian lineages into northern and central Africa (Haber If this date represented the time of the migration out of et al. 2016; D'atanasio et al. 2018), and is thus a poor Africa, it would exclude the first two scenarios (Figure 2B guide to structure before 10,000 years ago. Despite this, and 2C). Thus the combination of Y phylogenetic it is striking that western central Africa is the location of structure and dating of the out-of-Africa migration based the deepest-rooting A00 lineage in (Mendez on the 45,000-year-old Siberian fossil (Fu et al. 2014) et al. 2013), a major location of the A0 lineage in favors the third scenario (Figure 2D) involving the Cameroon, The Gambia and Ghana (Scozzari et al. 2014; migration out of C, D and FT between 50,300 years ago Poznik et al. 2016) and the D0 lineage in and (lower bound of the FT diversification, Table S2) and Guinea-Bissau (Weale et al. 2003; Rosa et al. 2007). This 59,400 years ago (upper bound of the introgression; see retention of the deepest Y-chromosomal diversity in Figure 3) which is in accordance with suggested models western central Africa contrasts with the autosomal incorporating an African origin of the DE lineages genetic structure, where the deepest roots have been (Underhill et al. 2001; Underhill and Roseman 2001). reported in southern African hunter-gatherers (Gronau According to this interpretation, the reported Tibetan et al. 2011; Schlebusch et al. 2012; Veeramah et al. 2012; DE* chromosomes (Shi et al. 2008) would most likely Mallick et al. 2016; Schlebusch et al. 2017; Skoglund et represent back-mutations or genotyping errors at the al. 2017), perhaps supporting the hypothesis of deep one SNP used to define haplogroup D, but require further population subdivision (Henn et al. 2018; Scerri et al. investigation. 2018). Analysis of ancient African DNA from 50,000- 100,000 years ago would provide conclusive information mtDNA sequences also provide a robust phylogeny on Y haplogroup distributions at this time, but is not which demonstrates that non-African mtDNAs descend currently available. In the meantime, further focus on from a single African branch with rapid diversification present-day Y-chromosomal lineages in central and outside Africa into the M and N lineages and many western Africa in order to understand more about deep subsequent branches (Ingman et al. 2000; Deviese et al. African lineages seems warranted, and the current study 2019). Dating using ancient mtDNA suggests a separation illustrates the broad insights that can sometimes be of non-African from African lineages after 62,000-95,000 revealed by very rare lineages. years ago (Fu et al. 2013), while an analysis of present- day mtDNAs suggested divergence outside Africa In conclusion, sequencing of the D0 Y chromosomes and 57,000-65,000 years ago (Fernandes et al. 2012). These placing them on a calibrated Y-chromosomal phylogeny estimates are based on <1% of the sequence length used identifies the most likely model of Y-chromosomal exit from the Y chromosome, but are nonetheless very from Africa: an origin of the DE lineage inside Africa and consistent. expansion out of the C, D and FT lineages. It suggests an This Discussion has thus far assumed that present-day exit time interval that overlaps with the time of distributions of Y haplogroups are relevant to events Neanderthal admixture estimated from autosomal 50,000-100,000 years ago and thus that Y analyses, and slightly refines it. These findings are phylogeography carries information informative about consistent with a shared history of Y chromosomes and the major migration out of Africa. Ancient population autosomes, and illustrate how study of Y lineages may structure within Africa that segregated C, D and FT from lead to general new insights. other Y haplogroups beginning after 76,000 years ago ACKNOWLEDGEMENTS with migration out only 50,000-59,000 years ago would also fit the evidence presented above. Present-day Y- We thank the sample donors for making this work chromosomal structure in Africa has been massively re- possible, and the Sanger DNAP Bespoke Sequencing shaped by events in the last 10,000 years, including the Team for especial efforts in generating sequences from Bantu-speaker expansion in central and southern Africa small amounts of DNA. Our work was supported by (Poznik et al. 2016; Patin et al. 2017) and entry of Wellcome (098051).

Balanovsky, O., 2017 Toward a consensus on SNP and LITERATURE CITED STR mutation rates on the human Y- chromosome. Hum. Genet. 136: 575-590. Altheide, T. K., and M. F. Hammer, 1997 Evidence for a Barbieri, C., A. Hubner, E. Macholdt, S. Ni, S. Lippold et possible Asian origin of YAP+ Y chromosomes. al., 2016 Refining the Y chromosome phylogeny Am. J. Hum. Genet. 61: 462-466. with southern African sequences. Hum Genet Bae, C. J., K. Douka and M. D. Petraglia, 2017 On the 135: 541-553. origin of modern humans: Asian perspectives. Bravi, C. M., G. Bailliet, V. L. Martinez-Marignac and N. O. Science 358. Bianchi, 2000 Origin of YAP+ lineages of the

5 human Y-chromosome. Am. J. Phys. Anthropol. covering the majority of known clades. Mol. Biol. 112: 149-158. Evol. 32: 661-673. Clarkson, C., Z. Jacobs, B. Marwick, R. Fullagar, L. Wallis Hammer, M. F., T. Karafet, A. Rasanayagam, E. T. Wood, et al., 2017 Human occupation of northern T. K. Altheide et al., 1998 Out of Africa and back Australia by 65,000 years ago. Nature 547: 306- again: nested cladistic analysis of human Y 310. chromosome variation. Mol. Biol. Evol. 15: 427- D'Atanasio, E., B. Trombetta, M. Bonito, A. Finocchio, G. 441. Di Vito et al., 2018 The peopling of the last Green Helgason, A., A. W. Einarsson, V. B. Guethmundsdottir, Sahara revealed by high-coverage resequencing A. Sigurethsson, E. D. Gunnarsdottir et al., 2015 of trans-Saharan patrilineages. Genome Biol. 19: The Y-chromosome point mutation rate in 20. humans. Nat. Genet. 47: 453-457. Demeter, F., L. L. Shackelford, A. M. Bacon, P. Duringer, Henn, B. M., T. E. Steele and T. D. Weaver, 2018 Clarifying K. Westaway et al., 2012 Anatomically modern distinct models of modern human origins in human in Southeast (Laos) by 46 ka. Proc Africa. Curr. Opin. Genet. Dev. 53: 148-156. Natl Acad Sci U S A 109: 14375-14380. Higham, T., T. Compton, C. Stringer, R. Jacobi, B. Shapiro Deviese, T., D. Massilani, S. Yi, D. Comeskey, S. Nagel et et al., 2011 The earliest evidence for al., 2019 Compound-specific radiocarbon dating anatomically modern humans in northwestern and mitochondrial DNA analysis of the . Nature 479: 521-524. Pleistocene hominin from Salkhit Mongolia. Nat Ingman, M., H. Kaessmann, S. Paabo and U. Gyllensten, Commun 10: 274. 2000 Mitochondrial genome variation and the Fernandes, V., F. Alshamali, M. Alves, M. D. Costa, J. B. origin of modern humans. Nature 408: 708-713. Pereira et al., 2012 The Arabian cradle: Karmin, M., L. Saag, M. Vicente, M. A. Wilson Sayres, M. mitochondrial relicts of the first steps along the Jarve et al., 2015 A recent bottleneck of Y southern route out of Africa. Am J Hum Genet chromosome diversity coincides with a global 90: 347-355. change in culture. Genome Res. 25: 459-466. Forster, P., R. Harding, A. Torroni and H. J. Bandelt, 1996 Lazaridis, I., D. Nadel, G. Rollefson, D. C. Merrett, N. Origin and evolution of Native American mtDNA Rohland et al., 2016 Genomic insights into the variation: a reappraisal. Am. J. Hum. Genet. 59: origin of farming in the ancient Near East. Nature 935-945. 536: 419-424. Fu, Q., H. Li, P. Moorjani, F. Jay, S. M. Slepchenko et al., Letunic, I., and P. Bork, 2016 Interactive tree of life (iTOL) 2014 Genome sequence of a 45,000-year-old v3: an online tool for the display and annotation modern human from western Siberia. Nature of phylogenetic and other trees. Nucleic Acids 514: 445-449. Res. 44: W242-245. Fu, Q., A. Mittnik, P. L. F. Johnson, K. Bos, M. Lari et al., Liu, W., M. Martinon-Torres, Y. J. Cai, S. Xing, H. W. Tong 2013 A revised timescale for human evolution et al., 2015 The earliest unequivocally modern based on ancient mitochondrial genomes. Curr humans in southern China. Nature 526: 696-699. Biol 23: 553-559. Mallick, S., H. Li, M. Lipson, I. Mathieson, M. Gymrek et Garrison, E., and G. Marth, 2012 Haplotype-based al., 2016 The Simons Genome Diversity Project: variant detection from short-read sequencing. 300 genomes from 142 diverse populations. arXiv. Nature 538: 201-206. Green, R. E., J. Krause, A. W. Briggs, T. Maricic, U. Stenzel Mendez, F. L., T. Krahn, B. Schrack, A. M. Krahn, K. R. et al., 2010 A draft sequence of the Neandertal Veeramah et al., 2013 An African American genome. Science 328: 710-722. paternal lineage adds an extremely ancient root Gronau, I., M. J. Hubisz, B. Gulko, C. G. Danko and A. to the human Y chromosome phylogenetic tree. Siepel, 2011 Bayesian inference of ancient Am. J. Hum. Genet. 92: 454-459. human demography from individual genome Mondal, M., A. Bergström, Y. Xue, F. Calafell, H. Laayouni sequences. Nat. Genet. 43: 1031-1034. et al., 2017 Y-chromosomal sequences of diverse Haber, M., M. Mezzavilla, A. Bergstrom, J. Prado- Indian populations and the ancestry of the Martinez, P. Hallast et al., 2016 Chad genetic Andamanese. Hum. Genet. 136: 499-510. diversity reveals an African history marked by Pagani, L., D. J. Lawson, E. Jagoda, A. Morseburg, A. multiple Holocene Eurasian migrations. Am. J. Eriksson et al., 2016 Genomic analyses inform on Hum. Genet. 99: 1316-1324. migration events during the peopling of . Hallast, P., C. Batini, D. Zadik, P. Maisano Delser, J. H. Nature 538: 238-242. Wetton et al., 2015 The Y-chromosome tree Patin, E., M. Lopez, R. Grollemund, P. Verdu, C. Harmant bursts into leaf: 13,000 high-confidence SNPs et al., 2017 Dispersals and genetic adaptation of

6 Bantu-speaking populations in Africa and North Shi, H., H. Zhong, Y. Peng, Y. L. Dong, X. B. Qi et al., 2008 America. Science 356: 543-546. Y chromosome evidence of earliest modern Patterson, N., P. Moorjani, Y. Luo, S. Mallick, N. Rohland human settlement in East Asia and multiple et al., 2012 Ancient admixture in human history. origins of Tibetan and Japanese populations. Genetics 192: 1065-1093. BMC Biol. 6: 45. Patterson, N., A. L. Price and D. Reich, 2006 Population Skoglund, P., J. C. Thompson, M. E. Prendergast, A. structure and eigenanalysis. PLoS Genet 2: e190. Mittnik, K. Sirak et al., 2017 Reconstructing Poznik, G. D., 2016 Identifying Y-chromosome prehistoric African population structure. Cell haplogroups in arbitrarily large samples of 171: 59-71 e21. sequenced or genotyped men. bioRxiv. Stamatakis, A., 2014 RAxML version 8: a tool for Poznik, G. D., B. M. Henn, M. C. Yee, E. Sliwerska, G. M. phylogenetic analysis and post-analysis of large Euskirchen et al., 2013 Sequencing Y phylogenies. Bioinformatics 30: 1312-1313. chromosomes resolves discrepancy in time to Underhill, P. A., G. Passarino, A. A. Lin, P. Shen, M. common ancestor of males versus females. Mirazon Lahr et al., 2001 The phylogeography of Science 341: 562-565. Y chromosome binary haplotypes and the origins Poznik, G. D., Y. Xue, F. L. Mendez, T. F. Willems, A. of modern human populations. Ann. Hum. Massaia et al., 2016 Punctuated bursts in human Genet. 65: 43-62. male demography inferred from 1,244 Underhill, P. A., and C. C. Roseman, 2001 The case for an worldwide Y-chromosome sequences. Nat. African rather than an Asian origin of the human Genet. 48: 593-599. Y-chromosome YAP insertion, pp. 43-56 in R Core Team, 2017 R: A language and environment for Genetic, Linguistic and Archaeological statistical computing, pp. R Foundation for Perspectives on Human Diversity in Southeast Statistical Computing, Vienna, Austria. Asia: Recent Advances in Human Biology, edited Rosa, A., C. Ornelas, M. A. Jobling, A. Brehm and R. by L. Jin, M. Seielstad and C. Xiao. World Villems, 2007 Y-chromosomal diversity in the Scientific Publishing, Singapore. population of Guinea-Bissau: a multiethnic van Oven, M., and M. Kayser, 2009 Updated perspective. BMC Evol. Biol. 7: 124. comprehensive phylogenetic tree of global Scally, A., and R. Durbin, 2012 Revising the human human mitochondrial DNA variation. Hum. mutation rate: implications for understanding Mutat. 30: E386-394. human evolution. Nat. Rev. Genet. 13: 745-753. Veeramah, K. R., D. Wegmann, A. Woerner, F. L. Mendez, Scerri, E. M. L., M. G. Thomas, A. Manica, P. Gunz, J. T. J. C. Watkins et al., 2012 An early divergence of Stock et al., 2018 Did our species evolve in KhoeSan ancestors from those of other modern subdivided populations across Africa, and why humans is supported by an ABC-based analysis does it matter? Trends Ecol. Evol. 33: 582-594. of autosomal resequencing data. Mol. Biol. Evol. Schlebusch, C. M., H. Malmstrom, T. Gunther, P. Sjodin, 29: 617-630. A. Coutinho et al., 2017 Southern African ancient Veth, P., I. Ward, T. Manne, S. Ulm, K. Ditchfield et al., genomes estimate modern human divergence to 2017 Early human occupation of a maritime 350,000 to 260,000 years ago. Science 358: 652- desert, Barrow Island, North-West Australia. 655. Quat. Sci. Rev. 168: 19-29. Schlebusch, C. M., P. Skoglund, P. Sjodin, L. M. Weale, M. E., T. Shah, A. L. Jones, J. Greenhalgh, J. F. Gattepaille, D. Hernandez et al., 2012 Genomic Wilson et al., 2003 Rare deep-rooting Y variation in seven Khoe-San groups reveals chromosome lineages in humans: lessons for adaptation and complex African history. Science phylogeography. Genetics 165: 229-234. 338: 374-379. Wei, W., Q. Ayub, Y. Chen, S. McCarthy, Y. Hou et al., Scozzari, R., A. Massaia, B. Trombetta, G. Bellusci, N. M. 2013 A calibrated human Y-chromosomal Myres et al., 2014 An unbiased resource of novel phylogeny based on resequencing. Genome Res. SNP markers provides a new chronology for the 23: 388-395. human Y chromosome and reveals a deep Xue, Y., T. Zerjal, W. Bao, S. Zhu, Q. Shu et al., 2006 Male phylogenetic structure in Africa. Genome Res. demography in East Asia: a north-south contrast 24: 535-544. in human population expansion times. Genetics 172: 2431-2439.

7

Figure 1 Y Chromosome phylogenetic tree from worldwide samples. (A) A maximum-likelihood tree of 180 Y Chromosome sequences from worldwide populations. Different branch colours and symbols represent different haplogroups assigned based on ISOGG v11.01. The Nigerian chromosomes sequenced in this study are highlighted in blue and assigned to the novel D0 haplogroup. Bootstrap values from 1000 replications are shown on the branches. (B) Map showing location of the studied individuals with coloured symbols reflecting the haplogroups assigned in Figure 1A. The clade consisting of the D0 and D haplogroups is represented by blue squares and is observed in Africa and East Asia. (C) Ages of the nodes leading to haplogroup D0 in the phylogenetic tree (point estimates; branch lengths are not to scale). Haplogroups D0 and D are estimated to have split 71,400 (63,100-81,000) years ago while the D0 individuals in this study coalesced 2,500 (2,200-2,800) years ago.

Figure 2 Models for the early movements of Y-chromosomal haplogroups out of Africa and back. (A) Simplified Y- chromosomal phylogeny showing the key lineages, including D0. Lineages currently located in Africa are coloured yellow; those currently outside Africa are blue. Triangle widths are not meaningful, except that they show that E and FT are the predominant lineages inside and outside Africa, respectively. The small orange triangle in FT represents the R1b-V88 back-to-Africa migration that took place after the time period considered here. (B-D) Models for lineage movements that could lead to the present-day African or non-African distributions of lineages, using point estimates of dates derived from the phylogeny (see Table S2). The three models represent migrations out of Africa at different

8 time intervals, indicated by the purple, brown and green shading in A. Arrows in B-D indicate inter-continental movements and their direction, but do not represent particular locations or routes. The first coloured arrow(s) represent the lineage(s) that migrated out, during the time period shown at the top of the maps. Additional uncoloured arrows represent subsequent migrations and their time intervals needed to produce the present-day distributions.

Figure 3 Estimation of the time of the out-of-Africa migration incorporating information from Y-chromosomal lineages (green, this work), archaeological dates (brown, Fu et al. 2014) and ancient DNA (red, Fu et al. 2014).

Figure S1 Y-chromosomal phylogeny as understood before the current study. This simplified phylogeny shows the lineages discussed in the Introduction as relevant to the understanding of the expansion of Y chromosomes out of Africa. Lineages currently located in Africa are coloured yellow; those currently outside Africa are blue.

9

Figure S2 PCA of worldwide populations. (A) PCA using ~600K genome-wide SNPs from global populations shows the Nigerian D0 individuals cluster with sub-Saharan Africans. (B) Magnified area of the PCA containing the Nigerian samples and other genetically close populations.

Figure S3 Number of mutations from the AB root. Violin plots show heterogeneity in the number of mutations that have accumulated within different Y haplogroups.

10 Table S1. SNPs defining the D0 haplogroup Ancestral Haplogroup Haplogroup Haplogroup CHR POS REF ALT Allele D0 D E Notes Y 2668496 G T G T G G SNP on branch D0 Y 2684790 G A,T G A G G SNP on branch D0 Y 2746443 T G T G T T SNP on branch D0 Y 2759823 C A,G C G C C SNP on branch D0 Y 2763977 T C,G T G T T SNP on branch D0 Y 2774691 C T C T C C SNP on branch D0 Y 2794108 C G C G C C SNP on branch D0 Y 2824252 G A G A G G SNP on branch D0 Y 2848910 A G A G A A SNP on branch D0 Y 2850963 C T C T C C SNP on branch D0 Y 2853798 C A C A C C SNP on branch D0 Y 6676225 C T C T C C SNP on branch D0 Y 6708971 G A G A G G SNP on branch D0 Y 6721247 C A,T C T C C SNP on branch D0 Y 6747175 C T C T C C SNP on branch D0 Y 6747201 T A T A T T SNP on branch D0 Y 6753316 C A,G C G C C SNP on branch D0 Y 6797098 A G A G A A SNP on branch D0 Y 6821366 T C T C T T SNP on branch D0 Y 6831372 A C,G A G A A SNP on branch D0 Y 6868284 C G C G C C SNP on branch D0 Y 6889983 C T C T C C SNP on branch D0 Y 6893436 G A G A G G SNP on branch D0 Y 6895235 G A,C G A G G SNP on branch D0 Y 6917321 T C,G T C T T SNP on branch D0 Y 6941294 A T A T A A SNP on branch D0 Y 6949806 G A G A G G SNP on branch D0 Y 7011465 G A G A G G SNP on branch D0 Y 7034919 C A C A C C SNP on branch D0 Y 7079439 G A G A G G SNP on branch D0 Y 7188219 A T A T A A SNP on branch D0 Y 7210681 C T C T C C SNP on branch D0 Y 7242606 G C G C G G SNP on branch D0 Y 7255622 A T A T A A SNP on branch D0 Y 7265843 T C T C T T SNP on branch D0 Y 7270582 G T G T G G SNP on branch D0 Y 7273551 G A G A G G SNP on branch D0 Y 7277533 A G A G A A SNP on branch D0 Y 7341810 C T C T C C SNP on branch D0 Y 7347990 A C,G A G A A SNP on branch D0 Y 7378003 A G A G A A SNP on branch D0 Y 7410122 C T C T C C SNP on branch D0 Y 7570251 A G A G A A SNP on branch D0 Y 7578738 G A,C G A G G SNP on branch D0 Y 7579824 G C G C G G SNP on branch D0 Y 7582262 G T G T G G SNP on branch D0 Y 7620047 A G A G A A SNP on branch D0

11 Y 7622045 C T C T C C SNP on branch D0 Y 7678762 A T A T A A SNP on branch D0 Y 7694383 A C A C A A SNP on branch D0 Y 7716819 A T A T A A SNP on branch D0 Y 7774460 G A G A G G SNP on branch D0 Y 7774466 A C A C A A SNP on branch D0 Y 7774495 G A G A G G SNP on branch D0 Y 7801104 C G C G C C SNP on branch D0 Y 7851738 C G,T C G C C SNP on branch D0 Y 7876921 A G A G A A SNP on branch D0 Y 7893690 C A C A C C SNP on branch D0 Y 7901352 G T G T G G SNP on branch D0 Y 7929183 G T G T G G SNP on branch D0 Y 7982853 A G A G A A SNP on branch D0 Y 8016678 C T C T C C SNP on branch D0 Y 8058955 G A G A G G SNP on branch D0 Y 8060600 A G A G A A SNP on branch D0 Y 8063190 G C,T G T G G SNP on branch D0 Y 8081026 A G A G A A SNP on branch D0 Y 8088679 C G C G C C SNP on branch D0 Y 8090376 A G A G A A SNP on branch D0 Y 8095674 A G A G A A SNP on branch D0 Y 8121376 G A G A G G SNP on branch D0 Y 8155013 C G C G C C SNP on branch D0 Y 8156277 G A G A G G SNP on branch D0 Y 8174610 G A G A G G SNP on branch D0 Y 8184538 A G A G A A SNP on branch D0 Y 8200057 T C T C T T SNP on branch D0 Y 8226806 A G,T A T A A SNP on branch D0 Y 8227145 T A,C T C T T SNP on branch D0 Y 8240623 C A,T C T C C SNP on branch D0 Y 8254941 A C A C A A SNP on branch D0 Y 8280068 C T C T C C SNP on branch D0 Y 8310894 A G A G A A SNP on branch D0 Y 8323918 G T G T G G SNP on branch D0 Y 8329083 T C T C T T SNP on branch D0 Y 8343705 G A G A G G SNP on branch D0 Y 8356071 A C A C A A SNP on branch D0 Y 8361702 C T C T C C SNP on branch D0 Y 8364936 T C T C T T SNP on branch D0 Y 8372651 A G A G A A SNP on branch D0 Y 8445499 C G C G C C SNP on branch D0 Y 8465733 C T C T C C SNP on branch D0 Y 8466101 C T C T C C SNP on branch D0 Y 8467167 G A G A G G SNP on branch D0 Y 8503452 C G C G C C SNP on branch D0 Y 8538145 G T G T G G SNP on branch D0 Y 8575106 T A T A T T SNP on branch D0 Y 8584879 A G A G A A SNP on branch D0 Y 8602076 A C A C A A SNP on branch D0 12 Y 8691053 A C A C A A SNP on branch D0 Y 8722220 C T C T C C SNP on branch D0 Y 8793332 C T C T C C SNP on branch D0 Y 8794276 T G T G T T SNP on branch D0 Y 8797330 G A G A G G SNP on branch D0 Y 8824517 T C,G T C T T SNP on branch D0 Y 8839643 G A G A G G SNP on branch D0 Y 8842697 C T C T C C SNP on branch D0 Y 8856116 C T C T C C SNP on branch D0 Y 8859409 G A G A G G SNP on branch D0 Y 8901707 A T A T A A SNP on branch D0 Y 8989907 G C G C G G SNP on branch D0 Y 9017409 G C G C G G SNP on branch D0 Y 9025308 C A C A C C SNP on branch D0 Y 9037457 G A G A G G SNP on branch D0 Y 9038621 A G A G A A SNP on branch D0 Y 9143448 G A G A G G SNP on branch D0 Y 9393586 A G A G A A SNP on branch D0 Y 9446396 G C G C G G SNP on branch D0 Y 9799632 C G C G C C SNP on branch D0 Y 9806232 A C,T A T A A SNP on branch D0 Y 9832852 T A,C,G T G T T SNP on branch D0 Y 9836568 T G T G T T SNP on branch D0 Y 9869610 C T C T C C SNP on branch D0 Y 13888821 C T C T C C SNP on branch D0 Y 13930633 A C A C A A SNP on branch D0 Y 13992933 T G T G T T SNP on branch D0 Y 13994915 A T A T A A SNP on branch D0 Y 14002365 G A G A G G SNP on branch D0 Y 14003204 G A G A G G SNP on branch D0 Y 14021992 G A G A G G SNP on branch D0 Y 14053839 C T C T C C SNP on branch D0 Y 14081375 C T C T C C SNP on branch D0 Y 14097346 G A G A G G SNP on branch D0 Y 14103822 T C T C T T SNP on branch D0 Y 14109752 C T C T C C SNP on branch D0 Y 14134168 G A,C G A G G SNP on branch D0 Y 14141253 G A G A G G SNP on branch D0 Y 14179641 C T C T C C SNP on branch D0 Y 14187102 T C T C T T SNP on branch D0 Y 14194942 C A C A C C SNP on branch D0 Y 14208960 C T C T C C SNP on branch D0 Y 14211461 A G A G A A SNP on branch D0 Y 14235752 G A G A G G SNP on branch D0 Y 14237523 T C T C T T SNP on branch D0 Y 14245933 A C,T A T A A SNP on branch D0 Y 14246908 G A G A G G SNP on branch D0 Y 14248863 G T G T G G SNP on branch D0 Y 14257701 C G C G C C SNP on branch D0 Y 14260577 A T A T A A SNP on branch D0 13 Y 14280476 T C T C T T SNP on branch D0 Y 14288503 T C T C T T SNP on branch D0 Y 14288505 A T A T A A SNP on branch D0 Y 14305631 G C G C G G SNP on branch D0 Y 14312629 A G A G A A SNP on branch D0 Y 14318958 T C T C T T SNP on branch D0 Y 14330845 G A G A G G SNP on branch D0 Y 14374090 A T A T A A SNP on branch D0 Y 14374809 G A G A G G SNP on branch D0 Y 14399354 G A G A G G SNP on branch D0 Y 14414388 G T G T G G SNP on branch D0 Y 14419460 G C,T G C G G SNP on branch D0 Y 14450770 C T C T C C SNP on branch D0 Y 14500439 G A G A G G SNP on branch D0 Y 14525419 C T C T C C SNP on branch D0 Y 14585420 C G C G C C SNP on branch D0 Y 14609328 G A,C G A G G SNP on branch D0 Y 14666408 G A G A G G SNP on branch D0 Y 14670529 C T C T C C SNP on branch D0 Y 14719988 G A G A G G SNP on branch D0 Y 14747047 C T C T C C SNP on branch D0 Y 14749175 A G A G A A SNP on branch D0 Y 14779913 T C T C T T SNP on branch D0 Y 14804607 T C T C T T SNP on branch D0 Y 14871358 G T G T G G SNP on branch D0 Y 14880739 G A G A G G SNP on branch D0 Y 14899954 G C G C G G SNP on branch D0 Y 14906414 C A C A C C SNP on branch D0 Y 14932294 T A T A T T SNP on branch D0 Y 14937914 T C T C T T SNP on branch D0 Y 14939843 C T C T C C SNP on branch D0 Y 14958661 T G T G T T SNP on branch D0 Y 14968252 G C G C G G SNP on branch D0 Y 14988080 C A C A C C SNP on branch D0 Y 15008294 G A,T G A G G SNP on branch D0 Y 15027821 A C A C A A SNP on branch D0 Y 15091079 A G A G A A SNP on branch D0 Y 15093392 T C T C T T SNP on branch D0 Y 15134900 A C,T A T A A SNP on branch D0 Y 15156004 G A G A G G SNP on branch D0 Y 15263809 G A G A G G SNP on branch D0 Y 15267283 C T C T C C SNP on branch D0 Y 15304575 T C T C T T SNP on branch D0 Y 15329617 C A C A C C SNP on branch D0 Y 15346240 C T C T C C SNP on branch D0 Y 15359753 C T C T C C SNP on branch D0 Y 15361848 G A G A G G SNP on branch D0 Y 15386553 C T C T C C SNP on branch D0 Y 15405581 A G A G A A SNP on branch D0 Y 15407380 C G C G C C SNP on branch D0 14 Y 15409664 T C T C T T SNP on branch D0 Y 15421012 C T C T C C SNP on branch D0 Y 15430391 T C T C T T SNP on branch D0 Y 15454212 T C T C T T SNP on branch D0 Y 15485975 T C T C T T SNP on branch D0 Y 15492613 A G A G A A SNP on branch D0 Y 15556620 G A G A G G SNP on branch D0 Y 15565074 C G C G C C SNP on branch D0 Y 15569204 T C T C T T SNP on branch D0 Y 15573158 G A G A G G SNP on branch D0 Y 15576256 T A,C T C T T SNP on branch D0 Y 15586075 T C,G T C T T SNP on branch D0 Y 15590430 C A C A C C SNP on branch D0 Y 15606025 C T C T C C SNP on branch D0 Y 15649155 C T C T C C SNP on branch D0 Y 15650150 A C,G A G A A SNP on branch D0 Y 15650584 C G C G C C SNP on branch D0 Y 15654719 A G A G A A SNP on branch D0 Y 15654720 G A G A G G SNP on branch D0 Y 15720552 G T G T G G SNP on branch D0 Y 15728677 T C T C T T SNP on branch D0 Y 15741085 G A,C G A G G SNP on branch D0 Y 15753305 G A G A G G SNP on branch D0 Y 15759024 T C T C T T SNP on branch D0 Y 15773171 C T C T C C SNP on branch D0 Y 15813009 C A C A C C SNP on branch D0 Y 15857645 G A G A G G SNP on branch D0 Y 15877770 A C,G A G A A SNP on branch D0 Y 15904335 A G A G A A SNP on branch D0 Y 15915838 G A G A G G SNP on branch D0 Y 15919672 T C T C T T SNP on branch D0 Y 15928533 A C,G A G A A SNP on branch D0 Y 15940113 C T C T C C SNP on branch D0 Y 15949650 T C T C T T SNP on branch D0 Y 15976108 C T C T C C SNP on branch D0 Y 15995957 C T C T C C SNP on branch D0 Y 15997967 C A,T C A C C SNP on branch D0 Y 16207875 T C T C T T SNP on branch D0 Y 16215815 C T C T C C SNP on branch D0 Y 16247178 C T C T C C SNP on branch D0 Y 16250850 T A,C T C T T SNP on branch D0 Y 16285874 C T C T C C SNP on branch D0 Y 16304373 C A,T C T C C SNP on branch D0 Y 16312046 A G A G A A SNP on branch D0 Y 16351218 C T C T C C SNP on branch D0 Y 16362847 G A G A G G SNP on branch D0 Y 16398835 G C G C G G SNP on branch D0 Y 16428359 C T C T C C SNP on branch D0 Y 16435213 A C A C A A SNP on branch D0 Y 16500555 A C A C A A SNP on branch D0 15 Y 16504575 T A T A T T SNP on branch D0 Y 16508626 G A G A G G SNP on branch D0 Y 16547686 G A G A G G SNP on branch D0 Y 16578892 A G A G A A SNP on branch D0 Y 16648170 A T A T A A SNP on branch D0 Y 16688265 G A G A G G SNP on branch D0 Y 16756545 C A C A C C SNP on branch D0 Y 16766202 G T G T G G SNP on branch D0 Y 16790529 C A C A C C SNP on branch D0 Y 16801665 C A,T C T C C SNP on branch D0 Y 16835918 T G T G T T SNP on branch D0 Y 16852006 C A C A C C SNP on branch D0 Y 16874041 A T A T A A SNP on branch D0 Y 16929836 G T G T G G SNP on branch D0 Y 16972336 C A,T C T C C SNP on branch D0 Y 16975173 C T C T C C SNP on branch D0 Y 16981686 G T G T G G SNP on branch D0 Y 16982518 T C T C T T SNP on branch D0 Y 16999327 A C A C A A SNP on branch D0 Y 17070326 C T C T C C SNP on branch D0 Y 17098346 C T C T C C SNP on branch D0 Y 17111150 C T C T C C SNP on branch D0 Y 17131671 G T G T G G SNP on branch D0 Y 17152734 G T G T G G SNP on branch D0 Y 17221253 G A G A G G SNP on branch D0 Y 17232821 A G A G A A SNP on branch D0 Y 17238111 G A G A G G SNP on branch D0 Y 17277598 A G A G A A SNP on branch D0 Y 17300065 C A,T C T C C SNP on branch D0 Y 17323499 G C,T G C G G SNP on branch D0 Y 17372204 C T C T C C SNP on branch D0 Y 17420429 G A G A G G SNP on branch D0 Y 17426707 T C T C T T SNP on branch D0 Y 17465843 G A G A G G SNP on branch D0 Y 17496835 G A,C G A G G SNP on branch D0 Y 17530902 A C A C A A SNP on branch D0 Y 17546996 C T C T C C SNP on branch D0 Y 17569596 G A G A G G SNP on branch D0 Y 17572599 G C G C G G SNP on branch D0 Y 17583888 G T G T G G SNP on branch D0 Y 17600941 T A T A T T SNP on branch D0 Y 17605224 C A C A C C SNP on branch D0 Y 17611578 A G A G A A SNP on branch D0 Y 17612716 C A C A C C SNP on branch D0 Y 17616784 A C A C A A SNP on branch D0 Y 17625292 T C T C T T SNP on branch D0 Y 17631594 C T C T C C SNP on branch D0 Y 17635514 G A G A G G SNP on branch D0 Y 17673481 A G A G A A SNP on branch D0 Y 17708490 A C,G A G A A SNP on branch D0 16 Y 17713132 T A,C T C T T SNP on branch D0 Y 17730899 T A T A T T SNP on branch D0 Y 17774409 G A G A G G SNP on branch D0 Y 17795962 G A G A G G SNP on branch D0 Y 17837771 C T C T C C SNP on branch D0 Y 17839512 T C T C T T SNP on branch D0 Y 17840656 A G A G A A SNP on branch D0 Y 17854404 G A G A G G SNP on branch D0 Y 17871478 A G A G A A SNP on branch D0 Y 17905232 C G C G C C SNP on branch D0 Y 17957591 T C T C T T SNP on branch D0 Y 17963043 G T G T G G SNP on branch D0 Y 17965193 A C A C A A SNP on branch D0 Y 18063984 C T C T C C SNP on branch D0 Y 18066111 C G C G C C SNP on branch D0 Y 18096914 G C,T G C G G SNP on branch D0 Y 18101403 C T C T C C SNP on branch D0 Y 18159807 C A C A C C SNP on branch D0 Y 18206875 C T C T C C SNP on branch D0 Y 18211128 T C T C T T SNP on branch D0 Y 18254196 C T C T C C SNP on branch D0 Y 18562963 T C T C T T SNP on branch D0 Y 18592596 G A G A G G SNP on branch D0 Y 18620031 G A,T G A G G SNP on branch D0 Y 18687010 C A C A C C SNP on branch D0 Y 18708674 C A,T C T C C SNP on branch D0 Y 18749926 T C T C T T SNP on branch D0 Y 18753056 G A G A G G SNP on branch D0 Y 18777364 A G A G A A SNP on branch D0 Y 18806613 C T C T C C SNP on branch D0 Y 18889866 G T G T G G SNP on branch D0 Y 18911543 A G A G A A SNP on branch D0 Y 18964792 G C G C G G SNP on branch D0 Y 18966290 G A G A G G SNP on branch D0 Y 19025196 A G A G A A SNP on branch D0 Y 19092442 C T C T C C SNP on branch D0 Y 19133279 T C T C T T SNP on branch D0 Y 19134383 C T C T C C SNP on branch D0 Y 19155428 G T G T G G SNP on branch D0 Y 19160994 G A G A G G SNP on branch D0 Y 19209312 C T C T C C SNP on branch D0 Y 19267189 A C,G A G A A SNP on branch D0 Y 19277949 T G T G T T SNP on branch D0 Y 19284117 C T C T C C SNP on branch D0 Y 19328534 A T A T A A SNP on branch D0 Y 19367635 T G T G T T SNP on branch D0 Y 19411026 A G A G A A SNP on branch D0 Y 19424791 C T C T C C SNP on branch D0 Y 19426445 C T C T C C SNP on branch D0 Y 19437058 C A,G,T C T C C SNP on branch D0 17 Y 19438023 G A G A G G SNP on branch D0 Y 19448656 C T C T C C SNP on branch D0 Y 19495738 G A G A G G SNP on branch D0 Y 19496243 T C,G T C T T SNP on branch D0 Y 19499938 G A G A G G SNP on branch D0 Y 19501594 C T C T C C SNP on branch D0 Y 19509942 C G,T C T C C SNP on branch D0 Y 19527979 C A C A C C SNP on branch D0 Y 19538674 T C T C T T SNP on branch D0 Y 21073079 C A C A C C SNP on branch D0 Y 21082726 C T C T C C SNP on branch D0 Y 21134961 G A,C G A G G SNP on branch D0 Y 21138948 G T G T G G SNP on branch D0 Y 21143021 A C A C A A SNP on branch D0 Y 21143443 T C,G T C T T SNP on branch D0 Y 21250107 G A G A G G SNP on branch D0 Y 21260548 G A G A G G SNP on branch D0 Y 21263431 C T C T C C SNP on branch D0 Y 21270765 T A T A T T SNP on branch D0 Y 21282240 T C T C T T SNP on branch D0 Y 21284949 A C,G A G A A SNP on branch D0 Y 21297607 G A G A G G SNP on branch D0 Y 21318800 G A G A G G SNP on branch D0 Y 21326964 C T C T C C SNP on branch D0 Y 21353436 G A G A G G SNP on branch D0 Y 21369802 A G A G A A SNP on branch D0 Y 21379315 T G T G T T SNP on branch D0 Y 21447391 A T A T A A SNP on branch D0 Y 21472585 A T A T A A SNP on branch D0 Y 21472598 A G A G A A SNP on branch D0 Y 21481057 G A G A G G SNP on branch D0 Y 21491737 A G A G A A SNP on branch D0 Y 21493978 G T G T G G SNP on branch D0 Y 21537520 A C C A A A SNP on branch D0 Y 21539806 G A,T G A G G SNP on branch D0 Y 21545069 G A G A G G SNP on branch D0 Y 21551016 C T C T C C SNP on branch D0 Y 21555791 G C G C G G SNP on branch D0 Y 21559261 G T G T G G SNP on branch D0 Y 21562163 A C A C A A SNP on branch D0 Y 21571786 C T C T C C SNP on branch D0 Y 21579488 T C T C T T SNP on branch D0 Y 21626431 C A C A C C SNP on branch D0 Y 21635462 G A G A G G SNP on branch D0 Y 21674409 A G A G A A SNP on branch D0 Y 21677887 A G A G A A SNP on branch D0 Y 21686075 G A G A G G SNP on branch D0 Y 21718932 C A C A C C SNP on branch D0 Y 21738640 A T A T A A SNP on branch D0 Y 21817774 G T G T G G SNP on branch D0 18 Y 21862376 G A G A G G SNP on branch D0 Y 21870701 A G A G A A SNP on branch D0 Y 21888494 G A G A G G SNP on branch D0 Y 21914507 C T C T C C SNP on branch D0 Y 21963685 C A C A C C SNP on branch D0 Y 21980440 A T A T A A SNP on branch D0 Y 21991481 C T C T C C SNP on branch D0 Y 22023057 T C T C T T SNP on branch D0 Y 22066023 C T C T C C SNP on branch D0 Y 22090416 C A C A C C SNP on branch D0 Y 22091959 A C,G A G A A SNP on branch D0 Y 22095736 A C A C A A SNP on branch D0 Y 22204423 C A C A C C SNP on branch D0 Y 22206482 T C T C T T SNP on branch D0 Y 22545071 A C A C A A SNP on branch D0 Y 22554520 G T G T G G SNP on branch D0 Y 22570658 C G C G C C SNP on branch D0 Y 22656596 T C T C T T SNP on branch D0 Y 22695457 C A,T C T C C SNP on branch D0 Y 22695458 A G A G A A SNP on branch D0 Y 22713623 G A,C G A G G SNP on branch D0 Y 22741850 C G C G C C SNP on branch D0 Y 22762400 C T C T C C SNP on branch D0 Y 22795562 C A C A C C SNP on branch D0 Y 22828227 C A C A C C SNP on branch D0 Y 22831056 C T C T C C SNP on branch D0 Y 22834876 A G A G A A SNP on branch D0 Y 22859408 G T G T G G SNP on branch D0 Y 22877026 A G A G A A SNP on branch D0 Y 22891209 A G A G A A SNP on branch D0 Y 22957062 T G T G T T SNP on branch D0 Y 22981592 G A G A G G SNP on branch D0 Y 22983357 A T A T A A SNP on branch D0 Y 22989279 T C T C T T SNP on branch D0 Y 22989825 T G T G T T SNP on branch D0 Y 23021904 A G A G A A SNP on branch D0 Y 23036719 C A,T C T C C SNP on branch D0 Y 23045034 T G T G T T SNP on branch D0 Y 23046699 A G A G A A SNP on branch D0 Y 23056406 G C G C G G SNP on branch D0 Y 23079345 G A G A G G SNP on branch D0 Y 23089277 C T C T C C SNP on branch D0 Y 23096248 G A G A G G SNP on branch D0 Y 23098745 A G A G A A SNP on branch D0 Y 23108446 G A G A G G SNP on branch D0 Y 23111012 C A C A C C SNP on branch D0 Y 23122442 A T A T A A SNP on branch D0 Y 23124543 G A G A G G SNP on branch D0 Y 23144492 G A,T G A G G SNP on branch D0 Y 23146811 G C G C G G SNP on branch D0 19 Y 23154044 C T C T C C SNP on branch D0 Y 23155355 A G A G A A SNP on branch D0 Y 23273837 C G C G C C SNP on branch D0 Y 23292618 C T C T C C SNP on branch D0 Y 23348356 T C,G T G T T SNP on branch D0 Y 23350698 T C T C T T SNP on branch D0 Y 23369345 G A G A G G SNP on branch D0 Y 23377470 A G A G A A SNP on branch D0 Y 23382836 C A C A C C SNP on branch D0 Y 23436143 A C,T A T A A SNP on branch D0 Y 23441139 G C G C G G SNP on branch D0 Y 23450248 A G A G A A SNP on branch D0 Y 23462529 G A G A G G SNP on branch D0 Y 23499910 A G A G A A SNP on branch D0 Y 23504298 A T A T A A SNP on branch D0 Y 23523496 G C G C G G SNP on branch D0 Y 23575674 G A G A G G SNP on branch D0 Y 23583903 T C,G T C T T SNP on branch D0 Y 23594139 C T C T C C SNP on branch D0 Y 23749105 T C,G T C T T SNP on branch D0 Y 23758375 C G C G C C SNP on branch D0 Y 23761946 A G A G A A SNP on branch D0 Y 23797822 C A,T C T C C SNP on branch D0 Y 23849932 G A G A G G SNP on branch D0 Y 23852103 G A G A G G SNP on branch D0 Y 23857560 A C A C A A SNP on branch D0 Y 23860258 A G A G A A SNP on branch D0 Y 23872967 G A G A G G SNP on branch D0 Y 23968194 C T C T C C SNP on branch D0 Y 24459882 A G A G A A SNP on branch D0 Y 28604656 G C G C G G SNP on branch D0 Y 28615774 G T G T G G SNP on branch D0 Y 28639397 T C T C T T SNP on branch D0 Y 28654892 G A G A G G SNP on branch D0 Y 28655745 T C T C T T SNP on branch D0 Y 28676222 G A G A G G SNP on branch D0 Y 28705352 C T C T C C SNP on branch D0 Y 28731389 G A G A G G SNP on branch D0 Y 28741469 C T C T C C SNP on branch D0 Y 28747656 T A T A T T SNP on branch D0 Y 28750611 T A,G T A T T SNP on branch D0 Y 28761112 G A G A G G SNP on branch D0 Y 8685674 C T C T C C SNP on branch D0 recurs elsewhere (C1b2a) Y 2708779 A C,G A G A A SNP on branch D0 recurs elsewhere (F2) SNP on branch D0 recurs on E (or back mutation on Y 23554761 A C A C A C D) Y 15281995 G A G A A G SNP on branch D0|D Y 15343571 T C T C C T SNP on branch D0|D Y 15343575 T G T G G T SNP on branch D0|D Y 17545328 C T C T T C SNP on branch D0|D

20 Y 17839184 C T C T T C SNP on branch D0|D Y 19025435 A T A T T A SNP on branch D0|D Y 23286394 T C T C C T SNP on branch D0|D Y 2728927 A G A G G G SNP on branch D0|D|E Y 6931590 T A,C,G T C C C SNP on branch D0|D|E Y 7169899 C T C T T T SNP on branch D0|D|E Y 7320623 G A G A A A SNP on branch D0|D|E Y 7332132 C T C T T T SNP on branch D0|D|E Y 7540450 A G A G G G SNP on branch D0|D|E Y 8072664 C T C T T T SNP on branch D0|D|E Y 8816007 T G T G G G SNP on branch D0|D|E Y 9142309 C T C T T T SNP on branch D0|D|E Y 9851457 G T G T T T SNP on branch D0|D|E Y 14842223 C T C T T T SNP on branch D0|D|E Y 15543875 C A,G,T C T T T SNP on branch D0|D|E Y 15590674 T A T A A A SNP on branch D0|D|E Y 15591537 G A,C,T G C C C SNP on branch D0|D|E Y 17011456 G A,T G A A A SNP on branch D0|D|E Y 18561042 T G T G G G SNP on branch D0|D|E Y 18907927 A G A G G G SNP on branch D0|D|E Y 19338287 T C T C C C SNP on branch D0|D|E Y 19370916 G A G A A A SNP on branch D0|D|E Y 21227423 G T G T T T SNP on branch D0|D|E Y 21306315 G A G A A A SNP on branch D0|D|E Y 21398584 T A,G T A A A SNP on branch D0|D|E Y 21458735 A G,T A G G G SNP on branch D0|D|E Y 21582351 A C,G A G G G SNP on branch D0|D|E Y 21717208 C T C T T T SNP on branch D0|D|E Y 22014732 C A,G C G G G SNP on branch D0|D|E Y 22721522 T C T C C C SNP on branch D0|D|E Y 23109543 T C,G T C C C SNP on branch D0|D|E Y 24392653 C G C G G G SNP on branch D0|D|E

21

Table S2. Split-time estimates using the ρ statistic 95% CI‡ Haplogroup 1† Haplogroup 2 T (kya) Lower Upper AB CT 101.1 89.4 114.7 DE CF 76.5 67.6 86.7 C F 76.0 67.2 86.2 D0|D E 73.2 64.7 83.0 D0 D 71.4 63.1 81.0 Nigerian-1|Nigerian-2 Nigerian-3 2.5 2.2 2.8 Nigerian-1 Nigerian-2 1.9 1.7 2.2 E1 E2 58.9 52.1 66.8 E1b1a E1b1b 47.4 41.9 53.7 E1b1b1a E1b1b1b 39.6 35.0 44.9 E1b1b1a1a E1b1b1a1b 14.8 13.1 16.8 C1 C2 53.3 47.1 60.5 G HIJK 56.9 50.3 64.6 H IJK 56.4 49.9 64.0 IJ K 55.2 48.8 62.6 I J 49.9 44.1 56.6 I1 I2 32.9 29.0 37.3 J1 J2 37.1 32.8 42.1 J2a J2b 32.6 28.8 37.0 LT MNOPS 53.4 47.2 60.6 L T 51.0 45.0 57.8 NO MPS 52.8 46.7 59.9 MS P 52.8 46.7 59.9 M S 52.7 46.6 59.8 N O 46.7 41.2 52.9 O1 O2 38.5 34.0 43.7 Q R 37.7 33.3 42.8 Q1a Q1b 35.4 31.2 40.1 R1 R2 32.4 28.7 36.8 R1a R1b 25.6 22.6 29.0 R1b1a2a2 R1b1a2a1 7.8 6.9 8.9 R1a1a1b1 R1a1a1b2 6.2 5.5 7.1 †Assigned using ISOGG v11.01 ‡ Based on 95% HPD Interval of the mutation rate estimate

22