Supplementary Information

Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences

Sebastian Lippold, Hongyang Xu, Albert Ko, Anne Butthof, Mingkun Li, Gabriel Renaud, Roland Schröder, Mark Stoneking

This supplement contains the complete Methods, Supplementary Figures 1-19, Supplementary Tables 3-10 (Supplementary Tables 1 and 2 are provided as individual Excel files), and associated References.

Methods

Samples and sequencing library preparation

The samples consist of 623 males (Supplementary Table 2) from the CEPH Human Genome Diversity Panel (HGDP) [1]. The samples were taken from the subset “H952”, which excludes atypical, duplicated and closely related samples [2]. Approximately 200 ng of genomic DNA from each sample was sheared by sonication using a Bioruptor system (Diagenode) and used to construct an Illumina sequencing library with a specific double index, as described previously [3]. The libraries were then enriched separately for NRY and mtDNA sequences as described below.

Y-chromosome capture array design

Our strategy in designing a capture array to enrich sequencing libraries for NRY sequences was to target unique regions that are free of repeats and to which the typically short next-generation sequencing reads could be mapped with high confidence. We used the UCSC table browser [4] and the February 2009 (GRCh37/hg19) assembly and applied the following filter criteria. First, from the group “variation and repeats”, sequence regions annotated in the following tracks were removed: Interrupted Repeats, RepeatMasker, Simple Repeats, and Segmental Duplications. Next, we used the “mapability” table “CRG Align 75” from the group “mapping and sequencing tracks” to identify and remove regions with mapability scores below 1. We then removed regions of less than 500 bp in order to reduce the number of fragments and thereby the number of fragment ends, which have low probe densities. We also removed 15mers that occurred more than 100 times in the hg19 genome assembly, as described previously [5], which resulted in splitting some target regions into sub-regions that were less than 500 bp. The final result was a total of ~500 kb of unique NRY sequence, distributed among 655 target regions ranging from 61 bp to 3.9 kb (Supplementary Table 1). These regions were then used to design a custom array (SureSelect 1M capture array, Agilent) with 60-nt probes that were printed twice with a tiling density of 1 bp.
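
The track-based filtering described above amounts to interval arithmetic on chromosome coordinates. The following is a minimal sketch of that idea, not the pipeline actually used: it assumes half-open (start, end) intervals, and the function names and toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end); an assumed convention


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(regions: List[Interval], masked: List[Interval]) -> List[Interval]:
    """Remove every masked stretch (e.g. annotated repeats) from the regions."""
    masked = merge(masked)
    kept: List[Interval] = []
    for start, end in merge(regions):
        cursor = start
        for m_start, m_end in masked:
            if m_end <= cursor:
                continue  # mask lies entirely before the current position
            if m_start >= end:
                break  # remaining masks lie beyond this region
            if m_start > cursor:
                kept.append((cursor, m_start))
            cursor = max(cursor, m_end)
        if cursor < end:
            kept.append((cursor, end))
    return kept


def drop_short(regions: List[Interval], min_len: int = 500) -> List[Interval]:
    """Discard regions shorter than min_len bp."""
    return [(s, e) for s, e in regions if e - s >= min_len]


# Toy example with hypothetical coordinates.
candidates = [(0, 3000)]
repeats = [(400, 600), (1500, 1600)]
targets = drop_short(subtract(candidates, repeats))
print(targets)  # [(600, 1500), (1600, 3000)]
```

In this sketch the length filter is applied after masking; in the procedure described above, the 15mer removal came after the 500 bp filter, which is why some final target regions are shorter than 500 bp.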

NRY enrichment

Up to 60 barcoded libraries were pooled in equimolar ratio. The library mix was enriched for target NRY regions by hybridization capture on the custom-designed array following the protocol described previously [6]. After enrichment, the library pool was quantified by qPCR and then amplified to a total of ~10¹² molecules. The final concentration and length distribution were measured on an Agilent DNA 100 microchip, and 10 nmol of the amplified library pool was used for sequencing. Each pool, consisting of 48-60 samples, was sequenced on a Solexa GAII lane using a paired-end 75-cycle run plus two 7-nt index reads.

MtDNA enrichment

Up to 94 libraries were pooled in equimolar ratio and the library pool was enriched for mtDNA sequences by an in-solution hybridization-capture method [7]. The hybridization eluate was measured by qPCR and then amplified to produce a final concentration of 10 nmol. Up to 200 samples were sequenced on a Solexa GAII lane using a paired-end 75-cycle run plus two 7-nt index reads.

Data processing

In each Solexa GAII lane, 1% PhiX174 phage DNA was spiked in and used as a training set to estimate base quality scores with the IBIS base-caller [8]. Reads with more than 5 bases having a PHRED-scaled quality score below Q15 were discarded, as were reads in which any base of a 7-nt index read had a quality score below Q10. Reads with no mismatches to the expected double-index sequences were assigned to each individual sample library.
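
The read-level filters above can be expressed compactly. This is an illustrative sketch only, assuming quality scores are already available as lists of integers; the function names, the dictionary layout, and the index sequences in the usage note are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def passes_quality_filters(
    read_quals: Sequence[int],
    index_quals: Sequence[int],
    max_low_bases: int = 5,
    read_min_q: int = 15,
    index_min_q: int = 10,
) -> bool:
    """Return True if a read survives both quality filters.

    A read is discarded when more than max_low_bases bases fall below
    read_min_q, or when any index-read base falls below index_min_q.
    """
    low_bases = sum(1 for q in read_quals if q < read_min_q)
    if low_bases > max_low_bases:
        return False
    return all(q >= index_min_q for q in index_quals)


def assign_sample(
    index_pair: Tuple[str, str],
    sample_by_index: Dict[Tuple[str, str], str],
) -> Optional[str]:
    """Assign a read to a sample only on an exact double-index match."""
    return sample_by_index.get(index_pair)
```

For example, with a hypothetical lookup `{("ACGTACG", "TTGCATG"): "sample_1"}`, a read carrying exactly that index pair is assigned to `sample_1`, while a read with a single mismatch in either index returns `None` and is left unassigned.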

For the NRY-enriched data, reads were mapped to the human reference genome (GRCh37) using default settings with BWA v0.5.10 [9]. We mapped to the whole genome rather than just the target region, in order to identify reads that might, with equal probability, map to another position in the genome. The bam files containing the mapping information and reads were processed with samtools v0.1.18 [10]. We used Picard 1.42 (http://picard.sourceforge.net) to mark duplicates, based on the start and end coordinates of the read pairs. The final SNP call was done on all samples simultaneously using the UnifiedGenotyper from the GATK v2.0-35 package [11] and the following options: --output_mode EMIT_ALL_CONFIDENT_SITES, --genotype_likelihoods_model SNP, --min_base_quality_score 20 and --heterozygosity 0.0000000001. The result was stored in a VCF file containing information for each callable site of the target region (Supplementary Table 1), and a second VCF file was created that contained only the variable positions among the 623 samples. For each sample at each variable position the PL scores, which are the normalized, PHRED-scaled likelihoods for the three genotypes (0/0, 0/1, 1/1), were investigated. Positions that showed a difference in the PL value of less than 30 between homozygote reference (0/0) and homozygote alternative (1/1) were removed, as were heterozygote calls (0/1) with a difference greater than or equal to 30 to the most likely homozygous genotype. Sites where more than two bases were called (i.e., multi-allelic sites) were also removed.
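
The PL-based filter above can be sketched as a small decision function. This is one possible reading, not the code used for the study: it assumes each genotype comes as a (PL 0/0, PL 0/1, PL 1/1) triple, represents removed calls as "N", and, as a further assumption not stated in the text, keeps the better homozygote when a heterozygote is favored by less than 30. The function name is hypothetical.

```python
from typing import Tuple


def call_haploid(pl: Tuple[int, int, int], min_diff: int = 30) -> str:
    """Return '0' (reference), '1' (alternative) or 'N' (removed).

    pl holds the normalized PHRED-scaled likelihoods for 0/0, 0/1 and 1/1;
    lower values mean more likely genotypes.
    """
    pl_ref, pl_het, pl_alt = pl
    # The two homozygotes must be separated by at least min_diff.
    if abs(pl_ref - pl_alt) < min_diff:
        return "N"
    best_hom = min(pl_ref, pl_alt)
    # A heterozygote favored by min_diff or more is not credible for the
    # haploid NRY, so the call is removed.
    if pl_het < best_hom and best_hom - pl_het >= min_diff:
        return "N"
    # Assumption: otherwise keep the more likely homozygous genotype.
    return "0" if pl_ref < pl_alt else "1"


print(call_haploid((0, 40, 400)))   # 0
print(call_haploid((0, 10, 20)))    # N
print(call_haploid((60, 0, 100)))   # N
```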

For the mtDNA-enriched data, reads were mapped to the revised mtDNA reference sequence (GenBank number: NC_012920) using the software MIA [12]. The consensus sequences were aligned using MUSCLE v3.8.31 [13] (command line: muscle -maxiters 1 -diags mt_623seq.fasta mt_623seq.aln).

Imputation for the NRY

After quality filtering, there were 2847 variable sites in the 623 individuals, with a total of 2.54% of the individual genotypes at variable positions scored as “N” (i.e., as missing data; the number of missing sites per individual ranged from 9 to 1173, with an average of 122 missing sites per individual). Since missing data can influence the results of some analyses, we took advantage of the fact that the NRY target regions are completely linked with no recombination to impute missing data as follows. First, all sites with no missing data (605 sites) were used as the reference set to define haplotypes and calculate the number of differences between each haplotype. Sites with missing data were then imputed, beginning with the site with the smallest amount of missing data and proceeding sequentially. For each haplotype with missing data for that site, the missing base was imputed as the allele present in the reference haplotype that had the fewest differences (based on the sites with no missing data). After imputation was finished for that site, it was added to the reference set, and the procedure continued for the next site with the smallest amount of missing data.
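
The imputation procedure above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the original implementation: haplotypes are equal-length strings with "N" for missing data, donors for a site are the haplotypes with an observed allele at that site, and ties between equally close donors are broken by input order. The function name is hypothetical.

```python
from typing import List


def impute_nearest_haplotype(haplotypes: List[str], missing: str = "N") -> List[str]:
    """Fill missing alleles site by site from the closest complete haplotype."""
    haps = [list(h) for h in haplotypes]
    n_sites = len(haps[0])
    n_missing = [sum(h[s] == missing for h in haps) for s in range(n_sites)]

    # Reference set: sites with no missing data; it grows as sites are imputed.
    reference = [s for s in range(n_sites) if n_missing[s] == 0]
    # Remaining sites, processed from least to most missing data.
    todo = sorted((s for s in range(n_sites) if n_missing[s] > 0),
                  key=lambda s: n_missing[s])

    def distance(a: List[str], b: List[str]) -> int:
        """Number of differences over the current reference set."""
        return sum(a[s] != b[s] for s in reference)

    for site in todo:
        donors = [h for h in haps if h[site] != missing]
        recipients = [h for h in haps if h[site] == missing]
        if not donors:
            continue  # no information at this site; leave it missing
        # Choose all alleles before writing, so donors stay observed-only.
        fills = [min(donors, key=lambda d: distance(r, d))[site] for r in recipients]
        for recipient, allele in zip(recipients, fills):
            recipient[site] = allele
        reference.append(site)

    return ["".join(h) for h in haps]


toy = ["AACG", "AACN", "TTGA", "TTNA", "TTGA"]
print(impute_nearest_haplotype(toy))
# ['AACG', 'AACG', 'TTGA', 'TTGA', 'TTGA']
```

Because the NRY does not recombine, the closest haplotype over the already complete sites is a reasonable donor for the whole sequence, which is what makes this simple nearest-neighbor rule workable.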

As a check on the accuracy of the imputation, we randomly deleted 2.54% of the known alleles, following the distribution of missing alleles in the full dataset, thereby creating an artificial dataset with a similar distribution of missing alleles as in the observed dataset. We then imputed the missing data according to the above procedure and compared the imputed alleles to the true alleles; this procedure was carried out 1000 times. The imputed allele matched the true allele in 99.1% of the comparisons, indicating that the imputation procedure is quite accurate.
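
A masking experiment of this kind can be sketched as below. The sketch is illustrative only: it masks known alleles uniformly at random, which simplifies the "following the distribution of missing alleles" step described above, and it accepts any imputation function as an argument. The names `masking_accuracy` and `majority_imputer` are hypothetical, and the majority-allele imputer is only a stand-in for the nearest-haplotype procedure.

```python
import random
from collections import Counter
from typing import Callable, List


def majority_imputer(haplotypes: List[str], missing: str = "N") -> List[str]:
    """Stand-in imputer: fill each gap with the most common allele at the site."""
    filled = [list(h) for h in haplotypes]
    for s in range(len(filled[0])):
        observed = [h[s] for h in filled if h[s] != missing]
        if not observed:
            continue
        common = Counter(observed).most_common(1)[0][0]
        for h in filled:
            if h[s] == missing:
                h[s] = common
    return ["".join(h) for h in filled]


def masking_accuracy(
    haplotypes: List[str],
    impute_fn: Callable[[List[str]], List[str]],
    frac: float = 0.0254,
    n_reps: int = 100,
    seed: int = 0,
    missing: str = "N",
) -> float:
    """Mask a fraction of known alleles, impute, and return the match rate."""
    rng = random.Random(seed)
    known = [(i, s) for i, h in enumerate(haplotypes)
             for s, allele in enumerate(h) if allele != missing]
    n_mask = max(1, round(frac * len(known)))
    correct = total = 0
    for _ in range(n_reps):
        cells = rng.sample(known, n_mask)
        masked = [list(h) for h in haplotypes]
        for i, s in cells:
            masked[i][s] = missing
        imputed = impute_fn(["".join(h) for h in masked])
        for i, s in cells:
            correct += imputed[i][s] == haplotypes[i][s]
            total += 1
    return correct / total
```

Calling `masking_accuracy(data, impute_fn)` with the real imputation routine in place of `majority_imputer` would give the fraction of masked alleles that are recovered correctly across replicates.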

Recurrent NRY mutations

We expect the majority of the NRY SNPs to have mutated only once, as recurrent mutations in the known NRY phylogeny are quite rare [14,15]. Therefore, as a further quality control measure, we investigated the NRY data for recurrent mutations by constructing a maximum parsimony tree for the 2847 SNPs using programs in PHYLIP (http://evolution.genetics.washington.edu/phylip.html). We then estimated the number of mutations at each SNP, and removed 48 SNPs (1.7%) that mutated more than twice, as these are likely to reflect sequencing errors. The final dataset contains 2799 SNPs, of which 2228 are from the target resequencing regions, and the remaining 571 are from the ascertained SNPs from dbSNP.
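
Counting how often a SNP must have mutated on a given tree is a standard parsimony calculation. The sketch below uses Fitch's algorithm on a fixed rooted binary tree; it is a swapped-in illustration of the counting step, not a reproduction of the PHYLIP analysis, and the tree, tip names, and alleles are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a tip name or a pair of subtrees


def fitch_changes(tree: Tree, allele_of: Dict[str, str]) -> int:
    """Minimum number of state changes for one site on a rooted binary tree."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {allele_of[node]}, 0
        left, right = node
        l_set, l_changes = visit(left)
        r_set, r_changes = visit(right)
        common = l_set & r_set
        if common:
            return common, l_changes + r_changes
        # Disjoint child state sets imply one additional change.
        return l_set | r_set, l_changes + r_changes + 1

    return visit(tree)[1]


tree = ((("a", "b"), ("c", "d")), ("e", "f"))
sites = {
    "snp_1": {"a": "A", "b": "A", "c": "A", "d": "A", "e": "G", "f": "G"},
    "snp_2": {"a": "G", "b": "A", "c": "G", "d": "A", "e": "G", "f": "A"},
}
# Flag SNPs that require more than two changes on the tree.
recurrent = [name for name, alleles in sites.items()
             if fitch_changes(tree, alleles) > 2]
print(recurrent)  # ['snp_2']
```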

Data Analysis

Basic summary statistics (haplotype diversity, mean number of pairwise differences, nucleotide diversity, Tajima’s D value and theta(S)) were calculated using Arlequin v3.5.1.3 [16]. Arlequin was further used to estimate pairwise ΦST values and for Analysis of Molecular Variance (AMOVA). The pairwise ΦST matrices were used in non-metric Multidimensional Scaling (MDS) analyses, which were carried out in R using the ‘isoMDS’ command from the ‘MASS’ package [17]. The observed ratio of the mean pairwise differences (mpd) for the NRY vs. mtDNA was calculated as mpd_NRY/mpd_mt. In order to detect group-specific deviations from the mean distribution of the mpd ratio in the dataset, we carried out a resampling approach. For each group sample size (N_group) we randomly chose N_group individuals (out of 623) and calculated the mpd ratio using the dist.dna command from the APE package [18] in R. This was repeated 10,000 times for each N_group sample size to obtain the distribution of resampled mpd ratios.
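
The resampling scheme above can be sketched in a few lines. This is an illustration under stated assumptions, not the R code used for the study: sequences are aligned strings, differences are raw mismatch counts (dist.dna in APE may apply a substitution model instead), individuals are drawn without replacement, and the function names are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence


def mean_pairwise_differences(seqs: Sequence[str]) -> float:
    """Mean number of differing positions over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def resample_mpd_ratio(
    nry: Sequence[str],
    mt: Sequence[str],
    n_group: int,
    n_reps: int = 10_000,
    seed: int = 1,
) -> List[float]:
    """Distribution of mpd_NRY/mpd_mt for random groups of n_group individuals.

    nry[i] and mt[i] must belong to the same individual.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_reps):
        idx = rng.sample(range(len(nry)), n_group)
        mpd_mt = mean_pairwise_differences([mt[i] for i in idx])
        if mpd_mt == 0:
            continue  # ratio undefined for this draw
        mpd_nry = mean_pairwise_differences([nry[i] for i in idx])
        ratios.append(mpd_nry / mpd_mt)
    return ratios
```

The observed ratio for a group of size N_group can then be compared against the quantiles of `resample_mpd_ratio(nry, mt, n_group)` to judge whether it deviates from the dataset-wide expectation.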

Divergence times in the NRY and mtDNA phylogenies were estimated using a Bayesian approach implemented in BEAST v1.6.2 [19]. For the mtDNA genome sequences we divided the alignment into two partitions consisting of the coding and noncoding regions, respectively. For both partitions we estimated the best-fitting substitution model using jModeltest [20] and used the mutation rates estimated previously [21]. For the noncoding region we used the GTR+I+G substitution model and a mutation rate of 9.883 × 10⁻⁸ substitutions/site/year, while for the coding region we used the TrN+I+G model and a mutation rate of 1.708 × 10⁻⁸ substitutions/site/year. A strict clock and a constant-size coalescence model were used, and the MCMC was run for 10 million steps with sampling from the posterior every 2000 steps. The MCMC was run on five independent chains in parallel. After careful inspection of the log files in Tracer, the tree files of the five runs were merged after discarding the first 2500 trees (50%) of each run as burn-in. Based on the merged tree file, a consensus tree was built using TreeAnnotator. The consensus tree showing the divergence times for each node was visualized in FigTree (http://tree.bio.ed.ac.uk/software/figtree).

For the NRY sequences the same procedure was used, except that only variable sites were included in the BEAST analysis in order to reduce the computational time. The substitution model used was HKY without I+G, and the substitution rate was multiplied by the number of callable sites (501,108 sites) divided by the number of variable sites (2228 sites). As there is uncertainty regarding the mutation rate, we ran the analysis twice, with a “fast” rate [22] of 1.00 × 10⁻⁹ substitutions/site/year (transformed to 2.25 × 10⁻⁷) and with a “slow” rate [23] of 6.17 × 10⁻¹⁰ substitutions/site/year (transformed to 1.39 × 10⁻⁷).
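
The rate transformation above is a simple rescaling, and the two transformed values can be checked directly. The snippet below is a worked calculation only; the constant and function names are hypothetical.

```python
CALLABLE_SITES = 501_108   # callable NRY sites
VARIABLE_SITES = 2_228     # variable sites included in the BEAST analysis


def scale_rate(per_site_rate: float,
               callable_sites: int = CALLABLE_SITES,
               variable_sites: int = VARIABLE_SITES) -> float:
    """Rescale a per-site rate for an alignment of variable sites only."""
    return per_site_rate * callable_sites / variable_sites


fast = scale_rate(1.00e-9)
slow = scale_rate(6.17e-10)
print(f"{fast:.2e}")  # 2.25e-07
print(f"{slow:.2e}")  # 1.39e-07
```

The scaling factor is about 225, reflecting that each retained variable site stands in for roughly 225 callable sites.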

Bayesian skyline plots [24] were used to estimate population size change through time, using the same mutation rates and substitution models described above. The piecewise-linear skyline coalescence model was chosen and the number of groups (bins) was set to half the sample size per group, with a maximum of 20. A single MCMC chain was run for 30 million steps and sampled every 3000 steps from the posterior. The log file was inspected in Tracer for convergence of the chain and ESS values, and the Bayesian Skyline Reconstruction was run.

Simulations

We used a simulation-based approach to estimate current and ancestral effective population sizes for each regional grouping of populations. We started with the model of population history shown in Supplementary Figure 15, along with associated priors on divergence times (Supplementary Table 5). The model is based on 6 geographic regions, 44 populations, and 511 individuals; we excluded the Adygei, Uygur, Hazara, and all of the ME/NA populations as these exhibit high levels of admixture in genome-wide analyses [25,26]. We then simulated the combined mtDNA and NRY sequences in fastsimcoal [27] and used approximate Bayesian computation [28] to estimate divergence times based on the combined dataset. We simulated 5,808,805 observations, which were log transformed via ABC linear regression [28] using the following statistics: polymorphic sites (S), pairwise differences (Pi), Tajima’s D, pairwise ΦST, and the variance components for an AMOVA based on two groups, Africa versus non-Africa (the latter consisting of the pooled data from the 5 non-African regional groups). Supplementary Table 5 also provides measures of the reliability of the parameter estimation based on the pseudo-observed values: the average R² is 0.9, which exceeds the suggested threshold [29] of 10%; the average coverage is 89% and factor 2 (the proportion of estimated values for the statistics that are within 50-200% of the true value) is 90%; the average bias is 2% and the relative mean square error (RMSE) is 9%. As these measures indicate satisfactory performance of the simulation [29], we retained the top 1000 simulations (tolerance of 0.02%) for estimating the divergence times. In addition, the posterior distributions show a markedly improved fit to the summary statistics, compared to the prior distributions (Supplementary Table 6, Supplementary Figure 17).
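
The retention step above (keeping the simulations closest to the observed data) corresponds to the rejection stage of approximate Bayesian computation. The sketch below illustrates only that stage, under assumptions that are not taken from the text: each summary statistic is scaled by its standard deviation across simulations, distances are Euclidean, and the regression adjustment is omitted. The function name and toy values are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Simulation = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (parameters, statistics)


def abc_rejection(
    simulations: Sequence[Simulation],
    observed: Sequence[float],
    n_keep: int,
) -> List[Simulation]:
    """Keep the n_keep simulations whose statistics are closest to observed."""
    scales = []
    for j in range(len(observed)):
        sd = statistics.pstdev([stats[j] for _, stats in simulations])
        scales.append(sd if sd > 0 else 1.0)

    def distance(stats: Sequence[float]) -> float:
        return math.sqrt(sum(((s - o) / sc) ** 2
                             for s, o, sc in zip(stats, observed, scales)))

    ranked = sorted(simulations, key=lambda sim: distance(sim[1]))
    return ranked[:n_keep]


# Retaining 1000 of 5,808,805 simulations corresponds to this tolerance:
tolerance = 1000 / 5_808_805
print(f"{100 * tolerance:.2f}%")  # 0.02%
```

The parameter values of the retained simulations then serve as a sample from the approximate posterior distribution.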

We then used this history (Supplementary Figure 15) and the mean divergence times based on the combined data (Supplementary Table 5) in a further set of simulations to estimate, separately for the mtDNA and NRY sequences, the ancestral and current effective population sizes for each regional group of populations. We simulated 5,116,984 observations for the mtDNA genome sequences and 5,325,179 observations for the NRY sequences, and retained the top 1000 simulations (tolerance of 0.03%) in each case for parameter estimation. Although the reliability measures indicate greater variance in the simulation results (Supplementary Tables 7 and 9), the posterior distributions still show a markedly improved fit to the summary statistics (Supplementary Tables 8 and 10, Supplementary Figures 18 and 19), indicating satisfactory performance of the simulations. The distributions of the estimated current and ancestral Nf and Nm are shown for each regional group in Supplementary Figure 16.

Supplementary Figure 1. Average coverage for the NRY (top) and mtDNA (bottom) sequences.

Supplementary Figure 2. Comparison of (a) nucleotide diversity and (b) mean number of pairwise differences, based on mtDNA genome sequences from the 623 males vs. all 952 individuals from the HGDP.

Supplementary Figure 3. Bayesian trees for: (a) mtDNA sequences; (b) NRY sequences, with the fast mutation rate; and (c) NRY sequences, with the slow mutation rate.

Supplementary Figure 4. Bayesian tree for sequences from NRY A and B. In this and subsequent trees, all dates are based on the “fast” rate. The only sublineage information is for B2b4, which is monophyletic but does contain one haplogroup B sequence that was not genotyped as haplogroup B2b4.


Supplementary Figure 5. Bayesian tree for NRY haplogroup C sequences. The estimated age for this haplogroup is about 45 ky, in good agreement with previous estimates [15,30,31]. Although several sublineages of haplogroup C are known [14], and there are several clades evident in the tree, the HGDP samples have not been typed for haplogroup C sublineages. The arrow indicates a clade of 6 sequences from the Hazara that dates to about 2 kya. This clade is nested within a clade of sequences from populations from inner , suggesting a migration from this region to the Hazara at around this time.

Supplementary Figure 6. Bayesian tree for NRY haplogroup D sequences. The estimated age of this haplogroup is about 34 ky, more recent than previous estimates [15,32], which could reflect incomplete sampling of haplogroup D diversity in the HGDP populations. The three major sublineages of haplogroup D [14] are all resolved in the tree; moreover, the sequences suggest that D1 and D3 are more closely related, whereas the current NRY tree has D1, D2, and D3 as a trichotomy [14]. The Japanese-specific clade of D2 sequences has an estimated age of about 17 ky, in good agreement with previous dates for the expansion of haplogroup D in Japan [33].

Supplementary Figure 7. Bayesian tree and time scale (based on the fast rate) for NRY haplogroup E sequences. Tips are labeled with the HGDP ID number followed by haplogroup as determined previously by SNP typing [34,35]. There are four major clades in the tree, corresponding to haplogroups E2, E1a, E1b1a and sublineages, and E1b1b1 and sublineages. Some sublineages are monophyletic (e.g., E1b1b1b) while others are not (e.g., E1b1a7a). The red arrow points to a clade that contains most of the non-African sequences, while a blue arrow indicates the subclade that contains all of the European sequences.

Supplementary Figure 8. Bayesian tree for sequences from NRY haplogroups F and G. Haplogroup F in the HGDP is found only in the Lahu population, as haplogroup F2. Haplogroup G has an estimated age of about 28 ky in our data, older than previous estimates of 9.5-17 ky based on STR data [36,37], but more in line with recent sequence-based studies [38,39]. The origin of haplogroup G has been placed in the Caucasus or the Middle East [14], whereas the deepest divergences in our haplogroup G tree are in . There is a clade of 11 haplogroup G sequences that is specific to Palestinians (arrow) and has an age of about 2 ky, but diverged from other haplogroup G sequences (mostly from Europe, along with one Native American Pima sequence that probably reflects recent European admixture) about 10 kya. The deep divergence and recent age for this clade suggest a possible bottleneck or founder event in the history of Palestinians.

Supplementary Figure 9. Bayesian tree for sequences from NRY haplogroups H, K, M, and L. Also included is one haplogroup F sequence that was placed with haplogroup H sequences rather than other haplogroup F sequences, but the support for this grouping is very low. Haplogroups H, H1, and L are monophyletic, but some sequences from haplogroup K form a monophyletic clade (red arrow) while others are interspersed among haplogroup M1 sequences (blue arrow). Haplogroup H has an estimated age of about 31 ky in our data, while haplogroup H1 has an age of 12.5 ky, in good agreement with previous estimates [40]. Our estimated age for haplogroup L is about 15 ky, slightly older than a previous estimate of 10 ky based on STR variation [40]. Haplogroups H and L are found mostly in central and east Asians in the HGDP. Haplogroups K and M1 are found exclusively in Oceania (with the exception of one Xibo with a haplogroup K sequence), and the clade of exclusively haplogroup K sequences has an estimated age of about 35 ky, in good agreement with previous estimates for Oceanic haplogroup K sequences [14,31]. The clade with both M1 and K sequences has a much younger age, only about 11 ky, suggesting a different history for these sequences than for the more divergent clade of exclusively haplogroup K sequences.

Supplementary Figure 10. Bayesian tree for sequences from NRY haplogroups I and J. Sequences from both haplogroups form monophyletic clades, as does subhaplogroup J2; haplogroup I has an estimated age of about 17.5 ky, while haplogroup J has an estimated age of about 23 ky. Haplogroup I is exclusively European in the HGDP, with a clade of 10 Sardinian sequences (red arrow) with an estimated age of nearly 5 ky. Haplogroup J HGDP sequences come from Asia, the Middle East, and Europe; there is a clade in this haplogroup of 14 Bedouin haplogroup J sequences (blue arrow) with an age of about 4 ky.


Supplementary Figure 11. Bayesian tree for sequences from NRY haplogroups NO, N and O. Haplogroup NO is considered the ancestor of haplogroups N and O [14,39], although in the HGDP tree haplogroup NO sequences are related to haplogroup N sequences (represented in the HGDP by only haplogroup N1c). Haplogroup NO sequences in the HGDP are all from . Haplogroup N1c has an estimated age of just over 10 ky in our data, in good agreement with previous estimates [41]. There is a clade of 16 Yakut N1c sequences with an age of nearly 6 ky (red arrow). High frequencies of haplogroup N1c sequences have been reported previously in , but the estimated age in the present study is much older than previous estimates of less than 1 ky for N1c in Yakuts [42]; more sampling of Siberian groups is needed to determine if the clade of N1c sequences in our study is really specific to Yakuts. Haplogroup O has an estimated age of 27.5 ky based on the HGDP sequences, in good agreement with previous estimates [31]. The major subhaplogroups of haplogroup O in the HGDP data include O3, with an estimated age of 21 ky, and haplogroup O3a3c, with an estimated age of 13 ky; haplogroup O sequences are almost exclusively from east Asia.

Supplementary Figure 12. Bayesian tree for sequences from NRY haplogroup Q. Haplogroup Q has an estimated age of 28 ky based on the HGDP sequences, older than previous estimates [43]. There are three Central Asian haplogroup Q* sequences in the HGDP, while within haplogroup Q1a there is a clade (arrow) that contains almost exclusively sequences from the Americas, along with one Mongolian sequence. The age of this clade is about 12.5 ky, in good agreement with current evidence for the initial colonization of the Americas [44].

Supplementary Figure 13. Bayesian tree for sequences from NRY haplogroup R. This haplogroup has an estimated age of 21 ky based on the HGDP sequences, slightly younger than previous estimates of about 27 ky [14]. Whereas in the SNP-based tree of NRY haplogroups R* is ancestral to other R haplogroups, in the HGDP tree the R* sequences diverged from R2 sequences about 11 kya, while R1a1 sequences diverged from R1b1, R1b1b1, and R1b1b2 sequences about 18 kya. R* and R2 sequences are almost exclusively from central Asia, while R1a1 and R1b1 sequences are more widespread and include central Asia, the Middle East, and Europe (mostly Adygei and Russians from eastern Europe). The R1b1b1 sequences are only from the Hazara, and diverged from R1b1b2 sequences (which are almost exclusively European) about 11 kya. The diversity within R1b1b2 dates to about 8 kya, so the age and geographic distribution of subhaplogroup R1b1b2 suggest a possible spread to Europe during the Neolithic [38].

Supplementary Figure 14. Bayesian tree of mtDNA sequences. As the human mtDNA phylogeny is much better characterized than the NRY phylogeny [45], we do not comment on the details.

Ban1030_L0d San1032_L0d San0987_L0d San1029_L0d Ban1034_L0d San0991_L0d San1036_L0d Ban1033_L0d Mbu0467_L0a Mbu0456_L0a Mbu0449_L0a Mbu0982_L0a Mbu0984_L0a Bia0457_L0a Bia0465_L0a Bia0453_L0a Ban0994_L0a Sin0177_L0a Ban1406_L0a Bed0631_L0a Ban1418_L0a Ban1419_L0a Bed0640_L0a Bed0621_L0a Mak0148_L0a San0992_L0k Yor0941_L1b Yor0927_L1b Mak0130_L1b Bed0627_L1b Man0913_L1b Man0907_L1b Man0911_L1b Man0906_L1b Man1286_L1b Man1202_L1b Man0905_L1b Yor0940_L1c Sin0175_L1c Sin0173_L1c Ban1411_L1c Bia0470_L1c Bia1094_L1c Bia0461_L1c Bia0475_L1c Bia0986_L1c Bia0472_L1c Bia0985_L1c Bia0466_L1c Bia0473_L1c Bia1086_L1c Bia0458_L1c Bia0452_L1c Bia0464_L1c Bia0455_L1c Bia1090_L1c Bia0454_L1c Bia0479_L1c Bia0460_L1c Bia0459_L1c Bia0469_L1c Mbu0462_L5a Ban1415_L5a Bed0626_L2b Man1200_L2c Man1284_L2c Man1199_L2c Mbu0478_L2a Mbu0450_L2a Mbu1081_L2a Mbu0463_L2a Mbu0474_L2a Bed0618_L2a Man1285_L2a Man0908_L2a Man1283_L2a Yor0942_L2a Yor0936_L2a Yor0929_L2a Mak0160_L2a Yor0931_L2a Ban1408_L2a Mak0134_L2a Moz1261_L2a Pal0732_L2a Mak0149_L2a Bed0609_L2a Yor0943_L2a Ban0993_L2a Mak0140_L2a Ban1412_L3b Ban1405_L3b Man0912_L3b Yor0932_L3d Man0904_L3d Mak0139_L3d Ban1035_L3d Ban1031_L3d Ban1028_L3d Yor0937_L3e Yor0930_L3e Ban1417_L3e Moz1259_L3e Moz1282_L3e Yor0944_L3e Pal0727_L3e Ban1416_L3h Jap0763_D5a Yak0952_D5a Yak0946_D5a Bra0017_D5a Bur0364_D5a Bur0388_D5a Mia1192_D5a Yak0960_D5b Han0977_D5b Sur0843_D1a Sur0849_D1a Sur0837_D1a Sur0845_D1a Kar1009_D1f Kar0998_D1 Kar1019_D1 Kar1012_D1e Kar1015_D1e Kar1013_D1e Jap0755_D4g Han0782_D4g Dau1221_D4 Tu1349_D4 Lah1320_D4j Lah1322_D4j Pat0254_D4j Yak0969_D4j Mia1191_D4i Mia1190_D4i Dai1308_D4 Yak0954_D4m Yak0968_D4o Hez1239_D4o Han1296_D4o Jap0758_D4i Jap0764_D4i Jap0753_D4a Mon1228_D4c Dau1220_D4c Tu1347_D4h Jap0828_D4b Oro1206_D3 Oro1203_D3 Jap0750_D4b Tu1353_D4b Tuj1102_D4b Han0779_D4b Xib1244_D4b Sin0167_M2a She1333_M7c She1327_M7c Han0973_M7c Han0774_M7c Han0777_M7b Han0780_M7b Han0785_M7b Han0819_M7b Tuj1103_M7b Mia1193_M7b Han1290_M7b 
Jap0757_M7b Han1294_M7b Jap0769_M7a Jap0751_M7a Haz0105_M9a Tu1351_M9a Dau1218_M9a Cam0716_M17 Pat0259_M5a Bur0376_M5a Pat0262_M5a Bra0015_M5a Bra0021_M5a Mak0135_M5a Bra0005_M5a Bal0090_M5a Bra0011_M5a Mak0144_M5a Bra0043_M5a Bal0088_M5a Sin0179_M5a Bal0098_M5a Bur0346_M3a Pat0248_M3a Bal0092_M3a Bal0096_M3a Bra0041_M3a Bra0047_M3a Haz0108_M3a Haz0125_M3a Mak0136_M3a Bal0058_M3a Sin0205_M3a Sin0187_M3a Bra0001_M3c Cam0711_M24 Haz0104_M4 Haz0118_M4 Uyg1297_M18 Sin0183_M18 Bal0070_M65 Sin0201_M65 Bal0064_M65 Sin0181_M30 Sin0165_M30 Pal0726_M30 Mak0146_M30 Sin0208_M30 Yi1181_M11 Yi1180_M11 Oro1207_M11 Cam0720_M51 Cam0714_M51 Tu1348_M12 Bur0428_C4 Yak0949_C4 Yak0953_C4 Hez1240_C4b Yak0965_C4b Yak0962_C4a Ady1397_C4a Yak0958_C4a Tuj1104_C4a Haz0115_C4a Mon1225_C4a Yak0961_C5a Yak0947_C5a Yak0950_C5a Xib1248_C5a Xib1250_C5d Xib1249_C5d Hez1241_C5d Dai1309_C7a Dai1311_C7a Dai1313_C7a Han1289_C7a Nax1340_C7b Col0703_C1d Dau1217_C1a Pim1059_C1b Pim1043_C1b Pim1047_C1b Pim1060_C1b Pim1050_C1b Pim1055_C1c Pim1057_C1c May1037_C1c Haz0103_Z3 Xib1245_Z3a Yak0945_Z3a Haz0120_Z3 Mon1230_Z3 Pap0556_Q1 Pap0555_Q1 Mel0491_Q1c Pap0553_Q1a Pap0549_Q1a Pap0545_Q1a Pap0542_Q3a Pap0541_Q3a Pap0547_Q3a Pap0540_Q3a Yak0948_G1b Haz0122_G2a Dau1216_G2a Dau1213_G2a Jap0790_G2a Lah1317_M13 Bed0611_M1a Dai1310_R9b Han0775_R9b Yi1185_F3b She1329_F4a Haz0106_F2 Yak0951_F2 Han0971_F2 Han1288_F2 Yi1183_F2a Jap0747_F1d Oro1204_F1b Uyg1298_F1b Uyg1300_F1b Jap0752_F1b Jap0759_F1b Yi1186_F1b Yi1179_F1 Haz0121_F1 Yi1182_F1c Oro1208_F1c Lah1321_F1a Cam0717_F1a She1330_F1a Tu1350_F1a Lah1326_F1a Lah1319_F1a Han1292_F1a Tuj1099_F1a Jap0766_F1a Xib1243_F1a Dai1312_F1a Ady1383_R1 Bur0402_R5a Dru1598_R5a Pap0551_P1 Pap0546_P1d Pap0548_P1d Pap0543_P2 Tuj1095_B4a Nax1344_B4a Tuj1101_B4a Mel0662_B4a Mel0655_B4a Mel0788_B4a Dai1307_B4a Mon1224_B4a Nax1339_B4a Nax1342_B4a Tuj1097_B4h Han1295_B4 Uyg1301_B4 Mia1195_B4 Haz0110_B4c Tu1352_B4c Col0710_B2 She1328_B4b Yak0964_B4b Han0815_B4b Jap0767_B5b Han1293_B5b 
She1332_B5a Mia1194_B5a Han0822_B5a Han0821_B5a Cam0715_B5a Yi1187_B5a Han0786_R11 Tuj1100_R11 Xib1246_R11 Bed0620_R0a Dru1600_R0a Bed0628_R0a Bed0616_R0a Pat0230_R0a Pal0678_R0a Kal0281_R0a Kal0319_R0a Kal0285_R0a Kal0333_R0a Bal0082_HV2 Pal0734_HV1 Bed0625_HV1 Pal0729_HV1 Pat0218_HV Bal0080_HV Haz0119_HV Bed0641_HV Tus1161_HV7 Mak0133_HV Mak0150_HV Haz0124_HV Bed0654_HV Ita1152_HV0 Ita1174_V3 Moz1262_V Moz1256_V Bas1374_V1 Fre0519_V1 Bas1372_V Bas1377_HV0 Pal0677_HV0 Bur0412_HV Ita1173_H Mak0131_H Orc0798_H23 Moz1263_H Pal0733_H Ita1155_H7 Dru1602_H7b Haz0109_H7b Moz1255_H Bas1371_H1t Bas1376_H1t Bas1358_H1t Sar1071_H1 Bed0629_H1 Moz1253_H1v Moz1258_H1v Moz1264_H1v Dru1576_H1u Fre0515_H1u Fre0530_H1 Mak0158_H1 Bra0027_H1 Sar0668_H1 Ady1403_H1c Dru1562_H1b Dru0594_H1b Rus0891_H1 Rus0887_H1 Sar1077_H1 Tus1162_H1 Ita1151_H1 Bas1362_H1e Bas1364_H1e Rus0879_H3h Fre0512_H3 Sar0665_H3f Ita1153_H3 Bas1360_H3c Bas1370_H3g Dru1604_H26 Rus0883_H5 Orc0810_H5b Orc0808_H5b Moz1265_H5a Haz0116_H5a Rus0890_H5a Bas1375_H5 Bra0033_H4 Orc0795_H4a Dru1595_H4b Rus0892_H11 Fre0522_H11 Bed0622_H Mak0145_H14 Uyg1304_H8 Uyg1299_H Dru0597_H33 Dru1588_H33 Bra0045_H6a Bra0037_H Bra0013_H Mak0137_H Bur0397_H Tus1164_H13 Pal0675_H13 Bal0066_H13 Bal0054_H13 Bal0086_H13 Bra0025_H13 Tus1166_H13 Bed0645_H2a Ady1404_H2a Sin0163_H2a Rus0900_H2a Bra0007_H2b Pat0224_H2b Rus0894_H2b Bal0078_J1b Bra0023_J1b Bal0052_J1b Bur0341_J1b Pat0264_J1b Sin0171_J1b Ady1396_J1c Sar0666_J1c Sar1069_J1c Sar1073_J1c Oro1205_J1c Bed0624_J1c Pat0216_J1d Sar0670_J2a Sar0674_J2a Bed0623_J2a Moz1266_J2a Moz1271_J2a Bal0076_J2a Kal0288_J2b Kal0267_J2b Kal0330_J2b Pat0251_J2b Sar1076_J2b Bed0639_T1a Bas1378_T1a Pat0222_T1a Bra0031_T1a Bra0029_T1a Pat0234_T1a Fre0525_T1a Rus0893_T1a Fre0521_T1a Kal0307_T2a Bur0445_T2c Bra0039_T2 Sar0671_T2b Sar1063_T2b Rus0880_T2b Rus0895_T2b Bur0359_T2b Pal0724_T2b Orc0804_T2b Sin0169_T2b Bal0057_R2 Bra0003_R2 Bal0072_R2 Bal0094_R2 Mak0141_R2 Uyg1303_R2 Fre0511_U5b Sar1075_U5b Sar1067_U5b 
Sar1079_U5b Fre0533_U5b Bas1359_U5b Rus0882_U5a Pal0676_U5a Haz0100_U5a Haz0112_U5a Fre0528_U5a Uyg1302_U5a Bas1357_U5a Fre0518_U5a Moz1268_U6a Moz1272_U6a Moz1257_U6a Pal0725_U6a Moz1260_U6a Pat0214_U9b Haz0102_U4b Kal0290_U4b Bur0351_U4b Rus0897_U4d Kal0309_U4a Kal0302_U4a Kal0277_U4a Kal0311_U4a Kal0313_U4a Kal0279_U4a Bal0056_U4a Haz0099_U4a Rus0888_U4a Xib1247_U4a Bas1361_K1a Tus1163_K1a Bed0644_K1a Bed0619_K1a Dru1599_K1a Fre0538_K1a Bas1379_K1a Ady1385_K1a Sin0199_K2a Pat0213_K2a Moz1279_U8b Pat0241_U7 Bra0019_U7 Haz0129_U7 Bed0610_U7 Bra0035_U7 Bra0009_U7 Moz1278_U3a Moz1269_U3a Kal0326_U2e Kal0315_U2e Kal0328_U2e Tus1167_U2e Bal0062_U2b Bal0074_U2b Bra0049_U2 Pat0226_U2 Sin0189_U2a Pal0723_U1a Ady1402_U1a Dru1580_U1a Haz0127_U1a Bur0392_U1a Pal0731_U1a Ita1149_U1a Hez1236_Y1a Hez1233_Y1a Dau1214_Y1a Hez1237_Y1a Jap0768_N9a Jap0748_N9a Jap0791_N9a She1331_N9a Jap0762_N9b Nax1338_A11 Nax1341_A4 May0877_A2 May0856_A2 Yi1184_A4 Mon1226_A4a Lah1318_A4c Nax1337_A4c Mon1227_A4 Mon1229_A4 Han0778_A5b Mia1189_A5b Tuj1096_A5 Bal0068_N1d Rus0896_I2 Pal0730_I Bed0608_I5a Bur0407_I1b Bur0382_I4 Bur0438_N1e Bal0060_N1e Bed0630_N1a Bed0642_N1b Pal0722_N1b Sar1066_X2c Bed0648_X2e Bur0423_X2 Bur0433_X2 Mak0143_X2 Orc0803_W6 Pat0243_W3a Sin0185_W3a Pat0228_W3a Pat0258_W3a Bur0372_W3 Bur0417_W3 Mak0161_W Rus0886_W1 Sin0197_W4 Sin0191_W4

175000.0 150000.0 125000.0 100000.0 75000.0 50000.0 25000.0 0.0 Supplementary Figure 15. The model of population history used in simulations. We assumed a single out-of-Africa migration46,47, which begins with the ancestral population in Africa (at time T1), a single out-of-Africa migration (T2), the first split is between Oceania and (T3), then Europe and Asia (T4), followed by Central and East Asia (T5), and finally between East Asia and the Americas (T6). We also required T2 to be greater than T3. The model assumes no migration between regions following divergence; in support of this assumption, there is very little sequence sharing between regions. We do allow changes in population size. This model was first used to estimate divergence times (with priors as specified in Supplementary Table 5) with combined mtDNA and NRY sequences, then the model and estimated mean divergence times (Supplementary Table 5) were used in separate simulations of the mtDNA and NRY sequences to estimate ancestral and current Nf and Nm.

[Figure 15 schematic: divergence times T1 to T6; regional groups, left to right: Africa, Oceania, Europe, Central Asia, East Asia, Americas.]
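As a minimal sketch of how parameter sets consistent with this model could be drawn, the Python snippet below samples the six divergence times from the uniform priors listed in Supplementary Table 5 and rejects any draw that violates the requirement T2 > T3. The function and variable names are illustrative assumptions, and the simulations themselves were not necessarily set up this way.

```python
import random

# Uniform prior bounds (in years) as listed in Supplementary Table 5.
PRIORS = {
    "T1": (100_000, 150_000),
    "T2": (60_000, 100_000),
    "T3": (60_000, 100_000),
    "T4": (40_000, 60_000),
    "T5": (20_000, 40_000),
    "T6": (10_000, 20_000),
}


def sample_divergence_times(rng):
    """Draw one set of divergence times, rejecting draws with T2 <= T3."""
    while True:
        draw = {name: rng.uniform(lo, hi) for name, (lo, hi) in PRIORS.items()}
        if draw["T2"] > draw["T3"]:
            return draw


# Hypothetical usage with a fixed seed for reproducibility.
rng = random.Random(1)
times = sample_divergence_times(rng)
```

Because the remaining prior ranges do not overlap, T2 > T3 is the only ordering constraint that has to be enforced explicitly.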


Supplementary Figure 16. The distribution of Nf and Nm values, according to the density of the top 1% of the values from simulations of the mtDNA and NRY sequences. (a) ancestral effective population sizes; (b) current effective population sizes; (c) box plots of the ratio of Nf to Nm. The dashed line in each plot follows a 1:1 ratio.

[Figure 16 panels: (a) ancestral effective population sizes and (b) current effective population sizes, each shaded by posterior density (0% to 95%); (c) box plots of log10(Nf/Nm), ancestral and present-day, for Africa, Oceania, Europe, Central Asia, East Asia and the Americas.]

Supplementary Figure 17. Prior (blue) and posterior (red) distributions of summary statistics used in the simulations to estimate population divergence times (and thus based on combined mtDNA and NRY sequences), along with the observed value (vertical black line). Regional groups are numbered as follows: 1-Africa; 2-Oceania; 3-Europe; 4-Central Asia; 5-East Asia; 6-Americas. S is the number of polymorphic sites; D is Tajima’s D value; Pi is the mean number of pairwise differences; FST is the ΦST value between each pair of groups; and Va, Vb, and Vc are the among-group, among-populations-within-groups, and within-population variance components of an AMOVA with two groups (African populations and non-African populations).
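For reference, Tajima's D can be computed from the sample size together with S and Pi as defined above. The sketch below uses the standard formula for the statistic; the function name and the example input are illustrative and are not taken from this study.

```python
from math import sqrt


def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, polymorphic sites s and mean pairwise differences pi."""
    if n < 3 or s == 0:
        raise ValueError("Tajima's D needs n >= 3 and at least one polymorphic site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Illustrative input only (not data from this study).
d_example = tajimas_d(n=10, s=16, pi=3.888889)
```

A negative value indicates an excess of low-frequency variants relative to neutral expectations at constant population size.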


Supplementary Figure 18. Prior and posterior distributions and observed values for the summary statistics used in the simulations of mtDNA sequences to estimate ancestral and current Nf for each regional group. See the legend to Supplementary Figure 17 for further explanation concerning the plots.


Supplementary Figure 19. Prior and posterior distributions and observed values for the summary statistics used in the simulations of NRY sequences to estimate ancestral and current Nm for each regional group. See the legend to Supplementary Figure 17 for further explanation concerning the plots.


Supplementary Table 3. Summary statistics for individual HGDP populations. (a) NRY sequences; (b) mtDNA sequences. n, sample size; h, number of different haplotypes (sequences); s, number of polymorphic sites; HD, haplotype diversity; mpd, mean number of pairwise differences; π, nucleotide diversity; SE, standard error of the respective statistic.

a) NRY sequences

Population  n  h  s  HD  HD SE  mpd  mpd SE  π  π SE
Adygei  7  6  81  0.952  0.096  33.8  16.8  0.0152  0.0087
Balochi  24  21  167  0.986  0.018  32.7  14.8  0.0147  0.0074
Bantu  18  16  198  0.987  0.023  34.8  15.9  0.0156  0.0080
Basque  16  14  98  0.975  0.035  15.3  7.2  0.0069  0.0036
Bedouin  27  19  172  0.917  0.047  28.3  12.8  0.0127  0.0064
BiakaPygmy  23  18  153  0.968  0.026  38.6  17.4  0.0173  0.0087
Brahui  25  19  211  0.967  0.024  35.4  16.0  0.0159  0.0080
Burusho  20  17  188  0.984  0.021  34.0  15.5  0.0153  0.0078
Cambodian  6  6  73  1.000  0.096  27.1  13.9  0.0121  0.0072
Colombian  2  2  13  1.000  0.500  13.0  9.5  0.0058  0.0061
Dai  7  7  107  1.000  0.076  37.2  18.5  0.0167  0.0095
Daur  7  7  124  1.000  0.076  45.6  22.6  0.0205  0.0116
Druze  12  10  129  0.970  0.044  38.7  18.1  0.0174  0.0092
French  12  12  133  1.000  0.034  34.6  16.2  0.0155  0.0082
Han  24  24  198  1.000  0.012  27.7  12.6  0.0124  0.0063
Hazara  22  13  147  0.892  0.049  36.4  16.5  0.0163  0.0083
Hezhen  6  6  99  1.000  0.096  40.9  20.8  0.0183  0.0108
Italian  7  7  62  1.000  0.076  22.1  11.1  0.0099  0.0057
Japanese  20  20  182  1.000  0.016  40.7  18.4  0.0183  0.0092
Kalash  18  8  115  0.869  0.049  33.9  15.5  0.0152  0.0078
Karitiana  6  5  14  0.933  0.122  6.9  3.8  0.0031  0.0020
Lahu  7  6  40  0.952  0.096  15.9  8.1  0.0071  0.0042
Makrani  20  19  146  0.995  0.018  29.9  13.6  0.0134  0.0068
Mandenka  15  15  97  1.000  0.024  19.2  9.0  0.0086  0.0045
Maya  2  2  12  1.000  0.500  12.0  8.8  0.0054  0.0056
MbutiPygmy  11  8  135  0.946  0.054  50.4  23.7  0.0226  0.0120
Melanesian  4  4  41  1.000  0.177  20.5  11.6  0.0092  0.0062
Miao  7  5  45  0.905  0.103  18.4  9.3  0.0083  0.0048
Mongola  7  7  117  1.000  0.076  38.5  19.1  0.0173  0.0098
Mozabite  20  11  76  0.763  0.103  16.0  7.5  0.0072  0.0037
Naxi  7  6  71  0.952  0.096  29.0  14.5  0.0130  0.0074
Orcadian  6  5  50  0.933  0.122  22.5  11.6  0.0101  0.0060
Oroqen  6  5  87  0.933  0.122  31.9  16.3  0.0143  0.0085
Palestinian  16  7  91  0.625  0.139  22.2  10.3  0.0100  0.0052
Papuan  13  12  120  0.987  0.035  37.3  17.3  0.0167  0.0088
Pathan  19  16  159  0.977  0.027  26.1  12.0  0.0117  0.0060
Pima  8  7  47  0.964  0.077  13.3  6.7  0.0059  0.0034
Russian  16  16  117  1.000  0.022  25.8  11.9  0.0116  0.0060
San  6  6  164  1.000  0.096  74.7  37.7  0.0335  0.0195
Sardinian  16  11  125  0.908  0.063  24.1  11.2  0.0108  0.0056
She  7  6  45  0.952  0.096  18.8  9.5  0.0084  0.0049
Sindhi  20  19  127  0.995  0.018  25.0  11.5  0.0112  0.0058
Surui  4  3  12  0.833  0.222  6.0  3.6  0.0027  0.0019
Tu  7  7  52  1.000  0.076  19.7  10.0  0.0088  0.0051
Tujia  9  9  126  1.000  0.052  31.2  15.1  0.0140  0.0077
Tuscan  6  6  123  1.000  0.096  45.6  23.1  0.0205  0.0120
Uygur  8  8  130  1.000  0.063  36.5  17.8  0.0164  0.0091
Xibo  8  8  119  1.000  0.063  41.7  20.3  0.0187  0.0104
Yakut  18  10  69  0.843  0.077  14.2  6.7  0.0064  0.0033
Yi  9  9  91  1.000  0.052  29.7  14.3  0.0133  0.0073
Yoruba  12  10  88  0.955  0.057  15.5  7.4  0.0069  0.0038


b) MtDNA sequences

Population  n  h  s  HD  HD SE  mpd  mpd SE  π  π SE
Adygei  7  7  121  1.000  0.076  38.5  19.1  0.0023  0.0013
Balochi  24  23  216  0.996  0.013  36.1  16.3  0.0022  0.0011
Bantu  18  16  330  0.980  0.028  75.8  34.3  0.0046  0.0023
Basque  16  16  126  1.000  0.022  24.0  11.2  0.0015  0.0008
Bedouin  27  27  425  1.000  0.010  57.8  25.7  0.0035  0.0017
BiakaPygmy  23  17  167  0.968  0.022  50.0  22.4  0.0030  0.0015
Brahui  25  22  197  0.980  0.022  33.6  15.2  0.0020  0.0010
Burusho  20  18  257  0.990  0.019  43.1  19.5  0.0026  0.0013
Cambodian  6  6  111  1.000  0.096  44.5  22.6  0.0027  0.0016
Colombian  2  2  61  1.000  0.500  61.0  43.5  0.0037  0.0037
Dai  7  7  105  1.000  0.076  40.4  20.1  0.0024  0.0014
Daur  7  7  98  1.000  0.076  31.6  15.8  0.0019  0.0011
Druze  12  11  118  0.985  0.040  24.6  11.6  0.0015  0.0008
French  12  12  123  1.000  0.034  30.3  14.3  0.0018  0.0010
Han  24  24  289  1.000  0.012  46.0  20.7  0.0028  0.0014
Hazara  22  22  265  1.000  0.014  39.4  17.8  0.0024  0.0012
Hezhen  6  6  75  1.000  0.096  33.1  16.9  0.0020  0.0012
Italian  7  7  61  1.000  0.076  20.1  10.1  0.0012  0.0007
Japanese  20  20  244  1.000  0.016  42.7  19.4  0.0026  0.0013
Kalash  18  15  125  0.974  0.029  35.6  16.3  0.0021  0.0011
Karitiana  6  5  30  0.933  0.122  12.9  6.8  0.0008  0.0005
Lahu  7  6  100  0.952  0.096  41.7  20.7  0.0025  0.0014
Makrani  20  20  279  1.000  0.016  46.9  21.2  0.0028  0.0014
Mandenka  15  12  171  0.971  0.033  56.7  26.0  0.0034  0.0018
Maya  2  2  4  1.000  0.500  4.0  3.2  0.0002  0.0003
MbutiPygmy  11  8  176  0.891  0.092  68.4  32.0  0.0041  0.0022
Melanesian  4  4  70  1.000  0.177  35.5  19.8  0.0021  0.0014
Miao  7  7  115  1.000  0.076  42.8  21.2  0.0026  0.0015
Mongola  7  7  122  1.000  0.076  43.3  21.5  0.0026  0.0015
Mozabite  20  17  174  0.979  0.025  32.4  14.7  0.0020  0.0010
Naxi  7  7  97  1.000  0.076  37.9  18.8  0.0023  0.0013
Orcadian  6  6  75  1.000  0.096  28.4  14.5  0.0017  0.0010
Oroqen  6  5  110  0.933  0.122  46.2  23.4  0.0028  0.0016
Palestinian  16  16  224  1.000  0.022  37.7  17.3  0.0023  0.0012
Papuan  13  12  141  0.987  0.035  37.0  17.2  0.0022  0.0012
Pathan  19  19  242  1.000  0.017  40.9  18.6  0.0025  0.0013
Pima  8  5  26  0.786  0.151  11.8  6.0  0.0007  0.0004
Russian  16  16  167  1.000  0.022  31.6  14.5  0.0019  0.0010
San  6  5  108  0.933  0.122  42.1  21.4  0.0025  0.0015
Sardinian  16  15  144  0.992  0.025  34.0  15.7  0.0021  0.0011
She  7  6  133  0.952  0.096  49.2  24.4  0.0030  0.0017
Sindhi  20  19  297  0.995  0.018  52.3  23.6  0.0032  0.0016
Surui  4  1  0  0.000  0.000  0.0  0.0  0.0000  0.0000
Tu  7  7  133  1.000  0.076  43.3  21.5  0.0026  0.0015
Tujia  9  9  167  1.000  0.052  47.7  22.9  0.0029  0.0016
Tuscan  6  6  75  1.000  0.096  28.0  14.3  0.0017  0.0010
Uygur  8  8  118  1.000  0.063  35.5  17.4  0.0021  0.0012
Xibo  8  8  135  1.000  0.063  43.9  21.3  0.0026  0.0015
Yakut  18  17  193  0.994  0.021  36.6  16.7  0.0022  0.0011
Yi  9  9  152  1.000  0.052  45.9  22.0  0.0028  0.0015
Yoruba  12  12  195  1.000  0.034  57.0  26.5  0.0034  0.0018
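To illustrate how the quantities in Supplementary Table 3 relate to one another, the following sketch computes h, s, HD, mpd and π for a set of aligned sequences. It assumes the usual definitions (haplotype diversity as n/(n-1) times one minus the sum of squared haplotype frequencies, and π as mpd divided by the number of aligned sites); the function name and the toy alignment are hypothetical, and the published values were not necessarily computed this way.

```python
from collections import Counter
from itertools import combinations


def summary_stats(seqs):
    """Basic diversity statistics for equal-length aligned sequences (needs n >= 2)."""
    n = len(seqs)
    length = len(seqs[0])
    counts = Counter(seqs)
    # Number of distinct haplotypes and number of polymorphic alignment columns.
    h = len(counts)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # Haplotype diversity with the n/(n-1) small-sample correction.
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # Mean number of pairwise differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    mpd = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return {"n": n, "h": h, "s": s, "HD": hd, "mpd": mpd, "pi": mpd / length}


# Toy alignment for illustration only (not data from this study).
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
stats = summary_stats(toy)
```

With these definitions, a sample in which every sequence is distinct has HD = 1, and a sample of identical sequences has HD = 0, as for the Surui mtDNA sequences in part (b).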


Supplementary Table 4. Frequencies of (a) NRY and (b) mtDNA haplogroups in the HGDP populations. These haplogroup frequencies provide further insights into the regional differences in mtDNA vs. NRY variation. For example, the comparatively low diversity and smaller differences among populations for the NRY in Africa are due to the high frequency of NRY haplogroup E (55-100% in the non-Khoisan groups). This haplogroup is widespread in western Africa and is also associated with the Bantu expansion34,48. NRY haplogroup E is also of interest because it occurs in some European and ME/NA groups, at frequencies of up to 17%, as well as in two Balochi and one Makrani from Central Asia. Inspection of the phylogeny of haplogroup E sequences (Supplementary Figure 3) reveals that all of the European and most of the ME/NA haplogroup E sequences form a clade distinct from the African haplogroup E sequences, and the age of this clade is about 18 ky. Moreover, all of the European haplogroup E sequences fall into a subclade that is about 14 ky old. These results may reflect a migration from North Africa to Europe suggested by analyses of genome-wide SNP data49, and would thus provide a timeframe for this migration. In Oceania, the larger differences between populations for mtDNA than for the NRY (Figure 2) probably reflect the high frequency of mtDNA haplogroup B in just one of the two Oceania populations (75% in the Melanesian population vs. 0% in the Papuan population). MtDNA haplogroup B is associated with the Austronesian expansion50,51; given the lack of NRY haplogroups associated with the Austronesian expansion (e.g., haplogroup O) in the HGDP Oceania populations, this result further testifies to the larger maternal than paternal impact of the Austronesian expansion on Oceanian populations31,50,52,53.
In the Americas, there are dramatic differences in mtDNA haplogroup frequencies among populations (the Karitiana and Surui are 100% haplogroup D, the Pima are 100% haplogroup C, the Maya are 100% haplogroup A, and the Colombians are 50% haplogroup B and 50% haplogroup C), which are at least partly due to the small sample sizes but are also in keeping with previous studies54. However, all NRY sequences from the Americas fall into one haplogroup, haplogroup Q (with the exception of one Pima with a haplogroup G sequence that likely reflects recent European admixture). While the small number of HGDP males from the Americas precludes any definitive statements, the apparently much greater mtDNA than NRY diversity in the Americas deserves further investigation.

a) NRY haplogroup frequencies

Population  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R
Adygei  0.429  0.286  0.286
Balochi  0.083  0.042  0.167  0.208  0.500
Bantu  0.056  0.056  0.889
Basque  0.063  0.063  0.875
Bedouin  0.037  0.148  0.037  0.704  0.074
BiakaPygmy  0.304  0.696
Brahui  0.040  0.160  0.040  0.280  0.080  0.040  0.360
Burusho  0.050  0.050  0.150  0.050  0.150  0.050  0.500
Cambodian  0.167  0.167  0.667
Colombian  1.000
Dai  0.286  0.714
Daur  0.286  0.143  0.143  0.429
Druze  0.083  0.417  0.417  0.083
French  0.083  0.083  0.167  0.167  0.500
Han  0.083  0.125  0.750  0.042
Hazara  0.364  0.045  0.045  0.091  0.091  0.364
Hezhen  0.333  0.667
Italian  0.143  0.143  0.714
Japanese  0.150  0.300  0.550
Kalash  0.167  0.167  0.111  0.278  0.278
Karitiana  1.000
Lahu  0.714  0.286
Makrani  0.050  0.250  0.200  0.050  0.450
Mandenka  1.000
Maya  1.000
MbutiPygmy  0.455  0.545
Melanesian  0.500  0.500
Miao  0.143  0.857
Mongola  0.143  0.143  0.143  0.429  0.143
Mozabite  0.900  0.100
Naxi  0.286  0.286  0.429
Orcadian  0.333  0.667
Oroqen  0.833  0.167
Palestinian  0.125  0.813  0.063
Papuan  0.231  0.615  0.154
Pathan  0.105  0.105  0.105  0.105  0.579
Pima  0.125  0.875
Russian  0.063  0.250  0.188  0.063  0.438
San  0.500  0.500
Sardinian  0.063  0.063  0.625  0.125  0.125
She  1.000
Sindhi  0.350  0.050  0.050  0.550
Surui  1.000
Tu  0.286  0.571  0.143
Tujia  0.111  0.111  0.778
Tuscan  0.167  0.167  0.167  0.167  0.333
Uygur  0.125  0.125  0.125  0.125  0.500
Xibo  0.375  0.125  0.125  0.125  0.250
Yakut  0.111  0.889
Yi  0.222  0.111  0.667
Yoruba  0.083  0.917

b) MtDNA haplogroup frequencies

Population  A  B  C  D  F  G  H  I  J  K  L  M  N  P  Q  R  T  U  V  W  X  Y  Z
Adygei  0.143  0.286  0.143  0.143  0.143  0.143
Balochi  0.208  0.125  0.333  0.083  0.125  0.125
Bantu  1.000
Basque  0.563  0.125  0.063  0.125  0.125
Bedouin  0.222  0.037  0.074  0.074  0.259  0.037  0.074  0.111  0.037  0.037  0.037
BiakaPygmy  1.000
Brahui  0.040  0.280  0.040  0.320  0.040  0.120  0.160
Burusho  0.050  0.100  0.100  0.100  0.050  0.100  0.050  0.050  0.100  0.100  0.100  0.100
Cambodian  0.167  0.167  0.667
Colombian  0.500  0.500
Dai  0.143  0.429  0.143  0.143  0.143
Daur  0.143  0.286  0.286  0.143  0.143
Druze  0.667  0.083  0.167  0.083
French  0.333  0.083  0.167  0.333  0.083
Han  0.042  0.208  0.042  0.167  0.125  0.333  0.083
Hazara  0.045  0.045  0.091  0.045  0.182  0.227  0.273  0.091
Hezhen  0.333  0.167  0.500
Italian  0.714  0.143  0.143
Japanese  0.050  0.350  0.200  0.050  0.150  0.200
Kalash  0.167  0.222  0.056  0.556
Karitiana  1.000
Lahu  0.143  0.286  0.429  0.143
Makrani  0.300  0.350  0.200  0.050  0.050  0.050
Mandenka  1.000
Maya  1.000
MbutiPygmy  1.000
Melanesian  0.750  0.250
Miao  0.143  0.286  0.429  0.143
Mongola  0.429  0.143  0.143  0.143  0.143
Mozabite  0.300  0.100  0.150  0.350  0.100
Naxi  0.429  0.429  0.143
Orcadian  0.667  0.167  0.167
Oroqen  0.333  0.333  0.167  0.167
Palestinian  0.313  0.063  0.125  0.063  0.063  0.063  0.063  0.250
Papuan  0.308  0.692
Pathan  0.053  0.105  0.158  0.053  0.158  0.053  0.105  0.158  0.158
Pima  1.000
Russian  0.500  0.063  0.188  0.188  0.063
San  1.000
Sardinian  0.250  0.375  0.125  0.188  0.063
She  0.286  0.286  0.286  0.143
Sindhi  0.050  0.050  0.050  0.150  0.450  0.050  0.050  0.150
Surui  1.000
Tu  0.143  0.429  0.143  0.286
Tujia  0.111  0.333  0.111  0.111  0.111  0.111  0.111
Tuscan  0.667  0.167  0.167
Uygur  0.125  0.250  0.250  0.125  0.125  0.125
Xibo  0.375  0.125  0.125  0.125  0.125  0.125
Yakut  0.056  0.444  0.333  0.056  0.056  0.056
Yi  0.111  0.111  0.556  0.222
Yoruba  1.000

Supplementary Table 5. Prior ranges for the divergence times (all priors uniformly distributed), with the posterior mean, mode, and 95% HPD (highest posterior density) intervals. Simulations were based on combined mtDNA and NRY sequences and the model of population history shown in Supplementary Figure 15. Also shown are various statistics related to 1000 pseudo-observed parameter estimations: R2 is the proportion of the variance in the parameters explained by the summary statistics; Bias indicates whether the parameter tends to be over-estimated (positive bias) or under-estimated (negative bias); RMSE (root mean square error) is a distance between the true and estimated values of the parameter; Coverage is the proportion of times the true value for the parameter lies within the 90% credible interval around the parameter estimate; and Factor 2 is the proportion of estimated values that are within 50%-200% of the true value.

Pseudo-observed statistics: R2, Bias, RMSE, Coverage, Factor 2

Parameter  Prior  Mean  Mode  95% HPD  R2  Bias  RMSE  Coverage  Factor 2
T1  100,000-150,000  107,067  102,125  100,175-123,116  0.98  -0.01  0.07  95  1
T2  60,000-100,000  74,916  74,691  63,350-93,892  0.97  0.03  0.13  97  1
T3  60,000-100,000  63,210  61,152  60,200-67,718  0.98  0.01  0.05  100  1
T4  40,000-60,000  49,280  42,637  40,574-58,075  1  0.01  0.06  100  1
T5  20,000-40,000  36,700  38,394  30,475-39,581  0.91  0.03  0.09  92  1
T6  10,000-20,000  15,828  17,798  11,280-19,500  0.99  0.02  0.11  100  1
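The pseudo-observed statistics defined in the legend can be sketched as follows. This is an illustration under stated assumptions: Bias and RMSE are taken here to be relative to the true parameter value, which the legend does not specify, and the function name and example values are hypothetical. R2 is omitted because it depends on the regression between parameters and summary statistics.

```python
from math import sqrt


def validation_stats(true_vals, est_vals, intervals):
    """Accuracy measures over pseudo-observed data sets.

    true_vals: parameter values used to simulate each pseudo-observed data set
    est_vals:  corresponding point estimates
    intervals: (lower, upper) credible interval for each estimate
    """
    n = len(true_vals)
    # Relative error of each estimate (assumption: errors are scaled by the true value).
    rel_err = [(e - t) / t for t, e in zip(true_vals, est_vals)]
    bias = sum(rel_err) / n
    rmse = sqrt(sum(r * r for r in rel_err) / n)
    # Proportion of true values that fall inside the credible interval.
    coverage = sum(lo <= t <= hi for t, (lo, hi) in zip(true_vals, intervals)) / n
    # Proportion of estimates within 50%-200% of the true value.
    factor2 = sum(0.5 * t <= e <= 2.0 * t for t, e in zip(true_vals, est_vals)) / n
    return {"bias": bias, "rmse": rmse, "coverage": coverage, "factor2": factor2}


# Hypothetical values for illustration only.
result = validation_stats(
    true_vals=[100.0, 200.0],
    est_vals=[110.0, 180.0],
    intervals=[(90.0, 120.0), (210.0, 250.0)],
)
```

Coverage and Factor 2 are returned here as proportions between 0 and 1; Coverage in the table appears to be reported as a percentage.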


Supplementary Table 6. Improvement in goodness of fit of posterior vs. prior estimates of summary statistics to observed values, based on simulations to estimate population divergence times using combined mtDNA and NRY sequences. Shown are the Bias and RMSE (defined in the legend to Supplementary Table 5) for each summary statistic (defined in the legend to Supplementary Figure 17) based on the prior and then the posterior distributions. There is a pronounced decrease in both the bias and the RMSE in the posterior estimates, indicating that the simulations are greatly improving the fit to the summary statistics.

Statistic  Observed  Prior Bias  Prior RMSE  Posterior Bias  Posterior RMSE
S_1  1108  2.21  2.27  -0.04  0.08
S_2  296  9.74  10.1  -0.16  0.2
S_3  733  3.82  3.88  -0.28  0.3
S_4  1279  1.81  1.84  -0.01  0.07
S_5  1540  1.3  1.34  -0.03  0.08
S_6  214  13.3  13.92  -0.08  0.15
D_1  -1.744  -3.27  -3.63  -0.4  -0.4
D_2  -0.866  -3.63  -4.24  -0.79  -0.9
D_3  -2.194  -2.57  -2.86  -0.46  -0.48
D_4  -2.301  -2.65  -2.96  -0.03  -0.04
D_5  -2.431  -2.54  -2.84  -0.02  -0.03
D_6  -1.342  -2.64  -3.03  -0.7  -0.75
Pi_1  109.528  13.41  14.78  0.35  0.37
Pi_2  69.971  20.79  22.93  0  0.16
Pi_3  54.731  25.99  28.77  0.27  0.32
Pi_4  69.754  19.52  21.8  0.06  0.11
Pi_5  70.205  18.87  21.19  0.02  0.09
Pi_6  39.398  34.32  38.41  0.24  0.31
PhiST_2_1  0.284  -0.86  0.89  0.12  0.16
PhiST_3_1  0.346  -0.84  0.88  0.06  0.1
PhiST_3_2  0.24  -0.81  0.9  0.52  0.57
PhiST_4_1  0.306  -0.77  0.84  0.23  0.25
PhiST_4_2  0.151  -0.57  1  1.25  1.28
PhiST_4_3  0.072  -0.51  1.16  1.53  1.66
PhiST_5_1  0.31  -0.73  0.82  0.26  0.28
PhiST_5_2  0.131  -0.39  1.14  1.7  1.73
PhiST_5_3  0.159  -0.7  0.9  0.23  0.37
PhiST_5_4  0.093  -0.73  0.98  -0.31  0.36
PhiST_6_1  0.343  -0.78  0.83  0.05  0.1
PhiST_6_2  0.29  -0.76  0.84  0.6  0.62


Supplementary Table 7. Current and ancestral estimates of female effective population size (Nf) based on simulations of the HGDP mtDNA sequences. The simulations assumed the model of population history in Supplementary Figure 15 and the mean divergence time estimates in Supplementary Table 5. Simulations were carried out with a uniform prior distribution on Nf of 1-100,000 for each regional group. The statistics for the pseudo-observed values are as defined in the legend to Supplementary Table 5.

Pseudo-observed statistics: R2, Bias, RMSE, Coverage, Factor 2

Region  Mean  Mode  95% HPD  R2  Bias  RMSE  Coverage  Factor 2

Current sizes
Africa  11,505  11,841  11,052-11,951  0.93  -0.01  0.03  75  1
Oceania  3,509  3,936  3,053-3,952  0.98  -0.02  0.09  74  1
Europe  8,029  8,895  7,111-8,906  0.98  0.01  0.07  91  1
Central Asia  29,513  30,740  28,155-30,853  0.97  0  0.03  80  1
East Asia  100,111  108,787  91,032-109,030  0.97  0  0.06  71  1
Americas  1,802  2,030  1,531-2,070  0.97  0.04  0.1  78  1

Ancestral sizes
Africa  57  10  5-113  0.67  1.96  1.88  82  1
Out-of-Africa  26  5  1-107  0.69  5.48  3.98  75  1
Oceania  52  13  4-112  0.65  2.09  2.21  90  1
Europe  118  23  10-253  0.88  3.09  2.77  73  1
Central Asia  1,663  2,863  372-2,956  0.91  0.19  0.41  97  1
East Asia  4,710  7,274  1,310-8,374  0.98  0.09  0.26  96  1
Americas  90  111  8-1,970  0.87  6.1  3.82  71  1


Supplementary Table 8. Improvement in goodness of fit of posterior vs. prior estimates of summary statistics to observed values, based on simulations to estimate current and ancestral Nf from the HGDP mtDNA sequences. Shown are the Bias and RMSE (defined in the legend to Supplementary Table 5) for each summary statistic (defined in the legend to Supplementary Figure 17) based on the prior and then the posterior distributions. There is a pronounced decrease in both the bias and the RMSE in the posterior estimates, indicating that the simulations are greatly improving the fit to the summary statistics.

Statistic  Observed  Prior Bias  Prior RMSE  Posterior Bias  Posterior RMSE
S_1  563  1.6  1.74  -0.01  0.06
S_2  149  6.76  7.4  -0.01  0.12
S_3  383  2.67  2.82  -0.07  0.09
S_4  755  0.94  1.03  -0.02  0.04
S_5  831  0.69  0.8  -0.03  0.05
S_6  118  7.97  8.89  -0.07  0.15
D_1  -1.343  -2.42  -2.93  -0.08  -0.13
D_2  -0.851  -2.09  -2.79  -0.28  -0.47
D_3  -2.356  -1.62  -1.89  -0.21  -0.21
D_4  -2.381  -1.68  -1.98  -0.01  -0.02
D_5  -2.484  -1.62  -1.91  -0.02  -0.03
D_6  -0.593  -2.4  -3.77  -0.45  -0.83
Pi_1  68.538  6.11  7.31  0.04  0.07
Pi_2  35.309  11.91  14.17  0.06  0.13
Pi_3  24.761  16.66  19.82  0.34  0.35
Pi_4  37.669  10.14  12.31  0.01  0.05
Pi_5  35.255  10.22  12.61  0.02  0.07
Pi_6  27.628  13.23  16.31  0  0.14
PhiST_2_1  0.236  -0.7  0.82  0.05  0.13
PhiST_3_1  0.314  -0.71  0.79  -0.06  0.1
PhiST_3_2  0.255  -0.71  0.87  -0.09  0.16
PhiST_4_1  0.245  -0.54  0.76  0.23  0.25
PhiST_4_2  0.122  -0.2  1.27  0.64  0.69
PhiST_4_3  0.055  0.09  1.78  0.49  0.57
PhiST_5_1  0.279  -0.5  0.73  0.14  0.17
PhiST_5_2  0.117  0.11  1.56  0.89  0.93
PhiST_5_3  0.145  -0.39  0.95  -0.28  0.31
PhiST_5_4  0.049  0.17  1.96  0.04  0.3
PhiST_6_1  0.269  -0.55  0.71  0.09  0.14
PhiST_6_2  0.229  -0.5  0.8  0.38  0.43
PhiST_6_3  0.346  -0.77  0.83  -0.4  0.42
PhiST_6_4  0.175  -0.68  0.86  -0.19  0.27
PhiST_6_5  0.096  -0.81  1  0.11  0.38
Va  25.440  -0.75  0.9  0.13  0.18
Vb  6.350  -0.01  1.27  -0.02  0.14
Vc  68.210  0.28  0.34  -0.05  0.06


Supplementary Table 9. Current and ancestral estimates of male effective population size (Nm) based on simulations of the HGDP NRY sequences. The simulations assumed the model of population history in Supplementary Figure 15 and the mean divergence time estimates in Supplementary Table 5. Simulations were carried out with a uniform prior distribution on Nm of 1-100,000 for each regional group. The statistics for the pseudo-observed values are as defined in the legend to Supplementary Table 5.

Pseudo-observed statistics: R2, Bias, RMSE, Coverage, Factor 2

Region  Mean  Mode  95% HPD  R2  Bias  RMSE  Coverage  Factor 2

Current sizes
Africa  6,565  7,662  4,632-7,898  0.99  -0.01  0.11  100  1
Oceania  2,060  2,172  1,920-2,188  0.92  0  0.04  75  1
Europe  3,815  4,327  2,814-4,456  0.99  0.02  0.11  98  1
Central Asia  8,579  8,888  8,155-8,961  0.97  0  0.03  94  1
East Asia  22,009  22,630  21,113-22,901  0.96  0  0.03  81  1
Americas  685  746  566-789  0.95  0  0.11  79  1

Ancestral sizes
Africa  32  48  2-75  0.69  2.97  2.62  81  0.63
Out-of-Africa  15  10  1-59  0.69  3.27  2.61  75  0.69
Oceania  30  12  3-62  0.67  1.91  2.19  88  0.56
Europe  18  17  1-42  0.7  2.77  2.43  83  0.62
Central Asia  74  122  10-129  0.78  1.18  1.09  89  0.78
East Asia  4,935  4,704  4,269-5,664  0.98  -0.02  0.07  89  1
Americas  21  28  2-45  0.58  2.41  2.39  80  0.64
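As a rough illustration of the Nf/Nm comparison, the snippet below forms the ratio of the posterior mean current sizes in Supplementary Tables 7 and 9 for each regional group. It uses point estimates only, whereas Supplementary Figure 16c is based on distributions of simulated values, so it is a sketch rather than a reproduction of that analysis; the variable and function names are illustrative.

```python
from math import log10

# Posterior mean current effective sizes from Supplementary Tables 7 (Nf) and 9 (Nm).
NF_CURRENT = {
    "Africa": 11505, "Oceania": 3509, "Europe": 8029,
    "Central Asia": 29513, "East Asia": 100111, "Americas": 1802,
}
NM_CURRENT = {
    "Africa": 6565, "Oceania": 2060, "Europe": 3815,
    "Central Asia": 8579, "East Asia": 22009, "Americas": 685,
}


def nf_nm_ratios(nf, nm):
    """Ratio of female to male effective population size per region."""
    return {region: nf[region] / nm[region] for region in nf}


ratios = nf_nm_ratios(NF_CURRENT, NM_CURRENT)
# log10 of the ratio; values above 0 mean Nf exceeds Nm.
log_ratios = {region: log10(value) for region, value in ratios.items()}
```

On these point estimates the current Nf exceeds the current Nm in every regional group.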


Supplementary Table 10. Improvement in goodness of fit of posterior vs. prior estimates of summary statistics to observed values, based on simulations to estimate current and ancestral Nm from the HGDP NRY sequences. Shown are the Bias and RMSE (defined in the legend to Supplementary Table 5) for each summary statistic (defined in the legend to Supplementary Figure 17) based on the prior and then the posterior distributions. There is a pronounced decrease in both the bias and the RMSE in the posterior estimates, indicating that the simulations are greatly improving the fit to the summary statistics.

Statistic  Observed  Prior Bias  Prior RMSE  Posterior Bias  Posterior RMSE
S_1  545  2.14  2.28  -0.11  0.15
S_2  147  8.35  9.02  -0.05  0.14
S_3  350  3.74  3.9  -0.2  0.22
S_4  524  2.28  2.38  -0.08  0.09
S_5  709  1.33  1.43  -0.03  0.05
S_6  96  12.07  13.3  -0.28  0.31
D_1  -2.147  -1.94  -2.23  -0.57  -0.58
D_2  -0.868  -2.12  -2.85  -0.79  -0.91
D_3  -1.995  -1.78  -2.14  -0.4  -0.42
D_4  -2.168  -1.79  -2.16  -0.07  -0.08
D_5  -2.355  -1.69  -2.02  -0.01  -0.02
D_6  -2.228  -1.38  -1.59  -0.98  -1.04
Pi_1  40.990  13.12  15.32  0.72  0.75
Pi_2  34.662  14.72  17.37  0.13  0.2
Pi_3  29.971  16.44  19.48  0.23  0.26
Pi_4  32.085  14.6  17.5  0.05  0.08
Pi_5  34.950  12.48  15.26  -0.02  0.06
Pi_6  11.771  38.71  46.8  0.6  0.74
PhiST_2_1  0.350  -0.8  0.85  -0.05  0.13
PhiST_3_1  0.384  -0.77  0.82  -0.01  0.1
PhiST_3_2  0.227  -0.69  0.87  0.35  0.39
PhiST_4_1  0.377  -0.71  0.79  0.1  0.15
PhiST_4_2  0.182  -0.47  0.95  0.81  0.83
PhiST_4_3  0.088  -0.31  1.16  1.35  1.4
PhiST_5_1  0.344  -0.6  0.74  0.21  0.24
PhiST_5_2  0.145  -0.11  1.26  1.23  1.26
PhiST_5_3  0.173  -0.49  0.88  0.17  0.25
PhiST_5_4  0.136  -0.57  0.91  -0.36  0.38
PhiST_6_1  0.440  -0.73  0.78  -0.09  0.14
PhiST_6_2  0.362  -0.69  0.79  0.29  0.33
PhiST_6_3  0.270  -0.7  0.81  0.21  0.28
PhiST_6_4  0.211  -0.73  0.86  0.15  0.29
PhiST_6_5  0.236  -0.93  0.95  -0.17  0.29
Va  27.910  -0.79  0.9  0.25  0.3
Vb  10.880  -0.42  0.87  -0.01  0.15
Vc  61.220  0.43  0.48  -0.11  0.13


Additional References

32. Rosenberg, N.A. Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann Hum Genet 70, 841-7 (2006).
33. Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010, pdb prot5448 (2010).
34. Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32, D493-6 (2004).
35. Hodges, E. et al. Genome-wide in situ exon capture for selective resequencing. Nat Genet 39, 1522-7 (2007).
36. Hodges, E. et al. Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat Protoc 4, 960-74 (2009).
37. Maricic, T., Whitten, M. & Paabo, S. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS One 5, e14004 (2010).
38. Kircher, M., Stenzel, U. & Kelso, J. Improved base calling for the Illumina Genome Analyzer using machine learning strategies. Genome Biol 10, R83 (2009).
39. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60 (2009).
40. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-9 (2009).
41. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491-8 (2011).
42. Briggs, A.W. et al. Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325, 318-21 (2009).
43. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32, 1792-7 (2004).
44. Underhill, P.A. et al. The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet 65, 43-62 (2001).
45. Excoffier, L. & Lischer, H.E.L. Arlequin suite ver 3.5: a new series of programs to perform analyses under Linux and Windows. Molecular Ecology Resources 10, 564-567 (2010).
46. Venables, W.N. & Ripley, B.D. Modern Applied Statistics with S, (Springer, New York, 2002).
47. Paradis, E., Claude, J. & Strimmer, K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20, 289-90 (2004).
48. Drummond, A.J., Suchard, M.A., Xie, D. & Rambaut, A. Bayesian Phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29, 1969-1973 (2012).
49. Posada, D. jModelTest: Phylogenetic model averaging. Molecular Biology and Evolution 25, 1253-1256 (2008).
50. Soares, P. et al. Correcting for Purifying Selection: An Improved Human Mitochondrial Molecular Clock. American Journal of Human Genetics 84, 740-759 (2009).
51. Drummond, A.J., Rambaut, A., Shapiro, B. & Pybus, O.G. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol 22, 1185-92 (2005).
52. Excoffier, L. & Foll, M. fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27, 1332-4 (2011).
53. Beaumont, M.A., Zhang, W. & Balding, D.J. Approximate Bayesian computation in population genetics. Genetics 162, 2025-35 (2002).
54. Neuenschwander, S. et al. Colonization history of the Swiss Rhine basin by the bullhead (Cottus gobio): inference under a Bayesian spatially explicit framework. Mol Ecol 17, 757-72 (2008).
55. Zhong, H. et al. Global distribution of Y-chromosome haplogroup C reveals the prehistoric migration routes of African exodus and early settlement in East Asia. J Hum Genet 55, 428-35 (2010).
56. Scheinfeldt, L. et al. Unexpected NRY chromosome variation in Northern Island Melanesia. Mol Biol Evol 23, 1628-41 (2006).
57. Shi, H. et al. Y chromosome evidence of earliest modern human settlement in East Asia and multiple origins of Tibetan and Japanese populations. BMC Biol 6, 45 (2008).
58. Hammer, M.F. et al. Dual origins of the Japanese: common ground for hunter-gatherer and farmer Y chromosomes. J Hum Genet 51, 47-58 (2006).
59. de Filippo, C. et al. Y-chromosomal variation in sub-Saharan Africa: insights into the history of Niger-Congo groups. Mol Biol Evol 28, 1255-69 (2011).
60. Cinnioglu, C. et al. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet 114, 127-48 (2004).
61. Semino, O. et al. The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science 290, 1155-9 (2000).
62. Sengupta, S. et al. Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet 78, 202-21 (2006).
63. Rootsi, S. et al. A counter-clockwise northern route of the Y-chromosome haplogroup N from Southeast Asia towards Europe. Eur J Hum Genet 15, 204-11 (2007).
64. Pakendorf, B. et al. Investigating the effects of prehistoric migrations in Siberia: genetic variation and the origins of Yakuts. Hum Genet 120, 334-53 (2006).
65. Hammer, M.F. & Zegura, S.L. The human Y chromosome haplogroup tree: Nomenclature and phylogeography of its major divisions. Annual Review of Anthropology 31, 303-321 (2002).
66. Reich, D. et al. Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania. Am J Hum Genet 89, 516-28 (2011).
67. Wollstein, A. et al. Demographic history of Oceania inferred from genome-wide data. Curr Biol 20, 1983-92 (2010).
68. Wood, E.T. et al. Contrasting patterns of Y chromosome and mtDNA variation in Africa: evidence for sex-biased demographic processes. Eur J Hum Genet 13, 867-76 (2005).
69. Botigue, L.R. et al. Gene flow from North Africa contributes to differential human genetic diversity in southern Europe. Proc Natl Acad Sci U S A 110, 11791-6 (2013).
70. Kayser, M. et al. Melanesian and Asian origins of Polynesians: mtDNA and Y chromosome gradients across the Pacific. Mol Biol Evol 23, 2234-44 (2006).
71. Trejaut, J.A. et al. Traces of archaic mitochondrial lineages persist in Austronesian-speaking Formosan populations. PLoS Biol 3, e247 (2005).
72. Friedlaender, J.S. et al. Melanesian mtDNA complexity. PLoS One 2, e248 (2007).
73. Kayser, M. et al. The impact of the Austronesian expansion: evidence from mtDNA and Y chromosome diversity in the Admiralty Islands of Melanesia. Mol Biol Evol 25, 1362-74 (2008).
74. Schurr, T.G. The peopling of the New World: Perspectives from molecular anthropology. Annual Review of Anthropology 33, 551-583 (2004).
