Journal of Human Genetics (2012) 57, 717–726 & 2012 The Japan Society of Human Genetics All rights reserved 1434-5161/12 $32.00 www.nature.com/jhg

ORIGINAL ARTICLE

Mitochondrial diversity patterns and the resettlement of : new insights from the edge of the Franco-Cantabrian refuge

Antonio F Pardin˜as1, Agustı´n Roca2, Eva Garcia-Vazquez2 and Belen Lopez1

Phylogeography of the mitochondrial lineages commonly found in Western Europe can be interpreted in the light of a postglacial resettlement of the continent. The center of this proposal lies in the Franco-Cantabrian glacial refuge, located in the northern and Southwestern France. Recently, this interpretation has been confronted by the unexpected patterns of diversity found in some European . To shed new lights on this issue, research on Iberian populations is crucial if events behind the actual genetics of the European continent are to be untangled. In this regard, the region of Asturias has not been extensively studied, despite its convoluted history with prolonged periods of isolation. As mitochondrial DNA is a kind of data that has been commonly used in human , we conducted a thorough regional study in which we collected buccal swabs from 429 individuals with confirmed Asturian ancestry. The joint analysis of these sequences with a large continent-wide database and previously published diversity patterns allowed us to discuss a new explanation for the population dynamics inside the Franco-Cantabrian area, based on range expansion theory. This approximation to previously contradictory findings has made them compatible with most proposals about the postglacial resettlement of Western Europe. Journal of Human Genetics (2012) 57, 717–726; doi:10.1038/jhg.2012.100; published online 16 August 2012

Keywords: Iberia; isolation; linear model; migration; private lineages

INTRODUCTION shown these clines to be consistent and compatible with a postglacial The genetic diversity of European populations has been an intensely (formerly late-glacial) resettlement of most European regions, which researched topic because of its implications in the reconstruction of would be centered in this refuge.7,8 the history of the human species.1 In the last decade, many efforts Lately, this idea has been criticized with the argument that while have been dedicated to the search for patterns that could be related to the frequencies of European postglacial haplogroups are the highest in known prehistoric events; for example, migrations during the Last Franco-Cantabrian populations, they also show their lowest molecular Glacial Maximum or the demic diffusion.2 Uniparental diversities there.9 Furthermore, marked heterogeneities in markers, such as mitochondrial DNA (mtDNA) and the composition appear even between geographically close populations of Y-chromosome, have proven increasingly useful for delving into this area, which are probably related to genetic drift due the steep these patterns, allowing for a mostly accepted chronology of key relief of northern Iberia. Both evidences point to the possibility that events for European demography to be established throughout the the Franco-Cantabrian genetic makeup may have evolved discretely years.3 Nevertheless, discussion of some of the inferences related to and differently from the rest of Europe, and may not be adequate for this chronology has caused controversies that remain open, including inferring continental patterns. Following this view, the proposed the presumed repopulation of most of Western Europe during the repopulation should be centered in other refuges, such as those Magdalenian (ca.17–12 ka).4 located in Southern France,10 ,11 the Balkan Peninsula12,13 or the The most common interpretation of the phylogeography of .14 European mtDNA haplogroups currently supports this theory, as But these low diversities can be interpreted in another way. It is there is a strong geographical southwestern–northeastern cline in the known that human population size in Europe declined from the frequencies of haplogroups with postglacial coalescence times, such as beginning of the Upper Paleolithic until the Bølling-Allerød inter- H1, H3 and V.5 Frequency peaks for those lineages can all be found in stadial, because of a dietary breadth contraction caused by prey the northern Iberian Peninsula and the southwestern part of France, extinction.15 A likely effect of this would have been an enhancement in a territory which formed a glacial refuge called the ‘Franco- of genetic drift coupled with recurrent population bottlenecks, which Cantabria’.6 Molecular dissection studies and intensive samplings have are the major factors in the shaping of genetic diversity.16 During the

1Departamento de Biologı´a de Organismos y Sistemas, Universidad de Oviedo, Oviedo, and 2Departamento de Biologı´a Funcional, Universidad de Oviedo, Oviedo, Spain Correspondence: Professor B Lopez, Departamento de Biologı´a de Organismos y Sistemas, Universidad de Oviedo, C/Catedratico Rodrigo Uria s/n, Oviedo, Asturias 33006, Spain. E-mail: [email protected] Received 11 May 2012; revised 9 July 2012; accepted 12 July 2012; published online 16 August 2012 MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 718

Last Glacial Maximum, conditions would have been worsened, as population (which would also increase due to mutation and the paleoecological modeling has shown the maximum range of the assimilation of the remnants of indigenous populations that survived glacial refuges to be small and constrained by permafrost areas.17 In in cryptic refugia).12 this situation, carrying capacity of the Franco-Cantabrian refuge As for the heterogeneity found even in close extant Franco- would not have been able to sustain a large, expanding population, Cantabrian populations, it is not surprising in the context of the and a series of fragmented populations would have been more likely, Iberian Peninsula, which is a crossroad between mainland Europe, as has been inferred to have occurred in regions of the refuge in a Northern Africa, the Atlantic Ocean and the Mediterranean Sea.27 The previous ancient DNA study.18 In such a situation, incoming varied Iberian geography favors the existence of barriers and isolated migration promotes increased competition for resources in the area, populations all throughout the territory, which calls for detailed studies which in turn hinders population growth; meanwhile, fragmentation to account for localized historical processes.28 Thesehavetobepaired promotes ‘refuge effects’19 in which local populations can be entirely with extensive sampling efforts to have a chance of identifying the monomorphic, but the majority of lineages remain somewhere in the historic and demographic processes behind the distribution of mtDNA metapopulation.20 Such heterogeneity can be indirectly seen by the lineages.29 On that basis, one of the most understudied regions of the diversification and coexistence of Epigravettian and Solutrean cultures peninsula is the Principality of Asturias, which is located precisely in during glacial times.17,21 After the improvement of climatic the western edge of the Franco-Cantabrian refuge (Figure 1). conditions, a major increase of carrying capacity would occur So far, mtDNA studies in Asturias have been clinically oriented and outside the refuge, promoting a range expansion of the refugee based on classical restriction fragment-length polymorphisms ana- population and migration towards the newly available area. This lyses30,31 or included only a few samples as a part of broader would be paired with an increase of the metapopulation size, studies.32,33 To our knowledge, the only other sizeable population especially in the margins of the inhabited area, as has been inferred genetic study is the one from Garcia et al.,9 which used 89 samples by archeological findings for this period.22 After migration, collected in Asturias. They noted the aforementioned pattern of local demographic expansions would follow, and if gene flow with the drift as a possible cause for the differences between Asturias and other former refugia is not interrupted, genetic diversity will increase.23 This regions of the Franco-Cantabrian area, but a greater number of is compatible with the neutrality statistics and mismatch distributions samples is likely needed to achieve an unbiased representation of of most European populations, which bear the signal of a population Asturian mtDNA lineages.34 Recent immigration should also be taken expansion.8 As for the parent population, still in the refuge, it will into account to avoid admixture biases in the sample, at least to some only expand if the carrying capacity of the area rises, which can be extent. Thus, we decided to perform an extensive ancestry-restricted related with the fact that Franco-Cantabrian populations have been population-wide study. The aim of the study was to perform spatial inferred to begin their expansion in the Neolithic (after the analyses that can be used to test the hypothesis of a postglacial range introduction of agriculture), whereas populations from Southern- expansion compatible with the present-day genetic makeup of Central Europe show earlier tardiglacial expansion times.24 In these Western Europe. For that matter, new mtDNA sequences would be conditions, it becomes possible that a glacial refuge, due to a long- obtained from a population in the border of the presumed center of term small carrying capacity and internal extinction–recolonization the expansion, which would be used to obtain picture of the pattern processes,25 might nowadays show low genetic diversity values in of local heterogeneities and drift processes. comparison with outside areas repopulated from that refuge. If our hypothesis is true, three variables should positively correlate MATERIALS AND METHODS with the diversity of expanded lineages: geographical distance to the Study area providing new data within the Franco-Cantabrian refuge (as northern territories would have been more densely refuge 4 repopulated), effective population size (which should be regarded Asturias is a Spanish autonomous community of around one million as an historical mean)26 and the number of private lineages of each inhabitants, unequally distributed between the densely populated cities of

Figure 1 Map of the Iberian Peninsula showing the location of the Principality of Asturias (black). The Franco-Cantabrian region is shaded in gray. Names are given of populations mentioned in the text.

Journal of Human Genetics MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 719 the center and many smaller scattered towns. Its territory, spanning Statistical analyses 10 000 km2, is the most mountainous region of Spain and one of the most The software Arlequin v3.51251 was used to obtain estimates of the haplotype mountainous region of Europe, with political limits that have existed since the and nucleotide diversities described by Nei52 and Tajima,53 respectively. Also, 35 Roman times. After the Roman invasion of the peninsula (29–19 BC), no the number of private lineages found in each population was obtained, defined important population movements into the territory were historically recorded, as the number of lineages that are only found in a given population when this as its difficult terrain kept it outside the main routes of migration into Iberia is compared with all the others of our HVS-I data set. Comparison of the for many centuries.36 In fact, in medieval times, Asturias was described as an haplotype diversities of the pairs of populations was performed with the isolated and impoverished province on the northern margin of the Spanish Test_h_diff software,54 which implements a Bayesian procedure. Differentiation 37 kingdom, a status that hardly changed even after the Industrial Revolution. between populations was assessed in terms of FST pairwise values, which were Only in the years following 1950, the development of coal-mining facilities in computed using the Tamura and Nei55 distance between sequences, with an the region improved the economy and favored the immigration of entire unequal transition/transversion weighting of 4/5.56 To depict population families from other Spanish regions, shaping the Asturian demography as it is relationships in a simpler manner, Slatkin57 generalized distances that were 38 today. computed from the pairwise FST values were plotted into a two-dimensional multidimensional scaling, with the algorithm ALSCAL included in the software SPSS v17.58 Subjects 59 The Watterson estimator yK (also called ‘population-mutation rate Samples of buccal cells were obtained from 429 volunteers recruited in various parameter’) was also computed according to the Ewens sampling formula.60 cities and towns of Asturias during the period 2009–2011, who confirmed This value is an approximation, based on the number of haplotypes, to the having at least two generations of autochthonous maternal ancestry (i.e., widely known composite 2Nef Á m, in which Nef is the effective number of grandmothers born in Asturias). No volunteers less than 18 years of age were females of the population and m is the mutation rate per nucleotide of the allowed in the study, informed consent was obtained from each sample donor, HVS-I region. Previous studies have defined the existence of a positive, and the study conformed to the Spanish Law for Biomedical Research (Law 14/ significant correlation between yK and the proportion of private lineages of a 2007—3 July). Sampling was performed using swabs with Dacron tips group of populations.50,61 This is because private lineages arise by mutation of (Deltalab, Barcelona, Spain), and a protocol based on the Chelex-100 resin a previously existing haplotype, which increases the y estimates under the 39 K (Bio-Rad, Hercules, CA, USA) was used for DNA extraction. theoretical framework of an infinite-allele mutation model. To properly compare different populations on the basis of their diversity Sequence amplification and analyses patterns, sampling coverage estimates (i.e., the proportion of the true number From each sample, the complete hypervariable segment I (HVS-I) of the of different haplotypes that have actually been sampled) in each population 62 control region of the mtDNA was amplified by PCR using the primers L15900 were calculated with the R statistical framework using the unseen2 package, 63 and H00016.40 Two independent amplifications were performed to reduce the which incorporates the formulae developed by Egeland and Salas for this possibility of sequence ambiguities derived from low-quality bases. Sequencing matter. This same framework was also used to fit linear models and to was carried out in an ABI 3730xl DNA Analyzer by the company Macrogen compute Moran autocorrelation indexes for some statistics, the latter using 64 (Seoul, Korea). package ape. Spatial mappings of population genetic statistics were 65 Sequences were analyzed using the software Geneious v4.85,41 in which the performed with the Quantum GIS v1.73 software, whereas interpolation highest quality consensuses were obtained and aligned with the revised routines shown in the graphics were performed with the Kriging algorithm of 66 Cambridge Reference Sequence.42 This alignment was checked manually for the SDA4PP plugin. Distances between two populations, when needed, were 67 phylogenetic consistence following the recommendations of Bandelt and computed from geographical coordinates using GDMG v1.23. Parson43 and the EMPOP online tool.44 Haplogroup assignment of the Estimation of migration rates between populations was performed with the 68 samples was performed with the software HaploGrep,45 using version 13 of software MIGRATE-N v3.216. For each model that we tested, we used 80 the mtDNA phylogenetic tree as a reference.46 Samples were included in random sequences from each included population and executed 5 separate when various equally probable assignments were possible due to runs under the Bayesian implementation. Each analysis involved two long the absence of diagnostic mutations. All samples that obtained a HaploGrep Markov chains of 10 million generations, with a standard uniform slice priors quality index lower than 80% and all samples allocated to the HV* on the M and y parameters. A static heating scheme with four chains and were analyzed further by genotyping the coding-region single-nucleotide standard temperatures was also used. Model likelihood was obtained by a polymorphisms adequate for restriction fragment-length polymorphism thermodynamic integration with Bezier approximation, as implemented in the assays,47 to assure the accuracy of their phylogenetic classification. Also, all software. Finally, the median parameter values of each run were compared with samples allocated to the H* paragroup were restriction fragment-length the 95% credibility intervals (95% CI) of all other runs to ensure their fit and polymorphism-tested for the presence of diagnostic mutations for the H1 remove outliers if any. and H3 haplogroups.48 RESULTS Additional data acquisition Haplogroup composition and diversity of the Asturian population Asturian sequences were compared with an HVS-I data set of 9539 sequences The HVS-I sequences from the Asturian volunteers, along with their from 45 European and North African populations (Supplementary Table 1). To haplogroup assignment, can be found in Supplementary Table 2. A include the maximum number of sequences in this data set, all of them were total of 63 different haplogroups were identified in the sample, and trimmed to a length of 342 nucleotides between positions 16 024 and 16 366 of their frequencies are shown in Table 1. The haplogroup composition the mitochondrial genome. This range is sufficient for our analyses, as it of Asturias is dominated by nearly a 50% of the haplogroup H, as includes the most polymorphic interval of the entire hypervariable region.49 could be expected given a Southern European Atlantic location. A Data for the diversity of the haplogroups H1, H3, K, T2b, W, HV0 and V previous study highlighted the high frequency of haplogroup J in the were computed from the Asturian haplotypes and obtained from the Garcia region, which amounts to a 10.51% in our results (95% CI 7.85– et al. study9 for nine European regions and Northern Africa, including the 13.71) and seems to be a local peak in respect to neighboring Franco-Cantabrian area. Values for the yK and KP parameters were also 9 computed for these regions using the most approximate populations of our populations. About other haplogroups commonly encountered in HVS-I data set (Supplementary Table S3), with the aim of performing model the Franco-Cantabrian area, another peak appears in the frequency of fitting on the diversity data. For Central and Eastern European populations not T2b, which is a 4.66% of the sample (95% CI 2.94–7.04). present in our data set, estimates previously computed by Helgason et al.50 Interestingly, this frequency is higher than those reported on the were used. neighboring regions of Galicia, Leon and Cantabria, nearly reaching

Journal of Human Genetics MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 720

Table 1 Haplogroups found on the Asturias sample and their Table 1 (Continued ) frequency Haplogroup Individuals Frequency, % (95% CI)a Haplogroup Individuals Frequency, % (95% CI)a U5b1* 3 0.70 (0.19–2.00) D4g1 1 0.23 (0.01–1.29) U5b1d 4 0.93 (0.32–2.35) H* 101 23.54 (19.63–27.80) U5b3 1 0.23 (0.01–1.29) H1* 53 12.35 (9.47–15.77) U6 5 1.17 (0.46–2.70) H1a 1 0.23 (0.01–1.29) U8b 4 0.93 (0.32–2.35) H1a3 4 0.93 (0.32–2.35) V 9 2.10 (1.05–3.88) H2a2b* 2 0.47 (0.08–1.65) W 3 0.70 (0.19–2.00) H2a2b1 11 2.56 (1.29–4.47) Abbreviation: CI, credential interval. H2b 1 0.23 (0.01–1.29) aConfidence intervals for the haplogroup frequency are calculated with the Blaker exact H3 30 6.99 (4.79–9.81) formulae using R-code, publicly available from Alan Agresti (http://www.stat.ufl.edu/Baa/cda/R/ one-sample/R1/index.html). H5 8 1.86 (0.83–3.64) H6* 9 2.10 (1.05–3.88) H6a1a1 2 0.47 (0.08–1.65) H7a1 2 0.47 (0.08–1.65) the 5%, which is common in the Mediterranean.69 In contrast, H1, H9a 1 0.23 (0.01–1.29) HV, V and X haplogroups seem to experience local minimums in HV* 2 0.47 (0.08–1.65) Asturias. HV0 8 1.86 (0.83–3.64) Some non-European haplogroups, of Near Eastern, African and HV4a* 3 0.70 (0.19–2.00) Asian affinities or origins, were also found: M1, N1b, L3f1b4a and HV4a1a 3 0.70 (0.19–2.00) D4g1. Although finding such lineages is rare outside of their proposed I* 1 0.23 (0.01–1.29) areas of origin, they have been reported in other populations from the I1a 1 0.23 (0.01–1.29) Iberian Peninsula28 and appear sporadically in European surveys in I2a 1 0.23 (0.01–1.29) 70 J* 27 6.29 (4.22–9.02) frequencies around 1% or less, as it is the case for Asturias. J1b1a1 8 0.23 (0.01–1.29) More detailed information can be inferred by observing the raw J2a1a 1 1.86 (0.83–3.64) Slatkin FST matrix, which is in Supplementary Table 5. Regarding J2b1a 9 2.10 (1.05–3.88) Asturias, this index is statistically significant for its comparisons with K* 11 2.56 (1.29–4.47) all other Spanish populations, with the exception of the Balearic K1a1 2 0.47 (0.08–1.65) Islands, but this could be an effect of their small sample size. It has to K1a5 1 0.23 (0.01–1.29) be noted that the regions immediately to the west and east of Asturias, K1a11 1 0.23 (0.01–1.29) Galicia and Cantabria, respectively (Figure 1), have low FST values of K1b1a 2 0.47 (0.08–1.65) 0.003 and 0.004, whereas the value for the southern neighboring K1c2 1 0.23 (0.01–1.29) region, Leon (Figure 1), reaches 0.008. This is an even higher value K2b1 1 0.23 (0.01–1.29) than those estimated between Asturias and every region of the British L1b 1 0.23 (0.01–1.29) Isles, with the exception of Orkney and the Hebrides. Also, a L2a 1 0.23 (0.01–1.29) significant difference in haplotype diversity (P ¼ 0.002) between L3f1b4a 5 1.17 (0.46–2.70) Asturias and Leon was found, which did not appear in comparisons M1* 2 0.47 (0.08–1.65) with Galicia, Cantabria or even England. This suggests a relationship M1b1 1 0.23 (0.01–1.29) with the north (British regions), which can be seen also in the case of N1* 1 0.23 (0.01–1.29) Galicia and Cantabria, without the steep differences with the south N1b* 2 0.47 (0.08–1.65) (Leon) in terms of diversity or FST. Graphically, the pattern can also R0a 1 0.23 (0.01–1.29) be seen in the multidimensional scaling plot (Figure 2), in which the T* 5 1.17 (0.46–2.70) Asturian sample (ES-AS, marked with an arrow) is located in the edge T1a* 10 2.33 (1.20–4.22) of the Iberian group and farthest from the African group, surrounded T1a2a 1 0.23 (0.01–1.29) T2* 2 0.47 (0.08–1.65) by British and Scandinavian samples. As a sidenote, in this study, the T2b* 20 4.66 (2.94–7.04) term ‘Iberian’ includes the Balearic Islands and the French Province of 71,72 T2b3a 1 0.23 (0.01–1.29) Be´arn (Figure 1) because of their deep historical relationship. T2c 3 0.70 (0.19–2.00) T2e* 1 0.23 (0.01–1.29) Effective population sizes and private lineages in Iberia T2e1 5 1.17 (0.46–2.70) Estimates of sampling coverage, nucleotide and haplotype diversity, U* 2 0.47 (0.08–1.65) the yK parameter and the percentage of private lineages (KP) for each U1a2 1 0.23 (0.01–1.29) population of the HVS-I data set can be found in Table 2. We U4* 4 0.93 (0.32–2.35) obtained a positive and significant correlation r ¼ 0.683 (Po0.001) U4a1 6 1.40 (0.61–2.95) between yK and the number of private lineages for the whole HVS-I U4a3 1 0.23 (0.01–1.29) data set and r ¼ 0.649 (P ¼ 0.006) for only the Iberian regions. The U5* 4 0.93 (0.32–2.35) latter is graphically shown in Figure 3 along with the 95% confidence U5a1* 7 1.63 (0.74–3.29) interval of the linear regression line, to highlight the different U5a1b1* 1 0.23 (0.01–1.29) population groups that exist in territory of the Iberian Peninsula. U5a1b1e 1 0.23 (0.01–1.29) Two of the populations of the Franco-Cantabrian refuge, Cantabria U5a2 2 0.47 (0.08–1.65) and Navarra, are outside the 95% CI because of a scarcity of private

Journal of Human Genetics MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 721

Figure 2 Multidimensional scaling plot of European and African populations based on their population pairwise Slatkin linearized FST distances. The Asturian population (ES-AS) is marked with a black arrow. Other population identifiers are given in Supplementary Table 1.

Table 2 Diversity indices obtained from the populations of the HVS-I Table 2 (Continued ) database

Population NKKP H p C* (95% CI) yK Population NKKP H p C* (95% CI) yK Iceland 946 197 95 0.979 0.016 0.752 (0.695–0.809) 75.420 47 27 10 0.957 0.020 0.553 (0.411–0.695) 25.571 323 152 59 0.953 0.136 0.546 (0.461–0.631) 111.449 263 154 82 0.988 0.022 0.548 (0.463–0.632) 154.769 Brittany 236 133 42 0.974 0.013 0.495 (0.410–0.580) 125.252 Moroccan 116 78 28 0.970 0.021 0.371 (0.227–0.515) 103.167 France 80 59 24 0.971 0.015 0.224 (0.094–0.354) 99.990 Imazhigen Languedoc 85 55 21 0.963 0.016 0.352 (0.211–0.493) 66.360 Morocco 576 294 151 0.981 0.021 0.542 (0.479–0.606) 239.885 Lyonnais 46 36 13 0.097 0.014 0.277 (0.136–0.417) 74.347 Tunisia 115 90 44 0.992 0.025 0.322 (0.218–0.426) 188.819 Maine-Anjou 91 60 36 0.962 0.012 0.370 (0.207–0.533) 75.565 Tunisian 155 84 35 0.965 0.019 0.563 (0.444–0.681) 74.146 Normandie 46 31 19 0.962 0.013 0.426 (0.232–0.619) 40.497 Andalusians Picardie 79 79 37 0.984 0.014 0.456 (0.325–0.586) 78.951 England 142 97 39 0.966 0.014 0.256 (0.164–0.349) 133.610 Poitou 124 66 47 0.937 0.010 0.452 (0.327–0.576) 56.566 Hebrides (Western 181 84 23 0.979 0.016 0.662 (0.567–0.758) 60.308 Abbreviations: CI, credential interval; N, total sample size; K, number of mitochondrial Isles) lineages; KP, number of private lineages, considering the full mtDNA data set; H, haplotype Ireland 300 152 56 0.958 0.013 0.424 (0.332–0.515) 122.462 diversity; p: nucleotide diversity; C*, sampling coverage; yK, population-mutation rate Orkney 78 38 8 0.951 0.014 0.641 (0.465–0.816) 28.605 parameter. Scotland 672 245 108 0.967 0.015 0.645 (0.587–0.703) 138.378 Scottish Highlands 219 102 25 0.971 0.015 0.674 (0.577–0.770) 73.671 Shetland 502 174 78 0.950 0.012 0.675 (0.616–0.735) 93.965 lineages, which is typical of areas in which both inward and outward Skye 49 25 4 0.945 0.015 0.673 (0.500–0.847) 19.760 migrations have been historically high.50 Asturias, on the contrary, is Andalusia 344 197 73 0.969 0.014 0.490 (0.411–0.568) 190.869 in the upper half, but outside the regression 95% CI for having an Astorga (Maragatos) 50 25 6 0.900 0.011 0.486 (0.271–0.701) 19.234 excess of private lineages relative to its effective number of females. It Asturias 429 173 70 0.949 0.011 0.649 (0.580–0.718) 105.507 is, in fact, a population remarkable for having the highest proportion Balearic Islands 73 45 18 0.949 0.013 0.421 (0.227–0.615) 49.005 of private lineages of all our Iberian data set. Basque Country 327 112 32 0.938 0.010 0.593 (0.498–0.688) 59.737 Be´arn (French 81 33 8 0.909 0.009 0.695 (0.511–0.878) 20.256 ) Spatial analysis of European populations Cantabria 278 111 23 0.962 0.011 0.678 (0.589–0.767) 67.958 Spatial maps of the yK and KP values of the populations of our HVS-I Catalonia 206 133 47 0.972 0.014 0.370 (0.276–0.464) 161.236 data set can be found in Figures 4 and 5. As it can be seen, both Galicia 366 193 71 0.962 0.012 0.482 (0.407–0.558) 164.583 statistics show a cline with the lowest values in the Franco-Cantabrian Leon 113 81 22 0.974 0.013 0.260 (0.151–0.369) 126.604 2 refuge area. As the yK and KP values were correlated (r ¼ 0.458; Navarra 110 42 8 0.949 0.013 0.700 (0.569–0.831) 24.349 P-value ¼ 0.018), only the most explanatory variable was used in the Pas Valley 182 42 9 0.950 0.011 0.870 (0.788–0.951) 16.817 models to avoid collinearity. Diversity of the H1 haplogroup was (Pasiegos) fitted into a linear model that used the distance to the Franco- Portugal Center 239 137 45 0.963 0.014 0.439 (0.351–0.526) 132.693 Cantabrian refuge and the y value as predictors (r2 ¼ 0.569; Portugal North 187 115 41 0.954 0.014 0.378 (0.270–0.487) 126.163 K Portugal South 123 79 20 0.959 0.015 0.354 (0.230–0.477) 94.281 P-value ¼ 0.022). The same two predictors were adequate for fitting 2 Zamora 216 110 35 0.949 0.013 0.535 (0.435–0.636) 89.050 the diversity of the K haplogroup (r ¼ 0.587; P-value ¼ 0.012). Corsica 53 36 13 0.959 0.012 0.434 (0.273–0.595) 48.130 Diversity of the H3 haplogroup was fitted by using the distance to Italy 395 238 117 0.974 0.015 0.391 (0.323–0.460) 252.239 the refuge and the percentage of private lineages as predictors 234 119 59 0.949 0.013 0.533 (0.437–0.629) 96.208 (r2 ¼ 0.491; P-value ¼ 0.039). As for the V haplogroup, diversity in Sicily 106 56 15 0.917 0.013 0.474 (0.317–0.630) 47.369 Asturias was much higher than in other Franco-Cantabrian areas Tuscany 383 193 101 0.975 0.014 0.593 (0.518–0.668) 154.543 (0.806 vs 0.653), which hindered the fit of any model that considered

Journal of Human Genetics MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 722

Figure 3 Scatterplot of Iberian populations based on their yK values and proportions of private lineages. The linear regression line and its 95% confidence interval (shaded) corresponds to r ¼ 0.649.

Figure 4 Map of yK values in Europe and Northern Africa, with the locations of the HVS-I data set populations shown as asterisks. Lighter colors indicate lower values, whereas darker colors indicate higher values. Moran’s I spatial autocorrelation analyses were not significant (P ¼ 0.089). Iberian genetic isolates (Pasiegos and Maragatos) were not considered in the analysis.

it. After removing this outlier value, diversity of the V haplogroup for To further assess the population dynamics of the Franco-Cantabrian the rest of the data set could be fitted to a high goodness by using a populations, we computed a MIGRATE analysis on seven Iberian, quadratic model that considered the distance to the refuge and the four French and one British population. All of them were inside or percentage of private lineages (r2 ¼ 0.883; P-value ¼ 0.009). Diversities near the Franco-Cantabrian area. First, the likelihood of five models of the T2b, W and HV0 haplogroups could not be significantly fitted with differing levels of migration and panmixia were calculated, to a model containing any of the aforementioned variables. Plots of including a full model to serve as a reference (Supplementary these models can be found in Supplementary Figures 1 to 4. Table 4). With the likelihood values, a model was chosen according

Journal of Human Genetics MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 723

Figure 5 Map of percentage of private lineages in Europe and Northern Africa, with the locations of the HVS-I data set populations shown as asterisks. Lighter colors indicate lower values, whereas darker colors indicate higher values. Moran’s I spatial autocorrelation analyses were statistically significant (P ¼ 0.006). Iberian genetic isolates (Pasiegos and Maragatos) were not considered in the analysis.

to the Bayes factor criterion.73 This model supported the division of Scandinavia,77 and does not disrupt the general trend. Also, the the Franco-Cantabrian refuge in two different areas: Asturias– historical migration pattern that enhanced the diversity of these Cantabria and Basques–Navarra–Be`arn, with panmictic populations haplogroups during the range expansion can be seen in the results of inside each area. The migration matrix was constrained so that the MIGRATE analysis (Figure 6). The result of haplogroup K is migration was only possible between directly adjacent areas, and this particularly noteworthy, as its coalescence in Europe has been dated to can be graphically seen in Figure 6. Reported values of migration the Initial Upper Paleolithic and not to postglacial times, as is the case parameters are their averages over the five runs, as no outlier results for the other haplogroups.4 Nevertheless, its involvement in a were found. They were rounded to integers, as they represent the postglacial repopulation was already proposed on the basis of the 78 effective number of migrants per generation (Nef Á m). In all the regions star-like mutational pattern of its common haplotypes. tested, inward–outward gene flow was nearly symmetrical with the As for the diversity of the V haplogroup, a very good fit was exception of trans-Pyrenean migrations, which were greater towards obtained using a quadratic model (not accounting for the Asturian France. Migration from the Basque populations was nearly six times data), where increases of predictors are correlated with increases of higher towards Catalonia and France (60) than towards Asturias and diversity up to a point. This accounts for the founder effects that Cantabria (11). constrained the diversity of distant northern European populations,79 and is consistent with our interpretation of the refuge effect in DISCUSSION Southern-Central Europe (Supplementary Figure 4). The outlier Population dynamics around the Franco-Cantabrian refuge result of Asturias is probably a reflection of the heterogeneity of Both yK and KP calculated from our HVS-I data set show a cline in extant northern Iberian populations, which is discussed in the Europe (Figures 4 and 5), in which the lower values are coincident in-depth analysis of its sample. As for the high diversity of southern with the Franco-Cantabrian populations. This pattern is opposite to Iberian populations, it could be due to long-term contacts with the frequency clines shown by haplogroups H1, H3 and V in earlier Berber populations across the Gibraltar Strait,80 or be a signal of the studies.7,74,75 proposed Iberian origin of this haplogroup.5 In our analysis, diversities of H1, H3 and K can be explained with simple linear models using only two of the aforementioned covariates. The Asturian edge of the Franco-Cantabrian area The fit is hindered by some outlier populations (Supplementary In the genetic landscape of the northwestern Iberian Peninsula, a clear Figures 1–3), but this fact can be related to local drift processes, such outlier in terms of effective size and percentage of private lineages can as those previously inferred in some regions from France76 or be found in Asturias (Figure 3). Such an outlier result could be

Journal of Human Genetics MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 724

Figure 6 Results of the MIGRATE analysis. Values indicate the effective number of migrants per generation (Nef Á m). Populations with the same number belong to the same panmictic cluster. Identifiers are: 1: Galicia; 2a: Asturias; 2b: Cantabria; 3a: Basque Country; 3b: Navarra; 3c: Be`arn; 4: Catalonia; 5a: Languedoc; 5b: Poitou; 5c: Maine-Anjou; 5d: Brittany; 6: England.

explained by at least three basic causes:50 demographic isolation, fragmented mitochondrial pools that have been found in populations significant gene flow from unsampled regions and/or higher sampling from different timeframes throughout the Franco-Cantabrian area.18 coverage than other regions. Before discussing the first two, it should Nevertheless, there is no clear east–west spatial barrier in the be said that the latter is unlikely, given the sampling coverage geography of northern Spain; and archeological, climatological and estimated for the different Iberian regions, as most populations have paleontological contexts of Asturias, Cantabria and the Basque been equally or more thoroughly sampled than Asturias, without Country were all very similar during the Upper Paleolithic.85,86 finding a higher proportion of private lineages in them (Table 2). Alternative explanations for this low historical migration rate The historical isolation of this region, coupled with its complex should be sought with a more in-depth regional study, in which orography, is well documented. Consequently, it is reasonable to relatively recent timeframes should be explicitly considered.87 deduce such feature as the most probable cause of its great proportion The second explanation for an excess of private lineages in Asturias, of private lineages, probably because of internal genetic structuring or gene flow with unsampled regions, cannot be discarded but does not microdifferentiation, in a manner similar to cryptically subdivided seem to be the main cause, as previous studies did not show populations.81 Also, particularly high inbreeding values for the significant relationships between Spain and regions not present in region were inferred in a previous isonymy study.82 For the female our database, such as the eastern countries.88 In any case, lineages that case, FST pairwise P-values show that the mitochondrial pool of reached Iberia by ancient southward migrations should be more or Asturians is statistically different from most of the other Iberian less homogenously distributed throughout it, so any bias in our regions. However, the greater neighboring divergence, not counting results due to unrecognized shared haplotypes is probably very small. the isolated Pasiegos and Maragatos, happens with Leon at the south. Regarding similarities with other European regions, the FST Significant differences in haplotype diversities between these two patterns observed in the multidimensional scaling plot are consistent populations are consistent with a geographical–genetical structuring with the results of the migration analysis and suggest a close process, as it indicates that most of the differences between them relationship between the northwestern Iberian coast and the British would be related to mutational distances between non-shared Isles, in accordance to the genetic makeup of the Atlantic fac¸ade haplotypes.83 Such a process could be promoted by a genetic drift, evidenced in other studies.89 Neolithic coastal dispersions could be a aided by a barrier that inhibits, at least partially, the flow of parsimonious explanation,4 without excluding some input of ‘British haplotypes after their apparition. A candidate barrier, in this case, Celtic’ markers like the haplogroup J2b1a (2.10% of Asturian sample; would be the Cantabrian Mountain Range of the southern part of 95% CI 1.05–3.88), which is also shared with Mediterranean Asturias, a loftier and most intricate mountainous system called the populations.90 Asturian Massif, which is the political border between the Province of Asturias and the autonomous community of Castilla–Leon.84 CONCLUSIONS Our migration analysis also suggests restricted gene flow from the The spatial patterns of classical population genetic statistics seem to Basque population group. This is reminiscent of the distinct and correlate well with the diversity of haplogroups which were proposed

Journal of Human Genetics MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 725 to participate in the repopulation of Europe during the Magdalenian 10 Richard, C., Pennarun, E., Kivisild, T., Tambets, K., Tolk, H. -V., Metspalu, E. et al. An period. These diversities are low in the former territory of the Franco- mtDNA perspective of French genetic variation. Ann. Hum. Biol. 34, 68–79 (2007). 11 Pala, M., Achilli, A., Olivieri, A., Kashani, B. H., Perego, U. A., Sanna, D. et al. Cantabrian refuge, which can be interpreted in the context of a Mitochondrial haplogroup U5b3: a distant echo of the Epipaleolithic in Italy and the spatially restricted refuge with small carrying capacity. A recoloniza- legacy of the early Sardinians. Am. J. Hum. Genet. 84, 814–821 (2009). tion process in which the populations greatly expanded after arriving 12 Malyarchuk, B., Derenko, M., Grzybowski, T., Perkova, M., Rogalla, U., Vanecek, T. et al. The peopling of Europe from the mitochondrial haplogroup U5 perspective. PLoS to the new territories would explain the positive association of ONE 5, e10285 (2010). diversity and distance to the refuge. In this way, diversity measures 13 Malyarchuk, B., Grzybowski, T., Derenko, M., Perkova, M., Vanecek, T., Lazur, J. et al. Mitochondrial DNA phylogeny in Eastern and Western Slavs. Mol. Biol. Evol. 25, can be reconciled with previous interpretations based on haplogroup 1651–1658 (2008). frequency distributions and paleontological findings. Nevertheless, 14 Pala, M., Olivieri, A., Achilli, A., Accetturo, M., Metspalu, E., Reidla, M. et al. any model that can be proposed based on extant DNA data will have Mitochondrial DNA signals of Late Glacial recolonization of Europe from Near Eastern refugia. Am. J. Hum. Genet. 90, 915–924 (2012). to be tested by the analysis of ancient DNA of Franco-Cantabrian 15 Morin, E. Evidence for declines in human population densities during the early Upper populations. Only such a direct approach can be used to find if the Paleolithic in western Europe. Proc. Natl Acad. Sci. 105, 48–53 (2008). now common European haplogroups were actually present in the 16 Cavalli-Sforza, L. L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton University Press, Princeton, USA, 1994). gene pool of Paleolithic Europeans and clarify their spatial distribu- 17 Banks, W. E., d’Errico, F., Peterson, A. T., Vanhaeren, M., Kageyama, M., Sepulchre, P. tion. In this regard, we have to highlight the recent study of Hervella et al. Human ecological niches and ranges during the LGM in Europe derived from an 18 application of eco-cultural niche modeling. J. Archaeol. Sci. 35, 481–491 (2008). et al., which has been the first report of haplogroup H among 18 Hervella, M., Izagirre, N., Alonso, S., Fregel, R., Alonso, A., Cabrera, V. M. et al. Magdalenian Iberians. Ancient DNA from Hunter-Gatherer and Farmer groups from Northern Spain supports a Also, this study was a first attempt to characterize the mitochon- random dispersion model for the Neolithic expansion into Europe. PLoS ONE 7, e34417 (2012). drial gene pool of the Asturian population, by performing direct 19 Aguile´e, R., Claessen, D. & Lambert, A. Allele fixation in a dynamic metapopulation: sequencing of the hypervariable HVS-I of the mtDNA control region. founder effects vs refuge effects. Theor. Popul. Biol. 76, 105–117 (2009). Regional peculiarities highlight a partial isolation of the Asturian 20 Arenas, M., Ray, N., Currat, M. & Excoffier, L. Consequences of range contractions and range shifts on molecular diversity. Mol. Biol. Evol. 29, 207–218 (2011). mitochondrial pool that differentiates it from the rest of Franco- 21 Banks, W. E., Zilha˜o, J., d’Errico, F., Kageyama, M., Sima, A. & Ronchitelli, A. Cantabrian populations, with an unexpected high number of private Investigating links between ecology and bifacial tool types in Western Europe during lineages that would be compatible with a certain degree of genetic the . J. Archaeol. Sci. 36, 2853–2867 (2009). 22 Gamble, C., Davies, W., Pettitt, P., Hazelwood, L. & Richards, M. The archaeological structuring inside the population. It seems that the complex history and genetic foundations of the European population during the Late Glacial: of the Franco-Cantabrian region might have left signals in the genetic implications for ‘agricultural thinking’. Cambridge Archaeol. J. 15, 193–223 (2005). 23 Excoffier, L., Foll, M. & Petit, R. J. Genetic consequences of range expansions. Annu. makeup of its populations that have not been entirely uncovered yet. Rev. Ecol. Evol. Syst. 40, 481–501 (2009). 24 Shi, W., Ayub, Q., Vermeulen, M., Shao, R., Zuniga, S., Van Der Gaag, K. et al. A CONFLICT OF INTEREST worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP–CEPH populations. Mol. Biol. Evol. 27, 385–393 (2010). The authors declare no conflict of interest. 25 Slatkin, M. Gene flow and genetic drift in a species subject to frequent local extinctions. Theor. Popul. Biol. 12, 253–262 (1977). 26 Charlesworth, B. Effective population size and patterns of molecular evolution and ACKNOWLEDGEMENTS variation. Nat. Rev. Genet. 10, 195–205 (2009). Antonio F. Pardin˜as is currently supported by a ‘Severo Ochoa’ FICYT-PCTI 27 Dietler, M. & Lo´pez-Ruiz, C. eds. Colonial encounters in Ancient Iberia: Phoenician. Grant from the Asturias Regional Government (ref. BP09038). We are grateful Greek and Indigenous relations (University of Chicago Press, Chicago, USA, 2009). to all the donors who volunteered for sampling and to all those who helped 28 Alvarez, L., Santos, C., Ramos, A., Pratdesaba, R., Francalacci, P. & Aluja, M.P. Mitochondrial DNA patterns in the Iberian Northern plateau: population dynamics and during the organization of the sampling campaign, especially the mayors substructure of the Zamora province. Am.J.Phys.Anthropol.142, 531–539 (2010). of Llanes and Ribadesella. We are also grateful to Dr Daniel Campo and 29 Cardoso, S., Alfonso-Sa´nchez, M. A., Valverde, L., Odriozola, A., Pe´rez-Miranda, A.M., Dr Hannes Schroeder for helpful comments, which greatly improved the Pen˜a, J. A. et al. The maternal legacy of Basques in northern navarre: new insights into manuscript. This research received support from the SYNTHESYS Project the mitochondrial DNA diversity of the Franco-Cantabrian area. Am. J. Phys. Anthropol. 145, 480–488 (2011). (http://www.synthesys.info/), which is financed by the European Community 30 Castro, M. G., Huerta, C., Reguero, J. R., Soto, M. I., Domenech, E., Alvarez, V. et al. Research Infrastructure Action under the FP7 ‘Capacities’ Program. Mitochondrial DNA haplogroups in Spanish patients with hypertrophic cardiomyopathy. Int. J. Cardiol. 112, 202–206 (2006). 31 Palacı´n, M., Alvarez, V., Martı´n, M., Dı´az, M., Corao, A. I., Alonso, B. et al. Mitochondrial DNA and TFAM gene variation in early-onset myocardial infarction: evidence for an association to haplogroup H. Mitochondrion 11, 176–181 (2011). 1 Torroni, A., Achilli, A., Macaulay, V., Richards, M. & Bandelt, H. J. Harvesting the fruit 32 Alvarez, J.C., Johnson, D.L.E., Lorente, J.A., Martinez-Espin, E., Martinez-Gonzalez, of the human mtDNA tree. Trends Genet. 22, 339–345 (2006). L.J., Allard, M. et al. Characterization of human control region sequences for Spanish 2 Richards, M., Macaulay, V., Torroni, A. & Bandelt, H. -J. In search of geographical individuals in a forensic mtDNA data set. Leg. Med. 9, 293–304 (2007). patterns in European mitochondrial DNA. Am. J. Hum. Genet. 71, 1168–1174 33 Prieto, L., Zimmermann, B., Goios, A., Rodriguez-Monge, A., Paneto, G. G., Alves, C. (2002). et al. The GHEP-EMPOP collaboration on mtDNA population data–A new resource for 3 Forster, P. Ice Ages and the mitochondrial DNA chronology of human dispersals: a forensic casework. Forensic Sci. Int. Genet. 5, 146–151 (2011). review. Philos. Trans. R Soc. Lond. B Biol. Sci. 359, 255 (2004). 34 Pereira, L., Cunha, C. & Amorim, A. Predicting sampling saturation of mtDNA 4 Soares, P., Achilli, A., Semino, O., Davies, W., Macaulay, V., Bandelt, H. -J. et al. The haplotypes: an application to an enlarged Portuguese database. Int. J. Legal Med. archaeogenetics of Europe. Curr. Biol. 20, R174–R183 (2010). 118, 132–136 (2004). 5 Achilli, A., Rengo, C., Magri, C., Battaglia, V., Olivieri, A., Scozzari, R. et al. The 35 Go´mez-Pello´n, E. Aproximacio´n al estudio antropolo´gico de Asturias. Rev. Antropol. molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian Soc. 0, 31–63 (1991). glacial refuge was a major source for the European gene pool. Am. J. Hum. Genet. 75, 36 Santos, F. D. Historia de Asturias: Asturias romana y visigoda (Ayalga, Oviedo, Spain, 910–918 (2004). 1977). 6Gamble,C.The Palaeolithic settlement of Europe (Cambridge University Press, 37 Puerta, M. C. La Asturias Medieval. In: Ferna´ndez-Pe´rez, A. & Friera-Sua´rez, F. eds. Cambridge, UK, 1986). Historia de Asturias (KRK, Oviedo, Spain, 2005). 7 Pereira, L., Richards, M., Goios, A., Alonso, A., Albarra´n, C., Garcia, O. et al. High- 38 Infiesta, V. R. Asturias en los siglos XX y XXI. In: Ferna´ndez-Pe´rez, A. & Friera-Sua´rez, resolution mtDNA evidence for the late-glacial resettlement of Europe from an Iberian F. e ds . Historia de Asturias (KRK, Oviedo, Spain, 2005). refugium. Genome Res. 15, 19 (2005). 39 Walsh, P. S., Metzger, D. A. & Higuchi, R. Chelex 100 as a medium for simple 8A´ lvarez-Iglesias, V., Mosquera-Miguel, A., Cerezo, M., Quinta´ns, B., Zarrabeitia, M. T., extraction of DNA for PCR-based typing from forensic material. Biotechniques 10, Cusco´,I.et al. New population and phylogenetic features of the internal variation 506–513 (1991). within mitochondrial DNA macro-haplogroup R0. PLoS ONE 4, e5112 (2009). 40 Brandsta¨tter, A., Niedersta¨tter, H., Pavlic, M., Grubwieser, P. & Parson, W. Generating 9 Garcia, O., Fregel, R., Larruga, J. M., Alvarez, V., Yurrebaso, I., Cabrera, V.M. et al. population data for the EMPOP database—an overview of the mtDNA sequencing and Using mitochondrial DNA to test the hypothesis of a European post-glacial human data evaluation processes considering 273 Austrian control region sequences as recolonization from the Franco-Cantabrian refuge. Heredity 106, 37–45 (2011). example. Forensic Sci. Int. 166, 164–175 (2007).

Journal of Human Genetics MtDNA diversity and Magdalenian repopulation AF Pardin˜as et al 726

41 Drummond, A., Ashton, B., Cheung, M., Heled, J., Kearse, M., Moir, R. et al. Geneious 68 Beerli, P. How to use migrate or why are Markov Chain Monte Carlo programs v4.85 (Biomatters Ltd, Auckland, New Zealand, 2009). difficult to use? in Population Genetics for Animal Conservation (eds Bertorelle, G., 42 Andrews, R. M., Kubacka, I., Chinnery, P. F., Lightowlers, R. N., Turnbull, D. M. & Bruford, M. W., Hauffe, H. C., Rizzoli, A. & Vernesi, C.) 42–79 (Cambridge University Howell, N. Reanalysis and revision of the Cambridge reference sequence for human Press, Cambridge, UK, 2009). mitochondrial DNA. Nat. Genet. 23, 147 (1999). 69 Ottoni, C., Martinez-Labarga, C., Vitelli, L., Scano, G., Fabrini, E., Contini, I. et al. Human 43 Bandelt, H. -J. & Parson, W. Consistent treatment of length variants in the human mitochondrial DNA variation in Southern Italy. Ann. Hum. Biol. 36, 785–811 (2009). mtDNA control region: a reappraisal. Int. J. Legal Med. 122, 11–21 (2008). 70 Irwin, J., Egyed, B., Saunier, J., Szamosi, G., O’Callaghan, J., Padar, Z. et al. 44 Parson, W. & Du¨ r, A. EMPOP—a forensic mtDNA database. Forensic Sci. Int. Genet. 1, Hungarian mtDNA population databases from and the Baranya county 88–92 (2007). Roma. Int. J. Legal Med. 121, 377–383 (2007). 45 Kloss-Brandsta¨tter, A., Pacher, D., Scho¨nherr, S., Weissensteiner, H., Binna, R., 71 Rodrı´guez-Ezpeleta, N., A´ lvarez-Busto, J., Imaz, L., Regueiro, M., Azca´rate, M., Bilbao, Specht, G. et al. HaploGrep: a fast and reliable algorithm for automatic classification R. et al. High-density SNP genotyping detects homogeneity of Spanish and French of mitochondrial DNA haplogroups. Hum. Mutat. 32, 25–32 (2011). Basques, and confirms their genomic distinctiveness from other European populations. 46 van Oven, M. & Kayser, M. Updated comprehensive phylogenetic tree of global human Hum. Genet. 128, 113–117 (2010). mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009). 72 Ayuso, V. M. G. The Balearic Islands: prehistoric colonization of the furthest 47 Pardin˜as, A., Dopico, E., Roca, A., Garcia-Vazquez, E. & Lopez, B. Introducing human Mediterranean islands from the mainland. J. Mediterr. Archaeol. 14, 136–158 (2002). population biology through an easy laboratory exercise on mitochondrial DNA. 73 Goodman, S. N. Toward evidence-based medical statistics. 2: The Bayes factor. Ann. Biochem. Mol. Biol. Educ. 38, 110–115 (2010). Intern. Med. 130, 1005–1013 (1999). 48 Ennafaa, H., Cabrera, V. M., Abu-Amero, K. K., Gonza´lez, A. M., Amor, M. B., Bouhaha, 74 Torroni, A., Bandelt, H. J., D’Urbano, L., Lahermo, P., Moral, P., Sellitto, D. et al. R. et al. Mitochondrial DNA haplogroup H structure in . BMC Genet. 10, 8 mtDNA analysis reveals a major late Paleolithic population expansion from south- (2009). western to northeastern Europe. Am.J.Hum.Genet.62, 1137–1152 (1998). 49 Soares, P., Ermini, L., Thomson, N., Mormina, M., Rito, T., Ro¨hl, A. et al. Correcting for 75 Torroni, A., Bandelt, H. -J., Macaulay, V., Richards, M., Cruciani, F., Rengo, C. et al. A purifying selection: an improved human mitochondrial molecular clock. Am. J. Hum. signal, from human mtDNA, of postglacial recolonization in Europe. Am. J. Hum. Genet. 84, 740–759 (2009). Genet. 69, 844–852 (2001). 50 Helgason, A., Hickey, E., Goodacre, S., Bosnes, V., Stefa´nsson, K., Ward, R. et al. 76 Dubut, V., Chollet, L., Murail, P., Cartault, F., Be´raud-Colomb, E., Serre, M. et al. mtDNA and the islands of the North Atlantic: estimating the proportions of Norse and mtDNA polymorphisms in five French groups: importance of regional sampling. Eur. J. Gaelic ancestry. Am. J. Hum. Genet. 68, 723–737 (2001). Hum. Genet. 12, 293–300 (2003). 51 Excoffier, L. & Lischer, H.E.L. Arlequin suite ver 3.5: a new series of programs to 77 Helgason, A., Nicholson, G., Stefa´nsson, K. & Donnelly, P. A reassessment of genetic perform population genetics analyses under Linux and Windows. Mol. Ecol. Res. 10, diversity in Icelanders: strong evidence from multiple loci for relative homogeneity 564–567 (2010). caused by genetic drift. Ann. Hum. Genet. 674 281–297, 2003). 52 Nei, M. Molecular Evolutionary Genetics (Columbia University Press, New York, USA, 78 Richards, M., Corte-Real, H., Forster, P., Macaulay, V., Wilkinson-Herbots, H., 1987). Demaine, A. et al. Paleolithic and neolithic lineages in the European mitochondrial 53 Tajima, F. Measurement of DNA polymorphism. in Mechanisms of Molecular Evolution gene pool. Am. J. Hum. Genet. 59, 185 (1996). Introduction to Molecular Paleopopulation Biology (eds Takahata, N. & Clark, A. G.) 79 Tambets, K., Rootsi, S., Kivisild, T., Help, H., Serk, P., Loogva¨li, E. -L. et al. The 37–39 (Sinauer Associates, Inc, Sunderland, UK, 1993). Western and Eastern roots of the Saami—the story of genetic ‘‘Outliers’’ told by 54 Weale, M. E. Testing for differences in H between two populations. http://www.ucl. mitochondrial DNA and Y-chromosomes. Am. J. Hum. Genet. 74, 661–682 (2004). ac.uk/mace-lab/resources/software. (University College London, London, UK, 2003). 80 Coudray, C., Olivieri, A., Achilli, A., Pala, M., Melhaoui, M., Cherkaoui, M. et al. The 55 Tamura, K. & Nei, M. Estimation of the number of nucleotide substitutions in the Complex and Diversified Mitochondrial Gene Pool of Berber Populations. Ann. Hum. control region of mtDNA in humans and chimpanzees. Mol. Biol. Evol. 10, 512–526 Genet. 73, 196–214 (2009). (1993). 81 Carvalho-Silva, D. R. & Tyler-Smith, C. The grandest genetic experiment ever performed 56 Rubinstein, S., Dulik, M. C., Gokcumen, O., Zhadanov, S., Osipova, L., Cocca, M. et al. on man? A Y-chromosomal perspective on genetic variation in India. Int. J. Hum. Genet. Russian Old Believers: genetic consequences of their persecution and exile, as shown 8, 21–29 (2008). by mitochondrial DNA evidence. Hum. Biol. 80, 203–237 (2008). 82 Rodriguez-Larralde, A., Gonzales-Martin, A., Scapoli, C. & Barrai, I. The names 57 Slatkin, M. Isolation by distance in equilibrium and non-equilibrium populations. of Spain: A study of the isonymy structure of Spain. Am. J. Phys. Anthropol. 121, Evolution 47, 264–279 (1993). 280–292 (2003). 58 SPSS Inc. SPSS Statistics v17 for Windows (SPSS Inc, Chicago, USA, 2008). 83 Pons, O. & Petit, R. Measuring and testing genetic differentiation with ordered versus 59 Watterson, G. A. On the number of segregating sites in genetical models without unordered alleles. Genetics 144, 1237 (1996). recombination. Theor. Popul. Biol. 7, 256–276 (1975). 84 Mun˜oz-Jimenez, J. Geografı´adeAsturias1:Geografı´aFı´sica. El relieve, el clima y las 60 Ewens, W. J. The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3, aguas (Ayalga, Oviedo, Spain, 1982). 87–112 (1972). 85 Sa´nchez, M.J. & Arquer, P. F. New radiometric and geomorphologic evidences of a last 61 Goodacre, S., Helgason, A., Nicholson, J., Southam, L., Ferguson, L., Hickey, E. et al. glacial maximum older than 18 ka in SW European mountains: the example of Redes Genetic evidence for a family-based Scandinavian settlement of Shetland and Orkney Natural Park (Cantabrian Mountains, NW Spain). Geodin. Acta. 15, 93–101 (2002). during the Viking periods. Heredity 95, 129–135 (2005). 86 Straus, L.G. Were there human responses to Younger Dryas in Cantabrian Spain? 62 R Development Core Team. R: A language and environment for statistical computing Quatern. Int. 242, 328–335 (2011). (R Foundation for Statistical Computing, Vienna, Austria, 2010). 87 Melchior, L., Lynnerup, N., Siegismund, H. R., Kivisild, T. & Dissing, J. Genetic 63 Egeland, T. & Salas, A. Estimating haplotype frequency and coverage of databases. Diversity among Ancient Nordic Populations. PLoS ONE. 5, e11898 (2010). PLoS ONE 3, e3988 (2008). 88 Karachanak, S., Carossa, V., Nesheva, D., Olivieri, A., Pala, M., Hooshiar Kashani, B. 64 Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in et al. Bulgarians vs the other European populations: a mitochondrial DNA perspective. R language. Bioinformatics 20, 289–290 (2004). Int. J. Legal Med. 126, 497–503 (2011). 65 Quantum GIS Development Team. Quantum GIS Geographic Information System. 89 McEvoy, B., Richards, M., Forster, P. & Bradley, D. G. The Longue Duree of genetic http://qgis.osgeo.org. (Open Source Geospatial Foundation Project, 2011). ancestry: multiple genetic marker systems and Celtic origins on the Atlantic facade of 66 Kepoglu, V. Spatial Data Analysis Plugin for Point Pattern. http://code.google.com/p/. Europe. Am. J. Hum. Genet. 75, 693–702 (2004). ( Technical University, Ankara, , 2009). 90 Forster, P., Romano, V., Calı`,F.,Ro¨hl, A. & Hurles, M. MtDNA markers for Celtic and 67 Ersts, P.J. Geographic Distance Matrix Generator version 1.23. http://biodiversityinfor Germanic language areas in the British Isles. in Traces of Ancestry: Studies in Honour matics.amnh.org/open_source/gdmg. American Museum of Natural History (Center for of Colin Renfrew (ed. Jones, M.) 99–111 (McDonald Institute for Archaeological Biodiversity and Conservation, New York, USA, 2012). Research, Oxford, UK, 2004).

Supplementary Information accompanies the paper on Journal of Human Genetics website (http://www.nature.com/jhg)

Journal of Human Genetics