Feature Review
Toward a new history and geography
of human genes informed by ancient DNA
1,2 3,4,5
Joseph K. Pickrell and David Reich
1
New York Genome Center, New York, NY, USA
2
Department of Biological Sciences, Columbia University, New York, NY, USA
3
Department of Genetics, Harvard Medical School, Boston, MA, USA
4
Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
5
Broad Institute of the Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA, USA
Genetic information contains a record of the history of which propose that the present-day inhabitants of a region
our species, and technological advances have trans- descend from people who arrived during periods of tech-
formed our ability to access this record. Many studies nological or cultural change, replacing the previous inha-
have used genome-wide data from populations today to bitants.
learn about the peopling of the globe and subsequent In archaeology, this debate has played out around the
adaptation to local conditions. Implicit in this research is issue of whether sudden changes in material culture ap-
the assumption that the geographic locations of people parent in the archaeological record can be attributed to the
today are informative about the geographic locations of spread of culture or to population movements: ‘pots versus
their ancestors in the distant past. However, it is now people’ [1]. In physical anthropology, the debate has played
clear that long-range migration, admixture, and popula- out around the issue of whether changes in morphological
tion replacement subsequent to the initial out-of-Africa characters over time are due to in situ evolution or to the
expansion have altered the genetic structure of most of arrival of new populations (e.g., [2]).
the world’s human populations. In light of this we argue The same debate has also played out in genetics. On the
that it is time to critically reevaluate current models of side of population replacements, there are the ‘wave of
the peopling of the globe, as well as the importance of advance’ and ‘demic diffusion’ models, first proposed to
natural selection in determining the geographic distri- describe the spread of agriculture through Europe. In these
bution of phenotypes. We specifically highlight the models, the Neolithic transition was accompanied by the
transformative potential of ancient DNA. By accessing spread of farmers from the Near East across Europe, who
the genetic make-up of populations living at archaeolog- partially or completely replaced resident hunter-gatherers
ically known times and places, ancient DNA makes it [3–6]. On the side of stasis, there are the ‘serial founder
possible to directly track migrations and responses to effect’ models [7,8], which proposed that populations have
natural selection. remained in the locations they first colonized after the out-
of-Africa expansion, exchanging migrants only at a low
Introduction rate with their immediate neighbors until the long-range
Within the past 100 000 years anatomically modern migrations of the past 500 years [9–12].
humans have expanded to occupy every habitable area These genetic models – the wave-of-advance models on
of the globe. The history of this expansion has been ex- the one hand, and the serial founder effect models on the
plored with tools from several disciplines including linguis- other – were proposed before the availability of large-scale
tics, archaeology, physical anthropology, and genetics. genomic data. The great synthesis of genetic data with
These disciplines all can be used to ask the same question: historical, archaeological and linguistic information, ‘The
how did we get to where we are today? History and Geography of Human Genes’ [13], was pub-
Attempts to answer this question have often taken the lished in 1994 based on data from around 100 protein
form of a dialectic between two hypotheses. On the one
hand are arguments in favor of demographic stasis, which
propose that the inhabitants of a region are the descen-
Glossary
dants of the first people to arrive there. On the other side
Admixture: a sudden increase in gene flow between two differentiated
are the arguments in favor of rapid demographic change,
populations.
Bottleneck: a temporary decrease in population size in the history of a
Corresponding authors: Pickrell, J.K. ([email protected]); Reich, D. population.
Gene flow: the exchange of genes between two populations as a result of interbreeding.
0168-9525/
Heterozygosity: the number of differences between two random copies of a
ß 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2014.07.007
genome in a population.
Trends in Genetics, September 2014, Vol. 30, No. 9 377
Feature Review Trends in Genetics September 2014, Vol. 30, No. 9
polymorphisms, and the papers that popularized the no- The observation of a smooth decline in human diversi-
tion of a serial founder effect model were written based on ty with increasing geographic distance from Africa was
data from around 1000 microsatellites. However, it is now interpreted as a window into demographic events deep in
possible to genotype millions of polymorphisms in thou- our species’ past. Specifically, in the serial founder effect
sands of individuals using high-throughput sequencing. and related models (Figure 1A) [12], the peopling of the
Because of these technological advances, the past few years globe proceeded by an iterative process in which small
have seen a dramatic increase in the quantity of data bands of individuals pushed into unoccupied territory,
available for learning about human history. Equally im- experienced population expansions, and subsequently
portant has been rapid innovation in methods for making gave rise to new small bands of individuals who then
inferences from these data. We argue here that the tech- pushed further into unoccupied territory. This model has
nological breakthroughs of the past few years motivate a two important features: a large number of expansions
systematic reevaluation of human history using modern into new territory by small groups of individuals (and
genomic tools – a new ‘History and Geography of Human concurrent bottlenecks), and little subsequent migration
Genes’ that exploits many orders of magnitude more data [7–12]. As Prugnolle et al. [7] wrote: ‘what is clear...is that
than the original synthesis. [this] pattern of constant loss of genetic diversity along
In the first section of the paper we summarize what we colonisation routes could only have arisen through suc-
see as major lessons from the recent literature. In particu- cessive bottlenecks of small amplitude as the range of our
lar, it is now clear that the data contradict any model in species increased... The pattern we observe also suggests
which the genetic structure of the world today is approxi- that subsequent migration was limited or at least very
mately the same as it was immediately following the out- localised’. This qualitative conclusion was followed by
of-Africa expansion. Instead, the past 50 000 years of more quantitative ones; in an explicit fit of the serial
human history have witnessed major upheavals, such that founder effect model to data, Deshpande et al. [9] con-
much of the geographic information about the first human cluded that ‘incorporation into existing models of ex-
migrations has been overwritten by subsequent population change between neighbouring populations is essential,
movements. However, the data also often contradict mod- but at a very low rate’.
els of population replacement: when two distinct popula- This model has been influential in many fields. If it is
tion groups come together during demographic expansions correct, then the difficult problem of identifying the geo-
the result is often genetic admixture rather than complete graphic origin of all modern humans is reduced to the
replacement. This suggests that new types of models – with simpler problem of finding the geographic region where
admixture at their center – are necessary for describing people have the most genetic diversity [8]. This idea has
key aspects of human history ([14–16] for early examples of been used extensively in discussions of human origins (e.g.,
admixture models). [27,28]). The model also provides a null hypothesis of
In the second section of the paper we sketch out a way limited migration against which alternatives can be tested
forward for data-driven construction of these models. We [29,30]. Outside genetics, the serial founder effect model
specifically highlight the potential of ancient DNA studies has been used as a framework to interpret data from
of individuals from archaeologically important cultures. linguistics [31], physical anthropology [32–35], material
Such studies in principle provide a source of information culture [36], and economics [37].
about history that bypasses some fundamental ambigui- The serial founder effect model, however, is only one of
ties in the interpretation of genetic, archaeological, or many models that can produce qualitatively similar pat-
anthropological evidence alone. We discuss several poten- terns of genetic diversity (e.g., [38,39]). Producing the
tial applications of this technology to outstanding ques- empirical pattern of a smooth decline in diversity with
tions in human history. distance from Africa simply requires that the average time
There are a number of excellent articles that have to the most recent common ancestor between two chromo-
reviewed the literature on genome-wide studies of human somes in a population depend on the distance of that
history [17–23]. Our focus here is not on providing a population from Africa [12,39].
comprehensive review but instead on highlighting promis- To illustrate this point, we constructed two models
ing directions for future research. that produce smooth declines in diversity with distance
from Africa as in the serial founder effect model. Howev-
Reevaluation of the ‘serial founder effect’ model er, these models differ qualitatively from the serial
We begin with an illustration of some of the ambiguities in founder model in that this pattern is driven by admixture
interpretation of genetic data in the context of the ‘serial rather than bottlenecks in the distant past. These are
founder effect’ model. This model, initially proposed by (i) a model with two severe bottlenecks and extensive
Harpending, Eller, and Rogers [24,25], gained popularity subsequent population admixture (Figure 1B) and (ii) a
with the publication of two papers [7,8] that observed that model without bottlenecks but with archaic admixture
heterozygosity (see Glossary) declines approximately line- (from an anciently diverged population such as Nean-
arly with geographic distance from Africa. This pattern, derthals) as well as extensive recent population admix-
initially identified in genome-wide microsatellite geno- ture (Figure 1C). Details of the model specifications
types from around 50 worldwide human populations, and simulation parameters are in the Supplementary
was subsequently confirmed based on patterns of haplo- Material online. Using ms [40], we simulated 1000
type diversity in large single nucleotide polymorphism regions of 100 kb under the serial founder effect model
(SNP) datasets [26]. and each of these two models, and plotted the average
378
Feature Review Trends in Genetics September 2014, Vol. 30, No. 9
(A) Serial founder effect model (B) Few bolenecks, pervasive admixture (C) No bolenecks, pervasive admixture
Key: Boleneck Admixture
8%
20%
1 2 3 4 5 39 40 41 42 1 2 3 21 22 42 1 2 3 21 22 42
(D) Heterozygosity under model in A. (E) Heterozygosity under model in B. (F) Heterozygosity under model in C. 90 95 100 85 80 75 # heteryzygous sites/100 kb # heteryzygous sites/100 kb # heteryzygous sites/100 kb 65 70 50 60 70 80 90 100 100 105 110 115 120 125 Distance Distance Distance
TRENDS in Genetics
Figure 1. A negative correlation between heterozygosity and geographic distance from a source population can be generated by qualitatively different, historically plausible
demographic models. We simulated genetic data under different demographic models and calculated the average heterozygosity in each simulated population. (A)
Schematic of a serial founder effect model. (B) Schematic of a demographic model with two bottlenecks and extensive admixture. (C) Schematic of a demographic model
with no bottlenecks and extensive admixture. (D–F) Average heterozygosity in each population simulated under the demographic models in A, B, and C respectively. Each
point represents a population, ordered along the x-axis according as in A.
heterozygosity in each population. In all three cases Empirical data have shown that the current inhabitants
(Figure 1D–F) we recapitulated a smooth linear decline of a region are often poor representatives of the
in heterozygosity with distance from a reference popula- populations that lived there in the distant past
tion (in the serial founder model, this reference population The answer to the question posed above has been the
can be interpreted as the source of the expansion, but subject of considerable research over the past several
there is no analogous interpretation of this population years. In our opinion one finding is already clear: long-
in the other models). Qualitative patterns of linkage range migration and concomitant population replacement
disequilibrium were also similar in all scenarios or admixture have occurred often enough in recent human
(Figure S3). history that the present-day inhabitants of many places in
These simulations show that the main observation that the world are rarely related in a simple manner to the more
has been marshaled in support of the serial founder effect ancient peoples of the same region (Figure 2).
model is also consistent with very different histories (see The Americas over the past 500 years present one recent
also [29,38,39,41]). Specifically, in the absence of addition- example. The Americas experienced massive demographic
al data, the smooth linear decline in heterozygosity away change after the arrival of Europeans and Africans, such
from Africa could represent a signal of many population that most of the ancestry of the Americas is not derived
bottlenecks during the initial out-of-Africa expansion tens from the Native Americans who were the sole inhabitants
of thousands of years ago, or it could represent a signal of of the region half a millennium ago [42–47]. Another recent
extensive population mixture within the past few thousand example is Australia, where European migration over the
years (or, of course, a combination of these or many other past couple of hundred years is the main source of the
models that we have not considered). Because the data are genetic material in the region today [48].
compatible with both, arguing for one over the other An example from further back in time comes from the
involves a subjective determination of which class of model present-day hunter-gatherer and pastoralist populations
is more likely a priori. Perhaps the most important issue of Siberia, which are often treated as surrogates for the
affecting this determination is how important migration populations that crossed the Bering land bridge to people
has been over the past 50 000 years of human history. How the Americas beginning more than 15 000 years ago
representative are populations today of the populations (e.g., [47,49–51]). DNA sequences from two individuals
that lived in the same locations after the out-of-Africa who lived in the Lake Baikal region of Siberia 24 000
expansion? years before the present and 17 000 years before the
379
Feature Review Trends in Genetics September 2014, Vol. 30, No. 9
15 kya 6 kya 4 kya 3 kya 2 kya 1 kya Ame ricas Americas Europe Europe Near east Near East Central asia Central Asia Eurasia North eurasia North Eurasia India India East asia East Asia South east asia South East Asia North africa North Africa Africa West africa West Africa East africa East Africa South africa South Africa New guinea New Guinea Oceania Australia Australia
Admixed ancient Waves of gene Agriculture arrives West Eurasian Gene flow from Bantu Arab slave European north Eurasian flow into the from the near poplulaon enters the near east to expansion trade colonianialism/ populaon in Americas east to Europe India East Africa Austronesian Mongol Atlanc slave trade Siberia expansion empire
TRENDS in Genetics
Figure 2. A rough guide to genetically documented population movements in the history of anatomically modern humans.
present, respectively, have indicated that this assumption Neolithic cultures. In particular, people of late Neolithic
is unfounded. These two ancient individuals (from periods cultures bear more relatedness to the present-day popu-
before the time the Americas were thought to be peopled) lations of Eastern Europe and Russia than do people of
are more closely related to present-day Native Americans early Neolithic cultures. Thus, demographic turnover has
than are present-day Siberians [52]. They appear to have apparently occurred at least twice over the course of the
been members of an ‘Ancient North Eurasian’ population past 8000 years of European prehistory. This makes
that no longer exists in unmixed form, but that admixed inferences about the inhabitants of Europe tens of thou-
substantially with the ancestors of both present-day Eur- sands of years ago based on the locations of people today
opeans and Native Americans [52–54]. By contrast, most of unreliable.
the present-day indigenous populations of Siberia are more Evidence of major population mixture in the past sev-
closely related to populations currently living in East Asia, eral thousand years has also accumulated in parts of the
indicating that the present-day indigenous population of world where no ancient DNA is (yet) available. A series of
Siberia is descended in large part from populations that studies have detected and dated admixture in a range of
arrived in the region after the end of the last ice age [52]. human populations. These studies make it clear that there
Ancient DNA studies have also made considerable prog- are multiple distinct sources of ancestry in most popula-
ress in resolving the major debate about whether the tions. We caution, however, that without ancient DNA it is
arrival of agriculture in Europe involved the spread of not possible to be confident which populations lived in a
people or technology [53,55–63]. A key observation comes region before the admixture:
from genome-wide data from hunter-gatherer and agricul- India. Here nearly all people today are admixed be-
turalist populations that lived around 5000 years ago in tween two distinct groups, one most closely related to
present-day Sweden [62,63]. The farmer population present-day Europeans, Central Asians, and Near East-
appears most genetically similar to southern Europeans erners, and one most closely related to isolated populations
today, whereas the hunter-gatherers are more similar to in the Andaman islands [68]. Much of this admixture
northern Europeans (and, notably in light of the discussion occurred within the past 4000 years [69].
of the serial founder effect model above, have levels of North Africa. In this region nearly all people today
genetic diversity lower than in the modern European and descend from admixture between populations related to
East Asian populations [63]). Thus, at least in Scandinavia, those present today in western Africa and in the Near East
the spread of agriculture was accompanied by the spread of [70–72]. Some of this mixture can be dated to within the
people. The outcome of this spread of people was not past few thousand years [65], indicating that much of the
population replacement, but rather admixture, such that ancestry from these populations does not descend continu-
European populations today trace some of their ancestry to ously from the Stone Age peoples of North Africa.
both ancestral populations [53,62]. Mitochondrial DNA Sub-Saharan Africa. Genetic studies have documented
(mtDNA) studies suggest that this dynamic is characteris- multiple examples of populations with ancestry from dis-
tic of the arrival of agriculture throughout Europe [55–58]. parate sources. Many populations across sub-Saharan
The arrival of farmers was not the end of prehistoric Africa trace some fraction of their ancestry to admixture
migration in Europe (even putting aside discussion of in the past several thousand years with populations relat-
migrations since the invention of writing; see [64–67]). ed to those in western Africa [65]. Populations in eastern
In a single geographic region in present day Germany, and southern Africa have been influenced by gene flow
mtDNA has been obtained from hundreds of human from west Eurasian-related populations in the past 3000
samples from archaeological cultures ranging from the years [73,74]. In Madagascar all populations derive ap-
early Neolithic to the Bronze Age [56]. There is an appar- proximately half of their ancestry from populations related
ent genetic discontinuity between people of early and late to those currently living in southeast Asia [75].
380
Feature Review Trends in Genetics September 2014, Vol. 30, No. 9
Beyond case studies, several groups have applied tests Figure 3 shows the geographic locations of all the
for admixture to diverse populations from around the globe populations, together with the locations of the best pres-
and found that nearly all populations show evidence of ent-day proxies of their ancestral populations. Admixture
admixture [54,65,76]. To illustrate the degree to which this between populations related to ones that are now geo-
admixture involved populations whose closest genetic rela- graphically distant is evident in most populations of the
tives today are geographically distant from each other, in world. For example, Native American-related ancestry is
Figure 3 we present an analysis based on combining data present throughout Europe [54], likely reflecting the ge-
for 103 worldwide populations from several sources [26– netic input from the Ancient Northern Eurasian popula-
28,73,77,78] and running a simple three-population test for tion related to Upper Paleolithic Siberians [53,54] both
admixture [68] on all populations (for further details see into the Americas (most likely before 15 000 years ago) and
the Supplementary Material online). It is important to into Europe [52,53]. In addition, ancestry from a popula-
recognize that this test is not perfectly sensitive – for tion related to those living in the Near East is found in
example, large amounts of genetic drift in the history of Cambodia [30], likely due to mixture from an ancestral
a population can mask the statistical signal of an admix- South Asian population that was itself an admixed popu-
ture event – but when the test does detect a signal it lation containing ancestry related to present-day Near
provides incontrovertible evidence of gene flow into the Easterners [65]. The test we use as the basis for
ancestors of the population [54]. For all populations with Figure 3 detects only one signal of admixture per popula-
statistically significant evidence of admixture, we identi- tion, and cannot detect complete population replacement.
fied the present-day populations that share the most ge- The true population history is thus likely to have been
netic drift with the admixing ancestral populations. even more complex. Key: N. Europe N. Africa C. Asia Mid. East S. Europe W/C/E. Africa NE. Asia Americas Caucasus S. Africa E. Asia
TRENDS in Genetics
Figure 3. A birds-eye view of admixture in human populations. We performed a three-population test for admixture (Reich et al. [68]) on 103 worldwide populations. Circles
show the approximate current geographic locations of all tested populations (except for populations in the Americas, which are not plotted for ease of display). Filled circles
represent populations identified as admixed, and the colors represent the current geographic labels of the inferred admixing populations. Empty circles represent
populations with no statistically significant evidence for admixture in this test (however, in many of these instances, other tests do detect evidence of mixture [65,76]).
381
Feature Review Trends in Genetics September 2014, Vol. 30, No. 9
These examples show that the populations in a given Box 1. Surprising findings about human history illuminated
region today are rarely descended in a simple manner from by ancient DNA
the inhabitants in the distant past. This provides further