Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and 1880

Where our feet have taken us

Examples of human contact, migration, and adaptation as revealed by ancient DNA

ALEXANDRA COUTINHO

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-513-0815-9 UPPSALA urn:nbn:se:uu:diva-397222 2019 Dissertation presented at Uppsala University to be publicly examined in Lindahlsalen, Evolutionary Biology Centre EBC, Norbyvagen 18, 75236, Uppsala, Friday, 17 January 2020 at 13:15 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Professor Tom Gilbert (Evolutionary Genomics, University of Copenhagen).

Abstract Coutinho, A. 2019. Where our feet have taken us. Examples of human contact, migration, and adaptation as revealed by ancient DNA. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1880. 78 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0815-9.

In spite of our extensive knowledge of the human past, certain key questions remain to be answered about human . One involves the of cultural change in material culture through time from the perspective of how different ancient human groups interacted with one another. The other is how humans have adapted to the different environments as they migrated and populated the rest of the world from their origin in . For my thesis I have investigated examples of human evolutionary history using genetic information from ancient human remains. Chapter 1 focused on the nature of possible interaction between the (PWC) and Battle Culture (BAC) on the island of Gotland, in the . Through the analysis of 4500 year old human remains from three PWC sites, I found that the existence of BAC influences in these burial sites was the result of cultural and not demic influence from the BAC. In chapter 2, I investigated the ancestry of a Late individual from the southwestern Cape of South Africa. Population genetic analyses revealed that this individual was genetically affiliated with Khoe groups in southern Africa, a genetic make-up that is today absent from the Cape. Chapter 3 investigated the genetic landscape of prehistoric individuals from southern Africa. Specifically, I explored frequencies of adaptive variants between Late Stone Age and individuals. I found an increase in disease resistance alleles in Iron Age individuals and attributed this to the effects of the Bantu expansion. Chapter 4 incorporated a wider range of trait-associated variants among a greater number of modern-day populations and ancient individuals in Africa. I found that many allele frequency patterns found in modern populations follow the routes of major migrations which took place in the African Holocene. The thesis attests to the complexity of human demographic history in general, and how migration contributes to adaptation by dispersing novel adaptive variants to populations.

Keywords: Human demography, migration, adaptation, human contact, ancient DNA, human evolution, African prehistory, Scandinavian prehistory

Alexandra Coutinho, Department of Organismal Biology, Human Evolution, Norbyvägen 18 A, Uppsala University, SE-752 36 Uppsala, .

© Alexandra Coutinho 2019

ISSN 1651-6214 ISBN 978-91-513-0815-9 urn:nbn:se:uu:diva-397222 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-397222)

I dedicate this thesis to my family, who have been my source of sanity and strength, and biggest supporters throughout this journey. And I dedi- cate this thesis to all the amazing friends I have made along the way, whose feet grace the cover of this work.

Front cover pictures locations

Uppsala, Sweden Algarve, Portugal Tierp, Sweden Albufeira, Portugal Eskilstuna, Sweden Praia de Santa Cruz, Portugal Hammarö, Sweden Alcabideche, Cascais, Portugal Öland, Sweden Pretoria, South Africa Marielund, Sweden Great Zimbabwe, Zimbabwe Paradiset, Huddinge, Sweden Red Sea, Egypt Landskrona, Sweden Wahiba Sands, Oman Landmannalaugar, Iceland Tehran, Iran Húsavik, Iceland Kathmandu, Nepal Paris, Sydney, NSW, Australia Valthorens, France Yarra Valley, Victoria, Australia Rome, Italy Adelaide, South Australia, Australia Vienna, Italy Port Elliot, South Australia, Australia Ljubljana, Slovenia West Coast, New Zeist, Aramoana Beach, New Zealand Teschelling, Netherlands Flower Dome, The Bay, Singapore Hradec Králové, Czech Republic Bombay Beach, California, North Maggia, River, Ticino, Switzerland America South Mallorca, Spain Iowa, North America Madrid, Spain Lima, Peru San Sebastian, Spain Mexico Pyrenees, Spain

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Coutinho, A., Günther, T., Munters, A. R., Svensson, E. M., Gö- therström, A., Storå, J., Malmström, H. & Jakobsson, M. (2019). The Pitted Ware culture foragers were culturally but not genetically influenced by the . Submitted manuscript

II Coutinho, A., Edlund, H., Malmström, H., Henshilwood, C., van Niekerk, K., Lombard, M., Schlebusch, C. & Jakobsson, M. (2019). Later Stone Age human hair from Vaalkrans Shelter, Cape Floristic Region of South Africa, reveals genetic affinity to Khoe groups. Submitted Manuscript

III Schlebusch, C. M., Malmström, H., Günther, T., Sjödin, P., Coutinho, A., Edlund, H., Munters, A. R., Vicente, M., Steyn, M., Soodyall, H., Lombard, M. & Jakobsson, M. (2017). South- ern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago. Science 358(6363):652-655.

IV Coutinho, A., Schlebusch, C. & Jakobsson, M. (2019). The Evo- lution of Adaptive traits in Indigenous human populations in Sub-Saharan Africa. Manuscript

Reprints were made with permission from the respective publishers. The following papers were written during the course of my doctoral studies but are not part of this dissertation.

I Günther, T., Malmström, H., Svensson, E. M., Omrak, A., Sánchez-Quinto, F., Kılınç, G. M., Krzewińska, M., Eriksson, G., Fraser, M., Edlund, H., Munters, A. R., Coutinho, A., Simões, L. G., Vicente, M., Sjölander, A., Sellevold, B. J., Jørgensen, R., Claes, P., Shriver, M. D., Valdiosera, C., Netea, M. G., Apel, J., Lidén, K., Skar, B., Storå , J., Götherström ,A. & Jakobsson, M. (2018) Population genomics of : Investi- gating early postglacial migration routes and high-latitude adap- tation. PLoS biology, 16(1):e2003703.

II Lombard, M., Malmström, H., Schlebusch, C., Svensson, E. M., Günther, T., Munters, A. R., Coutinho, A., Edlund, H., Zipfel, B. & Jakobsson, M. (2019). Genetic data and radiocarbon dating question Plovers Lake as a hominin-bearing site. Journal of Human Evolution, 131(1):203-209.

Contents

Introduction ...... 11 The genesis of ancient DNA as a tool for the study of human prehistory ...... 11 Human demographic history and the contribution from ancient DNA research ...... 13 Early migration history of Scandinavia ...... 15 Early migration history of Africa ...... 17 Interactions between different prehistoric populations ...... 21 Interactions in Neolithic Scandinavia ...... 21 The agricultural genesis in Africa ...... 23 Ancient DNA as a means to understand early human adaptation to their environment ...... 26 Wet and dry lab methods and the analysis of Ancient DNA ...... 28 DNA survival ...... 28 The challenges of working with ancient DNA ...... 29 DNA damage ...... 29 DNA contamination and authentication ...... 31 Wet lab methods used to sequence ancient DNA ...... 33 Sample material collection and preparation ...... 34 DNA extraction method ...... 35 Library building methods ...... 35 Blunt-end library building ...... 37 UDG treated DNA library building ...... 38 Single Strand library building ...... 40 Dry lab methods used to analyze ancient DNA...... 42 Population Genetic analysis ...... 43 Single parental markers (mtDNA and Y-chromosome DNA) ...... 43 Principal Components Analysis (PCA) ...... 44 Admixture ...... 45 F-statistics ...... 47 Research aims ...... 50 Results Summary ...... 52 Examples of the complexity of human demographic history (Paper I, Paper II) ...... 52

Migration dispersal of adaptive variants (Paper III, Paper IV) ...... 56 Conclusions and future prospects ...... 62 Svensk sammanfattning ...... 64 Acknowledgements ...... 67 References ...... 70

Abbreviations

aDNA Ancient DNA ALFRED ALlele FREquency Database BAC Battle Axe Culture BP Before present bp Base pairs CWC DNA Deoxyribonucleic acid dNTP deoxyribonucleotide triphosphate EHG Eastern Hunter Gatherers FBC Fst Fixation index kya Thousand years ago LBK Linear culture LGM Last Glacial Maximum LP Lactase persistence mtDNA Mitochondrial DNA PCR Polymerase Chain Reaction PWC Pitted Ware Culture SNP Single Nucleotide Polymorphism UDG Uracil DNA Glycosylase UPPMAX Uppsala Multidisciplinary Center for Advanced Computational Science WGS Whole Genome Sequencing WHG Western Hunter gatherers

Introduction

The genesis of ancient DNA as a tool for the study of human prehistory Human evolution and early human prehistory were once areas commonly ad- dressed by the disciplines of archaeology, physical anthropology, and linguis- tics. Although these fields have greatly contributed to what is known of the evolution and spread of Homo sapiens and their cultures, key questions re- garding the human past remain unanswered. Just as important as the evolution of cultures is to the story of the human past, so are the migration routes that humans took to populate the globe, as as how human populations interacted with one another to form new cul- tures and populations. Integral to the above is a question which archaeologists could not answer: whether changes in material culture can be ascribed to ‘Pots’, or ‘People’ 1. In other words, are observed cultural changes a result of the physical migration, and therefore replacement of one population by an- other 2, or are change due solely to the spread of ideas 3? Genetics has slowly permeated many academic fields as a tool and added perspective to the scientific research available, and archaeology is no excep- tion to this phenomenon 4,5. The first published segment of human ancient DNA was sequenced by Svante Pääbo in 1985 through bacterial cloning, and following the advent of the Polymerase Chain Reaction (PCR), the field of ancient DNA research quickly developed 6,7. Initial ancient DNA (aDNA) studies on humans towards the end of the 1980s 8–10 focused on mitochondrial DNA (mtDNA). MtDNA was more easily sequenced than nuclear genomic DNA, there being far more copies of mtDNA than nuclear DNA in the human cell, and their circular structure ensured less damage to mtDNA than to linear nuclear DNA 7. Investigating mtDNA has further been crucial for our under- standing of human prehistory, for example, an early investigation of many present-day human mitochondria traced mitochondrial DNA from 147 indi- viduals from all over the world to one woman who lived 200 000 years ago in Africa 11. This results gave more support to the ‘Out-of-Africa’ model over the multiregional model for how humans spread through the world 11. However, many questions were left unanswered using mtDNA analysis alone. As mtDNA does not undergo recombination during cell division, mtDNA represents the genetic information of one single marker 12. MtDNA

11 can be seen as a genetic marker tracing the direct maternal ancestry of an in- dividual 12. We may also glean similar information from the paternal lineage of a population, by analyzing the non-recombining portion of the Y-chromo- some (NRY) 13. There is much congruence between the genetic information on human migration history revealed by mtDNA and the NRY 13. As is the case with mtDNA, the Y-chromosome also originates in Africa, with the most basal branches (A and B for the Y chromosome and L0-L6 for mtDNA) being found in African populations 13. As mitochondrial DNA and Y-chromosomal DNA is inherited from either the mother or the father only, markers from these sources which are used for population genetic analysis are termed uniparental markers, and are limited in the amount of genetic information they contain (Fig. 1) 12. The investigation of more complex questions of human history re- quires the sequencing of nuclear genomes.

Figure 1. Ancestry ‘tree’ illustrating inheritance patterns of the Y chromosome (in blue) and the mitochondria (in pink) as well as the autosomes from grandparents to grandchildren. Sequencing the nuclear genome of an individual would allow the whole ancestry tree, including the autosomal ancestry of the grandparents, to be captured (Adapted from the National Geographic Society 2017).

Whereas mitochondrial and Y-chromosomal genomes contain the genetic in- formation of a single marker, nuclear DNA includes the genetic information of many markers, is inherited from both the mother and the father, and as nu- clear DNA recombines, stretches of DNA represent multiple evolutionary his- tories of many individuals going back in time 4,14. The type of genetic marker of interest in ancient DNA studies are Single Nucleotide Polymorphisms, or SNPs 15. SNPs are examples of point mutations, or changes to a DNA se- quence of one nucleotide only 15. Mutations which have no effect on the indi- vidual’s ability to reproduce are neutral and not impacted by natural selection, so their relative frequency in a population is determined by stochastic pro- cesses (genetic drift) and population demographic factors such as population size, migration, and admixture between different groups. Different human

12 populations rarely have different sets of genetic variants, instead, they carry different frequencies of the same genetic variant, creating genetic variation between populations. This genetic variation may then be utilized to study hu- man migration and evolutionary history 15. Human migrations have brought people to new environments, and in these new settings some genetic variants have been the targets of natural selection as part of the process of adaption 4. Other adaptive variants increase in frequency as a response to, for instance, infectious diseases. Such adaptive variants can also spread via migration, to different environments where the variant is no longer adaptive. Perhaps the greatest benefit of ancient DNA was, and is, in understanding how population migration and/or population continuity has impacted the ge- netic landscape of our species through time, and how we adapted to new en- vironments 4.

Human demographic history and the contribution from ancient DNA research Early studies in human evolutionary genetics, such as those of Cann et al. 11 utilized mitochondrial DNA from modern-day individuals to investigate hu- man demographic history. DNA studies of modern-day people already demonstrated the broad scale human migratory patterns across the rest of the world, and elucidated that human populations went through a major decrease in genetic diversity as they moved out of Africa, due to a decrease in popula- tion size 16,17. This is known as the Out-Of-Africa bottleneck occurring 60 to 80 kya (thousand years ago) 18. However, modern-day genomes have been influenced by a long history of migration, admixture and demographic changes, where later demographic events mask earlier such events, so they could only suggest as to what happened in the past. However, with the advances being made in ancient DNA research, more ancient genomic data was being made available to human evolutionary geneti- and clarified and elaborated upon the discoveries made using modern genomes. This is especially apparent in studies of human demographic history. Researchers succeeded in obtaining the first complete ancient human genome in 2010 from a 4 kya Paleo-Eskimo and consequently discovered that a mi- gration event may have occurred 5.5 kya from Siberia to America, independ- ent of the migration that led to the rise of modern Native Americans and the Inuit 19. The first study involving several ancient human individuals was pub- lished two years later 20 and has been followed by many more using even larger sample sizes 21–24. These studies have revealed many details of the genomic and migration history of and Eurasia from the Ice Age 22, to the 21,24. As the study of ancient DNA has become more established as a tool

13 to investigate human prehistory, researchers focused on migration histories in a wider set of regions. Ancient DNA allows us to study human migration history in ‘real time’, or as it happened. Using ancient genomes, we may study SNPs and neutral vari- ation of populations through time to understand how and where they migrated, and how past groups changed to become the populations living in the world today. The basics of such analysis stems down to the comparison of genetic variants between ancient and modern individuals and groups of individuals 15. To better understand the migration history of a specific region, the genetic material from prehistoric individuals collected in that region are compared and contrasted, to one another and the modern day genomes of people living in the region today 15. If the ancient individuals are genetically similar, in other words share the same genetic variants, they come from one population. In the same way, if the ancient populations share the same genetic variants with peo- ple living in that region today, that population has not received external gene- flow from a genetically different group. The reverse is also true with genetic dissimilarity, which would indicate geneflow or perhaps a severe population bottleneck. In this way, geneticists create a picture of the genetic landscape of humans living in that region across time and space. This genetic landscape may also be considered a genetic signature, in that it may be used to trace the movement of people from that region to different parts of the world 15,16,25. If that genetic signature is found outside the region from where the remains were excavated, that group has likely migrated. Using ancient DNA, that genetic signature may now not only be traced through space, but through time as well. And ancient DNA studies have found that more often than not, ancient human populations are completely different to modern populations living in the same geographic region today 9,23,26. The genetic signatures of different world pop- ulations have changed a great deal through time due to genetic mixing between populations from different regions. This is known as admixture 15. And as more ancient genomes are being sequenced and analyzed, the picture of our demographic history becomes ever more complex 21–24. The above may be illustrated with a brief account of the Neolithic Transi- tion in Europe. Before the arrival of farming to Europe, Mesolithic hunter- gatherers of Europe were genetically distinct from all modern-day Europeans (and all other modern-day groups as well), but somewhat similar to modern- day northern Europeans 9,20. Ancient DNA studies of the first farmers of Eu- showed that they were genetically dissimilar to Mesolithic European hunter-gatherers, and instead have similarities to the southwestern populations of present-day Europe, especially the Sardinians 20,27. Further studies on Early Neolithic farmers from Anatolia and other parts of the Levant showed that these regions were the source of European Neolithic populations 24,28. It has also been shown that hunter-gatherers and farmers did not remain genetically isolated, but admixed, with hunter-gatherers being assimilated into farming groups 23,27,29. Comparisons between these ancient individuals and modern-

14 day populations further clarifies that there was high population turnover dur- ing the Early Neolithic, and this continued through to the Bronze Age. The Late Neolithic is punctuated by a massive migration influx of steppe herders, also known as the Yamnaya, into Europe 21,23, and gave rise to the Late Neo- lithic Corded Ware culture, which has been linked to the spread of Indo-Eu- ropean languages 21,23. This and other later Bronze Age migrations greatly changed the genetic variation of ancient Europeans to that seen in the Euro- pean populations today 30. Yet the Neolithic transition for Scandinavia, while influenced to some degree by the migrations mentioned above, occurred later than for the rest of Europe which makes it an interesting region to study.

Early migration history of Scandinavia Due to the Fennoscandian ice sheet covering Scandinavia up until 11 kya, Scandinavia was one of the last parts of Europe to be settled by humans 31. Archaeological evidence places humans solidly in Fennoscandia by 11700 BP, towards the end of the last glacial maximum (LGM) 32,33. Before the set- tlement of , genetically distinct populations of foragers sur- vived in Eastern and Western Europe. Eastern hunter-gatherers (EHG) derived their ancestry from Ancient North Eurasian steppe populations of the Upper Paleolithic, whereas Western hunter-gatherers (WHG) were descendants of foragers who had been widespread through Europe for several millennia 23,24,34,35. Yet how Scandinavia was initially settled was a matter of contention 33. Migration routes suggested by lithic from Western Europe and their oldest representative in Scandinavia contradict those of a different from Northern Scandinavia, and other studies in climate modelling and genetics suggest several possibilities for initial migrations to Scandinavia 33. Genetically, Scandinavian Mesolithic Hunter-Gatherers (SHG) is an ad- mixed population that falls between EHG and WHG (Fig. 2) 33,34. A study by Gunther et al. 33 investigated 7 Mesolithic individuals from and the Baltic sea islands to determine the route of Mesolithic foragers into Scandina- via, and how they were related to other Mesolithic hunter gatherers. It was found that early foraging populations settled Scandinavia along two routes, one comprising EHGs from the East migrating through a north Atlantic route, and one comprising WHG migrating from the south. This proposed route is supported by archaeological data on dated settlements in this area, as well as the Mesolithic technological evidence for this region 33.

15

Figure 2. Map showing ancestry of the Mesolithic individuals used in 33 as well as the most plausible route taken by colonizing populations as they moved through Scandinavia. The map shows the relative proportions of Western (green) and East- ern (maroon) Hunter Gatherer ancestry in the study samples as pie charts for each individual. The represent the route taken by Western Hunter Gatherers (green), and Eastern Hunter Gatherers (maroon) into Scandinavia. Adapted from 33.

While farming and agriculture spread throughout the world from the Near East and established itself in Europe 10 kya, agricultural practice first arrived in Southern Sweden with the Funnel Beaker culture (FBC, also referred to as TRB from Trichterbecher kultur) about 6 kya with the start of the Swedish Neolithic 27. The FBC are so identified due to unique collared pot ceramics, as well as communal megalithic burial sites 36. The first farming communities in , known as the (LBK, from the Ger- man Linearbandkeramik kultur) formed in Europe as a result of migrating farmer populations from Anatolia that spread through Europe and later ad- mixed with indigenous hunter-gatherer groups 37. FBC famers’ ancestry are derived from LBK groups, with more genetic and cultural input from the in- digenous hunter-gatherer groups in southern Scandinavia, northern , and the Netherlands 20,27,35. Therefore, many Neolithic populations have similar patterns of ancestry, that of their source population which in- cludes ancestry from the Near-East where farming began, and the hunter gath- erers which they met as they moved through the rest of Europe. The ancestry of the individuals associated with the FBC reflects a common pattern of de- mographic migration which describes the Neolithic Transition across Europe: that of leapfrog migration of pioneering farmer groups followed by low-level admixture of resident hunter-gatherers into the farmer group 38. However, this is not the case with the Pitted Ware Culture (PWC), which coexisted with the FBC in Scandinavia till the end of the middle Neolithic circa 4800 BP (Fig. 3).

16

Figure 3. Timeline of archaeological cultural transitions of prehistoric Scandinavia (in BP).

The PWC appeared in the archaeological record at approximately 5,300 BP and coexisted first with FBC farmers, and later with Battle Axe Culture (BAC) herder populations. They are considered one of the last hunter-gatherer socie- ties of Scandinavia 9,36,39,40. The PWC derives its name from the unique ce- ramic style of deep pits and impressions on their pottery and were seal hunters, which is why their settlements may be found throughout the coasts of (mod- ern-day) Sweden and islands in the Baltic sea near the Swedish coast 9,36. The origin of the PWC is under some contention, but archaeogenetic studies by Malmström et al. 9 and Skoglund et al. 20,27 show that the PWC are genetically similar (but not identical) to Scandinavian Mesolithic hunter-gatherers who existed before the arrival of the FBC farmers 35. In Scandinavia, the FBC culture was eventually replaced by the BAC, with new material culture expressions, in 4800 BP 41,42. Although the BAC is a variant of the Corded Ware Culture (CWC), the BAC is so named due to the battle found associated with the graves of males as well as characteristic corded ware pottery 36. Evidence of BAC communities have been found close to the coastline, but were focused more inland in south and central Sweden 36. Genetic studies on ancient European and Eurasian samples have found that Battle Axe culture individuals are genetically similar to individuals from the CWC 21,23,43. Admixture estimates of the CWC reveal ancestry comprising 75% from the Yamnaya steppe population, and comparison of the material culture of both groups also show strong similarities 23.Earlier CWC individu- als displayed even higher proportions of Yamnaya ancestry, and this ancestry decreased with time as these populations admixed with local groups 43. As this steppe component was not present in Scandinavia before, the BAC may have arisen in Scandinavia as a result of migrating Corded Ware populations into Northern Europe.

Early migration history of Africa Africa is the birthplace of modern humans as shown by archaeological, palae- oanthropological and more recently, genetic studies 4,17. What was for a long time considered to be the earliest remains of anatomically modern humans was discovered in Ethiopia and dates to 190 kya 44,45, while in South Africa and North Africa, remains from other early and transitional Homo species have also been uncovered, such as the Florisbad skull (South Africa) and fos- sils from Jebel Irhoud (Morocco) 46,47. The fossils from Jebel Irhoud resemble

17 an early evolutionary stage of anatomically modern humans. The remains have been dated to the African middle stone age (approx. 315 kya) when mod- ern humans were thought to have first appeared in the fossil record 47, adding another important geographic area to the possible emergence of anatomically modern humans. Genetic research conducted on modern-day hunter gatherer San and pastoralist Khoekhoe groups of South Africa (cumulatively known as the Khoe-San) reveal these groups to have high frequencies of the very basal mtDNA and Y chromosome lineages, as well as the deepest population diver- gence times among modern human populations 48,49. Therefore Africa contains a rich, complex prehistory of the origin and evolution of the human species and for this reason many attempts have been made to get a glimpse of this ancient past. Archaeological studies have also revealed that Africa holds key insights into the evolution of behaviourally modern humans, with indications of mod- ern behaviour dating as far back as 100 kya 50. Earlier studies have pinpointed East and South Africa as being possible places of origin for our species, but most recently, archaeological and genetic studies suggest a more multiregional model for the origin of modern humans in that they may have originated and evolved throughout the continent of Africa 47,48,51. A landmark study by Schle- busch et al. 48 on South African Khoe-San have found that they diverged from other modern humans more than 100 kya, and that the ancestors of these groups had exclusive occupation of South Africa up until 2 kya with the arrival of East African herders 49,52. Furthermore, Schlebusch et al. found ancient sub- structure within Khoe-San groups dating as far back as 35 kya separating northern and southern San groups 48. The genetic structure of the study popu- lations were closely correlated with the geographical location of the groups, and to a lesser extent their language and mode of subsistence 48. Both studies 48,49 results hint at a complex genetic history between ancient indigenous groups living in the area. The genetics of modern populations are affected, however, by past large- scale migrations from other populations and subsequent admixture of incom- ing groups into the gene pools of indigenous groups (Fig. 4) 52,53.

18

Figure 4. A visual representation of the major migrations of crop famers and herd- ers into southern Africa during the Holocene. The brown arrows indicate back-mi- grations of non-Africans into Africa, and the blue-brown represents the mi- gration of mixed East African/Eurasian herders to southern Africa. The green ar- rows indicate the Bantu expansion from West Africa to southern Africa. Adapted from 54.

Three major migrations have had a key impact on the genetics of southern African Khoe-San groups 54,55. The earliest migration event occurring ~2 kya was the movement of African pastoralist groups to the south from eastern Af- rica 52,53,55. This was followed by the expansion of Bantu speaking people, a large-scale migration event of Bantu-speaking groups from the Cross River Valley region in Cameroon (West Africa) which began 5 kya and reached southern Africa circa 1.8 to 1.3 kya 26,48,53,56,57. Last was the immigration of European settlers to South Africa approximately 350 years ago 58. The genetic impact of these migration events on indigenous Khoe-San populations is evi- dent when analyzing Khoe-San groups, as all of the groups studied exhibit some admixture from East African populations 48,52 and most with Bantu- speaking groups as well 49,53. The Coloured groups of South Africa are heavily

19 admixed with ancestry from Khoekhoe, San, Europeans (colonists), Bantu- speakers and East Asians (due to the slave trade) 48,59,60. Such admixture not only makes it harder to elucidate the genetic history of ancient populations living in Southern Africa before these events, but also has an effect on diver- gence time estimates calculated for different indigenous groups. Ancient DNA studies of samples from before the above-mentioned times would generate a far more accurate account of the prehistory of southern Africa. African ancient DNA studies are few and far between due to the initial fo- cus on European ancient DNA studies, and the difficulty in extracting DNA from samples from warm climates, as such preservation conditions are not conducive to the preservation of DNA 61. However, with the development of new and improved molecular genetic methods and tools, it is becoming easier to not only extract DNA from samples found in warm climates, but more an- cient samples too. Morris et al. 62 sequenced the first mitochondrial genome of an ancient southern African marine forager dating to before the pastoralist migration event, about 2.3 kya. The mitochondrial genome was classified as a new L0d2c lineage commonly occurring in the Khoe-San of southern Africa 49,62. The first nuclear genome to be sequenced from Africa was that of a 4.5 kya Ethiopian individual 61, and revealed East African populations (more re- cent in time than 4.5 kya) have been influenced by migration from a popula- tion genetically similar to early Neolithic farmers from Western Eurasia. More recently, in 2017, I was part of a team that sequenced and analyzed the nuclear genomes of 7 ancient individuals, three dating to the Late Stone Age (2 kya), and 4 dating to the African Iron Age (300 to 400 ya) 26. In particular, one of the 2 kya individuals, related to southern San groups was sequenced to 13 x coverage and used as a reference ancient genome for sub- sequent analyses. This high coverage genome revealed that all modern Khoe-San groups, including the Ju|’hoansi, often presumed to be ‘unad- mixed’, have 9-22% admixture from an already admixed group of Eurasian- East African ancestry dating to 1.5-1.3 kya. Such results have an effect on past genomic research as the Ju|’hoansi were considered as the human pop- ulation with the least admixture compared to all other African groups, and was used accordingly in genetics research. Lastly, the high coverage indi- vidual, as a non-admixed San ancestral group, was compared to other Afri- cans to re-estimate the earliest divergence time between modern human groups. Whereas past research estimated a divergence time of ~100 kya, (~200 kya using the updated human mutation rate 63) using the Stone Age individual pushed back the divergence time to between 260-350 kya, to- wards the genesis of the Middle Stone Age when humans became morpho- logically and behaviorally modern. A study by Skoglund et al. 64, published at the same time, captured SNPs from 16 ancient Africans dating as far back as 8.1 kya , and found that hunter- gatherer populations clinally related to the southern African San were wide- spread in ancient times. Subsequent African demographic history was complex,

20 with repeated gene flow between different groups, and varying levels of popu- lation replacement by western African farmers and eastern African herders.

Interactions between different prehistoric populations Just as the timing and routes of migration is integral to the history of modern human expansions, so is what happened afterwards as migrating populations encountered native populations with a different culture to their own. Evidence of such interactions can be seen in the archaeological material culture, in that either one or both cultures disappear from the archaeological record, and a new one arises. It has, however, been difficult to know whether interaction between two human populations resulted in complete population replacement or integration, or whether one group adopted elements or the culture of incom- ing groups without further admixing of the two populations 5,16. With the ad- vent of ancient DNA research, however, it became possible to investigate this question, as evidence of population replacement or admixture would be evi- dent in the genomes of ancient individuals 5,65. Especially intriguing to archaeologists was the process of how farming spread through Europe and the rest of the world 2. Generally referred to as Neolithic expansions, farming developed independently in various parts of the world 54. The European Neolithic expansion was extensively studied, and the European Neolithic transition was one of the first areas which population ge- neticists addressed using aDNA 66. Inherent to this question was the nature of interaction between hunter-gatherer and farming groups: did the Neolithic transition involve the complete replacement of hunter gatherer groups, whereby hunter-gatherers were integrated into farming communities, or did hunter gatherers actively incorporate farming practices into their existing cul- ture? Genetic studies have shown that ancient hunter-gatherer and farming communities in most cases are genetically different, and therefore any admix- ture between farming and foraging groups would be evident in descendant populations 9,20,67.

Interactions in Neolithic Scandinavia Due to the unique nature of the Neolithic in Scandinavia, and the fact that foraging and famer groups coexisted with one another throughout much of this period, how foraging and farmer populations interacted with one another was a matter of much research in the archaeological community. aDNA studies on Neolithic Scandinavian individuals exemplify that much of this interaction was genetic, and that admixture between different populations have been tak- ing place in Scandinavia even before the onset of the Neolithic. As mentioned above, FBC populations had different genetic ancestry to Scandinavian forag- ers 20,27,42. However, there was also evidence of admixture between the two

21 groups evident as a small proportion of hunter gatherer ancestry in the ge- nomes of FBC individuals 20,27,42. This suggested that hunter gatherers were being incorporated into FBC communities as the FBC migrated into Scandi- navia 27. Nevertheless, the PWC, which coexisted first with the FBC and then with the BAC, show little farmer ancestry, and are an admixed group between WHG and EHG, like the Mesolithic foraging populations from which they are believed to be descendants 9,20,35. The BAC have a very tripartite ancestry, mostly comprising steppe ancestry, with an added component of farmer an- cestry and a hunter gatherer ancestry component as well, demonstrating fur- ther interaction between Yamnaya-related groups such as the CWC and local farmers before they migrated up to Scandinavia 43. While we are slowly deciphering the nature of interaction between groups in the Scandinavian Neolithic, there remains much to be explored, such as the interaction between the PWC and the BAC. PWC material culture is exempli- fied by pits used as decoration on their pottery, their mode of subsistence, which was seal hunting, and their burial style, which were single graves where individuals were buried in a supine position 36,40,41. Some of the BAC material culture is reminiscent of the CWC, such as corded ware pottery, and are named thus because of the Boat-shaped Battle Axes associated with their graves 36,68. While the PWC were seal hunters, and described as hunter gatherers, the BAC were terrestrial farmers and herders. As the PWC, the BAC had single graves as well, but buried their dead in a hocker or crouched position 36,41,68. The PWC and BAC coexisted in southern Sweden from 4900 BP to 4200 BP 42. Not much is known of the extent or nature of cultural or genetic interaction be- tween these two cultures, although there is some indication of sharing of cul- tural ideas evident in the material culture dating to that time 69. Such is the case with material culture from the island of Gotland, about 80 km west of the Swedish mainland. The island boasts a very rich prehistoric record dating as far back as when it was first settled, during the Mesolithic, approximately 9200 years ago 40. Studies of the material culture on the islands have found corded ware pottery on some coastal PWC sites as well as battle axes all over the island 69. axes and grindstones from PWC sites also show influence from the BAC. Yet, perhaps most striking is the possible evidence of BAC influence on the PWC gravesites scattered throughout the island, because while most individuals are buried in a style typical to that of the PWC, a small percentage of display some affinity to BAC burial traditions 69. As the people associated with the two cultures are so genetically dissimilar 43, it is possible to investigate the question of whether the coexistence between the PWC and BAC involved genetic admixture as well as cultural idea ex- change, or involved cultural diffusion alone by exploring whether any BAC genetic signals are seen in PWC individuals. To investigate a potential inter- action or connection between the PWC and the BAC, I extracted DNA from individuals from PWC gravesites on Gotland that featured the above men- tioned characteristics in order to determine whether the different burial styles

22 resulted from solely cultural contact between BAC and PWC populations, or that individuals from the BAC were actively incorporated into PWC societies. I focused on individuals from three gravesites, namely Ajvide, Västerbjers and Hemnor on Gotland, and genetically analyzed individuals buried in typical PWC style (supine position) and with ‘BAC influences’ (crouched positions or having a battle axe or other BAC material culture associated with the grave). Using genomic analyses such as PCA and admixture analysis as well as D-statistics (see methods section for discussion of analysis methods), I compare DNA between the two groups (typical PWC and ‘BAC influenced’) in order to determine whether they belonged to the same population, namely the PWC, and therefore took on BAC cultural ideas, or were from two differ- ent population groups (PWC and BAC) where BAC individuals were incor- porated into the PWC group.

The agricultural genesis in Africa Research on the spread of farming in Africa currently recognizes three regions in Africa where agriculture has developed independently from one another: the Sahara or Sahel belt around 7000 BP, the Ethiopian highlands around 7000 to 4000 BP, and western Africa around 5000 to 3000 BP 70. Furthermore, ar- chaeological and linguistic evidence supports an introduction of agriculture to the Nile Valley from the Middle East, and there are other genetic indications of Middle Eastern influence in genetic studies on ancient and modern North African individuals 70–72. Genetic studies on both modern and ancient individuals show that as far back as 12 000 years ago, there were multiple migrations out of Eurasia back into Africa which affected northeastern and eastern Africa 61,64,71,73. One such back migration to East Africa, dated to 3000 BP, may have involved source populations with ancestry related to pre-pottery Levantine farmers, and in fact, a study by Prendergast et al. 74 confirms this to be the case. Using 41 individ- uals spanning the African Later Stone Age, Pastoral Neolithic, and African Iron Age, they suggest that pastoralism became established in eastern Africa through four different admixture events. The very first event, which they dated to 5 to 6 kya, involved admixture of populations most closely related to pre- sent-day Nilotic speakers with populations which most closely resemble pre- sent-day northern African and Levantine groups 74. This date coincides with the archaeological record which suggests extensive contact and migration be- tween Ethiopia and southern Arabia at that time, and may likely be the source of Eurasian admixture in East African populations of today 75. This Eura- sian/East African group then proceeded southwards, as an ancient Tanzanian individual dated to 3100 BP also displays ancestry related to the Levantine pre-pottery farmers as well as East African ancestry 64. While there is no definite genetic evidence linking the introduction of pas- toralism in East Africa to a Eurasian back-migration or influence (genetic or

23 otherwise) from the Middle East, indirect evidence may suggest such a link. There is archaeological evidence of and pastoralism in the form of pottery and domestic animal remains in sites in southern Kenya and northern Tanzania dating to that time (4500 to 3300 BP) 76. Furthermore, an East African variant of the lactase persistence gene (LP), C-14010, is esti- mated to have evolved in East African populations 2,700 to 6,800 years ago in response to the use of milk in their mode of subsistence, providing further evidence of that pastoralism was a mode of subsistence in East Africa by 3000 BP (Fig. 4) 77. The same lactase persistence variant is also found in southern Africa. It occurred at high frequency in Khoekhoe and Khoe-khoe descendent groups and at lower frequencies in some San hunter-gatherer groups. The presence of this East African LP variant was linked to the migration of East African pas- toralists southwards along the East African coast to arrive in southern Africa by 2000 BP (See blue line in Fig. 4) 26,52,54,78. This migration is also confirmed by autosomal DNA studies 26,48,73. Through demic diffusion and integration into hunter-gatherer communities, these migrating groups with Eurasian/East African ancestry brought herding practices to southern African groups and be- came the ancestors to the modern day Khoekhoe 26,48,52,64,73,78. The Nama (the only modern day Khoekhoe group who still speak their original language and practice pastoralism) for example, carry up to 30% of their ancestry from this Eurasian-East African group 26. The rise of farming in Western Africa 5000 to 3000 BP has had the greatest genetic influence on indigenous African populations 79. Known as the ‘Bantu expansion’, the spread of Bantu-speaking farming populations influenced the spread of agriculture and possibly iron use southwards to the rest of Sub-Sa- haran Africa 57. Genetic studies on modern African populations has verified that the Bantu-expansion was largely demic, and consequently Western Afri- can ancestry may be observed in most Sub-Saharan African populations. The effect may also be seen in the distribution of over 500 closely related ‘Bantu’ languages which are spoken in an area equating to 500 000 km 54. The migra- tion routes that early Bantu-speaking farmers took as they moved southwards is under some contention 57,79. Linguistics categorizes Bantu languages into 3 broad groups: Northwestern Bantu, spoken close to the origin of the Bantu expansion, and Eastern and Western Bantu 79. From this language categoriza- tion came two hypotheses to explain how Bantu-speaking peoples moved through Africa, that of an ‘Early split’ and a ‘Late split’ (Fig. 5) 57,79. The early split hypothesis poses that Bantu-speaking groups split soon after their place of origin into eastern and western groups, and the eastern group migrated east. above the rainforest, towards Kenya by 3000 BP and then continued south to reach South Africa by 1300 BP 57,79. The western Bantu-speaking groups moved directly south, through the central African rainforest and then further south along the Atlantic coast, forming the second migration route 57,79. The late split hypothesis differs in that instead of splitting at their homeland,

24 Bantu-speaking farmers moved altogether down through the central African rainforest in Cameroon before splitting into independently migrating eastern and western groups 57,79. Genetic evidence mostly supports the late split hy- pothesis where eastern Bantu languages developed from western Bantu lan- guages. A study by Li et al. 57 investigated the suggested large scale pattern of the eastern Bantu expansion and confirmed a spread to the East followed by a spread to the south, however the study found possible evidence of gene flow between eastern and western Bantu-speaking groups as they migrated towards South Africa. The Bantu expansion has largely obscured the genetic variation of past groups living in Africa before, which makes it difficult to study African genetic history before the Bantu expansion. Therefore, ancient individuals ei- ther predating the Bantu expansion or who have not been affected by the Bantu expansion may be useful in studying events before the arrival of the Bantu speakers, such as the rise of pastoralism. For my second project, I performed genetic analyses on sequence data from a hair sample dating to approximately 200 BP. This hair sample was found in the Vaalkrans in the southwestern Cape in South Africa, which is thought to have been habited by Khoekhoe herders at the time the Vaalkrans individual was alive. Although the Vaalkrans individual post-dates the arrival of the Bantu speakers to South Africa, historic and anthropological research suggests that they had not migrated into the western part of South Africa, which was considered Cape Khoekhoe territory 80. The Vaalkrans individual also lived during the European colonization of South Africa, therefore the Vaalkrans individual would likely be unaffected by West African admixture, but would potentially be affected by European admixture. The purpose of the project was to investigate the ancestry of the Vaalkrans individual through comparison with data of modern and prehistoric individuals spanning 2000 years. Results provided a picture of the genetic impact of the pastoralist mi- gration on hunter-gatherers living in southern Africa at that time and going forward to the modern Khoekhoe groups living today. This will be elaborated upon in the results section.

25

Figure 5. Two hypotheses of how the Bantu expansion took place. Starting in east- ern Nigeria and western Cameroon, a) 1) at 5000 BP, farming groups split directly after 1. (The early split hypothesis) and then expanded to 2) and 3) by 3000 BP to get to the south by 1000 BP. b) Starting from the same region, farming populations first moved southward to 2) the central African forest, and then split to reach 3) East Africa and southern Africa by 1000 BP. Recreated from 57

Ancient DNA as a means to understand early human adaptation to their environment As human populations migrated to other parts of the world, they encountered novel environments, where aspects such as climate were different to the envi- ronment they came from. They would also have been exposed to novel path- ogens which they would not yet have developed resistance to. Survival in such different environments meant that humans would need to acquire traits that would allow them to survive and reproduce in the form of genetic change over time, or adaptation 81. Just as influential as these abiotic and biotic factors were on the genome of human populations, so were cultural factors such as changes in diet and subsistence strategy. Ancient DNA allows us to study the evolution of the human genome in response to biological and cultural pressures in real time, in the form of changes to allele frequencies in response to a selective pressure. For example, lactase persistence is the continued production of the lactase enzyme into adulthood which makes it possible to digest milk post weaning. It arises as a result of different mutations in a control element 14 kilo-bases upstream of the lactase gene. These mutations occur at high fre- quencies in pastoralist populations and is strongly linked to the onset of pas- toralism in different parts of the world 78. In this case, a change in subsistence strategy (pastoralism) became a selective pressure which increased the fre- quency of LP variants over time in those human populations and helped them to adapt to a pastoralist lifestyle by allowing carriers of the LP variant to digest milk as adults. Whether the LP variants emerged by mutation in pastoralist

26 populations or existed in low frequency in human populations is unknown, but for most rapid adaptation events, it was likely selection on standing variation. Different environments and regions in Africa had different selection pres- sures which led to varying allele frequencies of beneficial mutations in differ- ent African populations. Pathogens such as Malaria are found in highest fre- quency in countries in West Africa, whereas the southernmost parts of Africa remains untouched 82. In order to survive in such an environment, western African populations became adapted to a higher pathogen load in that any new mutations conferring resistance to those pathogens would rapidly increase in allele frequency in those populations 83–85. In East Africa, those populations practicing herding utilized milk as a food source, and so would have to be able to digest milk in adulthood 78. As part of that development, in connection to the selection pressure, the lactase persistence allele would increase in fre- quency in East African herding populations. When these populations migrated and were genetically and physically integrated into populations living in the areas they migrated to, such as the agriculturalists from West Africa and pas- toralists from East Africa migrating through Africa, there would be a change in allele frequency of those adaptive alleles in their descendants. Therefore, selection coupled with migration ensures the propagation of different func- tional variants in different populations. Subsequent migration of those popu- lations may consequently introduce, or change the frequency of these adaptive alleles in populations living in the region that they migrate to. In turn, by stud- ying changes in allele frequency of functional variants in a population of in- terest over time, we may gain further insight into migration patterns of human populations 86. The main purpose of the third paper of this thesis was the use of seven ancient African individuals to re-evaluate the time point in history when mod- ern humans split from archaic humans 26. However, due to Africa being a key location in the evolution of modern humans, exploration of adaptation in an- cient individuals from Africa may provide new insights into how humans adapted to new environments. My role in the project was to investigate adap- tive variants in the ancient individuals of the study, and the results further ex- emplified the role of migration in the spread of novel, advantageous alleles through populations. In my fourth and final project, I further explored adaptation in a greater context, in terms of the number of variants investigated, and the number of ancient and modern African populations incorporated in the study. The aim of the project was to investigate the change in allele frequencies of these variants through time, and across space. I did this by comparing allele frequencies of functional variants associated with disease resistance, metabolism, salt sensi- tivity and skin pigmentation between ancient Africans and modern-day Afri- can populations. In so doing, I strove to better understand which mechanisms drove adaptation in the past, and which mechanisms contributed to the allele frequency changes through time of these variants in present populations.

27 Wet and dry lab methods and the analysis of Ancient DNA

DNA survival The environment in which remains are preserved greatly affect how the DNA survives through time 87,88. In particular, the average temperature of the sur- rounding environment plays a large role in DNA degradation. DNA degrades most rapidly in moist, hot or humid environments or in places with large and frequent fluctuations in temperature. Humid, moist environments encourage microorganism and fungal growth and activity, which breaks down the DNA, and condensation, which increases the presence of water in the soil and thus depurination of DNA, a chemical degradation process 87. Depurination is fur- ther affected by the pH of the soil, specifically, anything outside the range of pH 4-9. Biochemical studies on DNA suggest that DNA survival is capped at 100 000 years, but in really cold, dry environments like permafrost or in at high altitudes, DNA has been successfully extracted from remains as old as 700 000 years 89–91. A study by Smith et al. 87 compiled thermal histories from fossil hominid sites in Northern Europe, and used this information to deter- mine the ‘thermal age’ of remains found at these sites. They define thermal age as ‘the time taken to produce a given degree of DNA degradation when temperature is held at a constant 10 degrees celsius’. Their analysis exempli- fies the importance of temperature and the warming of the planet during the Holocene on the preservation on bone material. Hofreiter et al. formulated a model where they estimated thermal history across the globe and compared it to cases where DNA degradation was independently measured 88. They used the model to determine the survival of DNA of different lengths (150 bp and 25 bp) all over the world. Again, their results verified that DNA from low latitudes and on continents such as Australia, Africa and South America is far less likely to survive to the present day 88. Regardless of the environment, DNA is still encapsulated by bone, and bone apatite may ‘protect’ DNA from extracellular factors such as the pH of the soil 87. Taking this into account, DNA survival may also vary with the type of bone it is extracted from. Studies have shown that DNA extracted from the petrous part of the temporal bone in the skull 92 as well as DNA from tooth cementum 93,94 is much better preserved than in other parts of the skeleton. In the case of the temporal bone, the bone is extremely compact and is protected

28 from micro-organismal and fungal attack as it is more recessed in the skull. Where tooth cementum is concerned, this part of the tooth is protected by the bone of the jaw. Overall, DNA has been found to be better preserved in areas of the skeleton where the bone is extremely compact, such as the core of the femur and other leg bones. However, there are always exceptions and the over- all preservation of the bone plays a large part in the amount of DNA present in a skeletal sample. For example, in 2012 Meyer et al. were able to obtain a 30x coverage genome from a phalange which belonged to a Denisova individ- ual, a group related to Neanderthals. This means that there was enough overall DNA from that bone to represent the entire genome 30 times over 95.

The challenges of working with ancient DNA Due to the nature of DNA survival through time, achieving results where qual- ity ancient DNA could be recovered from a sample was hard-won, as there are many challenges in successfully extracting and sequencing ancient DNA. Even now, with advancements in DNA technologies, researchers need to ad- dress issues such as preservation conditions of a sample, contamination of an- cient DNA with modern DNA, and other damages to the DNA sequence which makes reliance on the authenticity of an ancient DNA sequence difficult. Therefore ancient DNA researchers must take certain characteristics of an- cient DNA into account when extracting, sequencing and conducting analysis based on ancient material.

DNA damage As biological material decays over time, cells degrade and the DNA becomes exposed to many factors which break it down or change its constitution 96. Other than effects from the extracellular environment on DNA survival men- tioned above, intracellular processes as the organism and its cells break down will cause damage to the DNA sequence. Intracellular processes range from enzymatic damage to DNA, for example the release of nucleases which attack the chromatin, to chemical degradation of the DNA itself. More specifically, hydrolytic and oxidative processes change the chemistry of the DNA compo- nents and compromise the nucleotide sequence which holds genetic infor- mation. Such processes manifest themselves as fragmentation of DNA, and chemical modifications that either prevent the sequence from being replicated by DNA polymerase or cause incorrect nucleotides to be incorporated during DNA sequencing 97. Hydrolytic processes cause either deamination of cytosine to uracil, where H2O hydrolyses the amine group of the cytosine nucleotide to a carbonyl group (thus forming uracil), or it will convert a cytosine nucleotide which has been methylated by epigenetic processes to thymine (Fig. 6) 96. During DNA

29 replication, adenine pairs with uracil (the deaminated cytosine) and then thy- mine pairs with adenine, hence a C to T substitution. Less common but also possible is guanine to adenine substitution, where guanine pairs with cytosine, cytosine is deaminated to uracil and then, during DNA replication, adenine binds to uracil. Such substitutions occur mostly on the ends of DNA fragments as ends are more likely to be single stranded and more exposed to hydrolysis 97. Oxidative processes such as the attack of free radicals on DNA can cause misincorporations of nucleotides to each other 97. Oxidation affects guanine more than any other nucleotide, and oxidised guanine will bind to either cyto- sine or adenine. Lastly, nucleotides may be modified in other ways through crosslinking with other DNA fragments, DNA strands or between DNA and other molecules such as proteins. Guanine residues from depurination may also act as blocking lesions to replication of ancient DNA (Fig. 7) 97.

Figure 6. Diagram showing deamination of cytosine to uracil, and methylated cyto- sine to thymine. The green boxes show the amine group which will be deaminated, the purple boxes show the oxygen molecule which takes the place of the amine group after hydrolisis has taken place, and the red box shows the added methyl group to cytosine after methylation of cytosine.

Figure 7. Depurination of DNA resulting in cleavage of a nucleotide base from the phosphate backbone, and β-elimination which results in cleavage of the phosphate backbone itself. The cleaved guanine can then block any further DNA replication (adapted from Dabney, Meyer & Pääbo 2013).

30 DNA contamination and authentication Due to the compromised and damaged nature of ancient DNA, contamination from DNA in the surrounding environment (from the soil and micro-organ- isms) and more importantly from the people who handle and work with the sample makes it challenging to retrieve and sequence endogenous ancient DNA 98. Hence, raw sequenced DNA data includes a mixture of environmen- tal DNA, possible contaminant human DNA from e.g. excavators and/or cu- rators, and endogenous DNA. While bioinformatics approaches is now able to discriminate between endogenous and contaminant DNA, the more contami- nation present in the sample, the smaller the fraction of endogenous DNA that will be returned for a certain amount of sequencing. In order to minimize any further risk of contamination, precautions are taken in the laboratory where samples are processed to keep the surrounding environment as free from DNA as possible 99. When extracting DNA from bone and tooth material, all sam- ples are irradiated under a UV light to destroy any surface contamination. The top layer of sample bone is drilled away to further ‘decontaminate’ the bone, and bleach is used to clean the topmost layer of bone before drilling even be- gins. Bone and tooth powder is only extracted from the innermost layers of bone and teeth, and negative controls are used throughout the extraction and library building process so to ensure that the lab personnel have not introduced some form of contamination during lab processing of the sample. Once ancient DNA has been sequenced, initial processing of the sequence data involves filtering the data, aligning the data against the reference human genome to discard any environmental DNA contaminants, and conducting dif- ferent analyses to obtain a measure of DNA authenticity. This is where dam- age on the ends of DNA fragments may serve as indication of authentic, an- cient DNA in the form of deamination plots, where the mismatch frequency of the DNA sequence is calculated for each DNA fragment and plotted in or- der from the middle of the DNA fragment to both the 3’ and 5’ ends (Fig. 8). Authentic ancient DNA will have large spikes in the frequency of C to T changes at the 5’ end of the DNA fragment, and G to A changes at the 3’ end of the DNA fragment 100,101.

31

Figure 8. Damage plot for genetic data of an individual from a Pitted Ware Culture gravesite in Ajvide. On the 5’ end of DNA fragments, there is a sharp increase in nu- cleotide T from C-T deamination. From the 3’ end there is a sharp increase in A from the strand read in reverse.

Analysis of the haplogroups of individuals also verify the authenticity of the sample and whether it has been contaminated, as having more than one mito- chondrial or Y-chromosome haplogroup present in a sample would indicate contamination either from modern DNA, or another ancient DNA sample 102. Mitochondrial DNA may also be used to calculate contamination estimates of sample DNA, another measure of authenticity 103. The most common meth- ods of estimating mitochondrial contamination are the Green method and the program ContamMix 104,105. In both methods, the consensus sequence of an ancient individual is compared to over 300 modern mitochondrial sequences, and any contradictory signals in the data treated as contamination. In the Green method, private or near private mutations (mutations which only appear in the ancient individual) are identified for the ancient sample, and these are used as diagnostic sites from which a point contamination estimate is calcu- lated 104. If a number of reads that cover these diagnostic sites do not match the ancient individual, but instead match the modern mitochondrial DNA se- quences, then this is labelled as contamination. Where the Green method pro- vides an estimate of contamination, ContamMix aligns the ancient consensus

32 sequence with those of 311 modern mitochondrial sequences, and if any an- cient reads map better to any one of the modern mitochondrial DNA se- quences, this may be caused by contamination. ContamMix calculates an es- timate of authenticity as opposed to contamination in Green 105. The program SAMTOOLS is used to compare the ancient DNA sample with that of a mod- ern human DNA reference, and results are given as a point estimate percentage value. Point estimates below 5% are often interpreted as the ancient sequence being uncontaminated, since the confidence interval then covers 0% contam- ination. Therefore, downstream analyses are often conducted without attempt- ing to remove contaminant fragments 103. Other programs, such as VerifyBamID 106 and ANGSD 107 calculate con- tamination estimates for the autosomal DNA or the X-chromosome (in males). The program ANGSD uses the X-chromosome, which exists in a single copy in males, and compares the major variant (the most frequent base at a SNP site) and the minor variant (the least frequent base at that same SNP site) for a particular SNP with those of adjacent SNPs 107. Differences between the SNP site and adjacent SNP sites are determined using Fisher’s exact test, resulting in a contamination estimate for the X-chromosome which is extrapolated to the rest of the genome. VerifyBamID on the other hand, uses the entire sample genome. It uses mathematical modelling to relate the sequence reads to a hy- pothetical true genotype generated from previous genotypes for that sample 106. The result is a contamination estimate like that of ANGSD. However, ver- ifyBamID requires decent genome coverage, and is typically reliable for sam- ples with more than 1x coverage.

Wet lab methods used to sequence ancient DNA Fortunately, with the development of Next Generation Sequencing (NGS) it is much easier to retrieve endogenous DNA, as NGS can sequence far more efficiently than traditional PCR based sequencing methods, and it can also sequence very short DNA fragments, making it ideal for sequencing ancient DNA 108. New methodologies have been developed that improve the efficacy of NGS and reduce the cost of sequencing 98. These methods include DNA extraction techniques, production of indexed libraries tailored for NGS mulitplex se- quencing, UDG treatment of DNA molecules when preparing ancient DNA for sequencing in order to remove deaminated bases, or ancient DNA frag- ments of interest may be ‘captured’ via hybridization with pre-selected probes 98. The process of ancient DNA analysis used in my projects will be elaborated upon below.

33 Sample material collection and preparation Museum collections are the main source of samples for ancient DNA research- ers. Before sampling begins, the museum collection is analyzed by ancient DNA specialists in order to identify candidate elements to sample for ancient DNA analysis. Factors such as bone type, overall appearance of the bone, and weight are taken into consideration. As mentioned earlier, the ideal sample would be either the temporal bone of the skull 92 or canine or molar teeth 93. If not the temporal bone, any bone which is compact such as part of a femur could also work 98. Lastly, bones and teeth should not be brittle or chalky, and should not feel light nor look bleached 98. Once candidates for analysis have been identified, if the museum allows, samples will be transported to a lab dedicated to processing ancient DNA sam- ples. This is a lab which is completely ‘decontaminated’ and prevents modern DNA from entering the lab or contaminating the samples. A typical ancient DNA lab is HEPA-filtered and has positive air pressure to prevent contamina- tion from modern DNA in the air, UV radiated to decontaminate all surfaces and equipment, and is stocked with supplies of > 3% Sodium hypochlorite (bleach), ethanol, and deionized water to clean bench tops, hoods, and any other work surfaces. Lab personnel must also wear protective suits, visors, covers and at least double layer gloves to prevent their own DNA from contaminating samples 98. As museum samples have been buried in soil and then handled first by archaeologists and then by museum staff, they could al- ready be contaminated with modern-day human and micro-organism DNA. Before entering the ancient DNA lab, samples are typically cleaned by being UV radiated, wiped with bleach and then then wiped with water before under- going UV radiation again 98. The first layer of bone or tooth may also be abraded off to reduce risk of contamination 109. There are also cases when sampling has to be conducted on site. The above conditions of an ‘ancient DNA lab’ can be replicated using a sterile, plastic tent which is completely isolated from the surrounding environment and can easily be cleaned with bleach, water and ethanol. Just as in the lab, personnel wear protective gear and gloves while handling and taking samples from bone and tooth material. Portable 254 nm UV lights are used to decontaminate the surface of bone material before they are sampled. Sample material comes in the form of very small bone or tooth fragments, or powder obtained from drilling into samples. Between 50 and 100 mg of powder is sufficient for DNA extraction. Bone mills may also be used to grind samples such as tooth roots or small bone, but this is only possible at an an- cient DNA lab facility.

34 DNA extraction method The primary method for DNA extraction that I used was a silica-based extrac- tion protocol, that of Yang et al. 109 with modifications of treating bone and tooth samples with bleach solution and water before extraction was begun 110. Bone powder is incubated in extraction buffer made up of 0.5 M EDTA and 1 M Urea to which at least 100 µg/mL Proteinase K is added. Proteinase K di- gests contaminating proteins present in the sample extract. After incubating for at least 24 hours, the mixture is concentrated by filtering through Amicon filters and then processed using Illumina’s mini-elute silica columns and pro- tocol, which binds DNA to the silica molecules while whatever is left in the sample solution is washed away. The DNA is then eluted in EB buffer. A number of DNA library protocols can then immortalize the DNA in the DNA extract. Extraction blanks are run alongside the extraction process to monitor possible contamination of the bone samples. Other methods in DNA extraction vary by the reagents added to the various buffers used, or how the DNA is finally isolated from the extraction buffer 111. DNA may be precipitated using alcohol after a phenol/chloroform extraction, or may be centrifuged out of the extraction buffer using a spin column. Finally, the most common form of DNA extraction is binding the DNA to silica using a binding solution and silica columns. Work by Dabney et al. have sought to make improvements to this method to enable binding of shorter fragments of DNA by using different reagents to make up the binding buffer and a silica column 112.

Library building methods In order to prepare sample DNA for sequencing on an NGS platform, ancient DNA is first made into a DNA library to be ‘immortalized’. In other words, the library can be amplified as many times as one likes. Importantly, DNA libraries are built to enable all DNA present in the extract to be sequenced on an NGS platform. Before NGS, ancient DNA was amplified using PCR and a targeted approach and then sequenced using Sanger sequencing, which re- sulted in one DNA fragment sequenced per experiment. The PCR-based tar- geted approach resulted in low throughput of a few hundred DNA fragments, hardly enough to sequence an entire nuclear genome. However, with the ad- vent of NGS, which gave rise to high throughput sequencing of many DNA fragments at once, ancient whole genomes are able to be sequenced at a high depth of resolution. The key element of the DNA library is the ligation of adaptors and indexes to sample DNA 113,114. The adaptors on either end of a DNA fragment ensures that it will attach to stable surface in the sequencing machine so that sequencing may take place, and also provides a common se- quence among the entire DNA sample (comprising many individual frag- ments) which allows all DNA fragments on that solid surface to be sequenced

35 at the same time with one common primer. It is this development which has increased the throughput of ancient DNA sequencing from a few hundred se- quences after sequencing, to hundreds of millions of fragments after sequenc- ing 115. Barcodes became integral to the building of an ancient DNA library to allow many ancient samples to be sequenced together at once, as each sample would have a unique barcode to identify it during downstream computational analysis of the sequence output 114,115. The resulting DNA library may then be amplified again and again using PCR to increase the number of fragments which may eventually be sequenced and analyzed. DNA sequencing may proceed in a number of ways. One may sequence all the DNA there is in a sample, also known as whole genome sequencing (WGS), or sequence specific sites in the genome which may be informative, as is the case with SNP capture. SNP capture used hybridization to capture a set of SNPs, sometimes of a magnitude of up to 1 to 2 million SNP sites of interest. During sequencing, sample DNA containing these sites will hybridize to the probe and thus be ‘captured’ for further analysis 116. This example is one of three hybridization capture techniques used to target specific sites of a DNA sample 116. SNP capture serves as a more economical and cost-effective way of sequencing DNA than WGS, and may be especially useful for low quality ancient DNA specimens. However, SNP capture has certain disad- vantages which affect which analyses and how much information may be ob- tained from a sample. The biggest issue affecting SNP capture as a method of sequencing is ascertainment bias 117,118. Ascertainment bias is caused by non- random sampling of SNPs 118. Many of the SNP microarrays and capture SNP sets available to human geneticists have complicated ascertainment schemes, which makes it difficult to account for in statistical analyses. Many of these microarrays have been created with medical genetics in mind, therefore sites of the genome which may be more informative for population genetic analysis may not be present on the array 117. Lastly, SNPs are usually ascertained from modern, European populations, therefore ancient, non-European genetic vari- ation may not be sequenced and available for analysis 118. WGS allows more of the ancient genome to be sequenced, ideally the entire genome, thus provid- ing more loci for analysis, free from ascertainment bias and including informa- tive sites such as private mutations and more rare alleles, which SNP chips or SNP capture sets do not target 118. The more loci that are sequenced, the more independent instances of the same evolutionary history we have for an indi- vidual. Therefore, we sequence the entire genome for our analyses. DNA sequencing for my projects was completed on an Illumina sequencer as much of the preparation of ancient DNA is conducted using Illumina chem- istry. The Illumina sequencing platform utilizes a ‘sequence by synthesis’ (SBS) approach whereby sample DNA will hybridize to complimentary oli- gonucleotides on a flow cell in the sequencer via the adaptors which have been ligated to it during library preparation 119. Attached fragments become a ‘seed’

36 from which more fragments are amplified on the flow cell through bridge am- plification. The result are clusters of identical DNA sequences which are then read by the sequencer to generate sequence data used in downstream popula- tion genetic analysis 119. DNA sequences are read through use of an artificial nucleotide (dNTP) which has been tagged with a fluorescent dye. Sequencing commences when each of the four tagged DNA nucleotides are added to the reaction 120. If the nucleotide of the sample DNA sequence matches the incoming fluorescent nucleotide, they will pair up, and the dye will fluoresce. Each time fluorescent nucleotides are added to the sequencing reaction, a high resolution image is taken of the reaction, and translated into a DNA sequence 120. In order to ena- ble sequencing of a sample, however, a DNA ‘library’ must be prepared for each sample. There are various types of DNA library building methods suita- ble for ancient DNA of different qualities and quantities, with characteristics such as damage patterns, low sequence coverage and high the possibility of being highly contaminated. Therefore, choice of a library building method largely depends on the state of the DNA sample itself. These library building methods shall be described in the following section.

Blunt-end library building The blunt-end library method (Figure 9) described by Meyer and Kircher 114 utilizes a modified form of Illumina’s library preparation kit and protocol, which is used for modern DNA. I have mostly used this method when screen- ing ancient DNA samples for sequence quality so that a more suitable library building method may be used for the sample later. In general, the process in- volves adapter ligation of short artificial DNA sequences to a sample DNA fragment, and filling in the adapters to create a double stranded DNA strand which can be sequenced. Ligation of adapters to the DNA strand allows the sample DNA to attach to a solid flow cell in a sequencer machine, where the strand can then be amplified and sequenced with fluorescent nucleotides 114 In this particular library building method, any overhanging single strand ends of the DNA is cut away in order to ligate sequencing adapters to them in the adapter ligation step. After adapter ligation, the DNA library is then pre- pared to undergo PCR, and PCR primers where either one or both primers have barcodes to discriminate between DNA samples are added to allow for multiple samples to be sequenced together at once on a sequencing machine, thus saving time and money 114. .

37

Figure 9. Illustration of the blunt -end library building method, including the addi- tion of a barcoded PCR primer (oligo). (Adapted from 114).

UDG treated DNA library building For DNA samples which show relatively high endogenous human DNA con- tent (>10%), UDG treatment of libraries may be implemented to reverse the effects of deamination (in other words repairing the postmortem DNA dam- ages). This library building method removes uracils from ancient DNA se- quences and repairs the ends of DNA fragments in order to reduce the transi- tions error rate of ancient DNA (Figures 10 and 11). In blunt-end library build- ing, where uracils are not removed, transition data cannot be used, as it likely contains DNA differences caused by postmortem DNA degradation in addi- tion to true polymorphisms. Removing transitions reduce the amount of se- quence information available for downstream analysis. Reparation of post- mortem DNA damage allows both transition and transversion data to be used, increasing the amount of genetic information that can be retrieved from the DNA 121. While such DNA damage is used to ensure authenticity of ancient DNA, blunt-end libraries have already been created for all sample DNA ex- tracts as part of the sample screening process, and the data can still be tested for authenticity by investigating such characteristics such as read length, and estimate contamination. The ‘damage repair’ method is similar to the blunt- end library building method, but utilizes Uracil-DNA-glycosylase (UDG) en- zyme as well as endonuclease VIII to cleave uracils from the DNA sequence and then repair the damaged ends of the fragments 121. Protocols may use USER enzyme instead, which is a premixed version of the two enzymes above (NEB) 121. After repair of the DNA fragment ends, T4 ploymerase makes the

38 ends of the double DNA fragment equal so that adapters may be added to the strand by T4 ligase (Figure 10). Figure 11 gives a visual representation of the damage plots of ancient DNA for both blunt end libraries and for UDG treated libraries. Both C to T and G to A transitions on the ends of the DNA fragments have decreased significantly in UDG-treated DNA libraries compared to standard blunt end libraries (Figure 11).

Figure 10. Different enzymes used in the damage repair library protocol for ancient DNA. The enzyme PNK phosphorylates the 5’ ends of DNA strands to enable liga- tion of adapters. UDG enzyme converts uracil nucleotides throughout the sequence to abasic sites and endonuclease VIII repairs sequences by cleaving DNA sequences on either side of the abasic sites, and T4 polymerase extends the 5’ end of the DNA sequence and removes the overhanging 3’ end of the resulting DNA strand. The DNA strand is then ready for adapter ligation using T4 ligase and BST polymerase (adapted from 122).

39

Figure 11. Graphs representing the damage profiles of two samples under two dif- ferent treatments: A and B) No UDG treatment, C and D) Full UDG treatment. Plots A and B shows characteristic damage patterns of increased C-T substitutions at the ends of DNA fragments and decreasing towards the middle of the strand, whereas plots C and D do not show an increase in C to T substitutions at the ends of strands at all 123.

Single Strand library building Lastly, the single stranded DNA library method is used for particularly dam- aged ancient DNA, as fragments do not have to be cleaved or made ‘blunt’ to ligate adaptors to the ends of the strand (Figure 12). Instead, double stranded DNA is separated into single strands and a biotinylated adapter ligated to the 3’ ends of each strand so that they may be ‘captured’ by streptavidin beads. While the fragments are immobilized on the streptavidin beads, fragments are replicated, and double stranded adapters ligated to either ends of those frag- ments. Fragments undergo PCR and subsequent sequencing as with other li- brary preparation methods. Separation of DNA into single strands ensures that a minimal amount of genetic information is lost, as more of the DNA strands which would have been too damaged or short to build libraries from are now accessible 124.

40

Figure 12. Schematic of the Single Stranded library method described above (Adapted from 113).

After DNA libraries have been made, all libraries can be investigated using quantitative PCR (qPCR) to check whether the procedure has worked, and how many cycles of PCR are required to amplify the DNA library to close to saturation point. Barcoded primers are used in the PCR amplification mixture and amplified libraries are then cleaned and quantified. The DNA libraries are sequenced using an NGS sequencing platform, such as the Illumina Hi-Seq Xten platform and pair-end reads 125.

41 Dry lab methods used to analyze ancient DNA Once raw sequence data is retrieved, data undergoes some initial processing to ready it for population genetic analysis. As different DNA samples have been pooled together for sequencing, first the sequence data is ‘demulti- plexed’, or sorted into individual samples using their sample barcodes from library building 126. Artifacts from library building are then removed, such as adapter dimers and chimeric sequences. Library adapters, which were used to append the DNA fragments to the flow cell for sequencing, are trimmed from the sample fragments and paired end reads are merged as forward and reverse strands of a paired end read usually contains overlapping information of the authentic ancient DNA. Merging paired end reads reduces sequencing errors and improves the overall quality of the sequence data 126. The resulting data is filtered for duplicates and non-human (mostly environmental DNA) contami- nation by mapping the sequence data to a reference genome, for humans this is currently the GRCh37/hg19 human reference genome. Since ancient DNA is fragmented into short fragments, and has postmortem damage, mapping pa- rameters such as accuracy and mismatch penalties are relaxed to allow for authentic fragments to map to the reference genome 127. Ancient DNA samples are also usually of low complexity, which means that there is a low number of unique, authentic sample reads in the DNA extract. After PCR steps in the library building process, such samples have a high instance of PCR duplicates, which do not contain any novel genetic information. PCR duplicates may be used to call a consensus sequence, or the duplicate of the highest quality may be used for downstream analysis. These duplicates are typically filtered out from the DNA sample. The read and mapping quality, endogenous DNA con- tent, genome coverage and clonality of sample DNA is then calculated. Se- quence information of the multiple PCRs, libraries, and then bones for one individual are merged into a single genome for further analysis. The data is also ready for authenticity measures described earlier such as deamination pat- terns and contamination estimates 29,103,126. Postmortem damage in aDNA samples can have implications for popula- tion genetic analysis, and their effect on analysis must be reduced. One method involves using solely transversions in the aDNA sample, which will also re- duce the number of sites used in analysis by about two thirds. An alternative approach may also be to trim the end bases of the sample fragments, as DNA damage is concentrated towards to end of aDNA fragments 128. However, this is not required for UDG treated libraries, as aDNA damage was repaired in these libraries. Genotype callers are also being developed which take postmor- tem damage into account when allele calling 129,130. Another limitation of working with low-coverage data is calling diploid genotypes. There are programs involving genotype likelihoods and known al- lele frequencies for the site which may ‘fill in’ the missing information. An alternative and common strategy for low coverage samples is to haploidize the

42 data 103. In this method, the base call with the highest quality score at a variable site is chosen. The result is a mosaic haploid genome for each sample 103. While this method circumvents the inability to generate diploid calls for aDNA, it creates limitations when using programs which utilize diploid data and Hardy-Weinberg equilibrium assumptions. Low coverage, ancient datasets typically contain much missing data, and any comparisons between ancient individuals is restricted to sites and SNPs that all samples have. Therefore it becomes important to involve as many an- cient individuals from the same population as possible, to fill in those missing gaps, and then compare whole populations.

Population Genetic analysis Single parental markers (mtDNA and Y-chromosome DNA) Haplogroup information for both mitochondrial DNA and Y-chromosome DNA is commonly analyzed in ancient DNA studies, but they contain much less information than nuclear DNA data. Haplogroups are arbitrarily defined haplotypes (of the mitochondria or the Y-chromosome) that are genetically relatively similar. 131.Haplogroup frequency and distributions differ between ancient populations, and may give some preliminary information on popula- tion demography and migration history 132. For example, mitochondrial hap- logroup U is found in high frequency in European hunter-gatherer groups, whereas haplogroups N1a, T2, K, J, HV, V, W, and X are more common in agriculturalist groups 67. Males from the , which originated from the Eurasian steppe, display high frequencies of Y-chromosome haplog- roup R1, and similar frequencies are found in the Corded Ware culture groups which are believed to trace some of their ancestry to the Yamnaya 23. Further- more, mitochondrial and Y-chromosome haplogroups may be compared in or- der to investigate cultural practices and movement of males versus females in a population 133. In order to determine the haplogroup for ancient individuals, sample se- quences are compared to the RSRS (Reconstructed Sapiens Referese Se- quence) for mitochondria 134, and the GRCh37/hg19 human reference genome for the Y chromosome. Online haplotyping tools such as haplofind (for mito- chondrial haplogroups), or SNP trees such as ISOGG (for Y-chromosome haplogroups) 135,136 would then allow for haplotyping the genetic data. The results from these programs may also be used to find private mutations in the individuals, which may indicate an extinct haplogroup lineage. Phylotree, an online tool containing phylogenetic trees for both mitochondrial and Y-chro- mosome haplogroups, may be used to verify the results generated by haplofind and ISOGG 137.

43 Principal Components Analysis (PCA) Principal Components Analysis is a data summary technique for simplifying and visualizing multi-dimensional data, such as allele frequencies across dif- ferent populations 138. PCA analysis reduces multiple dimensions, or in this case, the information from many SNPs across many individuals, to its princi- pal components, that is the data pattern that best explains the variation seen in a dataset 139. In the case of ancient DNA analysis, the aim is to find patterns between the layout of different samples and how they may be associated and relate to one another, and through time through comparison with a modern sample set 140. The output is in the form of a PCA ‘plot’ where the axes can be represented by the first two ‘principal components’ of the data (but other PCs can also be represented). In this way one may find key connections or rela- tionships among or between populations or individuals. When running analysis on ancient individuals, often a set of modern-day reference individuals are utilized for comparison by projecting the ancient in- dividuals on the background of the modern-day sample set. In this way one may determine not only how ancient samples cluster together into possible population groups, but to which modern populations these ‘clusters’ are most related. PCA analysis thus gives preliminary information on population his- tory of ancient individuals. The programs PLINK and EIGENSOFT 141,142 can be used to prepare the data for PCA analysis and construct the PCA plot, and the program R is used to visualize the output from these two programs. Ancient DNA data from dif- ferent individuals may not overlap due to low coverage, or the way that they were processed or sequenced 140. Therefore a mathematical Procrustes trans- formation allows non-overlapping data to be compared to each other and mod- ern data 20,140. There are, however, some caveats which should be considered when applying the PCA approach to genetic data, and specifically ancient DNA. A study by Novembre et al. 138 found that the reliance on PCA alone to interpret genetic data may be very misleading. They used simulated data to show that the spread of data points on a PCA plot are affected by mathematical artefacts which occur when PCA is applied to spatial data. Not only that, but sample and dataset sizes also affect the PCA plot, and this is especially rele- vant to the study of ancient DNA 138. SNPs under strong selection may also influence the layout of the PCA, as selection strongly impacts genetic varia- tion. Finally, the idea that Principal components may be signatures of histori- cal events, such as mass migration, is problematic, as these patterns describe a general decay of genetic similarity with distance, which applies to a number of other demographic scenarios 138. For ancient DNA where there is less than 10 000 SNPs available for analysis, there will be misplacement of data points on the graph. The less genetic information there is for a sample, the more the sample can be drawn to the axis origin (0,0) due to the lack of information on the particular sample 21. PCA is often used as a primary data exploration and

44 visualization tool, which is then followed by other computational and popula- tion genetics analyses.

Admixture Whereas PCA analysis is an algorithmic method for displaying similarity (and in our case, genetic similarity), there are also model based methods for esti- mating ancestry 143. Model based ancestry estimation methods considers an- cestry as parameters of a statistical model, by calculating the probability of the observed genotype of an individual given different proportions of several ancestral populations. The model may also include information of allele fre- quencies of different populations to estimate ancestry. The result of these anal- yses are fractions of ancestry components (or ‘clusters’) representing the ge- nome of individuals, often displayed as bar plots (Fig. 13) 143. The program ADMIXTURE for instance, is used to determine local and global ancestry by estimating ancestry components of sample individuals from a large genetic reference dataset, and is possibly the most common admixture inference pro- gram used in the study of ancient DNA 143. The program relies on assuming a certain number of ancestral clusters (K- values), assigning each cluster to an ‘ancestry’ common to the modern popu- lations reference dataset and then using those clusters to describe the ancestry for each sample individual143. After using ADMIXTURE to infer ancestry components, the programs PONG and R may be used to convert the ancestry components into an admixture plot (Fig. 13). The above programs are ideal for detecting population substructure, how- ever, they do not offer formal tests of admixture 117. Genetic drift comes about when a population has been isolated for a long period of time, and in these cases new alleles arise and becomes more prominent in the population, and some alleles disappear through time 144. This creates a change to genetic vari- ation in the same way that admixture changes genetic variation, and can be identified as unique ancestry components by these programs. Furthermore, the ancestry substructure revealed by admixture programs may have come about in many ways, with different population histories. Population structure in re- ality may be influenced by many demographic structures which will not be picked up by admixture programs 144. For example, populations may follow an isolation-by-distance model, in which case there would not be discrete pop- ulations. There may also be substructure within populations which may be too subtle to be picked up by the admixture model 144. Population structure is also fluid: it changes constantly over time between different periods of history. Therefore the effects of ancient admixture will be overrun by the effects of recent admixture 144. Ancient DNA was believed to help solve some of these issues, as ancient individuals represented another time in history before periods of recent ad-

45 mixture. However, admixture programs cannot detect which samples are an- cient and which are modern. The problem is also compounded by sample size. Compared to the modern-day reference dataset, which can have large cohorts of individuals making up a population, there are typically only one to several ancient samples to represent a population. This will make it hard for programs such as ADMIXTURE to really discover evidence of past events which have had a huge impact on admixture in past populations. Lastly, the reference da- taset itself poses problems, as although these are modern populations, they are used to assign ancestry to ancient samples. In reality, many samples will have ancestry from a ‘ghost’ population which does not exist today, but must be represented by a modern reference dataset 64,144. Therefore it is important to include formal tests of admixture to verify the visualization of ancestry demonstrated by both PCA and admixture models.

Figure 13. Admixture summary plot for paper II describing the ancestry of the Vaalkrans individual using a K value of 8. Each colour signifies a different ancestry, denoted in the coloured key to the right. Therefore, using the key above, the Vaalkrans man (LSA_SA_vaa001) may be described as having 85 percent southern Khoe-San ancestry, and 15 percent East African ancestry.

46 F-statistics F-statistics are a suite of tests which calculate the shared genetic variation be- tween sets of 2 to 4 populations 145. They may be used to address simple and complex hypotheses for admixture and provide support or clarify the demo- graphic patterns identified in PCA and admixture plots. They are especially robust to some limitations of earlier methods, such as sample size. They can be more flexible in that the researcher may take a more active role in building models of population relationships based on genetic data 117, , but that can also be problematic in that all combination of tests cannot al-ways be tested and then the choice of tests become subjective. F-statistics are based on whether the result of the test are consistent with the value of 0. The value of 0 indicates that no admixture has taken place 117. The statistical tests used in this thesis were ABBA/BABA test, or D-statistics, an f4-ratio test, and Admixture graph fitting. The D-statistic compares a sample individual with an outgroup population and two other samples representing comparative populations (Figure 14). It is formulated similarly to a 4-population test in that it tests whether admixture has occurred between the 4 populations in question, but the D-statistic goes further in indicating directionality of gene flow between populations 117. Fig- ure 14 shows an unrooted tree topology representing the relationships among 4 populations. The D-statistic calculates tree topologies based on both the red and blue possibilities. These are also referred to as ABBA and BABA tree topologies 103,146. By comparing the number of trees that follow either ABBA or BABA typologies, a D-value is calculated. If the D-value is negative, the sample in question will share more genetic variation with population 1. If the D-value is positive, the sample will share more genetic variation with popula- tion 2. In this way a number of population relationships may be tested, as is seen in paper I and paper II. For this analysis, the POPSTATS statistical pack- age is used to calculate the D-statistic as well as standard error associated with the outcome 146. An f4-ratio test is used to distinguish introgression, or admixture, from in- complete lineage sorting or shared ancestry based on allele frequencies of 4 117 populations . Let the 4 populations be labelled A, B, C and D. The f4-ratio test calculates the difference in allele frequency between populations A and B, and C and D. If all populations share ancestry, and there has been no ad- mixture, the allele frequency differences between A and B and C and D will be independent from one another, and the resulting f4 statistic will be 0. But, if there was introgression of one population into another population, such as 117 A into C, then the f4 statistic will be different to 0 . Therefore the resulting value, known as alpha (α) also indicates the amount of introgression from pop- ulation A into C. One caveat of this statistical test however, is that more as- sumptions are made of the historical phylogeny 117.

47 Lastly, admixture graph fitting allows the researcher to create a tree suiting a phylogenetic model hypothesis. Populations form nodes on the tree, while branch lengths between the nodes are Fst values. Fst, or the fixation index, is a measure of the genetic difference between populations of interest 147. A high Fst value suggests that there is a large genetic difference between populations, and a small Fst value indicates that the populations analyzed are genetically similar 147. Admixture between populations may also be tested and are given as percentages (Fig. 15). Support or lack thereof for the graph is given by a z- score value, which is the standard error calculated for the worst supported branch in the graph (Fig. 15). It is beneficial in that poor Fst values will indi- cate which parts of the model does not fit the genetic data, and these aspects of the model may be changed. It is also more generalized than the D-statistic in that more than 4 populations are tested. However there is a danger of over fitting the graph, especially if admixture history between the tested popula- tions are complex and/or there is limited genetic data for certain populations 117.

Figure 14: Schematic diagram showing the two scenarios which the D-statistic measures. The sample is first compared to one population (population 1) and an out-group, and then another population (population 2) and an out-group in order to determine which of the two populations the sample is more related to. If the D-sta- tistic is negative, one scenario (light grey tree) is supported, and if the D-statistic is positive, the other scenario (dark grey) is supported.

48 Figure 15. An admixture graph constructed as part of the analysis for paper II (also known as qpgraph). This model tests the admixture history of the Vaalkrans individual. The populations form nodes on the graph, and the branches between the nodes are Fst values calculated between the joined nodes. The dotted lines signify admixture, and the amount of admixture is indicated in percent. As well as Fst values, the graph is also further supported by a low z-score of 0.063, on the top right of the figure.

49 Research aims

The aim of my PhD research is to investigate demographic and adaptation history of prehistoric populations using ancient DNA analysis, with a focus on the demic and cultural contacts between different groups and the conse- quences thereof. The analysis involved the selection and application of differ- ent DNA library building methods based on the quality of ancient samples, and the performance of various population genetic analyses in order to explore and interpret the sequence data from these ancient individuals.

PAPER I focuses on the cultural contacts between the last hunter gatherer community known as the Pitted Ware Culture, and the Scandinavian mainland variant community of the Corded Ware Culture known as the Battle Axe Cul- ture. I do this by analyzing PWC burials dating to circa. 5 kya from three PWC gravesites on the island of Gotland. Some of these burials are typical of the PWC, while others have material cultural influences associated with the BAC. My aim is to determine whether this interaction involved solely the exchange of ideas between the two cultures, or whether BAC individuals were actively incorporated into PWC communities through genetic admixture. PAPER II is focused on the influence of the pastoralist migration on local southern African groups living in the past and in the present. I do this by in- vestigating the ancestry of an individual from the Vaalkrans rock shelter in the southwestern Cape of South Africa who lived in the period just after the Eu- ropean colonization of southern Africa. Using similar analysis techniques to paper I with adaptation to the processing of hair samples, I aim to determine the genetic affinity of this individual to other local populations living in South Africa, and how the individual fits into the prehistory thereof. PAPER III investigated the genetic landscape of people living in southern Africa in the past. This study led to surprising finds, including rewriting the age of modern humans, and demonstrating migrations of pastoralists from eastern Africa to southern Africa. An additional aim of the study that I partic- ularly focused on was to investigate the frequency of functional variants among southern Africans of the past spanning from before the first of three major migrations to the south to just before the very last migration to South Africa. The aim of this SNP analysis was to understand the genetic landscape of functionally associated variants and how this changed with subsequent ad- mixture from these migration events.

50 PAPER IV is an expansion of paper III, in both the number of functional variants analyzed as well as the number of ancient and modern African indi- viduals analyzed. The aim of this paper was to investigate how changes in the genetic landscape of Africa evolved into adaptation to novel environments, pathogens, and subsistence strategies, and how this related to the major Holo- cene migrations in Africa.

51 Results Summary

Examples of the complexity of human demographic history (Paper I, Paper II) The story of human migration history has come a long way since it was first investigated many decades ago 148. Initial archaeological and genetic research focused on two explanations for cultural change through time. For archaeol- ogy the debate often focused on two different explanations; cultural diffusion versus demic diffusion. Genetic research often started from biological termi- nology, where change in culture could be explained either as population re- placement, or continuity, where populations have been living in the same place for a long time, but where cultural expression may have changed for various reasons 16. However, with the availability of more modern-day, and now ancient human genomes, researchers are finding that human migration history is far more complex than the two hypotheses initially proposed. For every region in the world, there have been multiple migration waves to and from each region, resulting in varied levels of admixture between emigrants and local populations, from no admixture at all to complete population re- placement 4,5,16. In paper I and paper II, the ancestry of the ancient individuals were ana- lyzed through uniparental marker haplogroups, PCA and admixture analysis, and various F-statistics on genome-wide data. In paper I, the above analyses were used to compare individuals from ‘typical PWC’ burials with PWC bur- ials which had ‘BAC influences’ within three PWC gravesites. PCA and AD- MIXTURE analysis elaborated on the finding, showing that while BAC indi- viduals from the Swedish mainland exhibited ancestry similar to that of people associated with the Corded Ware culture, the sampled PWC individuals from Gotland all exhibited ancestries much like Mesolithic Scandinavian hunter gatherers (Fig. 16). Inherent in the ancestry plots for these individuals was some farmer ancestry, which suggests that there was some gene flow between the PWC and FBC populations, but no steppe ancestry, which excludes BAC populations as a source of admixture into the PWC. D-statistics was also ap- plied to formally test the ancestry patterns seen in the PCA and admixture plots, and compared the study samples with a PWC individual, and a BAC individual. Individuals from the CWC were also included as controls. Both typical PWC burials and those with ‘BAC influences’ shared the most genetic drift with other PWC individuals and Mesolithic hunter-gatherers, and not

52 with the BAC individual and the CWC controls, which shared genetic varia- tion with the BAC individual. In summary, results for paper I show that in this case, the existence of BAC influences in PWC burials on Gotland was an incorporation of some BAC cultural ideas and practices into PWC communi- ties, perhaps as a way to differentiate themselves from other PWC communi- ties on the island. Therefore, in this case, the two cultures did not interact genetically on the island of Gotland, but culturally.

Figure 16. Admixture plot representing the ancestry of the Pitted Ware culture indi- viduals of Paper I, and various comparative ancient populations. While the BAC in- dividuals exhibit ancestry similar to the CWC, the ancestry of PWC individuas looks much like that of Scandinavian Mesolithic hunter gatherers. This is the case for all samples, regardless of whether they exhibited ‘BAC influences’ or not.

53 In paper II, I sought to understand the impact of the pastoralist migration from East Africa on southern African hunter gatherer populations. This was explored by using uniparental markers, PCA, admixture and F statistics to in- vestigate the genetic demography and history of one South African individual from a few hundred years ago. Results identify the Vaalkrans individual as having similar genetic ancestry to other ancient individuals from a pastoralist context, as the Vaalkrans individual has a mixture of indigenous San and East African admixture. The Vaalkrans individual has a mitochondrial haplogroup which is common in other Khoe-San groups as well as throughout Africa and . However, the Y-chromosome includes a mutation that forms part of a subclade which is restricted to East and South Africa. Other ancient South African individuals from Skoglund et al. 64 and Schlebusch et al. 26 were included in the study for comparison. Two individuals from Skoglund et al. 64 from Faraoskop and St Helena, as well as the Ballito Bay and Doonside indi- viduals from Schlebusch et al. 26 were described as being from the African Late Stone Age. Another individual from Skoglund et al. 64 excavated in Kasteelberg, was associated with pastoralist material culture as well as exhib- ited pastoralist genetic ancestry. The Kasteelberg pastoralist individual pre- dates the Vaalkrans individual, being dated to 1.2 kya compared to the Vaalkrans individual. In all population genetic analyses, the Vaalkrans indi- vidual was genetically similar to the Kasteelberg individual associated with pastoralism 64. The possible source of East African admixture was explored using D-sta- tistics, f4-ratio statistics and admixture graph fitting (qpgraph). Of the ancient individuals and the modern East African reference populations used in the study (Amhara, Somali, Oromo, Maasai, Mota individual), the Amhara had the highest D value, pointing towards that group being related to the source of East African ancestry of the Vaalkrans sample. This finding is reflected in similar analyses in paper III. f4-ratio statistics verified the above result, as well as confirmed that the source of East African ancestry to the Vaalkrans individual was more closely related to the Amhara than to the East African Mota individual, which existed 4.5 kya. Qpgraph models testing the ancestry of both the Vaalkrans man as well as the Kasteelberg individual found that while the Amhara are likely not the true East African ancestral population source, an East African population that is similar to the Amhara is a likely source population. This suggests that the Vaalkrans individual, the Kasteel- berg individual and the Amhara share a common East African ancestor. Such a result is an example of the limitations of using modern populations as refer- ence datasets for population genetics analysis of ancient DNA. Admixture proportions for the East African component of both ancient pas- toralists were similar between the plot calculated by ADMIXTURE, and qpgraph admixture percentages. Interestingly, when placed on a timeline from the Late Stone Age southern Africans, the Kasteelberg pastoralist, Vaalkrans

54 man, and to the modern-day pastoralist Nama in Namibia, East African ad- mixture initially arrives somewhere between 2000 and 1200 years ago, and varied in proportion in different pastoralist populations till today (Fig. 17). This supports previous findings that with the arrival of East African pastoral- ists to South Africa sometime after 2 kya, and that pastoralist groups would genetically mix with the local groups living in the surrounding area 26,64. Another finding was the lack of West African genetic influence on the Vaalkrans man. All statistical analyses show that the Vaalkrans individual did not have West African ancestry or admixture, which contrasts much of the ancestry patterns seen in other southern African groups. A pattern that is due to the fact that the Bantu expansion had a large genetic influence on the local populations throughout Africa as they migrated from West Africa to southern Africa. A reason for the lack of West African ancestry in the Vaalkrans indi- vidual may be that while the Bantu speakers established settlements through- out much of the region, their territory was in the eastern parts of South Africa and the Eastern Cape, all the way to the Fish River. The western parts of South Africa was occupied by khoekhoe herders. Results from this study link the migration of East African pastoralist pop- ulations southwards to the genesis of the khoekhoe herders in South Africa. Till today, pastoralist groups have ancestry from East Africans, which intro- duced herding and pastoralism to local communities in South Africa 2 kya. The contrast in results between paper I and paper II attest to the complexity of human prehistory, and human admixture history in general. It also demon- strates the need for a variety of statistical tests to clarify patterns of admixture detected by generalized statistics such as PCA and ADMIXTURE, and how without further analysis many of these patterns may be misinterpreted or sim- plified. Both paper I and paper II exemplify the complexity of human demo- graphic history. As has been seen here and in many other studies on human population demography, many prehistoric cultural transitions was accompa- nied by the genetic incorporation of one cultural group into another. However, the results from paper I suggests that this is not always the case, and that cultural transition , or at least cultural influences, may take place through ex- change and incorporation of cultural ideas only, without any genetic admix- ture between populations.

55

Figure 17. Line graph showing the East African ancestry proportions for Stone Age hunter gatherers (Ballito Bay A and Ballito Bay B), the two pastoralists (Kasteel- berg, Vaalkrans) and modern day pastoralist populations (Nama). The arrival and admixture with new groups from East Africa can be seen in individuals post-dating 2kya.

Migration dispersal of adaptive variants (Paper III, Paper IV) While all adaptive variants begin as a random mutation in the genome of a germ-line cell, paper III and paper IV demonstrate that novel adaptive vari- ants may be introduced to populations through admixture with migrants car- rying adaptive variants in their genomes. If selected for, functional variants will increase in frequency in resident populations. Both papers further reiterate the advantage of using ancient genomes in elucidating the past, by making statistical models and other analyses more accurate, and providing a means to study adaptation in ‘real time’. For paper III, we sequenced the nuclear ge- nomes of 7 ancient individuals; three dating to the Late Stone Age (2 kya), and 4 dating to the African Iron Age (300 to 400 ya) 26. In particular, one of the 2 kya individuals related to Southern San groups was sequenced to 13 x coverage and used as a reference ancient genome for subsequent analyses. We found that the three Stone Age individuals formed a group somewhat sepa- rated from all other modern Khoe-San and very different from other African

56 groups. They were most closely related to Southern San groups such as the Karretjie people 149 and the Lake Chrissie San 150. Iron Age individuals grouped with the Southeastern Bantu speakers from South Africa. Use of ancient African individuals who were the ancestors of the Khoe-San allowed us to establish a more accurate date for the divergence of the Khoe- San from other modern humans, and therefore we were able to elucidate the emergence of modern humans to an earlier date than previously thought. Whereas past research estimated a divergence time of 100-150 kya, use of the high coverage Stone Age individual pushed back the divergence time to be- tween 260-350 kya, towards the genesis of the Middle Stone Age when hu- mans became morphologically and behaviorally modern. Paper III also exemplified a pitfall of using modern populations as ‘unad- mixed’ examples of their ancient counterparts. Using the high coverage Stone Age genome, it was found that all modern Khoe-San groups and most notably the Ju|’hoansi have 9-22% admixture from an already admixed group of Eur- asian-East African ancestry dating to 1.5-1.3 kya. Such results have an effect on past genomic research as the Ju|’hoansi were considered as the human pop- ulation with the least admixture compared to all other African groups, and was used accordingly in human evolutionary genetics research. Another advantage of sequencing these ancient individuals was to explore adaptation of African populations through time. For both paper III and paper IV, I used the samtools mpileup function (v1.3) 151 to investigate genes coding for variants of interest in African populations. Such genes included the MCM6 lactase persistence gene, genes coding for malaria resistance and the genes coding for resistance to African sleeping sickness. Of the Iron Age individuals, three carry the variant for malarial resistance, and two carry a protective variant gene associated with African sleeping sick- ness (Table. 1). The Stone Age individuals do not carry these variants. None of the ancient individuals carry the lactase persistence variant (although for some individuals we only have one of two gene copies (Table 1)).

57 Table 1. An excerpt from Paper III of SNPs associated with disease resistance genes for the two Stone Age individuals (BBA, BBB), and Iron Age individuals (CHA, ELA, MFO, and NEW). Overall, the individuals from the Iron Age showed heterozygous or homozygous allele states which conferred resistance to disease. None of the indi- viduals did, however, have the derived allele for the lactase persistence gene. Allele states in bold indicates the disease resistance allele. LP = Lactase persistence, MR = Malaria resistance, ASSR = African sleeping sickness resistance. Trait Gene Al- BBA BBB CHA ELA MFO NEW leles

LP MCM6 C/G C(11) C(2) C(1) C(2) C(5) C(9) (East Af- rica)

MR DARC T/C T(19) T(1) NA T(5)/ T(1)/ C(18) (FY*O) C(2) C(4)

MR DARC G/A G(10) NA A(1) A(5) A(8) A(8) (FY*A)

MR G6PD C/T C(4) C(3) NA C(3) C(7) C(21)

MR ATP2 G/T T(15) G(1) NA G(3)/ G(5)/ T(9) B4 T(1) T(2)

ASS APOL A/G A(16) A(3) A(1) A(3)/ A(15) G(18) R 1(G1G) G(4)

ASS APOL T/G T(18) T(2) NA T(3)/ T(8) G(10) R 1(G1M) G(1)

We concluded that the discrepancy in resistance alleles expressed by Stone Age and Iron Age individuals was due to their different ancestries, as Iron Age individuals had extensive West African ancestry, where pathogens such as Malaria and African sleeping sickness were endemic 26. This also explains why none of the ancient individuals display the lactase persistence allele, as none of them had genetic influence from pastoralist populations. Compare this with results from functional variant analysis in paper II of the Vaalkrans in- dividual, who did have a derived allele conferring lactase persistence, a variant that is common among east African pastoralists. This is a direct indication of east African admixture and links this individual to pastoralist Khoe groups of the southern Cape. I further investigated the results of the adaptive variants in paper III for my last project and extended the analysis to include more adaptive variants. Included in this SNP list were variants associated with overall disease re- sistance, and other diseases common in Africa, such as Malaria, Trypanoso- miasis, Lassa fever and Trachoma, and SNPs associated with metabolism, skin pigmentation, and salt sensitivity 152–156. With a larger cohort of ancient data, including individuals from this study, as well as data from other prehistoric

58 African individuals 61,64, I compared allele frequencies of this more elaborate functional variant list between ancient Africans and genetic data from modern African populations. Collecting allele frequency data for modern African populations proved to be quite difficult, as many modern-day African population datasets were col- lected using various SNP arrays or SNP capture methods. As SNP arrays and capture sets only include a subset of often common variants, many of the func- tional variants were not included in these SNP sets, therefore allele frequency information for many SNPs had to be collected using other means. Other ge- nome databases, such as the Allele Frequency Database (ALFRED) 157 and ENSEMBL 158 were used to obtain allele frequency information for African populations where array data was lacking. As was found in paper III, each SNP group revealed allele frequency pat- terns which traced the most impactful migration routes in African history (Fig. 4). For the disease resistance alleles analyzed, the highest allele frequencies were found in modern populations along the migration route taken by Bantu- speaking farmers from West Africa to South Africa: along the Sahel belt, and then south towards southern Africa. Traces of the Bantu migration was espe- cially reflected in the alleles conferring resistance to Malaria, in the pattern of derived alleles found in Iron Age individuals, and not Stone Age individuals in South Africa. Investigation of the different lactase persistence variants il- lustrated different migration routes as well. The pattern of high allele fre- quency of the East African lactase persistence allele in particular, reflected the pastoralist migration along Eastern Africa southwards, and was also evident overall in ancient individuals when analyzing SNPs associated with lactase persistence. The pattern in population allele frequencies coding for salt sensitivity and skin pigmentation also followed migration routes with big impacts on the ge- nomes of African populations: the Eurasian back migration to Africa after 4000 kya, as well as the colonial migrations of Europeans to South Africa about 400 years ago. Alleles coding for a high sensitivity to salt, which were in high frequencies since prehistoric times in African populations, have de- creased as a result of Eurasian ancestry in modern-day African populations. Skin pigmentation, while more varied in ancient African populations than salt sensitivity, was also affected through time as Eurasian ancestry became a part of modern-day African populations’ ancestry. For many of the skin pigmen- tation alleles studied in paper IV, the ancestral allele resulted in dark skin, whereas the light skin pigmentation variants for genes such as rs1426654 (SLC24A5) evolved and were more prominent in Eurasia 116. In modern-day African populations, light skin pigmentation alleles have become more com- mon in Northeastern and Eastern Africans as a result of Eurasian back migra- tion, as well as in the Coloured populations in southern Africa, which are de- scendants of mixed European and African populations 55. However, Khoe-San populations have a light skin phenotype, but high allele frequencies for the

59 dark skin allele, and varying frequencies for the rs1800404 (OCA2) variant also associated with dark skin pigmentation. As Khoe-San represent one group that captures the deepest divergence among modern humans (all other modern human populations are included in other groups aside from those encompass- ing the Khoe-San) and display the greatest level of genetic diversity among all humans across the world, skin pigmentation in the Khoe-San may be con- trolled by either different or a more diverse set of genes 159. FADS1, a gene which plays a role in lipid metabolism, was previously as- sociated with the change from hunter-gathering to an agricultural lifestyle 24,160, but this association was recently refuted in Mathieson and Mathieson 161. The derived variant of FADS1 (rs174546) is at high frequency in both ancient and modern-day African populations, except in Stone Age individuals in South Africa that often carry the non-adaptive variant (the Ballito Bay A boy is heterozygous for rs174546 (FADS1)). In modern African populations, the adaptive allele is almost fixed throughout Africa, except for a few populations in northern Africa, as well as among the Coloured populations in southern Africa. Here, migration may have played a greater role in the allele frequency pattern for the FADS1 variants. The derived allele is at lower frequencies in European and almost non-existent in Asian populations (European and Asian ancestry is found in Coloured populations in South Africa), and may thus have reduced the frequency of the allele in these populations. Other metabolism genes highlight another aspect of adaptation in African populations, that of the change and expansion of subsistence strategies through Africa and the selective pressure it played on adaptation to new diets and food sources. The gene NAT2 is supposedly linked to the introduction of agriculture to Africa, and the slow acetylator variant of the gene is associated with agricultural populations. This variant is almost absent among ancient Af- ricans, but it is at high frequency among modern-day African populations that have a long history of pastoralism and agriculture. This can be seen as rela- tively high allele frequencies in African populations across the Sahel belt. The allele frequency plot for the adaptive variant of rs6444174 (ADIPOQ) demonstrates the effect of admixture on allele frequency patterns of two pop- ulations which had no contact in the past but have come into contact relatively recently. The derived allele for rs6444174 (ADIPOQ) codes for normal BMI in healthy individuals, and is highest in hunter-gatherer populations, and pop- ulations with hunter-gatherer ancestry. The allele is common among ancient individuals (e.g. Stone Age individuals in South Africa and the Mota individ- ual). The other high coverage individuals show either an absence of the allele or are heterozygous. Among modern-day Africans, there are high allele fre- quencies in Khoe-San populations, and increased allele frequencies in popu- lations surrounding these Khoe-San groups. It is likely that admixture in- creased the level of this allele in surrounding Bantu-speaking populations, as the allele did not exist previously in Iron Age individuals.

60 Overall, paper IV verified the results of paper III, in that Bantu-speaking groups introduced disease resistance alleles to southern African groups. Ex- tending the number of SNPs studied as well as including more ancient and modern African populations revealed the effects of other major migrations on African groups, such as the Eurasian back migration, the pastoralist migration and the arrival of the Europeans to South Africa. Lastly, functional variants associated with metabolism gave interesting results, demonstrating the effect of positive selection and subsequent admixture between different groups on the allele frequency of introduced SNPs to a population. However, the study also reiterated the caveats of working with ancient DNA. Many of the ancient genomes used were sequenced as whole genomes, which made this study possible compared to if they had been investigated us- ing SNP capture techniques. But some of the studied genomes were low cov- erage at many of the analyzed SNP sites, and so results for allele frequency at these sites were hard to establish. Due to the scarcity of ancient DNA from Africa, and in general, single individuals had to represent whole populations, and so allele frequencies for ancient populations in those regions may be dif- ferent in reality. More samples from Africa, and from interesting areas such as West Africa where tropical diseases were endemic since ancient times, would make future analysis of adaptive allele trajectories more reliable and reveal more interesting aspects of human adaptation in the place where we first evolved.

61 Conclusions and future prospects

In conclusion, the history and effects of migration on human evolution is a complicated one. While there are broad, common trends and patterns in mi- gration history throughout different human populations, each region of the world, and the human groups living in them, have very unique histories of migration, admixture, and adaptation which have shaped their past with re- gards to cultural change, and adaptation to their surroundings. While archae- ology has been of extreme value in elucidating this complex, multifaceted past, the genesis of DNA studies has enriched their story, and now ancient DNA has helped to clarify even further questions surrounding our prehistory. First and foremost, ancient DNA substantiates that humans are a migratory, adaptive species. Migration formed an integral part of the human story, and of human evolution. Migration often precedes genetic and cultural change, as has been shown with the pastoralist migration from eastern Africa to southern Af- rica, but it can also result solely in cultural influence without genetic ex- change, as is the case with the Pitted Ware gravesites in Gotland. Migration can also introduce novel mutations which may be advantageous in new envi- ronments and in new cultures, such as metabolism genes and their advantage in agriculturalist populations. The fact that many SNPs studied in paper III and paper IV trace the major migrations through the African continent attest to the great impact that migration has had on our past and has on present ge- nomic landscapes, as it increases and decreases the allele frequencies of dif- ferent functional variants in our genomes. The increase in adaptive alleles in different populations as a response to the many selection pressures of our en- vironment has allowed human groups to live, and to thrive all over the world. But just as ancient DNA has been greatly advantageous to us in the study of our past, there are areas in which ancient DNA research must still develop. Further improvement of both laboratory and genome analysis methods is re- quired to better analyze ancient DNA. So far, both wet and dry lab methods used in the study of modern-day DNA have been manipulated to accommo- date the analysis of ancient DNA, and we are still discovering how ancient DNA sequences behave under these methods and run the risk of over-inter- preting the results 162. Population genetic analyses do not yet have sufficient statistical power to analyze low coverage ancient DNA from a small number of ancient individuals in the same way in which we analyze modern, high cov- erage DNA. Furthermore, while DNA damage and contamination has been accounted for as much as possible in these analysis techniques, they still affect

62 how much information we may glean from ancient genomes and how we may interpret the information we receive from them. Therefore, many different sta- tistical methods should be utilized to verify the results from other forms of population genetic analysis on ancient DNA material in order to be sure of the results they produce. However, methods are continually evolving, and we see evidence of this in how ancient DNA research has advanced in just the last decade. Progress has been made on wet lab techniques to allow for sequencing of DNA samples from areas prone to bad DNA preservation, such as Africa. In areas where DNA preservation is favorable, such as Europe, there are now SNP data from hundreds of ancient individuals to bolster bioinformatics analysis and increase the amount of reliable information on the ancient populations analyzed. As ancient DNA is becoming more common as means of interpreting human his- tory, population analysis techniques are also improving, and researchers better understand how dry lab analysis handles ancient DNA sequence data. As methods continue to develop, we see a shift in focus from Europe and questions surrounding European prehistory to other regions, including Africa, and our origins in Africa. It is already known that Africa harbors the most genetically diverse human populations in the world, and so further research in both ancient and modern-day populations would reveal new key insights not just into our evolutionary history, but into human adaptation as well. There should be more research into adaptation in the human genome, and how an- cient adaptation affects modern populations living in a completely different environment to that of their ancestors. There is already much evidence of how different human populations are more susceptible to certain diseases due to their ancestry, or react to different medications 55,152. Further research incor- porating both ancient and modern genomes could therefore help the medical field better understand disease and develop more effective treatment strate- gies. As DNA sequencing becomes ever more affordable, and the ability to se- quence more ancient genomes becomes possible, scientists are coming to a deeper understanding of ourselves and every facet of what makes us who we are. Ancient DNA research continues to reveal ever more nuanced, complex stories of human prehistory and human evolution, and these stories then help us to understand the present, and better predict how we may react in the future in an ever-changing environment, complete with novel pathogens, new ways of life and a constantly changing environment.

63 Svensk sammanfattning

Människans evolutionära historia var länge ett område som beforskades inom arkeologi, fysiska antropologi och lingvistik. Även om dessa fält har bidragit mycket till kunskapen om utbredningen av Homo sapiens, och vår kultur, så lyckas de inte besvara nyckelfrågor om människans historia. En sådan nyck- elfråga rör skiften i materiell kultur över tid och vad det säger om hur forna människogrupper interagerade med varandra. Den andra frågan är hur männi- skor har anpassat sig till de olika miljöer som de stötte på när de befolkade resten av världen efter att ha lämnat sitt ursprungsområde i Afrika. Genetik, och nyligen arkeogenetik, har blivit viktiga verktyg inom fältet och har bidra- git till nya perspektiv på forskningen kring människans historia. Jag har un- dersökt specifika exempel i människans historia med hjälp av att ta fram ge- netisk information från förhistoriska mänskliga lämningar och analysera dessa data. I mitt första projekt undersökte jag potentiella interaktioner mellan män- niskor som praktiserade den Gropkeramiska kulturen (GRK) och Stridsyxe- kulturen (STR) på Gotland. GRK var kulturen hos de sista stenålders-jägar- samlarna i Skandinavien. De livnärde sig på säljakt och kulturen har fått namn från de karakteristiska gropar som återfinns i deras keramik. De begravde sina döda på rygg i enskilda gravar. Människorna som är associerade med STR levde som herdar och jordbrukare. STR kulturen anses vara en regional variant av den så kallade Snörkeramiska kulturen, och har fått sitt namn efter de strids- yxor som återfinns i kulturmaterialet. De begravde sina döda i sidolägen med uppdragna knän i så kallad hockerposition. GRK och STR samexisterade i över 500 år innan de försvann för runt 4 200 år sedan. Hur dessa två grupper interagerade med varandra under den här tiden är ett högintressant forsknings- område. Även om det inte finns några bevis för STR-associerade boplatser på Gotland, så uppvisar vissa gravar på GRK-gravfält aspekter som är typiska för STR. Exempel på detta är att vissa kroppar placerats i hockerposition samt att stridsyxor återfunnits i några av gravarna. Genom att sekvensera och analysera genomet från 19 individer som levde för cirka 5 000 år sedan, och som kom- mer från tre olika GRK-gravplatser, fann jag att influenserna ifrån STR var ett resultat ett kulturellt- och inte ett genetiskt utbyte. Oberoende av den materi- ella kulturen i graven så hade alla individer som jag testade samma genetiska härkomst – från Skandinaviska jägar-samlare och en liten andel från tidiga jordbrukare. Människor från tydliga STR-kontexter hade istället en härkomst som är mycket lik den som återfinns hos människor associerade med Snörke- ramiska kulturen. Detta resultat visar att influenserna från STR i de gotländska

64 GRK-gravarna är ett resultat av kulturell överföring av STR-idéer och mate- riell kultur till GRK-samhällen, kanske som ett sätt att differentiera GRK- grupper från varandra på ön.

Det genetiska landskapet hos folken i södra Afrika har formats av stora mi- grationer under senaste tiotusen åren, så som migrationen av pastoralister ifrån Östafrika till södra Afrika för runt 2 000 år sedan, migrationen av jordbru- kande grupper från Västafrika och slutligen den europeiska migrationen som börjar påverka södra Afrika för cirka 400 år sedan. I min andra studie under- sökte jag genetiska likheter hos en individ som levde för några hundra år sedan i Sydafrikanska Västra Kapprovinsen med dagens befolkningar i södra Afrika och med förhistoriska individer från regionen. Detta prov bestod av hårstrån vilka återfanns i Vaalkrans, en boplats vid kusten. Vaalkrans ligger i ett om- råde av Sydafrika vilket tillhörde Khoekhoe herdefolken när de första Euro- peiska kolonisatörerna anlände. Populationsgenetiska analyser visade att Vaalkrans-individen hade liknande härkomst som nutida KhoeKhoe grupper som dock inte längre lever i Kap-provinserna, utan i Namibia. Vaalkrans-man- nen och hans förfäder/förmödrar hade inte blandat sig med Européer eller Västafrikaner. Denna genetiska signatur har överlevt till för några hundra år sedan i Kap-provinserna sedan tiden då Östafrikanska pastoralister nådde dit för 2 000 år sedan, men idag finns inga grupper som inte blandats med Euro- péer eller Västafrikaner kvar. I den tredje studien undersökte vi människans tidiga historia och kunde visa att vi moderna människor redan existerade för 300 000 år sedan. Specifikt undersökte jag genetiska varianter som är associ- erade med specifika egenskaper. Jag undersökte genomen från sju förhisto- riska individer från södra Afrika, tre från den sena Stenåldern och fyra från Järnåldern. Genom att jämföra allel-frekvenserna hos genetiska varianter hos individer från dessa två tidsperioder fann jag en ökning av sjukdomsresisten- salleler hos Järnåldersindividerna. Den rimligaste förklaringen är att dessa va- rianter kommer till södra Afrika med migration från Västafrika. Min fjärde studie undersöker ett stort antal genetiska varianter associerade med egen- skaper och sjukdomsresistens hos både förhistoriska individer och hos dagens Afrikanska befolkningar. Jag fann att många mönster av allelfrekvenser som finns i moderna befolkningar beror på de stora migrationer som ägde rum i Afrika under Holcen, inklusive en migration ifrån Eurasien till Östfrika som började för runt 3 000 år sedan. Även om det förhistoriska DNAt gav viss kunskap om hur allelfrekvenserna för vissa adaptiva varianter såg ut tidigare, så finns det ännu ganska begränsat med genetiska data från förhistoriska indi- vider i Afrika. Jämförelse av vissa adaptiva alleler mellan förhistoriska och moderna afrikanska befolkningar pekade dock på att migration och befolk- ningsblandning påverkar varianter som tidigare kunde vara skyddande mot sjukdomar, men som nu har negativa effekter för individer i ett modernt sam- hälle. Ett sådant exempel är saltkänslighetsalleler och högt blodtryck hos afro- amerikaner. Ett generellt resultat av studien pekar på migration som en viktig

65 faktor i historien, vilken inte bara sprider människor och arvsanlag, utan också varianter som kan vara anpassade till specifika miljöer. Min avhandling visar flera exempel på komplexiteten i människans evolutionära historia och hur migration bidrar till att sprida nya varianter till andra befolkningar, varianter som kan vara fördelaktiga i en miljö, men kanske inte i den nya miljön.

66 Acknowledgements

‘It takes a village to raise a child’ – African proverb

Towards the end of writing my thesis, this proverb came to my mind time and time again as my friends and family commented on the feat of me completing a PhD. I myself thought that the PhD experience was going to be more of a solitary journey than it has been, but quickly learned that just as it takes a village to raise a well-rounded, healthy individual who can contribute to soci- ety in a meaningful way, so is the case with the PhD student. It takes an entire lab group (and key friends and family) to ‘raise’ a naïve, eager student to be- come a well-rounded, independent researcher who will hopefully contribute in a meaningful way to their society, inside or outside of Academia. So, while I represent this body of work, which took me five years to complete, this thesis is also the product of the people who have supported me by either taking hours out of their day to teach me, advise me or console me, and generally guide me in my journey from student to researcher. My journey started with an email and a skype call involving two of my three supervisors, Mattias and Helena, and they are the ones who set this thesis in motion. I would like to thank them for taking a chance on me, a very eager girl on the other side of the world who emailed them out of the blue ranting about ancient DNA and how cool it was. Helena has especially been there for me every step of the way, as a teacher, a mentor, and a shining example of how to be a respected and honest researcher in every sense of the word. Hel- ena, I cannot sufficiently express how grateful I am to you for everything you have done for me, for all the effort you put into me and the time you gave to make this thesis the work that it is. So these small words will have to do for now. Many thanks to Mattias for giving me the opportunity to work in a lab full of such amazing researchers from whom I could learn more about this field. I still cannot believe how lucky I am to work with such a group. The Jakobsson lab is a shining example of how to be one of the labs at the forefront of the ancient DNA research field, but also one of respect, honesty, and diligence to the practice of good science. I want to thank Carina, my third supervisor, for allowing me to be a part of her amazing research. I have always wanted to contribute to the research field

67 of African prehistory, and being able to work for Carina, who is the expert on African genetic history, was really a dream come true. She also gave me the space to develop at my own pace and approached me as a fellow colleague rather than her student, which really helped me become more confident in my- self as a researcher. Speaking of skill development, I want to thank Arielle, who basically taught me EVERYTHING I know regarding dry lab analysis. I don’t think she truly understood the amount of work it was going to take to handle such a role, until 3 months of me constantly barging into her office with yet another ques- tion nearly drove both her and me crazy. Arielle, I know it was hard, and I recognize and value all that effort you put into me that summer in 2016 (I think it was then). When Arielle was not available, the ever patient Torsten was there to help me with every other stupid question I had. Thank you Tor- sten, for always leaving your door open to me, for always making the time to answer my questions, and for being so understanding when my frustration with simple computer tasks started to show. Just as much as skill development was required to make this thesis an even- tuality, so was an emotional support network. To my fellow PhD students who went through this journey with me: Mario, Luciana, Gwenna, and TJ, I want to thank you all for always checking in on me. Mario particularly was my all- round support guru. He helped me with analysis, he gave me a figurative slap on my face when I needed to stop feeling sorry for myself and keep going, and always made me smile, even if he didn’t mean to. Mario, you may never read this, much like you don’t read your work emails if they are in Swedish, but I also want to let you know that I appreciated every small thing you did for me to make me feel better when you could see I was struggling. I will never forget it. Luciana, your constant willingness to help and your general work ethic has always been an inspiration to me and has spurred me on when I was over it all. Gwenna, I so admire your understanding of population genetics theory and the way you approach your PhD, and I want to thank you for always trying to improve our experience as PhD students in the Human Evolution Lab. TJ, dude, you give the most incredible pep talks, at just the right time, always. Thank you. And of course, I want to thank you Rickard for your support, and for being so willing to regard me as a friend. When we first met, I couldn’t get over how quickly we became buddies, but I feel honored that it was the case. Last but not least, I want to sincerely thank the people outside the lab who have helped me grow as a person over these last five years. Mom, Dad, Dan- iela and Gaby, thank you for always being a phone call away if I ever needed you. And I needed you all so much. We have always had an incredibly strong bond as a family, and I am so very thankful that distance has not changed that. That bond has been my lifeline many a time over these last couple of years, when I was feeling especially inadequate, alone and overwhelmed with the choice I had made to take this opportunity. Again, words will never be enough, but they will have to do. Elin, and Lotta, I feel so incredibly blessed that I have

68 met you both and can consider you as my Swedish family. You two are some of the people who have had the greatest influence on my life in Sweden and have been part of some beautiful experiences that I will never forget. To my Australian expat family, I have met some of you quite late into my time here in Sweden, but in this short time you have given me so much support when my family back home could not be there. Darren not only helped me to de-stress with his amazing spinning and Body Pump sessions at the gym, but encouraged me to become a gym instructor myself. That decision, to instruct, has been quite frankly life –changing. Camilla, thank you for the messages, pep-talks and for cheering me on these final months, you have no idea how much it means to me. Chris, thank you for making me feel like part of your family with Sofie and adorable Lucy. Finally, Pierre, we met a little over a year ago, but since then you have been a pillar of strength I could lean on, and a source of calm when I needed it most. This blessing was incredibly unexpected, but came at just the perfect time, and I cannot wait to see what the future holds for us. To anyone I may have missed here, I hope that I am able to express in person to you how much your presence in my life over this time has helped me. To every single one of you, I say: WE DID IT!!!!

69 References

1. Renfrew, C. & Bahn, P. Archeology: Theories, Methods and Practice. (Thames & Hudson, 1996). 2. Childe, V. G. The Dawn Of European Civilization. (Routledge & Kegan Paul, 1925). 3. Renfrew, C. Before civilization: the radiocarbon revolution and . (Knopf; [distributed by Random House], 1973). 4. Nielsen, R. et al. Tracing the peopling of the world through genomics. Nature 541, 302–310 (2017). 5. Slatkin, M. & Racimo, F. Ancient DNA and human history. Proc. Natl. Acad. Sci. U. S. A. 113, 6380–7 (2016). 6. Pääbo, S. et al. Ancient DNA: extraction, characterization, molecular cloning, and enzymatic amplification. Proc. Natl. Acad. Sci. U. S. A. 86, 1939–43 (1989). 7. Pääbo, S. et al. Genetic Analyses from Ancient DNA. Annu. Rev. Genet. 38, 645–679 (2004). 8. Bramanti, B. et al. Genetic discontinuity between local hunter-gatherers and central Europe’s first farmers. Science 326, 137–40 (2009). 9. Malmström, H. et al. Ancient DNA reveals lack of continuity between neolithic hunter-gatherers and contemporary Scandinavians. Curr. Biol. 19, 1758–62 (2009). 10. Haak, W. et al. Ancient DNA from the first European farmers in 7500-year- old Neolithic sites. Science 310, 1016–8 (2005). 11. Cann, R. L., Stoneking, M. & Wilson, A. C. Mitochondrial DNA and human evolution. Nature 325, 31–36 (1987). 12. Herrera, R. J. & Garcia-Bertrand, R. Ancestral DNA, human origins, and migrations. (Academic Press, 2018). 13. Underhill, P. A. & Kivisild, T. Use of Y Chromosome and Mitochondrial DNA Population Structure in Tracing Human Migrations. Annu. Rev. Genet. 41, 539–564 (2007). 14. Wall, J. D. & Slatkin, M. Paleopopulation Genetics. Annu. Rev. Genet. 46, 635–649 (2012). 15. Jobling, M. A., Hollox, E., Kivisild, T. & Tyler-Smith, C. Human Evolutionary Genetics. (CRC Press Inc, 2013). 16. Pickrell, J. K. & Reich, D. Toward a new history and geography of human genes informed by ancient DNA. Trends Genet. 30, 377–389 (2014). 17. Ramachandran, S. et al. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc. Natl. Acad. Sci. U. S. A. 102, 15942–7 (2005). 18. Gibbons, A. How We Lost Our Diversity. Science 2009, 2 (2009).

70 19. Rasmussen, M. et al. Ancient human genome sequence of an extinct Palaeo- Eskimo. Nature 463, 757–762 (2010). 20. Skoglund, P. et al. Origins and genetic legacy of neolithic farmers and hunter- gatherers in Europe. Science 336, 466–469 (2012). 21. Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015). 22. Fu, Q. et al. The genetic history of Ice Age Europe. Nature 534, 200 (2016). 23. Haak, W. et al. Massive migration from the steppe was a source for Indo- European languages in Europe. Nature 522, 207–211 (2015). 24. Mathieson, I. et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015). 25. Hellenthal, G. et al. A genetic atlas of human admixture history. Science 343, 747–51 (2014). 26. Schlebusch, C. M. et al. Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago. Science 358, 652–655 (2017). 27. Skoglund, P. et al. Genomic diversity and admixture differs for Stone-Age Scandinavian foragers and farmers. Science 344, 747–50 (2014). 28. Omrak, A. et al. Genomic Evidence Establishes Anatolia as the Source of the European Neolithic Gene Pool. Curr. Biol. 26, 270–275 (2016). 29. Günther, T. et al. Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. Proc. Natl. Acad. Sci. U. S. A. 112, 11917–22 (2015). 30. Lazaridis, I. et al. Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424 (2016). 31. Andersen, B. G. B. & Borns, H. W. J. The Ice Age world: an introduction to Quaternary history and research with emphasis on North America and northern Europe during the last 2.5 million years. (Oxford University Press, 1994). 32. Riede, F. The resettlement of northern Europe. Oxford Handb. Archaeol. Anthropol. Hunter-Gatherers 556–581 (2014). doi:10.1093/oxfordhb/9780199551224.013.059 33. Günther, T. et al. Population genomics of Mesolithic Scandinavia: Investigating early postglacial migration routes and high-latitude adaptation. PLoS Biol. 16, e2003703 (2018). 34. Lazaridis, I. et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409 (2014). 35. Mittnik, A. et al. The genetic prehistory of the Baltic Sea region. Nat. Commun. 9, 442 (2018). 36. Price, T. D. Ancient Scandinavia: an archaeological history from the first humans to the Vikings. (Oxford University Press, 2015). 37. Haak, W. et al. Ancient DNA from European Early Neolithic Farmers Reveals Their Near Eastern Affinities. PLoS Biol. 8, e1000536 (2010). 38. Deguilloux, M. F., Leahy, R., Pemonge, M. H. & Rottier, S. European Neolithization and Ancient DNA: An Assessment. Evol. Anthropol. 21, 24– 37 (2012).

71 39. Martinsson-Wallin, H. & Wallin, P. Collective spaces and material expressions: ritual practice and island identities in Neolithic Gotland. in Decoding Neolithic Atlantic & Mediterranean island Ritual 1–15 (2016). 40. Wallin, P. The Use and Organisation of a Middle-Neolithic Pitted Ware Site on the Island of Gotland in the Baltic Sea. Sèances la Société préhistorique Fr. 6, 409–425 (2016). 41. Malmer, M. P. The Neolithic of south Sweden: TRB, GRK, and STR. (Royal Academy of Letters &, 2002). 42. Malmström, H. et al. Ancient mitochondrial DNA from the northern fringe of the Neolithic farming expansion in Europe sheds light on the dispersion process. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 370, 20130373 (2015). 43. Malmström, H. et al. The genomic ancestry of the Scandinavian Battle Axe Culture people and their relation to the broader Corded Ware horizon. Proc. R. Soc. B Biol. Sci. 286, 20191528 (2019). 44. White, T. D. et al. Pleistocene Homo sapiens from Middle Awash, Ethiopia. Nature 423, 742–747 (2003). 45. McDougall, I., Brown, F. H. & Fleagle, J. G. Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature 433, 733–736 (2005). 46. Thackeray, F. Homo Sapiens helmei from Florisbad, South Africa. The 27, 13–14 (2010). 47. Hublin, J.-J. et al. New fossils from Jebel Irhoud, Morocco and the pan- African origin of Homo sapiens. Nature 546, 289–292 (2017). 48. Schlebusch, C. M. et al. Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338, 374–9 (2012). 49. Schlebusch, C. M., Lombard, M. & Soodyall, H. MtDNA control region variation affirms diversity and deep sub-structure in populations from southern Africa. BMC Evol. Biol. 13, 56 (2013). 50. Henshilwood, C. S. Emergence of Modern Human Behavior: Middle Stone Age Engravings from South Africa. Science 295, 1278–1280 (2002). 51. Henn, B. M., Steele, T. E. & Weaver, T. D. Clarifying distinct models of modern human origins in Africa. Current Opinion in Genetics and Development 53, 148–156 (2018). 52. Breton, G. et al. Lactase persistence alleles reveal partial east African ancestry of southern African Khoe pastoralists. Curr. Biol. 24, 852–858 (2014). 53. Pickrell, J. K. et al. The genetic prehistory of southern Africa. Nat. Commun. 3, 1143 (2012). 54. Schlebusch, C. M. & Jakobsson, M. Tales of Human Migration, Admixture, and Selection in Africa. Annu. Rev. Genomics Hum. Genet. 19, 405–428 (2018). 55. Owers, K. A. et al. Adaptation to infectious disease exposure in indigenous Southern African populations. Proc. R. Soc. B Biol. Sci. 284, 20170226 (2017). 56. Barbieri, C. et al. Migration and interaction in a contact zone: mtDNA variation among Bantu-speakers in Southern Africa. PLoS One 9, e99117 (2014).

72 57. Li, S., Schlebusch, C. & Jakobsson, M. Genetic variation reveals large-scale population expansion and migration during the expansion of Bantu-speaking peoples. Proceedings. Biol. Sci. 281, 20141448 (2014). 58. Davenport, T. R. H. & Saunders, C. C. South Africa : a modern history. (Macmillan Press, 2000). 59. Uren, C., Möller, M., van Helden, P. D., Henn, B. M. & Hoal, E. G. Population structure and infectious disease risk in southern Africa. Mol. Genet. Genomics 292, 499–509 (2017). 60. Petersen, D. C. et al. Complex Patterns of Genomic Admixture within Southern Africa. PLoS Genet. 9, e1003309 (2013). 61. Gallego Llorente, M. et al. Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent. Science 350, 820–2 (2015). 62. Morris, A. G., Heinze, A., Chan, E. K. F., Smith, A. B. & Hayes, V. M. First Ancient Mitochondrial Human Genome from a Pre-Pastoralist Southern African. Genome Biol. Evol. 6, 1–16 (2014). 63. Scally, A. & Durbin, R. Revising the human mutation rate: implications for understanding human evolution. Nat. Rev. Genet. 13, 745–753 (2012). 64. Skoglund, P. et al. Reconstructing Prehistoric African Population Structure. Cell 171, 59–71 (2017). 65. Skoglund, P. & Mathieson, I. Ancient Genomics of Modern Humans: The First Decade. Annu. Rev. Genomics Hum. Genet. 19, 381–404 (2018). 66. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene frequencies in Europeans. Science 201, 786–92 (1978). 67. Brandt, G. et al. Ancient DNA reveals key stages in the formation of central European mitochondrial genetic diversity. Science 342, 257–61 (2013). 68. Palmgren, E. Gotlands mellanneolitiska hybridkultur (stridsyxekultur). in Arkeologi på Gotland 2. Tillbakablickar och nya forskningsrön (eds. Wallin, P. & Martinsson-Wallin, H.) 129–136 (Institutionen för arkeologi och antik historia, Uppsala universitet & Gotlands museum, 2018). 69. Palmgren, E. & Martinsson-Wallin, H. Analysis of late mid-Neolithic pottery illuminates the presence of a Corded Ware Culture on the Baltic Island of Gotland. Doc. Praehist. 42, 297–310 (2015). 70. The Oxford Handbook of African Archaeology. (Oxford University Press, 2013). doi:10.1093/oxfordhb/9780199569885.001.0001 71. Hollfelder, N. et al. Northeast African genomic variation shaped by the continuity of indigenous groups and Eurasian migrations. PLOS Genet. 13, e1006976 (2017). 72. Fregel, R. et al. Ancient genomes from North Africa evidence prehistoric migrations to the Maghreb from both the Levant and Europe. Proc. Natl. Acad. Sci. U. S. A. 115, 6774–6779 (2018). 73. Pickrell, J. K. et al. Ancient west Eurasian ancestry in southern and eastern Africa. Proc. Natl. Acad. Sci. U. S. A. 111, 2632–2637 (2014). 74. Prendergast, M. E. et al. Ancient DNA reveals a multistep spread of the first herders into sub-Saharan Africa. Science 365, eaaw6275 (2019).

73 75. Japp, S., Gerlach, I., Hitgen, H. & Schnelle, M. Yeha and Hawelti: cultural contacts between Sabaʾ and DʿMT — New research by the German Archaeological Institute in Ethiopia. in Proceedings of the Seminar for Arabian Studies 41, 145–160 (Archaeopress, 2011). 76. Ambrose, S. H. Chronology of the later stone age and food production in East Africa. J. Archaeol. Sci. 25, 377–392 (1998). 77. Tishkoff, S. A. et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 39, 31–40 (2007). 78. Macholdt, E. et al. Tracing Pastoralist Migrations to Southern Africa with Lactase Persistence Alleles. Curr. Biol. 24, 875–879 (2014). 79. de Filippo, C., Bostoen, K., Stoneking, M. & Pakendorf, B. Bringing together linguistic and genetic evidence to test the Bantu expansion. Proceedings. Biol. Sci. 279, 3256–63 (2012). 80. De Jongh, M. A Forgotten First People: The Southern Cape Hessequa. (Watermark Press, 2016). 81. Marciniak, S. & Perry, G. H. Harnessing ancient genomes to study the history of human adaptation. Nat. Rev. Genet. 18, 659–674 (2017). 82. Snow, R. W. et al. The prevalence of Plasmodium falciparum in sub-Saharan Africa since 1900. Nature 550, 515–518 (2017). 83. Gomez, F., Hirbo, J. & Tishkoff, S. A. Genetic variation and adaptation in Africa: implications for human evolution and disease. Cold Spring Harb. Perspect. Biol. 6, a008524 (2014). 84. Rotimi, C. N. et al. The genomic landscape of African populations in health and disease. Human Molecular Genetics 26, 225–236 (2017). 85. Prohaska, A. et al. Human Disease Variation in the Light of Population Genomics. Cell 177, 115–131 (2019). 86. Coop, G. et al. The Role of Geography in Human Adaptation. PLoS Genet. 5, e1000500 (2009). 87. Smith, C. I., Chamberlain, A. T., Riley, M. S., Stringer, C. & Collins, M. J. The thermal history of human fossils and the likelihood of successful DNA amplification. J. Hum. Evol. 45, 203–217 (2003). 88. Hofreiter, M. et al. The future of ancient DNA: Technical advances and conceptual shifts. BioEssays 37, 284–293 (2015). 89. Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013). 90. Meyer, M. et al. Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 531, 504–507 (2016). 91. Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, 403–406 (2013). 92. Gamba, C. et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat. Commun. 5, 5257 (2014). 93. Hansen, H. B. et al. Comparing Ancient DNA Preservation in Petrous Bone and Tooth Cementum. PLoS One 12, e0170940 (2017). 94. Adler, C. J., Haak, W., Donlon, D. & Cooper, A. Survival and recovery of DNA from ancient teeth and bones. J. Archaeol. Sci. 38, 956–964 (2011). 95. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–6 (2012).

74 96. Lindahl, T. Instability and decay of the primary structure of DNA. Nature 362, 709–715 (1993). 97. Dabney, J., Meyer, M. & Pääbo, S. Ancient DNA damage. Cold Spring Harb. Perspect. Biol. 5, a012567 (2013). 98. Llamas, B. et al. From the field to the laboratory: Controlling DNA contamination in human ancient DNA research in the high-throughput sequencing era. STAR Sci. Technol. Archaeol. Res. 3, 1–14 (2017). 99. Poinar, H. N. & Cooper, A. Ancient DNA: do it right or not at all. Science 5482, 416 (2000). 100. Briggs, A. W. et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl. Acad. Sci. U. S. A. 104, 14616–21 (2007). 101. Sawyer, S., Krause, J., Guschanski, K., Savolainen, V. & Pääbo, S. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS One 7, e34131 (2012). 102. Hofreiter, M., Serre, D., Poinar, H. N., Kuch, M. & Pääbo, S. Ancient DNA. Nat. Rev. Genet. 2, 353–359 (2001). 103. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010). 104. Green, R. E. et al. The Neandertal genome and ancient DNA authenticity. EMBO J. 28, 2494–2502 (2009). 105. Fu, Q. et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014). 106. Jun, G. et al. Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data. Am. J. Hum. Genet. 91, 839–848 (2012). 107. Rasmussen, M. et al. An Aboriginal Australian Genome Reveals Separate Human Dispersals into Asia. Science 334, 94–98 (2011). 108. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376 (2005). 109. Yang, D. Y., Eng, B., Waye, J. S., Dudar, J. C. & Saunders, S. R. Improved DNA extraction from ancient bones using silica-based spin columns. Am. J. Phys. Anthropol. 105, 539–543 (1998). 110. Malmstrom, H. et al. More on Contamination: The Use of Asymmetric Molecular Behavior to Identify Authentic Ancient Human DNA. Mol. Biol. Evol. 24, 998–1004 (2007). 111. Rohland, N. & Hofreiter, M. Comparison and optimization of ancient DNA extraction. Biotechniques 42, 343–352 (2007). 112. Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene bear reconstructed from ultrashort DNA fragments. Proc. Natl. Acad. Sci. U. S. A. 110, 15758–63 (2013). 113. Bennett, E. A. et al. Library construction for ancient genomics: Single strand or double strand? Biotechniques 56, 289–300 (2014). 114. Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, pdb-prot5448 (2010). 115. Meyer, M., Stenzel, U. & Hofreiter, M. Parallel tagged sequencing on the 454 platform. Nat. Protoc. 3, 267–278 (2008).

75 116. Marciniak, S., Klunk, J., Devault, A., Enk, J. & Poinar, H. N. Ancient human genomics: the methodology behind reconstructing evolutionary pathways. J. Hum. Evol. 79, 21–34 (2015). 117. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065– 1093 (2012). 118. Lachance, J. & Tishkoff, S. A. SNP ascertainment bias in population genetic analyses: Why it is important, and how to correct it. Bioessays 35, 780 (2013). 119. Illumina. Patterned Flow Cell Technology. (2015). 120. Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008). 121. Briggs, A. W. & Heyn, P. Preparation of next-generation sequencing libraries from damaged DNA. Methods Mol. Biol. 840, 143–154 (2012). 122. Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87–e87 (2010). 123. Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil- DNA-glycosylase treatment for screening of ancient DNA. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 370, 20130624 (2015). 124. Gansauge, M.-T. & Meyer, M. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protoc. 8, 737–748 (2013). 125. Illumina. An introduction to Next-Generation Sequencing Technology. (2017). Available at: www.illumina.com/technology/next-generation- sequencing.html. (Accessed: 24th October 2017) 126. Kircher, M. Analysis of high-throughput ancient DNA sequencing data. Methods Mol. Biol. 840, 197–228 (2012). 127. Schubert, M. et al. Improving ancient DNA read mapping against modern reference genomes. BMC Genomics 13, 178 (2012). 128. Seguin-Orlando, Andaine Korneliussen, Thorfinn S. Sikora, M. et al. Genomic structure in Europeans dating back at least 36,200 years. Science 345, 1255832–1255832 (2014). 129. Hofmanová, Z. et al. Early farmers from across Europe directly descended from Neolithic Aegeans. Proc. Natl. Acad. Sci. 113, 6886–6891 (2016). 130. Link, V. et al. ATLAS: Analysis Tools for Low-depth and Ancient Samples. bioRxiv 105346 (2017). doi:10.1101/105346 131. ISOGG. Haplogroups. (2019). Available at: https://isogg.org/wiki/Haplogroup. (Accessed: 16th July 2019) 132. Pakendorf, B. & Stoneking, M. Mitochondrial DNA and Human Evolution. Annu. Rev. Genomics Hum. Genet. 6, 165–183 (2005). 133. Prugnolle, F. & de Meeus, T. Inferring sex-biased dispersal from population genetic tools: a review. Heredity (Edinb). 88, 161–165 (2002). 134. Behar, D. M. et al. A ‘copernican’ reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012). 135. Vianello, D. et al. HAPLOFIND: a New method for high‐throughput mtDNA Haplogroup assignment. Hum. Mutat. 34, 1189–1194 (2013). 136. International Society of Genetic Genealogy. Y-DNA Haplogroup Tree 2017. (2017). 137. van Oven, M. PhyloTree Build 17: Growing the human mitochondrial DNA tree. Forensic Sci. Int. Genet. Suppl. Ser. 5, e392–e394 (2015).

76 138. Novembre, J. & Stephens, M. Interpreting principal component analyses of spatial population genetic variation. Nat. Genet. 40, 646–649 (2008). 139. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904 (2006). 140. Wang, C. et al. Comparing Spatial Maps of Human Population-Genetic Variation Using Procrustes Analysis. Stat. Appl. Genet. Mol. Biol. 9, (2010). 141. Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006). 142. Purcell, S. et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am. J. Hum. Genet. 81, 559–575 (2007). 143. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–64 (2009). 144. Lawson, D. J., van Dorp, L. & Falush, D. A tutorial on how not to over- interpret STRUCTURE and ADMIXTURE bar plots. Nat. Commun. 9, (2018). 145. Peter, B. M. Admixture, Population Structure, and F-Statistics. Genetics 202, 1485–501 (2016). 146. Skoglund, P. et al. Genetic evidence for two founding populations of the Americas. Nature 525, 104–8 (2015). 147. Vitti, J. J., Grossman, S. R. & Sabeti, P. C. Detecting Natural Selection in Genomic Data. Annu. Rev. Genet. 47, 97–120 (2013). 148. Cavalli-Sforza, L. L., Cavalli-Sforza, L., Menozzi, P. & Piazza, A. The history and geography of human genes. (Princeton university press, 1994). 149. Schlebusch, C. M., de Jongh, M. & Soodyall, H. Different contributions of ancient mitochondrial and Y-chromosomal lineages in ‘Karretjie people’ of the Great Karoo in South Africa. J. Hum. Genet. 56, 623–630 (2011). 150. Schlebusch, C. M., Prins, F., Lombard, M., Jakobsson, M. & Soodyall, H. The disappearing San of southeastern Africa and their genetic affinities. Hum. Genet. 135, 1365–1373 (2016). 151. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). 152. Gurdasani, D. et al. The African Genome Variation Project shapes medical genetics in Africa. Nature 517, 327–332 (2015). 153. Crawford, N. G. et al. Loci associated with skin pigmentation identified in African populations. Science (80-. ). 358, eaan8433 (2017). 154. Allison, A. C. Protection afforded by sickle-cell trait against subtertian malareal infection. Br. Med. J. 1, 290–4 (1954). 155. Cooper, A. et al. APOL1 renal risk variants have contrasting resistance and susceptibility associations with African trypanosomiasis. Elife 6, (2017). 156. Miller, L. H., Mason, S. J., Clyde, D. F. & Mcginniss, M. H. The Resistance Factor to Plasmodium vivax in Blacks: The Duffy-Blood-Group Genotype, FyFy. N. Engl. J. Med. 295, 302–304 (1976). 157. Rajeevan, H., Soundararajan, U., Kidd, J. R., Pakstis, A. J. & Kidd, K. K. ALFRED: an allele frequency resource for research and teaching. Nucleic Acids Res. 40, D1010–D1015 (2012). 158. Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, D754–D761 (2018).

77 159. Martin, A. R. et al. An Unexpectedly Complex Architecture for Skin Pigmentation in Africans. Cell 171, 1340-1353.e14 (2017). 160. Ye, K., Gao, F., Wang, D., Bar-Yosef, O. & Keinan, A. Dietary adaptation of FADS genes in Europe varied across time and geography. Nat. Ecol. Evol. 1, 167 (2017). 161. Mathieson, S. & Mathieson, I. FADS1 and the timing of human adaptation to agriculture. Mol. Biol. Evol. 35, 2957–2970 (2018). 162. Gunther, T. & Jakobsson, M. Population Genomic Analyses of DNA from Ancient Remains. in Handbook of Statistical Genomics (eds. Balding, D. J., Moltke, I. & Marioni, J.) 1135 (2019). doi:10.1002/9781119487845

78

Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1880 Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-397222 2019