UNIVERSITÀ DEGLI STUDI DI PAVIA Dipartimento di Biologia e Biotecnologie “Lazzaro Spallanzani”

Employing mitogenomes to reconstruct migration and dispersal events

Stefania Brandini

Dottorato di Ricerca in Genetica, Biologia Molecolare e Cellulare XXIX Ciclo – A.A. 2013-2016

Supervised by Prof. Antonio Torroni

ABSTRACT

Humans have always been characterized by a strong curiosity about their own origins and those of the life forms that surround them. In the past, answers were sought in philosophy and religion, but in recent decades the empirical sciences have also begun to provide evidence about the evolutionary history of humans and other species. Important contributions come from genomics, which, through the study of DNA sequence variation, provides information on the genetic relationships among different populations. One important tool for tracing the history and migrations of our ancestors is mitochondrial DNA (mtDNA). The mitochondrial genome is a small circular DNA molecule, present in hundreds to thousands of copies per cell, transmitted as a non-recombining unit only through the mother, and characterized by a much greater evolutionary rate than the average nuclear gene. Consequently, mtDNA variability originates only from the sequential accumulation of new mutations along the maternal lineage. Over millennia, this process of molecular divergence has given rise to monophyletic units (clades), called haplogroups, which are generally restricted to specific geographic areas or population groups because they arose after the colonization of different regions and continents. The study of the geographical distribution, internal variability and coalescence age of each haplogroup is known as ‘phylogeography’. The timescale is provided by converting lineage diversity into age estimates by means of a molecular clock. These estimated ages, combined with information on the geographical distribution of the mtDNA lineages, allow inferences to be made about the demographic history of populations, such as dispersals, range expansions, or migrations.
During these three years of doctoral studies, I analysed mtDNA sequence variation at the highest level of resolution, that of the complete sequence (mitogenome), in order to reconstruct the migration events of both human and animal populations. In particular, I focused my research activity mainly on three projects aimed at tracing the first human peopling events in Sardinia and South America and at studying the recent worldwide spread of Aedes albopictus, the Asian tiger mosquito.
The first project aimed to date the events that led to the initial peopling of Sardinia and to clarify the genetic history of Europe. Sardinians are "outliers" in the European genetic landscape and, according to paleogenomic nuclear data, the closest to early European farmers. To learn more about the genetic ancestry of Sardinians, we analyzed 3,491 modern and 21 ancient mitogenomes from Sardinia and observed that the age estimates of three Sardinian-specific haplogroups are >7,800 years, the archeologically based upper boundary of the Neolithic on the island. This finding not only supports archeological evidence of a Mesolithic occupation of the island, but also reveals a dual ancestral origin of the first Sardinians. Indeed, one of the Sardinian-specific haplogroups harbors ancestral roots in Paleolithic Western Europe, but the other two are most likely of Late Paleolithic Near Eastern ancestry, and are among those that are often assumed to have spread from Anatolia only with the Neolithic. Thus, their ages are compatible with the scenario of a Late Glacial recolonization of Mediterranean Europe from the Near East prior to the migration wave(s) associated with the onset of farming.
The second research project aimed to further assess the variation of mitogenomes of Native American origin. Specifically, I focused on Ecuador and Peru, two geographical areas of particular interest because of their location along the Pacific coast, in order to shed light on the peopling of South America. Phylogenetic analyses encompassing both novel and previously reported mitogenomes allowed the identification of 50 new sub-haplogroups and of a number of sub-clades shared with Native Americans from North and Central America, thus increasing the number of founding mtDNA lineages that entered South America from the north. Our phylogeographic analyses confirmed that the north-to-south expansion was extremely rapid, and most likely occurred along both the Pacific and Atlantic coasts.
The third study was not carried out on humans, but on a species whose spread is associated with human activities: the Asian tiger mosquito Ae. albopictus. Its aim was to acquire information about the diffusion process of this insect by analysing the mitogenome variation of representatives from Asia, America and Europe. Phylogenetic analyses revealed five haplogroups in Asia, but population surveys showed that only three of these were involved in the recent worldwide spread. We also found that a sub-haplogroup, which is now common in Italy, most likely arose in North America from an ancestral Japanese source.
In the course of my Ph.D. studies I also contributed to two additional projects. In the first, a rare human mitochondrial haplogroup, named R0a, was analyzed to reconstruct ancient migratory events involving the [...] and Eastern Africa, while the second assessed the mitogenome variation of Egyptian cattle breeds to gain new insights into the initial events that led to the diffusion of domestic cattle (Bos taurus) outside the Near East. Taken together, the data and results reported in this thesis underline the fact that phylogeographic analyses continue to be crucial for interpreting mtDNA data, providing a multidisciplinary perspective on human and animal evolution, with major impacts also in other fields such as archaeology, anthropology and linguistics.

CONTENTS

ABSTRACT

CONTENTS

INTRODUCTION

1. MITOCHONDRIA
  1.1. ORIGIN AND EVOLUTION OF MITOCHONDRIA
  1.2. MITOCHONDRIAL GENOME
    1.2.1. Genome organization
    1.2.2. Replication
    1.2.3. Transcription
    1.2.4. Translation
    1.2.5. Genetic code
    1.2.6. Mitochondrial DNA features
      1.2.6.1. Maternal inheritance and lack of recombination
      1.2.6.2. Homoplasmy and heteroplasmy
      1.2.6.3. Mitochondrial genetic bottleneck
      1.2.6.4. Mutation rate
2. HUMAN POPULATION GENETICS
  2.1. THE MTDNA CONTRIBUTION AND THE PHYLOGEOGRAPHIC APPROACH
  2.2. THE MOLECULAR CLOCK
  2.3. MTDNA NOMENCLATURE
  2.4. MTDNA REFERENCE SEQUENCES
  2.5. MTDNA WORLDWIDE PHYLOGENY
    2.5.1. The origin of modern humans
    2.5.2. The ‘Out of Africa’ exit
    2.5.3. Human colonization of the world
      2.5.3.1. The peopling of Australasia
      2.5.3.2. The peopling of Europe
      2.5.3.3. Back to Africa
      2.5.3.4. The peopling of the Americas

MY CONTRIBUTION

3. THE FIRST PEOPLING OF SARDINIA
  3.1. BACKGROUND
  3.2. THE SAMPLE
  3.3. RESULTS
    3.3.1. Phylogeny and phylogeography
    3.3.2. Age estimates
  3.4. DISCUSSION
  3.5. CONCLUSION
4. MITOGENOME VARIATION IN ECUADOR AND PERU
  4.1. BACKGROUND
  4.2. THE SAMPLE
  4.3. RESULTS
    4.3.1. The mitogenome phylogeny
    4.3.2. Phylogeography
    4.3.3. Haplogroup age estimates
  4.4. DISCUSSION
  4.5. CONCLUSION
5. THE WORLDWIDE SPREAD OF THE TIGER MOSQUITO
  5.1. BACKGROUND
  5.2. THE SAMPLE
  5.3. RESULTS
    5.3.1. The mtDNA control region of Ae. albopictus
    5.3.2. The phylogeny
    5.3.3. The geographical distribution of Ae. albopictus haplogroups
  5.4. DISCUSSION
  5.5. CONCLUSION
6. ADDITIONAL PROJECTS
  6.1. ORIGIN AND SPREAD OF HUMAN HAPLOGROUP R0A
    6.1.1. Background
    6.1.2. Results and discussion
  6.2. THE MITOGENOME VARIATION OF EGYPTIAN CATTLE BREEDS
    6.2.1. Background
    6.2.2. Results and discussion
7. MATERIALS AND METHODS
  7.1. DNA EXTRACTION
    7.1.1. Extraction of human and bovine DNA
    7.1.2. Extraction of mosquito DNA
  7.2. DNA AMPLIFICATION
    7.2.1. Whole genome amplification
    7.2.2. Long range PCR for Illumina sequencing
      7.2.2.1. Long range PCR of human mtDNA
      7.2.2.2. Long range PCR of bovine mtDNA
      7.2.2.3. Long range PCR of tiger mosquito mtDNA
    7.2.3. DNA amplification for Sanger sequencing
      7.2.3.1. Amplification of human mtDNA
      7.2.3.2. Amplification of Ae. albopictus mtDNA
  7.3. ELECTROPHORESIS
  7.4. SEQUENCE ANALYSIS
    7.4.1. Next Generation Sequencing
    7.4.2. Sanger sequencing
      7.4.2.1. Human mtDNA sequencing
      7.4.2.2. Mosquito mtDNA sequencing
  7.5. PHYLOGENETIC (AND OTHER) ANALYSES
    7.5.1. Tree design and haplogroup definition
      7.5.1.1. Maximum parsimony
    7.5.2. Coalescence times
      7.5.2.1. Rho statistics
      7.5.2.2. Maximum likelihood
      7.5.2.3. Bayesian approach
    7.5.3. Population expansion times
    7.5.4. Frequency distribution analysis

REFERENCES

LIST OF ORIGINAL PUBLICATIONS

INTRODUCTION

1. Mitochondria

Eukaryotic cells are organized in compartments. Each cell contains a nucleus and a surrounding cytoplasm in which membrane-enclosed organelles are suspended. Among these are the mitochondria, made up of an inner membrane enclosing the matrix and an outer membrane delimiting the intermembrane space. Mitochondria are commonly described as cigar-shaped structures; at a structural level, however, this traditional view is not accurate, and it would be more correct to think of mitochondria as a budding and fusing network, more similar to the endoplasmic reticulum (Iborra et al., 2004). Although most of the DNA of a cell is contained in the nucleus, the mitochondrion also has its own independent genome, as well as its own machinery for replication, transcription, and protein synthesis. Every cell contains as many as several thousand mitochondria (Figure 1.1), each with a variable number of mitochondrial DNA (mtDNA) molecules. As a result of this high number of mtDNA copies, and despite its small size, the mitochondrial genome can come to constitute up to 0.3% of the total DNA of a cell.
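As a rough plausibility check of the figure quoted above, the share of mtDNA in the total DNA of a cell can be estimated from the copy number. The sketch below takes the human mitogenome length of 16,569 bp and assumes a diploid nuclear genome of about 6.4 billion bp and an illustrative copy number of 1,000 per cell; the nuclear genome size, the copy number and the function name are assumptions made only for this example.

```python
MTDNA_LENGTH_BP = 16_569        # length of the human mitogenome
DIPLOID_NUCLEAR_BP = 6.4e9      # assumption: approximate diploid nuclear genome size


def mtdna_fraction(copies_per_cell, mt_len=MTDNA_LENGTH_BP, nuclear_bp=DIPLOID_NUCLEAR_BP):
    """Fraction of total cellular DNA (in bp) contributed by mtDNA."""
    mt_total = copies_per_cell * mt_len
    return mt_total / (mt_total + nuclear_bp)


# Illustrative copy number (assumption): 1,000 mtDNA molecules per cell.
print(f"{mtdna_fraction(1_000):.2%}")
```

Under these assumptions roughly a thousand copies per cell already bring the mtDNA share to a few tenths of a percent, which is the order of magnitude stated in the text.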

Figure 1.1. Electron microscope cell section view of numerous mitochondria.

The mitochondrion has been found to be involved in numerous cellular functions, such as metabolism (of amino acids, lipids, cholesterol, steroids and nucleotides), intracellular signaling, apoptosis and aging, but the most important is probably cellular energy metabolism (Tao et al., 2014). This includes fatty acid β-oxidation, the urea cycle and, above all, ATP production through oxidative phosphorylation (OX-PHOS). During OX-PHOS, mitochondria use the molecular machinery called the ‘respiratory chain’ (Figure 1.2) to extract energy from nutrients and convert it into ATP (Chinnery and Schon, 2003). The respiratory chain, also known as the ‘electron transport chain’, consists of a group of five enzymatic complexes situated in the inner mitochondrial membrane, each composed of multiple subunits encoded by both nuclear and mitochondrial genes. The components encoded by nuclear genes are imported into the mitochondrion via specialized import systems (Mokranjac and Neupert, 2005), where they are assembled with those encoded by mtDNA.

Figure 1.2. Schematic representation of the respiratory chain. The electron transport chain is a series of electron transporters embedded in the inner mitochondrial membrane that shuttles electrons from NADH and FADH2 to molecular oxygen. In the process, protons are pumped from the mitochondrial matrix to the intermembrane space, and oxygen is reduced to form water.

1.1. Origin and evolution of mitochondria

The discovery of mtDNA raised several questions about how eukaryotic cells tolerate more than one genome and why this genome shed many (but not all) of its genes, to the point of no longer being self-sufficient for replication and expression. Endosymbiosis (or symbiogenesis) is the most commonly accepted theory to explain the origin of mitochondria. According to this theory, mitochondria descend from free-living bacteria that became symbiotic with eukaryotic cells about 1.5 billion years ago. Originally it was proposed that the nucleus originated in an archaebacterium and that symbiosis began with a eubacterial progenitor of the modern mitochondrion (Margulis, 1971). Over the years the conventional ‘endosymbiosis theory’ has been modified, and the revised theory has been labelled the ‘hydrogen hypothesis’ (Martin and Muller, 1998). It postulates that the eukaryotic nucleus and the mitochondria were created simultaneously through the fusion of a hydrogen-requiring methanogenic archaebacterium (host) and a hydrogen-producing alpha-proteobacterium (symbiont). Indeed, the eukaryotic nucleus is a chimera of genes, some of clearly archaebacterial origin and others clearly derived from Eubacteria. In addition, the Archezoa, eukaryotes lacking mitochondria, contain mitochondria-like genes in their nuclear genome, suggesting that the Archezoa once had mitochondria and lost them after a possible lateral transfer of mitochondrial genes to the nuclear genome. Moreover, phylogenetic studies of the Archezoa have shown that not all members of the group can be classified as basal eukaryotes: some of them, such as Entamoeba histolytica, are placed much further up the eukaryotic tree. Finally, mitochondria and hydrogenosomes have been found to share a common ancestry within the alpha-proteobacteria. Regardless of which view is correct, one thing is common to both: the majority of the mitochondrial genes that existed in the symbiont genome of the proto-eubacterium have been transferred to the nuclear genome.

1.2. Mitochondrial genome

The sequencing of an entire mitochondrial genome (mitogenome) was announced for the first time in 1981, when Anderson and colleagues published the sequence and organization of the human mitogenome (Anderson et al., 1981). This sequence, 16,569 base pairs (bp) long, is the so-called Cambridge Reference Sequence (CRS), later resequenced and named the revised Cambridge Reference Sequence (rCRS) (Andrews et al., 1999). Human mtDNA was only the first of many mitochondrial genomes to be completely sequenced. Indeed, one year after the publication of the CRS, Anderson’s laboratory also produced the Bovine Reference Sequence (BRS) (Anderson et al., 1982), 16,338 bp long. Nowadays, although the mitogenomes of numerous species are available, many others still need to be sequenced.

1.2.1. Genome organization

Mitochondrial DNA is organized as a circular, double-stranded molecule. The two strands are denoted H (heavy) and L (light) because their different base compositions give them different buoyant densities in a caesium chloride gradient. The H-strand is guanine-rich, while the L-strand is cytosine-rich. Traditionally, the molecule is numbered on the light strand, relative to the first published human mtDNA sequence. Unlike the nuclear genome, the mitogenome has a very compact structure without introns. In humans, it contains 37 genes (28 on the H-strand and 9 on the L-strand), all of which are essential for normal mitochondrial function. Thirteen of these genes encode subunits of the enzymes involved in OX-PHOS, while the remaining ones encode two ribosomal RNAs (rRNAs 12S and 16S) and 22 transfer RNAs (tRNAs), which are required and sufficient for the synthesis of mitochondrial proteins (Anderson et al., 1981; Wallace, 1994; DiMauro and Schon, 2003). Most genes are contiguous or separated by only one or two non-coding base pairs, and among the protein-coding genes MTATP6 and MTATP8, as well as MTND4 and MTND4L, overlap (Figure 1.3 and Table 1.1). This pattern is also found among most metazoans, although in some cases one or more of the 37 genes are absent and the mtDNA length varies between species.

Figure 1.3. Map of the human mitochondrial genome. Loci are coloured according to functional groupings. Gene identifiers on the outside of the map are transcribed on the heavy strand and gene identifiers on the inside of the map are transcribed on the light strand. Transfer RNA loci are designated by the single letter code of their specific amino acid. The non-coding D-loop is shown at the top of the map (in black) (Stewart and Chinnery, 2015).

Table 1.1. List of animal mtDNA genes and gene products.

Gene designation       Encoded product
COI, COII, COIII       Cytochrome oxidase subunits I, II, and III
Cytb                   Cytochrome b apoenzyme
ND1-6, 4L              NADH dehydrogenase subunits 1 to 6 and 4L
ATP6, ATP8             ATP synthase subunits 6 and 8
lrRNA                  Large ribosomal subunit RNA
srRNA                  Small ribosomal subunit RNA
tRNAs                  18 amino acid-specific transfer RNAs
L(CUN) and L(UUR)      2 leucine tRNAs
S(AGN) and S(UCN)      2 serine tRNAs

Although most of the mtDNA encodes gene products, there are several non-coding regions interspersed in the molecule, the largest of which is the mtDNA displacement loop (D-loop), or control region, which is involved in the regulation of transcription and replication of the molecule. In human mtDNA, the D-loop extends from nucleotide position (np) 16024 to np 576 and includes three short regions named HVSI (nps 16024-16400), HVSII (nps 44-340) and HVSIII (nps 438-576) (Brandstätter et al., 2004). The acronym HVS derives from ‘hypervariable sequences’, as they are highly variable at the population level compared with the rest of the genome. These regions contain the origin of heavy-strand mtDNA replication (OH), the light-strand transcription promoter (LSP) and the heavy-strand promoters (HSP1 and HSP2) (Falkenberg et al., 2007). They appear to be involved in genome replication and transcription, but their function is not completely known. The structure and composition of the D-loop are not identical in all animals. For example, the control-region sequence of bovine mtDNA is only slightly homologous to the corresponding region of the human mitochondrial genome (Anderson et al., 1982) and contains only one hypervariable sequence (nps 16042-16313).

1.2.2. Replication

The replication of mtDNA is independent of the cell cycle. Indeed, unlike nuclear DNA, which replicates only once during each cell cycle, mtDNA is continuously turned over by polymerase γ, even in non-dividing tissues such as skeletal muscle and brain (Bogenhagen and Clayton, 1977; Birky, 2001). The precise mechanism of mtDNA replication is currently a topic of great debate. The traditional ‘strand-asymmetric model’, proposed in 1982 by Clayton, suggests that mammalian mtDNA molecules replicate unidirectionally from two spatially and temporally distinct strand-specific origins. According to this model, the heavy strand leads the replication cycle, beginning at OH with the synthesis of a primary transcript that continues until the origin of light-strand replication (OL) is exposed. Only after the replication fork has passed OL is the light (or lagging) strand synthesized in the opposite direction (Clayton, 1982). However, in 2003 experimental evidence supported an alternative ‘strand-symmetric’, or ‘rolling circle’, model (Bowmaker et al., 2003). It postulates that the replication of mtDNA begins at several points in a 5.5-kb critical region between the D-loop and the ND4 gene. The replication bubbles then proceed in both directions, stopping at OH and stalling briefly in the region of OL before completing the replication cycle, with the lagging strand catching up through the ligation of Okazaki fragments (Figure 1.4) (Lightowlers and Chrzanowska-Lightowlers, 2012).

Figure 1.4. Models of mammalian mtDNA replication. The ‘strand asynchronous’ or ‘strand-displacement’ model (A) compared to the ‘strand synchronous’ and unidirectional replication model (B) (Lightowlers and Chrzanowska-Lightowlers, 2012).

1.2.3. Transcription

Transcription of mtDNA is ‘prokaryotic-like’. In human mtDNA, the transcription of the two strands is initiated from three promoters, named heavy strand 1 and 2 (HSP1 and HSP2) and light strand (LSP), which, together with their upstream enhancers, are recognized by the transcription machinery. This consists of a mitochondrial RNA polymerase (POLRMT), mitochondrial transcription factor A (TFAM), and one of two homologous mitochondrial transcription factors, B1 (TFB1M) or B2 (TFB2M) (Falkenberg et al., 2002; Chinnery and Hudson, 2013). Transcription initiated from HSP1 generates a short transcript that terminates at the 16S rRNA, while transcription initiated from HSP2 generates a polycistronic message including both rRNA genes, 12 mRNA genes, and 14 tRNA genes. Light-strand transcription from the LSP generates the ND6 mRNA and 8 tRNAs (Taylor and Turnbull, 2005). Termination depends on the mitochondrial transcription termination factors, a group of DNA-binding proteins whose functions and mechanisms of action remain to be defined. The resulting full-length transcripts are then cut into functional tRNA, rRNA, and mRNA molecules.

1.2.4. Translation

The mitochondrial translation machinery works in close cooperation with the cytoplasmic one, which produces the nuclear-encoded proteins destined for the mitochondrion (Chinnery and Hudson, 2013). Indeed, during transcription, nascent mtRNA is translated by mitochondrial ribosomes (mitoribosomes) that bind both the polymerase (Kornberg, 1992) and the inner mitochondrial membrane (Liu and Spremulli, 2000). The mitoribosomes involved in this process are partly encoded by mtDNA but require a further 81 proteins encoded by nuclear DNA (nDNA) (Chinnery and Hudson, 2013) to be assembled. The initiation factors IF1 and IF3 promote the dissociation of the ribosomal subunits (Koc and Spremulli, 2002), thus allowing the assembly of the initiation complex and the beginning of the translation process (Christian et al., 2009). Elongation is controlled by nuclear-encoded proteins and continues until a STOP codon is recognized. Once translation ends, peptides are transferred through the mitochondrial double membrane by the translocation machineries TIM (translocase of the inner membrane) and TOM (translocase of the outer membrane). The close proximity of the two sets of translation machinery (cytoplasmic and mitochondrial) on either side of the membranes of the organelle ensures efficient assembly of mitochondrial complexes containing proteins encoded by the nuclear and mitochondrial genomes (Iborra et al., 2004).

1.2.5. Genetic code

The genetic code is the basis of heredity and consists of a set of rules by which the genetic information encoded in DNA and RNA sequences is translated into proteins. When the code was deciphered, it was immediately labeled as ‘universal’, but, less than 15 years later, it was found that in mitochondria some codons differ from the universal code (Barrell et al., 1979). To explain these changes, it was proposed that mitochondria might tolerate changes in the code that would not be acceptable to a larger and more complex genome (such as the nuclear one) (Jukes, 1981). However, the discovery that the code has changed in undamaged organisms made this hypothesis unlikely. It is now recognized that the genetic code evolved in two distinct phases: a first in which the ‘canonical’ code emerged and a second in which it diverged in numerous nuclear and organelle lineages (Knight et al., 2001). In mitochondria this phenomenon occurred independently in plants and in other organisms. Plant mitochondria use the universal code, whereas the mtDNAs of other organisms show many different changes with respect to the universal code, with only one constant: the codon UGA codes for tryptophan instead of acting as a termination signal (Anderson et al., 1981). Further studies have shown that the mitochondrial genetic code is not even universal among non-plant mitochondria (Table 1.2). This variability in codon usage reflects the variability in the number and composition of anticodons among organisms and organelles.

Table 1.2. Mitochondrial genetic code variation for mammals, fruit flies and yeasts.

RNA codon             Nuclear          mtDNA genetic code
                      genetic code     Mammals    Drosophila    Yeasts
UGA                   STOP             Trp        Trp           Trp
AGA, AGG              Arg              STOP       Ser           Arg
AUA                   Ile              Met        Met           Met
AUU                   Ile              Met        Met           Met
CUU, CUC, CUA, CUG    Leu              Leu        Leu           Thr
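The practical effect of these reassignments can be illustrated with a minimal sketch that translates the same RNA string under the standard (nuclear) code and under a mammalian mitochondrial variant. Only the mammalian reassignments of UGA, AGA/AGG and AUA are encoded here; the names `STANDARD`, `MAMMAL_MT` and `translate`, and the test sequence, are hypothetical and chosen only for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, codons enumerated in UCAG order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Mammalian mitochondrial variant: UGA -> Trp, AGA/AGG -> STOP, AUA -> Met.
MAMMAL_MT = {**STANDARD, "UGA": "W", "AGA": "*", "AGG": "*", "AUA": "M"}


def translate(rna, table):
    """Translate an RNA string codon by codon until the first STOP ('*')."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGUGAAUAAGA"  # illustrative sequence: AUG UGA AUA AGA
print(translate(message, STANDARD))   # stops at UGA
print(translate(message, MAMMAL_MT))  # reads through UGA, stops at AGA
```

Read with the nuclear code, the message terminates at the second codon; read with the mitochondrial variant, UGA is decoded as tryptophan and translation stops only at AGA.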

1.2.6. Mitochondrial DNA features

1.2.6.1. Maternal inheritance and lack of recombination

One of the main features of mammalian mtDNA is that it is inherited through the maternal line. Transmission along the female lineage, from mother to offspring without any paternal contribution or recombination between the two parental lineages, was already observed in 1980 (Giles et al., 1980). The precise molecular mechanism behind strict maternal transmission in humans remains elusive. It appears that several mechanisms have coevolved to prevent a paternal mtDNA contribution to the embryo (Sato and Sato, 2013). It was originally thought that paternal mitochondria did not enter the oocyte, because of both their low copy number and their location in the midpiece of the sperm tail. However, the detection of paternal mtDNA molecules in early human preimplantation embryos (St John et al., 2000) led to the belief that paternal mitochondria are subject to prezygotic and zygotic control mechanisms. It was suggested that during spermatogenesis a down-regulation of mtDNA replication limits the paternal mtDNA copy number (Rantanen et al., 2001), followed by further elimination through an active mechanism that involves selective ubiquitination (Sutovsky and Schatten, 2000; Sutovsky, 2003). Alternatively, a passive ‘dilution model’ has been proposed, based on the disproportionate amount of paternal versus maternal mtDNA (ratio of 1:15,860) (Luo et al., 2013; Pyle et al., 2015). Recently, a study on C. elegans envisioned a third mechanism, in which an endonuclease mediates damage to the inner membrane of paternal mitochondria, thus inducing their selective elimination (Zhou et al., 2016). Taken together, these data confirm that under natural conditions maternal inheritance is strongly enforced by the elimination of paternal mtDNA. However, these mechanisms may occasionally fail, potentially leading to maternal/paternal mtDNA mosaicism in an individual. An example is the case of a 28-year-old man affected by a metabolic disorder due to a pathogenic mtDNA deletion that proved to be of paternal origin (Schwartz and Vissing, 2002). Until now this is the only documented case of paternal transmission in humans. Moreover, high-depth mtDNA sequencing, with up to about 1.2 million-fold coverage, revealed no evidence for a paternal contribution to mtDNA inheritance (Pyle et al., 2015). In conclusion, the current opinion is that paternal transmission of mtDNA is exceptionally rare and that, even if it does occur, it is extremely unlikely to result in recombination between paternal and maternal mtDNAs. Therefore, the traditional dogma of maternal inheritance is widely accepted, especially in population genetics studies.

1.2.6.2. Homoplasmy and heteroplasmy

Cells contain thousands of mtDNA molecules, and in most cases their sequences are identical. This condition, in which all the mtDNAs in a cell (or in a tissue) have the same sequence, is known as ‘homoplasmy’. However, wild-type and mutated molecules can sometimes coexist, a situation termed ‘heteroplasmy’. The percentage of heteroplasmy can vary widely among different individuals and populations (Irwin et al., 2009), but also within the same individual, from organ to organ or between cells (Calloway et al., 2000). This mixture of wild-type and mutated mtDNAs is often correlated with clinical manifestations (Avital et al., 2012; Gasparre et al., 2013; Sobenin et al., 2013). Studies conducted on cybrids (cell lines incorporating mitochondria from another source) containing different amounts of mutated mtDNA have shown that the proportion of mutant mtDNA must exceed a critical threshold level, which is mutation- and tissue-specific, before a cell expresses a biochemical defect in the respiratory chain (Schon et al., 1997; Wallace et al., 1998). However, heteroplasmies are also present in healthy individuals and appear to be more frequent in the control region than in the coding region (Jazin et al., 1996; Santos et al., 2008; Li et al., 2010). It is now possible to detect even a low percentage of heteroplasmy by next generation sequencing (NGS), which allows a particular region to be resequenced thousands of times, thus revealing rare variants as well (Wallace and Chalkia, 2013). Studies using this approach revealed that 25-65% of the general population has at least one heteroplasmy across the entire mitochondrial genome (1000 GPC et al., 2010; Li et al., 2010; Sosa et al., 2012). Two mechanisms have been described as responsible for changes in mtDNA in human cells in vivo that occasionally cause the appearance of heteroplasmies. These mechanisms are known as ‘relaxed replication’ and ‘vegetative segregation’.

Replication in mitochondria is considered ‘relaxed’ because it occurs independently of the cell cycle: mtDNA is destroyed and replicated continuously, even in non-dividing tissues (Bogenhagen and Clayton, 1977). Moreover, since individual molecules appear to be selected at random for destruction and replication, in heteroplasmic cells this process can change the proportion of mutant and wild-type mtDNA molecules over time through random intracellular genetic drift (Birky, 1994; Chinnery and Samuels, 1999). Vegetative segregation is the unequal partitioning of mutant and wild-type mtDNA that occurs during cell division; it can also change the level of heteroplasmy in a proliferative tissue (Birky, 1994), such as blood leucocytes or cells in culture (Lehtinen et al., 2000).
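Both mechanisms can be illustrated with a minimal simulation sketch. It assumes a constant number of mtDNA copies per cell, selectively neutral variants, and purely random choice of the molecules that are destroyed, replicated or partitioned; the function names, copy numbers and number of events are illustrative assumptions and do not reproduce the models of the cited studies.

```python
import random


def relaxed_replication(n_copies, mutant_fraction, n_events, rng):
    """Random turnover: in each event one molecule is destroyed and one is
    replicated, both chosen at random from the current pool."""
    mutant = round(n_copies * mutant_fraction)
    trajectory = [mutant / n_copies]
    for _ in range(n_events):
        p_mutant = mutant / n_copies
        destroyed_is_mutant = rng.random() < p_mutant
        copied_is_mutant = rng.random() < p_mutant
        mutant += int(copied_is_mutant) - int(destroyed_is_mutant)
        trajectory.append(mutant / n_copies)
    return trajectory


def vegetative_segregation(n_copies, mutant, rng):
    """Cell division: the pool is split at random between two daughter cells.
    Returns the heteroplasmy level of each daughter."""
    pool = [True] * mutant + [False] * (n_copies - mutant)
    rng.shuffle(pool)
    half = n_copies // 2
    mutant_a = sum(pool[:half])
    mutant_b = mutant - mutant_a
    return mutant_a / half, mutant_b / (n_copies - half)


rng = random.Random(42)  # fixed seed so that the run is repeatable
levels = relaxed_replication(n_copies=1000, mutant_fraction=0.3, n_events=50_000, rng=rng)
print(f"heteroplasmy: start {levels[0]:.2f}, end {levels[-1]:.2f}")
print(vegetative_segregation(n_copies=1000, mutant=300, rng=rng))
```

In this sketch the heteroplasmy level wanders without any selective pressure, and the two daughter cells generally inherit slightly different proportions of mutant molecules.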

1.2.6.3. Mitochondrial genetic bottleneck

Taking into account the previously described features of mtDNAs, one should expect not to find any differences between mother and offspring mtDNA composition. However, observations indicate that the amount of a variant inherited from a heteroplasmic mother varies in the offspring (Cree et al., 2008; Carling et al., 2011). A theory explaining this phenomenon is the ‘mitochondrial bottleneck’. The reduction in the effective number of mitochondrial genomes occurs during early embryogenesis in the developing female germ line (Jenuth et al., 1996; Marchington et al., 1998) and it facilitates the rapid removal of deleterious mtDNA mutations from the population. The exact mechanism by which the bolleneck occurs is hotly debated but there are currently three theories (Figure 1.5). A possible explanation for the mitochondrial reduction is to consider as bottleneck the small number of mtDNA copies (~2,000) during the early development of the germ lines (Cao et al., 2007) in respect to the copy number at other stages of the female germ line (~100,000 copies in the mature oocyte). However, 2,000 is too great a number to sustain the observed genetic drift (Figure 1.5a). Other two hypotheses have been proposed to explain this rapid genetic drift in germ line, one sustaining that molecules of mitochondrial DNA aggregated in nucloids caused a tighter bottleneck that accelerates the drift (Figure 1.5b) (Cao et al., 2007), and another asserting that a combination of several factors are involved. It has been demonstrated (using specific markers) that mtDNA content during the early development of cell lines varies with time and hits a sharp minimum just before mtDNA replication in the embryo is initiated. This bottleneck narrowing leads to an increase in genetic drift (Figure 1.5c) (Cree et al., 2008). 
These theories suggest that multiple factors may be involved in the mitochondrial bottleneck (including or not nucleoids) and further investigation is needed to completely clarify this complex system (Khrapko, 2008).
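The link between bottleneck size and drift can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the cited studies, that each round of transmission is a binomial sampling of N mtDNA molecules, so that the expected variance of the mutant fraction follows the standard Wright-Fisher formula. The function name and the smaller bottleneck size of 200 are hypothetical; only the figure of ~2,000 copies comes from the text.

```python
def heteroplasmy_variance(p0: float, bottleneck_size: int, generations: int = 1) -> float:
    """Expected variance of the mutant fraction after repeated binomial sampling.

    Assumes a Wright-Fisher-like model: each generation, `bottleneck_size`
    molecules are drawn at random from a pool with mutant fraction p0.
    Var(p_t) = p0 * (1 - p0) * (1 - (1 - 1/N)**t)
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a fraction between 0 and 1")
    if bottleneck_size < 1 or generations < 0:
        raise ValueError("bottleneck_size must be >= 1 and generations >= 0")
    retained = (1.0 - 1.0 / bottleneck_size) ** generations
    return p0 * (1.0 - p0) * (1.0 - retained)


# ~2,000 copies as in the text; 200 is a hypothetical tighter bottleneck.
wide = heteroplasmy_variance(0.5, 2000)
tight = heteroplasmy_variance(0.5, 200)
print(f"N=2000: {wide:.6f}  N=200: {tight:.6f}  ratio: {tight / wide:.1f}")
```

Under these assumptions, a tenfold smaller effective number of segregating units produces a tenfold larger variance in a single generation, which is the intuition behind the nucleoid and variable-bottleneck models.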

Figure 1.5. The mitochondrial genetic bottleneck hypothesis. (a) A simple model of the mitochondrial bottleneck contains too many mtDNA molecules and yields lower estimates of genetic drift (solid line) than the experimentally observed genetic drift (dotted line). (b) Assuming aggregation of mtDNA molecules into nucleoids results in a much tighter effective bottleneck and accelerates the estimated drift to fit observations. (c) Alternatively, the observed drift can be accounted for by a variable bottleneck with a temporal reduction of copy number (Khrapko, 2008).

1.2.6.4. Mutation rate

Mutation is the ultimate source of genetic variation. It is both the substrate for evolution and the cause of genetic disease (Nachman and Crowell, 2000). In genetics, the mutation rate is the measure of the rate at which a mutation occurs in an organism or gene in each generation. Mutation rates differ between species and even between different regions of the genome of a single species. These different rates of nucleotide substitution are measured in ‘substitutions per base pair per generation’. Understanding the key process of human mutation is important for many aspects of medical genetics and human evolution. Human mtDNA is characterized by a much greater evolutionary rate than that of the average nuclear gene (Brown et al., 1979), and this peculiarity makes mtDNA very informative in evolutionary studies. This high mutation rate has several causes, among them a higher frequency of replication than nuclear DNA and the lack of both an efficient DNA repair mechanism and protective proteins such as histones (Clayton et al., 1974; Tao et al., 2014). In addition, mitochondrial DNA is physically associated with the inner mitochondrial membrane, where highly mutagenic oxygen radicals are generated (Shigenaga et al., 1994; Tao et al., 2014). The mutation rate is not uniform across the mtDNA molecule: the overall rate in the non-coding control region (bases 16024-576) is about 10 times higher than that of the coding region (bases 577-16023) (Pakendorf and Stoneking, 2005; Howell et al., 2007; van Oven and Kayser, 2009).
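Because the molecule is circular, the control region (bases 16024-576) wraps around the origin of the numbering. The following minimal sketch shows one way to assign a nucleotide position to either region using the boundaries given above. The function name is hypothetical, and the total length of 16,569 bp is that of the revised Cambridge reference sequence, an assumption not stated in this paragraph.

```python
MTDNA_LENGTH = 16569  # length of the rCRS; assumed here, not stated in the paragraph


def mtdna_region(position: int) -> str:
    """Classify a 1-based mtDNA position as 'coding' or 'control'.

    Coding region: 577-16023. Control region: 16024-16569 and 1-576,
    i.e. the interval that wraps around the start of the numbering.
    """
    if not 1 <= position <= MTDNA_LENGTH:
        raise ValueError(f"position must be between 1 and {MTDNA_LENGTH}")
    return "coding" if 577 <= position <= 16023 else "control"


for pos in (73, 576, 577, 8701, 16023, 16024, 16519):
    print(pos, mtdna_region(pos))
```

Treating the coding region as the single contiguous interval and everything else as control region avoids handling the wrap-around explicitly.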

2. Human population genetics

Evolution is defined as the change in the inherited characteristics of biological populations over successive generations. The driving forces of this phenomenon are mutation, natural selection, migration and genetic drift. Changes produced in any generation are normally small; they can be advantageous, disadvantageous, or have no influence on the fitness of the individuals in which they occur. Human population genetics studies the genetic variation among contemporary modern humans and/or ancient specimens in order to understand the evolutionary history of different populations. Inferences about human history may also come from the study of other species linked to domestication events or infectious diseases. These studies are based on the analysis of genetic polymorphisms, especially those whose phenotypic effects are almost neutral. Until recently, the mitochondrial DNA (mtDNA) and the male-specific region of the Y-chromosome (MSY) dominated population studies. With the technological advances in both biomolecular sciences and bioinformatics, the simultaneous analysis of large sets of genetic markers is becoming more affordable, and the traditional uniparental systems are often coupled with more comprehensive genome-wide scans. Indeed, the uniparentally transmitted genetic systems represent less than 2% of the DNA of a cell, and to obtain a comprehensive view of the population patterns of human (or other species) diversity, autosomal markers need to be evaluated as well (Barbujani and Colonna, 2010). On the other hand, the possibility of recombination events makes genomic DNA data more difficult to interpret, so the uniparental markers remain extremely informative.

2.1. The mtDNA contribution and the phylogeographic approach

Historically, the mitochondrial DNA has been a tool of choice in the study of population genetics. Human mtDNA is characterized by a high copy number, a much greater evolutionary rate than that of the average nuclear gene and a matrilineal inheritance (Giles et al., 1980) without any paternal contribution or recombination between the two parental lineages (Hagelberg et al., 1999; Macaulay et al., 1999a; Sutovsky et al., 2000; Elson et al., 2001; Rantanen et al., 2001). As a result, mutations accumulate sequentially along maternal lineages, and over the course of time this process of molecular divergence gave rise to monophyletic units called haplogroups. A haplogroup can be defined as a group of phylogenetically related mtDNAs that share mutations derived from a common female ancestor (the most recent common ancestor, or MRCA) (Torroni et al., 1993). Because the process of molecular differentiation is relatively fast and occurred mainly during and after the process of human dispersal into different parts of the world, subsets of mtDNA variation usually tend to be restricted to particular geographic areas and populations (Torroni et al., 2006). Thanks to these peculiar features, the mitochondrial DNA is a very informative and widely employed instrument to trace the history and migrations of the female ancestors who transmitted the mtDNA molecule through the generations. The study of the geographical distribution of the clades (haplogroups) within a phylogeny is called phylogeography (Avise, 2000). This approach requires the combination of three elements: a phylogenetic tree, the geographic distribution of lineages on the tree, and the time depth of lineages, especially those that are restricted to a particular area (Soares et al., 2010). Sequence variation of DNAs sampled in the present can be used to reconstruct the phylogenetic tree that displays the inferred genealogical relationships between individual sequences. The timescale is provided by converting lineage diversity to age estimates by means of a molecular clock.

2.2. The molecular clock

The molecular clock is an extremely useful method for estimating evolutionary timescales from the analysis of changes in molecular traits. It is based on the inference that DNA and protein sequences evolve at a rate that is relatively constant over time and among different organisms. A direct consequence of this constancy is that the genetic difference between any two organisms is proportional to the time since they last shared a common ancestor (Ho, 2008). Building a molecular clock requires the comparison between a phylogenetic tree (including sequences from different individuals belonging to the same or different species) and an outgroup with respect to which the divergence is exactly known. Since timing is decisive for the interpretation of the demographic history of populations, a reliable relation between sequence diversity and the time scale is needed. Uniparental markers, in particular the mtDNA, provide phylogenies that can be better time-calibrated than other systems. Over the years, a wide range of molecular clock models and methods, implemented in various statistical phylogenetic settings, have been proposed (Ho and Duchêne, 2014), but the most commonly used for humans is the time-dependent clock that corrects for purifying selection, proposed by Soares and colleagues (Soares et al., 2009). Recently another mutation rate, based on the divergence between ancient and modern human mitogenomes, has been introduced; it consists of a linear clock obtained using 66 ancient dated mtDNAs as tip calibration points (Posth et al., 2016).
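The proportionality between genetic difference and time can be sketched with a strictly linear clock. The example below is only illustrative: it averages the number of mutations separating each sampled sequence from the root of a clade and multiplies by a fixed number of years per mutation. The function names and the calibration value of 3,000 years per mutation are hypothetical placeholders, and the sketch does not implement the time-dependent correction for purifying selection described above.

```python
from typing import Sequence


def mean_mutations_to_root(mutation_counts: Sequence[int]) -> float:
    """Average number of mutations separating the sampled sequences from the clade root."""
    if not mutation_counts:
        raise ValueError("at least one sequence is required")
    return sum(mutation_counts) / len(mutation_counts)


def linear_clock_age(mean_mutations: float, years_per_mutation: float) -> float:
    """Convert lineage diversity to an age estimate, assuming a constant rate."""
    if years_per_mutation <= 0:
        raise ValueError("years_per_mutation must be positive")
    return mean_mutations * years_per_mutation


# Hypothetical clade: three sequences carrying 2, 3 and 4 mutations from the root.
diversity = mean_mutations_to_root([2, 3, 4])
# 3,000 years per mutation is an illustrative placeholder, not a published calibration.
print(linear_clock_age(diversity, years_per_mutation=3000.0))
```

Keeping the calibration as an explicit parameter makes it clear that the age estimate is only as reliable as the rate used to obtain it.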

2.3. mtDNA nomenclature

In the early 1990s, the first large-scale population study was performed on Native Americans and it highlighted the presence of four different clusters that included all the samples (Torroni et al., 1993). In this context the universally accepted mtDNA nomenclature was initiated with the definition of the first four branches in the human phylogenetic tree, named alphabetically as A, B, C, and D. Shortly afterward these analyses were also applied to other continental populations, allowing the identification of haplogroups E, F and G in Asia and H, I, J, K, T, U, W, X and V in Europe (Figure 2.1) (Torroni et al., 1994a; Torroni et al., 1994b; Torroni et al., 1996). To encourage all laboratories producing mtDNA data to follow the same nomenclature, the cladistic rules for the hierarchical ordering of haplogroups and sub-haplogroups were established in 1998 (Richards et al., 1998). This system uses a capital letter to define a main branch of the mtDNA phylogenetic tree, followed by a number to identify a second-level lineage (a sub-clade). Following a hierarchical notation, the next layer of clusters is denoted by a small letter as a suffix, and the successive levels by alternating numbers and small letters (e.g., B2b5a1). Since then, the naming of new haplogroups has naturally evolved in a self-organizing way, ushered forward by those who produced novel data, following the rules of the nomenclature system and respecting the published record. Haplotypes that cannot be assigned to a specific known sub-lineage (either because such a lineage has not been defined yet, or because insufficient data were produced for that sample) are placed under the last possible assignable node in the tree and are represented with a star at the end of their name (e.g., A2*).
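The hierarchical notation can be made concrete with a small parser. This is a minimal sketch under the assumption that a name strictly alternates blocks of capital letters, digits and small letters; the function name is hypothetical. Names that use other conventions (for example, joint nodes written with apostrophes such as L1’2’3’4’5’6) are rejected rather than interpreted.

```python
import re
from typing import List


def haplogroup_lineage(name: str) -> List[str]:
    """Return the nested clades implied by a haplogroup name, from root to tip.

    A trailing star (e.g. 'A2*') marks an unassigned member of the clade and is
    ignored when listing ancestors.
    """
    core = name.rstrip("*")
    # Split into blocks of capital letters, small letters, or digits.
    tokens = re.findall(r"[A-Z]+|[a-z]+|\d+", core)
    if not tokens or "".join(tokens) != core or not tokens[0].isupper():
        raise ValueError(f"unsupported haplogroup name: {name!r}")
    return ["".join(tokens[:i]) for i in range(1, len(tokens) + 1)]


print(haplogroup_lineage("B2b5a1"))
print(haplogroup_lineage("A2*"))
```

Each element of the returned list is a prefix of the next, which mirrors the way every sub-clade label extends the label of the clade that contains it.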

Figure 2.1. Simplified mtDNA lineages and their geographical distribution (MITOMAP 2013).

The increasing number of complete mtDNA sequences has greatly improved the capability to define the basal branches of the human phylogenetic tree. On the other hand, the large amount of data may create conflicting information from different papers concerning the same haplogroup. In order to avoid this problem van Oven and Kayser drew an updated phylogeny (http://www.Phylotree.org) by comparing the maximum number of available mtDNA sequences of each haplogroup (van Oven and Kayser, 2009). Since mitogenomes are constantly published, this online tree offers regular updates that allow checking for the latest haplogroup subdivisions and naming, getting the full up-to-date picture of the worldwide human mtDNA tree.

2.4. mtDNA reference sequences

Defining the haplogroup identity of a sequence requires a prior comparison with a reference sequence. The first mitochondrial reference was the CRS (Cambridge reference sequence), corresponding to the first human mitogenome sequenced (Anderson et al., 1981). In 1999 the original mtDNA sample used to obtain the CRS was sequenced again (Andrews et al., 1999), and this resequencing revealed some discrepancies with respect to the sequence obtained by Anderson et al. These differences include one extra base pair at position 3107, and incorrect assignments of single base pairs due both to sequencing artifacts and to the use of bovine samples to cover the regions that were technically difficult to sequence in the human sample. This ‘new version’ of the CRS was called the revised Cambridge reference sequence (rCRS). The rCRS belongs to the European haplogroup H2a2a1 (van Oven and Kayser, 2009) and is deposited in the GenBank NCBI database under the accession number NC_012920. In 2012, Behar and colleagues (Behar et al., 2012a) proposed that the rCRS should be replaced by a new reference representing the ancestral genome from which all present mtDNAs descend (the ), placing the root between haplogroups L0 and L1’2’3’4’5’6. This reference was named RSRS (Reconstructed Sapiens Reference Sequence). The RSRS was reconstructed using ~19,000 mitogenomes and includes three spacers (positions 523, 524 and 3107) to preserve the historical CRS position numbering (Table 2.1). The introduction of a new reference triggered opposing opinions. The main criticisms came from forensic and medical genetics, in which a sample is commonly represented as a list of variants relative to the reference, and not as a sequence (i.e. FASTA format) (Bandelt et al., 2014).
Given that the two references belong to phylogenetically distant haplogroups (RSRS to haplogroup L and rCRS to H2a2a1), the number of differences between the two reference sequences is considerable (Table 2.1), which makes it difficult to convert a list of mutations scored against one reference into a list scored against the other.

Table 2.1. List of SNP differences between the RSRS and the rCRS.

Position  RSRS  rCRS    Position  RSRS  rCRS
73        G     A       8701      G     A
146       C     T       8860      G     A
152       C     T       9540      C     T
195       C     T       10398     G     A
247       A     G       10664     T     C
263       G     A       10688     A     G
523       -     A       10810     C     T
524       -     C       10873     C     T
750       G     A       10915     C     T
769       A     G       11719     A     G
825       A     T       11914     A     G
1018      A     G       12705     T     C
1438      G     A       13105     G     A
2706      G     A       13276     G     A
2758      A     G       13506     T     C
2885      C     T       13650     T     C
3594      T     C       14766     T     C
4104      G     A       15326     G     A
4312      T     C       16129     A     G
4769      G     A       16187     T     C
7028      T     C       16189     C     T
7146      G     A       16223     T     C
7256      T     C       16230     G     A
7521      A     G       16278     T     C
8468      T     C       16311     C     T
8655      T     C       16519     C     T

a Coding-region mutations (nps 577-16023) are shown in bold; control-region mutations (nps 16024-576) are in italic (http://www.Phylotree.org).
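The conversion problem can be illustrated with a minimal sketch that rewrites a list of substitutions scored against the rCRS as a list scored against the RSRS. The lookup dictionary contains only five of the positions from Table 2.1, the function and variable names are hypothetical, and insertions or deletions (including the spacers at positions 523 and 524) are deliberately not handled. It is an illustration of the logic, not a replacement for a dedicated tool.

```python
from typing import Dict

# Subset of Table 2.1: position -> (RSRS base, rCRS base). Illustrative only.
RSRS_VS_RCRS = {
    73: ("G", "A"),
    263: ("G", "A"),
    750: ("G", "A"),
    8860: ("G", "A"),
    16223: ("T", "C"),
}


def rcrs_to_rsrs(variants_vs_rcrs: Dict[int, str]) -> Dict[int, str]:
    """Re-express substitutions (position -> observed base) relative to the RSRS."""
    converted = {}
    # A variant scored against the rCRS disappears if the sample carries the RSRS base.
    for pos, base in variants_vs_rcrs.items():
        rsrs_base = RSRS_VS_RCRS.get(pos, (None, None))[0]
        if base != rsrs_base:
            converted[pos] = base
    # Where the sample matches the rCRS at a position that differs between
    # the two references, it is a variant with respect to the RSRS.
    for pos, (_rsrs_base, rcrs_base) in RSRS_VS_RCRS.items():
        if pos not in variants_vs_rcrs:
            converted[pos] = rcrs_base
    return dict(sorted(converted.items()))


# Hypothetical sample: two variants that match the RSRS and one private variant.
print(rcrs_to_rsrs({73: "G", 263: "G", 1719: "A"}))
```

In this hypothetical sample, the variants at 73 and 263 vanish because the sample carries the RSRS base, while positions where it matches the rCRS (750, 8860 and 16223) appear as new variants.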

2.5. mtDNA worldwide phylogeny

The human mtDNA tree (Figure 2.2) splits at its core layers into branches that carry exclusively African sequences belonging to the L haplogroup. This clade has been subdivided into 7 main branches (L0-L6) (Kivisild et al., 2004; Behar et al., 2008) and only one of these, haplogroup L3, is shared between the Africans and the rest of the world (Torroni et al., 2006). Thus, mtDNAs outside Africa fall into the two main clusters M and N, branching out from the root of haplogroup L3 (that also has given rise to some sub-clades specific to African populations). The number of extant non-African founder haplogroups can however be extended to include a third member, haplogroup R, which is a daughter-clade of N.

Figure 2.2. Schematic representation of the worldwide phylogeny of human mtDNA. The tree was obtained by combining six trees constructed separately, with branch lengths estimated with maximum likelihood and the time-dependent molecular clock. Number and presence of clades is dependent on availability of data and not on worldwide frequencies. Ages are expressed in kilo years ago (Kya) (Soares et al., 2009).

In the commonly used haplogroup nomenclature (Richards and Macaulay, 2001), branches within M include C, D, E, G, Q, and Z (Friedlaender et al., 2005; Kong et al., 2006; Chandrasekar et al., 2009; Derenko et al., 2010), while the subdivision within N includes A, I, S, W, X, and Y (Reidla et al., 2003; Kivisild et al., 2006; Kong et al., 2006; Derenko et al., 2007; Perego et al., 2009; Palanichamy et al., 2010) and the super-haplogroup R, which in turn contains B, F, J, P, R0, T, and U (Torroni et al., 1996; Macaulay et al., 1999b; Palanichamy et al., 2004; Friedlaender et al., 2005; Kong et al., 2006; Achilli et al., 2008a; Cerný et al., 2011). The majority of the sub-lineages of haplogroup M are found primarily in South and East Asia, while N and its sub-clade R encompass most of the mtDNAs of West Asian and European populations. Haplogroups M and N are both rare in sub-Saharan Africa, where the mtDNAs belong almost exclusively to the macro-haplogroup L. However, the finding of distinct M variants led to the identification of eastern Africa as the source of a migration out of Africa involving the ancestors of current Asian populations (Quintana-Murci et al., 1999; Jobling et al., 2004).

2.5.1. The origin of modern humans

In 1871, Charles Darwin postulated in his ‘The Descent of Man’ that humans originated in Africa. However, this hypothesis was considered speculative until the 1980s, when it was substantiated by a number of studies using present-day mitochondrial DNA, as well as by evidence from the physical anthropology of archaic specimens. The first reconstructions of human genetic ancestry were based on data from mtDNA, MSY, and a small number of nuclear loci, and much of what they revealed remains central to our understanding today (Groucutt et al., 2015). The model envisioned that a small, relatively isolated population of early humans evolved into modern Homo sapiens and succeeded in spreading out of Africa towards Asia and then Europe and elsewhere, replacing all of the Neanderthals and other late archaic Homo sapiens (Stringer and Andrews, 1988). Although a common recent African origin for all modern humans is not the only hypothesis that has been set forth, it currently represents the near-consensus position held by the scientific community (Liu et al., 2006). As for uniparental markers, the analysis of mtDNA from African populations suggested that East Africa played an important role in the origin and diversification of modern humans (Gonder et al., 2007). The maternal lineages of all living humans coalesce in a most recent common ancestor (MRCA), also known as "mitochondrial Eve" (Cann et al., 1987), that is typically represented as the root node of the human phylogenetic tree, which splits into macro-haplogroups L0 and L1-L6 (Figure 2.3). Phylogenetic studies describing the global mitogenome diversity in humans confirmed the single origin of our most recent (female) common ancestor, who lived in East Africa around 200 thousand years ago (Kya), with no findings of Neanderthal or other mitogenomes in modern humans (Ingman et al., 2000; Mishmar et al., 2003; Gonder et al., 2007; Underhill and Kivisild, 2007; Soares et al., 2009).

Figure 2.3. Schematic phylogenetic tree of the African L sub-lineages. The L0 and L1-L5 branches are highlighted in light green and tan, respectively. The branches are made up of haplogroups L0–L6 which, in turn, are divided into clades. Khoisan and non-Khoisan clades are shown in blue and purple, respectively. Clades involved in the African exodus are shown in pink. A time scale is given on the left. Approximate time periods for the beginning of African last stone age (LSA) modernization, appearance of African LSA sites, and solidization of LSA throughout Africa are shown by increasing color densities (Behar et al., 2008).

This picture of human origins and dispersal, based on the phylogeographic analysis of human mtDNA, is certainly not the whole story, which has to be based on the full range of human genome markers and as much ancient DNA evidence as possible. Indeed, studies of genome-wide variation have shown that modern humans underwent some interbreeding with earlier human offshoots from Africa. The contribution of Neanderthals and Denisovans to the genomes of modern subjects is overall rather small (1-2%) (Green et al., 2010; Reich et al., 2010; Meyer et al., 2012), but certain archaic Homo genes (e.g. immune genes) played a major role in the fitness of the expanding populations (Abi-Rached et al., 2011; Mendez et al., 2012; Deschamps et al., 2016). Nevertheless, the “out-of-Africa” model (see below), first proposed three decades ago on the basis of mtDNA phylogeography, remains the consensus view of modern human origins, broadly supported also by paleontological and archeological as well as genome-wide evidence (Mellars et al. 2013).

2.5.2. The ‘Out of Africa’ exit

The reconstruction of human genetic ancestry and the dispersal of modern humans out of Africa are still the subject of ongoing debate. Evidence on the timing and routes of dispersal of Homo sapiens out of Africa comes from fossil, genetic and archaeological data and is consistent with several models (Groucutt et al., 2015). The oldest modern human remains out of Africa were identified in the Near East (Skhul and Qafzeh caves) (Grün and Stringer, 1991) and in the United Arab Emirates (Jebel Faya site) (Armitage et al., 2011). This archaeological and fossil evidence suggests that humans could have been present outside the African continent 90-120 Kya, demonstrating their presence in eastern Arabia during the last interglacial. In support of these findings, climatic models demonstrated the presence of migration paths out of Africa through resource-rich corridors established during three time windows: one 130-118 Kya (not associated with human migration out of Africa (Timmermann and Friedrich, 2016)), one 106-94 Kya and another 89-73 Kya (deMenocal and Stringer, 2016). If the hypothesis of an early exodus ~100 Kya is true, the presence of fossils and artifacts outside of Africa could be attributed to an earlier, localized and short-lived dispersal whose members all became extinct and thus left no descendants in the modern human population (Mellars 2006a; Richards et al., 2006). This hypothesis is supported by evidence from mtDNA studies. All present-day mtDNAs in non-Africans are placed within the two branches, M and N, of haplogroup L3 (Behar et al., 2012a). This has led to the argument that the group leaving Africa must have been a single, very small one, given that only the L3 type survived and all other possible founding branches, if any, were lost by genetic drift.
The age of L3 is an upper boundary on the exit from Africa and places the maximum at ~70 Kya, virtually ruling out a successful exit before the Toba volcanic supereruption in Sumatra 74 Kya (Soares et al., 2012) (Figure 2.4). On the other hand, haplogroups M and N most likely originated after the exit from Africa, thus representing a lower bound for this dating. The ages of these two haplogroups, between 50 and 65 Kya, are quite close to that of their L3 ancestor in Africa, suggesting that the expansion of L3 in Eastern Africa and its exit from the continent could be part of the same demographic event (Figure 2.4) (Soares et al. 2012).

Figure 2.4. Schematic representation of major dispersals from Eastern Africa during the Pleistocene (Soares et al., 2012).

Two alternative scenarios have been proposed to explain the presence of the two sub-branches of the mtDNA haplogroup L3 in both Europe and Asia. The first postulates a ‘Levantine route’ from northeast Africa to the Levant across the Sinai Peninsula ~45 Kya (Stringer et al., 1988; Prugnolle et al., 2005). However, the route along the Levantine corridor does not explain why adjacent Europe was settled thousands of years later than distant Australia (Forster and Matsumura, 2005). This model has therefore been replaced by a ‘southern route model’, according to which dispersal probably started ~70 Kya from the Horn of Africa to the Persian Gulf and continued along the tropical coast of the Indian Ocean to Southeast Asia and Australasia (Figure 2.5). This second scenario is strongly supported by palaeoenvironmental evidence, confirming that a northern migration would have been impossible during the glacial period extending from ~70 to 50 Kya (Forster and Matsumura, 2005; Macaulay et al., 2005; Mellars, 2006b; Torroni et al., 2006). Recently, three new studies based on high-coverage whole genome analyses on 270 populations across the globe (Malaspinas et al., 2016; Mallick et al., 2016; Pagani et al., 2016) provided a high-resolution portrait of human genetic diversity, allowing new inferences to refine and extend current models of historical human migration out of Africa (mainly based on mtDNA data) (Tucci and Akey, 2016). They came to different conclusions about the out-of-Africa dispersal. Pagani and colleagues identified a genetic signature in the genomes of present-day Papuans that suggests human presence outside Africa before the main out-of-Africa split that involved other Eurasians (~75 Kya) (Pagani et al., 2016), in line with a multiple dispersal model. The other two works, instead, support the scenario that all contemporary non-Africans branched off from a single ancestral population (Malaspinas et al., 2016; Mallick et al., 2016).

Figure 2.5. Hypothetical routes along the Indian Ocean coastline that could have been taken by early humans from Africa. MtDNA data from Malaysians and aboriginal Andaman islanders suggest that human settlements appeared along the Indian Ocean coastline 60 Kya (Forster and Matsumura, 2005).

In conclusion, the debate about the first out of Africa is still open. Modern humans could have crossed the Bab al-Mandab more than 100 Kya, a time that predates the upper bound indicated by genetic studies, but from a genetic point of view, in particular from the mtDNA perspective, a single exit about 60-70 Kya is the most plausible scenario to explain the first migratory event that led to the peopling of the planet.

2.5.3. Human colonization of the world

2.5.3.1. The peopling of Australasia

Soon after the rapid expansion along the coastlines of southern Asia, south-eastern Asia, and Indonesia, the wave of migration reached New Guinea and Australia (Figure 2.5), at a time when lower sea levels joined the two islands into one land mass, necessitating sea travel only across narrow straits (Hudjashov et al., 2007). The deep and specific phylogenetic lineages now found within this former landmass indicate a small founding population size and the subsequent isolation of Australia and New Guinea from the rest of the world. Approximately 3.5 Kya, an expansion of Austronesian speakers arrived in Near Oceania, and the descendants of these colonizers spread to the far corners of the Pacific, colonizing Remote Oceania (Duggan et al., 2014; Duggan and Stoneking, 2014). These founder events and the lack of contact could underlie the divergent morphological development seen in the Australian human fossil record and could also help to explain the remarkably restricted range of Pleistocene Australian lithic industries and bone artifacts compared with contemporary cultures elsewhere in the world (Mellars 2006c). The richest basal variation in the founder haplogroups M, N and R is found along the southern stretch of Eurasia, particularly in the Indian subcontinent (Chaubey et al., 2008; Sun et al., 2006; Palanichamy et al., 2004), and a similarly high basal diversification is present in Southeast Asia (Hill et al., 2007; Kong et al., 2006; Macaulay et al., 2005). These data suggest a rapid colonization along the southern coast of Asia, reaching Sahul ∼60 Kya. The expansion northwards to fill the heartland of the continent occurred only later, ~45 Kya, when a combination of technology and climatic conditions enabled the exploration of the interior of Eurasia. One of the marginal extensions eventually led to the peopling of Europe (Mellars, 2006b).

2.5.3.2. The peopling of Europe

The first peopling of Europe by modern humans occurred about 45 Kya (Gamble et al., 2004; Mellars 2006b; Mellars 2006d). The genetic pool of present-day Europeans is suggested to derive from the admixture of three ancestral populations: west European Paleolithic hunter-gatherers, ancient north Eurasians (who were closely related to a 24 Ky-old skeleton from Siberia (Raghavan et al., 2014)), and early Neolithic European farmers (Lazaridis et al., 2014). However, an ongoing debate concerns the relative amount of genetic input into modern Europeans from Paleolithic versus Neolithic waves of settlement. Europeans have a high level of haplogroup diversity within haplogroups N and R (H, HV, N1, J-T, U, I, W, and X) but lack haplogroup M almost entirely (Underhill and Kivisild 2007; Soares et al., 2010). Members of mtDNA haplogroup U5 probably marked the first Upper Paleolithic entry into Europe from the Near East, while populations bearing U6 (and M1) entered North Africa (Olivieri et al., 2006; Pennarun et al., 2012). Paleolithic events also include the resettlement from southern refugia after the Last Glacial Maximum (LGM) (~18 Kya), marked by lineages that originated in these refugia. The expansion from the Franco-Cantabrian refugium left signatures in haplogroups V (Torroni et al., 1998; Torroni et al., 2001), H1, H3 (Achilli et al., 2004; Pereira et al., 2005), H5 and U5b1b (Soares et al., 2010; Tambets et al., 2004), while U5b3 marks a dispersal event from the Italian Peninsula (Pala et al., 2009), U4 and U5a from the East European Plain (Malyarchuk et al., 2008; Malyarchuk et al., 2010a) and J, T, I and W from the Near Eastern refugia (Pala et al., 2012; Olivieri et al., 2013). The advent of agriculture and pastoralism usually distinguishes Neolithic from earlier Paleolithic or Mesolithic hunting-gathering cultures. The Neolithic period of interest for Europe began around 10 Kya, when farmers moving from the Near East rapidly (within ~5 Ky) reached most of the European continent. Most of the studies carried out on ancient DNAs have led to the prevailing conclusion that Paleolithic and Mesolithic hunter-gatherer European populations differed genetically from early Neolithic farmers, in turn implying that there was a wide-scale replacement across Europe from the Near East in the early Neolithic, with limited assimilation of native Europeans (Pinhasi et al., 2012; Lazaridis et al., 2014; Omrak et al., 2016; Posth et al., 2016). However, according to a recent study on ancient genomes, in the time frame from ~37 to ~14 Kya Paleolithic Europeans derived from a single ancestral population, but, starting from 14 Kya (the first significant warming period after the LGM, the Bølling-Allerød interstadial), there was a migration wave from the Near East contributing to the European genetic pool (Fu et al., 2016). The notion of a genetic input from the Near East into and across Europe in the late Pleistocene, prior to the arrival of the Early Neolithic material culture in Greece ~8.5 Kya (Manning et al., 2014), is a novelty in human paleogenomics. Actually, a Late Glacial/Postglacial recolonization of Europe from the Near East before the migration waves associated with the onset of farming had already been hypothesized in mtDNA studies (Pala et al., 2012; Olivieri et al., 2013; Gandini et al., 2016; Richards et al., 2016). In addition, several haplogroups often assumed to have dispersed from Anatolia only with the advent of the Neolithic, similarly to all non-U8, non-U5 and non-U2 mtDNAs, were already present in Mesolithic Mediterranean Europe, particularly in Italy (Pereira et al., 2016), a suggestion in line with the recent detection of two K1c mitogenomes in Mesolithic Greece (Hofmanová et al., 2016).
These data leave open questions about the populations involved in the genetic contributions to Paleolithic and Neolithic Europeans.

2.5.3.3. Back to Africa

Together with the migration towards the western part of Eurasia around 45 Kya, the migratory flow probably took also a more southern route, back again into Africa, across the Mediterranean side. This dispersal event has been hypothesized studying the distribution of the mtDNA haplogroups M1 and U6a, which are found virtually only in Africa and whose arrival would have temporally overlapped with the event that led to the peopling of Europe by modern humans (Olivieri et al., 2006; Gonzalez et al., 2007). Similarly, three other potentially Eurasian ancient mtDNA clades have been identified in Eastern Africa: N1a1a, HV1 and R0a (Cerný et al., 2011; Musilová et al., 2011; Fernandes et al., 2012), providing additional evidence of a ‘back to Africa’ migration. This scenario has been recently confirmed also by the finding of a 35,000-year-old individual from Romania with a mitogenome belonging to haplogroup U6*, with a haplotype not previously found in ancient or present-day humans (Hervella et al., 2016). This finding is in line with a migration event from Western Asia to Africa during which haplogroup U6 diversified until the emergence of the present-day African lineages, but indicates that U6 moved at the same time also to Europe, where it later disappeared.


2.5.3.4. The peopling of the Americas

America, the double continent, was the last to be colonized by modern humans, and the study of its peopling represents one of the first and most significant examples of interdisciplinary interaction between archaeology, linguistics and genetics (Greenberg et al., 1986). In recent decades phylogenetic studies have been particularly useful in shedding light on America’s first colonizers, especially regarding the timing of their arrival and the routes they took (Schurr and Sherry, 2004; O'Rourke and Raff, 2010). Many major contributions have come from mtDNA studies, mainly carried out in modern populations, but also from ancient human remains (Kemp et al., 2007; Gilbert et al., 2008; Raff et al., 2010; Tackney et al., 2015; Fehren-Schmitz et al., 2015; Llamas et al., 2016). The study of mitogenome variability throughout the Americas increased the overall number of recognized maternal founding lineages from just four, initially named A, B, C and D (Schurr et al., 1990; Torroni et al., 1992; Torroni et al., 1993), to 16 (Figure 2.6). Among these, eight (A2, B2, C1b, C1c, C1d, C1d1, D1 and D4h3a) are called pan-American haplogroups, as they are distributed across the double continent, while the others are less frequent and generally show a distribution restricted to specific geographic areas, i.e. North America (A2a, A2b, C4c, D2a, D3, D4e1, X2a and X2g) (Perego et al., 2010). Additional sub-lineages have evolved from the pan-American haplogroups and exhibit limited geographic distribution ranges (Bodner et al., 2012; de Saint Pierre et al., 2012; Achilli et al., 2013). Overall these studies, together with archeological and climatic evidence, support the scenario that the migratory event that led to the peopling of the Americas occurred approximately 20 Kya, during the Last Glacial Maximum (LGM). The LGM refers to the time of maximum extent of the ice sheets during the last glacial period, between 30 and 13 Kya, with a peak around 18-20 Kya (Clark et al., 2009).
During the peak of the last Ice Age, the sea level was considerably lower than today, and Asia and North America were connected by a massive exposed land bridge known as Beringia, now submerged. One very contentious issue is whether the settlement of the Americas occurred by means of a single or multiple streams of migration. Many analyses of Native American genetic diversity suggest a single migratory wave from an ancestral population that lived in Beringia during the LGM, probably coming from South-Central Siberia (Zegura et al., 2004; Schroeder et al., 2007; Tamm et al., 2007; Wang et al., 2007; Kemp and Schurr, 2010; Hoffecker et al., 2014; Hoffecker et al., 2016). However, evidence coming from both mtDNA and genome-wide analyses identified at least one or two additional source populations, leading to the so-called ‘tripartite migration model’. This model, originally proposed in the 1980s on the basis of anthropometric and linguistic data, postulates that the Americas were settled through three separate population movements whose identity was expressed in linguistic terms as Amerind, Na-Dene, and Eskimo–Aleut speakers (Williams et al., 1985; Greenberg et al., 1986; Greenberg 1987; Reich et al., 2012; Achilli et al., 2013; Raghavan et al., 2014; Raghavan et al., 2015).


Figure 2.6. MtDNA tree encompassing the roots of all known founding Native American haplogroups. The distinguishing mutational motifs for the 16 known Native American haplogroups are reported on the branches. Mutations in the control region are in red, while mutations in the coding region are listed in black; they are transitions unless a base is explicitly indicated. The prefix @ designates reversions, while suffixes indicate transversions (to A, G, C, or T), indels (+, d). Recurrent mutations within the tree are underlined (Perego et al., 2010; Kumar et al., 2011).


Regardless of how many migration events occurred, the reduced genetic diversity found in the Americas is a sign of a significant founder effect, suggesting that the number of Native American founders was limited. The ancestral Beringian populations probably retreated into refugia during the Ice Age, where new genetic variants evolved through mutation and genetic drift. There they remained isolated for ∼5 Ky, before spreading rapidly throughout the Americas (Reich et al., 2012; Chatters et al., 2014; Raff and Bolnick 2014; Rasmussen et al., 2015; Raghavan et al., 2015; Llamas et al., 2016). The southward expansion from Beringia to the extreme southern tip of South America, covering a latitude gap of more than 100° (from about 65° North to 54° South) and a distance of more than 15,000 km, possibly occurred in a time span of less than 2 Ky (Kumar et al., 2011; Bodner et al., 2012). It likely followed two entry routes: the Pacific coastal route (deglaciated more than 17 Kya), which probably played the major role in the peopling of the double continent, and the interior ice-free corridor between the Laurentide and Cordilleran ice sheets (opened ∼13.5 Kya) (Figure 2.7), which also had a significant impact, at least for the colonization of North America (Fix 2005; Fagundes et al., 2008; Perego et al., 2009; Hooshiar Kashani et al., 2012; Kemp et al., 2010; Perego et al., 2010; Achilli et al., 2013).
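The figures quoted above imply a lower bound on the average pace of the southward expansion. The short Python sketch below only restates that arithmetic; the variable names and the 25-year generation length are illustrative assumptions introduced here and do not come from the cited studies.

```python
# Bounds quoted in the text (approximate values, not measurements).
NORTH_LAT_DEG = 65.0     # about 65 degrees North (Beringia)
SOUTH_LAT_DEG = 54.0     # about 54 degrees South (southern tip of South America)
DISTANCE_KM = 15_000     # "more than 15,000 km", used as a minimum
TIME_SPAN_YEARS = 2_000  # "less than 2 Ky", used as a maximum

# Latitudes on opposite sides of the equator add up.
latitude_span_deg = NORTH_LAT_DEG + SOUTH_LAT_DEG

# Minimum distance over maximum time gives a lower bound on the average rate.
min_rate_km_per_year = DISTANCE_KM / TIME_SPAN_YEARS

# Assumed generation length, used only to re-express the same bound.
GENERATION_YEARS = 25
min_rate_km_per_generation = min_rate_km_per_year * GENERATION_YEARS

print(f"latitude span: {latitude_span_deg:.0f} degrees")
print(f"average rate: at least {min_rate_km_per_year:.1f} km/year")
print(f"average rate: at least {min_rate_km_per_generation:.1f} km/generation")
```

Under these assumptions the expansion would have averaged several kilometres per year, which illustrates why it is described as rapid.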

Figure 2.7. The two paths of migration from Beringia. The Pacific coastal route is marked in yellow while the interior route is in light blue.


Likewise, it was proposed that the migration into South America may have proceeded down the Pacific coast and later eastward across the Andean Cordillera or, alternatively, that the founding population split in the northern area of South America, with population groups moving separately across the eastern Andean highlands into the Amazonian basin (Bodner et al., 2012; de Saint Pierre et al., 2012; Reich et al., 2012; Homburger et al., 2015; Llamas et al., 2016). Intriguingly, some recent genome-wide data have also raised the possibility that some Amazonian Native Americans descend partly from a founding population that carried ancestry more closely related to indigenous Oceanian populations. This signature is apparently absent in Northern and Central Americans, suggesting a more diverse set of founding populations of the Americas than previously accepted (Raghavan et al., 2015; Skoglund et al., 2015).


MY CONTRIBUTION


3. The first peopling of Sardinia


3.1. Background

The amount and quality of paleogenomic data collected in the last decade have revealed an unexpected complexity in the genetic history of Europeans, but the origin of the ancestral source populations remains controversial. This is particularly true for Sardinians, who are ‘outliers’ in the European genetic landscape and, according to nuclear data, the closest contemporary population to early European Neolithic farmers. A recent study carried out on genome-wide data from ancient specimens introduced the notion of a genetic input from the Near East into Europe in the late Pleistocene (Fu et al., 2016). Until then, the prevailing conclusion of ancient DNA studies had been that Paleolithic and Mesolithic European hunter-gatherer populations were replaced by populations from the Near East only in the early Neolithic (Pinhasi et al., 2012; Lazaridis et al., 2014; Omrak et al., 2016). Although a Near Eastern influence preceding the Neolithic is a new concept in human paleogenomics, it is not new in phylogeographic studies of modern mtDNA variation. Indeed, different mtDNA studies have led to the hypothesis of a Late Glacial/Postglacial recolonization of Europe from the Near East before the migration waves associated with the onset of farming (Pala et al., 2012; Olivieri et al., 2013; Richards et al., 2016; Gandini et al., 2016). In this complex scenario an intriguing observation concerns the peopling of Sardinia, an island that remained unconnected with the mainland even when the sea level was at its lowest during the LGM (Shackleton et al., 1984) and that was probably the last of the large Mediterranean islands to be colonized by modern humans (Sondaar, 1998).
Modern Sardinians, a unique reservoir of distinct genetic signatures (Cavalli-Sforza et al., 1994; Pala et al., 2009; Francalacci et al., 2013; Sidore et al., 2015), on the one hand apparently harbor the highest levels of nuclear genome similarity with European Neolithic farmers (Lazaridis et al., 2014) and an extensive similarity with the Late Neolithic/Chalcolithic Tyrolean Iceman (Keller et al., 2012; Sikora et al., 2014) but, on the other hand, differ substantially from Near Eastern Neolithic farmers, including those from Anatolia (Lazaridis et al., 2016). These findings have led to the view that, in modern Europe, Sardinians may have best preserved the gene pool of Neolithic farmers, possibly because their ancestors were less affected by subsequent Bronze Age dispersals into Europe (Haak et al., 2015). In the course of our study, we analyzed a large dataset of modern and ancient mitogenomes, focusing on those haplogroups that are restricted to Sardinia (Sardinian-Specific Haplogroups, SSHs), in order to learn more about the genetic ancestry of these islanders and, consequently, to shed light on European genetic history as well.


3.2. The sample

To learn more about the genetic ancestry of Sardinians, in collaboration with the University of Sassari (Italy) and the Max Planck Institute of Jena (Germany) we collected a large dataset of novel Sardinian mitogenomes, including 3,491 from modern islanders (data not shown) and 21 from ancient specimens (Table 3.1). The ancient samples, which cover all cultural phases of Sardinia between the Neolithic and the Nuragic Final Bronze Age, derive from skeletal remains excavated from rock-cut tombs, megalithic tombs, caves and rock shelters, and were radiocarbon dated to between 5.3 and 2.9 Kya (Skeates et al., 2013). Specifically, five samples were assigned to the Neolithic period, one to the Filigosa culture (Copper Age), two to the Monte Claro culture (late Copper Age), one to the Bell Beaker culture (early Copper Age), four to the Bronze Age and eight to the Nuragic phase (Table 3.1).

Table 3.1. Archaeological information, radiocarbon dating and haplogroup affiliation of the 21 ancient Sardinian mitogenomes sequenced in this study.

| Library ID | Site name (Province) | Archeology | Material | C14 age (BP)1 | Date interval (cal BCE) | Haplogroup |
| --- | --- | --- | --- | --- | --- | --- |
| MA73 | S'Isterridolzu (SS) | Neolithic | Tooth (from mandible) | 5279 ± 30 | 4230-4000 | J2b1a |
| MA74 | S'Isterridolzu (SS) | Neolithic | Tooth (from mandible) | 5309 ± 30 | 4240-4050 | J1c3 |
| MA76 | Noeddale (SS) | Neolithic | Tooth (from mandible) | 5296 ± 30 | 4240-4010 | H1 |
| MA77 | Noeddale (SS) | Final Neolithic | Tooth (from mandible) | 4853 ± 27 | 3700-3540 | H1e1 |
| MA78 | Su Asedazzu (CA) | Early Bronze Age | M1 taken from child's mandible | 3658 ± 26 | 2140-1950 | H3u |
| MA79 | Longu Fresu (CA) | Neolithic | Tibia | 5258 ± 34 | 4230-3980 | U5b2b3 |
| MA81 | Su Cannisoni (CA) | Early Bronze Age | Ulna proximal left | 3555 ± 35 | 2020-1770 | K1b1b1 |
| MA82 | Su Cannisoni (CA) | Nuragic | Skull fragment | 3220 ± 28 | 1600-1430 | H1 |
| MA85 | Stampu Erdi (CA) | Early Bronze Age | Tibia | 3579 ± 27 | 2020-1880 | H1e1a |
| MA86 | Monte Gastea (CA) | Early Bronze Age | Fibula right | 3647 ± 29 | 2140-1940 | U5b2a |
| MA87 | Su Asedazzu (CA) | Nuragic | Upper molar | 2865 ± 35 | 1190-920 | J2b1a |
| MA88 | Su Asedazzu (CA) | Early Bronze Age | Second upper molar taken from mandible | 3794 ± 34 | 2350-2060 | U5b2b5 |
| MA89 | Cannas di Sotto-Tomb 12 (CI) | Filigosa | Petrous fragment | 4551 ± 26 | 3370-3110 | K1b1a1 |
| MA92 | Seddas de Daga (CI) | Monte Claro | Femur fragment | 4067 ± 26 | 2840-2490 | J2a1a1 |
| MA100 | Scaba ’e Arriu M (VS) | Monte Claro | Temporal fragment | 4082 ± 23 | 2850-2500 | T2c1d |
| MA104 | Bingia ’e Monti (OR) | Early Bronze Age | Bone fragment | 3984 ± 29 | 2570-2460 | K1a |
| MA108 | Padru Jossu (VS) | Bell Beaker | Bone fragment | 3912 ± 26 | 2470-2310 | HV0j1 |
| MA110 | Ingurtosu Mannu (CA) | Nuragic | Femur fragment | 2941 ± 24 | 1220-1050 | T2b3 |
| MA112 | Is Arutas (OR) | Nuragic | Tooth | 2952 ± 25 | 1260-1060 | V |
| MA115 | Mont ’e Prama (OR) | Nuragic | Tooth | 3008 ± 25 | 1380-1130 | H3 |
| MA138 | Is Arutas (OR) | Nuragic | Skull fragment | 2941 ± 27 | 1230-1050 | H5a |

1 All 14C dates are newly reported in this study and were calibrated using the IntCal13 calibration curve in OxCal 4.2 (Ramsey and Lee, 2013). The calibrated dates in years before common era (cal BCE) with 95.4% confidence intervals are also reported.

Modern subjects were sampled from various areas of the island, encompassing each Sardinian province. These included 2,067 individuals participating in the SardiNIA project, a longitudinal study of age-related quantitative traits encompassing 6,921 individuals, more than 60% of the adult population of four villages (Lanusei, Ilbono, Elini and Arzana) in the Lanusei valley (Pilia et al., 2006; Sidore et al., 2015), and 1,424 individuals participating in case-control studies of Multiple Sclerosis (Sanna et al., 2010) and Type 1 Diabetes (Zoledziewska et al., 2009), encompassing 10,106 individuals and 1,109 families composed of an affected child and two unaffected parents. All participants gave informed consent to the study protocols, which were approved by the Sardinian local research ethics committees, Comitato Etico di Azienda Sanitaria Locale 8, Lanusei (2009/0016600) and Comitato Etico di Azienda Sanitaria Locale 1, Sassari (2171/CE), and by the NIH Office of Human Subject Research as governed by the Italian IRB approval. From the initial dataset of 3,491 modern sequences, we removed the 1,355 oldest maternally related samples on the basis of pedigree data or kinship evaluation of nuclear genomes, and 44 samples with a non-Sardinian maternal origin. To the remaining 2,092 mitogenomes, we added 125 previously published complete Sardinian mtDNA sequences, achieving a final dataset of 2,217 modern mitogenomes.
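The sample counts given above can be cross-checked with a few lines of arithmetic. The following Python sketch uses only the numbers stated in the text; the variable names are illustrative.

```python
# Modern samples by source, as stated in the text.
sardinia_project = 2_067   # SardiNIA project participants
case_control = 1_424       # Multiple Sclerosis and Type 1 Diabetes studies
initial_dataset = sardinia_project + case_control

# Exclusions applied to the initial dataset.
maternally_related = 1_355
non_sardinian_maternal_origin = 44
unrelated_dataset = initial_dataset - maternally_related - non_sardinian_maternal_origin

# Previously published complete Sardinian sequences added at the end.
published = 125
final_dataset = unrelated_dataset + published

print(initial_dataset, unrelated_dataset, final_dataset)
```

The three printed values correspond to the initial, unrelated and final datasets described in this section.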

3.3. Results

3.3.1. Phylogeny and phylogeography

To survey mitogenome variation across modern Sardinians, we assessed the phylogenetic relationships of the 2,217 modern mitogenomes (2,092 not maternally related, plus 125 from the literature) with all publicly available worldwide mitogenomes (more than 26,000, data not shown). We built the phylogenetic tree of the Sardinian mitogenomes (data not shown) using a maximum parsimony approach, and we assigned labels to new haplogroups following the nomenclature proposed by the PhyloTree database build 17 (at http://www.phylotree.org/) (van Oven and Kayser, 2009). Subsequently, we identified as ‘Sardinian-Specific Haplogroups’ (SSHs) those clades that fulfilled three requirements: (i) they included only mtDNAs of Sardinian origin; (ii) they were defined by at least three mitogenomes, encompassing a minimum of two haplotypes (i.e. differing by at least one homoplasmic mutation); (iii) they carried at least one stable mutation (i.e. not recurrent in the tree) at their roots. To evaluate whether a haplogroup was geographically restricted to Sardinia, we checked the presence of each mutation in the diagnostic motif at the root of the Sardinian-specific branches in our in-house database of more than 26,000 mitogenomes from modern worldwide populations. This allowed us to identify the nodes upstream of the Sardinian-specific haplogroups whose mutational motifs characterize mitogenomes also present outside Sardinia (Table 3.2, Figure 3.3). The same database was surveyed to identify the mitogenomes that are phylogenetically most closely related to each Sardinian-specific branch (Closest External Mitogenomes, CEMs) and that share by descent at least one of the root mutations characterizing that branch. At the end of the comparison we obtained 166 CEMs, which we added to the previous phylogenetic tree (not shown).
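The three requirements for a Sardinian-specific haplogroup can be read as a simple filter over candidate clades. The Python sketch below is only illustrative: the `Mitogenome` record, the `is_sardinian_specific` function and the toy mutation labels are hypothetical names introduced here, not the procedure actually used in the study, and heteroplasmic positions are assumed to have been excluded from the haplotypes beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mitogenome:
    """Hypothetical record for one complete mtDNA sequence."""
    sample_id: str
    is_sardinian: bool
    haplotype: frozenset  # homoplasmic mutations relative to the reference


def is_sardinian_specific(members, root_mutations, recurrent_mutations):
    """Return True if a candidate clade meets requirements (i)-(iii)."""
    # (i) only mtDNAs of Sardinian origin
    only_sardinian = all(m.is_sardinian for m in members)
    # (ii) at least three mitogenomes and at least two distinct haplotypes
    enough_members = len(members) >= 3
    enough_haplotypes = len({m.haplotype for m in members}) >= 2
    # (iii) at least one root mutation that is not recurrent in the tree
    has_stable_root = any(mut not in recurrent_mutations for mut in root_mutations)
    return only_sardinian and enough_members and enough_haplotypes and has_stable_root


# Toy clade with invented sample IDs and mutation labels, for illustration only.
clade = [
    Mitogenome("S1", True, frozenset({"mutA"})),
    Mitogenome("S2", True, frozenset({"mutA", "mutB"})),
    Mitogenome("S3", True, frozenset({"mutA"})),
]
result = is_sardinian_specific(clade, root_mutations={"mutA"}, recurrent_mutations={"mutB"})
print(result)
```

In this sketch a clade fails as soon as one member is of non-Sardinian origin, which mirrors how the upstream nodes shared with external mitogenomes are separated from the Sardinian-specific branches.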

Table 3.2. List of the new nodes upstream to the Sardinian-specific haplogroups. The accession numbers of mitogenomes employed to define these nodes are provided.

| Haplogroup/node | GenBank ID/1000GP | Reference |
| --- | --- | --- |
| L0a1a4 | EU092763 | Behar et al., 2008 |
| W1 with 143 | GU123002 | Malyarchuk et al., 2010b |
| W1 with 143 | KJ445935 | Zheng et al., 2014, direct submission |
| W with 195@ | KF056262 | Kang et al., 2013 |
| J1b10 | JX153311 | Raule et al., 2014 |
| J1c3l | HG00261 | 1000 Genomes Project |
| J1c3n'o'p | 0153 | This study |
| J1c3n'o'p | 0154 | This study |
| J1c3n'o'p | HQ154118 | Greenspan, Family Tree DNA, direct submission |
| J1c3n'o'p | KR072973 | Greenspan, Family Tree DNA, direct submission |
| J1c3n'o'p | JQ702826 | Behar et al., 2012a |
| T2e8 | EU597536 | Hartmann et al., 2008, direct submission |
| U2e4a | KM378618 | Greenspan, Family Tree DNA, direct submission |
| U2e4 | NA20807 | 1000 Genomes Project |
| U3c1 | HM852803 | Schönberg et al., 2011 |
| K1a2b with 16278 | JQ704017 | Behar et al., 2012a |
| K1a2b with 16278 | GQ281051 | Greenspan, Family Tree DNA, direct submission |
| K1g | JQ702852 | Behar et al., 2012a |
| K2a9 with 152@-709@ | JQ702282 | Behar et al., 2012a |
| K2a9 with 152@-709@ | JQ701808 | Behar et al., 2012a |
| U5b1 with 16189-16192@-16311 | HQ675040 | Seoane et al., 2011 |
| U5b1 with 16189-16192@-16311 | JX677560 | Family Tree DNA |
| U5b1 with 16189-16192@-16311 | KT372902 | Greenspan, Family Tree DNA, direct submission |
| H1ce | KM102152 | Just et al., 2015 |
| H1e1d | JQ705256 | Behar et al., 2012a |
| H1e1d | 0979 | This study |
| H1e1d | JN400598 | Greenspan, Family Tree DNA, direct submission |
| H1e1d | 0971 | This study |
| H1cq | JQ704557 | Behar et al., 2012a |
| H1cq | JQ704263 | Behar et al., 2012a |
| H1 with 152-16311 | JX153798 | Raule et al., 2014 |
| H1 with 152-16311 | FJ236980 | Ennafaa et al., 2009 |
| H1 with 16320 | EF177432 | Pereira et al., 2007 |
| H109 | DQ473537 | Kornblum et al., 2008 |
| H3ba | JQ704315 | Behar et al., 2012a |
| H3ba | KM102109 | Just et al., 2015 |
| H3f | AY738984 | Achilli et al., 2004 |
| H3f | EU150187 | Family Tree DNA |
| H3f | JQ324586 | Behar et al., 2012b |
| H4a1b1 | JQ324591 | Behar et al., 2012b |
| H4a1b1 | KF162920 | Li et al., 2014 |
| H4a1b | KF162118 | Li et al., 2014 |
| V30 | JN581640 | Bertolin et al., 2011, direct submission |
| V31a | NA19725 | 1000 Genomes Project |
| V31a | KJ446248 | Zheng et al., 2014, direct submission |
| V31 | KJ446250 | Zheng et al., 2014, direct submission |
| V31 | AF381990 | Maca-Meyer et al., 2001 |
| V31 | KJ446247 | Zheng et al., 2014, direct submission |
| HV0h | EF660924 | Gasparre et al., 2007 |

Following these guidelines, we defined 89 SSHs, which encompass 1,738 of the 2,217 modern Sardinian mitogenomes (78.4%) (Table 3.3). These 89 SSHs, 80 of which are defined here for the first time, include descendants of all major macro-haplogroups of the human mtDNA tree (L, M, N and R) and are defined by 104 mutations (Table 3.3). About 51% of modern Sardinian-specific mtDNAs fall into HV, 27% into JT, 17% into U and 5% into other lineages. These frequencies are close to those reported in typical western European populations. However, when assessed at a higher level of haplogroup resolution, they differ substantially from those in continental Europe. This is most marked for H1 and H3, with peak values on the island of 18.5% and 18.4%, respectively (Table 3.4). The phylogenetic analyses of the 21 Sardinian mitogenomes deriving from ancient samples showed that they harbored 21 distinct haplotypes in 19 sub-haplogroups belonging to macro-haplogroups R0, JT and U (Table 3.1 and Figure 3.1). These haplotypes were compared with those from modern Sardinians and with 417 ancient mitogenomes available in the literature (data not shown).

Table 3.3. Age estimates and diagnostic mutational motifs of the 89 Sardinian-specific haplogroups. The same information is also provided for the upstream nodes from which the Sardinian-specific haplogroups depart. Sardinian-specific haplogroups are underlined.
Columns: Haplogroup; N(a); BEAST age estimates (Ky) according to Soares et al., 2009(b) and Posth et al., 2016(c); ML age estimates (Ky) according to Soares et al., 2009(b) and Posth et al., 2016(c); Mutational motifs.

f RSRS 176.69 ± 11.34 >L0a1a4 4 7.98 ± 4.03 6.68 ± 1.68 8.43 ± 2.51 7.22 ± 2.10 13491 >>L0a1a4a 3 1.62 ± 1.46 1.38 ± 0.63 1.99 ± 1.14 1.70 ± 0.94 14455 f >L3 67.26 ± 4.43 f >>M 49.59 ± 1.82 >>>M1a1b1 43 9.18 ± 2.71 7.66 ± 1.12 12.72 ± 1.82 10.92 ± 1.42 14769 >>>>M1a1b1a 28 4.49 ± 1.49 3.80 ± 0.63 10.15 ± 1.85 8.69 ± 1.53 5319 f >>N 58.86 ± 2.35 >>>Ig 20.10 ± 1.75 >>>>I1a1ag 3.80 ± 0.55 >>>>>I1a1a4 3 0.81 ± 1.03 0.69 ± 0.45 1.42 ± 0.84 1.23 ± 0.70 6680 >>>>I5a2 30 13.64 ± 3.42 11.22 ± 1.38 12.29 ± 1.92 10.53 ± 1.52 3615 8742 >>>>>I5a2b 19 1.75 ± 1.98 1.50 ± 0.86 3.65 ± 1.51 3.14 ± 1.26 11447 12810 >>>>N1b1al 59 13.27 ± 1.75 10.94 ± 0.71 16.89 ± 2.26 14.38 ± 1.89 9335 11362 >>>>>N1b1a9 4 9.39 ± 3.70 7.83 ± 1.53 8.46 ± 2.39 7.25 ± 2.01 14323 g >>>>W 16.80 ± 2.65 >>>>>W1g 10.40 ± 1.45 >>>>>>W1j 4 0.67 ± 0.85 0.58 ± 0.37 1.89 ± 1.09 1.64 ± 0.93 14821 >>>>>W10 5 4.56 ± 2.37 3.86 ± 1.01 4.96 ±.2.10 4.29 ± 1.86 4991 9088 >>>>X2f 19.23 ± 2.64 >>>>>X2bf 9.68 ± 2.47 >>>>>>X2b3 10 2.50 ± 1.64 2.13 ± 0.71 4.27 ± 1.71 3.66 ± 1.43 8269 f >>>>>X2c 9.51 ± 4.23 >>>>>>X2c2 8.87 ± 1.66 8.00 ± 1.28 >>>>>>>X2c2a 15 3.25 ± 2.09 2.76 ± 0.90 5.66 ± 1.76 5.07 ± 1.59 13807 15314 f >>>R 56.52 ± 2.08 h >>>JT 58.00 ± 7.50 h >>>>J 43.30 ± 6.10 h >>>>>J1 33.30 ± 6.20 >>>>>>J1b10 5 28.03 ± 10.00 22.23 ± 3.78 16.99 ± 3.32 14.67 ± 2.80


4080 5471 >>>>>>>J1b10a 1.65 ± 1.47 1.41 ± 0.64 1.57 ± 1.02 1.34 ± 0.89 4 13329 15193 >>>>>>J1ch 16.40 ± 2.00 >>>>>>>J1c1b1a 22 10.50 ± 2.66 8.72 ± 1.09 8.71 ± 1.68 7.42 ± 1.33 10463 >>>>>>>>J1c1b1a2 9 1.07 ± 0.94 0.91 ± 0.41 2.41 ± 1.05 2.05 ± 0.92 12192 >>>>>>>J1c3h 11.10 ± 2.50 >>>>>>>>J1c3h 42 5.50 ± 3.13 4.64 ± 1.32 6.53 ± 1.78 5.71 ± 1.51 8659 >>>>>>>>J1c3l 24 11.55 ± 3.68 9.57 ± 1.50 8.01 ± 1.97 6.85 ± 1.54 >>>>>>>>>J1c3l1 23 8.76 ± 3.96 7.32 ± 1.64 5.75 ± 1.76 4.92 ± 1.41 8790

>>>>>>>>J1c3n'o'p' 33 5.67 ± 2.37 4.78 ± 1.00 6.27 ± 1.50 5.37 ± 1.26 1040 4853 >>>>>>>>>J1c3n 0.41 ± 0.42 0.35 ± 0.18 1.98 ± 0.72 1.71 ± 0.60 21 8270 >>>>>>>>>J1c3o 3 2.23 ± 2.43 1.90 ± 1.05 1.56 ± 0.97 1.27 ± 0.80 13995 >>>>>>>>>J1c3p 6 0.49 ± 6.77 0.42 ± 2.96 1.33 ± 0.64 1.14 ± 0.50 8712 h >>>>>J2 36.80 ± 5.50 h >>>>>>J2a1a1 10.90 ± 2.40 >>>>>>>J2a1a1d 62 4.07 ± 1.71 3.45 ± 0.73 6.58 ± 1.53 5.90 ± 1.26 13753 >>>>>>>J2a1a1f 15 0.90 ± 1.15 0.77 ± 0.50 2.78 ± 1.32 2.47 ± 1.15 3375 h >>>>>>J2b1a 11.30 ± 1.80 >>> >>>>J2b1a2 31 10.32 ± 2.77 8.58 ± 1.14 12.41 ± 2.61 10.73 ± 2.27 6216 3010 4131 >>>>>>>>J2b1a2b 11 1.56 ± 1.37 1.34 ± 0.59 2.92 ± 1.39 2.46 ± 1.23 10410 11089 15662 >>>>>>>>J2b1a2c 7 1.81 ± 1.68 1.55 ± 0.73 3.21 ± 1.61 2.88 ± 1.47 15394 >>>>>>>J2b1a5 85 3.81 ± 1.17 3.24 ± 0.50 7.21 ± 1.74 6.29 ± 1.50 6491

>>>>>>>J2b1a7 46 0.74 ± 0.42 0.63 ± 0.18 4.12 ± 1.58 3.45 ± 1.34 10286 >>>>>>>J2b1a8 10 5.59 ± 3.13 4.71 ± 1.32 5.22 ± 1.94 4.53 ± 1.68 15319 h >>>>T 28.80 ± 6.40 >>>>>T1a 17.20 ± 2.90 >>>>>>T1a14 28 1.73 ± 2.31 1.48 ± 1.00 3.73 ± 1.61 3.08 ± 1.37 10580 10619 h >>>>>T2 21.30 ± 2.30 >>>>>>T2b3a 63 5.42 ± 3.66 4.57 ± 1.55 10.95 ± 2.01 9.44 ± 1.60 5656 >>>>>>>T2b3a1 46 5.08 ± 1.72 4.29 ± 0.73 6.30 ± 1.26 5.65 ± 1.06 >>>>>>>>T2b3a1a 3 2.81 ± 2.33 2.39 ± 1.00 1.42 ± 0.92 1.21 ± 0.74 10398

>>>>>>>>T2b3a1b 12 4.32 ± 2.35 3.66 ± 1.00 4.52 ± 1.37 4.02 ± 1.21 >>>>>>>>T2b3a1c 13 2.48 ± 1.59 2.11 ± 0.68 2.53 ± 0.93 2.30 ± 0.84 4227 >>>>>>>T2b3a2 4.66 ± 2.57 3.94 ± 1.09 3.34 ± 1.48 2.95 ± 1.31 8602 7 >>>>>>T2c1d2 54 2.98 ± 2.44 2.53 ± 1.05 4.68 ± 1.38 4.03 ± 1.21 15784

>>>>>>>T2c1d2a 53 1.07 ± 0.73 0.91 ± 0.32 3.26 ± 1.03 2.77 ± 0.89 7679 >>>>>>T2e8 8 4.07 ± 2.24 3.45 ± 0.96 3.65 ± 1.74 3.26 ± 1.52 >>>>>>>T2e8a 7 0.49 ± 0.63 0.42 ± 0.27 1.37 ± 0.66 1.22 ± 0.61 6965G >>>>>>node16296@ 8 8.15 ± 3.29 6.82 ± 1.37 6.12 ± 1.84 5.44 ± 1.63 >>>>>>>T2i3 4 0.82 ± 1.04 0.70 ± 0.46 1.14 ± 0.62 1.03 ± 0.55 6345 9309 >>>Uf 46.53 ± 3.29 >>>>U1a1c1c 21 15.57 ± 5.14 12.75 ± 2.05 11.96 ± 2.72 10.31 ± 2.34 7403


663 11116 >>>>>U1a1c1c1 3.15 ± 1.37 2.68 ± 0.59 5.59 ± 1.69 4.81 ± 1.45 19 13656 >>>>U2'3'4'7'8'9f 45.00 ± 3.19 >>>>>U2e4a 19 4.15 ± 3.64 3.52 ± 1.55 3.89 ± 1.41 3.24 ± 1.21 1154T 15732 >>>>>>U2e4a1 18 0.49 ± 1.82 0.42 ± 0.80 2.62 ± 1.01 2.16 ± 0.87 13758 >>>>>>U3c1 6 20.66 ± 6.66 16.69 ± 2.60 13.97 ± 3.34 12.07 ± 2.86 7382 3873 4612 5261 8110 >>>>>>>U3c1a 2.65 ± 3.50 2.25 ± 1.51 2.33 ± 1.35 2.04 ± 1.19 8634 14992 5 15613@ i >>>>>K 36.20 ± 10.00 i >>>>>>K1 29.30 ± 7.70 >>>>>>>K1ai 20.10 ± 3.00 >>>>>>>>K1a2l 49 18.74 ± 3.30 15.21 ± 1.30 16.95 ± 2.42 14.54 ± 1.95 11025 >>>>>>>>>K1a2b 12 15.39 ± 3.67 12.60 ± 1.47 12.76 ± 2.38 10.87 ± 1.97 8005 >>>>>>>>>>node16278 9 10.07 ± 5.60 8.38 ± 2.31 7.34 ± 2.17 6.22 ± 1.82 16278 >>>>>>>>>>>K1a2b1 7 4.66 ± 4.04 3.94 ± 1.72 4.78 ± 1.67 4.07 ± 1.41 15787 >>>>>>>>>K1a2d 9 16.02 ± 3.89 13.10 ± 1.55 12.79 ± 2.71 10.99 ± 2.35 5301

>>>>>>>>K1a32 32 5.25 ± 17.27 4.44 ± 7.34 6.83 ± 2.42 5.89 ± 2.02 14947 195 3337 11914 >>>>>>>K1b1c 8.34 ± 2.52 6.97 ± 1.05 12.46 ± 2.79 10.67 ± 2.38 18 94

>>>>>>>>K1b1c1 4 5.00 ± 4.64 4.22 ± 1.97 2.90 ± 1.52 2.49 ± 1.30 16240T 4655 7235 7468 10084 >>>>>>>K1g 13.42 ± 5.50 11.06 ± 2.22 12.48 ± 3.13 10.74 ± 2.57 11929 16291 47 16362 709 6285 >>>>>>>>K1g1 0.08 ± 0.00 0.07 ± 0.00 3.32 ± 1.25 2.84 ± 1.06 6973 13145 46 16086 >>>>>>node152@- 16.65 ± 6.22 13.59 ± 2.47 13.96 ± 3.07 12.10 ± 2.67 709@ 152@ 709@ 17 6915 11914 >>>>>>>K2a9a 6.02 ± 2.59 5.07 ± 1.09 6.32 ± 1.88 5.44 ± 1.65 15 16399 f >>>>U5b 22.79 ± 3.59 >>>>>U5b1l 328 23.91 ± 4.69 19.15 ± 1.80 25.24 ± 3.64 21.67 ± 3.00 5656 3105 3498 6674 15777 >>>>>>U5b1i 12.98 ± 5.38 10.70 ± 2.18 13.56 ± 2.82 11.60 ± 2.44 16167 16311 16356 44 >>>>>>>U5b1i1 41 12.98 ± 11.89 10.70 ± 4.82 11.32 ± 2.69 9.69 ± 2.37 3105@ 16189 >>>>>>node16311 10 14.40 ± 5.32 11.83 ± 2.14 14.33 ± 3.00 12.39 ± 2.52 16192@ 16311 >>>>>>>U5b1j 6 3.40 ± 3.32 2.89 ± 1.43 3.54 ± 1.81 3.11 ± 1.56 5153 10876 >>>>>U5b2b5 14 13.60 ± 4.57 11.20 ± 1.84 12.59 ± 2.99 10.69 ± 2.54 13161 16296 >>>>>>U5b2b5a 11 0.90 ± 1.06 0.77 ± 0.46 3.16 ± 1.63 2.70 ± 1.41 16325 >>>>>U5b3a 103 16.56 ± 5.16 13.52 ± 2.05 16.29 ± 2.89 13.95 ± 2.43 16235 >>>>>>U5b3a1 89 12.71 ± 4.65 10.49 ± 1.89 13.21 ± 2.54 11.32 ± 2.18 16169A


11177 >>>>>>>U5b3a1a 2.65 ± 0.78 2.25 ± 0.34 8.73 ± 1.93 7.48 ± 1.59 16304@ 87 228@ 373 >>>>>>U5b3a2 11.21 ± 4.00 9.29 ± 1.63 11.22 ± 2.78 9.49 ± 2.37 10978 14 >>>>>>>U5b3a2a 4 1.48 ± 1.74 1.27 ± 0.75 1.40 ± 0.88 1.14 ± 0.73 8395 143 4936 9128 >>>>U6a8b 12.45 ± 5.98 10.28 ± 2.43 12.75 ± 3.42 10.95 ± 3.02 12 9335A 16295 >>>>>U6a8b2 0.99 ± 14.78 0.84 ± 6.46 1.50 ± 0.99 1.30 ± 0.85 4655 5 f >>>HV 21.91 ± 2.83 f >>>>H 12.85 ± 0.67 f >>>>>H1 9.89 ± 0.88 m >>>>>H1 415 9.36 ± 1.57 7.80 ± 0.65 14.09 ± 1.57 12.38 ± 1.24 >>>>>>H1cb 53 1.49 ± 0.85 1.27 ± 0.37 6.28 ± 2.07 5.26 ± 1.67 14329 >>>>>>H1ce 7 6.78 ± 2.87 5.70 ± 1.21 6.17 ± 2.32 5.47 ± 2.09 5075 >>>>>>>H1ce1 6 4.97 ± 3.28 4.20 ± 1.39 3.64 ± 1.57 3.13 ± 1.47 15618 >>>>>>H1cl 22 2.20 ± 0.75 1.87 ± 0.32 4.69 ± 1.84 3.88 ± 1.63 15217 >>>>>>H1cm 24 4.88 ± 2.95 4.12 ± 1.25 5.92 ± 2.17 5.21 ± 1.99 7543 >>>>>>H1cn 60 7.33 ± 2.55 6.15 ± 1.07 9.03 ± 2.06 7.84 ± 1.81 8603 >>>>>>H1ef 8.50 ± 2.03 11914 13938 >>>>>>>H1e1d 3.26 ± 1.95 2.77 ± 0.83 4.69 ± 1.61 3.91 ± 1.47 11 15930 >>>>>>>>H1e1d1 0.96 ± 0.85 0.82 ± 0.37 1.46 ± 0.60 1.17 ± 0.51 14180 7 >>>>>>>H1e9 8 0.44 ± 10.38 0.37 ± 4.54 2.40 ± 1.44 2.01 ± 1.26 15071 >>>>>>H1co 23 3.26 ± 1.73 2.77 ± 0.74 5.70 ± 1.74 4.72 ± 1.53 5471 7570 >>>>>>H1cp 8 4.97 ± 2.30 4.20 ± 0.97 5.21 ± 1.81 4.60 ± 1.62 6164 15346 >>>>>>H1cq 8 5.78 ± 3.51 4.87 ± 1.48 5.53 ± 2.06 4.98 ± 1.95 16239 >>>>>>>H1cq1 6 1.23 ± 1.81 1.05 ± 0.79 1.77 ± 0.89 1.51 ± 0.74 1888C 11914

>>>>>>node16311 10 6.05 ± 3.96 5.10 ± 1.67 5.87 ± 1.99 5.45 ± 1.88 >>>>>>>H1cr 8 0.52 ± 6.68 0.45 ± 2.92 1.75 ± 0.83 1.52 ± 0.72 1888 11278C >>>>>>node16319 16 7.33 ± 2.33 6.15 ± 0.97 6.66 ± 2.12 5.78 ± 1.84 16319 >>>>>>>H1cs 11 4.52 ± 2.29 3.82 ± 0.97 4.15 ± 1.53 3.38 ± 1.27 16320 9938 10042T >>>>>>>H1ct 13 1.58 ± 1.39 1.35 ± 0.60 2.85 ± 1.20 2.51 ± 1.07 12059C 16129 146

>>>>>>>H1cu 9 0.79 ± 0.96 0.67 ± 0.42 1.86 ± 0.93 1.64 ± 0.85 8282 >>>>>>>H1cv 16 3.71 ± 1.41 3.15 ± 0.60 6.67 ± 2.20 6.06 ± 2.00 1290 f >>>>>>H1b 6.24 ± 1.81 >>>>>>>H1b6 23 2.91 ± 1.19 2.47 ± 0.51 5.46 ± 1.93 4.80 ± 1.73 16042 >>>>>>>H1b7 6 1.84 ± 2.14 1.57 ± 0.93 1.95 ± 1.12 1.74 ± 0.96 6366C >>>>>>>H1b8 6 1.23 ± 1.07 1.05 ± 0.46 1.67 ± 0.91 1.48 ± 0.74 14530 >>>>>H109 8 18.46 ± 9.29 15.00 ± 3.66 10.75 ± 1.85 9.35 ± 1.54 16269 6465 9698 >>>>>>H109a 2.11 ± 11.99 1.80 ± 5.19 2.33 ± 1.13 1.97 ± 0.93 7 150 200 374


>>>>>H13a1a1 5.89 ± 1.89 >>>>>>H13a1a1f 8 6.87 ± 3.76 5.77 ± 1.58 2.60 ± 0.98 2.25 ± 0.84 1352A 15900 >>>>>H3f 8.92 ± 1.06 >>>>>H3m 409 13.67 ± 3.22 11.25 ± 1.30 13.43 ± 1.46 11.85 ± 1.21

>>>>>>H3aw 35 2.91 ± 1.51 2.47 ± 0.65 6.25 ± 2.04 5.39 ± 1.78 15289 >>>>>>H3ay 25 5.42 ± 2.96 4.57 ± 1.25 6.07 ± 2.05 5.22 ± 1.80 11200 >>>>>>H3az 5.33 ± 8.54 4.50 ± 3.62 6.29 ± 2.00 5.49 ± 1.85 16067 16291 27 >>>>>>H3ba 14 11.78 ± 3.30 9.75 ± 1.34 9.94 ± 1.95 8.81 ± 1.71 13928C 12033 14470 >>>>>>>H3ba1 12 6.42 ± 2.87 5.40 ± 1.21 5.54 ± 1.59 4.77 ± 1.56 199 >>>>>>H3bb 12 6.60 ± 4.19 5.55 ± 1.76 5.84 ± 2.21 5.17 ± 1.98 8580A >>>>>>H3f 208 11.97 ± 3.19 9.90 ± 1.30 12.22 ± 1.34 10.65 ± 1.10 93 >>>>>>>H3f1 157 1.67 ± 0.64 1.42 ± 0.28 8.05 ± 1.71 6.97 ± 1.61 15315 1019 8290 >>>>>>>H3f2 17 3.53 ± 1.30 3.00 ± 0.56 5.69 ± 1.53 4.82 ± 1.43 9033 9151 16294

>>>>>>>H3f3 31 7.70 ± 4.56 6.45 ± 1.90 8.26 ± 2.56 7.23 ± 2.19 14798 >>>>>>H3u 10.66 ± 4.30 8.85 ± 1.76 9.49 ± 1.86 8.67 ± 1.47 9966 25 10084 16240 >>>>>>>H3u2 2.37 ± 1.72 2.02 ± 0.74 3.19 ± 1.29 2.71 ± 1.10 8 16244 >>>>>>>H3u3 5.15 ± 4.49 4.35 ± 1.90 3.94 ± 1.40 3.39 ± 1.28 7571C 14 >>>>>>H3x 9.91 ± 5.40 8.25 ± 2.23 8.27 ± 2.26 7.30 ± 1.96 16368 15 3615 8937 >>>>>>>H3x1 2.73 ± 7.64 2.32 ± 3.29 3.11 ± 1.15 2.62 ± 1.06 14 11914 16293 >>>>>>H3bc 3.62 ± 7.04 3.07 ± 3.01 5.76 ± 1.77 4.93 ± 1.62 152 571 38 >>>>>H32 7 7.42 ± 5.21 6.22 ± 2.18 5.61 ± 2.06 4.93 ± 1.97 8557 73 152 f >>>>>H4a1 8.64 ± 1.56 >>>>>>H4a1b1 8 6.33 ± 3.75 5.32 ± 1.58 3.90 ± 1.41 3.18 ± 1.26 16286 >>>>>>>H4a1b1a 6 0.52 ± 0.64 0.45 ± 0.28 1.29 ± 0.57 1.11 ± 0.49 10620 16290 >>>>>>H4a1d 9 4.16 ± 2.50 3.52 ± 1.07 4.50 ± 1.84 3.64 ± 1.62 7356 7521

f >>>>HV0 13.53 ± 3.23 >>>>>HV0af 11.20 ± 2.08 12662 >>>>>>HV0a2 35 3.89 ± 4.99 3.30 ± 2.13 5.70 ± 1.89 4.71 ± 1.69 16298@ >>>>>>HV0a3 25 0.17 ± 0.21 0.15 ± 0.09 3.34 ± 1.48 2.66 ± 1.13 6635 14470

>>>>>>Vf 9.74 ± 1.36 >>>>>>>V29 40 6.96 ± 3.87 5.85 ± 1.62 7.71 ± 1.80 6.83 ± 1.75 7852 15905

>>>>>>>V30 7 10.38 ± 4.06 8.62 ± 1.67 7.56 ± 1.87 6.64 ± 1.69 11674 16320 >>>>>>>>V30a 6 7.42 ± 4.66 6.22 ± 1.95 5.04 ± 1.65 4.38 ± 1.53 437 >>>>>>>V31 9.08 ± 3.02 7.57 ± 1.25 7.91 ± 1.92 7.15 ± 1.68 14629 14 >>>>>>>>V31a 11 5.78 ± 2.75 4.87 ± 1.16 4.09 ± 1.43 3.49 ± 1.39 513 >>>>>>>>>V31a1 9 3.80 ± 2.71 3.22 ± 1.16 2.41 ± 0.91 2.05 ± 0.85 4997


3204 4008 >>>>>>>V32 3.00 ± 2.48 2.55 ± 1.07 3.05 ± 1.40 2.63 ± 1.16 9 12026 >>>>>>HV0h 4 5.78 ± 4.17 4.87 ± 1.76 3.90 ± 2.08 3.33 ± 1.87 16153 >>>>>>>HV0h1 3 2.29 ± 1.93 1.95 ± 0.83 1.52 ± 0.95 1.27 ± 0.77 8572 4560 16189 >>>>>>HV0i 5.69 ± 2.63 4.80 ± 1.11 3.79 ± 1.73 3.16 ± 1.49 4 152

a Number of mtDNA sequences.
b Using the corrected molecular clock proposed by Soares et al. (2009).
c Using the mutation rate proposed by Posth et al. (2016).
d Maximum Likelihood molecular divergence.
e Coding- and control-region mutations are listed only for Sardinian-specific and their closest external branches.
f ML coalescence ages are those reported in Table S5 of Behar et al. (2012a).
g ML coalescence ages are those reported in Table 1 of Olivieri et al. (2013).
h ML coalescence ages are those reported in Table 1 of Pala et al. (2012).
i ML coalescence ages are those reported in Supplementary Data 2 of Costa et al. (2013).
l For these haplogroups, all mitogenomes available from the literature were used for age calculation.
m Age estimates based on our dataset of 2,092 mitogenomes together with 125 publicly available Sardinian mitogenomes are also reported for haplogroups H1 and H3.

Table 3.4. Frequencies of major mtDNA haplogroups in our Sardinian dataset. Frequencies for both the entire dataset of unrelated Sardinian mitogenomes (N=2092) and for those belonging to the Sardinian-specific haplogroups (N=1613) are reported. Haplogroup/Sub- Samples within Sardinian- All Sardinian samples (%) haplogroup specific haplogroups (%)a R0 1074 (51.34%) 820 (50.84%) HV 1073 (51.29%) 820 (50.84%) >HV0 151 (7.22%) 128 (7.94%) >>V 73 (3.49%) 62 (3.84%) >H 914 (43.69%) 692 (42.90%) >>H1 386 (18.45%) 289 (17.92%) >>H3 384 (18.36%) 373 (23.12%) >>H5 24 (1.15%) 0 (0.00%) JT 562 (26.86%) 437 (27.09%) J 365 (17.45%) 322 (19.96%) >J1 134 (6.41%) 100 (6.20%) >J2 231 (11.04%) 222 (13.76%) T 197 (9.42%) 115 (7.13%) >T1 31 (1.48%) 26 (1.61%) >T2 166 (7.93%) 89 (5.52%) U 340 (16.25%) 275 (17.05%) U5 144 (6.88%) 124 (7.69%) U1 30 (1.43%) 16 (0.99%)


3. The first peopling of Sardinia

Haplogroup/       All Sardinian     Samples within Sardinian-
Sub-haplogroup    samples (%)       specific haplogroups (%)a
K                 119 (5.69%)       107 (6.63%)
>K1               104 (4.97%)       95 (5.89%)
>K2               15 (0.72%)        12 (0.74%)
Others            116 (5.55%)       81 (5.02%)
L0a               4 (0.19%)         3 (0.19%)
M1                27 (1.29%)        24 (1.49%)
N1b               8 (0.38%)         4 (0.25%)
I                 28 (1.34%)        21 (1.30%)
W                 14 (0.67%)        8 (0.49%)
X2                33 (1.58%)        21 (1.30%)
Total             2092              1613
a The 125 Sardinian mitogenomes from published studies were not included here because they were not randomly selected.

[Figure 3.1: tree image not reproduced. Legend of sample origins: Sardinia (unspecified), Sassari, Oristano, Cagliari, Nuoro.]

Figure 3.1. Maximum-Parsimony tree of 22 ancient Sardinian mitogenomes, 21 analyzed in this study and one available from literature. Mutations are shown on the branches (relative to rCRS); they are transitions unless the base change is explicitly indicated. Suffixes indicate: transversions (to A, G, C, or T) and reversions (@). Recurrent mutations are underlined. All the samples are colored according to their geographic origin as shown in the legend. The nodes defined here for the first time are in blue. ID numbers of mitogenomes correspond to those in Table 3.1 and are colored according to their geographic origin as shown in the legend.


3.3.2. Age estimates

For all 89 SSHs and their closest upstream nodes (including a maximum of 15 CEMs), we assessed the minimum coalescence ages with both Maximum Likelihood (ML) and BEAST (Bayesian Evolutionary Analysis Sampling Trees) computations, employing two different mutation rates: one established on modern mitogenomes, which corrects for the effect of selection and is routinely applied in phylogeographic studies (Soares et al., 2009); the other using radiocarbon-dated ancient mitogenomes as tip calibration points (Posth et al., 2016) (Table 3.3 and Figure 3.2). For each SSH, its own coalescence age and that of its closest upstream node provide lower and upper bounds, respectively, for the arrival time of its founder mitogenome on the island. In agreement with historical demographic evidence (Francalacci et al., 2013), all age estimates indicate that more than 50% of SSHs coalesce in the post-Nuragic (<2 Kya) and Nuragic (∼2-4 Kya) archaeological periods. However, not all of the remaining SSHs fall in the Neolithic-Copper Age period (∼4-7.8 Kya) (Figure 3.2). Three SSHs (corresponding to 3.1% of modern Sardinian-specific mitogenomes) showed (with all approaches) a coalescence age >7.8 Ky, the postulated archaeologically based starting time of the Neolithic on the island (Berger and Guilaine, 2009). This indicates that their founding haplotypes were most likely already present on the island prior to the Neolithic (Figures 3.2 and 3.3). These haplogroups are K1a2d, N1b1a9 and U5b1i1 (Table 3.5).
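The assignment of coalescence ages to archaeological periods can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name `archaeological_period` is hypothetical, the boundaries (2, 4 and 7.8 Ky) are those quoted above, and the handling of an age that falls exactly on a boundary is an assumption. The example ages are rounded point estimates quoted in this chapter for H3u2, X2c2a, K1a32 and K1a2d.

```python
def archaeological_period(age_ky: float) -> str:
    """Assign a coalescence age (in Ky) to one of the periods quoted in the text.

    Boundary handling (e.g. exactly 4.0 Ky counted as Nuragic) is an assumption.
    """
    if age_ky < 0:
        raise ValueError("age must be non-negative")
    if age_ky < 2.0:
        return "post-Nuragic"
    if age_ky <= 4.0:
        return "Nuragic"
    if age_ky <= 7.8:
        return "Neolithic-Copper Age"
    return "pre-Neolithic"


# Illustrative point estimates (Ky) quoted in this chapter.
ages_ky = {"H3u2": 3.2, "X2c2a": 5.7, "K1a32": 6.8, "K1a2d": 16.02}
periods = {name: archaeological_period(age) for name, age in ages_ky.items()}

for name, period in periods.items():
    print(f"{name}: {period}")
```

Because each SSH has four age estimates (two methods, two rates), a classification of this kind would have to be repeated per estimate; an SSH counts as pre-Neolithic here only if the estimates considered exceed 7.8 Ky.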

Table 3.5. Maximum Likelihood (ML) and Bayesian age estimates for the three Sardinian-specific haplogroups (SSHs) whose age estimates are >7.8 Ky. BEAST age estimates ML age estimates (Ky) (Ky) Ancestral SSH Na geographic Soares et al. Posth et al. Soares et al. Posth et al. source 2009 2016 2009 2016

SSH       Na    T       ΔT      T       ΔT     T       ΔT     T       ΔT     Ancestral geographic source
U5b1i1    41    12.98   11.89   10.70   4.82   11.32   2.69   9.69    2.37   Western Europe
K1a2d     9     16.02   3.89    13.10   1.55   12.79   2.71   10.99   2.35   Near East
N1b1a9    4     9.39    3.70    7.83    1.53   8.46    2.39   7.25    2.01   Near East
a Number of mitogenomes included in the corresponding SSH.


[Figure 3.2: tree images not reproduced. Panel A: method ML, rate Soares et al. 2009. Panel B: method BEAST, rate Soares et al. 2009. Panel C: method ML, rate Posth et al. 2016. Panel D: method BEAST, rate Posth et al. 2016. Each panel lists the 89 SSHs as tips, with H1 and H3 marked, on a non-linear time axis from 180 to 0 Ky.]

Figure 3.2. Schematic phylogeny of the 89 Sardinian-specific haplogroups. Blue dots indicate coalescence time estimates of the Sardinian-specific haplogroups, while the black dots indicate the ages of their closest upstream nodes, from which also non-Sardinian mitogenomes (Closest External Mitogenomes - CEMs) depart. Blue and black dots provide, for each SSH, lower and upper bound time estimates, respectively, for the arrival time of the founder mitogenome on the island. Age estimates in the four trees (panels A, B, C, and D) are those obtained with ML and BEAST and by employing two mutation rates (Soares et al., 2009; Posth et al., 2016). The pre-Neolithic SSHs N1b1a, K1a2, and U5b1i are in red.

[Figure 3.3: tree images not reproduced. Left tree: molecular clock by Soares et al. 2009. Right tree: molecular clock by Posth et al. 2016. Legend: ML estimates; BEAST estimates; SE max (ML; BEAST); pre-Neolithic. Time axes in Ky.]

Figure 3.3. Mirror-image schematic trees encompassing the three Sardinian-specific haplogroups (N1b1a9, U5b1i1, K1a2d) whose age estimates predate the Neolithic (>7.8 Kya) and the Sardinian haplogroups H1 and H3. Age estimates were calculated by employing two mutation rates, by Soares et al., 2009 (tree on the left) and by Posth et al., 2016 (tree on the right). Triangles and continuous lines indicate ML estimates. Circles and dashed lines indicate BEAST estimates. Ages are according to the (non-linear) time scale at the bottom. Colored shadings show the largest confidence intervals of the age estimates.

3.4. Discussion

The analysis of the phylogenetic relationships between modern Sardinian and all publicly available worldwide mitogenomes revealed that 78.4% of Sardinians harbour ethnic-specific haplogroups. This finding might appear surprising in the European context. However, a similar, though less extreme, situation has been described in the Basque-speaking regions of Spain, where a survey of haplogroup H mitogenomes (54.1% of the populations) identified six autochthonous sub-haplogroups encompassing 29.0% of all mtDNAs (Behar et al., 2012a).

Most likely each SSH and its distinguishing mutational motif arose in situ. However, the possibility that some of these motifs arose outside the island and, after their arrival in Sardinia, were lost in the ancestral sources should not be overlooked. In either case, the coalescence age of each SSH provides a minimum time estimate for the presence of its founder mutational motif on the island, unless two or more haplotypes within the same SSH arrived on the island and were then all lost in the ancestral source. This scenario is unlikely, especially given the great diversity of some of the deep-rooting founding lineages, for example within the predominant haplogroups H1 and H3, where lineages are very sharply partitioned between the island and the mainland.

As expected, the ages of the 89 SSHs are differentially distributed across archaeological times, ranging from 16 to less than one Kya (Table 3.3). However, three SSHs (K1a2d, N1b1a9 and U5b1i1) showed a coalescence age predating the postulated archaeologically based starting time of the Neolithic in Sardinia (>7.8 Ky), indicating that their founding haplotypes were already on the island during the Mesolithic (Table 3.5).

In modern populations, the CEMs of the Sardinian-specific haplogroup U5b1i1 are those departing from the newly identified U5b1i node and are all from Western Europe (1 from Germany, 2 from the UK) (Figure 3.4), the same geographic origin as the U5b1 mitogenomes (upstream of U5b1i) from ancient samples (data not shown). Therefore, the pre-Neolithic age of U5b1i1 supports a Mesolithic human presence on the island and reveals that Mesolithic Sardinians transmitted their mitogenomes to modern Sardinians.
Such a matrilineal genetic component, with deep ancestral roots in Western Europe, overlaps the patrilineal source of the very frequent (38.9%) Y-chromosome haplogroup I2a1a-M26 both in terms of geography and timing (Francalacci et al., 2013).

As for the other two pre-Neolithic SSHs, K1a2d and N1b1a9, they are both only one mutation away from previously defined nodes (K1a2 and N1b1a). Several other sub-branches depart from these nodes (Figure 3.4), and the many members of these branches are CEMs of the Sardinian-specific branches. Most of those derived from K1a2 (K1a2a-d) encompass only European mitogenomes, but several descending directly from the root of K1a2 have also been identified in the Near East, and include ancient samples. The oldest are two K1a2 mitogenomes from Anatolian farmers radiocarbon-dated to 8.3 and 8.0 Kya (Figure 3.4). These observations indicate the Near East as the most likely ancestral source of K1a2. If so, K1a2 (female) carriers of Near Eastern ancestry arrived in Sardinia in the time frame between 18.5-14.5 Kya (ages of K1a2) and 16-11 Kya (ages of K1a2d) (Figure 3.4).

A similar scenario applies to N1b1a9. Numerous other branches depart from its ancestor N1b1a and most are shared between Europeans and Near Easterners, again indicating the Near East as the likely homeland of N1b1a. This source is further supported by the geographical origin (Anatolia) of the only ancient N1b1a mitogenome (8.3 Ky) recovered so far (Figure 3.4). Thus, the (female) N1b1a carriers of Near Eastern ancestry arrived in Sardinia in the time frame between 17-11 Kya (age of N1b1a) and 9-7 Kya (age of N1b1a9) (Figure 3.4).
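The bounding argument used for K1a2d and N1b1a9 can be written out as a short Python sketch. It assumes that the widest time frame compatible with the quoted ranges runs from the youngest age estimate of the SSH (lower bound) to the oldest age estimate of its upstream node (upper bound); the function name `arrival_window` is hypothetical, and the input ranges are the rounded ones given in the text.

```python
from typing import Sequence, Tuple


def arrival_window(
    upstream_ages_ky: Sequence[float], ssh_ages_ky: Sequence[float]
) -> Tuple[float, float]:
    """Return (youngest, oldest) bounds in Kya for the arrival of a founder lineage.

    The oldest bound is the oldest estimate of the upstream node; the youngest
    bound is the youngest estimate of the Sardinian-specific haplogroup.
    """
    oldest = max(upstream_ages_ky)
    youngest = min(ssh_ages_ky)
    if youngest > oldest:
        raise ValueError("a haplogroup cannot be older than its upstream node")
    return youngest, oldest


# Rounded age ranges (Kya) quoted in the text.
windows = {
    "K1a2d": arrival_window(upstream_ages_ky=(18.5, 14.5), ssh_ages_ky=(16.0, 11.0)),
    "N1b1a9": arrival_window(upstream_ages_ky=(17.0, 11.0), ssh_ages_ky=(9.0, 7.0)),
}

for ssh, (youngest, oldest) in windows.items():
    print(f"{ssh}: arrival between {oldest} and {youngest} Kya")
```

The sketch ignores the confidence intervals of the individual estimates, so the windows it returns are narrower than a treatment that propagates those errors would give.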

Figure 3.4. Schematic representation of N1b1a (panel a), K1a2 (panel b), and U5b1i (panel c) phylogenies. Sub-clades are represented by triangles, and singletons by lines. The width of each triangle is proportional to the number of both modern and ancient mitogenomes, while its height is proportional to the age of the clade (Kya) estimated with ML and the molecular clock proposed by Soares et al. (2009). Colors indicate the geographical origin of samples according to the legend. Ancient samples are placed at positions corresponding to their calibrated radiocarbon ages. The names of Sardinian-specific haplogroups are underlined and in a purple field.

The sub-haplogroups observed in ancient Sardinians (Figure 3.1) are also present in modern Sardinians at the same (N = 15) or at a very close (N = 4) level of haplogroup resolution. None of the ancient Sardinian mitogenomes clustered within a Sardinian-specific haplogroup, but four were closely related (Table 3.6).


Table 3.6. Haplogroup affiliation of the 21 ancient Sardinians analyzed in this study compared to those observed in modern Sardinians.

Sardinian-ancient   Haplogroup   Haplotype observed     Haplogroup observed    Haplogroup level of resolution
mitogenomes                      in modern Sardinians   in modern Sardinians   observed in modern Sardinians
MA87                J2b1a        no                     yes
MA82                H1           no                     yes
MA81                K1b1b1       no                     no                     K1b1
MA85                H1e1a        no                     no                     H1e1
MA86                U5b2a        no                     no                     U5b2
MA79                U5b2b3       no                     yes
MA110               T2b3         no                     yes
MA138               H5a          no                     yes
MA112               V            no                     yes
MA115               H3           yes                    yes
MA78                H3u          no                     yes
MA88                U5b2b5       no                     yes
MA108               HV0j1        no                     yes
MA104               K1a          no                     yes
MA92                J2a1a1       no                     yes
MA100               T2c1d        no                     yes
MA89                K1b1a1       no                     no                     K1b1
MA77                H1e1         no                     yes
MA73                J2b1a        no                     yes
MA76                H1           no                     yes
MA74                J1c3         no                     yes

A Sardinian Bell Beaker mitogenome (MA108) of ~4.3 Kya turned out to be a member of a novel branch (HV0j1: 6.6 ± 1.3 Ky), which was found in both a modern Sardinian and a continental Italian, and derives from a node (HV0j: 10.0 ± 2.1 Ky) from which two other Sardinian mitogenomes diverge (Figure 3.5). The phylogenetic age of HV0j1 is thus fully compatible with the radiocarbon dating of MA108. A similar conclusion emerges in all other cases in which an informative phylogenetic link between modern and ancient Sardinian samples was established: (i) MA78 (Early Bronze Age, ~4.0 Kya) is a direct molecular ancestor of the Sardinian-specific haplogroup H3u2 (~3.2 Kya); (ii) MA104 (Early Bronze Age, ~4.5 Kya) harbors one of the diagnostic mutations of the SSH K1a32 (~6.8 Kya); and (iii) MA88 (Early Bronze Age, ~4.2 Kya) shows the mutational motif of the haplogroup U5b2b5 node (~12.6 Kya), from which the SSH U5b2b5a (~3.2 Kya) as well as other mtDNAs from Sardinia, Italy and the UK descend (Figure 3.5).

Ancient DNA links extend beyond Sardinia. A Copper Age (~5 Kya) mitogenome from Northern Spain (ATP16 in Günther et al., 2015) identifies Iberia as the likely homeland of the molecular ancestor of the SSH X2c2a, and indicates that the founder mtDNA arrived in Sardinia between 8.9 ± 1.7 Kya (age of X2c2) and 5.7 ± 1.8 Kya (age of X2c2a) (Figure 3.5).
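The compatibility checks between radiocarbon dates and phylogenetic ages described above can be made explicit with a small Python sketch. It is an illustration under stated assumptions, not the test applied in this study: both function names are hypothetical, the ± value is treated as a symmetric error on the clade age, and the ages are the rounded values quoted in the text.

```python
def compatible_as_member(
    sample_age_ky: float, clade_age_ky: float, clade_err_ky: float
) -> bool:
    """A dated sample belonging to a clade should not be older than the clade.

    The clade age is allowed to be as old as its estimate plus the quoted error.
    """
    return sample_age_ky <= clade_age_ky + clade_err_ky


def compatible_as_ancestor(sample_age_ky: float, descendant_age_ky: float) -> bool:
    """A dated direct ancestor should be at least as old as the descendant clade."""
    return sample_age_ky >= descendant_age_ky


# MA108 (~4.3 Kya) is a member of HV0j1 (6.6 +/- 1.3 Ky).
ma108_ok = compatible_as_member(4.3, clade_age_ky=6.6, clade_err_ky=1.3)

# MA78 (~4.0 Kya) is a direct molecular ancestor of H3u2 (~3.2 Ky).
ma78_ok = compatible_as_ancestor(4.0, descendant_age_ky=3.2)

print(f"MA108 within HV0j1: {ma108_ok}; MA78 ancestral to H3u2: {ma78_ok}")
```

A hypothetical sample dated to 9.0 Kya would fail the first check for HV0j1, since it would predate the clade even at the upper end of its age estimate.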

[Figure 3.5: tree image not reproduced. Legend of sample origins: Sardinia, Europe North, Italy, Spain, Unknown.]

Figure 3.5. Ancient mitogenomes harboring close phylogenetic relationships with Sardinian-specific haplogroups. Four mitogenomes are from Sardinia (MA78, MA88, MA104 and MA108) and one from Spain (ATP16). Sardinian-specific haplogroups are underlined and in a blue field. Mutations are reported with the corresponding nucleotide positions when shared between modern and ancient mitogenomes, otherwise they are only represented by black dots. Circles in gray indicate modern or ancient mitogenomes from Sardinia. When the same haplotype is shared by more than one mitogenome, the number of mitogenomes is reported (squared box). Haplogroup age estimates were calculated with BEAST by employing the mutation rate of Soares et al., 2009 or taken from literature (when indicated with an *). The radiocarbon dates of the four Sardinian mitogenomes are also indicated.

We also compared modern and ancient Sardinian mitogenomes with the mitogenome (haplogroup K1f) of the Late Neolithic/Chalcolithic Tyrolean Iceman (radiocarbon-dated to ~5.3 Kya) (Ermini et al., 2008). One Sardinian-specific haplogroup (K1g1), present in ~2.1% of Sardinians, is indeed related to Ötzi's mitogenome as well as to other mitogenomes found in modern and ancient Europeans. However, the link is extremely distant in time, at the level of a very early node (K1 + 16362), which is only one mutation away from the root of K1 and is dated ~23.5 Ky (Figure 3.6).

[Figure 3.6: tree image not reproduced. Legend of sample origins: Sardinia, Europe North, Italy, unknown.]

Figure 3.6. Schematic representation of the phylogenetic relationships between the mitogenome of the Late Neolithic/Chalcolithic Tyrolean Iceman (haplogroup K1f) and the Sardinian-specific haplogroup K1g. The only two other sub-clades of K1 that harbor the 16362 mutation (K1d and K1e) are also shown. The diagnostic mutations of K1f are reported with the corresponding nucleotide positions, while the others are represented by black dots. Circles in grey indicate modern mitogenomes from Sardinia. For modern Sardinians, when the same haplotype is shared by more than one mitogenome, the number of mitogenomes is reported (squared box). Haplogroup age estimates were calculated with BEAST by employing the mutation rate of Soares et al., 2009, while K1 age (indicated with an *) is from Behar et al., 2012a.

The three pre-Neolithic SSHs (K1a2d, N1b1a9 and U5b1i1) comprise only 3.1% of modern Sardinians, but the genetic legacy of Mesolithic Sardinians could be much higher. Indeed, a large fraction of the SSHs within H1 and H3 (Figure 3.2), the two most common haplogroups in modern Sardinians (18.5% and 18.4%, respectively; Table 3.4), are only one (sometimes fast-evolving) mutation away from the H1 and H3 founding nodes and/or have CEMs departing from the H1 and H3 nodes. Therefore, the estimated coalescence ages of H1 and H3 represent upper bounds for the arrival of H1 and H3 mitogenomes on the island. As shown in Figure 3.3, the ages of H1 and H3 leave open the possibility that both were present in Sardinia prior to the Neolithic.

Notably, the frequency of H3 in Sardinia (18.4%) is the highest reported to date, and haplogroup H3 shows a very distinctive geographical distribution. The highest frequencies are in the western Mediterranean (Sardinians, Basques and other Iberians), with a sharp decrease towards Central and Eastern Europe and only very few occurrences in the Near East (Figure 3.7), which founder analyses explain as recent incursions. Given that the population size trends for the Sardinian H3 mtDNAs indicate an expansion beginning between 9.0 and 10.5 Kya (Figure 3.8), it is tempting to link such an expansion to a pre-Neolithic arrival and diffusion of H3 on the island, most likely from a Western Mediterranean source, as previously suggested (Achilli et al., 2004; Torroni et al., 2006; Soares et al., 2010), possibly the same ancestral source as that of U5b1i1 and Y-chromosome haplogroup I2a1a-M26 (Francalacci et al., 2015).

Figure 3.7. Spatial frequency (%) distribution map of haplogroup H3. Dots indicate the geographic locations of the surveyed populations. We constructed spatial frequency distribution plots with the program Surfer 9 (Golden Software, http://www.goldensoftware.com/products/surfer).

[Figure 3.8: plot not reproduced. Y-axis: effective population size (logarithmic scale, 10 to 1,000,000). X-axis: time (Kya, 0 to 12).]

Figure 3.8. Bayesian skyline plot (BSP) showing effective population size trends of Sardinian H3 mtDNAs. The black and white lines are the median estimates obtained by employing the mutation rates proposed by Soares et al., 2009 and Posth et al., 2016, respectively; the grey shading shows the highest posterior density limits.

3.5. Conclusion

The analysis of a very large dataset of modern and ancient Sardinian mitogenomes, and its comparison with all the published worldwide modern and ancient complete mtDNA sequences, brought to light new clues about the first peopling of Sardinia and probably about the history of the entire European continent. Indeed, the pre-Neolithic age of U5b1i1 not only supports a Mesolithic human presence on the island but also reveals that Mesolithic Sardinians contributed genetically to modern Sardinians. Moreover, the coalescence ages of the haplogroups K1a2d and N1b1a9 and the geographical distribution of their CEMs suggest that pre-Neolithic Sardinians harbored not only mtDNAs with deep genetic roots in Paleolithic Western Europe, such as those belonging to U5b1i1, but also haplogroups of Near Eastern ancestry, which have often been assumed to have spread from Anatolia only with the advent of the Neolithic. In addition, if confirmed, the pre-Neolithic presence of H3 (and possibly also of H1 and other lineages, for example within JT) on the island would indicate a major genetic legacy of Mesolithic Sardinians to the modern people of Sardinia. However, it is also important to realize that even if H3 (and H1) arrived in Sardinia only with the Neolithic, they most likely came from either Spain or elsewhere in the western Mediterranean, and not from the Near East. Therefore, they are the result of the acculturation of autochthonous west Mediterranean Mesolithic groups in the wider European context.


4. Mitogenome variation in Ecuador and Peru


4.1. Background

South America is the last major geographic area reached by modern humans. Archaeological evidence indicates the presence of humans as far south as southern Chile by 14.6 thousand years ago (Kya) (Dillehay et al., 2008). Taking into account that the first entry into North America from Beringia occurred 16 Kya, this implies that the human diffusion from the North to the Southern Cone was extremely rapid (Bodner et al., 2012). How this migration occurred is still unclear, and two possible scenarios have been proposed. The first is that the population groups expanding from Central America progressed along the coastal (or continental) side of the Andes and later crossed the Cordillera at different latitudes (Hubbe et al., 2010; Bodner et al., 2012). The second is a random split of the source population(s) in the northern area of South America, resulting in separate coastal and continental population groups. Several genetic, craniometric and linguistic studies support the scenario of an early split into an Andean population and a smaller continental one, with limited genetic exchange.

As for mitochondrial DNA (mtDNA), the majority of studies carried out so far at the level of entire mitogenomes have focused on geographically restricted, sometimes rare, clades, and only a few South American-specific mtDNA haplogroups (i.e. C1b13, B2i2, D1g and D1j) have been identified, all typical of the Southern Cone (Bodner et al., 2012; de Saint Pierre et al., 2012). However, it is reasonable to expect that region-specific branches also characterize the other areas of the subcontinent. In the last two years, the sequencing of ancient South American mitogenomes from Early/Middle Holocene (Fehren-Schmitz et al., 2015) as well as pre-Columbian (Gómez-Carballa et al., 2015; Llamas et al., 2016) human remains has begun to provide new clues on the peopling process. Indeed, their radiocarbon dates might provide tip calibration points in support of the phylogenetic analyses based on modern sequences.

Ecuador and Peru are two geographic areas of particular interest because of their location along the Pacific coastal route. Like most of South America, they have a complex demographic history characterized by migration and admixture events in pre- and post-colonial times, including Native American settlements, European colonization and the African slave trade (Homburger et al., 2015). As a consequence, different ethnic groups coexist in the modern populations living in these areas. Among them, the Mestizos and the indigenous Amerindians are the most numerous. The former are an admixed population group derived from Europeans (mostly Spanish) and Native Americans (Gonzalez-Andrade et al., 2007), and represent 72% of the Ecuadorean and 37% of the Peruvian population (https://www.cia.gov/). The Amerindians (or Native Americans) comprise different multiethnic and multicultural groups that have retained their culture, language and distinct identity, and account for ~7% of the population in Ecuador and 45% in Peru (https://www.cia.gov/).

In this chapter, I have extended the current knowledge of South American mitogenome sequence variation to Ecuador and Peru, with the aim of better defining the ancestral genetic sources, the times and the routes that led to the first peopling of South America.

4.2. The sample

To survey mitogenome variation in the Andean area, DNA was sampled from 213 Ecuadoreans (92 Native Americans and 121 Mestizos), representative of all the major regions of Ecuador (Table 4.1), and seven Peruvian individuals (all Mestizos) (Table 4.2). Written informed consent was obtained from all individuals, with protocols approved by the Ethics Committee for Clinical Experimentation of the University of Pavia (Board minutes of the 11th of April 2013).

Table 4.1. Geographic origin of the 213 Ecuadorean samples and their subdivision into regions and provinces.

| Regions | Provinces | Sample IDs^a | N per province | % per region |
|---|---|---|---|---|
| North Ecuador | Carchi | #010, #061, #062, #063 | 4 | 3.76 |
| | Imbabura | #023, #070, #279, N108 | 4 | |
| North-Central Ecuador | Pichincha | #009, #011, #012, #017, #038, #039, #051, #069, #173, #208, #276, #278, #282, #289, #290, #291, #299, #301, #302, #317, #320, #332, #349, #424, #459, #460, #466, #484, #486, #487, #575, #576, N124 | 33 | 15.49 |
| Central Ecuador | Chimborazo | #008, #014, #021, #050, #071, #129, #307, #312, #425, #480, #521 | 11 | 14.08 |
| | Cotopaxi | #007, #074, #131, #144, #152, #237, #277, #313, #319, #426, #462, #488, #567, #579 | 14 | |
| | Tungurahua | #048, #049, #073, #413, #532 | 5 | |
| South-Central Ecuador | Azuay | #127, #519 | 2 | 2.35 |
| | Cañar | #184 | 1 | |
| | Morona-Santiago | #023, #398 | 2 | |
| South Ecuador | El Oro | #052, #165 | 2 | 37.56 |
| | Loja | #066, #128, #132, #133, #134, #135, #136, #137, #139, #140, #142, #146, #158, #160, #167, #168, #170, #171, #172, #174, #175, #176, #178, #179, #181, #182, #183, #280, #285, #286, #288, #292, #293, #294, #303, #304, #306, #314, #315, #318, #324, #325, #326, #327, #329, #331, #335, #338, #340, #341, #345, #346, #353, #354, #440, #441, #442, #444, #445, #446, #447, #448, #449, #450, #451, #453, #457, #465, #482, #489, #523, #538, #574, #578, #581, #586, #587, E405 | 78 | |
| Litoral region | Bolìvar | #019, #295, #462 | 3 | 2.35 |
| | Los Rìos | #040, #370 | 2 | |
| Pacific region | Manabì | #068 | 1 | 0.94 |
| | Santo Domingo de los Tsàchilas | #016 | 1 | |
| / | Not Determined | #015, #018, #037, #047, #067, #130, #138, #141, #143, #159, #161, #163, #166, #169, #177, #180, #185, #209, #283, #284, #287, #296, #298, #308, #309, #310, #311, #316, #330, #336, #337, #339, #352, #443, #452, #454, #463, #470, #491, #522, #566, #580, #583, #584, #588, #589, E904, E916, E940, E1006 | 50 | 23.47 |

^a Sample IDs are those reported in the phylogenetic trees of Figure 4.1, except for the seven mitogenomes (in bold) classified into European or African mtDNA haplogroups.
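As a quick consistency check, the "% per region" column of Table 4.1 can be recomputed from the per-province counts. The sketch below is illustrative only: the assignment of provinces to regions is an assumption based on the table layout, and the variable names are hypothetical.

```python
# Per-province sample counts from Table 4.1, grouped by region
# (grouping assumed from the table layout).
counts = {
    "North Ecuador": [4, 4],            # Carchi, Imbabura
    "North-Central Ecuador": [33],      # Pichincha
    "Central Ecuador": [11, 14, 5],     # Chimborazo, Cotopaxi, Tungurahua
    "South-Central Ecuador": [2, 1, 2], # Azuay, Cañar, Morona-Santiago
    "South Ecuador": [2, 78],           # El Oro, Loja
    "Litoral region": [3, 2],           # Bolìvar, Los Rìos
    "Pacific region": [1, 1],           # Manabì, Santo Domingo de los Tsàchilas
    "Not Determined": [50],
}

# Total number of Ecuadorean samples across all regions.
total = sum(sum(per_province) for per_province in counts.values())

# Percentage of the whole sample contributed by each region, to two decimals.
region_percent = {
    region: round(100 * sum(per_province) / total, 2)
    for region, per_province in counts.items()
}

for region, pct in region_percent.items():
    print(f"{region}: {pct}%")
```

Under this grouping the counts add up to the 213 Ecuadorean samples and each regional percentage matches the value given in the table.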

Table 4.2. Ethnic origin of the 213 Ecuadorean samples and the seven individuals from Peru (underlined). Individuals are classified as either ‘Native Americans’ or ‘Mestizos’. For Native American individuals, when available, the ethnicity is specified.

| Ethnicity | Sample IDs^a | N |
|---|---|---|
| Autochthonous ethnic groups (‘Native Americans’) | | |
| >Cañari | #184 | 1 |
| >Cayambi | #063, #276, #460 | 3 |
| >Otavalo | #023 | 1 |
| >Panzaleo (Quichua) | #048, #152, #131, #567, #579 | 5 |
| >Puruhà | #312, #425, #426 | 3 |
| >Quichua | #068, #295 | 2 |
| >Quitu-Cara/Cayambi | #320, #466 | 2 |
| >Quitu-Cara | #484 | 1 |
| >Salasaca | #040 | 1 |
| >Saraguro | #346, #440, #441 | 3 |
| >Shuar | #455 | 1 |
| >Tsàchila | #016 | 1 |
| >Other ‘Natives’ | #127, #132, #133, #134, #135, #136, #137, #138, #140, #141, #142, #144, #146, #159, #160, #161, #167, #168, #169, #171, #172, #176, #177, #178, #179, #180, #181, #182, #280, #283, #284, #286, #287, #292, #293, #294, #303, #304, #308, #309, #310, #311, #314, #315, #316, #324, #327, #335, #337, #338, #339, #341, #345, #352, #353, #442, #443, #444, #445, #446, #447, #448, #449, #450, #451, #452, #463, #522 | 68 |
| ‘Mestizos’ | #007, #008, #009, #010, #011, #012, #014, #015, #017, #018, #019, #021, #037, #038, #039, #047, #049, #050, #051, #052, #061, #062, #066, #067, #069, #070, #071, #073, #074, #128, #129, #130, #139, #143, #158, #163, #165, #166, #170, #173, #174, #175, #183, #185, #191, #208, #209, #237, #256, #259, #277, #278, #279, #282, #285, #288, #289, #290, #291, #296, #298, #299, #301, #302, #305, #306, #307, #313, #317, #318, #319, #325, #326, #329, #330, #331, #332, #336, #340, #349, #354, #370, #398, #413, #424, #453, #454, #457, #459, #461, #462, #465, #470, #480, #482, #486, #487, #488, #489, #491, #500, #519, #521, #523, #532, #538, #566, #574, #575, #576, #578, #580, #581, #583, #584, #586, #587, #588, #589, #595, #620, E0405, E0904, E0916, E0940, E1006, N108, N124 | 128 |

^a Sample IDs are those reported in the phylogenetic trees of Figure 4.1, except for the seven individuals (in bold) whose mtDNAs were classified into European or African mtDNA haplogroups.

Candidate Native American mtDNAs were identified and selected by a preliminary survey of control-region sequence variation. This initial screening revealed that all the mtDNAs were members of the founding pan-American haplogroups (i.e. A2, B2, C1b, C1c, C1d, D1 and D4h3a) (Perego et al., 2010), with the exception of seven from Ecuador (all from Mestizos) belonging to the Old World haplogroups R0a (two subjects), U5b3f (one subject), L2a1 (one subject) and L3e2b (three subjects), for a total of 3.3% of the Ecuadorean sample (in bold in Tables 4.1 and 4.2). The 213 mtDNAs (206 from Ecuador and seven from Peru) of Native American ancestry underwent sequencing of the entire mitogenome and were included in the subsequent phylogeographic analyses. Among these, 145 (138 Ecuadorean and seven Peruvian) were sequenced by Next Generation Sequencing (NGS) on an Illumina MiSeq®, while for 68 (all from Ecuador) the mitogenome variation was determined by Sanger sequencing. An additional dataset of 362 previously published modern mitogenomes (10 from Ecuador and 352 from Peru) (Tamm et al., 2007; Perego et al., 2009; Perego et al., 2010; Tito et al., direct submission 2012; Cardoso et al., 2012; Greenspan, direct submission 2015; HGDP) and 68 ancient mitogenomes from Peru (Fehren-Schmitz et al., 2015; Llamas et al., 2016) were also included in the study.

4.3. Results

4.3.1. The mitogenome phylogeny

The final dataset of 223 (modern) Ecuadorean and 420 (68 ancient and 352 modern) Peruvian mitogenomes was used to build five maximum-parsimony trees: one each for macro-haplogroups A2, B2 without B2b, C1 (including C1b, C1c and C1d) and D4 (including D4h3a and D1), and one for haplogroup B2b, which was found to be extremely frequent (25%) in our sample (Figure 4.1). These phylogenetic trees allowed us to identify 50 new haplogroups/sub-haplogroups (in blue in Figure 4.1), which were named following the nomenclature of PhyloTree build 17 (http://www.phylotree.org/) (van Oven and Kayser, 2009). Note that a haplogroup/sub-haplogroup was newly defined only when it encompassed a minimum of three different haplotypes sharing at least one stable mutation. In some cases (i.e. A2y, A2z and B2o1), the presence of new samples interrupting the mutational motif at the node of a haplogroup forced us to redefine the nomenclature of the branch relative to that previously in use (Table 4.5). All the pan-American haplogroups are represented in our phylogeny. To determine haplogroup frequencies, only modern mitogenomes from random surveys of modern populations were included (Table 4.3 and Figure 4.2). This led to the exclusion of 13 A2 (Tamm et al., 2007; Cardoso et al., 2012), one B2 (Tamm et al., 2007), one C1b (Greenspan, direct submission 2015), nine C1d (Perego et al., 2010) and 12 D4h3a (Perego et al., 2009) mitogenomes. In detail, haplogroup B2 was found to be the most frequent, alone encompassing almost half of the overall sample (55% in Ecuador and 44% in Peru). Haplogroup A2 includes 17% and 20% of the Ecuadorean and Peruvian mitogenomes, respectively. Haplogroup C1, as a whole, is equally distributed in both geographic areas, encompassing about 20% of the mtDNAs. The majority of the C1 mitogenomes belong to C1b in both populations, while C1c is rather uncommon (only 1%).
Sub-haplogroup C1d1, instead, is present at a frequency of 3% in Ecuador and is totally absent in Peru. Haplogroup D4h3a is rare but it is found at a frequency of about 1% in both geographic areas. Finally, the frequency of haplogroup D1 in Peru is more than two-fold that observed in Ecuador (15% vs 6%, respectively). Thus, despite some frequency differences, the haplogroup composition (at the level of pan-American haplogroups) is rather similar in Ecuador and Peru.
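The haplogroup frequencies quoted above are simple proportions of the randomly sampled modern mitogenomes. As a minimal sketch, the percentages can be recomputed from the absolute counts reported in Table 4.3 (N=206 for Ecuador, N=333 for Peru); the function and variable names are hypothetical and not part of the original analysis.

```python
def frequencies(counts):
    """Return each haplogroup's share of the sample as a percentage (two decimals)."""
    total = sum(counts.values())
    return {hg: round(100 * n / total, 2) for hg, n in counts.items()}

# Absolute counts of the pan-American haplogroups as reported in Table 4.3.
ecuador = {"A2": 36, "B2": 113, "C1": 42, "D4h3a": 2, "D1": 13}
peru = {"A2": 68, "B2": 146, "C1": 67, "D4h3a": 3, "D1": 49}

ecuador_freq = frequencies(ecuador)
peru_freq = frequencies(peru)

for hg in ecuador:
    print(f"{hg}: Ecuador {ecuador_freq[hg]}%, Peru {peru_freq[hg]}%")
```

The counts sum to 206 and 333, so the five groups cover the whole frequency dataset of each country.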

[Figure 4.1, panels A to E: maximum-parsimony trees with sample legend (Ancient Peru, Peru, Ecuador); the tree graphics are not reproduced in this text version.]

Figure 4.1. Phylogeny encompassing 643 modern and ancient Native American mitogenomes from Ecuador and Peru. Modern samples from Ecuador and Peru are in green and red, respectively. Ancient mitogenomes from Peru are in pink. Sub-haplogroups defined for the first time in this study are shown in light blue, while those in black were previously described. The prefix @ indicates the reversion of a mutation occurring earlier in the phylogeny. Recurrent mutations in the tree are underlined. The panels show the phylogenetic trees of the pan-American haplogroups: A2 (panel A), encompassing 127 mitogenomes; B2 without the sub-clade B2b (panel B), encompassing 148 mitogenomes; B2b (panel C), encompassing 138 mitogenomes; C1 (panel D), encompassing 143 mitogenomes; D4h3a and D1 (panel E), encompassing 87 mitogenomes.

In order to evaluate possible differences in the distribution of pan-American haplogroups between North and South American regions, we compared our data with those from Mexican Native Americans, the only other Native population randomly analyzed at the level of entire mitogenomes (Kumar et al., 2011). As shown in Figure 4.2, the frequency of haplogroup A2 in Mexico is about twice that in the two Andean states (40% in Mexico, 17% in Ecuador and 20% in Peru, in line with Table 4.3). Similarly, haplogroup C1c is found in 8% of the Mexican population but only at a frequency of about 1% in Ecuador and Peru. Haplogroup B2 shows the opposite pattern: it represents 22% of the Mexican mitogenomes, while it reaches much higher frequencies in Ecuador and Peru (55% and 44%, respectively).

Figure 4.2. Frequency distribution of pan-American mtDNA haplogroups in Ecuador (N=206), Peru (N=333) and Mexico (N=262). Mitogenomes of recent Old World ancestry were not included. Only mitogenomes from this study, HGDP, Tito et al. (2012) and Kumar et al. (2011) have been included because they were randomly selected.


Table 4.3. Frequencies of the sub-haplogroups identified in the Ecuadorean and Peruvian modern populations. Only subjects with Native American mitogenomes were included. Only mitogenomes from this study, HGDP and Tito et al., 2012 were included because they were randomly selected. Values are counts, with percentages in parentheses.

| Haplogroup/Sub-haplogroup | Ecuador, N=206, n (%) | Peru, N=333, n (%) |
|---|---|---|
| A2 | 36 (17.48) | 68 (20.42) |
| A2k | 4 (1.94) | 7 (2.10) |
| A2y | 2 (0.97) | 2 (0.60) |
| A2z | 0 (0.00) | 4 (1.20) |
| A2aa | 0 (0.00) | 5 (1.50) |
| A2ac | 12 (5.83) | 0 (0.00) |
| A2am | 0 (0.00) | 2 (0.60) |
| A2ar | 2 (0.97) | 2 (0.60) |
| A2as | 0 (0.00) | 4 (1.20) |
| A2at | 0 (0.00) | 8 (2.40) |
| A2au | 0 (0.00) | 5 (1.50) |
| A2av | 6 (2.91) | 3 (0.90) |
| A2aw | 3 (1.46) | 0 (0.00) |
| Other A2 | 7 (3.40) | 26 (7.81) |
| B2 | 113 (54.85) | 146 (43.84) |
| B2b | 65 (31.55) | 61 (18.32) |
| >B2b5 | 22 (10.68) | 2 (0.60) |
| >B2b6 | 27 (13.11) | 5 (1.50) |
| >B2b7 | 6 (2.91) | 0 (0.00) |
| >B2b8 | 3 (1.46) | 1 (0.30) |
| >B2b9 | 1 (0.49) | 10 (3.00) |
| >B2b10 | 0 (0.00) | 10 (3.00) |
| >B2b11 | 0 (0.00) | 11 (3.30) |
| >B2b12 | 1 (0.49) | 6 (1.80) |
| >B2b13 | 0 (0.00) | 3 (0.90) |
| >Other B2b | 5 (2.43) | 13 (3.90) |
| B2l | 8 (3.88) | 0 (0.00) |
| B2o | 0 (0.00) | 0 (0.00) |
| B2q | 16 (7.77) | 1 (0.30) |
| B2y | 0 (0.00) | 7 (2.10) |
| B2z | 14 (6.80) | 0 (0.00) |
| B2aa | 1 (0.49) | 6 (1.80) |
| B2ab | 0 (0.00) | 14 (4.20) |
| B2ac | 0 (0.00) | 5 (1.50) |
| B2ad | 0 (0.00) | 3 (0.90) |
| B2ae | 0 (0.00) | 3 (0.90) |
| B2ag | 0 (0.00) | 7 (2.10) |
| B2ah | 0 (0.00) | 3 (0.90) |
| Other B2 | 9 (4.37) | 36 (10.81) |
| C1 | 42 (20.39) | 67 (20.12) |
| C1b | 33 (16.02) | 62 (18.62) |
| >C1b2 | 0 (0.00) | 10 (3.00) |
| >C1b16 | 0 (0.00) | 5 (1.50) |
| >C1b21 | 0 (0.00) | 3 (0.90) |
| >C1b23 | 15 (7.28) | 0 (0.00) |
| >C1b24 | 0 (0.00) | 4 (1.20) |
| >C1b25 | 0 (0.00) | 3 (0.90) |
| >C1b26 | 1 (0.49) | 4 (1.20) |
| >C1b28 | 3 (1.46) | 0 (0.00) |
| >C1b29 | 4 (1.94) | 0 (0.00) |
| >Other C1b | 10 (4.85) | 33 (9.91) |
| C1c | 2 (0.97) | 5 (1.50) |
| C1d1 | 7 (3.40) | 0 (0.00) |
| D4h3a | 2 (0.97) | 3 (0.90) |
| D1 | 13 (6.31) | 49 (14.71) |
| D1f | 9 (4.37) | 3 (0.90) |
| D1k | 0 (0.00) | 5 (1.50) |
| D1o | 0 (0.00) | 4 (1.20) |
| D1p | 0 (0.00) | 3 (0.90) |
| D1q | 0 (0.00) | 4 (1.20) |
| D1r | 0 (0.00) | 5 (1.50) |
| D1s | 0 (0.00) | 4 (1.20) |
| D1t | 0 (0.00) | 3 (0.90) |
| D1u | 0 (0.00) | 5 (1.50) |
| Other D1 | 4 (1.94) | 13 (3.90) |

In order to evaluate the differences between Ecuadorean and Peruvian mitogenomes, we assessed several diversity indices in the two areas (number of haplotypes, haplotype and nucleotide diversities, average number of nucleotide differences) based on the same dataset used for the frequency estimates. This analysis did not reveal major differences between the two populations (Table 4.4).

Table 4.4. Diversity indices in Ecuador and Peru. Only subjects with Native American mitogenomes were included.

| Geographic areas | Ecuador | Peru |
|---|---|---|
| N of subjects^a | 206 | 333 |
| N of haplotypes (h) | 166 | 290 |
| Haplotype diversity (H) | 0.9950 ± 0.0017 | 0.9988 ± 0.0004 |
| Nucleotide diversity % (π) | 0.00196 ± 0.00005 | 0.00206 ± 0.00003 |
| Average number of nucleotide differences (M) | 32.407 | 34.087 |

^a Only modern mitogenomes from this study, HGDP and Tito et al. (2012) were included here because they were randomly selected; indels and heteroplasmies were not considered.
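To make the indices in Table 4.4 concrete, the following is a minimal sketch of how they are commonly defined, assuming Nei's estimator for haplotype diversity, H = n/(n-1) · (1 - Σp_i²), the mean number of pairwise differences for M, and π = M divided by the sequence length. The software and exact estimators used for the table are not stated here, so the formulas, function names and toy sequences are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences (M)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi), assuming equal-length aligned sequences."""
    return mean_pairwise_differences(seqs) / len(seqs[0])

# Hypothetical toy alignment: four sequences of ten sites, three distinct haplotypes.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTTC", "ACGAACGTTC"]

H = haplotype_diversity(toy)
M = mean_pairwise_differences(toy)
pi = nucleotide_diversity(toy)
print(H, M, pi)
```

Under these definitions the values in Table 4.4 are mutually consistent: dividing M by the 16,569 bp of the rCRS gives 32.407/16,569 ≈ 0.00196 for Ecuador and 34.087/16,569 ≈ 0.00206 for Peru.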

Like present-day mitogenomes, the 68 ancient mitogenomes included in our analyses were found to encompass all the pan-American haplogroups, with the exception of D4h3a (Figure 4.1). The phylogenetic relationships between the ancient mitogenomes and those from present-day Ecuadorean and Peruvian populations are shown in Figure 4.1. In detail, only one haplotype is identical between an ancient (#239) and a modern (#238) sample, while almost all the remaining ancient haplotypes share at least one private mutation with modern Peruvian and/or Ecuadorean sequences. The only exceptions are the four mitogenomes #556, #641, #643 and #643 (GenBank codes: KU523332, KU523344, KU523342, KU523347), whose private mutations are not shared with any other mitogenome.

4.3.2. Phylogeography

To evaluate the geographical distribution range of both the new (54) and previously defined (14) sub-haplogroups and their sub-lineages, we searched our in-house database, encompassing more than 1,700 Native American mitogenomes, for their diagnostic mutations. We identified 79 mitogenomes, from both North and South America, belonging to the same sub-haplogroups as, or phylogenetically closely related to, those from Ecuador and Peru (Table 4.5). This analysis revealed that the phylogeny of Ecuadorean and Peruvian mitogenomes includes two kinds of sub-haplogroups: those restricted to South America and those with representatives also in North America (Table 4.6).
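The database search described above can be pictured as a set-inclusion test: a mitogenome is a candidate member of a sub-haplogroup when its list of variants (relative to the reference sequence) contains all of that clade's diagnostic mutations. The sketch below is purely illustrative; the function name, clade names, sample IDs and mutation labels are hypothetical placeholders rather than real motifs or the actual in-house procedure, and it ignores reversions and recurrent mutations, which a real classification has to handle.

```python
def find_carriers(database, motifs):
    """Return {clade: [sample IDs whose variant set includes the full diagnostic motif]}."""
    hits = {clade: [] for clade in motifs}
    for sample_id, variants in database.items():
        for clade, motif in motifs.items():
            if motif <= variants:  # set inclusion: every diagnostic mutation is present
                hits[clade].append(sample_id)
    return hits

# Hypothetical diagnostic motifs (placeholder labels, not real mutations).
motifs = {
    "cladeX": {"m1", "m2"},
    "cladeY": {"m3"},
}

# Hypothetical database: sample ID -> set of variants relative to the reference.
database = {
    "sample1": {"m1", "m2", "m9"},  # carries the full cladeX motif
    "sample2": {"m1", "m9"},        # only part of the cladeX motif
    "sample3": {"m2", "m3"},        # carries the cladeY motif
}

hits = find_carriers(database, motifs)
print(hits)
```

Partial matches such as `sample2` are not reported by this sketch; identifying "phylogenetically closely related" sequences would require placing them on the tree rather than a simple motif lookup.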

Table 4.5. Origins and sub-haplogroup affiliations of mitogenomes not included in Figure 4.1 but used for age estimates.

GenBank ID | Haplogroup | Continent | Country | Ethnicity | Reference
EU095196 | A2aa | South America | Brazil | Waiwai | Fagundes et al., 2008
JQ702750 | A2aa | Europe West | Spain |  | Behar et al., 2012a
EU095201 | A2aa | South America | Brazil | Poturujara | Fagundes et al., 2008
KJ923839 | A2ac | South America | Colombia |  | Rieux et al., 2014
KJ923840 | A2ac | South America | Colombia |  | Rieux et al., 2014
KM102053 | A2ac | North America | USA | Hispanic | Just et al., 2015
JQ702324 | A2ac1 | South America | Colombia |  | Behar et al., 2012a
JQ704789 | A2ac1 | North America | Cuba |  | Behar et al., 2012a
HG01440 | A2ac1 | South America | Colombia |  | HGDP
HG01112 | A2ac1 | South America | Colombia |  | HGDP
HG01251 | A2ac1 | South America | Colombia |  | HGDP
KM051472 | A2ar | North America | Guatemala |  | Söchtig et al., 2015, Direct submission
DQ282389 | A2av | North America | USA | Hispanic | Just et al., 2008
DQ282417 | A2k | North America | USA | Hispanic | Just et al., 2008
HQ012071 | A2k1 | North America | Mexico |  | Kumar et al., 2011
HQ012058 | A2k1 | North America | Mexico |  | Kumar et al., 2011
JN419250 | A2k1 | North America | Mexico |  | Greenspan 2011, Direct submission
DQ282390 | A2k1a | North America | USA | Hispanic | Just et al., 2008
DQ282400 | A2k1a | North America | USA | Hispanic | Just et al., 2008
JQ702172 | A2k1a |  |  |  | Behar et al., 2012a
EU095552 | A2k1 | South America |  | Wayuu | Tamm et al., 2007
DQ282425 | A2z1 | North America | USA | Hispanic | Just et al., 2008
DQ282426 | A2z1 | North America | USA | Hispanic | Just et al., 2008
DQ282427 | A2z1 | North America | USA | Hispanic | Just et al., 2008
JQ705069 | A2z1 | North America | Puerto Rico |  | Behar et al., 2012a
HG01083 | A2z1 | North America | Puerto Rico |  | HGDP
HG00731 | A2z1 | North America | Puerto Rico |  | HGDP
DQ282424 | A2z1 | North America | USA | Hispanic | Just et al., 2008
JQ705380 | A2z1 | North America | USA |  | Behar et al., 2012a
KM102071 | A2z1 | North America | USA | Hispanic | Just et al., 2015
JQ705881 | B2aa2 | North America | Mexico |  | Behar et al., 2012a
HQ012157 | B2aa2 | North America | Mexico |  | Kumar et al., 2011
HQ012174 | B2aa2 | North America | Mexico |  | Kumar et al., 2011
KU523299 | B2ab1a | South America | Bolivia | Ancient sample | Llamas et al., 2016
KU523303 | B2ab | South America | Bolivia | Ancient sample | Llamas et al., 2016
HG01437 | B2b | South America | Colombia |  | HGDP
EU095208 | B2b | North America | California | Pomo | Fagundes et al., 2008
EU095210 | B2b | South America | Brazil | Xavante | Fagundes et al., 2008
KC503928 | B2b | South America | Bolivia |  | Taboada-Echalar et al., 2013
KM102138 | B2b | North America | USA | Hispanic | Just et al., 2015
JF431064 | B2b1 | South America | Venezuela |  | Gomez-Carballa et al., 2011, Direct submission
JF431065 | B2b1 | South America | Venezuela |  | Gomez-Carballa et al., 2011, Direct submission
KC503926 | B2b2 | South America | Bolivia |  | Taboada-Echalar et al., 2013
KM102111 | B2b2 | North America | USA | Hispanic | Just et al., 2015
KC503925 | B2b2a | South America | Bolivia |  | Taboada-Echalar et al., 2013
KC503927 | B2b2a | South America | Bolivia |  | Taboada-Echalar et al., 2013
EU095221 | B2b3 | South America |  | Yanomama | Fagundes et al., 2008
EU095216 | B2b3a | South America | Brazil | Kayapo | Fagundes et al., 2008
KM102108 | B2b3a | North America | USA | Hispanic | Just et al., 2015
HG00640 | B2b3a | North America | Puerto Rico |  | HGDP
HG01198 | B2b3a | North America | Puerto Rico |  | HGDP
HG01191 | B2b3a | North America | Puerto Rico |  | HGDP
HG01079 | B2b3a | North America | Puerto Rico |  | HGDP
HQ012137 | B2b4 | North America | Mexico |  | Kumar et al., 2011
JQ702293 | B2l |  |  |  | Behar et al., 2012a
NA19785 | B2l1 | North America | Mexico |  | HGDP
KC503931 | B2o1a | South America | Bolivia |  | Taboada-Echalar et al., 2013
KC503932 | B2o1a | South America | Bolivia |  | Taboada-Echalar et al., 2013
KC503933 | B2o1a | South America | Bolivia |  | Taboada-Echalar et al., 2013
HQ012153 | B2q | North America | Mexico |  | Kumar et al., 2011
HQ012176 | B2q | North America | Mexico |  | Kumar et al., 2011
KM102005 | B2q | North America | USA | Hispanic | Just et al., 2015
EU095536 | D1f | South America | Colombia | Coreguaje | Tamm et al., 2007
EU431089 | D1f | North America | USA |  | Achilli et al., 2008a
KC257286 | D1f |  |  |  | Sequeira et al., 2015
EU095235 | D1f1 | South America | Brazil | Waiwai | Fagundes et al., 2008
EU095238 | D1f1 | South America | Brazil | Katuena | Fagundes et al., 2008
EU095240 | D1f1 | South America | Brazil | Tiryo | Fagundes et al., 2008
EU095241 | D1f1 | South America | Brazil | Tiryo | Fagundes et al., 2008
KF451520 | D1f1 | South America | Brazil |  | Lippold et al., 2014
KJ446600 | D1f1 | South America | Brazil | Karitiana | Zheng et al., 2014, Direct submission
JQ702137 | D1f2 |  |  |  | Behar et al., 2012a
HG01441 | D1f2 | South America | Colombia |  | HGDP
HQ012257 | D1f3 | North America | Mexico |  | Kumar et al., 2011
JQ701868 | D1f3 |  |  |  | Behar et al., 2012a
DQ282483 | D1k | North America | USA | Hispanic | Just et al., 2008
NA19755 | D1k | North America | Mexico |  | HGDP
HQ012252 | D1k | North America | Mexico |  | Kumar et al., 2011
KM102096 | D1k1 | North America | USA | Hispanic | Just et al., 2015

Most of the analyzed sub-haplogroups have been found only in South America (Table 4.6) and are often restricted to a single geographic area: some only to Ecuador (i.e. A2ac2, A2av1a, A2aw, B2b5a, B2b5b1a, B2b6a1a, B2b7, B2b8a, B2l1a, B2z, C1b23, C1b28, C1b29 and C1d1f), others only to Peru (i.e. A2z2, A2as, A2at, A2au, B2b9a, B2b9b, B2b10, B2b11, B2b12b, B2b13, B2aa1a, B2ab1a1, B2ac, B2ad, B2ae, B2y2, B2ag, B2ah, C1b16, C1b19, C1b21, C1b24, C1b25, C1b26a1, C1b27, C1d1e, D4h3a10, D1k1a, D1o, D1p, D1q, D1r, D1s, D1t and D1u) (Figure 4.1). Others apparently have wider distribution ranges encompassing both Ecuador and Peru (i.e. A2y, A2av1, B2b5, B2b6, B2b8, B2b9, B2b12, B2q1, C1b26) (Figure 4.1) or even extend to more distant regions of South America (B2o1, B2ab) (Table 4.5). Although less frequent, eleven haplogroups were found to include mitogenomes from both North and South America (i.e. A2k, A2z, A2ac, A2av, A2ar, B2b, B2l, B2q, B2aa, D1f and D1k) (Table 4.5). Among these, a very interesting case is haplogroup B2b, which alone encompasses about 25% of our Andean sample. In 2013, Taboada-Echalar and colleagues proposed a phylogeny of this haplogroup. We enlarged it by adding 133 modern B2b mitogenomes, which allowed us to define eight new sub-clades and to obtain a more accurate estimate of the age of the whole haplogroup. It is worth noting that, among the 146 B2b mitogenomes included in Figure 4.3a, nine are from North America: one from Northern California (EU095208), one from Mexico (HQ012137), three from Hispanic subjects living in the USA (KM102108, KM102111, KM102138) and four from Puerto Rico (HG00640, HG01079, HG01191, HG01198). The remaining B2b mitogenomes (N=137) are from South America, from both the northeastern Atlantic side and the western Pacific side of the subcontinent (panel A in Figure 4.3). 
Similarly, haplogroup D1f, despite its rarity in our phylogeny, shows a geographic distribution overlapping that of haplogroup B2b, including samples from North America, from three Andean states (Colombia, Ecuador and Peru) and from Brazil (panel B in Figure 4.3). Other examples of haplogroups found in both South and North America are A2z and B2aa (panels C and D in Figure 4.3). In these clades, mitogenomes from North America and those from South America cluster in two clearly distinct branches. Both sub-branches of B2aa are only one mutation away from the common node, whereas in A2z the northern branch A2z1 is more distant from the root than the southern branch A2z2 (6 mutations vs 1, respectively) (not shown).

Figure 4.3. Phylogenetic relationships and geographical distributions of B2b (panel A), D1f (panel B), A2z (panel C), B2aa (panel D) modern mitogenomes.


4.3.3. Haplogroup age estimates

To date the minimum arrival time of the clades of interest, we estimated their coalescence ages with three different methods: ρ statistic, Maximum Likelihood (ML) and BEAST (Bayesian Evolutionary Analysis Sampling Trees). All the analyses were performed on a dataset including only modern mitogenomes. In particular, ML estimations were performed using two partitions (with different weights for mutations falling in the coding region – nps 577-16023 – and in the control region – nps 16024-576) as well as one partition (with all mutations contributing equally to the analysis). In the latter case we used the estimated ages of macro-haplogroups M (including the pan-American haplogroups C1b, C1c, C1d, D4h3a and D1) and N (including the pan-American haplogroups A2 and B2) reported in Behar et al., 2012a as fixed priors (Table 4.6). The ML and BEAST computations were also performed on a second dataset including both modern and ancient mtDNAs, with radiocarbon dates used as tip calibration points (Table 4.6). This also allowed us to assess the age of haplogroups represented almost exclusively by ancient mitogenomes. An example is haplogroup C1b19, which encompasses nine ancient and only two modern mitogenomes (Figure 4.1).
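As an illustration of the first of these methods, the ρ statistic is the average number of mutations separating the sampled mitogenomes from the root of their clade, which is then scaled by a mutation rate to obtain an age. The following is only a minimal sketch and not the pipeline used in this study: the toy tree, the function name `rho_statistic` and the linear rate of 3,000 years per mutation are hypothetical placeholders, and the standard error follows one commonly used formulation.

```python
from math import sqrt

def rho_statistic(branches, n_tips):
    """Return (rho, sigma) for a clade.

    branches: list of (mutations_on_branch, tips_descending_from_branch)
    n_tips:   total number of sampled mitogenomes in the clade
    """
    # rho: mean number of mutations from the root to each sampled tip
    rho = sum(m * d for m, d in branches) / n_tips
    # sigma: standard error in a commonly used formulation,
    # sqrt(sum(m * d^2)) / n
    sigma = sqrt(sum(m * d * d for m, d in branches)) / n_tips
    return rho, sigma

# Hypothetical toy clade with 4 tips (not data from this study)
toy_branches = [(1, 2), (1, 1), (0, 1), (2, 1), (1, 1)]
rho, sigma = rho_statistic(toy_branches, n_tips=4)

# Placeholder rate under an assumed linear clock (illustrative only)
YEARS_PER_MUTATION = 3000.0
age_ky = rho * YEARS_PER_MUTATION / 1000.0
se_ky = sigma * YEARS_PER_MUTATION / 1000.0
print(f"rho = {rho:.2f} ± {sigma:.2f}; age = {age_ky:.2f} ± {se_ky:.2f} Ky")
```

Because each branch is weighted by the number of tips below it, deep branches shared by many mitogenomes dominate the standard error, which is one reason ρ-based intervals tend to be wide for small or unbalanced clades.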

Table 4.6. Age estimates (rho statistics, ML and BEAST) of the sub-haplogroups shown in Figure 4.1. All age estimates are in Ky.

Sub-haplogroup (a) | ρ statistics | ML, 2 partitions | ML, 1 partition, with M, N (b) and ancient (c) estimates as priors | ML, 1 partition, with M and N (b) estimates as priors | BEAST, with M and N (b) estimates as priors | BEAST, with M, N (b) and ancient (c) estimates as priors
A2k | 14.40 ± 6.28 | 18.84 ± 4.43 | 15.53 ± 3.21 | 15.84 ± 3.35 | 25.02 ± 0.38 | 19.27 ± 0.61
>A2k1 | 10.61 ± 4.16 | 11.38 ± 3.61 | 9.72 ± 2.66 | 9.77 ± 2.69 | 22.90 ± 0.32 | 16.90 ± 0.56
>>A2k1a | 2.59 ± 2.95 | 5.63 ± 4.66 | 4.91 ± 3.25 | 4.91 ± 3.24 | 14.95 ± 0.11 | 8.26 ± 0.51
A2y (d) | 13.19 ± 9.12 | 18.52 ± 4.61 | 15.20 ± 3.24 | 15.54 ± 3.41 | 22.89 ± 0.28 | 17.09 ± 1.07
>A2y1 (e) | 9.63 ± 7.88 | 13.86 ± 4.98 | 11.63 ± 3.38 | 11.84 ± 3.53 | 19.74 ± 0.21 | 13.41 ± 1.24
>>A2y1a (d) | 5.63 ± 6.31 | 8.41 ± 4.19 | 7.20 ± 2.93 | 7.26 ± 2.98 | 16.93 ± 0.15 | 10.16 ± 1.95
A2z (d) | 17.48 ± 11.47 | 17.25 ± 5.44 | 14.35 ± 3.70 | 14.62 ± 3.84 | 21.73 ± 0.29 | 15.47 ± 1.36
>A2z1 (e) | 1.55 ± 1.44 | 2.17 ± 1.05 | 1.89 ± 0.78 | 1.90 ± 0.78 | 12.01 ± 0.05 | 3.99 ± 0.86
>A2z2 (e) | 4.33 ± 3.85 | 6.22 ± 3.87 | 5.39 ± 3.12 | 5.42 ± 2.81 | 14.48 ± 0.07 | 7.45 ± 1.91
A2aa | 7.89 ± 4.67 | 9.47 ± 3.85 | 8.12 ± 2.81 | 8.16 ± 2.82 | 19.21 ± 0.23 | 13.10 ± 0.88
A2ac | 12.71 ± 3.94 | 13.01 ± 2.71 | 10.98 ± 2.06 | 11.10 ± 2.11 | 21.86 ± 0.27 | 16.39 ± 0.46
>A2ac1 | 4.69 ± 4.01 | 4.66 ± 2.94 | 4.09 ± 2.09 | 4.12 ± 2.11 | 13.07 ± 0.05 | 5.80 ± 1.19
>A2ac2 (e) | 8.34 ± 4.05 | 9.32 ± 3.30 | 7.98 ± 2.46 | 8.03 ± 2.39 | 17.82 ± 0.18 | 11.96 ± 0.54
A2av (e) | 12.82 ± 6.52 | 13.71 ± 3.94 | 11.55 ± 2.83 | 11.67 ± 2.86 | 21.91 ± 0.31 | 15.75 ± 0.68
>A2av1 (e) | 10.91 ± 4.67 | 11.38 ± 3.24 | 9.71 ± 2.44 | 9.78 ± 2.43 | 20.23 ± 0.29 | 13.86 ± 0.46
>>A2av1a (e) | 3.46 ± 3.43 | 3.56 ± 3.27 | 3.15 ± 2.37 | 3.16 ± 2.39 | 11.74 ± 0.06 | 3.84 ± 1.51
A2as (e) | 18.30 ± 10.54 | 19.56 ± 3.86 | 14.41 ± 3.45 | 16.43 ± 2.95 | 21.75 ± 0.28 | 16.26 ± 1.33
>A2as1 (e) | 3.46 ± 3.43 | 3.78 ± 2.48 | 3.17 ± 1.75 | 3.34 ± 1.84 | 11.85 ± 0.03 | 4.22 ± 2.13
A2at (e) | 15.47 ± 7.97 | 18.44 ± 4.04 | 15.50 ± 2.81 | 15.55 ± 3.05 | 23.42 ± 0.28 | 16.79 ± 0.68
>A2at1 (e) | 9.70 ± 6.29 | 11.86 ± 4.92 | 10.80 ± 2.76 | 10.16 ± 3.47 | 19.61 ± 0.19 | 11.93 ± 0.75
A2au (e) | 11.71 ± 7.32 | 13.86 ± 6.85 | 11.70 ± 4.61 | 11.87 ± 4.73 | 20.57 ± 0.25 | 14.27 ± 1.31
A2ar | 18.44 ± 7.07 | 17.80 ± 3.64 | 14.74 ± 2.72 | 15.00 ± 2.81 | 22.51 ± 0.30 | 16.56 ± 0.58
A2aw (e) | 7.89 ± 6.35 | 7.65 ± 4.72 | 6.54 ± 3.23 | 6.64 ± 3.29 | 14.80 ± 0.10 | 7.86 ± 2.64
B2b | 19.29 ± 3.20 | 19.00 ± 1.93 | 15.83 ± 1.87 | 15.85 ± 1.92 | 27.20 ± 0.39 | 21.95 ± 0.36
>B2b2 | 22.62 ± 9.76 | 17.33 ± 3.06 | 14.53 ± 2.42 | 14.54 ± 2.44 | 22.36 ± 0.29 | 16.40 ± 0.66
>>B2b2a | 3.90 ± 4.46 | 3.41 ± 2.39 | 2.97 ± 1.76 | 2.97 ± 1.76 | 10.90 ± 0.04 | 3.08 ± 2.99
>B2b3 | 41.24 ± 21.94 | 17.41 ± 2.58 | 14.66 ± 2.17 | 14.65 ± 2.15 | 22.87 ± 0.35 | 16.88 ± 0.65
>>B2b3a (e) | 26.28 ± 15.79 | 8.33 ± 3.92 | 7.17 ± 2.83 | 7.18 ± 2.76 | 15.70 ± 0.13 | 8.82 ± 1.15
>B2b5 (e) | 12.04 ± 5.06 | 10.00 ± 2.30 | 8.59 ± 1.78 | 8.60 ± 1.78 | 22.57 ± 0.31 | 16.55 ± 0.50
>>B2b5a (e) | 6.10 ± 5.37 | 4.22 ± 2.22 | 3.65 ± 1.62 | 3.65 ± 1.62 | 13.71 ± 0.13 | 6.39 ± 2.27
>>>B2b5a1 (e) | 1.72 ± 2.40 | 1.15 ± 0.96 | 1.02 ± 0.72 | 1.03 ± 0.72 | 9.86 ± 0.02 | 1.97 ± 1.56
>>B2b5b (e) | 9.12 ± 6.63 | 7.27 ± 2.81 | 6.32 ± 2.01 | 6.33 ± 2.01 | 17.57 ± 0.18 | 10.85 ± 0.71
>>>B2b5b1 (e) | 7.59 ± 5.98 | 6.07 ± 2.79 | 5.25 ± 2.04 | 5.26 ± 2.03 | 15.32 ± 0.10 | 8.29 ± 1.03
>>>>B2b5b1a (e) | 4.55 ± 3.88 | 4.00 ± 2.04 | 3.51 ± 1.49 | 3.51 ± 1.46 | 13.02 ± 0.06 | 5.41 ± 0.71
>>>>>B2b5b1a1 (e) | 3.24 ± 3.40 | 2.24 ± 1.49 | 1.97 ± 1.17 | 1.97 ± 1.07 | 10.74 ± 0.04 | 2.79 ± 0.97
>B2b6 (e) | 18.83 ± 8.44 | 17.17 ± 3.63 | 14.43 ± 2.72 | 14.44 ± 2.71 | 24.48 ± 0.36 | 18.75 ± 0.64
>>B2b6a (e) | 14.30 ± 6.91 | 13.94 ± 3.57 | 11.80 ± 2.68 | 11.81 ± 2.62 | 22.23 ± 0.29 | 16.19 ± 0.68
>>>B2b6a1 (c) | 11.33 ± 4.58 | 11.77 ± 3.34 | 10.03 ± 2.54 | 10.04 ± 2.43 | 20.87 ± 0.28 | 14.61 ± 0.49
>>>>B2b6a1a (e) | 11.11 ± 6.42 | 7.95 ± 4.45 | 6.85 ± 3.29 | 6.86 ± 3.21 | 18.07 ± 0.22 | 11.56 ± 0.78
>>B2b6b (e) | 8.91 ± 7.49 | 10.31 ± 4.42 | 8.84 ± 3.18 | 8.84 ± 3.16 | 19.14 ± 0.26 | 12.54 ± 1.61
>>>B2b6b1 (e) | 4.84 ± 3.99 | 4.74 ± 4.81 | 4.10 ± 3.51 | 4.10 ± 3.48 | 14.11 ± 0.14 | 6.60 ± 1.28
>>>>B2b6b1a (e) | 4.55 ± 3.88 | 3.56 ± 2.12 | 3.14 ± 1.54 | 3.09 ± 1.53 | 11.90 ± 0.08 | 4.01 ± 1.12
>node152 | 13.58 ± 5.57 | 16.28 ± 4.09 | 13.68 ± 2.98 | 13.69 ± 2.99 | 22.55 ± 0.25 | 16.72 ± 0.38
>>B2b7 (e) | 7.44 ± 6.02 | 6.75 ± 4.69 | 5.83 ± 3.34 | 5.84 ± 3.34 | 14.82 ± 0.07 | 8.21 ± 0.71
>node195-16390 | 20.16 ± 10.01 | 17.09 ± 4.20 | 14.25 ± 3.20 | 14.26 ± 3.08 | 21.34 ± 0.26 | 15.25 ± 0.60
>>B2b8 (e) | 17.59 ± 10.41 | 14.80 ± 4.34 | 12.51 ± 3.19 | 12.52 ± 3.09 | 19.28 ± 0.25 | 12.89 ± 0.97
>>>B2b8a (e) | 4.33 ± 3.85 | 2.46 ± 1.67 | 2.17 ± 1.25 | 2.17 ± 1.25 | 10.69 ± 0.05 | 2.78 ± 1.67
>B2b9 (e) | 10.86 ± 4.23 | 13.16 ± 3.46 | 10.90 ± 2.36 | 11.16 ± 2.60 | 22.10 ± 0.29 | 15.89 ± 0.60
>>B2b9a (e) | 9.25 ± 6.52 | 10.44 ± 4.42 | 8.30 ± 2.63 | 8.98 ± 3.13 | 17.19 ± 0.19 | 10.86 ± 1.18
>>B2b9b (e) | 3.90 ± 5.76 | 0.00 ± 9.40 | 4.49 ± 1.95 | 0.00 ± 6.81 | 13.30 ± 0.11 | 1.94 ± 3.18
>>B2b9c (e) | 5.22 ± 5.20 | 7.50 ± 4.08 | 6.85 ± 2.26 | 6.47 ± 2.92 | 16.19 ± 0.17 | 6.57 ± 2.95
>B2b10 (e) | 8.97 ± 5.45 | 11.54 ± 5.29 | 9.88 ± 3.69 | 9.89 ± 3.67 | 19.79 ± 0.29 | 13.48 ± 0.79
>>B2b10a (e) | 12.68 ± 7.79 | 9.77 ± 5.14 | 8.32 ± 3.72 | 8.33 ± 3.71 | 17.58 ± 0.25 | 11.07 ± 0.77
>>B2b10b (e) | 2.15 ± 2.25 | 0.65 ± 19.08 | 0.58 ± 27.93 | 0.58 ± 15.27 | 11.67 ± 0.05 | 3.96 ± 1.75
>B2b11 (e) | 17.46 ± 6.69 | 16.14 ± 3.42 | 13.63 ± 2.39 | 13.61 ± 2.59 | 22.78 ± 0.27 | 16.50 ± 0.60
>>>B2b11a (e) | 9.93 ± 7.44 | 8.18 ± 3.64 | 7.01 ± 2.54 | 7.08 ± 2.65 | 14.73 ± 0.11 | 8.38 ± 1.91
>>>>B2b11a1 (e) | 4.33 ± 5.17 | 2.31 ± 2.54 | 1.99 ± 1.84 | 2.01 ± 1.86 | 10.21 ± 0.03 | 2.64 ± 2.91
>>B2b11b (e) | 11.98 ± 7.65 | 12.54 ± 4.85 | 10.70 ± 3.31 | 10.70 ± 3.40 | 18.50 ± 0.19 | 11.95 ± 1.05
>>>B2b11b1 (e) | 9.70 ± 7.76 | 9.39 ± 4.95 | 8.06 ± 3.45 | 8.04 ± 3.46 | 15.40 ± 0.13 | 8.43 ± 1.10
>B2b12 (e) | 18.20 ± 7.69 | 15.90 ± 2.94 | 13.36 ± 2.25 | 13.37 ± 2.27 | 22.32 ± 0.33 | 16.29 ± 0.89
>>B2b12a (e) | 18.30 ± 8.72 | 13.16 ± 3.28 | 11.17 ± 2.38 | 11.18 ± 2.39 | 19.69 ± 0.28 | 13.38 ± 0.97
>>B2b12b (e) | 7.89 ± 6.35 | 11.69 ± 4.64 | 9.99 ± 3.28 | 10.00 ± 3.27 | 18.31 ± 0.24 | 11.77 ± 2.13
>B2b13 (e) | 9.70 ± 6.42 | 13.16 ± 4.77 | 11.21 ± 3.40 | 11.22 ± 3.36 | 18.61 ± 0.20 | 12.32 ± 0.94
B2l | 12.26 ± 7.96 | 18.20 ± 3.46 | 15.31 ± 2.67 | 15.28 ± 2.69 | 22.52 ± 0.27 | 16.48 ± 1.07
>B2l1 (e) | 9.70 ± 6.79 | 16.38 ± 3.61 | 13.81 ± 2.79 | 13.76 ± 2.79 | 20.56 ± 0.22 | 14.24 ± 1.22
>>B2l1a (e) | 3.57 ± 4.68 | 5.18 ± 4.02 | 4.49 ± 2.82 | 4.50 ± 2.83 | 14.68 ± 0.05 | 7.35 ± 2.56
>>>B2l1a1 (e) | 1.10 ± 1.25 | 1.22 ± 1.04 | 1.10 ± 0.78 | 1.10 ± 0.78 | 11.41 ± 0.04 | 3.44 ± 1.50
B2o1 (d) | 19.01 ± 9.03 | 16.85 ± 5.53 | 14.08 ± 3.84 | 14.05 ± 3.85 | 21.73 ± 0.27 | 15.58 ± 1.19
>B2o1a (e) | 8.79 ± 6.63 | 10.08 ± 5.70 | 8.69 ± 4.05 | 8.68 ± 4.05 | 16.83 ± 0.14 | 10.05 ± 2.14
B2q | 15.05 ± 10.33 | 16.22 ± 4.66 | 13.84 ± 3.31 | 13.66 ± 3.32 | 22.31 ± 0.24 | 15.93 ± 0.92
>B2q1 (e) | 4.90 ± 4.75 | 5.03 ± 5.09 | 4.91 ± 3.34 | 4.33 ± 3.58 | 15.96 ± 0.10 | 8.14 ± 1.60
>>B2q1a (e) | 2.40 ± 2.10 | 1.88 ± 0.87 | 2.61 ± 0.78 | 1.65 ± 0.67 | 13.32 ± 0.07 | 4.76 ± 0.93
>>>B2q1a1 (e) | 1.03 ± 1.01 | 0.86 ± 0.87 | 0.90 ± 0.84 | 0.74 ± 0.65 | 10.13 ± 0.03 | 2.06 ± 1.01
B2z (e) | 7.51 ± 6.80 | 9.70 ± 5.42 | 8.33 ± 3.87 | 8.33 ± 3.85 | 19.00 ± 0.23 | 12.40 ± 1.36
>B2z1 (e) | 4.81 ± 4.95 | 5.18 ± 3.48 | 4.54 ± 2.58 | 4.54 ± 2.55 | 15.32 ± 0.07 | 8.18 ± 1.02
>>B2z1a (e) | 2.35 ± 2.54 | 2.39 ± 1.32 | 2.11 ± 0.99 | 2.11 ± 0.99 | 12.37 ± 0.05 | 4.56 ± 0.69
B2aa (e) | 15.05 ± 5.76 | 17.01 ± 4.29 | 14.27 ± 3.12 | 14.29 ± 3.08 | 23.56 ± 0.35 | 17.60 ± 0.73
>B2aa1 (e) | 11.39 ± 4.62 | 12.54 ± 3.26 | 10.54 ± 2.31 | 10.67 ± 2.40 | 20.20 ± 0.29 | 13.69 ± 0.50
>>B2aa1a (e) | 9.25 ± 7.05 | 9.77 ± 3.40 | 7.24 ± 2.10 | 8.37 ± 2.49 | 15.56 ± 0.17 | 8.93 ± 1.64
>B2aa2 (e) | 14.30 ± 7.30 | 14.18 ± 4.05 | 12.01 ± 2.95 | 12.03 ± 2.93 | 20.68 ± 0.28 | 14.38 ± 0.83
B2ab (e) | 13.97 ± 6.95 | 14.57 ± 4.15 | 14.22 ± 2.23 | 12.29 ± 3.00 | 23.20 ± 0.29 | 16.72 ± 0.63
>B2ab1 (e) | 12.67 ± 5.88 | 12.77 ± 4.11 | 12.12 ± 3.02 | 10.81 ± 3.00 | 21.07 ± 0.26 | 14.98 ± 0.54
>>B2ab1a (e) | 9.44 ± 6.60 | 10.46 ± 4.43 | 9.96 ± 3.42 | 8.88 ± 3.16 | 18.22 ± 0.20 | 12.01 ± 0.85
>>>B2ab1a1 (e) | 6.10 ± 5.08 | 6.97 ± 4.43 | 6.40 ± 3.42 | 6.02 ± 3.16 | 14.64 ± 0.11 | 8.04 ± 1.08
B2ac (e) | 19.58 ± 10.17 | 19.00 ± 3.18 | 15.94 ± 2.49 | 15.90 ± 2.52 | 23.31 ± 0.35 | 17.43 ± 0.78
B2ad (e) | 2.59 ± 3.81 | 3.26 ± 2.65 | 2.83 ± 1.90 | 2.83 ± 1.90 | 11.34 ± 0.04 | 3.48 ± 1.87
B2ae (e) | 2.58 ± 2.95 | 2.97 ± 2.47 | 2.62 ± 1.84 | 2.62 ± 1.80 | 11.53 ± 0.04 | 3.62 ± 1.85
B2y2 (e) | 15.47 ± 9.44 | 13.79 ± 4.13 | 10.04 ± 2.62 | 11.62 ± 2.99 | 20.54 ± 0.26 | 14.75 ± 1.07
B2ag (e) | 13.37 ± 5.88 | 16.30 ± 4.18 | 13.68 ± 3.23 | 13.67 ± 3.10 | 22.47 ± 0.30 | 16.51 ± 0.76
>B2ag1 (e) | 20.93 ± 9.79 | 15.04 ± 4.06 | 12.64 ± 3.07 | 12.63 ± 2.96 | 20.66 ± 0.27 | 14.50 ± 0.91
B2ah (e) | 12.45 ± 7.23 | 14.33 ± 4.24 | 12.15 ± 3.04 | 12.14 ± 3.04 | 20.45 ± 0.26 | 14.16 ± 1.35
C1b16 (e) | 5.22 ± 3.89 | 7.88 ± 3.82 | 6.80 ± 2.75 | 6.82 ± 2.75 | 16.53 ± 0.18 | 9.91 ± 1.19
C1b19 | 13.37 ± 8.61 | 15.19 ± 4.73 | 6.26 ± 2.22 | 12.89 ± 3.35 | 18.68 ± 0.23 | 12.86 ± 1.56
C1b21 | 15.23 ± 7.99 | 17.72 ± 4.40 | 14.25 ± 3.25 | 14.82 ± 3.23 | 21.15 ± 0.28 | 14.99 ± 0.93
>C1b21a | 11.99 ± 8.11 | 14.10 ± 4.80 | 11.75 ± 3.34 | 11.96 ± 3.38 | 18.53 ± 0.24 | 11.63 ± 1.66
C1b23 (e) | 1.37 ± 1.07 | 1.44 ± 0.87 | 1.30 ± 0.64 | 1.30 ± 0.63 | 14.79 ± 0.08 | 7.60 ± 0.72
C1b24 (e) | 16.17 ± 9.96 | 19.24 ± 4.24 | 15.85 ± 3.00 | 16.11 ± 3.09 | 21.16 ± 0.28 | 14.68 ± 1.09
C1b25 (e) | 13.37 ± 8.71 | 10.54 ± 3.69 | 8.88 ± 2.60 | 9.00 ± 2.65 | 16.73 ± 0.14 | 10.00 ± 1.27
C1b26 (e) | 25.99 ± 9.32 | 19.64 ± 3.09 | 16.20 ± 2.38 | 16.37 ± 2.45 | 23.34 ± 0.30 | 17.89 ± 0.69
>C1b26a (e) | 23.35 ± 9.03 | 18.28 ± 3.17 | 15.15 ± 2.39 | 15.30 ± 2.46 | 21.81 ± 0.27 | 16.09 ± 0.71
>>C1b26a1 (e) | 22.86 ± 15.82 | 16.93 ± 3.24 | 14.11 ± 2.43 | 14.25 ± 2.48 | 20.49 ± 0.26 | 14.48 ± 0.60
C1b27 (e) | 9.25 ± 7.05 | 12.31 ± 5.22 | 7.02 ± 2.88 | 10.52 ± 3.68 | 15.80 ± 0.14 | 10.81 ± 1.47
C1b28 (e) | 5.22 ± 4.25 | 5.93 ± 3.32 | 5.12 ± 2.41 | 5.16 ± 2.43 | 13.38 ± 0.07 | 6.00 ± 1.14
C1b29 (e) | 11.71 ± 7.48 | 17.25 ± 5.92 | 14.32 ± 4.13 | 14.51 ± 4.15 | 20.24 ± 0.23 | 13.86 ± 1.04
C1d1e (e) | 14.77 ± 6.72 | 15.19 ± 4.35 | 12.72 ± 3.07 | 12.79 ± 3.10 | 21.15 ± 0.24 | 14.96 ± 0.96
C1d1f (e) | 1.29 ± 1.79 | 1.44 ± 1.66 | 1.29 ± 1.22 | 1.29 ± 1.21 | 11.06 ± 0.05 | 3.00 ± 0.29
D4h3a10 (e) | 19.97 ± 9.92 | 19.40 ± 5.50 | 15.73 ± 3.59 | 15.73 ± 3.59 | 21.96 ± 0.06 | 16.16 ± 1.52
D1f | 19.70 ± 4.82 | 26.65 ± 3.59 | 21.86 ± 2.96 | 21.83 ± 3.02 | 27.32 ± 0.36 | 22.11 ± 0.13
>D1f1 | 10.61 ± 6.07 | 12.00 ± 5.49 | 10.23 ± 3.85 | 10.23 ± 3.84 | 18.90 ± 0.27 | 12.36 ± 0.73
>D1f2 | 19.01 ± 13.71 | 14.72 ± 5.85 | 12.40 ± 4.08 | 12.41 ± 4.08 | 19.05 ± 0.21 | 12.44 ± 1.25
>D1f3 | 13.37 ± 8.61 | 25.75 ± 6.55 | 20.96 ± 5.14 | 20.93 ± 5.16 | 25.18 ± 0.34 | 19.59 ± 0.17
D1k | 24.16 ± 9.72 | 27.90 ± 8.91 | 22.78 ± 6.05 | 22.74 ± 5.95 | 25.52 ± 0.37 | 19.72 ± 0.54
>D1k1 (e) | 23.35 ± 8.68 | 20.77 ± 7.57 | 17.17 ± 5.18 | 17.16 ± 5.07 | 22.59 ± 0.32 | 16.37 ± 0.66
>>D1k1a (e) | 19.01 ± 9.50 | 17.72 ± 6.89 | 14.76 ± 4.76 | 14.76 ± 4.68 | 19.46 ± 0.27 | 12.84 ± 0.51
D1o (e) | 19.73 ± 9.22 | 24.10 ± 4.83 | 19.37 ± 3.71 | 19.84 ± 3.56 | 23.15 ± 0.35 | 17.34 ± 1.24
D1p (e) | 12.45 ± 7.67 | 15.67 ± 6.35 | 13.27 ± 4.38 | 13.26 ± 4.39 | 19.72 ± 0.22 | 13.29 ± 1.68
D1q (e) | 20.45 ± 9.14 | 22.55 ± 4.60 | 18.63 ± 3.39 | 18.61 ± 3.41 | 23.36 ± 0.37 | 17.44 ± 0.80
>D1q1 (e) | 11.53 ± 6.48 | 13.94 ± 4.86 | 11.79 ± 3.43 | 11.79 ± 3.45 | 19.17 ± 0.30 | 12.73 ± 1.04
D1r (e) | 12.26 ± 9.86 | 11.08 ± 5.00 | 9.44 ± 3.59 | 9.44 ± 3.56 | 17.73 ± 0.22 | 11.09 ± 1.37
D1u1 (e) | 11.16 ± 7.69 | 13.94 ± 6.95 | 12.63 ± 6.16 | 11.73 ± 4.81 | 18.27 ± 0.26 | 12.28 ± 2.17
D1s (e) | 10.61 ± 6.57 | 17.25 ± 6.40 | 14.46 ± 5.00 | 14.47 ± 4.48 | 20.81 ± 0.26 | 14.46 ± 1.98
>D1s1 (e) | 16.17 ± 5.83 | 10.31 ± 5.16 | 8.80 ± 3.77 | 8.81 ± 3.62 | 16.22 ± 0.17 | 9.47 ± 2.60
D1t (e) | 22.86 ± 9.87 | 25.00 ± 4.16 | 20.57 ± 3.19 | 20.55 ± 3.23 | 23.76 ± 0.34 | 17.94 ± 0.82

(a) The underlined sub-haplogroups are restricted to South America.
(b) Coalescence ages of macro-haplogroups M (including haplogroups C1b, C1c, C1d, D4h3a and D1) and N (including haplogroups A2 and B2), used as priors, are those reported in Behar et al. (2012a).
(c) Radiocarbon calibrated ages of ancient mitogenomes are used as priors.
(d) The sub-haplogroup nomenclature differs from that reported in PhyloTree (http://www.phylotree.org/).
(e) Sub-haplogroups defined for the first time in this study.

As expected, the ML age estimates of virtually all sub-haplogroups are lower than 16 Kya (Table 4.6), the postulated entry time of Paleo-Indians into North America from Beringia. The finding of some sub-haplogroups that are apparently older than 16 Kya can be attributed to either (i) sequencing mistakes or (ii) unstable mutations at the sub-haplogroup root. For instance, the ages of haplogroup D1k (22.78 ± 6.05 Kya) and its sub-clade D1k1 (17.17 ± 5.18 Kya) are probably inflated by sequencing mistakes, as suggested by the presence of several suspect reversions in the previously published mitogenomes from Peru (Figure 4.1). The second scenario is instead probably represented by haplogroup D1f (21.86 ± 2.96 Kya), which is defined at its root by only one control-region mutation (np 16142). This implies that mitogenomes that are not truly phylogenetically related to each other might have been improperly clustered within D1f, which could therefore be polyphyletic. The sub-haplogroups with ML ages >13 Ky are listed in Table 4.7. These include, as expected, most of those also detected in North America (A2k, A2z, A2ar, B2b, B2l, B2q, B2aa, D1f, D1k), but also 20 sub-haplogroups whose distributions are restricted to South America. Some of these have ages that are overestimated (D1o, D1t), probably for the reasons mentioned above, but their geographical distributions indicate that they most likely arose in situ in South America, possibly shortly after human entry into the subcontinent. Among these ancient South American branches there are also three B2b branches (B2b6, B2b11, B2b12), which have so far been reported only in the Andean area and are dated to 13-14 Kya (Table 4.7).
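The comparison against the 16 Kya entry time can be sketched for a handful of clades as follows. The values are copied from the ML column of Table 4.6 that uses one partition with M, N and ancient estimates as priors; the helper names are hypothetical, and treating the ± term as a symmetric interval around the point estimate is an assumption made only for illustration.

```python
# (ML age, ± term) in Ky, copied from Table 4.6
# (ML, one partition, with M, N and ancient estimates as priors)
ml_ages = {
    "A2k": (15.53, 3.21),
    "B2b": (15.83, 1.87),
    "B2b6": (14.43, 2.72),
    "C1b26": (16.20, 2.38),
    "D1f": (21.86, 2.96),
    "D1k": (22.78, 6.05),
    "D1k1": (17.17, 5.18),
}
ENTRY_KYA = 16.0  # postulated entry time quoted in the text

def older_than(ages, threshold):
    """Clades whose point estimate exceeds the threshold."""
    return [name for name, (age, _) in ages.items() if age > threshold]

def clearly_older_than(ages, threshold):
    """Clades still above the threshold after subtracting the ± term
    (assumption: the ± term is read as a symmetric interval)."""
    return [name for name, (age, err) in ages.items() if age - err > threshold]

print(older_than(ml_ages, ENTRY_KYA))
print(clearly_older_than(ml_ages, ENTRY_KYA))
```

In this subset only D1f and D1k remain above 16 Kya once the ± term is subtracted, which matches the two clades singled out in the text as likely overestimates.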


Table 4.7. ML age estimates and distribution of the sub-haplogroups older than 13 Ky.

Sub-haplogroup (a) | N (b) | ML age estimates (Ky) (c) | Geographical distribution
A2k | 19 | 15.53 ± 3.21 | USA, Mexico, Wayuu (Colombia/Venezuela), Ecuador, Peru
A2y (d) | 15 | 15.20 ± 3.24 | Ecuador, Peru
A2z (d) | 13 | 14.35 ± 3.70 | USA, Puerto Rico, Peru
A2ar | 5 | 14.74 ± 2.72 | Guatemala, Ecuador, Peru
A2as (e) | 6 (2) | 14.41 ± 3.45 | Peru
A2at (e) | 10 (2) | 15.50 ± 2.81 | Peru
B2b | 146 (11) | 15.83 ± 1.87 | USA, Puerto Rico, Mexico, Colombia, Bolivia, Brazil, Venezuela, Ecuador, Peru
B2b6 (e) | 32 | 14.43 ± 2.72 | Ecuador, Peru
B2b11 (e) | 15 (4) | 13.63 ± 2.39 | Peru
B2b12 (e) | 7 | 13.36 ± 2.25 | Ecuador, Peru
B2l | 10 | 15.31 ± 2.67 | Mexico, Ecuador
B2o1 (d) | 5 | 14.08 ± 3.84 | Bolivia, Ecuador
B2q | 20 | 13.84 ± 3.31 | USA, Mexico, Ecuador, Peru
B2aa (e) | 15 (5) | 14.27 ± 3.12 | Mexico, Ecuador, Peru
B2ab (e) | 15 (1) | 14.22 ± 2.23 | Bolivia, Peru
B2ac (e) | 5 | 15.94 ± 2.49 | Peru
B2ag (e) | 7 | 13.68 ± 3.23 | Peru
C1b21 | 5 (2) | 14.25 ± 3.25 | Peru
C1b24 (e) | 4 | 15.85 ± 3.00 | Peru
C1b26 (e) | 5 | 16.20 ± 2.38 | Ecuador, Peru
C1b29 (e) | 5 | 14.32 ± 4.13 | Ecuador
D4h3a10 (e) | 3 | 15.73 ± 3.59 | Peru
D1f | 25 | 21.86 ± 2.96 | USA, Brazil, Colombia, Ecuador, Peru
D1k | 9 | 22.78 ± 6.05 | USA, Mexico, Peru
D1k1a (e) | 4 | 14.76 ± 4.76 | Peru
D1o (e) | 5 (1) | 19.37 ± 3.71 | Peru
D1p (e) | 3 | 13.27 ± 4.38 | Peru
D1s (e) | 4 | 14.46 ± 5.00 | Peru
D1t (e) | 3 | 20.57 ± 3.19 | Peru

(a) The underlined sub-haplogroups are restricted to South America.
(b) Number of all mitogenomes included in the analysis (both modern and ancient). The number of ancient mitogenomes is in brackets.
(c) ML estimates are those obtained using one partition, the coalescence ages of macro-haplogroups M and N (Behar et al., 2012a) and the radiocarbon calibrated ages of ancient mitogenomes as priors.
(d) The sub-haplogroup nomenclature differs from that reported in PhyloTree (http://www.phylotree.org/).
(e) Sub-haplogroups defined for the first time in this study.

4.4. Discussion

The analysis of the mtDNA variation in Ecuador and Peru at the level of mitogenomes (N=643) attested the presence of all pan-American haplogroups (Figure 4.1), and revealed 50 novel clades/sub-clades (in blue in Figure 4.1) with a wide range of coalescence ages, some overlapping the postulated arrival times of the first South Americans. The reliability of the coalescence ages of the oldest South American haplogroups identified in this study (Table 4.7) is supported by the Bayesian skyline plot (BSP) analyses, which were run for both Ecuadorean and Peruvian samples. Indeed, as shown in Figure 4.4, the BSPs indicate an increase in population size between 16 and 13 Kya.

[Figure 4.4, panels a and b. Y-axis: effective population size (log scale, 10 to 1,000,000); x-axis: time (ky), 0 to 25.]

Figure 4.4. Bayesian skyline plots (BSPs) based on all the available Ecuadorean (a) and Peruvian (b) modern mitogenomes. Mitogenomes of Old World ancestry were excluded. The thick solid lines are the median estimates and the blue shadings show the 95% highest posterior density limits.


The geographically restricted distributions and the estimated ages (Figure 4.1 and Table 4.5) of most of the new sub-haplogroups indicate that their mutational motifs probably arose in situ in South America. On the other hand, some sub-haplogroups were found to also include representatives from North America, and thus they most likely arose in North America and later entered South America. Moreover, as expected, the Ecuadorean and Peruvian mitogenomes fell neither into haplogroups previously defined as North American-specific (i.e. A2a, A2b, C4c, D2a, D3, D4e1, X2a and X2g) nor into those classified as Southern Cone-specific (i.e. C1b13, B2i2, D1g and D1j). These observations agree with the hypotheses that the carriers of the former haplogroups might have entered North America through the ice-free corridor and did not spread into the subcontinent (Perego et al., 2009; Perego et al., 2010), while the latter haplogroups most likely arose in the Southern Cone regions and crossed the Andes at low latitudes (Bodner et al., 2012; de Saint Pierre et al., 2012). New information on the north-to-south expansion and the possible main entry routes of the first South Americans is provided in particular by those haplogroups that are shared between North and South America (Figure 4.3). The geographical distributions of the B2b and D1f mitogenomes appear to support an early population split of the first settlers in the northern part of South America, followed by diffusion along both the Pacific and Atlantic regions (Figure 4.5). In particular, the age estimates of B2b and its derivatives suggest that the early Paleo-Indian carriers of B2b might have moved from North America to the regions corresponding to modern Ecuador and Peru within a short time frame of only ~1.5 Ky, between 15.8 ± 1.9 Kya (the overall coalescence age of B2b) and 14.4 ± 2.7 Kya (the coalescence age of B2b6, the oldest South American-specific branch of B2b). 
On the other hand, the distributions of A2z and B2aa, which are less common and have so far been found only in the Andean region (panels C and D, Figure 4.3), suggest that these ~14 Ky old clades of North American origin might have expanded into the South only through the Pacific coastal route (Table 4.6). An early settlement of South America is also supported by the estimated ages of the several sub-haplogroups whose distributions are restricted to the subcontinent (i.e. A2y, A2as, A2at, B2b6, B2b11, B2b12, B2o1, B2ab, B2ac, B2ag, C1b21, C1b24, C1b26, C1b29, D4h3a10, D1k1a, D1o, D1p, D1s, D1t) and that most likely arose in situ in South America. Indeed, the coalescence ages of the oldest of these attest to the presence of the first colonizers as early as ~15 Kya (Table 4.7).
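As a rough arithmetic check of the interval quoted above for B2b, the gap between the two ML ages in Table 4.7 can be computed directly. This is only a sketch: the combined uncertainty below assumes that the two estimates are independent, which is not strictly the case because B2b6 is nested within B2b, so it is shown merely to indicate the scale of the uncertainty around the interval.

```python
from math import sqrt

# ML age and ± term (Ky), copied from Table 4.7
b2b = (15.83, 1.87)    # whole haplogroup B2b
b2b6 = (14.43, 2.72)   # B2b6, oldest South American-specific branch

# Point estimate of the time between the two coalescence ages
gap_ky = b2b[0] - b2b6[0]

# Naive combined ± term, assuming independent estimates (an assumption)
naive_err_ky = sqrt(b2b[1] ** 2 + b2b6[1] ** 2)

print(f"gap = {gap_ky:.2f} Ky (naive ± {naive_err_ky:.2f} Ky)")
```

The point difference is about 1.4 Ky, in line with the ~1.5 Ky quoted in the text, while the uncertainty attached to each age is larger than the gap itself.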


[Figure 4.5: map labelled with B2b and D1f, per-region numeric annotations, and the Monte Verde site.]

Figure 4.5. Migration route model as suggested by the geographical distributions of B2b and D1f mitogenomes.

4.5. Conclusion

In conclusion, by surveying the mitogenomic variation in the two Andean regions, Ecuador and Peru, we enriched the global phylogeny of Native Americans with 50 additional sub-haplogroups never described before, often restricted to South America. Our phylogeographic analyses confirmed that the founder population(s) that moved from Central America and first entered the subcontinent split at the early stages of the colonization process, with some derived groups following the Pacific coastal route and others spreading eastward, possibly along the Atlantic coast. Our findings confirm the scenario that the north-to-south expansion may have been extremely rapid, taking place within ~1.5 Ky, as indicated by the ages of haplogroup B2b and its sub-clades, and that, although some of the initial founders employed an eastern route, the Pacific coastal route played a major role in the process of human entry and spread into South America.


5. The worldwide spread of the tiger mosquito

5.1. Background

The genus Aedes includes five highly invasive species, Ae. albopictus, Ae. aegypti, Ae. j. japonicus, Ae. koreicus and Ae. atropalpus. Of these, Ae. albopictus and Ae. j. japonicus are the most widespread across the globe, and Ae. aegypti and Ae. albopictus, being competent vectors for several human tropical diseases, have a major impact on human health. The North American species, Ae. atropalpus, arrived in Europe (Italy, France and the Netherlands) through international trade (Romi et al., 1997), but it was subsequently eradicated in Italy and France and is unlikely to have become established in the Netherlands due to climatic conditions (Scholte et al., 2009). Aedes j. japonicus and Ae. koreicus, native to East Asia (Japan, Korea, China), have both colonized central Europe, and Ae. j. japonicus is also widely distributed in the US (Medlock et al., 2015). Aedes aegypti originated in sub-Saharan Africa and is now considered the main vector of dengue. Climate appears to be the decisive factor limiting the distribution of Ae. aegypti to tropical and sub-tropical regions, with few incursions into Europe and North America (Powell and Tabachnick, 2013; Khormi and Kumar, 2014). In contrast, Ae. albopictus, indigenous to East Asia, is not so restrained by climatic factors, and in the last 40 years has successfully colonized the tropical and temperate regions of all continents (Benedict et al., 2007; Paupy et al., 2009; Bonizzoni et al., 2013; Kraemer et al., 2015) (Figure 5.1). This mosquito has become a growing public health concern, being a competent vector for many arboviruses that cause lethal or debilitating human diseases, including the dengue (DEN), chikungunya (CHIK) and West Nile (WN) viruses (Gasperi et al., 2012; Bonizzoni et al., 2013) and the recently emerged Zika virus (Wong et al., 2013; Chouin-Carneiro et al., 2016). Although it has been considered a less efficient vector than Ae. 
aegypti, this species is the sole vector of recent DENV outbreaks in Southern China, Hawaii, the Indian Ocean and Gabon and of the first autochthonous DENV transmission in France and Croatia (Paupy et al., 2009; Wu et al., 2010; Peng et al., 2012; Rezza, 2012; Rezza, 2014; Schaffner et al., 2013). Ae. albopictus was most likely the main DENV vector in Asia prior to the introduction of Ae. aegypti in the mid-nineteenth century (Gubler, 2006). Introduced into Europe (Albania) in 1979, Ae. albopictus has now colonized all Mediterranean countries from Spain to Syria and has been reported in Central Europe (Medlock et al., 2012). It was introduced into Hawaii at the end of the nineteenth century (Rai, 1991) and into continental USA (Texas) in 1985, and is now well established in 32 states. Its presence has been reported in Mexico (first recorded in 1988), Central and South America (first recorded in Brazil, 1986) and Africa (first recorded in South Africa, 1989) (Bonizzoni et al., 2013). Its ability to spread from the native range and adapt to local environments is probably due to its ecological characteristics: drought-resistant eggs with the ability to diapause, daylight biting habit, aggressive and opportunistic feeding behavior and capacity to achieve high population densities (Paupy et al., 2009).

The nuclear genomes of two laboratory strains of the tiger mosquito, the Italian Fellini (an isofemale line derived from the Rimini strain) (Bellini et al., 2007; Dritsou et al., 2015) and the Chinese Foshan strain (Chen et al., 2015), were recently published, but only two complete mitogenomes (~16.7 kb) are available in GenBank, from Taiwan (Asian tiger mosquito Reference Sequence, NC006817) and from Nanjing, Jiangsu Province, China (KR068634 (Zhang et al., 2015)). Despite the availability of mitogenomes, virtually all Ae. albopictus mtDNA surveys were restricted to short segments of the cytochrome c oxidase subunit 1 (COI) and/or NADH dehydrogenase subunit 5 (ND5) genes, suggesting a limited phylogeographic differentiation among populations, possibly also caused by the inclusion in these studies of laboratory stocks or sibling eggs (Delatte et al., 2011; Kamgang et al., 2011; Kamgang et al., 2013; Porretta et al., 2012; Zhong et al., 2013; Zawani et al., 2014; Futami et al., 2015) and the postulated cytoplasmic sweep caused by Wolbachia infection (Armbruster et al., 2003). However, more extensive sequencing of the COI gene has revealed more variation than previously thought (Goubert et al., 2016), a scenario also partially supported by microsatellite studies that highlighted slight genetic diversity between native and adventive populations with high variability within populations (Manni et al., 2015). Previous studies have shown that the variation seen in short mtDNA segments may be inadequate to both identify and phylogenetically link haplogroups (Torroni et al., 2006; Achilli et al., 2008a; Achilli et al., 2012). 
This is especially true for the insect mtDNA control region due to its peculiar features: high A+T content and reduced substitution rate, variable size and high length mutation rate, concerted evolution of tandem repeats and directional mutation pressure (Zhang and Hewitt, 1997). To identify the ancestral source(s) of Ae. albopictus adventive populations, overcoming previous limitations, in this study we determined and analyzed the sequence variation at the level of entire coding regions of 27 mitogenomes (25 novel and two previously published) from eastern and southeastern Asian, American and European populations. Our analyses reveal that only three of the five identified Asian haplogroups, which are differentially distributed in Asian populations living in temperate and tropical regions, were involved in the recent worldwide spread. These different ancestral sources from Asia now coexist in many adventive populations with possible implications for the adaptive capability of the species.


Figure 5.1. Map of occurrence points for Ae. albopictus (Kraemer et al., 2015).

5.2. The sample

A total of 25 novel mitogenomes were included in this study. Twenty-two were from wild populations collected in Europe, Asia and the Americas (Table 5.1). Three were from the Americas (two from Virginia and one from Brazil). Nine were from Asia: three from Thailand (one from Hang Chat district, Lampang province in the North; one from Ban Rai district, Uthai Thani province in the West; one from Phato district, Chumphon province in the South), five from Los Baños, Laguna, Philippines and one from Wakayama prefecture, Japan. Ten were from Europe: two from Tirana (Albania), two from Athens (Greece), two from Cesena and two from Pavia (Northern Italy), one from Cassino (Central Italy), and one from Reggio Calabria (Southern Italy). This study also included three adult mosquitoes from laboratory-maintained strains: two from the Italian Rimini strain (Bellini et al., 2007; Manni et al., 2015), established at CAA (Centro Agricoltura Ambiente ‘G. Nicoli’, Crevalcore, Italy) from mosquitoes collected in Rimini, Italy, and one from the Chinese Foshan strain (Center for Disease Control and Prevention of Guangdong Province) (Table 5.1). The study did not involve protected species and specimens were not collected at sites protected by law.

Table 5.1. Origin and haplogroup affiliation of Ae. albopictus mitogenomes considered in this study.

| ID#^a | Original name | Continent | Country (place of collection) | Haplogroup | GenBank ID | Number of Type I repeats^b | Number of Type II repeats^b | References |
|---|---|---|---|---|---|---|---|---|
| 1 | Rim1^c | Europe | Italy, Rimini | A1a1a1 | KX383916 | 7 | 5 | this study |
| 2 | Vir1 | America | US, Virginia | A1a1a1a1 | KX383917 | N.D. | 6 | this study |
| 3 | Rc1 | Europe | Italy, Reggio Calabria | A1a1a1a1 | KX383918 | N.D. | 6 | this study |
| 4 | Vir2 | America | US, Virginia | A1a1a1a1 | KX383919 | N.D. | 6 | this study |
| 5 | Ces1 | Europe | Italy, Cesena | A1a1a1a1 | KX383920 | N.D. | 6 | this study |
| 6 | Cas1 | Europe | Italy, Cassino | A1a1a1a | KX383921 | N.D. | 6 | this study |
| 7 | Pav3 | Europe | Italy, Pavia | A1a1a1a | KX383922 | N.D. | 6 | this study |
| 8 | Ces2 | Europe | Italy, Cesena | A1a1 | KX383923 | N.D. | 4 | this study |
| 9 | Bra | America | Brazil | A1b | KX383924 | N.D. | N.D. | this study |
| 10 | Lam2 | Asia | Thailand, Lampang, Hang Chat | A1b1a | KX383925 | N.D. | 3 | this study |
| 11 | Ban7 | Asia | Thailand, Uthai Thani, Ban Rai | A1b1a | KX383926 | N.D. | 3 | this study |
| 12 | Ath1 | Europe | Greece, Athens | A1b1a | KX383927 | N.D. | 3 | this study |
| 13 | Chu3 | Asia | Thailand, Chumphon, Phato | A1b1 | KX383928 | N.D. | 3 | this study |
| 14 | Rim4^c | Europe | Italy, Rimini | A1a2a1 | KX383929 | N.D. | 4 | this study |
| 15 | Tir1 | Europe | Albania, Tirana | A1a2a1 | KX383930 | N.D. | 4 | this study |
| 16 | Tir2 | Europe | Albania, Tirana | A1a2a1 | KX383931 | N.D. | 4 | this study |
| 17 | - | Asia | China, Jiangsu, Nanjing | A1a2a1 | KR068634 | 5 | 4 | Zhang et al., 2015 |
| 18 | Ath2 | Europe | Greece, Athens | A1a2a | KX383932 | N.D. | 4 | this study |
| 19 | Pav4 | Europe | Italy, Pavia | A1a2a | KX383933 | N.D. | 4 | this study |
| 20 | Fo2^c | Asia | China, Foshan | A1a2 | KX383934 | N.D. | 4 | this study |
| 21 | Los1 | Asia | Philippines, Laguna, Los Baños | A2a | KX383935 | N.D. | 3 | this study |
| 22 | Los2 | Asia | Philippines, Laguna, Los Baños | A2a | KX809761 | N.D. | 3 | this study |
| 23 | Los3 | Asia | Philippines, Laguna, Los Baños | A2a | KX809762 | N.D. | 3 | this study |
| 24 | Los5 | Asia | Philippines, Laguna, Los Baños | A2a | KX809764 | N.D. | 3 | this study |
| 25 | Los4 | Asia | Philippines, Laguna, Los Baños | A2 | KX809763 | N.D. | 3 | this study |
| 26 | J-Wa1 | Asia | Japan, Wakayama | A1a1a | KX809765 | N.D. | 4 | this study |
| 27 | - | Asia | Taiwan, Taipei | A3 | NC006817 | 5 | 4 | - |

^a ID numbers correspond to those in Figure 5.3.
^b N.D. = not determined.
^c Laboratory-maintained strain.


Morphological keys (Rueda, 2004) and/or PCRs with species-specific primers for the internal transcribed spacer regions (ITS1 and ITS2) of ribosomal DNA (rDNA) (Higa et al., 2010) were used to identify the specimens. For the Philippine samples, eggs were collected using ovitraps, and the emerging adults were reared in an insectary under standard conditions of temperature (27°C), humidity (60-80%) and photoperiod (12:12 h). Samples were preserved in 80% ethanol and stored at -20°C until DNA extraction. DNA extraction was carried out using the Wizard Genomic DNA Purification Kit (Promega) following the manufacturer’s protocol.

5.3. Results

5.3.1. The mtDNA control region of Ae. albopictus

The first step in our study was to sequence the entire mtDNA (mitogenome #1) from one mosquito of the Italian (Rimini) laboratory-maintained strain (Figure 5.3 and Table 5.1), whose nuclear genome was recently sequenced (Dritsou et al., 2015). This mitogenome sequence confirmed that the Ae. albopictus control region belongs to group 2 of insect control regions (Zhang and Hewitt, 1997), like that of Ae. aegypti (Dueñas et al., 2006). A schematic representation of the control region (D-loop) is reported in Figure 5.2. Three conserved blocks are positioned along the region, which contains two different types (I and II) of tandem repeats. Type I consists of a ~190 bp repeat unit, whereas type II is made up of a short unit of ~42 bp. The numbers of type I and type II tandem repeats that we observed in mitogenome #1 differed from those in sequence NC006817 from Taiwan (mitogenome #27 in Figure 5.2 and Figure 5.3) and varied among mtDNAs (Table 5.1 and data not shown). Moreover, between these two types of repeats, delimited by two conserved blocks, lies an A+T-rich region of variable length. The overall length and tandem repeat composition make PCR amplification and sequencing of the entire Ae. albopictus control region extremely difficult. For these reasons, and because heteroplasmy cannot be distinguished from PCR artefacts (due to replication slippage), we restricted our sequencing of Ae. albopictus mitogenomes to the coding region (from np 1 to np 14893, NC006817) and nearby control-region segments. This approach was employed to obtain the coding-region sequences of the additional 24 mitogenomes.
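To give a rough sense of how much the tandem-repeat copy number alone can change the size of the control region, the short Python sketch below multiplies the approximate unit lengths quoted above (~190 bp and ~42 bp) by the copy numbers listed in Table 5.1 for mitogenomes #1 and #27. It is only an illustration: the function name is hypothetical, the unit lengths are rounded values, and the exact length of each repeat differs slightly from unit to unit.

```python
# Approximate repeat-unit sizes quoted in the text (bp); exact unit lengths vary.
TYPE_I_UNIT_BP = 190
TYPE_II_UNIT_BP = 42


def tandem_array_length(n_type_i: int, n_type_ii: int) -> int:
    """Approximate combined length (bp) of the type I and type II repeat arrays."""
    return n_type_i * TYPE_I_UNIT_BP + n_type_ii * TYPE_II_UNIT_BP


# Copy numbers from Table 5.1: mitogenome #1 has 7 type I and 5 type II repeats,
# mitogenome #27 has 5 type I and 4 type II repeats.
rimini = tandem_array_length(7, 5)
taiwan = tandem_array_length(5, 4)
difference = rimini - taiwan

print(f"#1: ~{rimini} bp, #27: ~{taiwan} bp, difference: ~{difference} bp")
```

Under these rounded values the two repeat arrays alone differ by several hundred base pairs between the two mitogenomes, which is consistent with the difficulty of amplifying the whole region with a single PCR design.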

Figure 5.2. Organization of tandem repeats in the control region of Ae. albopictus mitogenomes. This schematic representation is based on the complete sequence information of mitogenomes #1 (Rimini strain) and #27 (Figure 5.3 and Table 5.1). Two types of repeats (I and II) were observed with copy number differing for both repeats in the two mitogenomes. Type I consists of a ~190 bp repeat unit whereas type II is made up of a shorter unit of ~42 bp. The exact length (in bp) of each repeat is reported inside the box. The size of boxes representing repeats is not proportional to repeat lengths. Shown nucleotide positions are relative to sequence #1. Variation in size due to variable number of tandem repeats in the two regions, when available, is provided in Table 5.1. Black diamonds represent in the order from left to right: a poly-T stretch motif, a poly-T stretch motif followed by a GC-rich block, and a short poly-A stretch motif. All of these are conserved in different mosquito species (Dueñas et al., 2006).

5.3.2. The phylogeny

Figure 5.3 illustrates the Bayesian and Most Parsimonious (MP) trees derived from the coding regions of 27 Ae. albopictus mitogenomes (25 novel and two previously published). The overall tree structure is virtually identical with the two approaches and is supported by the ML tree (Figure 5.4), indicating a high degree of internal consistency for all major branches. A magnified MP tree is shown in the lower part of Figure 5.3 in order to illustrate the branch location of the identified mutations. The mitogenomes (24 distinct haplotypes) cluster into three major branches that we named haplogroups A1, A2 and A3. Haplogroup A1 includes 21 mitogenomes, A2 consists of the five Philippine samples, while A3 encompasses only one mitogenome (#27) from Taiwan. Haplogroups A1 and A2 are rather close to each other and to the A1'2 node. In contrast, the A3 mitogenome differs by 363 mutations from the same node. Overall, this phylogeny reveals an extensive and previously unreported mitogenome differentiation within Ae. albopictus. Indeed, when calculated on the standard COI sequence (Hebert et al., 2003) employed for DNA barcoding, the maximum intraspecific divergence was 0.012 (8 mutations in 658 bp), a value in line with those recently reported for Ae. scutellaris (0.008) and Ae. aegypti (0.022), but much greater than the value previously reported for Ae. albopictus (0.002) (Sumruayphol et al., 2016). Haplogroup A1, which encompasses most of the mitogenomes in the phylogeny, is subdivided into two branches that we termed A1a and A1b, with the former further split into A1a1 and A1a2. In our phylogeny these clades and subclades appear to be correlated with different geographic distributions.
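The divergence value quoted above can be reproduced as a simple proportion of differing sites. The Python sketch below assumes that the reported figure is an uncorrected pairwise distance (differences divided by compared sites); this is an assumption, since the text does not state whether a substitution model was applied, and the function names are illustrative.

```python
def count_differences(seq_a: str, seq_b: str) -> tuple:
    """Return (differences, compared sites) for two aligned sequences.

    Sites where either sequence carries a gap ('-') or an unresolved
    base ('N') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def p_distance(differences: int, sites: int) -> float:
    """Uncorrected distance: proportion of compared sites that differ."""
    if sites <= 0:
        raise ValueError("no comparable sites")
    return differences / sites


# Values quoted in the text: 8 mutations in the 658 bp COI barcode fragment.
reported = p_distance(8, 658)
print(round(reported, 3))
```

With real data, `count_differences` would be applied to each pair of aligned COI barcode sequences and the maximum of the resulting distances retained.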


Figure 5.3. Phylogeny of Ae. albopictus mitogenomes. The Bayesian (left) and MP (right) trees are shown in the top inset. The posterior probability for the major nodes in the Bayesian tree is reported in red, whereas bootstrap values (1000 replications) are reported in red in the MP tree. These trees encompass 25 novel and two previously published sequences (Table 5.1). A magnified MP tree is also shown reporting all mutations that characterize the 27 mitogenomes except those linking mitogenome #27 to the A1'2 node (see below). The asterisk (*) indicates the location of these mutations (347 in the coding region and 16 in the control region), which are listed below. For the phylogeny construction, the entire coding-region variation of all mitogenomes was included as well as some control-region mutations (see Material and Methods). The published sequences (#17 and #27, Table 5.1) are from the Jiangsu Province, China (Zhang et al., 2015) and from Taipei, Taiwan, respectively. The mitogenome from a mosquito of the Italian Rimini strain (#1, marked by the arrow) was employed to number the mutations shown on the branches. Mutations are transitions unless a base is explicitly indicated for transversions (to A, G, C, or T) or a suffix for indels (.1, d). Heteroplasmic positions are marked by an ‘h’. Recurrent mutations within the phylogeny are underlined (and in italic if present in mitogenome #27) and back mutations are marked with the suffix @. The numerous mutations shared only by the published mitogenomes #17 and #27 are marked with the suffix §. Considering also that, despite their extensive coding-region differentiation, mitogenomes #17 and #27 intriguingly harbor virtually identical control-region sequences, it is likely that at least some of the mutations marked with the suffix § are errors. Colors illustrate geographic origins. Table 5.1 provides additional information concerning the geographic origin and haplogroup affiliation of each sample. Length variation (insertions/deletions) in a poly-A stretch beginning at np 3808 was not considered. The sub-haplogroup A1a1a1 and its derivatives of probable North American origin are encircled. There are 347 coding-region mutations (nps 1-14896) as well as 16 control-region mutations (14897-17150) that separate mitogenome #27 from the A1'2 node. The location of these mutations is indicated by the asterisk. These control-region mutations were included because they were assessed in all mitogenomes (except #9 and #13). The 363 mutations are listed below and are numbered relative to mitogenome #1: Coding-region mutations: 143.1A 496 1423.1C 1503 1578C 1676C 1704 1964 2939 2982 2983 2984 3038.1TTAA 3196 3751A 3759T 3774-81inv 3992G 4085 4443 4470 5167C 5168T 5179 5187T 5197 5202 5204A 5216A 5222A 5244 5246A 5264 5289T 5290C 5294T 5303A 5311A 5313 5355A 5378A 5396 5404 5408T 5458A 5481 5483A 5514d 5593C 6191C 6511A 6512A 6514 6518T 6520A 6528G 6535C 6536C 6547A 6548 6550T 6551 6552 6556 6559C 6565T 6573T 6583G 6584 6586A 6589A 6591A 6596 6602A 6610A 6619A 6620A 6621 6624A 6625A 6636T 6643A 6650C 6651 6654G 6664 6667 6670A 6671A 6673A 6676G 6684A 6685T 6687T 6691A 6700 6708A 6712 6714A 6717 6719T 6736T 6737G 6738 6739 6757 6761 6772G 6775T 6784 6792C 6794 6803 6820A 6821 6822 6823 6826C 6830G 6831 6841A 6843A 6846G 6852T 6853 6854C 6855G 6857C 6868A 6892G 6898T 6910A 6919T 6951C 6957 6964A 6974 6994C 7003 7006 7020 7021C 7039A 7044 7054A 7060T 7078A 7079 7085 7086 7087 7093T 7096T 7118 7123 7132A 7133 7134A 7138T 7156G 7158A 7168T 7180A 7198
7213A 7222A 7234C 7249A 7267 7294 7303A 7315T 7320A 7324 7327A 7329 7345T 7349T 7350T 7351 7352C 7353A 7360A 7362 7366A 7369 7371 7378 7391A 7392T 7665A 8032T 8055G 8056T 8057A 8077C 8078A 8079T 8089 8099d 8101T 8102 8103C 8137.1A 8178 8348 8469 8934 8952T 8958 8959C 8965A 8970T 8971A 8973A 8974A 8975A 8976T 8978T 8979T 8980A 8981A 8985 8986C 8989T 8990A 8992T 8994T 8995A 8996A 8999A 9000A 9001 9006T 9007T 9008 9009T 9010A 9011 9013A 9014G 9016A 9018C 9030 9042 9048G 9060G 9081C 9096 9100 9102G 9104T 9267 9274A 9324 9704 9808d 9809C 9817 9831 9832 9926.1T 9950 9951 9952T 9991 9992G 10031T 10034 10036G 10092T 10099A 10100T 10101 10121 10133T 10136 10152T 10155 10158A 10160A 10167C 10173A 10175T 10177G 10178C 10179A 10182T 10183A 10185T 10186 10187 10188A 10189 10193 10194C 10195A 10196T 10198A 10200T 10203A 10213A 10808 10858A 10867A 10870 10876G 10917A 10918T 10922C 10982A 10988T 10989A 10990 11635d 11829 12158A 12266@ 12299C 12407 12440A 12461 12532 12616T 12617-22d 12668G 12672T 12693d 13430d 13463d 13470.1T 13655G 13672 13880G 14027d 14114.1C 14121.1G 14131.1T 14143.1T 14164C 14179G 14180.1T 14182.1A 14183G 14548d 14769.1TA 14857. Control-region mutations: 14953 14959G 14962d 14969 15106A 15142d 15159 15199d 15349 15476 (not assessed in samples #8, #12, #14 and #19) 15482d (not assessed in samples #8, #12, #14 and #19) 16759G 16801 16856.2TT 17119G 17123. Of these, three (14969, 15159, 15349) are recurrent mutations, ten (15106A, 15142d, 15199d, 15476 15482d 16759G 16801 16856.2TT 17119G 17123) are shared by mitogenomes #17 and #27 and three (14953, 14959G, 14962d) are private mutations of mitogenome #27. Note that for mitogenomes #9 and #13 (in squared boxes), the sequence variation was assessed only partially for the coding region and not at all for the control region. As for the coding region, mitogenome #9 lacks sequence data for nucleotide positions (nps) 1-318,


2863-2916, 4665-5731, 6617-7033, 8825-13183 and 14115-14896, whereas mitogenome #13 lacks information encompassing nps 3520-10533 and 11753-14597.
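The mutation labels listed in the legend of Figure 5.3 follow a compact notation (plain number for a transition, trailing base for a transversion, `.1X` for an insertion, `d` for a deletion, `inv` for an inversion, `@` for a back mutation). As an illustration of how such labels could be read programmatically, the following Python sketch parses them with regular expressions. The class and function names are hypothetical and not part of the original analysis, and the heteroplasmy marker ‘h’ is deliberately not handled because its exact placement in a label is not specified in the legend.

```python
import re
from collections import Counter
from typing import NamedTuple


class Mutation(NamedTuple):
    position: int  # first nucleotide position affected (numbering of mitogenome #1)
    kind: str      # transition, transversion, insertion, deletion or inversion
    detail: str    # derived base or inserted bases, empty otherwise


# Patterns are tried in order; this ordering is an illustrative choice.
_PATTERNS = [
    ("insertion", re.compile(r"^(\d+)\.\d+([ACGT]+)$")),  # e.g. 3038.1TTAA
    ("inversion", re.compile(r"^(\d+)-\d+inv$")),         # e.g. 3774-81inv
    ("deletion", re.compile(r"^(\d+)(?:-\d+)?d$")),       # e.g. 5514d, 12617-22d
    ("transversion", re.compile(r"^(\d+)([ACGT])$")),     # e.g. 1578C
    ("transition", re.compile(r"^(\d+)$")),               # e.g. 496
]


def parse_mutation(token: str) -> Mutation:
    """Parse one mutation label; trailing @ and § markers are stripped first."""
    core = token.rstrip("@§")
    for kind, pattern in _PATTERNS:
        match = pattern.match(core)
        if match:
            detail = match.group(2) if pattern.groups >= 2 else ""
            return Mutation(int(match.group(1)), kind, detail)
    raise ValueError(f"unrecognised mutation label: {token!r}")


# A few labels copied from the list in the Figure 5.3 legend.
labels = "143.1A 496 1578C 3774-81inv 5514d 12266@ 12617-22d".split()
kinds = Counter(parse_mutation(label).kind for label in labels)
print(kinds)
```

Applied to the full list, a tally of this kind would give the breakdown of the 363 mutations by type; the few labels used here are only a sample.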

Figure 5.4. Ae. albopictus phylogeny obtained by using Maximum Likelihood. Tips are numbered as in Table 5.1; the scale bar corresponds to 0.0020 substitutions per site.

The branch A1a1 includes the single mitogenome from Japan, the two mitogenomes from the US (Virginia) and many of the mitogenomes from Italy, including one (#1) of the two detected in the Rimini laboratory strain. The mitogenome from Japan (#26) departs from the node A1a1a and its sister clade A1a1a1 contains a sub-branch, A1a1a1a1, of particular interest. It consists of four mitogenomes, two from Italy and the two from the US mentioned above. One of the US mitogenomes (#4) is identical to one from northern Italy (#5) while the second (#2) is closely related to the mitogenome #3 from southern Italy. The sister branch A1a2 is formed of mitogenomes from different regions of southern Europe (Italy, Albania and Greece), including the second mitogenome (#14) from the Rimini strain, the previously published Chinese sequence (#17, KR068634 (Zhang et al., 2015)) and the mitogenome (#20) from the Chinese Foshan strain, a laboratory-maintained colony founded in 1981 from mosquitoes from southeast China. It is worth mentioning that the presence of two distinct haplotypes in different subjects of the Rimini laboratory-maintained strain, one belonging to A1a1a1 and the second to A1a2a1, reveals that at least two females contributed to the genetic formation of the strain. Haplogroup A1b contains all the mitogenomes from Thailand, which cluster in its A1b1 sub-branch, as well as one from Greece and one from Brazil. Finally, haplogroup A2 consists of multiple haplotypes, all from the Philippines.


5.3.3. The geographical distribution of Ae. albopictus haplogroups

The phylogenetic analysis not only allowed the identification of Ae. albopictus haplogroups and sub-haplogroups but also the definition of their distinguishing mutations (Figure 5.3). These include some diagnostic markers that are located in COI and ND5 partial sequences whose variation has been extensively assessed by published studies and can be retrieved from GenBank (Table 5.2).

Table 5.2. Frequencies of Ae. albopictus mtDNA haplogroups in worldwide populations.

| Geographic origin | N | A1a1a1^a | A1a2^a | A1b^a | A2^a | A3^a | Others^b | References |
|---|---|---|---|---|---|---|---|---|
| America | 349 | 94 (0.27) | 59 (0.17) | 141 (0.40) | 0 | 0 | 55 (0.16) | |
| USA (New Jersey) | 30 | 28 (0.93) | 0 | 0 | 0 | 0 | 2 (0.07) | Zhong et al., 2013 |
| USA (California) | 49 | 0 | 29 (0.59) | 5 (0.10) | 0 | 0 | 15 (0.31) | Zhong et al., 2013 |
| USA (Texas) | 31 | 24 (0.77) | 0 | 2 (0.07) | 0 | 0 | 5 (0.16) | Zhong et al., 2013 |
| USA (Hawaii) | 32 | 0 | 27 (0.84) | 0 | 0 | 0 | 5 (0.16) | Zhong et al., 2013 |
| Costa Rica | 57 | 29 (0.51) | 0 | 0 | 0 | 0 | 28 (0.49) | Futami et al., 2015 |
| Panama | 16 | 13 (0.81) | 3 (0.19) | 0 | 0 | 0 | 0 | Futami et al., 2015 |
| Brazil^c | 134 | 0 | 0 | 134 (1.0) | 0 | 0 | 0 | Birungi and Munstermann, 2002 |
| Europe | 32 | 10 (0.31) | 20 (0.63) | 0 | 0 | 0 | 2 (0.06) | |
| Italy (Trento) | 32 | 10 (0.31) | 20 (0.63) | 0 | 0 | 0 | 2 (0.06) | Zhong et al., 2013 |
| Africa | 153 | 0 | 0 | 153 (1.0) | 0 | 0 | 0 | |
| Cameroon | 153 | 0 | 0 | 153 (1.0) | 0 | 0 | 0 | Kamgang et al., 2011 |
| Asia | 284 | 0 | 100 (0.35) | 102 (0.36) | 25 (0.09) | 0 | 57 (0.20) | |
| China^d | 61 | 0 | 39 (0.64) | 0 | 0 | 0 | 22 (0.36) | Zhong et al., 2013 |
| China^e (strain) | 30 | 0 | 23 (0.77) | 0 | 0 | 0 | 7 (0.23) | Zhong et al., 2013 |
| Japan | 15 | 0 | 15 (1.00) | 0 | 0 | 0 | 0 | Zhong et al., 2013 |
| Taiwan | 30 | 0 | 4 (0.13) | 0 | 0 | 0 | 26 (0.87) | Zhong et al., 2013 |
| Thailand | 10 | 0 | 0 | 10 (1.00) | 0 | 0 | 0 | Sumruayphol et al., 2016 |
| Malaysia | 77 | 0 | 0 | 77 (1.00) | 0 | 0 | 0 | Zawani et al., 2014 |
| Singapore | 36 | 0 | 19 (0.53) | 15 (0.42) | 0 | 0 | 2 (0.05) | Zhong et al., 2013 |
| Indonesia (Java) | 8 | 0 | 0 | 0 | 8 (1.00) | 0 | 0 | Beebe et al., 2013 |
| Indonesia (Timor-Leste) | 17 | 0 | 0 | 0 | 17 (1.00) | 0 | 0 | Beebe et al., 2013 |
| Oceania | 352 | 0 | 0 | 236 (0.67) | 115 (0.33) | 0 | 1 (<0.01) | |
| Australia (Torres Strait) | 115 | 0 | 0 | 42 (0.36) | 72 (0.63) | 0 | 1 (0.01) | Beebe et al., 2013 |
| Papua New Guinea | 170 | 0 | 0 | 162 (0.95) | 8 (0.05) | 0 | 0 | Beebe et al., 2013 |
| Papua New Guinea (Southern Fly) | 67 | 0 | 0 | 32 (0.48) | 35 (0.52) | 0 | 0 | Beebe et al., 2013 |

^a Haplogroup frequencies. Haplogroup affiliation is based on the sequence variation of the cytochrome c oxidase subunit I (COI) gene, except for the Brazilian samples (see footnote c). The diagnostic mutations are the lack of the transitions at nps 2165 and 1536 for haplogroup A1a1a1, the presence of the transition at np 2294 for haplogroup A1a2, the presence of the transition at np 1820 for haplogroup A1b, and the presence of the transition at np 2027 for haplogroup A2.
^b These samples were classified as ‘others’ because they harbor the transition at np 2165 (except those from Australia, in which that nucleotide position was not sequenced), thus they do not belong to haplogroup A1a1a. Moreover, they lack the transitions at nps 2294, 1820 and 2027, thus they are not members of haplogroups A1a2, A1b or A2. Some additional mutations found in these samples are shared between specimens from the same or different geographic areas. In most cases they are likely due to multiple mutational events (recurrent mutations), but some might instead mark additional and not yet identified haplogroups.
^c Samples from Brazil were classified on the basis of their NADH dehydrogenase subunit 5 (ND5) gene sequences. They were considered members of haplogroup A1b because they harbored the transition at np 7210.
^d Samples are from Guangdong and Fujian provinces.
^e Laboratory-maintained strain, Jiangsu province.

By surveying these sequences for the presence or absence of these mutations, we were able to determine the most likely haplogroup affiliation for most of the 1170 tiger mosquito mtDNAs from populations worldwide. Table 5.2 reports the haplogroup frequencies for A1a1a1, A1a2, A1b, A2 and A3 obtained from this survey, as well as a category termed ‘others’ that includes mtDNAs that we could not classify and might encompass haplogroups not represented in our phylogeny. Figure 5.5 provides an overview of the worldwide spatial distribution of these haplogroups.
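The screening logic described in the footnotes of Table 5.2 can be summarized as a small decision rule. The Python sketch below is a simplified illustration of that rule, not the procedure actually used for the survey: the order in which the markers are tested is an assumption, haplogroup A3 and the ND5-based assignment of the Brazilian samples (np 7210) are left out, positions that were not sequenced are not modelled, and the example input is invented.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Diagnostic COI transitions (positions numbered relative to mitogenome #1),
# as given in the footnotes of Table 5.2.
A2_MARKER = 2027
A1B_MARKER = 1820
A1A2_MARKER = 2294
A1A1A1_ABSENT = (2165, 1536)  # A1a1a1 is recognised by the lack of both


def classify(transitions: Set[int]) -> str:
    """Assign a partial COI sequence to a haplogroup from its observed transitions."""
    if A2_MARKER in transitions:
        return "A2"
    if A1B_MARKER in transitions:
        return "A1b"
    if A1A2_MARKER in transitions:
        return "A1a2"
    if not any(position in transitions for position in A1A1A1_ABSENT):
        return "A1a1a1"
    return "Others"


def haplogroup_frequencies(samples: List[Set[int]]) -> Dict[str, Tuple[int, float]]:
    """Return {haplogroup: (count, frequency)} for a list of samples."""
    counts = Counter(classify(sample) for sample in samples)
    total = len(samples)
    return {hg: (n, round(n / total, 2)) for hg, n in counts.items()}


# Invented toy population of four sequences, for illustration only.
toy_population = [set(), set(), {1820, 2165, 1536}, {2165}]
toy_frequencies = haplogroup_frequencies(toy_population)
print(toy_frequencies)
```

The counts and frequencies returned by `haplogroup_frequencies` have the same form as the entries of Table 5.2 (number of mtDNAs followed by their proportion in the population sample).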


Figure 5.5. Frequency (%) of Ae. albopictus mtDNA haplogroups in worldwide populations and possible diffusion routes. Arrows indicate possible diffusion routes from the native home-range (South-East Asia) and subsequent dispersals. Populations and corresponding frequency values are listed in Table 5.2. Note that the Chinese laboratory strain was not included. The original world map is from the website (http://www.freeworldmaps.net).

5.4. Discussion

Phylogenetic analyses revealed that our Ae. albopictus mitogenomes cluster into three main haplogroups: A1, A2 and A3. Intriguingly, the population screening of the COI mutations (1503, 1578C, 1676C, 1740, 1964) found in mitogenome #27 from Taiwan, the only one belonging to haplogroup A3 in our phylogeny, did not reveal any match in the 1170 tiger mosquito mtDNAs from worldwide populations, not even in the population sample (N=30) from Taiwan (Table 5.2). This suggests that Ae. albopictus mosquitoes with A3 mtDNAs were probably not involved in the recent worldwide spread of the species, and raises the possibility that this haplogroup is rare and/or geographically restricted. In contrast, the survey for COI mutations characterizing the Philippine mitogenomes #21-25 allowed the identification of many other mtDNAs belonging to haplogroup A2 (Table 5.2), but all in Insular Southeast Asia and Oceania, suggesting that this haplogroup might be typical of, and possibly limited to, the Philippines, Indonesia, Papua New Guinea and northern Australia (Table 5.2). Therefore, haplogroup A2 appears to have played a role in the spread of Ae. albopictus from South East Asia (Figure 5.5), but only towards Oceania. Indeed, Ae. albopictus is thought to have spread from Indonesia by human-mediated transportation (Beebe et al., 2013). The low frequency of A2 observed on the Papua New Guinean mainland further supports this scenario, whereas the presence of both A1b and A2 mtDNAs


along the north Australian border (Table 5.2) suggests multiple arrivals from distinct geographical sources (Beebe et al., 2013). Figure 5.5 shows that, in contrast to the situation described above, members of the other three Asian haplogroups A1a1, A1a2 and A1b are detected in many adventive populations worldwide. This finding identifies these as the Asian mtDNA lineages mainly associated with the recent global spread. Haplogroup A1a1, seen in the Japanese mitogenome #26, is widely distributed in Italy (Figure 5.3) and shows high frequencies in Central America and eastern USA. Haplogroup A1a2 is present with frequencies higher than 50% in Japan, Southern China, Singapore, Hawaii, California and Italy, whereas A1b is fixed or almost fixed in Thailand, Malaysia, Papua New Guinea mainland, Cameroon and Brazil, but is also present at much lower frequencies in California and Texas (Table 5.2). Even though these geographical distributions are based on the limited population sampling reported in Table 5.2, some preliminary conclusions can be drawn. It appears that the ancestral homeland of haplogroup A1a2 might have been a temperate area, possibly Japan or northern Asia, rather than the tropical range, in agreement with early allozyme studies (Kambhampati et al., 1991). In contrast, haplogroup A1b appears to mainly characterize the tropical belt (Figure 5.5). This may imply that genetic and physiological traits make populations carrying A1b better suited to the colonization of tropical areas (Kamgang et al., 2011; Birungi and Munstermann, 2002). In our phylogeny the Brazilian A1b mitogenome (#9) harbours a mutational motif that includes the transition at np 7210 in the ND5 gene, also found in all Brazilian samples retrieved from the literature and already identified as a marker for Brazilian Ae. albopictus (Birungi and Munstermann, 2002).
The same transition was found in two ND5 haplotypes retrieved from the literature, one from Phuntsholing in southern Bhutan (JQ436953) and one from Chiang Mai in Thailand (JQ436956) (Porretta et al., 2012), suggesting a probable route of invasion from Indochina (Goubert et al., 2016). Instead, the absence of this transition in the samples from Cameroon (Table 5.2) suggests that A1b mtDNAs arrived in Cameroon from a different tropical source. Interestingly, the absence of photoperiodic diapause in Brazilian mosquitoes supports their origin in tropical Asia, while the diapause in US populations is in agreement with the scenario of an ancestry in the Asian temperate regions (Mousson et al., 2005; Urbanski et al., 2010). As for haplogroup A1a1, its distribution could not be fully assessed because of the lack of informative marker mutations. However, the absence of the COI transitions at nps 1536 and 2165 distinguishes the members of A1a1a1 (Figure 5.3) from all other mitogenomes in the phylogeny. The survey of these transitions in published datasets revealed that haplogroup A1a1a1 is the most common in Costa Rica, Panama, Texas and New Jersey (Figure 5.5) and widespread in Italy (Table 5.2). This distribution suggests that A1a1a1 and/or its derivatives A1a1a1a and A1a1a1a1 arose recently in an adventive non-Asian population, probably from an ancestral Japanese source, and reached a high frequency because of genetic drift or founder events. From the earliest adventive


non-Asian population(s) they then further spread to other distant regions. Such a possibility is in agreement with some previous observations, in particular with the suggestion that Italian tiger mosquitoes, whose first presence was documented in 1990 in northern Italy (Sabatini et al., 1990), have a dual origin: a possibly direct northern American source and a probably indirect (through Albania and Greece) Eastern Asian source, with the former most likely related to the international trade of used tires from the eastern coast of the US (Dritsou et al., 2015). An origin of haplogroup A1a1a1 and/or its derived sub-branches A1a1a1a and A1a1a1a1 in North America and a subsequent arrival in Italy from the US of multiple haplotypes is supported by (i) the detection of four partial sequences from Texas with the transition at np 1823 (Zhong et al., 2013), which defines sub-branch A1a1a1a1, and (ii) our findings that mitogenomes #4 and #5, the first from Virginia and the second from Italy, are identical, and that mitogenomes #2 and #3, again one from Virginia and one from Italy, are very closely related (Figure 5.3). Finally, the scenario that the ancestral source of A1a1a1 might be a northern temperate area such as Japan is further supported by the presence of the COI transitions 1536 and 2435, which characterize mitogenome #26, in five published mtDNAs from Kyoto (JQ004524, Xu and Fonseca, 2011) and is in agreement with allozyme data that have highlighted genetic links between North American, Italian and Japanese populations (Urbanelli et al., 2000).

5.5. Conclusion

In conclusion, our analyses of complete coding regions charted the phylogeny of Ae. albopictus mitogenomes and identified the most likely Asian sources of some adventive populations. The worldwide spread of Ae. albopictus appears to be associated with three mtDNA haplogroups (A1a1, A1a2 and A1b) differently distributed in Asian populations living in temperate and tropical regions, whereas a fourth Asian haplogroup (A2) appears to be restricted to Insular South-East Asia. These ancestral genetic sources now coexist (and interbreed) in many of the recently colonized areas. This occurs not only in the field but also in the laboratory, as attested by our detection of both A1a1a1 and A1a2a1 mitogenomes in the laboratory-maintained Rimini strain, thus creating novel genomic combinations that might be one of the causes of the continuous and apparently growing capability of Ae. albopictus to expand its geographical range. Fine-scale mitogenome surveys encompassing multiple specimens from a wide range of East Asian populations might prove to be an essential prerequisite for controlling the diffusion of this mosquito and limiting its social, medical and economic implications. With a precise identification of the source populations in Asia it will become possible to evaluate the extent and nature of their nuclear genome diversity and the possible selective advantages (e.g. production of cold- or desiccation-resistant eggs, zoophilic versus anthropophilic changes in feeding behavior) relative to other Asian Ae. albopictus populations that instead have not spread.


6. Additional projects


6.1. Origin and spread of human haplogroup R0a

6.1.1. Background

The Horn of Africa is characterized by an extensive diversity of mtDNA haplogroups (Gomes et al., 2015), often deriving from non-African sources. Analyses of the uniparental genetic systems, in particular mtDNA, have suggested an ancient gene flow into this area from the Levant and/or the Arabian Peninsula, although the timing of this ancient event has not been clearly defined (Phillipson, 1998; Kitchen et al., 2009; Pagani et al., 2012; Al-Abri et al., 2012; Hodgson et al., 2014; Pickrell et al., 2014; Fernandes et al., 2015). The most prominent Eurasian mitochondrial lineage that is shared across the Horn and Arabia is R0a, which is found at very low frequencies across west Eurasia, but reaches levels of up to 35% in eastern and up to 15% in some parts of the Horn of Africa facing the Red Sea (Richards et al., 2003; Thomas et al., 2002; Kivisild et al., 2004; Poloni et al., 2009; Non et al., 2011; Cerný et al., 2011; Musilovà et al., 2011; Boattini et al., 2013; Fernandes et al., 2015). R0a has been thought to have originated in the Near East and to have spread into Arabia at the end of the Pleistocene, albeit with difficulties in defining a source (Al-Abri et al., 2012), while others have hypothesized a more ancient ancestry within Arabia (Rose et al., 2013). This question is of great interest because evidence in favour of deeper Arabian ancestry would imply the existence of refugial areas in Arabia spanning the Last Glacial Maximum (LGM), which have been hypothesized but never confirmed (Rose, 2010). In order to clarify the timing and the mode of the entry of the R0a lineage into Eastern Africa, we analyzed 205 mitogenomes belonging to haplogroups R0a and R0b. Fifty-nine candidate R0a and two R0b mtDNAs were completely sequenced in the course of this study.
They were identified by screening ~10,000 mtDNA control-regions from subjects of various geographic origins (Africa, East and South Asia, the Near East, Caucasus and Europe) whose DNA was available in laboratories participating in the project. The remaining 143 R0a and one R0b mitogenomes were already available in public databases.

6.1.2. Results and discussion

R0a’b, of which R0a forms the major part (Figure 6.1), dates ~40 Kya and is the sole known sister clade to the major West Eurasian haplogroup HV, with the two together comprising haplogroup R0. R0 branches directly from macro-haplogroup R, which dates to ~59 Kya (Fernandes et al., 2015). Although haplogroup R predominates amongst West Eurasians, especially Europeans, continent-specific basal branches are also found amongst South Asians, East Asians, Southeast Asians and Oceanians (Behar et al., 2012a). Thus whilst haplogroup R is a global non-African founder clade, R0 is primarily West Eurasian. R0a, dating to ~30 Kya (Table 6.1), falls into at least five major sub-clades, three (R0a1, R0a2’3 and R0a4) already known (Cerný et al., 2011; Behar et al., 2012a) and two (R0a5 and R0a6) newly defined in the course of the study (Figure 6.1). Two further basal haplotypes (#201 and #202) are seen in Italy and Spain, respectively.

Figure 6.1. Maximum-parsimony phylogenetic tree of 202 complete mtDNA sequences belonging to haplogroup R0a. Three R0b sequences are also included. Each circle represents a mitogenome. Mutations are shown on the branches (relative to rCRS); they are transitions unless the base change is explicitly indicated. Suffixes indicate: transversions (to A, G, C, or T), deletions (d), heteroplasmies (R and Y) and reversions (@). Insertions are also suffixed with a dot followed by a number indicating how many bases were inserted and the inserted nucleotide/s (.1C). Recurrent mutations are underlined. The variation at np 16519, in the number of Cs at nps 309 and 315 as well as the AC indels at nps 515–522 were not included in the phylogeny. All the samples are colored according to their geographic origin as shown in the legend. ML age estimates are reported in Kya for nodes encompassing at least three mitogenomes, except for R0a5 (two mitogenomes), which is extremely rare.
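The labelling convention described in the caption can be read mechanically. The following is a minimal sketch, assuming each branch label has the form position, optional insertion part (e.g. `.1C`), optional one-letter suffix and optional `@`; the regular expression, the `Mutation` type and the `parse_label` function are hypothetical names introduced only for illustration and are not part of the original analysis.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical grammar inferred from the caption of Figure 6.1:
# position, optional ".<count><bases>" insertion, optional suffix, optional "@".
LABEL_RE = re.compile(r"^(\d+)(?:\.(\d+)([ACGT]+))?([ACGTdRY])?(@)?$")


class Mutation(NamedTuple):
    position: int           # nucleotide position relative to rCRS
    kind: str               # transition, transversion, deletion, insertion, heteroplasmy
    detail: Optional[str]   # derived base, inserted bases or heteroplasmy code
    reversion: bool         # True when the label carries the "@" suffix


def parse_label(label: str) -> Mutation:
    """Classify one branch label according to the caption's suffix rules."""
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    pos, ins_count, ins_bases, suffix, at = m.groups()
    if ins_count is not None:
        kind, detail = "insertion", ins_bases
    elif suffix == "d":
        kind, detail = "deletion", None
    elif suffix in ("R", "Y"):
        kind, detail = "heteroplasmy", suffix
    elif suffix is not None:
        kind, detail = "transversion", suffix
    else:
        # No suffix: the caption states that unmarked changes are transitions.
        kind, detail = "transition", None
    return Mutation(int(pos), kind, detail, at is not None)
```

For instance, under these assumptions `parse_label("16126@")` would be read as a reversion of the transition at np 16126, the change mentioned in the text for R0a6.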


R0a4, R0a5 and R0a6 are all rare. A survey of the R0a5 HVS-I motif indicates a wide distribution across the Near East and Europe and a deep Glacial ancestry (data not shown). A similar assessment for haplogroup R0a6 is more difficult, because its only control-region mutation is the reversion of the transition 16126, but its distribution appears to be mainly restricted to Pakistan (mainly but not exclusively Kalash), with Palestinian, Iranian and Italian singletons. In contrast, the frequency and distribution of R0a4 cannot be assessed from published datasets because it lacks any diagnostic control-region mutations, but the analysis of mitogenomes suggests a distribution mainly restricted to Western Europe, Spain in particular, but also Iraq (Figure 6.1).

The great majority of R0a mitogenomes cluster within R0a1 and R0a2’3, dating to the LGM (~26 Kya and ~21 Kya, respectively), each mainly represented by a single star-like sub-clade, R0a1a and R0a2. These sub-clades both coalesce to the Late Glacial: ~13 and ~17 Kya, respectively (Table 6.1). They are the two major expansion lineages in R0a but, although widespread, they are both overwhelmingly seen in Arabia, especially Yemen (Figure 6.2). R0a1 also includes R0a1b, comprising mainly lineages from Arabia, and several possibly related lineages. This distribution suggests that the whole of R0a1 most likely has an Arabian origin, dating back to at least 26 Kya and thus spanning the LGM, implying that the several Iranian lineages and a single Syrian lineage within R0a1a were derived from an Arabian source. Similarly, R0a2’3, at ~21 Kya, most likely has an Arabian ancestry. R0a3 is a minor Late Glacial Arabian sub-clade that sits alongside a paraphyletic Iranian lineage (shared with an Egyptian in the HVS-I dataset). As with R0a1a, Iranian HVS-I lineages within the major R0a2 are broadly a subset of Arabian diversity, with a number of ancestral haplotypes at elevated frequencies (not shown). This may be explained by sporadic gene flow across the Gulf, but some Iranian lineages (along with lineages found further east in Pakistan) may also represent gene flow along the maritime trading networks which intensified in the mid- to late Holocene (Boivin and Fuller, 2009). The distribution of R0a2 is also compatible with a Glacial arrival in Arabia, by 26 Kya, although in this case the existence of the Levantine/European R0a2r sub-clade suggests that we should not completely rule out the alternative of a source in a Levantine refugium followed by Late Glacial expansions into Arabia.

Table 6.1. Age estimates (maximum likelihood and ρ) for haplogroup R0a’b and its sub-clades.

Haplogroup         Nᵃ    ML age    95% CIᵇ          ρ age     95% CIᵇ
                         (Kya)     (Kya)            (Kya)     (Kya)
R0a’b              205   39.20     (32.01;46.57)    30.30     (23.97;36.79)
>R0a               202   29.22     (24.64;33.89)    21.37     (15.28;27.63)
>>R0a1             58    26.39     (21.43;31.45)    21.89     (11.64;32.64)
>>>R0a1a           52    12.85     (10.92;14.80)    10.97     (8.51;13.46)
>>>>R0a1a1         12    10.31     (7.31;13.36)     7.00      (2.19;11.97)
>>>>>R0a1a1a       9     3.51      (1.59;5.46)      3.16      (0.01;6.38)
>>>>>>R0a1a1a1     5     1.63      (0.30;2.98)      1.03      (0;2.47)
>>>>R0a1a2         3     4.65      (2.33;7.01)      6.09      (0.95;11.41)
>>>>R0a1a3         4     11.73     (9.51;13.98)     16.88     (7.41;26.81)
>>>>>R0a1a3a       3     5.64      (2.34;9.01)      7.00      (0.56;13.70)
>>>>R0a1a4         3     7.49      (4.16;10.90)     8.78      (2.27;15.55)
>>>>R0a1a5         4     9.21      (6.03;12.45)     7.22      (2.91;11.65)
>>>R0a1b           3     1.78      (0;21.55)        1.73      (0;9.76)
>>R0a2’3           123   20.92     (16.29;25.65)    16.34     (10.56;22.29)
>>>R0a2            117   16.56     (14.28;18.87)    13.65     (10.99;16.34)
>>>>R0a2a          7     8.82      (4.18;13.58)     5.99      (2.37;9.69)
>>>>>R0a2a1        3     2.53      (0.34;4.76)      1.73      (0;4.14)
>>>>R0a2b          24    12.69     (9.25;16.19)     11.41     (5.26;17.77)
>>>>>R0a2b1        14    10.62     (7.12;14.20)     11.19     (3.75;18.94)
>>>>>>R0a2b1a      9     0.67      (0.10;1.24)      0.56      (0;1.37)
>>>>>>R0a2b1b      5     8.04      (4.59;11.56)     8.43      (1.88;15.24)
>>>>>>>R0a2b1b1    4     4.57      (2.35;6.83)      4.55      (1.17;8.01)
>>>>>R0a2b2        9     3.89      (1.39;6.43)      2.87      (0.48;5.30)
>>>>R0a2c          4     12.37     (6.75;18.17)     9.25      (1.90;16.92)
>>>>>R0a2c1        3     2.16      (0.06;4.28)      1.73      (0;4.14)
>>>>R0a2d          7     12.45     (8.11;16.89)     11.79     (5.58;18.22)
>>>>R0a2f          15    14.94     (12.06;17.85)    16.93     (8.36;25.89)
>>>>>R0a2f1        9     5.41      (2.70;8.18)      5.80      (1.02;10.73)
>>>>>>R0a2f1a      6     2.76      (0.47;5.09)      3.03      (0;6.82)
>>>>>>R0a2f1b      3     3.29      (0.79;5.82)      3.45      (0;7.72)
>>>>R0a2g          7     11.02     (6.03;16.15)     8.68      (3.65;13.85)
>>>>>R0a2g1        4     8.27      (4.53;12.10)     7.89      (2.08;13.91)
>>>>>>R0a2g1a      3     6.49      (3.17;9.88)      7.00      (1.58;12.62)
>>>>R0a2h          3     9.21      (4.37;14.19)     7.00      (1.07;13.16)
>>>>R0a2i          3     4.27      (1.00;7.61)      4.34      (0;8.96)
>>>>>R0a2j         3     13.81     (10.15;17.55)    18.07     (9.55;26.95)
>>>>R0a2k          3     14.78     (11.91;17.69)    19.01     (8.27;30.33)
>>>>R0a2m          3     1.41      (0;3.23)         1.29      (0;3.84)
>>>>R0a2n          7     13.17     (8.72;17.73)     10.23     (3.97;16.72)
>>>>>R0a2n1        4     6.41      (0.13;12.94)     4.55      (0;10.30)
>>>>>R0a2n2        3     6.41      (2.81;10.10)     6.09      (1.56;10.76)
>>>>R0a2o          5     10.15     (5.38;15.05)     6.28      (1.03;11.71)
>>>>>R0a2o1        4     4.35      (1.17;7.59)      3.24      (0;6.67)
>>>>R0a2q          4     3.82      (0.55;7.15)      2.59      (0;5.74)
>>>>R0a2r          11    12.37     (7.84;17.02)     10.12     (3.81;16.66)
>>>R0a3            5     13.89     (9.22;18.68)     10.06     (4.38;15.94)
>>>>R0a3a          3     11.26     (6.75;15.88)     9.71      (2.96;16.73)
>>R0a4             6     1.19      (0.14;2.24)      0.85      (0;2.07)
>>R0a5             2     18.77     (12.24;25.51)    14.77     (5.87;24.09)
>>R0a6             11    2.98      (1.07;4.92)      2.59      (0.81;4.39)
>R0b               3     15.34     (9.93;20.91)     13.37     (6.05;20.98)

ᵃ Number of mitogenomes.
ᵇ Using the corrected molecular clock proposed by Soares et al., 2009. Except for R0a5, we calculated age estimates only for sub-clades encompassing at least three mitogenomes.
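The ρ column of Table 6.1 is based on the ρ statistic, i.e. the mean number of mutations separating the sampled mitogenomes from the root of a clade. As an illustration only, the sketch below computes ρ and a commonly used standard-error estimate from a list of tree branches. The toy tree, the function name `rho_statistic` and the linear `YEARS_PER_MUTATION` value are hypothetical placeholders; in particular, the ages in the table rely on the corrected (non-linear) clock of Soares et al. (2009), which is not reproduced here.

```python
from math import sqrt

# Hypothetical linear rate used only to show the conversion step;
# it is NOT the corrected clock used for Table 6.1.
YEARS_PER_MUTATION = 3000.0


def rho_statistic(branches, n_samples):
    """Return (rho, standard error) for a rooted clade.

    branches  -- list of (n_mutations, n_descendant_samples), one per branch
    n_samples -- total number of sampled sequences in the clade
    """
    # Each mutation on a branch is counted once for every sample below it.
    rho = sum(l * n for l, n in branches) / n_samples
    # Standard error: sqrt(sum of l * n^2 over branches) / total samples.
    se = sqrt(sum(l * n * n for l, n in branches)) / n_samples
    return rho, se


# Toy clade with four samples: one internal branch (2 mutations, 3 samples
# below it) and three terminal branches carrying one mutation each.
toy_branches = [(2, 3), (1, 1), (1, 1), (1, 1)]
rho, se = rho_statistic(toy_branches, n_samples=4)
age_years = rho * YEARS_PER_MUTATION
```

In this toy case ρ is 9/4 = 2.25 mutations, so the illustrative linear rate would give 6,750 years; with real data the mutation counts would be read from the branches of a tree such as the one in Figure 6.1.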

Figure 6.2. Spatial frequency distribution maps of haplogroups R0a, R0a1a, R0a2b1 and R0a2b2. Dots indicate the geographical locations of the surveyed populations. The extremely high frequencies of R0a and R0a1a in the sample (38.5% and 24.6%, respectively) were not included in order to provide a correct representation of the much lower frequencies in the regions surrounding the island. We constructed spatial frequency distribution plots with the program Surfer 9 (Golden Software, http://www.goldensoftware.com/products/surfer).


This overall pattern strongly suggests that R0a1 and R0a2’3 both entered Arabia before or around the LGM and that the R0a1b/R0a1* and R0a3/R0a2’3* lineages are relicts that were not caught up to the same extent in the Late Glacial expansions that followed the LGM. This conclusion is further supported by the Bayesian skyline plots (BSPs) (Figure 6.3) and reciprocal founder analyses. The BSP analyses of the sub-clades R0a1a and R0a2 (that mainly represent haplogroups R0a1 and R0a2’3) show that the coalescences correspond to two major phases of population growth amongst R0a lineages in both the Late Glacial – the Bølling-Allerød interstadial (R0a2) – and the immediate postglacial, after the Younger Dryas (R0a1a) (Figure 6.3). The BSP for R0a as a whole points to a major episode of ~12-fold growth from ~16 Kya until ~10 Kya, with a more recent episode of ~20-fold growth at ~3 Kya. The separate plots show that, while the growth of R0a2 overlaps with R0a overall, R0a1a was involved in a subsequent population expansion, in the early postglacial warming period following the Younger Dryas glacial relapse, ~11.5 Kya. The finding of distinct demographic histories for R0a1a and R0a2 suggests that they may at one time have characterized different populations, possibly even dispersing from separate glacial refugial areas.

Figure 6.3. Bayesian skyline plots (BSPs) of haplogroups R0a (panel a), R0a1a (panel b) and R0a2 (panel c). The thick solid line is the median estimate and the shading shows the 95% highest posterior density limits. The time axis is limited to 25 Kya, beyond which the curves remain flat.

In order to date and quantify the main dispersal episodes, a founder analysis on the mitogenome data was performed at the University of Huddersfield, UK. This identifies ‘founder sequences’ shared between two populations as potential evidence for gene flow between them. The analyses were performed assuming a northern source in the Fertile Crescent, in order to provide the most conservative estimate for the age of Arabian lineages and to quantify and date subsequent dispersals into the Horn of Africa, Europe and South Asia. The results obtained emphasize that the Fertile Crescent R0a variation seen today cannot be the main source for much of the diversity in Arabia, confirming that Arabia is the most ancient reservoir of R0a variation. The earliest non-African ancestor of R0a dates to ~59 Kya, and may have originated in the Gulf Oasis soon after the dispersal of modern humans from Eastern Africa (Fernandes et al., 2012). Its more immediate ancestor, R0a’b, dates to ~40 Kya and its earliest branches have a relict distribution around the Mediterranean/Near East. We have identified several new minor sister sub-clades to the main R0a branches, and these too have a similar distribution. Nevertheless, multiple lines of evidence suggest that the major R0a sub-clades had entered Arabia and begun diversifying before the Last Glacial Maximum. Rose proposed three potential ‘oases’ in Arabia: the Gulf Oasis in the east, the South Arabian refugium in the Dhofar highlands and eastern Yemen-Oman coastal zone, and the Red Sea coastal plain (Rose, 2010). It seems likely that one or more of these were refugia for early Arabian hunter-gatherer groups carrying predominantly R0a1 and R0a2’3, and from which R0a1a and R0a2, in particular, expanded after the LGM. It is tempting to speculate that R0a2’3 may have sheltered in the Red Sea refugium, given the very early postglacial dispersals of R0a2 sub-clades both into the Horn of Africa and into southern Europe, likely via the Levant. R0a1a began its dramatic expansions ~12 Kya and is now seen mainly in the southern part of the Arabian Peninsula. However, the first major expansions in Arabia were earlier, in the early Late Glacial period, and involved R0a2.
Our results do indicate population growth within Arabia at ~3 Kya, which may be implicated in a late Holocene range expansion across the Arabian Sea involving perhaps HV1, and perhaps also of R0a1a1a lineages into the island of Socotra, where the age of the R0a1a1 lineages dates to the same timeframe (Cerný et al., 2009). Populations survived along the southeast Arabian coast during the extreme aridity of the so-called ‘Dark Millennium’ after 5.9 Kya and may have prospered as climatic conditions improved again in the Arabian Bronze Age. Although there is less evidence from Yemen, this phase saw marked re-settlement of southeast Arabia during the Hafit phase of oasis agriculture after 5.1 Kya (Preston et al., 2012), and a similar trajectory seems likely to have taken place to the west. The return to more pluvial conditions in Eastern Africa appears to have been later, ~4 Kya (Tierney and deMenocal, 2013), matching estimates for the establishment of Ethiosemitic languages in the Horn (Kitchen et al., 2009). It also coincides with the appearance of the poorly-known literate Daamat-Di’amat polity in northern Ethiopia, which extended from roughly 850–350 BC, and has long been thought to show signs of Arabian influence (Fattovich, 2010). However, some recent archaeological studies have downplayed the extent of Arabian influence and consider large-scale migration at this time unlikely, more in line with the evidence that we present here (Phillipson, 2009). There may have been some minor gene flow due to the intensification of maritime trading networks that had begun around this time (Boivin and Fuller, 2009; Fattovich, 2010), also indicated by the appearance of R0a lineages around the Indian Ocean as far as India. However, the main episodes of Arabian settlement in the Horn occurred much earlier, at the end of the Ice Age.

6.2. The mitogenome variation of Egyptian cattle breeds

6.2.1. Background

Archeozoological evidence indicates that taurine cattle were initially domesticated in the upper Euphrates Valley in a frame of time of 11 to 10 Kya (Helmer et al., 2005; Zeder, 2008; Zeder, 2009; Ajmone et al., 2010). Studies involving modern (Troy et al., 2001; Mannen et al., 2004; Pellecchia et al., 2007) and ancient (Bailey et al., 1996; Anderung et al., 2005; Beja-Pereira et al., 2006; Bollongino et al., 2006; Edwards et al., 2007; Mona et al., 2010) samples, generally focusing on a control-region segment of the mitochondrial DNA, revealed that modern taurine mtDNAs cluster within a number of closely related branches, initially termed T, T1, T2, T3 and T4. The frequency and geographic distributions of the T lineages were compatible with the scenario of a single ancestral population source in the Near East and a later spread of Bos taurus following its domestication event in that area (Troy et al., 2001), even though alternative models were also proposed to explain some peculiar features in the geographic distributions of T1 (Bailey et al., 1996), T3 (Beja-Pereira et al., 2006) and T4 (Mannen et al., 2004). Further analysis based on mitogenomes revealed that the macro-haplogroup T is made up of two clades, T1'2'3 (including the previously defined T1, T2 and T3) and T5 (Achilli et al., 2008b; Achilli et al., 2009), whereas haplogroup T4 turned out to be a derived sub-clade within T3. The age of super-haplogroup T was estimated at ∼16 Ky, and those of haplogroups T1, T2, T3 and T5 were all compatible with the scenario that their founding haplotypes were present and directly involved in the domestication event that occurred 10–11 Kya in the Near East. The exception was T4 whose younger age is suggestive of an origin within domestic cattle, probably while diffusing from the Near East towards Eastern Asia (Achilli et al., 2008b; Achilli et al., 2009). 
Furthermore, three other haplogroups, termed P, Q and R, have been detected in modern breeds (Achilli et al., 2008b; Achilli et al., 2009; Bonfiglio et al., 2010), and two additional ones, E and C, have been found in ancient specimens and are probably extinct (Edwards et al., 2007; Stock et al., 2009). Among these, haplogroups P and R are derived from European wild aurochs cows either because of sporadic interbreeding events (naturally occurring and/or human-mediated) (Achilli et al., 2008b; Schibler et al., 2014) or possibly, in the case of haplogroup R, as a consequence of a minor event of B. primigenius domestication in Italy (Bonfiglio et al., 2010). Instead, the origin of haplogroup Q is less clear. It was suggested that it might have derived from European aurochsen (Achilli et al., 2008b), while it was later hypothesized that haplogroup Q represents an additional lineage that was domesticated in the Near East and spread with trade and human migrations (Achilli et al., 2009; Bonfiglio et al., 2010). A recent survey of prehistoric domestic cattle control-regions has identified haplogroup Q in Middle/Late Neolithic remains from Anatolia, and in skeletal remains from Bulgaria and Romania dated 7–4 Kya (Scheu et al., 2015). Overall these findings highlight a growing complexity in the geographic distribution of Q. In this study, to obtain more information concerning the mitogenome variation in a geographic area that is adjacent to the Near East, the postulated ancestral homeland of Bos taurus domestication, and thus possibly reconstruct some of the early events associated with the spread of domestic cattle out of the Near East, we analyzed a total of 31 mtDNA genomes (22 reported here for the first time; GenBank records KT184451-KT184472) from two autochthonous Egyptian cattle breeds, Menofi and Domiaty (also known as Baladi and Damietta).

6.2.2. Results and discussion

The phylogenetic relationships of the 31 Egyptian mitogenomes are illustrated in Figure 6.4. As expected, haplogroup T1 (58.0%) is the most frequent, and haplogroups T2 and T3 also show rather high frequencies: 19.4% and 16.1%, respectively. Instead, haplogroup T5, which was reported at low frequencies both in Europe and Iraq (Achilli et al., 2008b), was not found in the Egyptian sample. In addition to the T mtDNAs, two mitogenomes (6.5%) belonging to the rare haplogroup Q1 were observed in the Domiaty breed. The presence of non-T1 mtDNAs in Egypt has already been reported (Beja-Pereira et al., 2006); however, the overall frequency of ~42% detected here in each of the two Nile Delta breeds (Figure 6.4) is the highest reported so far. Moreover, the two (very divergent) complete Q mtDNA sequences found in the Domiaty breed are the first obtained from non-Italian breeds. The high diversity in terms of haplogroups is not the only feature of the Egyptian cattle: our (randomly collected) samples are also characterized by an extremely high haplotype diversity (H=1.0 ± 0.008), with each mitogenome harboring a different haplotype (Figure 6.4). Furthermore, almost all haplotypes depart directly from the respective haplogroup root, with a star-like topology and a rather similar average number of nucleotide differences within each haplogroup: T1 (M=11.9), T2 (M=11.7), T3 (M=13.6) and Q1 (M=16.0). The comparison of the Nile Delta mitogenomes with those available from other geographic areas shows that three T1 (#13, #14 and #15), three T2 (#6, #9 and #10) and one T3 (#1) of the Egyptian mitogenomes share mutations with mtDNAs from the same haplogroups from other geographic areas (Figure 6.5 and data not shown). For three of these (T3 #1, T2 #9 and T2 #10) the mutational sharing is with two mitogenomes from Northern Europe, while for the three T1 it is with mitogenomes from Italian breeds. In all of these cases the sharing is restricted to a single control-region mutation.
Because of their location in the control-region, it is unclear whether these shared mutations mark ancient genetic links between mitogenomes that are now distant both in terms of breed and geography or whether they are the product of independent mutational events, with no real phylogenetic significance. The phylogenetic link of the T2 mitogenome #6 with one from a local cattle breed (Cabannina) of Liguria (EU177851) is much more robust because they share four transitions (nps 1459, 2558, 15985 and 16074), three of which are in the coding region (Figure 6.5). At variance with the sub-clades of the T macro-haplogroup, the two Egyptian Q1 mtDNAs do not share non-founding mutations with Q1 mitogenomes from other geographic areas (Figure 6.5). Age estimates are rather similar for all haplogroups and they are in line with those reported in the past (Achilli et al., 2008b; Achilli et al., 2009; Bonfiglio et al., 2010; Bonfiglio et al., 2012). Haplogroups T1, T2, T3 and Q1 harbour ML coalescence times in the 11–15 Ky range (Figure 6.6), in agreement with the scenario that they were present in the Near Eastern aurochs population(s) that underwent domestication about 10 Ky ago. The finding that the coalescence ages of T1 and Q1 predate the postulated domestication time by 4–5 Ky suggests that a certain degree of sequence diversity already existed within each haplogroup in the wild population(s) at the time of domestication, and that more than one haplotype per haplogroup might have been involved in the domestication event(s).
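The two summary statistics quoted above, haplotype diversity (H) and the average number of nucleotide differences (M), can be illustrated with a short sketch. It assumes Nei's unbiased estimator for H and a plain site-by-site comparison of aligned sequences for M; the function names and the toy sequences are hypothetical, and the software actually used for the thesis values is not implied.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased estimator: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum(c * c for c in counts.values())
    return n / (n - 1) * (1 - sum_sq / (n * n))


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


# 31 mitogenomes, each with its own haplotype, as reported for the Egyptian sample.
h = haplotype_diversity([f"haplotype_{i}" for i in range(31)])

# Toy aligned fragments (hypothetical, for illustration only).
m = mean_pairwise_differences(["ACGT", "ACGA", "TCGA"])
```

With 31 distinct haplotypes the estimator reaches its maximum of 1.0, which agrees with the H value reported in the text; the toy fragments give M = 4/3.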

Figure 6.4. Tree of mitogenomes from Egyptian cattle. Sequences #1–19, #25, #30–31 have been determined in this study, while sequences #20–24 and #26–29 were previously reported (Bonfiglio et al., 2012). This tree was built as previously described (Achilli et al., 2008b; Achilli et al., 2009; Bonfiglio et al., 2010; Bonfiglio et al., 2012). The hypervariable insertion of a G at np 364, the length variations in the C tract scored at np 221 and the A tract scored at np 1600 were not used for the phylogeny construction. The position of the Bovine Reference Sequence (BRS) (Anderson et al., 1982) is indicated for reading off sequence motifs. Branches display mutations with numbers according to the BRS; they are transitions unless a base is explicitly indicated for transversions (to A, G, C, or T) or a suffix for indels (+, d) and heteroplasmy (h). Recurrent mutations within the phylogeny are underlined and back mutations are marked with the suffix @. Note that the reconstruction of recurrent mutations in the control region is ambiguous in a number of cases. The pie charts summarize haplogroup frequencies in the Menofi (green) and Domiaty (orange) breeds.

Figure 6.5. Worldwide phylogeny of taurine haplogroups T2 and Q. This most parsimonious tree encompasses the Egyptian mitogenomes belonging to haplogroups T2 (N=6) and Q1 (N=2) and all previously published worldwide mitogenomes from the same haplogroups (T2, N=17 and Q, N=16). Branches display mutations with numbers according to the BRS; they are transitions unless a base is explicitly indicated for transversions (to A, G, C, or T) or a suffix for indels (+, d) and heteroplasmy (h). Recurrent mutations within the phylogeny are underlined and back mutations are marked with the suffix @. Coalescence times are maximum likelihood (ML) estimates.

Figure 6.6. Schematic representation of the cattle mtDNA phylogeny. This tree highlights the founding haplotypes that most likely were involved in the domestication process. Approximate ages (Ky) can be inferred from the scale. Some correspond to the ML ages estimated in this work, those for haplogroups R and I are from previous studies (Achilli et al., 2009; Bonfiglio et al., 2010; Bonfiglio et al., 2012), while those for the probably extinct haplotypes E and C correspond to the radiocarbon dates of the specimens in which they have been found (Edwards et al., 2007; Zhang et al., 2013). A dotted line is shown in T3 and Q1 to indicate that other not yet identified founder sub-haplogroups are likely for these two haplogroups.

Bayesian analyses (Figure 6.7) carried out on both single breeds and the whole Egyptian sample revealed a main episode of population growth beginning at ~12.5 Kya that probably reflects the initial population expansion that followed the domestication of wild aurochs in the Near East, although a Late Glacial/postglacial expansion of wild cattle, prior to domestication, cannot be ruled out.


Figure 6.7. Bayesian Skyline Plots showing the size trend of the Egyptian cattle. The BSPs refer to: a) the Domiaty sample (N=14), b) the Menofi sample (N=17), c) both samples combined. The Y-axis indicates the effective number of females. The thick solid line is the median estimate and the blue shading shows the 95% highest posterior density limits. The time axis is limited to 30 Kya, beyond which the curve remains linear. A generation time of six years was employed (Bollongino et al., 2012).

Haplogroup Q is the closest to super-haplogroup T and the only non-T haplogroup that we found in the Egyptian cattle. Its source is rather controversial. It was initially suggested that, similar to haplogroup P, haplogroup Q mitogenomes might have derived from European aurochsen (Achilli et al., 2008b). Later, an extensive survey of published control-region sequences from worldwide samples showed that haplogroup Q was probably present in modern breeds from several European countries and China, as well as in cattle skeletal remains from Neolithic archeological sites in Germany, France, and Eastern Thrace (Turkey), suggesting that it was domesticated in the Near East and, similar to macro-haplogroup T, later spread with trade and human migrations (Bonfiglio et al., 2010). Taking into account that the Nile Delta region is in close geographic and cultural proximity to the Near East, and was one of the first neighboring areas to adopt the Neolithic package, including cattle rearing, our finding of two Q1 mtDNAs in the Domiaty breed, with very different haplotypes and each rooting directly from the Q1 node (Figure 6.5), has relevant implications for the debate concerning the ancestral source of haplogroup Q. First, the inclusion of the two Egyptian Q1 mitogenomes in the phylogeny allows for a more precise estimate of the Q and Q1 coalescence times (Figure 6.5). The ML estimate of 15.7 ± 5.9 Ky for Q1 indicates a rather recent (late glacial) origin, which makes it unlikely that this haplogroup had the time to spread in aurochsen populations living far away from each other. Second, the detection of two Q mtDNAs in a local breed (Domiaty) from Egypt weakens the scenario that the ancestral source of haplogroup Q might have been European aurochs, unless gene flow in historical times from European domestic cattle is envisioned. However, if this gene flow had occurred, it should have mainly involved haplogroup T3, which represents the vast majority of European mtDNAs (Lenstra et al., 2014), a scenario that does not appear compatible with the haplogroup composition of the Domiaty breed, where only 7.1% is represented by T3 mtDNAs. Third, although the two Egyptian Q1 mtDNAs are from the same breed, they harbor different and extremely divergent haplotypes (#30 and #31), which radiate directly from the Q1 node (Figure 6.5) and do not share any of their private mutations with any of the Q1 mitogenomes reported so far (all from Europe).
This again does not support gene flow from Europe, but rather suggests that the two Egyptian Q1 mtDNAs are direct local derivatives from Q1 founder mtDNAs brought to Egypt by the first domestic herds. In other words, similar to T1, T2 and T3, Q1 was among the haplogroups involved in domestication in the Near East, from where it spread along with the others. An in situ differentiation from the founder domestic herds which first arrived in Egypt from the Near East appears to be very likely not only for the Q1 mtDNAs but also for the majority of Domiaty and Menofi mitogenomes. Indeed, many of the T1, T2 and T3 mitogenomes depart from nodal haplotypes which most likely were directly involved in the domestication process (Figure 6.6). Moreover, they do not share private mutations with other mitogenomes of the same haplogroup from other geographic areas, suggesting that there was very limited gene flow towards and from the two Egyptian cattle breeds, at least on the maternal line, after the initial founder event. One exception to this general trend is represented by a T2 mitogenome (#6) that shares four transitions with a mitogenome from a local cattle breed (Cabannina) from Italy, a finding that indicates direct or indirect gene flow between the two geographic areas, probably as a result of trades or human migrations.

7. Materials and Methods

7.1. DNA extraction

7.1.1. Extraction of human and bovine DNA

Both human and bovine DNA was obtained from peripheral blood, except for Native American samples, for which donors’ DNA was extracted from mouthwash, buccal swabs and/or cord blood. Peripheral blood was previously treated with a red blood cell lysing step, while mouthwash was centrifuged for 10 minutes at 6,000 rpm at 4°C in order to remove the liquid phase. After these initial steps, DNA extraction was performed following the standard phenol/chloroform method, whose steps are listed below:
• Add 8 ml of white-cell lysis buffer A¹ and centrifuge for 10 minutes at 6,000 rpm at 4°C.
• Discard the supernatant and add 1 ml of TBS² buffer, 1 ml of lysis buffer B³ and 88 µl of Proteinase K⁴.
• Place the suspension of lysed cells in a water bath at 55°C overnight.
• Cool the solution to room temperature. Add 1 volume of phenol and 1 volume of chloroform/isoamyl alcohol (24:1), and gently mix the two phases by slowly turning the tube end over end for 5 minutes. Separate the two phases by centrifugation at 6,000 rpm for 7 minutes at 18°C.
• With a wide-bore pipette, transfer the viscous aqueous phase to a clean Corex centrifuge tube. Repeat the extraction once with phenol and chloroform/isoamyl alcohol and once with 2 volumes of chloroform/isoamyl alcohol.
• Add 670 µl of ammonium acetate (from a 10 M stock solution, to a final concentration of 2.5 M) and 2.5 volumes of 95% ethanol (stored at -20°C), and swirl the Corex tube gently (a small ball of DNA should be visible).
• Leave the Corex tube at -20°C overnight.
• Centrifuge for 30 minutes at 8,000 rpm at 4°C, discard the supernatant and dry the DNA in a vacuum pump at room temperature. Add TE⁵ buffer 10:1 (pH 8.0) and leave in a water bath at 37°C overnight.
• Centrifuge for 1 minute at 6,000 rpm and store the DNA solution at -20°C.
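The ammonium acetate step is a simple dilution, and the quoted 670 µl can be checked with the usual C1·V1 = C2·V2 relation. The sketch below assumes an aqueous phase of about 2 ml (1 ml of TBS plus 1 ml of lysis buffer B, ignoring the Proteinase K volume and any losses during extraction); this volume and the function names are assumptions made for illustration, not values stated in the protocol.

```python
def stock_volume_for(target_molar, stock_molar, sample_volume_ul):
    """Volume of stock (µl) to add to a sample to reach the target concentration."""
    # Solve stock * v = target * (sample + v) for v.
    return target_molar * sample_volume_ul / (stock_molar - target_molar)


def final_concentration(stock_molar, stock_volume_ul, sample_volume_ul):
    """Concentration (M) reached after adding the stock to the sample."""
    return stock_molar * stock_volume_ul / (stock_volume_ul + sample_volume_ul)


ASSUMED_AQUEOUS_PHASE_UL = 2000.0   # assumption: 1 ml TBS + 1 ml lysis buffer B

needed_ul = stock_volume_for(2.5, 10.0, ASSUMED_AQUEOUS_PHASE_UL)
reached_molar = final_concentration(10.0, 670.0, ASSUMED_AQUEOUS_PHASE_UL)
ethanol_ul = 2.5 * (ASSUMED_AQUEOUS_PHASE_UL + 670.0)   # "2.5 volumes" of ethanol
```

Under this assumption about 667 µl of the 10 M stock would be needed, consistent with the 670 µl given in the protocol, and 2.5 volumes of ethanol correspond to roughly 6.7 ml.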

¹ Lysis buffer A: 10 mM Tris HCl (pH 7.5); 10 mM NaCl; 3 mM MgCl2.
² TBS: 20 mM Tris HCl (pH 7.5); 150 mM NaCl.
³ Lysis buffer B: 400 mM Tris HCl (pH 8.0); 100 mM EDTA (pH 8.0); 1% SDS.
⁴ At a concentration of 10 mg/ml in 10 mM Tris HCl (pH 8.0).
⁵ TE 10:1 (pH 8.0): 10 mM Tris-HCl (pH 8.0); 1 mM EDTA (pH 8.0).

7.1.2. Extraction of mosquito DNA

DNA from Ae. albopictus was extracted from entire insects (preserved in 80% ethanol at -20°C) with the Wizard Genomic DNA Purification Kit (Promega) following the manufacturer’s standard protocol for animal tissue:
• Add 160 µl of PBS and process the sample with a pestle.
• When the sample is homogenized, add 20 µl of Proteinase K and 200 µl of Cell Lysis Buffer (CLD).
• Incubate the suspension of lysed cells at 56°C overnight.
• Cool the solution to room temperature, add 250 µl of Binding Buffer (BBA) and mix by vortexing.
• Place a ReliaPrep™ Binding Column inside a collection tube, transfer the liquid portion of the sample onto the binding column and centrifuge for 1 minute at maximum speed.
• Discard the flowthrough in the collection tube, add 500 µl of Column Wash Solution (CWD) and centrifuge for 2 minutes at maximum speed. Repeat this step for a total of three washes.
• Place the column in a clean 1.5 ml tube, add 150 µl of Nuclease-Free Water and centrifuge for 1 minute at maximum speed. Discard the ReliaPrep™ Binding Column and store the DNA solution at -20°C.

7.2. DNA amplification

The polymerase chain reaction (PCR) is a technique widely used in molecular biology. PCR consists of the repeated copying of a selected region of a DNA molecule, generating millions or more copies of that segment. The copying is carried out by the purified thermostable DNA polymerase of Thermus aquaticus. PCR is now routinely used in medical and biological research laboratories for a variety of tasks, such as the sequencing of genes, the diagnosis of hereditary diseases, the identification of genetic fingerprints (used in forensic and paternity testing), the detection and diagnosis of infectious diseases, and the creation of transgenic organisms.

7.2.1. Whole genome amplification

In some cases, when the sample amount or concentration was too low, whole genome amplification with the REPLI-g Mitochondrial DNA Kit (QIAGEN) was employed. Typical DNA preparations comprise approximately 0.1% mitochondrial DNA (e.g., 10 μg of human DNA contains approximately 10 ng of mitochondrial DNA). The REPLI-g Mitochondrial DNA Kit provides specific and highly uniform amplification of the whole mitochondrial genome and yields approximately 3–5 μg of amplified mitochondrial DNA per reaction. The method is based on MDA (Multiple Displacement Amplification) technology, which carries out isothermal genome amplification using a uniquely processive DNA polymerase. The DNA polymerase has 3'→5' exonuclease proofreading activity to maintain high fidelity during replication and is used in the presence of exonuclease-resistant primers to achieve high yields of DNA product. Since the resulting mitochondrial DNA is very concentrated, the amplified DNA should be diluted 1:1000 prior to use in downstream PCR assays. The procedure includes the following steps:
• Place 1 to 10 μl of template DNA into a microcentrifuge tube (the amount of template DNA should be >1 ng).
• Adjust the sample volume to 20 μl with water.
• Add 29 μl of REPLI-g mt Reaction Buffer to the DNA. Mix by vortexing and centrifuge briefly.
• Denature the sample DNA by incubation for 5 min at 75°C.
• Interrupt denaturation by bringing the solution to room temperature (15–25°C).
• Add 1 μl of REPLI-g Midi Polymerase to the DNA. Mix and centrifuge briefly.
• Incubate the sample at 33°C for 8 h.
• Inactivate the REPLI-g Midi DNA Polymerase by heating the sample for 3 min at 65°C.
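
The quantities quoted above can be checked with a few lines of arithmetic. The following Python sketch is only illustrative: the function names are hypothetical, and the 0.1% mitochondrial fraction is the approximate figure given in the text, not a measured value.

```python
def mtdna_mass_ng(total_dna_ug, mt_fraction=0.001):
    """Approximate mtDNA mass (ng) in a total DNA preparation.

    mt_fraction=0.001 is the ~0.1% figure quoted in the text (an assumption).
    """
    return total_dna_ug * 1000.0 * mt_fraction


def reaction_volume_ul(sample_ul=20.0, buffer_ul=29.0, polymerase_ul=1.0):
    """Total volume of one amplification reaction assembled as in the steps above."""
    return sample_ul + buffer_ul + polymerase_ul


def diluted_concentration(stock, dilution_factor=1000):
    """Concentration after a 1:dilution_factor dilution (same units as stock)."""
    return stock / dilution_factor
```

For example, `mtdna_mass_ng(10)` gives about 10 ng, matching the example in the text, and `reaction_volume_ul()` gives 50 µl.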

7.2.2. Long range PCR for Illumina sequencing

Most of the samples analyzed in this thesis were sequenced on a Next Generation Sequencing (NGS) platform (MiSeq desktop sequencer, Illumina®), starting from 2 overlapping fragments that cover the whole mtDNA sequence and were obtained with a long range PCR technique. Long range PCR uses an optimized polymerase that can amplify DNA fragments of up to 30 Kb and beyond, which cannot typically be amplified using routine PCR methods or reagents.

7.2.2.1. Long range PCR of human mtDNA

The complete human mtDNA was amplified in 2 overlapping fragments of about 8 and 9 Kb in length (Table 7.1). The first fragment extends from np 5871 to np 13829, while the second extends from np 13477 to np 6151 and includes the control region.

Table 7.1. Oligonucleotides used to amplify the human mitogenome in two fragments.

Fragment  Nameᵃ      Length (nt)  Sequence (5'-3')        Tm (°C)ᵇ  PCR product length (bp)
1         5871 for   22           gcttcactcagccattttacct  59.79     7959
          13829 rev  22           agtcctaggaaagtgacagcga  60.44
2         13477 for  22           gcaggaatacctttcctcacag  60.13     9244
          6151 rev   22           actagtcagttgccaaagcctc  59.95

ᵃ The name corresponds to the first nucleotide position (at 5’) in the primer, numbered according to the reference sequence rCRS (Andrews et al., 1999).
ᵇ Melting temperature.
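
Because the mitogenome is circular, the second fragment spans the origin of the numbering, so its length cannot be obtained by simple subtraction. The following minimal Python sketch shows how the product lengths in Table 7.1 can be reproduced from the primer coordinates. The function names are hypothetical; the genome length of 16,569 bp is that of the rCRS.

```python
RCRS_LENGTH = 16569  # length of the human reference sequence (rCRS)


def amplicon_length(start, end, genome_length=RCRS_LENGTH):
    """Length of a PCR product on a circular genome.

    start and end are the 5' positions of the forward and reverse primers
    (both included). The modulo handles fragments that cross the origin.
    """
    return (end - start) % genome_length + 1


def overlap_length(end_a, start_b, genome_length=RCRS_LENGTH):
    """Number of positions shared between the end of fragment A and the start of fragment B."""
    return (end_a - start_b) % genome_length + 1


fragment_1 = amplicon_length(5871, 13829)   # does not cross the origin
fragment_2 = amplicon_length(13477, 6151)   # crosses the origin (control region)
```

With these coordinates the two fragments are 7,959 and 9,244 bp long, and they overlap by 353 bp (nps 13477-13829) and 281 bp (nps 5871-6151).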

Amplification reactions were performed with 10-50 ng of template DNA in 50 µl of reaction mix for each fragment containing 1X GoTaq® LongPCR Master Mix (Promega) and 0.2 µM of each primer (Tables 7.2 and 7.3).

Table 7.2. Long range amplification protocol.

Reagent                    Initial concentration  Volume (µl)  Final concentration
H2O                                               22
GoTaq® LongPCR Master Mix  2X                     25           1X
Forward primer             10 µM                  1            0.2 µM
Reverse primer             10 µM                  1            0.2 µM
DNA                        ~25 ng/µl              1            ~0.5 ng/µl (~25 ng total)
Final volume                                      50
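
The final concentrations in a reaction mix follow from the dilution relation C_final = C_stock × V_added / V_total. The short Python sketch below, with hypothetical function names, reproduces the master mix and primer values of the long range protocol and the volume of water needed to reach 50 µl.

```python
def final_concentration(stock, volume_ul, total_volume_ul):
    """Final concentration after adding volume_ul of stock to a reaction of total_volume_ul.

    The result is in the same units as the stock (e.g. X, µM, ng/µl).
    """
    return stock * volume_ul / total_volume_ul


def water_volume(total_volume_ul, other_volumes_ul):
    """Volume of water needed to bring the listed reagent volumes to the total volume."""
    return total_volume_ul - sum(other_volumes_ul)


master_mix = final_concentration(2, 25, 50)       # 2X stock, 25 µl in 50 µl
primer = final_concentration(10, 1, 50)           # 10 µM stock, 1 µl in 50 µl
water = water_volume(50, [25, 1, 1, 1])           # master mix, two primers, DNA
```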

Table 7.3. Long range PCR reaction protocol used for human and bovine mtDNA.

Step                  Temperature  Time
Initial denaturation  94°C         2 min
Denaturation          94°C         30 sec
Annealing             55°C         30 sec
Extension             65°C         9 min
(denaturation, annealing and extension repeated for 30 cycles)
Final extension       72°C         10 min
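
As a rough planning aid, the total hold time of such a program can be estimated as follows. This Python sketch uses a hypothetical function name, assumes that each cycle comprises denaturation, annealing and extension, and ignores the temperature ramping of the thermocycler.

```python
def program_minutes(initial_min, cycle_steps_min, n_cycles, final_min):
    """Total hold time (minutes) of a PCR program, ignoring ramping between steps."""
    return initial_min + n_cycles * sum(cycle_steps_min) + final_min


# Long range program: 2 min, then 30 cycles of 30 sec + 30 sec + 9 min, then 10 min
long_range = program_minutes(2, [0.5, 0.5, 9], 30, 10)
```

Under these assumptions the long range program holds for 312 minutes, i.e. a little over five hours, most of which is spent in the 9-minute extension steps.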

7.2.2.2. Long range PCR of bovine mtDNA

The long range amplifications on the complete bovine mtDNA were performed using the 2 overlapping fragments reported in Table 7.4, following the same protocol used for human samples (Tables 7.2 and 7.3).

Table 7.4. Oligonucleotides used to amplify the bovine mitogenome in two fragments.

Fragment  Nameᵃ      Length (nt)  Sequence (5'-3')      Tm (°C)ᵇ  PCR product length (bp)
1         1801 for   20           tggtgatagctggttgtcca  58.56     8661
          10461 rev  20           gacctagggctgcttcacag  58.48
2         9220 for   20           gccaccctactcgtcatcat  58.40     8655
          2236 rev   20           tgcactcctgtgttgggtta  58.56

ᵃ The name corresponds to the first nucleotide position (at 5’) in the primer, numbered according to the reference sequence BRS (Anderson et al., 1982).
ᵇ Melting temperature.

7.2.2.3. Long range PCR of tiger mosquito mtDNA

A primer set was designed on the Ae. albopictus Reference Sequence (NC006817) to amplify the entire Ae. albopictus mitogenome in four overlapping PCR fragments. The coding region (nps 1-14893) was amplified in two long PCR fragments (Table 7.5), whereas the control region (nps 14894-16665) was amplified in two rather short PCR fragments (Table 7.10) because of its high A+T content and the presence of repeated elements.

Table 7.5. Oligonucleotides used to amplify the coding region of Ae. albopictus in two fragments.

Fragment  Nameᵃ      Length (nt)  Sequence (5’-3’)        Tm (°C)ᵇ  PCR product length (bp)
1         274 for    22           agctaactcttgattaggggca  58.88     8602
          8875 rev   20           tgttgaggcacctgtttcag    58.32
2         8415 for   20           ttaaagtcggaggagcagct    58.73     6303
          14717 rev  20           aaatttgtgccagctaccgc    59.76

ᵃ The name corresponds to the first nucleotide position (at 5’) in the primer, numbered according to the Ae. albopictus Reference Sequence [NC006817].
ᵇ Melting temperature.

Coding-region long PCRs were performed on 10-50 ng of template DNA in 50 µl of reaction mix for each fragment, containing 1X GoTaq® LongPCR Master Mix (Promega) and 0.2 µM of each primer, following the same reaction mix used for human and bovine samples (Table 7.2) and the PCR conditions reported in Table 7.6.

Table 7.6. Long range PCR reaction protocol used for tiger mosquito mtDNA.

Step                  Temperature  Time
Initial denaturation  94°C         2 min
Denaturation          94°C         30 sec
Annealing             59°C         30 sec
Extension             65°C         9 min
(denaturation, annealing and extension repeated for 30 cycles)
Final extension       72°C         10 min

7.2.3. DNA amplification for Sanger sequencing

Some of the human and mosquito mitogenome sequences were obtained by Sanger sequencing. Amplification reactions were performed on 10-50 ng of template DNA in 25 µl of reaction mix in a thermocycler.

7.2.3.1. Amplification of human mtDNA

The amplification of human mtDNA was carried out in the reaction mix reported in Table 7.7, following the PCR conditions shown in Table 7.8.

Table 7.7. Amplification protocol of human mtDNA.

Reagent          Initial concentration  Volume (µl)  Final concentration
H2O                                     13.3
Buffer           5X                     5            1X
dNTP mix         1.25 mM (each)         4            200 µM (each)
Forward primer   10 µM                  0.75         0.2 µM
Reverse primer   10 µM                  0.75         0.2 µM
Taq polymeraseᵃ  5000 U/ml              0.2          40 U/ml
DNA              ~25 ng/µl              1            ~1 ng/µl (~25 ng total)
Final volume                            25

ᵃ Promega GoTaq® DNA polymerase.

Table 7.8. PCR reaction protocol for human mtDNA.

Step                  Temperature  Time
Initial denaturation  95°C         2 min
Denaturation          95°C         30 sec
Annealing             55°Cᵃ        30 sec
Extension             72°C         70 sec
(denaturation, annealing and extension repeated for 35 cycles)
Final extension       72°C         10 min

ᵃ Variable temperature according to the pair of primers.

Reactions to amplify either the whole mitochondrial genome or the mtDNA control region were performed according to protocols developed in our laboratory (Torroni et al., 2001). The whole mitochondrial genome was amplified in 11 overlapping PCR fragments. The primers used for the amplification of the complete mitochondrial genome, and those used for the amplification of the control region (at the bottom), are listed in Table 7.9.

Table 7.9. Oligonucleotides used to amplify the entire human mtDNA in eleven fragments (complete sequence PCRs) and a portion of the control-region (control-region PCR).

PCR ID  Fragment length (bp)  Nameᵃ      Length (nt)  Sequence (5'-3')          Tm (°C)ᵇ

Complete sequence PCRs
1       1760                  14900 for  20           gccatgcactactcaccaga      59.96
                              90 rev     20           aatgctatcgcgtgcatacc      61.02
2       1682                  16458 for  20           cccataacacttgggggtag      59.17
                              1570 rev   20           tgtaagttgggtgctttgtgtt    58.72
3       1832                  1404 for   22           acttaagggtcgaaggtggatt    60.23
                              3235 rev   22           cttaacaaaccctgttcttggg    59.90
4       1607                  2932 for   20           gggataacagcgcaatccta      59.90
                              4538 rev   20           gcttagcgctgtgatgagtg      60.13
5       1659                  4366 for   22           aaaattctccgtgccacctatc    61.53
                              6024 rev   22           ttatgttgtttatgcggggaaa    61.37
6       1747                  5871 for   22           gcttcactcagccattttacct    59.79
                              7617 rev   22           tcttgtagacctacttgcgctg    59.72
7       1822                  7356 for   24           gtagaagaaccctccataaacctg  59.46
                              9177 rev   24           tagaagtgtgaaaacgtaggcttg  59.88
8       1740                  8896 for   20           gccctagcccacttcttacc      60.10
                              10728 rev  21           ggccatatgtgttggagattg     60.21
9       1774                  10466 for  20           ccaaatgcccctcatttaca      60.04
                              12240 rev  20           ggggcatgagttagcagttc      59.57
10      1816                  12014 for  22           ctcacccaccacattaacaaca    60.70
                              13829 rev  22           agtcctaggaaagtgacagcga    60.44
11      1804                  13504 for  22           tactccaaagaccacatcatcg    59.99
                              15307 rev  22           gaagggcaagatgaagtgaaag    60.24

Control-region PCR
D-loop  1420                  15877 for  20           caaatgggcctgtccttgta      60.88
                              727 rev    20           agggtgaactcactggaacg      60.15

ᵃ The name corresponds to the first nucleotide position (at 5’) in the primer, numbered according to the reference sequence rCRS (Andrews et al., 1999).
ᵇ Melting temperature.

7.2.3.2. Amplification of Ae. albopictus mtDNA

The mosquito mtDNA coding (nps 1-14893) and control (nps 14894-16665) regions were amplified in 9 and 2 overlapping PCR fragments, respectively, with the primers reported in Table 7.10. The amplification reactions were performed in 25 µl with a standard reaction mix (Table 7.11), following the schemes reported in Tables 7.12 (for the amplification of the coding region) and 7.13 (for the amplification of the control region).

Table 7.10. Oligonucleotides used to amplify the coding-region (complete sequence PCRs) and a portion of the control-region (control-region PCR) of Ae. albopictus mtDNA.

PCR ID  Fragment length (bp)  Nameᵃ      Length (nt)  Sequence (5'-3')           Tm (°C)ᵇ

Coding-region PCRs
1       1493                  274 for    22           agctaactcttgattaggggca     58.88
                              1766 rev   20           gcagcagtgttaaagagggg       58.83
2       2159                  1423 for   20           tcagccatttaatcgcgacaᶜ      57.98
                              3582 rev   20           ttaaacgtccgggagtagca       58.74
3       2167                  3272 for   23           tgcctttccttctttacgacttt    58.54
                              5438 rev   23           ctgctgcttcaaaaccaaaatga    58.88
4       2181                  5269 for   22           cgaagcaccttttacaattgcaᵈ    58.36
                              7448 rev   25           tactcctgtttctgctttagttcat  58.23
5       1754                  7125 for   23           ttgtgaataattaccccagcacaᵉ   58.33
                              8875 rev   20           tgttgaggcacctgtttcag       58.32
6       1647                  8799 for   23           cccgtaataaaccatatcctccc    58.16
                              10445 rev  21           tttattggtcgtaatgggcca      57.63
7       1816                  9985 for   23           aaacatctgcttgctataggtct    57.63
                              11800 rev  18           gggactttgcctcggttt         57.51
8       1819                  11460 for  20           tgaattggagctcgacctgt       59.02
                              13278 rev  20           aagggccgcagtattttgac       58.83
9       1925                  12793 for  20           cgccggtttgaactcagatc       59.28
                              14717 rev  20           aaatttgtgccagctaccgc       59.76

Control-region PCR
10      1700-2500             14534 for  20           agggtatctaatcctagtttᶜ      49.46
                              16267 rev  19           ctatgggtcctaaatgaag        49.44
11      819-903               16261 for  25           cccataggcttataaatacttcact  56.22
                              414 rev    21           gaagaagcaaaggcttgaact      57.00

ᵃ The name corresponds to the first nucleotide position (at 5’) in the primer, numbered according to the Ae. albopictus Reference Sequence [NC006817].
ᵇ Melting temperature.
ᶜ The Ae. albopictus Reference Sequence [NC006817] harbors a C-insertion between the underlined nucleotide positions.
ᵈ The Ae. albopictus Reference Sequence [NC006817] harbors C>T at the underlined nucleotide position.
ᵉ The Ae. albopictus Reference Sequence [NC006817] harbors G>A, T>A, A>G, C>A and A>T at the underlined nucleotides, respectively.

Table 7.11. Amplification protocol of Ae. albopictus mtDNA.

Reagent          Initial concentration  Volume (µl)  Final concentration
H2O                                     13
Buffer           5X                     5            1X
dNTP mix         1.25 mM (each)         5            0.2 mM (each)
Forward primer   10 µM                  0.50         0.2 µM
Reverse primer   10 µM                  0.50         0.2 µM
Taq polymeraseᵃ  5 U/µl                 0.13         0.6 U/µl
DNA              ~25 ng/µl              1            ~1 ng/µl (~25 ng total)
Final volume                            25

ᵃ Promega GoTaq® DNA polymerase.

Table 7.12. PCR reaction protocol for Ae. albopictus mtDNA coding region.

Step                  Temperature  Time
Initial denaturation  94°C         2 min
Denaturation          94°C         30 sec
Annealing             55°Cᵃ        30 sec
Extension             72°C         2 min
(denaturation, annealing and extension repeated for 35 cycles)
Final extension       72°C         5 min

Table 7.13. PCR reaction protocol for Ae. albopictus mtDNA control region. PCR #10 and #11 refer to those reported in Table 7.10.

Step                  Temperature                       Time
Initial denaturation  94°C                              2 min
Denaturation          94°C                              30 sec
Annealing             54°Cᵃ (PCR #10); 55°Cᵃ (PCR #11)  30 sec
Extension             60°C                              2 min (PCR #10); 1 min (PCR #11)
(denaturation, annealing and extension repeated for 35 cycles (PCR #10) or 40 cycles (PCR #11))
Final extension       60°C                              5 min (PCR #10); 10 min (PCR #11)

7.3. Electrophoresis

To assess the quality of the PCR products, the amplified fragments were loaded on a horizontal agarose gel. Usually 5 µl of PCR product were mixed with a Bromophenol Blue and sucrose solution and loaded on a 1-2% agarose gel (depending on the length of the fragments) stained with ethidium bromide (0.5 µg/ml). The electrophoretic run was performed in TBE⁶ buffer, applying a voltage of 80-100 Volts for 50-60 minutes. PCR products were then visualized under ultraviolet light and the sizes of the fragments were determined by comparison to commercially available DNA molecular weight markers of known length (usually the 100-1000 bp ‘ladder’).

⁶ TBE buffer 10X: 0.89 M Tris HCl (pH 8.0); 0.89 M boric acid; 20 mM EDTA (pH 8.0).

7.4. Sequence analysis

7.4.1. Next Generation Sequencing

Long range PCR products were purified with the Wizard® SV Gel and PCR Clean-Up System (Promega) according to the manufacturer's instructions and quantified using a Quantus™ Fluorometer (Promega). A total of 1.5 ng of PCR product (0.75 ng for each PCR) was used to set up a sequencing library with the Nextera® XT DNA sample preparation kit (Illumina) following the manufacturer's protocol. The steps of this protocol are briefly described below:
• Tagmentation of DNA: input DNA is tagged and enzymatically fragmented by the Nextera XT transposome, which simultaneously fragments the DNA and adds adapter sequences to the ends.
• Amplification of tagmented DNA: the tagmented DNA is amplified via a limited-cycle PCR program. The PCR step also adds the index 1 (i7) and index 2 (i5) adapters, oligos that bind the DNA segments at both ends and carry the sequences required for cluster generation. These adapters are complementary to the oligos present on the flow cell, allowing hybridization, clustering and subsequent amplification when the samples are loaded on the MiSeq. The adapter at one end first hybridizes to its complementary oligo on the flow cell; ‘bridges’ are then formed when the adapter at the other end hybridizes to its own complementary oligo.
• PCR Clean-Up: the library DNA is purified with AMPure XP beads, a step that also provides a size selection by removing very short library fragments from the population. These beads are paramagnetic (magnetic only in a magnetic field), which prevents them from clumping and falling out of solution. Each bead is made of polystyrene surrounded by a layer of magnetite, which is coated with carboxyl molecules. This composition allows the reversible binding of negatively charged DNA to the carboxyl groups on the bead surface.
• Library normalization: the quantity of each library is normalized to ensure a more equal representation of the libraries in the pooled sample. In preparation for sequencing, equal volumes of normalized library are combined, diluted in hybridization buffer, and heat denatured prior to MiSeq sequencing.
The final normalized pool was loaded on the MiSeq flow cell. Sequencing reactions were carried out on a MiSeq (Illumina) using the MiSeq Reagent Nano Kit v2 (300 cycles). The on-board software generated results in FASTQ format, which were analyzed with the Geneious software (version 8.1). Geneious was used to compare the sequences with the references rCRS (Andrews et al., 1999) (human sequences) and BRS (Anderson et al., 1982) (bovine sequences) and to create a report of sequence variants (nucleotide substitutions and indels).
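
To make the notion of a variant report concrete, the following Python sketch lists the differences between a sample and a reference that have already been aligned. It is a toy illustration under stated assumptions, not the procedure implemented in Geneious: the function name, the gap symbol '-' and the output notation (e.g. C2T for a substitution, 3d for a deletion, 4.1C for an insertion after position 4) are choices made here for the example.

```python
def variant_report(reference, sample):
    """List substitutions and indels of an aligned sample relative to the reference.

    Both strings must have the same length; '-' marks an alignment gap.
    Positions are counted on the reference, starting from 1.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have the same length")
    variants = []
    ref_pos = 0      # last reference position visited
    ins_count = 0    # number of inserted bases after ref_pos
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base == "-":
            # base present only in the sample: insertion after ref_pos
            if alt_base != "-":
                ins_count += 1
                variants.append(f"{ref_pos}.{ins_count}{alt_base}")
            continue
        ref_pos += 1
        ins_count = 0
        if alt_base == "-":
            variants.append(f"{ref_pos}d")
        elif alt_base != ref_base:
            variants.append(f"{ref_base}{ref_pos}{alt_base}")
    return variants
```

For instance, `variant_report("ACGT-A", "ATGTCA")` reports one substitution at position 2 and one insertion after position 4.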

7.4.2. Sanger sequencing

After the PCR reaction, fragments were purified using the ExoSAP-IT enzymatic system (Exonuclease I and Shrimp Alkaline Phosphatase, GE Healthcare). An ABI 3730 sequencer with 96 capillaries was employed for the separation of the sequencing ladders. The sequencing was performed by BMR Genomics (http://www.bmr-genomics.it/) or GATC Biotech (http://www.gatc-biotech.com/). The sequences obtained were aligned, assembled, and compared with the references using the software Sequencher™ 5.0 (Gene Codes). Since the traces were usually unambiguous, it was generally necessary to sequence only one strand.

7.4.2.1. Human mtDNA sequencing

The oligonucleotides used for the Sanger sequencing of human samples are listed in Table 7.14; the primers used for the control region are reported at the bottom of the table.

Generally, only the forward primer 15981 was employed for D-loop sequencing, while the primer 305 rev was used for mtDNAs harbouring the T16189C transition. This transition creates a poly-C stretch (Bendall and Sykes, 1995) that causes the sequencing reaction to fail and the signal to be lost. The reverse primer solves the problem by completing the sequence information from the opposite side. For the same reason, the sequencing protocol for the complete human mtDNA also includes a reverse primer (58 rev). Moreover, length variation in the C tract at np 309 can create another poly-C stretch, again causing the loss of the sequence signal. In this case an additional primer should be used to cover the entire sequence in that range. This oligonucleotide, named 653 rev, lies within PCR fragment 2. However, the poly-C at np 309 is much less common than the one near np 16189, and the primer 653 rev was therefore not included among the primers of the standard protocol for complete mtDNA sequencing.
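
The effect of the T16189C transition on the C tract can be illustrated with a short Python sketch. The two segments below are toy sequences modelled on the C tract interrupted by a T, the function names are hypothetical, and the threshold of 8 consecutive cytosines is an arbitrary assumption chosen for the example, not a value taken from the sequencing protocol.

```python
import re


def longest_c_tract(sequence):
    """Length of the longest uninterrupted run of cytosines in a sequence."""
    runs = re.findall(r"C+", sequence.upper())
    return max((len(run) for run in runs), default=0)


def needs_reverse_primer(sequence, min_tract=8):
    """Flag sequences whose poly-C stretch is long enough to justify a reverse read.

    min_tract is an illustrative threshold (assumption).
    """
    return longest_c_tract(sequence) >= min_tract


reference_like = "AAAACCCCCTCCCCATG"   # C tract interrupted by a T
with_transition = "AAAACCCCCCCCCCATG"  # the T>C transition joins the two C runs
```

In the first segment the longest run contains 5 cytosines, whereas after the transition it contains 10, which is the situation in which the text recommends sequencing from the reverse side.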

Table 7.14. Oligonucleotides used for sequencing the entire human mtDNA. An additional primer used in case of poly-C near np 309 (additional oligonucleotide for complete sequencing) is also included.

Template PCR IDᵃ  Nameᵇ      Length (nt)  Sequence (5'-3')        Tm (°C)ᶜ

Complete sequencing
1                 14948 for  20           cacatcactcgagacgtaaa    54.92
                  15564 for  20           atttcctattcgcctacaca    54.93
                  58 rev     20           aataccaaatgcatggagag    55.17
2                 16522 for  20           taaagcctaaatagcccaca    55.27
                  584 for    20           tagcttacctcctcaaagca    55.46
                  1060 for   20           aagacccaaactgggattag    55.74
3                 1445 for   20           gagtgcttagttgaacaggg    55.02
                  2047 for   20           tttaaatttgcccacagaac    55.39
                  2509 for   20           atcacctctagcatcaccag    55.23
4                 3067 for   20           tgagttcagaccggagtaat    54.76
                  3540 for   20           tctcaccatcgctcttctac    55.54
                  4010 for   20           acaccctcaccactacaatc    54.77
5                 4410 for   20           cagctaaataagctatcggg    54.58
                  5014 for   20           cctcaattacccacatagga    55.02
                  5544 for   20           tcaaagccctcagtaagttg    55.63
6                 6041 for   20           ccttctaggtaacgaccaca    55.33
                  6473 for   20           cacagcagtcctacttctcc    55.00
                  7027 for   20           cccacttccactatgtccta    55.08
7                 7416 for   20           ttcgaagaacccgtatacat    54.77
                  7987 for   20           actccttgacgttgacaatc    55.09
                  8505 for   20           ataacaaaccctgagaacca    54.62
8                 8975 for   18           tcattcaaccaatagccc      54.27
                  9589 for   20           aagtcccactcctaaacaca    54.68
                  10147 for  20           acatagaaaaatccacccct    55.09
9                 10498 for  22           tagcatttaccatctcacttct  53.48
                  11081 for  20           ataacattcacagccacaga    54.03
                  11644 for  20           cctcgtagtaacagccattc    54.99
10                12114 for  19           acatcattaccgggttttc     54.81
                  12611 for  20           attcatccctgtagcattgt    54.75
                  13134 for  20           agcagaaaatagcccactaa    54.42
11                13569 for  20           cgcctatagcactcgaataa    55.85
                  14115 for  20           cccactcatcctaaccctac    56.03
                  14646 for  20           cactcaacagaaacaaagca    54.98

Control-region sequencing
D-loop            15981 for  19           ccattagcacccaaagcta     56.44
D-loop            305 rev    20           gggtttggtggaaattttt     55.37

Additional oligonucleotide for complete sequencing
2                 653 rev    20           cctatttgtttatggggtga    55.04

ᵃ Numbers in this column correspond to the 11 PCR fragments listed in Table 7.9.
ᵇ The name corresponds to the first nucleotide position (at 5’) in the primer, numbered according to the reference sequence rCRS (Andrews et al., 1999).
ᶜ Melting temperature.

7.4.2.2. Mosquito mtDNA sequencing

For the mosquito mtDNA Sanger sequencing, two protocols (named I and II) were defined. The oligonucleotides used for the different protocols are reported in Tables 7.15 and 7.16.

Table 7.15. Oligonucleotides used for sequencing the entire Ae. albopictus mtDNA from four PCR fragments (protocol I). The template PCRs of the coding and control regions are those obtained with primers listed in Tables 7.5 and 7.10, respectively.

Template PCR IDᵃ  Nameᵇ       Length (nt)  Sequence (5'-3')            Tm (°C)ᶜ

Coding-region sequencing
1                 274 forᵈ    22           agctaactcttgattaggggca      58.88
                  1452 for    25           ggtcaacaaatcataaagatattggᵉ  54.80
                  1766 rev    20           gcagcagtgttaaagagggg        58.83
                  2027 for    20           cccgtattagccggagctatᵉ       58.82
                  2867 for    21           cgaactccttctttccccata       57.09
                  4063 for    23           tgacccttcaactactattttca     55.26
                  4233 rev    20           ccatttggccctaaaagagt        55.59
                  4728 for    20           cacatgcaaatcaccctttt        55.02
                  5910 rev    21           tgctccttgatttcattcatg       54.52
                  5685 for    20           cccctttgaatgtggatttg        54.80
                  6287 for    23           aataatttccccaacatcttcaa     54.46
                  6992 for    22           atccttagaataaaatcccgctᶠ     55.00
                  7441 rev    23           gtttctgctttagttcattcttc     54.27
                  7638 for    21           ccaattcgattagaaagagca       57.15
2                 8415 forᵈ   20           ttaaagtcggaggagcagct        58.73
                  8658 for    20           gagctacagaagaataagca        52.44
                  9085 for    19           tgcttgtaatcgttctggtg        54.41
                  9986 for    22           aacatctgcttgctataggtct      56.96
                  10425 for   20           tggcccattacgaccaataa        56.90
                  11799 rev   18           ggactttgcctcggtttc          56.01
                  11460 for   20           tgaattggagctcgacctgt        59.02
                  12193 for   20           ctactgctcgtaaaccccct        58.81
                  12913 for   25           gaactctaaaaaaaaattacgctgt   55.38
                  13288 for   20           tcagtgggcaggttagactt        58.27
                  13820 for   20           accctgatacacaaggtaca        55.46

Control-region sequencing
10                14534 forᵈ  20           agggtatctaatcctagtttʰ       49.46
11                16261 forᵈ  25           cccataggcttataaatacttcact   56.22
                  414 revᵈ    21           gaagaagcaaaggcttgaact       57.00

ᵃ Numbers in this column correspond to the two PCR coding-region fragments listed in Table 7.5 and to the two PCR control-region fragments listed in Table 7.10.
ᵇ The name corresponds to the first nucleotide position (at 5’) in the primer, numbered according to the Ae. albopictus Reference Sequence [NC006817].
ᶜ Melting temperature.
ᵈ Primers used for PCR reactions (Tables 7.5 and 7.10).
ᵉ Primer from Zhong et al., 2013; the Ae. albopictus Reference Sequence [NC006817] harbors two G>T at the underlined nucleotides.
ᶠ The Ae. albopictus Reference Sequence [NC006817] harbors A>C, T>C and C>T at the underlined nucleotides, respectively.
ᵍ The Ae. albopictus Reference Sequence [NC006817] harbors an A-insertion between the underlined nucleotides.
ʰ Primer from Xu and Fonseca, 2011; the Ae. albopictus Reference Sequence [NC006817] harbors an A-deletion at the underlined nucleotide position.

Table 7.16. Oligonucleotides used for sequencing the entire Ae. albopictus mtDNA from eleven PCR fragments (protocol II). The template PCRs are those obtained with primers listed in Table 7.10.

Template PCR IDᵃ  Name        Length (nt)  Sequence (5'-3')            Tm (°C)ᵇ

Complete sequencing
1                 274 forᶜ    22           agctaactcttgattaggggca      58.88
                  1766 revᶜ   20           gcagcagtgttaaagagggg        58.83
2                 1452 for    25           ggtcaacaaatcataaagatattggᵈ  54.80
                  2027 for    20           cccgtattagccggagctatᵉ       58.82
                  2867 for    21           cgaactccttctttccccata       57.09
3                 4233 rev    20           ccatttggccctaaaagagt        55.59
                  4063 forᶜ   23           tgacccttcaactactattttca     55.26
                  4728 for    20           cacatgcaaatcaccctttt        55.02
                  5143 for    22           aggagttactgttacatgggct      58.82
4                 5910 rev    21           tgctccttgatttcattcatg       54.52
                  5685 for    20           cccctttgaatgtggatttg        54.80
                  6287 for    23           aataatttccccaacatcttcaa     54.46
                  6992 for    22           atccttagaataaaatcccgctᵉ     55.00
5                 7125 forᶜ   23           ttgtgaataattaccccagcacaᶠ    58.33
                  7638 for    21           ccaattcgattagaaagagca       57.15
                  8415 forᶜ   20           ttaaagtcggaggagcagct        58.73
6                 8798 forᶜ   22           acccgtaataaaccatatcctc      55.12
                  9085 for    19           tgcttgtaatcgttctggtg        54.41
                  9564 for    20           agacaaccctaatacccctt        55.08
                  9986 for    22           aacatctgcttgctataggtct      56.96
7                 10425 for   20           tggcccattacgaccaataa        56.90
                  11799 rev   18           ggactttgcctcggtttc          56.01
8                 11460 forᶜ  20           tgaattggagctcgacctgt        59.02
                  12193 for   20           ctactgctcgtaaaccccct        58.81
                  12913 for   25           gaactctaaaaaaaaattacgctgt   55.38
9                 13288 for   20           tcagtgggcaggttagactt        58.27
                  13820 for   20           accctgatacacaaggtaca        55.46
                  14534 forᶜ  20           agggtatctaatcctagtttʰ       49.46

Control-region sequencing
10                14534 forᶜ  20           agggtatctaatcctagtttʰ       49.46
11                16261 forᶜ  25           cccataggcttataaatacttcact   56.22
                  414 revᶜ    21           gaagaagcaaaggcttgaact       57.00

ᵃ Template PCR ID numbers refer to those in Table 7.10.
ᵇ Melting temperature.
ᶜ Primers also used for the PCR reactions listed in Table 7.10.
ᵈ Primer from Zhong et al., 2013; the Ae. albopictus Reference Sequence [NC006817] harbors two G>T at the underlined nucleotides.
ᵉ The Ae. albopictus Reference Sequence [NC006817] harbors A>C, T>C and C>T at the underlined nucleotides, respectively.
ᶠ The Ae. albopictus Reference Sequence [NC006817] harbors G>A, T>A, A>G, C>A and A>T at the underlined nucleotides, respectively.
ᵍ The Ae. albopictus Reference Sequence [NC006817] harbors an A-insertion between the underlined nucleotide positions.
ʰ Primer from Xu and Fonseca, 2011; the Ae. albopictus Reference Sequence [NC006817] harbors an A-deletion at the underlined nucleotide.

7.5. Phylogenetic (and other) analyses

7.5.1. Tree design and haplogroup definition

One of the best ways to display phylogenetic information is a tree, which is an intuitive way of showing the relationships between many kinds of variant entities. It consists of branches connecting nodes, where each node represents a group of populations or molecules known as taxa (or operational taxonomic units, OTUs). Trees can be rooted or unrooted. Rooted trees contain an outgroup, defined as the lineage with the most ancestral divergence from all the others. In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of those descendants. Unrooted trees, on the other hand, illustrate only the relatedness of the leaf nodes and do not require the ancestral root to be known or inferred. The trees considered in this thesis were rooted using the mtDNA reference sequences rCRS (Andrews et al., 1999) for Homo sapiens, BRS (Anderson et al., 1982) for Bos taurus and the sequence NC006817 for Ae. albopictus. These sequences are not outgroups, but the first published mitogenomes of each of the species listed above. The mutation(s) defining each branch are reported as variants or deviations from the reference sequences. The branches including mtDNAs that share at least one mutation are defined as ‘haplogroups’. However, before defining a new haplogroup it is necessary to be sure that the shared mutation does not occur at a mutational hotspot.

7.5.1.1. Maximum parsimony

Phylogenetic trees were built using the Maximum Parsimony (MP) principle. MP is a robust and commonly used criterion that aims to minimize the total number of evolutionary changes from a common ancestor, thus obtaining the shortest possible tree. The best tree is therefore the one that requires the smallest number of mutations to account for the sequences. If two (or more) trees are equally parsimonious, there is no criterion for choosing between them, and no unique tree can be inferred. MP methods can incorporate information about the relative rate of different mutations, for example if transversions are known to occur less frequently than transitions or if certain sites are known to be hyper-mutable. These mutational events can be weighted accordingly, such that the rarer changes carry more weight (Jobling et al., 2004). We built the MP phylogenetic trees of the Sardinian, Native American, R0a haplogroup and Bos taurus projects (chapters 3, 4, 6.1 and 6.2, respectively) with the aid of the mtPhyl software (http://eltsov.org/mtphyl.aspx). Human trees were then corrected by hand with reference to PhyloTree (van Oven and Kayser, 2009). Haplogroup labels were assigned following the nomenclature proposed by the PhyloTree database (http://www.phylotree.org/); a new lineage was defined when it encompassed at least three mitogenomes with a minimum of two different haplotypes and had at least one stable mutation (not recurrent in the tree) at its root.
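
The parsimony criterion can be illustrated by counting the minimum number of changes that a given tree requires. The Python sketch below uses the classical Fitch algorithm on a rooted binary tree with equally weighted changes; it is a didactic example and not the procedure implemented in mtPhyl. The function names, the nested-tuple representation of the tree and the toy alignment are all assumptions made for the illustration.

```python
def fitch_changes(tree, states):
    """Fitch algorithm for one site.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping each leaf name to its base at this site.
    Returns (set of possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    common = left_set & right_set
    if common:
        # the two subtrees can share a state: no extra change needed
        return common, left_cost + right_cost
    # otherwise one change is required on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Total number of changes required by a tree over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        column = {name: seq[site] for name, seq in alignment.items()}
        total += fitch_changes(tree, column)[1]
    return total


alignment = {"A": "AAG", "B": "AAG", "C": "GAT", "D": "GAT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
```

In this toy example `tree_1` requires 2 changes and `tree_2` requires 4, so `tree_1` is the more parsimonious of the two.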

7.5.2. Coalescence times

Coalescence times for human and bovine mtDNA haplogroups/sub-haplogroups were estimated using both maximum likelihood (ML) and the average distance of the haplotypes of a clade from the respective root haplotype (ρ statistic) (Forster et al., 1996) accompanied by a heuristic estimate of the standard error (σ) calculated from an estimate of the genealogy (Saillard et al., 2000).

7.5.2.1. Rho statistics

Coalescence ages based on the ρ statistic were calculated on the trees as the average number of base substitutions (ρ) in the entire mtDNA from the root haplotype, disregarding hot-spot mutations (16182C, 16183C, 16194C and 16519 in humans, according to Soares et al., 2009), indels (except the 9-bp deletion between nps 8281 and 8289, diagnostic of human haplogroup B2, which counts as one mutation) and heteroplasmies. The standard error (σ) was calculated from an estimate of the genealogy. These calculations were performed with the mtPhyl software (http://eltsov.org). Age estimates (T) were obtained by converting the mutational distances into time. For humans we assumed a rate of one substitution every 3,624 years, using the corrected molecular clock proposed by Soares et al. (2009). For bovines the mutational distances were converted into years using the substitution rate for the bovine coding region of about one mutation every 3,172 years (Achilli et al., 2008b).
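
As an illustration of the calculation described above, the following Python sketch computes ρ and σ from the branches of a genealogy, using the formulation of Forster et al. (1996) and Saillard et al. (2000): ρ = Σ nᵢlᵢ / n and σ² = Σ nᵢ²lᵢ / n², where lᵢ is the number of mutations on branch i, nᵢ the number of sampled sequences descending from that branch and n the total number of sequences in the clade. The function names and the toy genealogy are hypothetical, and the conversion into years simply multiplies by the linear rates quoted in the text; it is a simplified sketch, not the mtPhyl implementation.

```python
from math import sqrt


def rho_sigma(branches, n_samples):
    """Compute rho and its standard error sigma from a genealogy.

    branches: list of (n_descendants, n_mutations) pairs, one per branch
              below the root of the clade.
    n_samples: total number of sequences in the clade.
    """
    rho = sum(n * l for n, l in branches) / n_samples
    sigma = sqrt(sum(n * n * l for n, l in branches)) / n_samples
    return rho, sigma


def to_years(rho, sigma, years_per_substitution):
    """Convert rho and sigma into an age and its standard error (linear clock)."""
    return rho * years_per_substitution, sigma * years_per_substitution


# Toy star-like clade: 4 sequences, each one mutation away from the root
rho, sigma = rho_sigma([(1, 1), (1, 1), (1, 1), (1, 1)], n_samples=4)
age, error = to_years(rho, sigma, 3624)  # human whole-mtDNA rate quoted in the text
```

For this toy clade ρ = 1 and σ = 0.5, corresponding to 3,624 ± 1,812 years under the linear conversion assumed here.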

7.5.2.2. Maximum likelihood

Maximum likelihood (ML) is a statistical method based on the principle of likelihood, which is the probability of observing the data given a particular model (in this case concerning the mutation process). Different models may make the observed data more or less probable. Likelihood (L) is often written as L = Pr(D|H), where Pr(D|H) is the probability of obtaining the data D given the hypothesis H. In the context of molecular phylogenetics, D is the set of sequences being compared and H is a phylogenetic tree; hence we want to find the likelihood of obtaining the observed sequences given a particular tree (with a certain topology and set of branch lengths). The program therefore takes into account information about the mutation process when estimating the tree (such as the extent of rate heterogeneity and the transition/transversion ratio). In practice, tree topology estimation from complete mitogenomes is more easily accomplished using the simpler Network package, and we used a tree estimated in this way as input for PAMLX in order to generate branch lengths. Calibrated against an appropriate mutation rate, this provides more realistic and more precise estimates of coalescence times than other approaches, with smaller confidence intervals. ML estimations were performed using the software PAMLX (Yang, 1997), assuming the HKY85 mutation model (two parameters in the model of DNA evolution) with gamma-distributed rates (approximated by a discrete distribution with 32 categories). In the human mtDNA analyses we assumed two partitions, coding region (nps 577-16023) and control region (nps 16024-576), as described in Achilli et al. (2013). Indels, heteroplasmies and the mutations 16182C, 16183C, 16194C and 16519 were excluded from the analysis. Mutational distances were converted into years by employing the corrected molecular clock embedded in the calculator of Soares et al. (2009).
In the analysis of Native American mitogenomes: i) ML estimations were also obtained using one partition (nps 1-16569); ii) coalescence ages of haplogroups M and N (Behar et al., 2012a) and radiocarbon-calibrated ages of ancient mitogenomes were used as priors. In the analysis of Sardinian mitogenomes, the linear mutation rate published by Posth et al. (2016) (2.7 ± 0.2 × 10⁻⁸ base substitutions per nucleotide per year) was employed in addition to the one from Soares et al. (2009). In the analysis of bovine mitogenomes, ML estimations were performed only on the coding region (nps 364-15791). Mutational distances were converted into years using the substitution rate for the bovine coding region reported in Achilli et al. (2008b).
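With a linear rate, converting an ML branch length into an age is a single division. The sketch below uses the rate quoted above (2.7 ± 0.2 × 10⁻⁸ substitutions per nucleotide per year); the branch length is a hypothetical value, the function name is illustrative, and the bounds reflect only the uncertainty of the rate, not that of the ML distance itself.

```python
# Linear rate quoted in the text (substitutions per nucleotide per year).
RATE = 2.7e-8
RATE_ERR = 0.2e-8


def distance_to_age(d, rate=RATE, rate_err=RATE_ERR):
    """Convert a branch length d (substitutions per site) into years.

    Returns (age, lower, upper); the bounds come from the rate error only.
    """
    age = d / rate
    lower = d / (rate + rate_err)  # faster clock gives a younger age
    upper = d / (rate - rate_err)  # slower clock gives an older age
    return age, lower, upper


# Hypothetical ML distance from the root of a clade to its tips.
age, lower, upper = distance_to_age(5.4e-4)
print(round(age), round(lower), round(upper))
```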

7.5.2.3. Bayesian approach

In some analyses (see chapters 3 and 4), age estimates were also calculated using a Bayesian method. We employed BEAST 1.8.3 (Drummond and Rambaut, 2007), using radiocarbon dates of ancient specimens as priors. The program was run under the HKY substitution model (gamma-distributed rates plus invariant sites) with a fixed molecular clock. The clock rates proposed by Soares et al. (2009) and Posth et al. (2016) (the latter only for the Sardinian project) were entered as priors.

7.5.3. Population expansion times

The Bayesian skyline plot (BSP) model uses standard Markov chain Monte Carlo (MCMC) sampling procedures to estimate a posterior distribution of effective population size through time directly from a sample of gene sequences. The BSP includes credibility intervals for the estimated effective population size at every point in time, back to the most recent common ancestor of the gene sequences. These credibility intervals represent both phylogenetic and coalescent uncertainty (Drummond et al., 2005). BSPs were produced with the BEAST software using a relaxed molecular clock (lognormal distribution across branches and uncorrelated between them) and an HKY85-type model with γ-distributed rates. The ages of haplogroups N and M (Behar et al., 2012a) and of haplogroup R (Fernandes et al., 2015) were used as internal calibration points in the analysis of Native American (chapter 4) and R0a (chapter 6.1) mitogenomes, respectively. In the analysis of bovine mitogenomes (chapter 6.2) we used a P sequence (GenBank ID DQ124389) as outgroup and the age of PQT of 76 ± 10 ky (Bonfiglio et al., 2010) as calibration point. Specifically, the chain length was set to 50,000,000 iterations, with samples drawn every 10,000 MCMC steps, after a discarded burn-in of 5,000,000 steps. The resulting BSPs were visualized with Tracer v1.5 and then converted to Excel graphs, assuming a generation time of 25 years for humans and of 6 years for bovines (Bollongino et al., 2012).
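The run settings and the generation-time conversion can be checked with a short sketch. It rests on two assumptions that the text does not state explicitly: that the burn-in is counted within the 50,000,000-step chain, and that the skyline values exported from Tracer are the product of effective population size and generation time (in years), so that dividing by the generation time yields the effective size. The function names and the example skyline values are hypothetical.

```python
CHAIN_LENGTH = 50_000_000
SAMPLE_EVERY = 10_000
BURN_IN = 5_000_000


def retained_samples(chain_length=CHAIN_LENGTH,
                     sample_every=SAMPLE_EVERY,
                     burn_in=BURN_IN):
    """Number of posterior samples left after discarding the burn-in.

    Assumes the burn-in is part of the chain; the initial state is ignored.
    """
    return chain_length // sample_every - burn_in // sample_every


def effective_size(ne_tau, generation_time):
    """Convert a skyline value (Ne x generation time, in years) into Ne."""
    return ne_tau / generation_time


print(retained_samples())
# Hypothetical skyline values for a human and a bovine data set.
print(effective_size(250_000, 25), effective_size(60_000, 6))
```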

7.5.4. Frequency distribution analysis

Frequency maps were drawn using Surfer 9.0 (Golden Software) with the Kriging procedure; estimates at each grid node were inferred from the entire data set. A geographic map of gene frequencies is comparable to a geographic map of altitudes, in which altitudes are represented by colors or shadings of varying intensity. Ordinarily, isopleths (in this case, lines of equal haplogroup frequency) are first calculated for various levels of haplogroup frequency, for instance 10%, 20%, etc., or other suitable intervals, and the areas between two successive isopleths are filled with a color or shading of increasing intensity. In this way the map conveys immediate information on the location of relative maxima and minima (peaks and troughs) and gradients (clines) of the haplogroup frequency surface.

REFERENCES

1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. A map of human genome variation from population-scale sequencing. Nature. 2010; 467(7319):1061-73. Abi-Rached L, Jobin MJ, Kulkarni S, McWhinnie A, Dalva K, Gragert L, Babrzadeh F, Gharizadeh B, Luo M, Plummer FA, Kimani J, Carrington M, Middleton D, Rajalingam R, Beksac M, Marsh SG, Maiers M, Guethlein LA, Tavoularis S, Little AM, Green RE, Norman PJ, Parham P. The shaping of modern human immune systems by multiregional admixture with archaic humans. Science. 2011; 334(6052):89-94. Achilli A, Rengo C, Magri C, Battaglia V, Olivieri A, Scozzari R, Cruciani F, Zeviani M, Briem E, Carelli V, Moral P, Dugoujon JM, Roostalu U, Loogväli EL, Kivisild T, Bandelt HJ, Richards M, Villems R, Santachiara-Benerecetti AS, Semino O, Torroni A. The molecular dissection of mtDNA haplogroup H confirms that the Franco- Cantabrian glacial refuge was a major source for the European gene pool. Am J Hum Genet. 2004; 75(5):910-8. Achilli A, Perego UA, Bravi CM, Coble MD, Kong QP, Woodward SR, Salas A, Torroni A, Bandelt HJ. The phylogeny of the four pan-American mtDNA haplogroups: implications for evolutionary and disease studies. PLoS One. 2008a; 3(3):e1764. Achilli A, Olivieri A, Pellecchia M, Uboldi C, Colli L, Al-Zahery N, Accetturo M, Pala M, Hooshiar Kashani B, Perego UA, Battaglia V, Fornarino S, Kalamati J, Houshmand M, Negrini R, Semino O, Richards M, Macaulay V, Ferretti L, Bandelt HJ, Ajmone- Marsan P, Torroni A. Mitochondrial genomes of extinct aurochs survive in domestic cattle. Curr Biol. 2008b; 18(4):R157-8. Achilli A, Bonfiglio S, Olivieri A, Malusà A, Pala M, Hooshiar Kashani B, Perego UA, Ajmone-Marsan P, Liotta L, Semino O, Bandelt HJ, Ferretti L, Torroni A. The multifaceted origin of taurine cattle reflected by the mitochondrial genome. PLoS One. 2009; 4(6):e5753. 
Achilli A, Olivieri A, Soares P, Lancioni H, Hooshiar Kashani B, Perego UA, Nergadze SG, Carossa V, Santagostino M, Capomaccio S, Felicetti M, Al-Achkar W, Penedo MC, Verini-Supplizi A, Houshmand M, Woodward SR, Semino O, Silvestrelli M, Giulotto E, Pereira L, Bandelt HJ, Torroni A. Mitochondrial genomes from modern horses reveal the major haplogroups that underwent domestication. Proc Natl Acad Sci U S A. 2012; 109(7):2449-54. Achilli A, Perego UA, Lancioni H, Olivieri A, Gandini F, Hooshiar Kashani B, Battaglia V, Grugni V, Angerhofer N, Rogers MP, Herrera RJ, Woodward SR, Labuda D, Smith DG, Cybulski JS, Semino O, Malhi RS, Torroni A. Reconciling migration models to the Americas with the variation of North American native mitogenomes. Proc Natl Acad Sci U S A. 2013; 110(35):14308-13. Ajmone-Marsan P, Garcia JF, Lenstra JA. On the origin of cattle: how aurochs became cattle and colonized the world. Evol Anthropol. 2010; 19:148–157.

Al-Abri A, Podgorná E, Rose JI, Pereira L, Mulligan CJ, Silva NM, Bayoumi R, Soares P, Cerný V. Pleistocene-Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am J Phys Anthropol. 2012; 149(2):291- 8. Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young G. Sequence and organization of the human mitochondrial genome. Nature. 1981; 290(5806):457-65. Anderson S, de Bruijn MH, Coulson AR, Eperon IC, Sanger F, Young IG. Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J Mol Biol. 1982; 156:683-717. Anderung C, Bouwman A, Persson P, Carretero JM, Ortega AI, Elburg R, Smith C, Arsuaga JL, Ellegren H, Götherström A. Prehistoric contacts over the Straits of Gibraltar indicated by genetic analysis of Iberian Bronze Age cattle. Proc Natl Acad Sci U S A. 2005; 102(24):8431-5. Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet. 1999; 23(2):147. Armbruster P, Damsky WE Jr, Giordano R, Birungi J, Munstermann LE, Conn JE. Infection of New- and Old-World Aedes albopictus (Diptera: Culicidae) by the intracellular parasite Wolbachia: implications for host mitochondrial DNA evolution. J Med Entomol. 2003; 40(3):356-60. Armitage SJ, Jasim SA, Marks AE, Parker AG, Usik VI, Uerpmann HP. The southern route "out of Africa": evidence for an early expansion of modern humans into Arabia. Science. 2011; 331(6016):453-56. Avise JC. Phylogeography: the history and formation of species. Harvard University Press, Cambridge. 2000. Avital G, Buchshtav M, Zhidkov I, Tuval Feder J, Dadon S, Rubin E, Glass D, Spector TD, Mishmar D. Mitochondrial DNA heteroplasmy in diabetes and normal adults: role of acquired and inherited mutational patterns in twins. Hum Mol Genet. 2012; 21(19):4214-24. 
Bailey JF, Richards MB, Macaulay VA, Colson IB, James IT, Bradley DG, Hedges RE, Sykes BC. Ancient DNA suggests a recent expansion of European cattle from a diverse wild progenitor species. Proc Biol Sci. 1996; 263(1376):1467-73. Bandelt HJ, Kloss-Brandstätter A, Richards MB, Yao YG, Logan I. The case for the continuing use of the revised Cambridge Reference Sequence (rCRS) and the standardization of notation in human mitochondrial DNA studies. J Hum Genet. 2014; 59(2):66-77. Barbujani G, Colonna V. Human genome diversity: frequently asked questions. Trends Genet. 2010; 26(7):285-95.

Barrell BG, Bankier AT, Drouin J. A different genetic code in human mitochondria. Nature. 1979; 282(5735):189-94. Beebe NW, Ambrose L, Hill LA, Davis JB, Hapgood G, Cooper RD, Russell RC, Ritchie SA, Reimer LJ, Lobo NF, Syafruddin D, van den Hurk AF. Tracing the tiger: population genetics provides valuable insights into the Aedes (Stegomyia) albopictus invasion of the Australasian Region. PLoS Negl Trop Dis. 2013; 7(8):e2361. Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L, Metspalu E, Scozzari R, Makkan H, Tzur S, Comas D, Bertranpetit J, Quintana-Murci L, Tyler-Smith C, Wells RS, Rosset S; Genographic Consortium. The dawn of human matrilineal diversity. Am J Hum Genet. 2008; 82(5):1130-40. Behar DM, van Oven M, Rosset S, Metspalu M, Loogväli EL, Silva NM, Kivisild T, Torroni A, Villems R. A "Copernican" reassessment of the human mitochondrial DNA tree from its root. Am J Hum Genet. 2012a; 90(4):675-84. Behar DM, Harmant C, Manry J, van Oven M, Haak W, Martinez-Cruz B, Salaberria J, Oyharçabal B, Bauduer F, Comas D, Quintana-Murci L; Genographic Consortium. The Basque paradigm: genetic evidence of a maternal continuity in the Franco- Cantabrian region since pre-Neolithic times. Am J Hum Genet. 2012b; 90(3):486- 93. Beja-Pereira A, Caramelli D, Lalueza-Fox C, Vernesi C, Ferrand N, Casoli A, Goyache F, Royo LJ, Conti S, Lari M, Martini A, Ouragh L, Magid A, Atash A, Zsolnai A, Boscato P, Triantaphylidis C, Ploumi K, Sineo L, Mallegni F, Taberlet P, Erhardt G, Sampietro L, Bertranpetit J, Barbujani G, Luikart G, Bertorelle G. The origin of European cattle: evidence from modern and ancient DNA. Proc Natl Acad Sci U S A. 2006; 103(21):8113-8. Bellini R, Calvitti M, Medici A, Carrieri M, Celli G, Maini S. “Use of the sterile insect technique against Aedes albopictus in Italy: first results of a pilot trial” in Area- Wide Control of Insect Pests, From Research to Field Implementation, eds. Vreysen MJB, Robinson AS, Hendrichs J. (Springer Netherlands). 
2007; pp 505-515. Bendall KE, Sykes BC. Length heteroplasmy in the first hypervariable segment of the human mtDNA control region. Am J Hum Genet. 1995; 57(2):248-56. Benedict MQ, Levine RS, Hawley WA, Lounibos LP. Spread of the tiger: global risk of invasion by the mosquito Aedes albopictus. Vector Borne Zoonotic Dis. 2007; 7(1):76-85. Berger JF, Guilaine J. The 8200 cal BP abrupt environmental change and the Neolithic transition: a Mediterranean perspective. Quatern Int. 2009; 200: 31–49. Birky CW. Relaxed and stringent genomes: why cytoplasmic genes don't obey Mendel's Laws. J Hered. 1994; 85(5):355-65. Birky CW. The inheritance of genes in mitochondria and chloroplasts: laws, mechanisms, and models. Annu Rev Genet. 2001; 35:125-48.

Birungi J, Munstermann LE. Genetic structure of Aedes albopictus (Diptera: Culicidae) populations based on mitochondrial ND5 sequences: evidence for an independent invasion into Brazil and United States. Ann Entomol Soc Am. 2002; 95(1):125-32. Boattini A, Castrì L, Sarno S, Useli A, Cioffi M, Sazzini M, Garagnani P, De Fanti S, Pettener D, Luiselli D. mtDNA variation in East Africa unravels the history of Afro-Asiatic groups. Am J Phys Anthropol. 2013; 150(3):375-85. Bodner M, Perego UA, Huber G, Fendt L, Röck AW, Zimmermann B, Olivieri A, Gómez- Carballa A, Lancioni H, Angerhofer N, Bobillo MC, Corach D, Woodward SR, Salas A, Achilli A, Torroni A, Bandelt HJ, Parson W. Rapid coastal spread of first Americans: novel insights from South America's Southern Cone mitochondrial genomes. Genome Res. 2012; 22(5):811-20. Bogenhagen D, Clayton DA. Mouse L cell mitochondrial DNA molecules are selected randomly for replication throughout the cell cycle. Cell. 1977; 11(4):719-27. Boivin N, Fuller DQ. Shell middens, ships and seeds: exploring coastal subsistence, maritime trade and the dispersal of domesticates in and around the ancient Arabian peninsula. J World Prehist. 2009; 22:113–80. Bollongino R, Burger J, Powell A, Mashkour M, Vigne JD, Thomas MG. Modern taurine cattle descended from small number of near-eastern founders. Mol Biol Evol. 2012; 29(9):2101-4. Bonfiglio S, Achilli A, Olivieri A, Negrini R, Colli L, Liotta L, Ajmone-Marsan P, Torroni A, Ferretti L. The enigmatic origin of bovine mtDNA haplogroup R: sporadic interbreeding or an independent event of Bos primigenius domestication in Italy? PLoS One. 2010; 5(12):e15760. Bonfiglio S, Ginja C, De Gaetano A, Achilli A, Olivieri A, Colli L, Tesfaye K, Agha SH, Gama LT, Cattonaro F, Penedo MC, Ajmone-Marsan P, Torroni A, Ferretti L. Origin and spread of Bos taurus: new clues from mitochondrial genomes belonging to haplogroup T1. PLoS One. 2012; 7(6):e38601. Bonizzoni M, Gasperi G, Chen X, James AA. 
The invasive mosquito species Aedes albopictus: current knowledge and future perspectives. Trends Parasitol. 2013; 29(9):460-8. Bowmaker M, Yang MY, Yasukawa T, Reyes A, Jacobs HT, Huberman JA, Holt IJ. Mammalian mitochondrial DNA replicates bidirectionally from an initiation zone. J Biol Chem. 2003; 278(51):50961-9. Brown WM, George M, Wilson AC. Rapid evolution of animal mitochondrial DNA. Proc Natl Acad Sci U S A. 1979; 76(4):1967-71. Brandstätter A, Peterson CT, Irwin JA, Mpoke S, Koech DK, Parson W, Parsons TJ. Mitochondrial DNA control region sequences from Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a forensic database. Int J Legal Med. 2004; 118(5):294-306.

Calloway CD, Reynolds RL, Herrin GL Jr, Anderson WW. The frequency of heteroplasmy in the HVII region of mtDNA differs across tissue types and increases with age. Am J Hum Genet. 2000; 66:1384–97 Cann RL, Stoneking M, Wilson AC. Mitochondrial DNA and human evolution. Nature. 1987; 325(6099):31-6. Cao L, Shitara H, Horii T, Nagao Y, Imai H, Abe K, Hara T, Hayashi J, Yonekawa H. The mitochondrial bottleneck occurs without reduction of mtDNA content in female mouse germ cells. Nat Genet. 2007; 39(3):386-90. Cardoso S, Alfonso-Sánchez MA, Valverde L, Sánchez D, Zarrabeitia MT, Odriozola A, Martínez-Jarreta B, de Pancorbo MM. Genetic uniqueness of the Waorani tribe from the Ecuadorian Amazon. Heredity (Edinb). 2012; 108(6):609-15. Carling PJ, Cree LM, Chinnery PF. The implications of mitochondrial DNA copy number regulation during embryogenesis. Mitochondrion. 2011; 11(5):686-92. Cavalli-Sforza LL, Menozzi P, Piazza, A. The history and geography of human genes. Princeton Univ. Press. Princeton, New Jersey, USA. 1994. Cerný V, Pereira L, Kujanová M, Vasíková A, Hájek M, Morris M, Mulligan CJ. Out of Arabia-the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol. 2009; 138(4):439-47. Cerný V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, Pereira L. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 2011; 28(1):71-8. Chandrasekar A, Kumar S, Sreenath J, Sarkar BN, Urade BP, Mallick S, Bandopadhyay SS, Barua P, Barik SS, Basu D, Kiran U, Gangopadhyay P, Sahani R, Prasad BV, Gangopadhyay S, Lakshmi GR, Ravuri RR, Padmaja K, Venugopal PN, Sharma MB, Rao VR. Updating phylogeny of mitochondrial DNA macrohaplogroup M in India: dispersal of modern human in South Asian corridor. PLoS One. 2009; 4(10):e7447. 
Chatters JC, Kennett DJ, Asmerom Y, Kemp BM, Polyak V, Blank AN, Beddows PA, Reinhardt E, Arroyo-Cabrales J, Bolnick DA, Malhi RS, Culleton BJ, Erreguerena PL, Rissolo D, Morell-Hart S, Stafford TW Jr. Late Pleistocene human skeleton and mtDNA link Paleoamericans and modern Native Americans. Science. 2014; 344(6185):750-4. Chaubey G, Karmin M, Metspalu E, Metspalu M, Selvi-Rani D, Singh VK, Parik J, Solnik A, Naidu BP, Kumar A, Adarsh N, Mallick CB, Trivedi B, Prakash S, Reddy R, Shukla P, Bhagat S, Verma S, Vasnik S, Khan I, Barwa A, Sahoo D, Sharma A, Rashid M, Chandra V, Reddy AG, Torroni A, Foley RA, Thangaraj K, Singh L, Kivisild T, Villems R. Phylogeography of mtDNA haplogroup R7 in the Indian peninsula. BMC Evol Biol. 2008; 8:227. Chen XG, Jiang X, Gu J, Xu M, Wu Y, Deng Y, Zhang C, Bonizzoni M, Dermauw W, Vontas J, Armbruster P, Huang X, Yang Y, Zhang H, He W, Peng H, Liu Y, Wu K, Chen J, Lirakis M, Topalis P, Van Leeuwen T, Hall AB, Jiang X, Thorpe C, Mueller RL, Sun C, Waterhouse RM, Yan G, Tu ZJ, Fang X, James AA. Genome sequence of the Asian Tiger mosquito, Aedes albopictus, reveals insights into its biology, genetics, and evolution. Proc Natl Acad Sci U S A. 2015; 112(44):E5907-15. Chinnery PF, Samuels DC. Relaxed replication of mtDNA: A model with implications for the expression of disease. Am J Hum Genet. 1999; 64(4):1158-65. Chinnery PF, Schon EA. Mitochondria. J Neurol Neurosurg Psychiatry. 2003; 74(9):1188-99. Chinnery PF, Hudson G. Mitochondrial genetics. Br Med Bull. 2013; 106:135-59. Chouin-Carneiro T, Vega-Rua A, Vazeille M, Yebakima A, Girod R, Goindin D, Dupont-Rouzeyrol M, Lourenço-de-Oliveira R, Failloux AB. Differential susceptibilities of Aedes aegypti and Aedes albopictus from the Americas to Zika Virus. PLoS Negl Trop Dis. 2016; 10(3):e0004543. Christian BE, Spremulli LL. Evidence for an active role of IF3mt in the initiation of translation in mammalian mitochondria. Biochemistry. 2009; 48(15):3269-78. Clayton DA, Doda JN, Friedberg EC. The absence of a pyrimidine dimer repair mechanism in mammalian mitochondria. Proc Natl Acad Sci U S A. 1974; 71(7):2777-81. Clayton DA. Replication of animal mitochondrial DNA. Cell. 1982; 28(4):693-705. Clark PU, Dyke AS, Shakun JD, Carlson AE, Clark J, Wohlfarth B, Mitrovica JX, Hostetler SW, McCabe AM. The Last Glacial Maximum. Science. 2009; 325(5941):710-4. Costa MD, Pereira JB, Pala M, Fernandes V, Olivieri A, Achilli A, Perego UA, Rychkov S, Naumova O, Hatina J, Woodward SR, Eng KK, Macaulay V, Carr M, Soares P, Pereira L, Richards MB. A substantial prehistoric European ancestry amongst Ashkenazi maternal lineages. Nat Commun. 2013; 4:2543. Cree LM, Samuels DC, de Sousa Lopes SC, Rajasimha HK, Wonnapinij P, Mann JR, Dahl HH, Chinnery PF. A reduction of mitochondrial DNA molecules during embryogenesis explains the rapid segregation of genotypes. Nat Genet. 2008; 40(2):249-54. deMenocal PB, Stringer C. Human migration: Climate and the peopling of the world.
Nature. 2016; 538(7623):49-50. de Saint Pierre M, Gandini F, Perego UA, Bodner M, Gómez-Carballa A, Corach D, Angerhofer N, Woodward SR, Semino O, Salas A, Parson W, Moraga M, Achilli A, Torroni A, Olivieri A. Arrival of Paleo-Indians to the Southern Cone of South America: new clues from mitogenomes. PLoS One. 2012; 7: e51311. Delatte H, Bagny L, Brengue C, Bouetard A, Paupy C, Fontenille D. The invaders: phylogeography of dengue and chikungunya viruses Aedes vectors, on the South West islands of the Indian Ocean. Infect Genet Evol. 2011; 11(7):1769-81. Derenko M, Malyarchuk B, Grzybowski T, Denisova G, Dambueva I, Perkova M, Dorzhu C, Luzina F, Lee HK, Vanecek T, Villems R, Zakharov I. Phylogeographic analysis of mitochondrial DNA in northern Asian populations. Am J Hum Genet. 2007; 81(5):1025-41. Derenko M, Malyarchuk B, Grzybowski T, Denisova G, Rogalla U, Perkova M, Dambueva I, Zakharov I. Origin and post-glacial dispersal of mitochondrial DNA haplogroups C and D in northern Asia. PLoS One. 2010; 5(12):e15214. Deschamps M, Laval G, Fagny M, Itan Y, Abel L, Casanova JL, Patin E, Quintana-Murci L. Genomic signatures of selective pressures and introgression from archaic hominins at human innate immunity genes. Am J Hum Genet. 2016; 98(1):5-21. Dillehay TD, Ramírez C, Pino M, Collins MB, Rossen J, Pino-Navarro JD. Monte Verde: seaweed, food, medicine, and the peopling of South America. Science. 2008; 320(5877):784-6. DiMauro S, Schon EA. Mitochondrial respiratory-chain diseases. N Engl J Med. 2003; 348(26):2656-68. Dritsou V, Topalis P, Windbichler N, Simoni A, Hall A, Lawson D, Hinsley M, Hughes D, Napolioni V, Crucianelli F, Deligianni E, Gasperi G, Gomulski LM, Savini G, Manni M, Scolari F, Malacrida AR, Arcà B, Ribeiro JM, Lombardo F, Saccone G, Salvemini M, Moretti R, Aprea G, Calvitti M, Picciolini M, Papathanos PA, Spaccapelo R, Favia G, Crisanti A, Louis C. A draft genome sequence of an invasive mosquito: an Italian Aedes albopictus. Pathog Glob Health. 2015; 109(5):207-20. Drummond AJ, Rambaut A, Shapiro B, Pybus OG. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol. 2005; 22(5):1185-92. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007; 7:214. Dueñas JC, Gardenal CN, Llinás GA, Panzetta-Dutari GM. Structural organization of the mitochondrial DNA control region in Aedes aegypti. Genome. 2006; 49(8):931-7. Duggan AT, Evans B, Friedlaender FR, Friedlaender JS, Koki G, Merriwether DA, Kayser M, Stoneking M. Maternal history of Oceania from complete mtDNA genomes: contrasting ancient diversity with recent homogenization due to the Austronesian expansion.
Am J Hum Genet. 2014; 94(5):721-33. Duggan AT, Stoneking M. Recent developments in the genetic history of East Asia and Oceania. Curr Opin Genet Dev. 2014; 29:9-14. Edwards CJ, Bollongino R, Scheu A, Chamberlain A, Tresset A, Vigne JD, Baird JF, Larson G, Ho SY, Heupink TH, Shapiro B, Freeman AR, Thomas MG, Arbogast RM, Arndt B, Bartosiewicz L, Benecke N, Budja M, Chaix L, Choyke AM, Coqueugniot E, Döhle HJ, Göldner H, Hartz S, Helmer D, Herzig B, Hongo H, Mashkour M, Ozdogan M, Pucher E, Roth G, Schade-Lindig S, Schmölcke U, Schulting RJ, Stephan E, Uerpmann HP, Vörös I, Voytek B, Bradley DG, Burger J. Mitochondrial DNA analysis shows a Near Eastern Neolithic origin for domestic cattle and no indication of domestication of European aurochs. Proc Biol Sci. 2007; 274(1616):1377-85.

Elson JL, Andrews RM, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N. Analysis of European mtDNAs for recombination. Am J Hum Genet. 2001; 68(1):145-53. Ennafaa H, Cabrera VM, Abu-Amero KK, González AM, Amor MB, Bouhaha R, Dzimiri N, Elgaaïed AB, Larruga JM. Mitochondrial DNA haplogroup H structure in North Africa. BMC Genet. 2009; 10:8. Ermini L, Olivieri C, Rizzi E, Corti G, Bonnal R, Soares P, Luciani S, Marota I, De Bellis G, Richards MB, Rollo F. Complete mitochondrial genome sequence of the Tyrolean Iceman. Curr Biol. 2008; 18(21):1687-93. Falkenberg M, Gaspari M, Rantanen A, Trifunovic A, Larsson NG, Gustafsson CM. Mitochondrial transcription factors B1 and B2 activate transcription of human mtDNA. Nat Genet. 2002; 31(3):289-94. Falkenberg M, Larsson NG, Gustafsson CM. DNA replication and transcription in mammalian mitochondria. Annu Rev Biochem. 2007; 76:679-99. Fagundes NJ, Kanitz R, Eckert R, Valls AC, Bogo MR, Salzano FM, Smith DG, Silva WA Jr, Zago MA, Ribeiro-dos-Santos AK, Santos SE, Petzl-Erler ML, Bonatto SL. Mitochondrial population genomics supports a single pre-Clovis origin with a coastal route for the peopling of the Americas. Am J Hum Genet. 2008; 82(3):583-92. Fattovich R. The development of ancient states in the northern Horn of Africa, c 3000 BC–AD 1000: An archaeological outline. J World Prehist. 2010; 23:145–75. Fehren-Schmitz L, Llamas B, Lindauer S, Tomasto-Cagigao E, Kuzminsky S, Rohland N, Santos FR, Kaulicke P, Valverde G, Richards SM, Nordenfelt S, Seidenberg V, Mallick S, Cooper A, Reich D, Haak W. A re-appraisal of the early Andean human remains from Lauricocha in Peru. PLoS One. 2015; 10(6):e0127141. Fernandes V, Alshamali F, Alves M, Costa MD, Pereira JB, Silva NM, Cherni L,Harich N, Cerny V, Soares P, Richards MB, Pereira L. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am J Hum Genet. 2012; 90:347–55. 
Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, Fajkošová Z, Cavadas B, Černý V, Soares P, Richards MB, Pereira L. Genetic stratigraphy of key demographic events in Arabia. PLoS One. 2015; 10(3):e0118625. Fix AG. Rapid deployment of the five founding Amerind mtDNA haplogroups via coastal and riverine colonization. Am J Phys Anthropol. 2005; 128(2):430-6. Forster LM, Tannhauser M, Barros HM. Drug use among street children in southern Brazil. Drug Alcohol Depend. 1996; 43(1-2):57-62. Forster P, Matsumura S. Evolution. Did early humans go north or south? Science. 2005; 308(5724):965-6. Francalacci P, Morelli L, Angius A, Berutti R, Reinier F, Atzeni R, Pilu R, Busonero F, Maschio A, Zara I, Sanna D, Useli A, Urru MF, Marcelli M, Cusano R, Oppo M, Zoledziewska M, Pitzalis M, Deidda F, Porcu E, Poddie F, Kang HM, Lyons R, Tarrier B, Gresham JB, Li B, Tofanelli S, Alonso S, Dei M, Lai S, Mulas A, Whalen MB, Uzzau S, Jones C, Schlessinger D, Abecasis GR, Sanna S, Sidore C, Cucca F. Low-pass DNA sequencing of 1200 Sardinians reconstructs European Y-chromosome phylogeny. Science. 2013; 341(6145):565-9. Francalacci P, Sanna D, Useli A, Berutti R, Barbato M, Whalen MB, Angius A, Sidore C, Alonso S, Tofanelli S, Cucca F. Detection of phylogenetically informative polymorphisms in the entire euchromatic portion of human Y chromosome from a Sardinian sample. BMC Res Notes. 2015; 8:174. Friedlaender J, Schurr T, Gentz F, Koki G, Friedlaender F, Horvat G, Babb P, Cerchio S, Kaestle F, Schanfield M, Deka R, Yanagihara R, Merriwether DA. Expanding Southwest Pacific mitochondrial haplogroups P and Q. Mol Biol Evol. 2005; 22(6):1506-17. Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwängler A, Haak W, Meyer M, Mittnik A, Nickel B, Peltzer A, Rohland N, Slon V, Talamo S, Lazaridis I, Lipson M, Mathieson I, Schiffels S, Skoglund P, Derevianko AP, Drozdov N, Slavinsky V, Tsybankov A, Cremonesi RG, Mallegni F, Gély B, Vacca E, Morales MR, Straus LG, Neugebauer-Maresch C, Teschler-Nicola M, Constantin S, Moldovan OT, Benazzi S, Peresani M, Coppola D, Lari M, Ricci S, Ronchitelli A, Valentin F, Thevenet C, Wehrberger K, Grigorescu D, Rougier H, Crevecoeur I, Flas D, Semal P, Mannino MA, Cupillard C, Bocherens H, Conard NJ, Harvati K, Moiseyev V, Drucker DG, Svoboda J, Richards MP, Caramelli D, Pinhasi R, Kelso J, Patterson N, Krause J, Pääbo S, Reich D. The genetic history of Ice Age Europe. Nature. 2016; 534(7606):200-5. Futami K, Valderrama A, Baldi M, Minakawa N, Marín Rodríguez R, Chaves LF. New and common haplotypes shape genetic diversity in Asian tiger mosquito populations from Costa Rica and Panamá. J Econ Entomol. 2015; 108(2):761-8. Gamble C, Davies W, Pettitt P, Richards M. Climate change and evolving human diversity in Europe during the last glacial. Philos Trans R Soc Lond B Biol Sci. 2004; 359(1442):243-53.
Gandini F, Achilli A, Pala M, Bodner M, Brandini S, Huber G, Egyed B, Ferretti L, Gómez-Carballa A, Salas A, Scozzari R, Cruciani F, Coppa A, Parson W, Semino O, Soares P, Torroni A, Richards MB, Olivieri A. Mapping human dispersals into the Horn of Africa from Arabian Ice Age refugia using mitogenomes. Sci Rep. 2016; 6:25472. Gasparre G, Porcelli AM, Bonora E, Pennisi LF, Toller M, Iommarini L, Ghelli A, Moretti M, Betts CM, Martinelli GN, Ceroni AR, Curcio F, Carelli V, Rugolo M, Tallini G, Romeo G. Disruptive mitochondrial DNA mutations in complex I subunits are markers of oncocytic phenotype in thyroid tumors. Proc Natl Acad Sci U S A. 2007; 104(21):9001-6. Gasparre G, Porcelli AM, Lenaz G, Romeo G. Relevance of mitochondrial genetics and metabolism in cancer development. Cold Spring Harb Perspect Biol. 2013; 5(2). pii: a011411.

Gasperi G, Bellini R, Malacrida AR, Crisanti A, Dottori M, Aksoy S. A new threat looming over the Mediterranean basin: emergence of viral diseases transmitted by Aedes albopictus mosquitoes. PLoS Negl Trop Dis. 2012; 6(9):e1836. Gilbert MT, Jenkins DL, Götherstrom A, Naveran N, Sanchez JJ, Hofreiter M, Thomsen PF, Binladen J, Higham TF, Yohe RM 2nd, Parr R, Cummings LS, Willerslev E. DNA from pre-Clovis human coprolites in Oregon, North America. Science. 2008; 320(5877):786-9. Giles RE, Blanc H, Cann HM, Wallace DC. Maternal inheritance of human mitochondrial DNA. Proc Natl Acad Sci U S A. 1980; 77(11):6715-19. Gómez-Carballa A, Catelli L, Pardo-Seco J, Martinón-Torres F, Roewer L, Vullo C, Salas A. The complete mitogenome of a 500-year-old Inca child mummy. Sci Rep. 2015; 5:16462. Gonder MK, Mortensen HM, Reed FA, de Sousa A, Tishkoff SA. Whole-mtDNA genome sequence analysis of ancient African lineages. Mol Biol Evol. 2007; 24(3):757-68. González AM, Larruga JM, Abu-Amero KK, Shi Y, Pestano J, Cabrera VM. Mitochondrial lineage M1 traces an early human backflow to Africa. BMC Genomics. 2007; 8:223. González-Andrade F, Sánchez D, González-Solórzano J, Gascón S, Martínez-Jarreta B. Sex-specific genetic admixture of Mestizos, Amerindian Kichwas, and Afro- Ecuadorans from Ecuador. Hum Biol. 2007; 79(1):51-77. Goubert C, Minard G, Vieira C, Boulesteix M. Population genetics of the Asian tiger mosquito Aedes albopictus, an invasive vector of human diseases. Heredity (Edinb). 2016; 117(3):125-34. 
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, Hansen NF, Durand EY, Malaspinas AS, Jensen JD, Marques-Bonet T, Alkan C, Prüfer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Höber B, Höffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, Novod N, Affourtit J, Egholm M, Verna C, Rudan P, Brajkovic D, Kucan Z, Gusic I, Doronichev VB, Golovanova LV, Lalueza-Fox C, de la Rasilla M, Fortea J, Rosas A, Schmitz RW, Johnson PL, Eichler EE, Falush D, Birney E, Mullikin JC, Slatkin M, Nielsen R, Kelso J, Lachmann M, Reich D, Pääbo S. A draft sequence of the Neandertal genome. Science. 2010; 328(5979):710-22. Greenberg JH, Christy GTI, Zegura SL. The settlement of the Americas: A comparison of the linguistic, dental, and genetic evidence. Curr Anthropol. 1986; 27(5):477– 97. Greenberg JH. Language in the Americas. Stanford, CA: Stanford University Press. 1987. Groucutt HS, Petraglia MD, Bailey G, Scerri EM, Parton A, Clark-Balzan L, Jennings RP, Lewis L, Blinkhorn J, Drake NA, Breeze PS, Inglis RH, Devès MH, Meredith- Williams M, Boivin N, Thomas MG, Scally A. Rethinking the dispersal of Homo sapiens out of Africa. Evol Anthropol. 2015; 24(4):149-64. Grün R, Stringer CB. Electron spin resonance dating and the evolution of modern humans. Archaeometry. 1991; 33:153-99.

Gubler DJ. Dengue/dengue haemorrhagic fever: history and current status. Novartis Found Symp. 2006; 277:3-16; discussion 16-22, 71-3, 251-3. Günther T, Valdiosera C, Malmström H, Ureña I, Rodriguez-Varela R, Sverrisdóttir ÓO, Daskalaki EA, Skoglund P, Naidoo T, Svensson EM, Bermúdez de Castro JM, Carbonell E, Dunn M, Storå J, Iriarte E, Arsuaga JL, Carretero JM, Götherström A, Jakobsson M. Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. Proc Natl Acad Sci U S A. 2015; 112(38):11917-22. Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, Brandt G, Nordenfelt S, Harney E, Stewardson K, Fu Q, Mittnik A, Bánffy E, Economou C, Francken M, Friederich S, Pena RG, Hallgren F, Khartanovich V, Khokhlov A, Kunst M, Kuznetsov P, Meller H, Mochalov O, Moiseyev V, Nicklisch N, Pichler SL, Risch R, Rojo Guerra MA, Roth C, Szécsényi-Nagy A, Wahl J, Meyer M, Krause J, Brown D, Anthony D, Cooper A, Alt KW, Reich D. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature. 2015; 522(7555):207-11. Hagelberg E, Goldman N, Lio P, Whelan S, Schiefenhovel W, Clegg JB, Bowden DK. Evidence for mitochondrial DNA recombination in a human population of island Melanesia. Proc Biol Sci. 1999; 266(1418):485-92. Hebert PD, Cywinska A, Ball SL, deWaard JR. Biological identifications through DNA barcodes. Proc Biol Sci. 2003; 270(1512):313-21. Helmer D, Gourichon L, Monchot H, Peters J, Sana Segui M. The first steps of animal domestication: new archaeozoological approaches. In: Vigne JD, Helmer D, Peters J, editors. Oxford: Oxbow Books. 2005. Hervella M, Svensson EM, Alberdi A, Günther T, Izagirre N, Munters AR, Alonso S, Ioana M, Ridiche F, Soficaru A, Jakobsson M, Netea MG, de-la-Rua C. The mitogenome of a 35,000-year-old Homo sapiens from Europe supports a Palaeolithic back- migration to Africa. Sci Rep. 2016; 6:25501. Higa Y, Toma T, Tsuda Y, Miyagi I. 
A multiplex PCR-based molecular identification of five morphologically related, medically important subgenus Stegomyia mosquitoes from the genus Aedes (Diptera: Culicidae) found in the Ryukyu Archipelago, Japan. Jpn J Infect Dis. 2010; 63(5):312-6. Hill C, Soares P, Mormina M, Macaulay V, Clarke D, Blumbach PB, Vizuete-Forster M, Forster P, Bulbeck D, Oppenheimer S, Richards M. A mitochondrial stratigraphy for island southeast Asia. Am J Hum Genet. 2007; 80(1):29-43. Ho SY. The molecular clock and estimating species divergence. Nature Education 2008; 1(1):168. Ho SY, Duchêne S. Molecular-clock methods for estimating evolutionary rates and timescales. Mol Ecol. 2014; 23(24):5947-65. Hodgson JA, Mulligan CJ, Al-Meeri A, Raaum RL. Early back-to-Africa migration into the Horn of Africa. PLoS Genet. 2014; 10(6):e1004393.

Hoffecker JF, Elias SA, O'Rourke DH. Anthropology. Out of Beringia? Science. 2014; 343(6174):979-80. Hoffecker JF, Elias SA, O'Rourke DH, Scott GR, Bigelow NH. Beringia and the global dispersal of modern humans. Evol Anthropol. 2016; 25(2):64-78. Hofmanová Z, Kreutzer S, Hellenthal G, Sell C, Diekmann Y, Díez-Del-Molino D, van Dorp L, López S, Kousathanas A, Link V, Kirsanow K, Cassidy LM, Martiniano R, Strobel M, Scheu A, Kotsakis K, Halstead P, Triantaphyllou S, Kyparissi- Apostolika N, Urem-Kotsou D, Ziota C, Adaktylou F, Gopalan S, Bobo DM, Winkelbach L, Blöcher J, Unterländer M, Leuenberger C, Çilingiroğlu Ç, Horejs B, Gerritsen F, Shennan SJ, Bradley DG, Currat M, Veeramah KR, Wegmann D, Thomas MG, Papageorgopoulou C, Burger J. Early farmers from across Europe directly descended from Neolithic Aegeans. Proc Natl Acad Sci U S A. 2016; 113(25):6886-91. Homburger JR, Moreno-Estrada A, Gignoux CR, Nelson D, Sanchez E, Ortiz-Tello P, Pons-Estel BA, Acevedo-Vasquez E, Miranda P, Langefeld CD, Gravel S, Alarcón- Riquelme ME, Bustamante CD. Genomic insights into the ancestry and demographic history of South America. PLoS Genet. 2015; 11(12):e1005602. Hooshiar Kashani B, Perego UA, Olivieri A, Angerhofer N, Gandini F, Carossa V, Lancioni H, Semino O, Woodward SR, Achilli A, Torroni A. Mitochondrial haplogroup C4c: a rare lineage entering America through the ice-free corridor? Am J Phys Anthropol. 2012; 147(1):35-9. Howell N, Elson JL, Howell C, Turnbull DM. Relative rates of evolution in the coding and control regions of African mtDNAs. Mol Biol Evol. 2007; 24(10):2213-21. Hubbe M, Neves WA, Harvati K. Testing evolutionary and dispersion scenarios for the settlement of the New World. PLoS One. 2010; 5(6):e11105. Hudjashov G, Kivisild T, Underhill PA, Endicott P, Sanchez JJ, Lin AA, Shen P, Oefner P, Renfrew C, Villems R, Forster P. Revealing the prehistoric settlement of Australia by Y chromosome and mtDNA analysis. Proc Natl Acad Sci U S A. 2007; 104(21):8726-30. 
Iborra FJ, Kimura H, Cook PR. The functional organization of mitochondrial genomes in human cells. BMC Biol. 2004; 2:9. Ingman M, Kaessmann H, Paabo S, Gyllensten U. Mitochondrial genome variation and the origin of modern humans. Nature. 2000; 408(6813):708-13. Irwin JA, Saunier JL, Niederstätter H, Strouss KM, Sturk KA, Diegoli TM, Brandstätter A, Parson W, Parsons TJ. Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples. J Mol Evol. 2009; 68(5):516-27. Jazin EE, Cavelier L, Eriksson I, Oreland L, Gyllensten U. Human brain contains high levels of heteroplasmy in the noncoding regions of mitochondrial DNA. Proc Natl Acad Sci U S A. 1996; 93:12382–7.

Jenuth JP, Peterson AC, Fu K, Shoubridge EA. Random genetic drift in the female germline explains the rapid segregation of mammalian mitochondrial DNA. Nat Genet. 1996; 14(2):146-51. Jobling MA, Hurles M, Tyler-Smith C. Human Evolutionary Genetics: origins, peoples and disease. London/New York: Garland Science Publishing. 2004; pp 523. Jukes TH. Amino acid codes in mitochondria as possible clues to primitive codes. J Mol Evol. 1981; 18(1):15-7. Just RS, Diegoli TM, Saunier JL, Irwin JA, Parsons TJ. Complete mitochondrial genome sequences for 265 African American and U.S. "Hispanic" individuals. Forensic Sci Int Genet. 2008; 2(3):e45-8. Just RS, Scheible MK, Fast SA, Sturk-Andreaggi K, Röck AW, Bush JM, Higginbotham JL, Peck MA, Ring JD, Huber GE, Xavier C, Strobl C, Lyons EA, Diegoli TM, Bodner M, Fendt L, Kralj P, Nagl S, Niederwieser D, Zimmermann B, Parson W, Irwin JA. Full mtGenome reference data: development and characterization of 588 forensic-quality haplotypes representing three U.S. populations. Forensic Sci Int Genet. 2015; 14:141-55. Kambhampati S, Black WC 4th, Rai KS. Geographic origin of the US and Brazilian Aedes albopictus inferred from allozyme analysis. Heredity (Edinb). 1991; 67(Pt 1):85-93. Kamgang B, Brengues C, Fontenille D, Njiokou F, Simard F, Paupy C. Genetic structure of the tiger mosquito, Aedes albopictus, in Cameroon (Central Africa). PLoS One. 2011; 6(5):e20257. Kamgang B, Ngoagouni C, Manirakiza A, Nakouné E, Paupy C, Kazanji M. Temporal patterns of abundance of Aedes aegypti and Aedes albopictus (Diptera: Culicidae) and mitochondrial DNA analysis of Ae. albopictus in the Central African Republic. PLoS Negl Trop Dis. 2013;7(12):e2590. Kang L, Zheng HX, Chen F, Yan S, Liu K, Qin Z, Liu L, Zhao Z, Li L, Wang X, He Y, Jin L. mtDNA lineage expansions in Sherpa population suggest adaptive evolution in Tibetan highlands. Mol Biol Evol. 2013; 30(12):2579-87. 
Keller A, Graefen A, Ball M, Matzas M, Boisguerin V, Maixner F, Leidinger P, Backes C, Khairat R, Forster M, Stade B, Franke A, Mayer J, Spangler J, McLaughlin S, Shah M, Lee C, Harkins TT, Sartori A, Moreno-Estrada A, Henn B, Sikora M, Semino O, Chiaroni J, Rootsi S, Myres NM, Cabrera VM, Underhill PA, Bustamante CD, Vigl EE, Samadelli M, Cipollini G, Haas J, Katus H, O'Connor BD, Carlson MR, Meder B, Blin N, Meese E, Pusch CM, Zink A. New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing. Nat Commun. 2012; 3:698. Kemp BM, Malhi RS, McDonough J, Bolnick DA, Eshleman JA, Rickards O, Martinez- Labarga C, Johnson JR, Lorenz JG, Dixon EJ, Fifield TE, Heaton TH, Worl R, Smith DG. Genetic analysis of early Holocene skeletal remains from Alaska and its implications for the settlement of the Americas. Am J Phys Anthropol. 2007; 132(4):605-21.

Kemp BM, González-Oliver A, Malhi RS, Monroe C, Schroeder KB, McDonough J, Rhett G, Resendéz A, Peñaloza-Espinosa RI, Buentello-Malo L, Gorodesky C, Smith DG. Evaluating the farming/language dispersal hypothesis with genetic variation exhibited by populations in the Southwest and Mesoamerica. Proc Natl Acad Sci U S A. 2010; 107(15):6759-64. Kemp BM, Schurr TG. Ancient and modern genetic variation in the Americas. In: Auerbach BM (Ed.) Human variation in the Americas. Occasional paper No. 38. Carbondale, IL: Center for Archaeological Investigations, Southern Illinois University. 2010; pp. 12-50. Khormi HM, Kumar L. Climate change and the potential global distribution of Aedes aegypti: spatial modelling using GIS and CLIMEX. Geospat Health. 2014; 8(2):405-15. Khrapko K. Two ways to make an mtDNA bottleneck. Nat Genet. 2008; 40(2):134-5. Kitchen A, Ehret C, Assefa S, Mulligan CJ. Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East. Proc Biol Sci. 2009; 276(1668):2703–10. Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 2004; 75(5):752-70. Kivisild T, Shen P, Wall DP, Do B, Sung R, Davis K, Passarino G, Underhill PA, Scharfe C, Torroni A, Scozzari R, Modiano D, Coppa A, de Knijff P, Feldman M, Cavalli-Sforza LL, Oefner PJ. The role of selection in the evolution of human mitochondrial genomes. Genetics. 2006; 172(1):373-87. Knight RD, Freeland SJ, Landweber LF. Rewiring the keyboard: evolvability of the genetic code. Nat Rev Genet. 2001; 2(1):49-58. Koc EC, Spremulli LL. Identification of mammalian mitochondrial translational initiation factor 3 and examination of its role in initiation complex formation with natural mRNAs. J Biol Chem. 2002; 277(38):35541-9.
Kong QP, Bandelt HJ, Sun C, Yao YG, Salas A, Achilli A, Wang CY, Zhong L, Zhu CL, Wu SF, Torroni A, Zhang YP. Updating the East Asian mtDNA phylogeny: a prerequisite for the identification of pathogenic mutations. Hum Mol Genet. 2006; 15(13):2076-86. Kornberg A, Baker TA. DNA replication. New York: WH Freeman and Company. 1992. Kornblum C, Zsurka G, Wiesner RJ, Schröder R, Kunz WS. Concerted action of two novel tRNA mtDNA point mutations in chronic progressive external ophthalmoplegia. Biosci Rep. 2008; 28(2):89-96. Kraemer MU, Sinka ME, Duda KA, Mylne A, Shearer FM, Brady OJ, Messina JP, Barker CM, Moore CG, Carvalho RG, Coelho GE, Van Bortel W, Hendrickx G, Schaffner F, Wint GR, Elyazar IR, Teng HJ, Hay SI. The global compendium of Aedes aegypti and Ae. albopictus occurrence. Sci Data. 2015; 2:150035.

Kumar S, Bellis C, Zlojutro M, Melton PE, Blangero J, Curran JE. Large scale mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native American origins. BMC Evol Biol. 2011; 11:293. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, Berger B, Economou C, Bollongino R, Fu Q, Bos KI, Nordenfelt S, Li H, de Filippo C, Prüfer K, Sawyer S, Posth C, Haak W, Hallgren F, Fornander E, Rohland N, Delsate D, Francken M, Guinet JM, Wahl J, Ayodo G, Babiker HA, Bailliet G, Balanovska E, Balanovsky O, Barrantes R, Bedoya G, Ben-Ami H, Bene J, Berrada F, Bravi CM, Brisighelli F, Busby GB, Cali F, Churnosov M, Cole DE, Corach D, Damba L, van Driem G, Dryomov S, Dugoujon JM, Fedorova SA, Gallego Romero I, Gubina M, Hammer M, Henn BM, Hervig T, Hodoglugil U, Jha AR, Karachanak-Yankova S, Khusainova R, Khusnutdinova E, Kittles R, Kivisild T, Klitz W, Kučinskas V, Kushniarevich A, Laredj L, Litvinov S, Loukidis T, Mahley RW, Melegh B, Metspalu E, Molina J, Mountain J, Näkkäläjärvi K, Nesheva D, Nyambo T, Osipova L, Parik J, Platonov F, Posukh O, Romano V, Rothhammer F, Rudan I, Ruizbakiev R, Sahakyan H, Sajantila A, Salas A, Starikovskaya EB, Tarekegn A, Toncheva D, Turdikulova S, Uktveryte I, Utevska O, Vasquez R, Villena M, Voevoda M, Winkler CA, Yepiskoposyan L, Zalloua P, Zemunik T, Cooper A, Capelli C, Thomas MG, Ruiz- Linares A, Tishkoff SA, Singh L, Thangaraj K, Villems R, Comas D, Sukernik R, Metspalu M, Meyer M, Eichler EE, Burger J, Slatkin M, Pääbo S, Kelso J, Reich D, Krause J. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 2014; 513(7518):409-13. 
Lazaridis I, Nadel D, Rollefson G, Merrett DC, Rohland N, Mallick S, Fernandes D, Novak M, Gamarra B, Sirak K, Connell S, Stewardson K, Harney E, Fu Q, Gonzalez-Fortes G, Jones ER, Roodenberg SA, Lengyel G, Bocquentin F, Gasparian B, Monge JM, Gregg M, Eshed V, Mizrahi AS, Meiklejohn C, Gerritsen F, Bejenaru L, Blüher M, Campbell A, Cavalleri G, Comas D, Froguel P, Gilbert E, Kerr SM, Kovacs P, Krause J, McGettigan D, Merrigan M, Merriwether DA, O'Reilly S, Richards MB, Semino O, Shamoon-Pour M, Stefanescu G, Stumvoll M, Tönjes A, Torroni A, Wilson JF, Yengo L, Hovhannisyan NA, Patterson N, Lippold S, Xu H, Ko A, Li M, Renaud G, Butthof A, Schröder R, Stoneking M. Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences. Investig Genet. 2014; 5:13. Lehtinen SK, Hance N, El Meziane A, Juhola MK, Juhola KM, Karhu R, Spelbrink JN, Holt IJ, Jacobs HT. Genotypic stability, segregation and selection in heteroplasmic human cell lines containing np 3243 mutant mtDNA. Genetics. 2000; 154(1):363-80.

Lenstra JA, Ajmone-Marsan P, Beja-Pereira A, Bollongino R, Bradley DG, Colli L, De Gaetano A, Edwards CJ, Felius M, Ferretti L, Ginja C, Hristov P, Kantanen J, Liròn JP, Magee DA, Negrini R, Radoslavov GA. Meta-analysis of mitochondrial DNA reveals several population bottlenecks during worldwide migrations of cattle. Diversity. 2014; 6(1):178–87.

Li M, Schönberg A, Schaefer M, Schroeder R, Nasidze I, Stoneking M. Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am J Hum Genet. 2010; 87(2):237-49. Li S, Besenbacher S, Li Y, Kristiansen K, Grarup N, Albrechtsen A, Sparsø T, Korneliussen T, Hansen T, Wang J, Nielsen R, Pedersen O, Bolund L, Schierup MH. Variation and association to diabetes in 2000 full mtDNA sequences mined from an exome study in a Danish population. Eur J Hum Genet. 2014; 22(8):1040-5. Lightowlers RN, Chrzanowska-Lightowlers ZM. Exploring our origins—the importance of OriL in mtDNA maintenance and replication. EMBO Rep. 2012; 13(12):1038-9. Lippold S, Xu H, Ko A, Li M, Renaud G, Butthof A, Schröder R, Stoneking M. Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences. Investig Genet. 2014; 5:13. Liu H, Prugnolle F, Manica A, Balloux F. A geographically explicit genetic model of worldwide human-settlement history. Am J Hum Genet. 2006; 79(2):230-7. Liu M, Spremulli L. Interaction of mammalian mitochondrial ribosomes with the inner membrane. J Biol Chem. 2000; 275(38):29400-6. Llamas B, Fehren-Schmitz L, Valverde G, Soubrier J, Mallick S, Rohland N, Nordenfelt S, Valdiosera C, Richards SM, Rohrlach A, Romero MI, Espinoza IF, Cagigao ET, Jiménez LW, Makowski K, Reyna IS, Lory JM, Torrez JA, Rivera MA, Burger RL, Ceruti MC, Reinhard J, Wells RS, Politis G, Santoro CM, Standen VG, Smith C, Reich D, Ho SY, Cooper A, Haak W. Ancient mitochondrial DNA provides high-resolution time scale of the peopling of the Americas. Sci Adv. 2016; 2(4):e1501385. Luo SM, Ge ZJ, Wang ZW, Jiang ZZ, Wang ZB, Ouyang YC, Hou Y, Schatten H, Sun QY. Unique insights into maternal mitochondrial inheritance in mice. Proc Natl Acad Sci U S A. 2013; 110(32):13038-43. Maca-Meyer N, González AM, Larruga JM, Flores C, Cabrera VM. Major genomic mitochondrial lineages delineate early human expansions. BMC Genet.
2001; 2:13. Macaulay V, Richards M, Sykes B. Mitochondrial DNA recombination-no need to panic. Proc Biol Sci. 1999a; 266(1433):2037-42. Macaulay V, Richards M, Hickey E, Vega E, Cruciani F, Guida V, Scozzari R, Bonné-Tamir B, Sykes B, Torroni A. The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am J Hum Genet. 1999b; 64(1):232-49. Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, Taha A, Shaari NK, Raja JM, Ismail P, Zainuddin Z, Goodwin W, Bulbeck D, Bandelt HJ, Oppenheimer S, Torroni A, Richards M. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science. 2005; 308(5724):1034-6.

Malaspinas AS, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergström A, Athanasiadis G, Cheng JY, Crawford JE, Heupink TH, Macholdt E, Peischl S, Rasmussen S, Schiffels S, Subramanian S, Wright JL, Albrechtsen A, Barbieri C, Dupanloup I, Eriksson A, Margaryan A, Moltke I, Pugach I, Korneliussen TS, Levkivskyi IP, Moreno-Mayar JV, Ni S, Racimo F, Sikora M, Xue Y, Aghakhanian FA, Brucato N, Brunak S, Campos PF, Clark W, Ellingvåg S, Fourmile G, Gerbault P, Injie D, Koki G, Leavesley M, Logan B, Lynch A, Matisoo-Smith EA, McAllister PJ, Mentzer AJ, Metspalu M, Migliano AB, Murgha L, Phipps ME, Pomat W, Reynolds D, Ricaut FX, Siba P, Thomas MG, Wales T, Wall CM, Oppenheimer SJ, Tyler-Smith C, Durbin R, Dortch J, Manica A, Schierup MH, Foley RA, Lahr MM, Bowern C, Wall JD, Mailund T, Stoneking M, Nielsen R, Sandhu MS, Excoffier L, Lambert DM, Willerslev E. A genomic history of Aboriginal Australia. Nature. 2016; 538(7624):207-14. Malyarchuk B, Grzybowski T, Derenko M, Perkova M, Vanecek T, Lazur J, Gomolcak P, Tsybovsky I. Mitochondrial DNA phylogeny in Eastern and Western Slavs. Mol Biol Evol. 2008; 25(8):1651-8. Malyarchuk B, Derenko M, Grzybowski T, Perkova M, Rogalla U, Vanecek T, Tsybovsky I. The peopling of Europe from the mitochondrial haplogroup U5 perspective. PLoS One. 2010a; 5(4):e10285. Malyarchuk B, Derenko M, Denisova G, Kravtsova O. Mitogenomic diversity in Tatars from the Volga-Ural region of Russia. Mol Biol Evol. 2010b; 27(10):2220-6.
Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, Skoglund P, Lazaridis I, Sankararaman S, Fu Q, Rohland N, Renaud G, Erlich Y, Willems T, Gallo C, Spence JP, Song YS, Poletti G, Balloux F, van Driem G, de Knijff P, Romero IG, Jha AR, Behar DM, Bravi CM, Capelli C, Hervig T, Moreno-Estrada A, Posukh OL, Balanovska E, Balanovsky O, Karachanak-Yankova S, Sahakyan H, Toncheva D, Yepiskoposyan L, Tyler-Smith C, Xue Y, Abdullah MS, Ruiz-Linares A, Beall CM, Di Rienzo A, Jeong C, Starikovskaya EB, Metspalu E, Parik J, Villems R, Henn BM, Hodoglugil U, Mahley R, Sajantila A, Stamatoyannopoulos G, Wee JT, Khusainova R, Khusnutdinova E, Litvinov S, Ayodo G, Comas D, Hammer MF, Kivisild T, Klitz W, Winkler CA, Labuda D, Bamshad M, Jorde LB, Tishkoff SA, Watkins WS, Metspalu M, Dryomov S, Sukernik R, Singh L, Thangaraj K, Pääbo S, Kelso J, Patterson N, Reich D. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature. 2016; 538(7624):201-6. Mannen H, Kohno M, Nagata Y, Tsuji S, Bradley DG, Yeo JS, Nyamsamba D, Zagdsuren Y, Yokohama M, Nomura K, Amano T. Independent mitochondrial origin and historical genetic differentiation in North Eastern Asian cattle. Mol Phylogenet Evol. 2004; 32(2):539-44. Manni M, Gomulski LM, Aketarawong N, Tait G, Scolari F, Somboon P, Guglielmino CR, Malacrida AR, Gasperi G. Molecular markers for analyses of intraspecific genetic diversity in the Asian Tiger mosquito, Aedes albopictus. Parasit Vectors. 2015; 8:188.

Manning K, Timpson A, Colledge S, Crema E, Edinborough K, Kerig T, Shennan S. The chronology of culture: a comparative assessment of European Neolithic dating approaches. Antiquity. 2014; 88:1065–80. Marchington DR, Macaulay V, Hartshorne GM, Barlow D, Poulton J. Evidence from human oocytes for a genetic bottleneck in an mtDNA disease. Am J Hum Genet. 1998; 63(3):769-75. Margulis L. The origin of plant and animal cells. Am Sci. 1971; 59(2):230-5. Martin W, Muller M. The hydrogen hypothesis for the first eukaryote. Nature. 1998; 392(6671):37-41. Medlock JM, Hansford KM, Schaffner F, Versteirt V, Hendrickx G, Zeller H, Van Bortel W. A review of the invasive mosquitoes in Europe: ecology, public health risks, and control options. Vector Borne Zoonotic Dis. 2012; 12(6):435-47. Medlock JM, Hansford KM, Versteirt V, Cull B, Kampen H, Fontenille D, Hendrickx G, Zeller H, Van Bortel W, Schaffner F. An entomological review of invasive mosquitoes in Europe. Bull Entomol Res. 2015;105(6):637-63. Mellars P. Why did modern human populations disperse from Africa ca. 60,000 years ago? A new model. Proc Natl Acad Sci U S A. 2006a; 103(25):9381-6. Mellars P. A new radiocarbon revolution and the dispersal of modern humans in Eurasia. Nature. 2006b; 439(7079):931-5. Mellars P. Going East: new genetic and archaeological perspectives on the modern human colonization of Eurasia. Science. 2006c; 313(5788):796-800. Mellars P. Archeology and the dispersal of modern humans in Europe: Deconstructing the “Aurignacian”. Evol Anthropol. 2006d; 15(5):167-82. Mellars P, Gori KC, Carr M, Soares PA, Richards MB. Genetic and archaeological perspectives on the initial modern human colonization of southern Asia. Proc Natl Acad Sci U S A. 2013; 110(26):10699-704. Mendez FL, Watkins JC, Hammer MF. A haplotype at STAT2 Introgressed from neanderthals and serves as a candidate of positive selection in Papua New Guinea. Am J Hum Genet. 2012; 91(2):265-74. 
Meyer M, Kircher M, Gansauge MT, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prüfer K, de Filippo C, Sudmant PH, Alkan C, Fu Q, Do R, Rohland N, Tandon A, Siebauer M, Green RE, Bryc K, Briggs AW, Stenzel U, Dabney J, Shendure J, Kitzman J, Hammer MF, Shunkov MV, Derevianko AP, Patterson N, Andrés AM, Eichler EE, Slatkin M, Reich D, Kelso J, Pääbo S. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012; 338(6104):222-6. Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, Hosseini S, Brandon M, Easley K, Chen E, Brown MD, Sukernik RI, Olckers A, Wallace DC. Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci U S A. 2003; 100(1):171-6.

Mokranjac D, Neupert W. Protein import into mitochondria. Biochem Soc Trans. 2005; 33(Pt 5):1019-1023. Mona S, Catalano G, Lari M, Larson G, Boscato P, Casoli A, Sineo L, Di Patti C, Pecchioli E, Caramelli D, Bertorelle G. Population dynamic of the extinct European aurochs: genetic evidence of a north-south differentiation pattern and no evidence of post-glacial expansion. BMC Evol Biol. 2010; 10:83. Mousson L, Dauga C, Garrigues T, Schaffner F, Vazeille M, Failloux AB. Phylogeography of Aedes (Stegomyia) aegypti (L.) and Aedes (Stegomyia) albopictus (Skuse) (Diptera: Culicidae) based on mitochondrial DNA variations. Genet Res. 2005; 86(1):1-11. Musilová E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, Gaaied AB, Al-Meeri A, Pereira L, Cerný V. Population history of the Red Sea--genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. Am J Phys Anthropol. 2011; 145(4):592-8. Nachman MW, Crowell SL. Estimate of the mutation rate per nucleotide in humans. Genetics. 2000; 156(1):297-304. Non AL, Al-Meeri A, Raaum RL, Sanchez LF, Mulligan CJ. Mitochondrial DNA reveals distinct evolutionary histories for Jewish populations in Yemen and Ethiopia. Am J Phys Anthropol. 2011; 144(1):1-10. O'Rourke DH, Raff JA. The human genetic history of the Americas: the final frontier. Curr Biol. 2010; 20(4):R202-207. Olivieri A, Achilli A, Pala M, Battaglia V, Fornarino S, Al-Zahery N, Scozzari R, Cruciani F, Behar DM, Dugoujon JM, Coudray C, Santachiara-Benerecetti AS, Semino O, Bandelt HJ, Torroni A. The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa. Science. 2006; 314(5806):1767-70. Olivieri A, Pala M, Gandini F, Hooshiar Kashani B, Perego UA, Woodward SR, Grugni V, Battaglia V, Semino O, Achilli A, Richards MB, Torroni A. Mitogenomes from two uncommon haplogroups mark late glacial/postglacial expansions from the Near East and neolithic dispersals within Europe. PLoS One. 
2013; 8(7):e70492. Omrak A, Günther T, Valdiosera C, Svensson EM, Malmström H, Kiesewetter H, Aylward W, Storå J, Jakobsson M, Götherström A. Genomic evidence establishes Anatolia as the source of the European Neolithic gene pool. Curr Biol. 2016; 26(2):270-5. Pagani L, Kivisild T, Tarekegn A, Ekong R, Plaster C, Gallego Romero I, Ayub Q, Mehdi SQ, Thomas MG, Luiselli D, Bekele E, Bradman N, Balding DJ, Tyler-Smith C. Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool. Am J Hum Genet. 2012; 91(1):83-96. Pagani L, Lawson DJ, Jagoda E, Mörseburg A, Eriksson A, Mitt M, Clemente F, Hudjashov G, DeGiorgio M, Saag L, Wall JD, Cardona A, Mägi R, Sayres MA, Kaewert S, Inchley C, Scheib CL, Järve M, Karmin M, Jacobs GS, Antao T, Iliescu FM, Kushniarevich A, Ayub Q, Tyler-Smith C, Xue Y, Yunusbayev B, Tambets K, Mallick CB, Saag L, Pocheshkhova E, Andriadze G, Muller C, Westaway MC, Lambert DM, Zoraqi G, Turdikulova S, Dalimova D, Sabitov Z, Sultana GN,

Lachance J, Tishkoff S, Momynaliev K, Isakova J, Damba LD, Gubina M, Nymadawa P, Evseeva I, Atramentova L, Utevska O, Ricaut FX, Brucato N, Sudoyo H, Letellier T, Cox MP, Barashkov NA, Škaro V, Mulahasanovic L, Primorac D, Sahakyan H, Mormina M, Eichstaedt CA, Lichman DV, Abdullah S, Chaubey G, Wee JT, Mihailov E, Karunas A, Litvinov S, Khusainova R, Ekomasova N, Akhmetova V, Khidiyatova I, Marjanović D, Yepiskoposyan L, Behar DM, Balanovska E, Metspalu A, Derenko M, Malyarchuk B, Voevoda M, Fedorova SA, Osipova LP, Lahr MM, Gerbault P, Leavesley M, Migliano AB, Petraglia M, Balanovsky O, Khusnutdinova EK, Metspalu E, Thomas MG, Manica A, Nielsen R, Villems R, Willerslev E, Kivisild T, Metspalu M. Genomic analyses inform on migration events during the peopling of Eurasia. Nature. 2016; 538(7624):238- 42. Pakendorf B, Stoneking M. Mitochondrial DNA and human evolution. Annu Rev Genomics Hum Genet. 2005; 6:165-83. Pala M, Achilli A, Olivieri A, Hooshiar Kashani B, Perego UA, Sanna D, Metspalu E, Tambets K, Tamm E, Accetturo M, Carossa V, Lancioni H, Panara F, Zimmermann B, Huber G, Al-Zahery N, Brisighelli F, Woodward SR, Francalacci P, Parson W, Salas A, Behar DM, Villems R, Semino O, Bandelt HJ, Torroni A. Mitochondrial haplogroup U5b3: a distant echo of the Epipaleolithic in Italy and the legacy of the early Sardinians. Am J Hum Genet. 2009; 84(6):814-21. Pala M, Olivieri A, Achilli A, Accetturo M, Metspalu E, Reidla M, Tamm E, Karmin M, Reisberg T, Hooshiar Kashani B, Perego UA, Carossa V, Gandini F, Pereira JB, Soares P, Angerhofer N, Rychkov S, Al-Zahery N, Carelli V, Sanati MH, Houshmand M, Hatina J, Macaulay V, Pereira L, Woodward SR, Davies W, Gamble C, Baird D, Semino O, Villems R, Torroni A, Richards MB. Mitochondrial DNA signals of late glacial recolonization of Europe from Near Eastern refugia. Am J Hum Genet. 2012; 90(5):915-24. Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, Khan F, Wang CY, Chaudhuri TK, Palla V, Zhang YP. 
Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. Am J Hum Genet. 2004; 75(6):966-78. Palanichamy MG, Zhang CL, Mitra B, Malyarchuk B, Derenko M, Chaudhuri TK, Zhang YP. Mitochondrial haplogroup N1a phylogeography, with implication to the origin of European farmers. BMC Evol Biol. 2010; 10:304. Paupy C, Delatte H, Bagny L, Corbel V, Fontenille D. Aedes albopictus, an arbovirus vector: from the darkness to the light. Microbes Infect. 2009; 11(14-15):1177-85. Pellecchia M, Negrini R, Colli L, Patrini M, Milanesi E, Achilli A, Bertorelle G, Cavalli-Sforza LL, Piazza A, Torroni A, Ajmone-Marsan P. The mystery of Etruscan origins: novel clues from Bos taurus mitochondrial DNA. Proc Biol Sci. 2007; 274(1614):1175-9. Peng HJ, Lai HB, Zhang QL, Xu BY, Zhang H, Liu WH, Zhao W, Zhou YP, Zhong XG, Jiang S, Duan JH, Yan GY, He JF, Chen XG. A local outbreak of dengue caused by an imported case in Dongguan China. BMC Public Health. 2012; 12:83.

Pennarun E, Kivisild T, Metspalu E, Metspalu M, Reisberg T, Moisan JP, Behar DM, Jones SC, Villems R. Divorcing the Late Upper Palaeolithic demographic histories of mtDNA haplogroups M1 and U6 in Africa. BMC Evol Biol. 2012; 12:234. Perego UA, Achilli A, Angerhofer N, Accetturo M, Pala M, Olivieri A, Hooshiar Kashani B, Ritchie KH, Scozzari R, Kong QP, Myres NM, Salas A, Semino O, Bandelt HJ, Woodward SR, Torroni A. Distinctive Paleo-Indian migration routes from Beringia marked by two rare mtDNA haplogroups. Curr Biol. 2009; 19(1):1-8. Perego UA, Angerhofer N, Pala M, Olivieri A, Lancioni H, Hooshiar Kashani B, Carossa V, Ekins JE, Gómez-Carballa A, Huber G, Zimmermann B, Corach D, Babudri N, Panara F, Myres NM, Parson W, Semino O, Salas A, Woodward SR, Achilli A, Torroni A. The initial peopling of the Americas: a growing number of founding mitochondrial genomes from Beringia. Genome Res. 2010; 20(9):1174-9. Pereira L, Richards M, Goios A, Alonso A, Albarrán C, Garcia O, Behar DM, Gölge M, Hatina J, Al-Gazali L, Bradley DG, Macaulay V, Amorim A. High-resolution mtDNA evidence for the late-glacial resettlement of Europe from an Iberian refugium. Genome Res. 2005; 15(1):19-24. Pereira L, Gonçalves J, Franco-Duarte R, Silva J, Rocha T, Arnold C, Richards M, Macaulay V. No evidence for an mtDNA role in sperm motility: data from complete sequencing of asthenozoospermic males. Mol Biol Evol. 2007; 24(3):868-74. Pereira JB, Costa MD, Pala M, Bamford L, Harich N, Cherni L, Alshamali F, Hatina J, Rychkov S, Stefanescu G, King T, Torroni A, Soares P, Pereira L, Richards MB. Reconciling evidence from ancient and contemporary genomes: a major source for the European Neolithic within Mediterranean Europe. 2016. [Submitted] Phillipson DW. Ancient Ethiopia: Aksum, its predecessors and successors. British Museum Press, 1998. Phillipson L. Lithic artefacts as a source of cultural, social and economic information: the evidence from Aksum, Ethiopia. Afr Archaeol Rev. 2009; 26:45–58.
Pickrell JK, Patterson N, Loh PR, Lipson M, Berger B, Stoneking M, Pakendorf B, Reich D. Ancient west Eurasian ancestry in southern and eastern Africa. Proc Natl Acad Sci U S A. 2014; 111(7):2632-7. Pilia G, Chen WM, Scuteri A, Orrú M, Albai G, Dei M, Lai S, Usala G, Lai M, Loi P, Mameli C, Vacca L, Deiana M, Olla N, Masala M, Cao A, Najjar SS, Terracciano A, Nedorezov T, Sharov A, Zonderman AB, Abecasis GR, Costa P, Lakatta E, Schlessinger D. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet. 2006; 2(8):e132. Pinhasi R, Thomas MG, Hofreiter M, Currat M, Burger J. The genetic history of Europeans. Trends Genet. 2012; 28(10):496-505. Poloni ES, Naciri Y, Bucho R, Niba R, Kervaire B, Excoffier L, Langaney A, Sanchez-Mazas A. Genetic evidence for complexity in ethnic differentiation and history in East Africa. Ann Hum Genet. 2009; 73(Pt 6):582-600.

Porretta D, Mastrantonio V, Bellini R, Somboon P, Urbanelli S. Glacial history of a modern invader: phylogeography and species distribution modelling of the Asian tiger mosquito Aedes albopictus. PLoS One. 2012; 7(9):e44515. Posth C, Renaud G, Mittnik A, Drucker DG, Rougier H, Cupillard C, Valentin F, Thevenet C, Furtwängler A, Wißing C, Francken M, Malina M, Bolus M, Lari M, Gigli E, Capecchi G, Crevecoeur I, Beauval C, Flas D, Germonpré M, van der Plicht J, Cottiaux R, Gély B, Ronchitelli A, Wehrberger K, Grigorescu D, Svoboda J, Semal P, Caramelli D, Bocherens H, Harvati K, Conard NJ, Haak W, Powell A, Krause J. Pleistocene mitochondrial genomes suggest a single major dispersal of non-Africans and a Late Glacial population turnover in Europe. Curr Biol. 2016; 26(6):827-33. Powell JR, Tabachnick WJ. History of domestication and spread of Aedes aegypti--a review. Mem Inst Oswaldo Cruz. 2013; 108 Suppl 1:11-7. Preston GW, Parker AG, Walkington H, Leng MJ, Hodson MJ. From nomadic herder-hunters to sedentary farmers: The relationship between climate change and ancient subsistence strategies in south-eastern Arabia. Journal of Arid Environments. 2012; 86:122–30. Prugnolle F, Manica A, Balloux F. Geography predicts neutral genetic diversity of human populations. Curr Biol. 2005; 15(5):R159-60. Pyle A, Hudson G, Wilson IJ, Coxhead J, Smertenko T, Herbert M, Santibanez-Koref M, Chinnery PF. Extreme-depth re-sequencing of mitochondrial DNA finds no evidence of paternal transmission in humans. PLoS Genet. 2015; 11(5):e1005040. Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K, Santachiara-Benerecetti AS. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 1999; 23(4):437-41. Raff J, Tackney J, O'Rourke DH. South from Alaska: a pilot aDNA study of genetic history on the Alaska Peninsula and the eastern Aleutians. Hum Biol. 2010; 82(5-6):677-93. Raff JA, Bolnick DA.
Palaeogenomics: genetic roots of the first Americans. Nature. 2014; 506(7487):162-3. Raghavan M, DeGiorgio M, Albrechtsen A, Moltke I, Skoglund P, Korneliussen TS, Grønnow B, Appelt M, Gulløv HC, Friesen TM, Fitzhugh W, Malmström H, Rasmussen S, Olsen J, Melchior L, Fuller BT, Fahrni SM, Stafford T Jr, Grimes V, Renouf MA, Cybulski J, Lynnerup N, Lahr MM, Britton K, Knecht R, Arneborg J, Metspalu M, Cornejo OE, Malaspinas AS, Wang Y, Rasmussen M, Raghavan V, Hansen TV, Khusnutdinova E, Pierre T, Dneprovsky K, Andreasen C, Lange H, Hayes MG, Coltrain J, Spitsyn VA, Götherström A, Orlando L, Kivisild T, Villems R, Crawford MH, Nielsen FC, Dissing J, Heinemeier J, Meldgaard M, Bustamante C, O'Rourke DH, Jakobsson M, Gilbert MT, Nielsen R, Willerslev E. The genetic prehistory of the New World Arctic. Science. 2014; 345(6200):1255832. Raghavan M, Steinrücken M, Harris K, Schiffels S, Rasmussen S, DeGiorgio M, Albrechtsen A, Valdiosera C, Ávila-Arcos MC, Malaspinas AS, Eriksson A, Moltke

I, Metspalu M, Homburger JR, Wall J, Cornejo OE, Moreno-Mayar JV, Korneliussen TS, Pierre T, Rasmussen M, Campos PF, Damgaard Pde B, Allentoft ME, Lindo J, Metspalu E, Rodríguez-Varela R, Mansilla J, Henrickson C, Seguin-Orlando A, Malmström H, Stafford T Jr, Shringarpure SS, Moreno-Estrada A, Karmin M, Tambets K, Bergström A, Xue Y, Warmuth V, Friend AD, Singarayer J, Valdes P, Balloux F, Leboreiro I, Vera JL, Rangel-Villalobos H, Pettener D, Luiselli D, Davis LG, Heyer E, Zollikofer CP, Ponce de León MS, Smith CI, Grimes V, Pike KA, Deal M, Fuller BT, Arriaza B, Standen V, Luz MF, Ricaut F, Guidon N, Osipova L, Voevoda MI, Posukh OL, Balanovsky O, Lavryashina M, Bogunov Y, Khusnutdinova E, Gubina M, Balanovska E, Fedorova S, Litvinov S, Malyarchuk B, Derenko M, Mosher MJ, Archer D, Cybulski J, Petzelt B, Mitchell J, Worl R, Norman PJ, Parham P, Kemp BM, Kivisild T, Tyler-Smith C, Sandhu MS, Crawford M, Villems R, Smith DG, Waters MR, Goebel T, Johnson JR, Malhi RS, Jakobsson M, Meltzer DJ, Manica A, Durbin R, Bustamante CD, Song YS, Nielsen R, Willerslev E. POPULATION GENETICS. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science. 2015; 349(6250):aab3884. Rai KS. Aedes albopictus in the Americas. Annu Rev Entomol. 1991; 36:459-84. Ramsey CB, Lee S. Recent and planned developments of the program OxCal. Radiocarbon. 2013; 55(2-3):720-30. Rasmussen M, Sikora M, Albrechtsen A, Korneliussen TS, Moreno-Mayar JV, Poznik GD, Zollikofer CP, Ponce de León MS, Allentoft ME, Moltke I, Jónsson H, Valdiosera C, Malhi RS, Orlando L, Bustamante CD, Stafford TW Jr, Meltzer DJ, Nielsen R, Willerslev E. The ancestry and affiliations of Kennewick Man. Nature. 2015; 523(7561):455-8. Rantanen A, Jansson M, Oldfors A, Larsson NG. Downregulation of Tfam and mtDNA copy number during mammalian spermatogenesis. Mamm Genome. 2001; 12(10):787-92. 
Raule N, Sevini F, Li S, Barbieri A, Tallaro F, Lomartire L, Vianello D, Montesanto A, Moilanen JS, Bezrukov V, Blanché H, Hervonen A, Christensen K, Deiana L, Gonos ES, Kirkwood TB, Kristensen P, Leon A, Pelicci PG, Poulain M, Rea IM, Remacle J, Robine JM, Schreiber S, Sikora E, Eline Slagboom P, Spazzafumo L, Antonietta Stazi M, Toussaint O, Vaupel JW, Rose G, Majamaa K, Perola M, Johnson TE, Bolund L, Yang H, Passarino G, Franceschi C. The co-occurrence of mtDNA mutations on different oxidative phosphorylation subunits, not detected by haplogroup analysis, affects human longevity and is population specific. Aging Cell. 2014; 13(3):401-7. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PL, Maricic T, Good JM, Marques-Bonet T, Alkan C, Fu Q, Mallick S, Li H, Meyer M, Eichler EE, Stoneking M, Richards M, Talamo S, Shunkov MV, Derevianko AP, Hublin JJ, Kelso J, Slatkin M, Pääbo S. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010; 468(7327):1053-60. Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, Parra MV, Rojas W, Duque C, Mesa N, García LF, Triana O, Blair S, Maestre A, Dib JC, Bravi CM,

Bailliet G, Corach D, Hünemeier T, Bortolini MC, Salzano FM, Petzl-Erler ML, Acuña-Alonzo V, Aguilar-Salinas C, Canizales-Quinteros S, Tusié-Luna T, Riba L, Rodríguez-Cruz M, Lopez-Alarcón M, Coral-Vazquez R, Canto-Cetina T, Silva-Zolezzi I, Fernandez-Lopez JC, Contreras AV, Jimenez-Sanchez G, Gómez-Vázquez MJ, Molina J, Carracedo A, Salas A, Gallo C, Poletti G, Witonsky DB, Alkorta-Aranburu G, Sukernik RI, Osipova L, Fedorova SA, Vasquez R, Villena M, Moreau C, Barrantes R, Pauls D, Excoffier L, Bedoya G, Rothhammer F, Dugoujon JM, Larrouy G, Klitz W, Labuda D, Kidd J, Kidd K, Di Rienzo A, Freimer NB, Price AL, Ruiz-Linares A. Reconstructing Native American population history. Nature. 2012; 488(7411):370-4. Reidla M, Kivisild T, Metspalu E, Kaldma K, Tambets K, Tolk HV, Parik J, Loogväli EL, Derenko M, Malyarchuk B, Bermisheva M, Zhadanov S, Pennarun E, Gubina M, Golubenko M, Damba L, Fedorova S, Gusar V, Grechanina E, Mikerezi I, Moisan JP, Chaventré A, Khusnutdinova E, Osipova L, Stepanov V, Voevoda M, Achilli A, Rengo C, Rickards O, De Stefano GF, Papiha S, Beckman L, Janicijevic B, Rudan P, Anagnou N, Michalodimitrakis E, Koziel S, Usanga E, Geberhiwot T, Herrnstadt C, Howell N, Torroni A, Villems R. Origin and diffusion of mtDNA haplogroup X. Am J Hum Genet. 2003; 73(5):1178-90. Rezza G. Aedes albopictus and the reemergence of Dengue. BMC Public Health. 2012; 12:72. Rezza G. Dengue and chikungunya: long-distance spread and outbreaks in naïve areas. Pathog Glob Health. 2014; 108(8):349-55. Richards MB, Macaulay VA, Bandelt HJ, Sykes BC. Phylogeography of mitochondrial DNA in western Europe. Ann Hum Genet. 1998; 62(Pt 3):241-260. Richards M, Macaulay V. The mitochondrial gene tree comes of age. Am J Hum Genet. 2001; 68(6):1315-20. Richards M, Rengo C, Cruciani F, Gratrix F, Wilson JF, Scozzari R, Macaulay V, Torroni A. Extensive female-mediated gene flow from sub-Saharan Africa into near eastern Arab populations. Am J Hum Genet. 2003; 72(4):1058-64. 
Richards MB, Bandelt H-J, Kivisild T, Oppenheimer S. A Model for the dispersal of modern humans out of Africa. In: Bandelt PH-J, Macaulay DV, and Richards DM, editors. Human Mitochondrial DNA and the Evolution of Homo sapiens: Springer Berlin Heidelberg. 2006. Richards MB, Soares P, Torroni A. Palaeogenomics: mitogenomes and migrations in Europe's past. Curr Biol. 2016; 26(6):R243-6. Rieux A, Eriksson A, Li M, Sobkowiak B, Weinert LA, Warmuth V, Ruiz-Linares A, Manica A, Balloux F. Improved calibration of the human mitochondrial clock using ancient genomes. Mol Biol Evol. 2014; 31(10):2780-92. Romi R, Sabatinelli G, Savelli LG, Raris M, Zago M, Malatesta R. Identification of a North American mosquito species, Aedes atropalpus (Diptera: Culicidae), in Italy. J. Am. Mosq. Control. Assoc. 1997; 13(3):245-6.

Rose JI. New light on human prehistory in the Arabo-Persian Gulf Oasis. Curr Anthropol. 2010; 51:849–83. Rose JI, Cerny V, Bayoumi R. Tabula rasa or refugia? Using genetic data to assess the peopling of Arabia. Arab Archaeol Epigr. 2013; 24:95–101. Rueda LM. Pictorial keys for the identification of mosquitoes (Diptera: Culicidae) associated with dengue virus transmission. Zootaxa. 2004; 589:1-60. Sabatini A, Raineri V, Trovato G, Coluzzi M. Aedes albopictus in Italy and possible diffusion of the species into the Mediterranean area. Parassitologia. 1990; 32:301-4. Saillard J, Forster P, Lynnerup N, Bandelt HJ, Norby S. mtDNA variation among Greenland Eskimos: the edge of the Beringian expansion. Am J Hum Genet. 2000; 67(3):718-26. Sanna S, Pitzalis M, Zoledziewska M, Zara I, Sidore C, Murru R, Whalen MB, Busonero F, Maschio A, Costa G, Melis MC, Deidda F, Poddie F, Morelli L, Farina G, Li Y, Dei M, Lai S, Mulas A, Cuccuru G, Porcu E, Liang L, Zavattari P, Moi L, Deriu E, Urru MF, Bajorek M, Satta MA, Cocco E, Ferrigno P, Sotgiu S, Pugliatti M, Traccis S, Angius A, Melis M, Rosati G, Abecasis GR, Uda M, Marrosu MG, Schlessinger D, Cucca F. Variants within the immunoregulatory CBLB gene are associated with multiple sclerosis. Nat Genet. 2010; 42(6):495-7. Santos C, Montiel R, Arruda A, Alvarez L, Aluja MP, Lima M. Mutation patterns of mtDNA: empirical inferences for the coding region. BMC Evol Biol. 2008; 8:167. Sato M, Sato K. Maternal inheritance of mitochondrial DNA by diverse mechanisms to eliminate paternal mitochondrial DNA. Biochim Biophys Acta. 2013; 1833(8):1979-84. Schaffner F, Medlock JM, Van Bortel W. Public health significance of invasive mosquitoes in Europe. Clin Microbiol Infect. 2013; 19(8):685-92. Scheu A, Powell A, Bollongino R, Vigne JD, Tresset A, Çakırlar C, Benecke N, Burger J. The genetic prehistory of domesticated cattle from their origin to the spread across Europe. BMC Genet. 2015; 16:54. Schibler J, Elsner J, Schlumbaum A. 
Incorporation of aurochs into a cattle herd in Neolithic Europe: single event or breeding? Sci Rep. 2014; 4:5798. Scholte EJ, Den Hartog W, Braks M, Reusken C, Dik M, Hessels A. First report of a North American invasive mosquito species Ochlerotatus atropalpus (Coquillett) in the Netherlands, 2009. Euro Surveill. 2009; 14(45). Schon EA, Bonilla E, DiMauro S. Mitochondrial DNA mutations and pathogenesis. J Bioenerg Biomembr. 1997; 29:131-49. Schönberg A, Theunert C, Li M, Stoneking M, Nasidze I. High-throughput sequencing of complete human mtDNA genomes from the Caucasus and West Asia: high diversity and demographic inferences. Eur J Hum Genet. 2011; 19(9):988-94.

Schroeder KB, Schurr TG, Long JC, Rosenberg NA, Crawford MH, Tarskaia LA, Osipova LP, Zhadanov SI, Smith DG. A private allele ubiquitous in the Americas. Biol Lett. 2007; 3(2):218-23. Schurr TG, Ballinger SW, Gan YY, Hodge JA, Merriwether DA, Lawrence DN, Knowler WC, Weiss KM, Wallace DC. Amerindian mitochondrial DNAs have rare Asian mutations at high frequencies, suggesting they derived from four primary maternal lineages. Am J Hum Genet. 1990; 46(3):613-23. Schurr TG, Sherry ST. Mitochondrial DNA and Y chromosome diversity and the peopling of the Americas: evolutionary and demographic evidence. Am J Hum Biol. 2004; 16(4):420-39. Schwartz M, Vissing J. Paternal inheritance of mitochondrial DNA. N Engl J Med. 2002; 347:576-80. Seoane M, Mosquera-Miguel A, Gonzalez T, Fraga M, Salas A, Costoya JA. The mitochondrial genome is a "genetic sanctuary" during the oncogenic process. PLoS One. 2011; 6(8):e23327. Sequeira A, Rollins B, Magnan C, van Oven M, Baldi P, Myers RM, Barchas JD, Schatzberg AF, Watson SJ, Akil H, Bunney WE, Vawter MP. Mitochondrial mutations in subjects with psychiatric disorders. PLoS One. 2015; 10(5):e0127280. Shackleton JC, van Andel TH, Runnels CN. Coastal paleogeography of the Central and Western Mediterranean during the last 125,000 years and its archaeological implications. J Field Archaeol. 1984; 11:307–14. Shigenaga MK, Hagen TM, Ames BN. Oxidative damage and mitochondrial decay in aging. Proc Natl Acad Sci U S A. 1994; 91(23):10771-8. Sidore C, Busonero F, Maschio A, Porcu E, Naitza S, Zoledziewska M, Mulas A, Pistis G, Steri M, Danjou F, Kwong A, Ortega Del Vecchyo VD, Chiang CW, Bragg-Gresham J, Pitzalis M, Nagaraja R, Tarrier B, Brennan C, Uzzau S, Fuchsberger C, Atzeni R, Reinier F, Berutti R, Huang J, Timpson NJ, Toniolo D, Gasparini P, Malerba G, Dedoussis G, Zeggini E, Soranzo N, Jones C, Lyons R, Angius A, Kang HM, Novembre J, Sanna S, Schlessinger D, Cucca F, Abecasis GR. 
Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat Genet. 2015; 47(11):1272-81. Sikora M, Carpenter ML, Moreno-Estrada A, Henn BM, Underhill PA, Sánchez-Quinto F, Zara I, Pitzalis M, Sidore C, Busonero F, Maschio A, Angius A, Jones C, Mendoza-Revilla J, Nekhrizov G, Dimitrova D, Theodossiev N, Harkins TT, Keller A, Maixner F, Zink A, Abecasis G, Sanna S, Cucca F, Bustamante CD. Population genomic analysis of ancient and modern genomes yields new insights into the genetic ancestry of the Tyrolean Iceman and the genetic structure of Europe. PLoS Genet. 2014; 10(5):e1004353. Skeates R, Beckett J, Gradoli MG. The cultural life of caves in Seulo, central Sardinia. Journal of Mediterranean Archaeology. 2013; 26:97–126.

Skoglund P, Mallick S, Bortolini MC, Chennagiri N, Hünemeier T, Petzl-Erler ML, Salzano FM, Patterson N, Reich D. Genetic evidence for two founding populations of the Americas. Nature. 2015; 525(7567):104-8. Soares P, Ermini L, Thomson N, Mormina M, Rito T, Röhl A, Salas A, Oppenheimer S, Macaulay V, Richards MB. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am J Hum Genet. 2009; 84(6):740-59. Soares P, Achilli A, Semino O, Davies W, Macaulay V, Bandelt HJ, Torroni A, Richards MB. The archaeogenetics of Europe. Curr Biol. 2010; 20(4):R174-83. Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilová E, Macaulay V, Richards MB, Cerny V, Pereira L. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 2012; 29(3):915-27. Sobenin IA, Chistiakov DA, Bobryshev YV, Postnov AY, Orekhov AN. Mitochondrial mutations in atherosclerosis: new solutions in research and possible clinical applications. Curr Pharm Des. 2013; 19(33):5942-53. Sondaar PY. Palaeolithic Sardinians: paleontological evidence and methods. In: Sardinian and Aegean chronology. Eds. Balmuth MS and Tykot RH, Oxbow Books, Oxford, UK. 1998; pp. 45–51. Sosa MX, Sivakumar IK, Maragh S, Veeramachaneni V, Hariharan R, Parulekar M, Fredrikson KM, Harkins TT, Lin J, Feldman AB, Tata P, Ehret GB, Chakravarti A. Next-generation sequencing of human mitochondrial reference genomes uncovers high heteroplasmy frequency. PLoS Comput Biol. 2012; 8(10):e1002737. St John J, Sakkas D, Dimitriadi K, Barnes A, Maclin V, Ramey J, Barratt C, De Jonge C. Failure of elimination of paternal mitochondrial DNA in abnormal embryos. Lancet. 2000; 355(9199):200. Stewart JB, Chinnery PF. The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat Rev Genet. 2015; 16(9):530-42. Stock F, Edwards CJ, Bollongino R, Finlay EK, Burger J, Bradley DG. 
Cytochrome b sequences of ancient cattle and wild ox support phylogenetic complexity in the ancient and modern bovine populations. Anim Genet. 2009; 40(5):694-700. Stringer C, Andrews P. In reply: modern human origins. Science. 1988; 241(4867):773-4. Söchtig J, Álvarez-Iglesias V, Mosquera-Miguel A, Gelabert-Besada M, Gómez-Carballa A, Salas A. Genomic insights on the ethno-history of the Maya and the 'Ladinos' from Guatemala. BMC Genomics. 2015; 16:131. Sumruayphol S, Apiwathnasorn C, Ruangsittichai J, Sriwichai P, Attrapadung S, Samung Y, Dujardin JP. DNA barcoding and wing morphometrics to distinguish three Aedes vectors in Thailand. Acta Trop. 2016; 159:1-10. Sun C, Kong QP, Palanichamy MG, Agrawal S, Bandelt HJ, Yao YG, Khan F, Zhu CL, Chaudhuri TK, Zhang YP. The dazzling array of basal branches in the mtDNA

macrohaplogroup M from India as inferred from complete genomes. Mol Biol Evol. 2006; 23(3):683-90. Sutovsky P, Schatten G. Paternal contributions to the mammalian zygote: fertilization after sperm-egg fusion. Int Rev Cytol. 2000; 195:1-65.

Sutovsky P. Ubiquitin‐dependent proteolysis in mammalian spermatogenesis, fertilization, and sperm quality control: Killing three birds with one stone. Microsc Res Tech. 2003; 61(1):88-102. Taboada-Echalar P, Alvarez-Iglesias V, Heinz T, Vidal-Bralo L, Gómez-Carballa A, Catelli L, Pardo-Seco J, Pastoriza A, Carracedo A, Torres-Balanza A, Rocabado O, Vullo C, Salas A. The genetic legacy of the pre-colonial period in contemporary Bolivians. PLoS One. 2013; 8(3):e58980. Tackney JC, Potter BA, Raff J, Powers M, Watkins WS, Warner D, Reuther JD, Irish JD, O'Rourke DH. Two contemporaneous mitogenomes from terminal Pleistocene burials in eastern Beringia. Proc Natl Acad Sci U S A. 2015; 112(45):13833-8. Tambets K, Rootsi S, Kivisild T, Help H, Serk P, Loogväli EL, Tolk HV, Reidla M, Metspalu E, Pliss L, Balanovsky O, Pshenichnov A, Balanovska E, Gubina M, Zhadanov S, Osipova L, Damba L, Voevoda M, Kutuev I, Bermisheva M, Khusnutdinova E, Gusar V, Grechanina E, Parik J, Pennarun E, Richard C, Chaventre A, Moisan JP, Barác L, Pericić M, Rudan P, Terzić R, Mikerezi I, Krumina A, Baumanis V, Koziel S, Rickards O, De Stefano GF, Anagnou N, Pappa KI, Michalodimitrakis E, Ferák V, Füredi S, Komel R, Beckman L, Villems R. The western and eastern roots of the Saami--the story of genetic "outliers" told by mitochondrial DNA and Y chromosomes. Am J Hum Genet. 2004; 74(4):661-82. Tamm E, Kivisild T, Reidla M, Metspalu M, Smith DG, Mulligan CJ, Bravi CM, Rickards O, Martinez-Labarga C, Khusnutdinova EK, Fedorova SA, Golubenko MV, Stepanov VA, Gubina MA, Zhadanov SI, Ossipova LP, Damba L, Voevoda MI, Dipierri JE, Villems R, Malhi RS. Beringian standstill and spread of Native American founders. PLoS One. 2007; 2(9):e829. Tao M, You CP, Zhao RR, Liu SJ, Zhang ZH, Zhang C, Liu Y. Animal mitochondria: evolution, function, and disease. Curr Mol Med. 2014; 14(1):115-24. Taylor RW, Turnbull DM. Mitochondrial DNA mutations in human disease. Nat Rev Genet. 2005; 6(5):389-402. 
Thomas MG, Weale ME, Jones AL, Richards M, Smith A, Redhead N, Torroni A, Scozzari R, Gratrix F, Tarekegn A, Wilson JF, Capelli C, Bradman N, Goldstein DB. Founding mothers of Jewish communities: geographically separated Jewish groups were independently founded by very few female ancestors. Am J Hum Genet. 2002; 70(6):1411-20. Tierney JE, deMenocal PB. Abrupt shifts in Horn of Africa hydroclimate since the Last Glacial Maximum. Science. 2013; 342:843–6. Timmermann A, Friedrich T. Late Pleistocene climate drivers of early human migration. Nature. 2016; 538(7623):92-5.

Torroni A, Schurr TG, Yang CC, Szathmary EJ, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM. Native American mitochondrial DNA analysis indicates that the Amerind and the Nadene populations were founded by two independent migrations. Genetics. 1992; 130(1):153-62. Torroni A, Schurr TG, Cabell MF, Brown MD, Neel JV, Larsen M, Smith DG, Vullo CM, Wallace DC. Asian affinities and continental radiation of the four founding Native American mtDNAs. Am J Hum Genet. 1993; 53(3):563-90. Torroni A, Lott MT, Cabell MF, Chen YS, Lavergne L, Wallace DC. mtDNA and the origin of Caucasians: identification of ancient Caucasian-specific haplogroups, one of which is prone to a recurrent somatic duplication in the D-loop region. Am J Hum Genet. 1994a; 55(4):760-76. Torroni A, Miller JA, Moore LG, Zamudio S, Zhuang J, Droma T, Wallace DC. Mitochondrial DNA analysis in Tibet: implications for the origin of the Tibetan population and its adaptation to high altitude. Am J Phys Anthropol. 1994b; 93(2):189-99. Torroni A, Huoponen K, Francalacci P, Petrozzi M, Morelli L, Scozzari R, Obinu D, Savontaus ML, Wallace DC. Classification of European mtDNAs from an analysis of three European populations. Genetics. 1996; 144(4):1835-50. Torroni A, Bandelt HJ, D'Urbano L, Lahermo P, Moral P, Sellitto D, Rengo C, Forster P, Savontaus ML, Bonné-Tamir B, Scozzari R. mtDNA analysis reveals a major late Paleolithic population expansion from southwestern to northeastern Europe. Am J Hum Genet. 1998; 62(5):1137-52. Torroni A, Bandelt HJ, Macaulay V, Richards M, Cruciani F, Rengo C, Martinez-Cabrera V, Villems R, Kivisild T, Metspalu E, Parik J, Tolk HV, Tambets K, Forster P, Karger B, Francalacci P, Rudan P, Janicijevic B, Rickards O, Savontaus ML, Huoponen K, Laitinen V, Koivumäki S, Sykes B, Hickey E, Novelletto A, Moral P, Sellitto D, Coppa A, Al-Zaheri N, Santachiara-Benerecetti AS, Semino O, Scozzari R. A signal, from human mtDNA, of postglacial recolonization in Europe. 
Am J Hum Genet. 2001; 69(4):844-52. Torroni A, Achilli A, Macaulay V, Richards M, Bandelt HJ. Harvesting the fruit of the human mtDNA tree. Trends Genet. 2006; 22(6):339-45. Troy CS, MacHugh DE, Bailey JF, Magee DA, Loftus RT, Cunningham P, Chamberlain AT, Sykes BC, Bradley DG. Genetic evidence for Near-Eastern origins of European cattle. Nature. 2001; 410(6832):1088-91. Tucci S, Akey JM. Population genetics: A map of human wanderlust. Nature. 2016; 538(7624):179-80. Underhill PA, Kivisild T. Use of Y chromosome and mitochondrial DNA population structure in tracing human migrations. Annu Rev Genet. 2007; 41:539-64. Urbanelli S, Bellini R, Carrieri M, Sallicandro P, Celli G. Population structure of Aedes albopictus (Skuse): the mosquito which is colonizing Mediterranean countries. Heredity (Edinb). 2000; 84(Pt 3):331-7.

Urbanski JM, Benoit JB, Michaud MR, Denlinger DL, Armbruster P. The molecular physiology of increased egg desiccation resistance during diapause in the invasive mosquito, Aedes albopictus. Proc Biol Sci. 2010; 277(1694):2683-92. van Oven M, Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum Mutat. 2009; 30(2):E386-94. Wallace DC. Mitochondrial DNA sequence variation in human evolution and disease. Proc Natl Acad Sci U S A. 1994; 91(19):8739-46. Wallace DC, Brown MD, Melov S, Graham B, Lott M. Mitochondrial biology, degenerative diseases and aging. Biofactors. 1998; 7(3):187-90. Wallace DC, Chalkia D. Mitochondrial DNA genetics and the heteroplasmy conundrum in evolution and disease. Cold Spring Harb Perspect Biol. 2013; 5(11):a021220. Wang S, Lewis CM, Jakobsson M, Ramachandran S, Ray N, Bedoya G, Rojas W, Parra MV, Molina JA, Gallo C, Mazzotti G, Poletti G, Hill K, Hurtado AM, Labuda D, Klitz W, Barrantes R, Bortolini MC, Salzano FM, Petzl-Erler ML, Tsuneto LT, Llop E, Rothhammer F, Excoffier L, Feldman MW, Rosenberg NA, Ruiz-Linares A. Genetic variation and population structure in Native Americans. PLoS Genet. 2007; 3(11):e185. Williams RC, Steinberg AG, Gershowitz H, Bennett PH, Knowler WC, Pettitt DJ, Butler W, Baird R, Dowda-Rea L, Burch TA, Morse HG, Smith CG. GM allotypes in Native Americans: evidence for three distinct migrations across the Bering land bridge. Am J Phys Anthropol. 1985; 66(1):1-19. Wong PS, Li MZ, Chong CS, Ng LC, Tan CH. Aedes (Stegomyia) albopictus (Skuse): a potential vector of Zika virus in Singapore. PLoS Negl Trop Dis. 2013; 7(8):e2348. Wu JY, Lun ZR, James AA, Chen XG. Dengue Fever in mainland China. Am J Trop Med Hyg. 2010; 83(3):664-71. Xu J, Fonseca DM. One-way sequencing of multiple amplicons from tandem repetitive mitochondrial DNA control region. Mitochondrial DNA. 2011; 22(5-6):155-8. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. 
Comput Appl Biosci. 1997; 13(5):555-6. Zawani MK, Abu HA, Sazaly AB, Zary SY, Darlina MN. Population genetic structure of Aedes albopictus in Penang, Malaysia. Genet Mol Res. 2014; 13(4):8184-96. Zeder MA. Domestication and early agriculture in the Mediterranean Basin: origins, diffusion, and impact. Proc Natl Acad Sci U S A. 2008; 105(33):11597-604. Zeder MA. The Neolithic macro-(r)evolution: macroevolutionary theory and the study of culture change. J Archaeol Res. 2009; 17(1):1–63. Zegura SL, Karafet TM, Zhivotovsky LA, Hammer MF. High-resolution SNPs and microsatellite haplotypes point to a single, recent entry of Native American Y chromosomes into the Americas. Mol Biol Evol. 2004; 21(1):164-75.

Zhang DX, Hewitt GM. Insect mitochondrial control region: a review of its structure, evolution and usefulness in evolutionary studies. Biochem Syst Ecol. 1997; 25(2):99-120. Zhang H, Paijmans JL, Chang F, Wu X, Chen G, Lei C, Yang X, Wei Z, Bradley DG, Orlando L, O'Connor T, Hofreiter M. Morphological and genetic evidence for early Holocene cattle management in northeastern China. Nat Commun. 2013; 4:2755. Zhang H, Xing D, Wang G, Li C, Zhao T. Sequencing and analysis of the complete mitochondrial genome of Aedes albopictus (Diptera: Culicidae) in China. Mitochondrial DNA A DNA Mapp Seq Anal. 2015; 27(4):2787-8. Zhong D, Lo E, Hu R, Metzger ME, Cummings R, Bonizzoni M, Fujioka KK, Sorvillo TE, Kluh S, Healy SP, Fredregill C, Kramer VL, Chen X, Yan G. Genetic analysis of invasive Aedes albopictus populations in Los Angeles County, California and its potential public health impact. PLoS One. 2013; 8(7):e68586. Zhou Q, Li H, Li H, Nakagawa A, Lin JL, Lee ES, Harry BL, Skeen-Gaar RR, Suehiro Y, William D, Mitani S, Yuan HS, Kang BH, Xue D. Mitochondrial endonuclease G mediates breakdown of paternal mitochondria upon fertilization. Science. 2016; 353(6297):394-9. Zoledziewska M, Costa G, Pitzalis M, Cocco E, Melis C, Moi L, Zavattari P, Murru R, Lampis R, Morelli L, Poddie F, Frongia P, Pusceddu P, Bajorek M, Marras A, Satta AM, Chessa A, Pugliatti M, Sotgiu S, Whalen MB, Rosati G, Cucca F, Marrosu MG. Variation within the CLEC16A gene shows consistent disease association with both multiple sclerosis and type 1 diabetes in Sardinia. Genes Immun. 2009; 10(1):15-7.

LIST OF ORIGINAL PUBLICATIONS

Olivieri A, Sidore C, Achilli A, Angius A, Posth C, Furtwängler A, Brandini S, Capodiferro MR, Gandini F, Zoledziewska M, Pitzalis M, Maschio A, Busonero F, Lai L, Skeates R, Gradoli MG, Beckett J, Marongiu M, Mazzarello V, Marongiu P, Rubino S, Rito T, Macaulay V, Semino O, Pala M, Abecasis GR, Schlessinger D, Soares P, Richards MB, Cucca F, Torroni A. Mitogenomes of both Western European and Near Eastern ancestry were present in pre-Neolithic Sardinia: implications for the genetic origin of modern Europeans. (submitted)

Battaglia V, Gabrieli P, Brandini S, Capodiferro MR, Javier PA, Chen XG, Achilli A, Semino O, Gomulski LM, Malacrida AR, Gasperi G, Torroni A, Olivieri A. The worldwide spread of the tiger mosquito as revealed by mitogenome haplogroup diversity. Front Genet. 2016; 7:208.

Olivieri A, Gandini F, Achilli A, Fichera A, Rizzi E, Bonfiglio S, Battaglia V, Brandini S, De Gaetano A, El-Beltagi A, Lancioni H, Agha S, Semino O, Ferretti L, Torroni A. Mitogenomes from Egyptian cattle breeds: new clues on the origin of haplogroup Q and the early spread of Bos taurus from the Near East. PLoS One. 2015; 10(10):e0141170.

Gandini F, Achilli A, Pala M, Bodner M, Brandini S, Huber G, Egyed B, Ferretti L, Gómez-Carballa A, Salas A, Scozzari R, Cruciani F, Coppa A, Parson W, Semino O, Soares P, Torroni A, Richards MB, Olivieri A. Mapping human dispersals into the Horn of Africa from Arabian Ice Age refugia using mitogenomes. Sci Rep. 2016; 6:25472.
