Characterization of Male Lineages in the Asháninka from Peru

Gonçalo Aragão da Fonseca Pinto Leite
Forensic Genetics
Department of Biology
2018

Supervisor

Maria João Prata, Associate Professor, FCUP, I3S, IPATIMUP

Co-supervisors

Verónica Gomes, Junior Researcher, I3S, IPATIMUP

Leonor Gusmão, Visiting Professor, UERJ

All corrections determined by the jury, and only those, have been made.

The President of the Jury,

Porto, / / FCUP i Characterization of Male Lineages in the Asháninka from Peru



Dissertation submitted to the Faculty of Sciences of the University of Porto in candidacy for the Master's Degree in Forensic Genetics.

This work was developed under the scientific supervision of Maria João Prata PhD, Verónica Gomes PhD and Leonor Gusmão PhD.


Agradecimentos

To Professor Maria João: your support, encouragement and belief in me, not only in academic but also in personal moments, were crucial this year. More than a supervisor, you were a mentor to me, and I leave with much stronger scientific knowledge and with enormous pride in being able to call you my supervisor. All that remains is to say "Thank you for everything, and see you soon".

To Verónica, thank you for all the knowledge and wisdom you passed on to me. In a year in which you had a handful of students to guide, you found the time and patience to guide all of us as well as possible, and for that I cannot thank you enough. I keep all the help, the laughs and, above all, the advice. Thank you very much!

To Leonor: although you were never physically present, it feels as if you had been. You were always ready to help and advise, even from the other side of the Atlantic. Thank you for everything you did for me, and until we meet again!

To the whole Population Genetics group, always ready to help with whatever was needed, to make me laugh in the most unique ways and to reassure me whenever necessary. I could not have asked for a better group. Thank you!

To all my friends, from university, high school and childhood: you are all special to me and all had a hand in getting me here. I keep the laughs, the tears and the stories of the best people I have ever met. A special thanks to Pedro Rosa and Sara Sousa for all the help they gave me over this last year; you were crucial along this path.

To all my family, especially my grandparents, Lurdes and António, for always believing in me and being there whenever I needed them. I would not be half of who I am without you.

To my mother and to António, for believing in me, for putting up with me, for making this possible, for the sacrifices and for the lectures. All your words and actions mattered. I owe you the other half of who I am today.

To my sister, Joana, above all for putting up with me. You are everything I wished for in a sister, and I hope I inspire you the same way you inspire me. Wherever you are, I will always be by your side.

To Rita, my girlfriend, best friend and partner. You made me a better person. I could never have done this without you, your patience, your goodwill and your love. For all of that, I will never stop thanking you.

To Zeus, my wolf. You were with me for less than two years, but I will never forget you. Thank you for being an inspiration, for your friendship and for the countless memories. You will always be in my heart. Farewell.

Thank you very much!


Summary

Over the last decades, many studies have addressed the initial colonization of the American continent. The numerous works performed so far have provided a broad perspective on the peopling of the Americas. More recently, however, attention has focused on South America, where strong heterogeneity has been found across present-day Native American populations.

Peru is home to many indigenous populations that have experienced variable degrees of admixture since the European colonization. Owing to the diversity of ecosystems encompassed by present-day Peru, many native communities remain fairly isolated from one another.

To gain deeper insight into the history of Peruvian Amerindians, this study characterized the paternal lineages of the Asháninka, one of the largest Peruvian ethnic groups. A total of 59 Asháninka males were typed for 39 Y-SNPs. To extend the conventional characterization of haplogroup Q, typically associated with Amerindians, a Multiplex Q was redeveloped to incorporate Y-SNPs downstream of Q-M3, which is widely assumed to be one of the Amerindian founder lineages. The results were analyzed together with the 27 Y-STR profiles provided in a previous study (Tineo et al 2015).

The majority (91%) of the Asháninka lineages belonged to haplogroup Q. The remaining chromosomes (9%) were ascribed to African or European ancestry, indicating that the Asháninka are among the least admixed Native American groups in South America. Within haplogroup Q, 65.3% of the Asháninka fell in sub-haplogroup Q-M3 and 30.6% in sub-lineages within Q-M3, namely Q-Z19319 (4.1%), Q-Z19483 (2.0%) and especially Q-SA05 (24.5%), while only 4.1% were ascribed to lineages upstream of M3, namely Q-P36.2 and Q-M346. Concerning the Q-SA05 lineages, STR-based evidence pointed to the presence of two sub-clades downstream of Q-SA05 in the Asháninka gene pool.

When compared with other Native American populations, including those from Peru, the Asháninka did not reveal any special affinity with them.

Overall, and contrary to what has been reported for Europe and Asia, where population genetic structure is highly correlated with geography and language, in South American native populations neither language nor geography explained the high heterogeneity observed across populations.

The detection of an STR-defined sub-branch of Q-M3 in Peruvian South Americans, together with the finding that Q-SA05 encompasses two sub-types in the Asháninka, suggests that a finer characterization of haplogroup Q will in the future provide important insights into the complex history of present-day South American native populations.


Resumo

Over the last decades, many population genetics studies have focused on the colonization of the American continent. The numerous works performed so far have provided a reasonable global perspective on the peopling of the Americas. More recently, however, attention has turned to South America, which is characterized by great heterogeneity among native populations.

Many indigenous populations still live in Peru today, even though they have undergone variable degrees of admixture since the European colonization. Owing to the diversity of ecosystems that Peru encompasses, many of these native communities are quite isolated from one another.

To gain a better view of the history of Peruvian Amerindians, this study characterized the paternal lineages of the Asháninka, one of the largest Peruvian ethnic groups. The study involved 59 Asháninka men, who were analyzed for 39 Y-SNPs. To refine the conventional level of characterization of haplogroup Q, considered one of the founder haplogroups in Amerindians, a previously developed Multiplex Q was updated by incorporating new Y-SNPs downstream of Q-M3. The results were then analyzed taking into account the 27 Y-STR profiles described in a previous study (Tineo et al 2015).

Most Asháninka male lineages (91%) belonged to haplogroup Q, while the remaining ones (9%) had African or European ancestry, a result indicating that the Asháninka are among the least admixed indigenous groups of South America. Within haplogroup Q, 65.3% of the Asháninka belonged to sub-haplogroup Q-M3 and 30.6% to sub-lineages within Q-M3, namely Q-Z19319 (4.1%), Q-Z19483 (2.0%) and, especially, Q-SA05 (24.5%), while only 4.1% were classified into two sub-branches upstream of M3, Q-P36.2 and Q-M346. Regarding the Q-SA05 lineages, the combined STR data made it possible to detect two sub-clades downstream of Q-SA05 in the Asháninka gene pool.

Compared with other Amerindian populations, the Asháninka did not reveal special affinities with any other Native American populations, including those from Peru.

Overall, and contrary to what has been described for Europe and Asia, where the genetic structure of populations is strongly correlated with geography and language, in the native populations of South America neither language nor geography helps to explain the high population heterogeneity observed.

The detection in Peruvian South Americans of a particular sub-branch within Q-M3 (defined on the basis of STR profiles), together with the observation that Q-SA05 encompasses two sub-types of lineages in the Asháninka (also according to STR profiles), suggests that in the future a more detailed characterization of haplogroup Q may provide important information on the complex history of the native populations of South America.


Table of Contents

Agradecimentos ...... i
Summary ...... iii
Resumo ...... v
Table of Contents ...... vii
Figure Index ...... ix
Table Index ...... xi
Keywords ...... xiii
Introduction ...... 1
Population Genetics ...... 2
Genetic Diversity ...... 2
Human Genetic Diversity ...... 3
Y chromosome ...... 5
Mutation and Polymorphisms ...... 7
Multiallelic markers ...... 7
Biallelic markers ...... 9
Entrance in the Americas ...... 11
Uniparentally transmitted lineages in Native Americans ...... 14
Peru and the Asháninka ...... 16
Aims ...... 19
Materials and Methods ...... 21
Sampling ...... 22
Construction of a new Multiplex Q ...... 22
DNA Amplification and Sequencing ...... 23
Statistical Analysis ...... 25
Results and Discussion ...... 28
Asháninka Genetic Diversity ...... 29
Q lineages ...... 31
Heterogeneity within Q-SA05 lineages ...... 33
Population Comparisons ...... 35
Y-STRs ...... 36
Pairwise Genetic Distances ...... 37
Y-SNPs ...... 42
Comparison between Peruvian populations ...... 46
Conclusion ...... 49
Bibliography ...... 52
Web Resources ...... 63
Appendix ...... 64


Figure Index

Figure 1- Chromosomal locations of several commonly used Y-STRs (Butler 2003) ...... 8

Figure 2- Distribution and diversity of the Y-chromosome haplogroups around the world (adapted from http://www.transpacificproject.com/index.php/transpacific-migrations/) ...... 10

Figure 3- Map depicting the early Homo sapiens migrations (adapted from http://news.bbc.co.uk/2/hi/science/nature/4435009.stm) ...... 13

Figure 4- Phylogenetic tree of the Y chromosome according to Karafet et al 2008, adapted from Geppert M et al 2011 ...... 16

Figure 5- Geographic location of Peru within South America (adapted from www.who.int) ...... 16

Figure 6- Phylogenetic tree of the SNPs included in Multiplex Q and inferred haplogroups ...... 25

Figure 7- Relative frequency of haplogroups in Native Americans from several South American countries. Amerindian lineages are shown in shades of orange, European in shades of blue and African in shades of green ...... 29

Figure 8- Network of all the Asháninka assigned to haplogroup Q. The samples appear to form two clusters: one aggregating all the samples genotyped as Q-SA05, and a second grouping the samples from all the remaining haplogroups, with little apparent differentiation among them ...... 34

Figure 9- Network of the Q-SA05 Asháninka samples and the Q-SA05 individuals from Jota MS et al (2016) ...... 34

Figure 10- MDS plot of the RST genetic distances, computed from 15 Y-STRs, between the 24 populations analyzed. A) Populations grouped by country: Peru (Red), Bolivia (Purple), Ecuador (Orange), Brazil (Green), Argentina (Blue), Venezuela (Pink), Colombia (Yellow). The Asháninka are marked in black. B) Populations grouped by linguistic clade: Equatorial-Tucanoan (Red), Ge-Pano-Carib (Purple), Andean (Green), Chibchan-Paezan (Black). The Asháninka are marked in yellow ...... 39

Figure 11- MDS plot of the RST genetic distances, computed from 7 Y-STRs, between the 24 populations analyzed. Populations are grouped by country: Peru (Red), Bolivia (Orange), Ecuador (Purple), Venezuela (Pink), Brazil (Green), Colombia (Yellow), Argentina (Blue). The Asháninka are marked in black ...... 40

Figure 12- MDS plot of the FST genetic distances, computed from Y-SNP haplogroups, between the 18 populations analyzed. Populations are grouped by country: Peru (Red), Bolivia (Orange), Ecuador (Purple), Venezuela (Pink), Brazil (Green), Colombia (Yellow), Argentina (Blue). The Asháninka are marked in black ...... 43

Figure 13- Phylogenetic network of South Amerindians constructed from 15 Y-STRs and 5 Y-SNPs within haplogroup Q ...... 46

Figure 14- Phylogenetic network of Peruvian Native Americans constructed from 15 Y-STRs and 5 Y-SNPs within haplogroup Q ...... 46


Table Index

Table 1- Primers added to the Multiplex Q described in Roewer et al. (2013) to create the newly developed Multiplex Q ...... 25

Table 2- Relative frequency of the haplogroups detected in the Asháninka. Haplogroup nomenclature follows van Oven M, et al. 2014 ...... 29

Table 3- Genetic diversity of the Amerindian populations at the various resolution levels of Y-STRs ...... 37

Table 4- RST genetic distances and P-values between 5 Peruvian populations using 19 Y-STRs. Values that are not statistically significant are underlined in red ...... 38

Table 5- RST genetic distances and P-values between the 24 populations using 15 Y-STRs. Values that are not statistically significant are underlined in red ...... 39

Table 6- FCT (variance between groups) and P-values of non-differentiation in an AMOVA based on 15 and 7 Y-STRs ...... 41

Table 7- Y-SNP haplogroup genetic diversity for 18 Amerindian populations. Genetic diversity was measured at three resolution levels: a first one at which only the Asháninka could be evaluated, as only they had haplogroup Q genotyped with the rMultiplex Q; a second one at which the highest resolution within Q was Q-M3, besides the other non-indigenous haplogroups found; and a third one at which only the diversity within haplogroup Q was measured in all populations ...... 42

Table 8- FCT (variance between groups) and P-values of non-differentiation in an AMOVA based on Y-SNPs ...... 44

Table 9- FCT (variance between groups) and P-values of non-differentiation in an AMOVA based on 15 Y-STRs of 11 Peruvian populations ...... 46


Keywords

Native Americans; Y Chromosome; Y-SNPs; Y-STRs; Population Genetics; Genetic Structure


Abbreviations

AIM Autosomal Ancestry Informative Markers

AMOVA Analysis of Molecular Variance

BLAT BLAST-like alignment tool

bp Base pairs

DNA Deoxyribonucleic Acid

dNTP Deoxynucleoside Triphosphate

Kya Thousand Years Ago

Mb Megabases

MDS Multi-Dimensional Scaling

MNPD Mean Number of Pairwise Differences

MSY Male-Specific region of the Y chromosome

mtDNA Mitochondrial DNA

NGS Next Generation Sequencing

NRY Non-recombining region of the Y chromosome

OOA Out-of-Africa

PAR Pseudo-Autosomal Regions

PCR Polymerase Chain Reaction

rMultiplex Q Redeveloped Multiplex Q

SBE Single Base Extension

STR Short Tandem Repeat

SNP Single Nucleotide Polymorphism

Tm Melting Temperature

YCC Y Chromosome Consortium

YHRD Y Chromosome Haplotype Reference Database


Introduction


Population Genetics

Population genetics is the discipline that studies the factors leading to genetic differentiation among populations of the same species. Through the analysis of patterns of genetic diversity, it can address diverse topics such as the origin of a given population, its phylogenetic relationships with other populations, its migrations and even its propensity to certain diseases. Population genetics has long established important ties with several other scientific areas, such as anthropology, linguistics, medicine and the forensic sciences, both benefiting from and contributing to the investigation of a great number of questions. This is illustrated, for instance, by its extremely important contribution to the forensic sciences, among which forensic genetics is nowadays considered the best model of how science can be applied in the forensic setting (Saks & Koehler, 2005).

Population genetics is crucial in forensic genetics, where more or less intricate molecular analyses of biological samples are needed to investigate cases of paternity, sexual assault and missing persons, among many others. The important role of population genetics lies in the models it provides to compute likelihood ratios between competing hypotheses, in addition to guiding the construction of the databases needed to evaluate the evidence. Owing to the high diversity of human populations worldwide, different databases are needed depending on the area where a sample is retrieved. For instance, to estimate the Random Match Probability in a population (i.e. the probability that a certain profile, based on a specific set of markers, is shared by a randomly chosen individual from that population), a database of the population in question must be used to obtain reliable estimates.
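To make the Random Match Probability concrete, the sketch below applies the product rule commonly used for autosomal loci, assuming Hardy-Weinberg equilibrium and independence between loci. The allele frequencies are hypothetical and the function names are illustrative only. Because Y-STRs are inherited together as one linked block, this product rule does not apply to Y haplotypes, whose frequencies are instead estimated from database counts.

```python
from math import prod
from typing import List, Optional, Tuple

# One entry per locus: (p, q) for a heterozygote with allele frequencies p and q,
# or (p, None) for a homozygote whose allele has frequency p.
Profile = List[Tuple[float, Optional[float]]]


def random_match_probability(profile: Profile) -> float:
    """Product-rule RMP, assuming Hardy-Weinberg equilibrium and independent loci."""
    genotype_freqs = []
    for p, q in profile:
        if q is None:
            genotype_freqs.append(p * p)      # homozygote: p^2
        else:
            genotype_freqs.append(2 * p * q)  # heterozygote: 2pq
    return prod(genotype_freqs)


def likelihood_ratio(rmp: float) -> float:
    """Simple single-source LR: 'suspect is the source' vs 'an unrelated person is'."""
    return 1.0 / rmp


# Hypothetical two-locus profile with made-up allele frequencies.
rmp = random_match_probability([(0.1, 0.2), (0.3, None)])
print(rmp, likelihood_ratio(rmp))
```

The simple LR = 1/RMP holds only in the idealized single-source case without relatives or typing errors; the database that supplies p and q is what ties the calculation to a specific population.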

Genetic Diversity

Genetic diversity refers to the variation in the number and proportion of alleles at several loci among the individuals of a population. This variation results from various evolutionary factors, which interact to mold the patterns of diversity within and among populations. Mutation, recombination, selection, migration, mating pattern and genetic drift are the factors that account for a population's diversity.
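As a minimal sketch of how diversity can be quantified, the function below implements Nei's unbiased gene diversity estimator, n * (1 - sum of p_i^2) / (n - 1), which can be applied to the alleles of one locus or to Y haplotypes and haplogroups treated as single units. The choice of estimator and the example labels are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def gene_diversity(sample: Sequence[Hashable]) -> float:
    """Nei's unbiased gene diversity: n * (1 - sum of squared frequencies) / (n - 1).

    `sample` holds one allele (or haplotype/haplogroup label) per sampled copy.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    counts = Counter(sample)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n * (1.0 - sum_sq) / (n - 1)


# Hypothetical sample of four Y haplogroup labels.
print(gene_diversity(["Q-M3", "Q-M3", "Q-M3", "Q-SA05"]))  # 0.5
```

The value is 0 when every copy carries the same allele and approaches 1 as the sample spreads over many equally frequent alleles; the n/(n - 1) factor corrects the downward bias of small samples.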

The mating pattern of a population strongly influences its diversity. For simplicity's sake, most population genetic models assume random mating within a population. This assumption may seem hard to reconcile with present-day human societies, given that physical appearance, social status and even geographic distance may influence mating patterns. In certain societies, the reproductive preference is for closely related individuals (inbreeding). In most cases, however, mating criteria are highly subjective, although preferences based on physical appearance (phenotypic traits) are not unusual, since an individual may prefer a partner with traits resembling their own (positive assortative mating). The cases described above illustrate kinds of "preferential" mating that may increase homozygosity, which in turn reduces the heterozygosity observed in a population.
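The effect of inbreeding on genotype proportions can be illustrated with Wright's inbreeding coefficient f under the standard single-locus model, which is an assumption here rather than a model taken from this work. Allele frequencies stay the same, but heterozygosity is reduced by a factor (1 - f). The values in the example are hypothetical.

```python
from typing import Dict


def genotype_frequencies(p: float, f: float) -> Dict[str, float]:
    """Expected genotype frequencies at a biallelic locus (alleles A and a).

    p is the frequency of allele A; f is Wright's inbreeding coefficient
    (f = 0 corresponds to random mating, i.e. Hardy-Weinberg proportions).
    """
    q = 1.0 - p
    return {
        "AA": p * p + f * p * q,       # homozygotes gain f*p*q
        "Aa": 2 * p * q * (1 - f),     # heterozygotes lose a fraction f
        "aa": q * q + f * p * q,
    }


# Hypothetical comparison: same allele frequency, increasing inbreeding.
for f in (0.0, 0.125, 0.25):
    g = genotype_frequencies(0.5, f)
    print(f, round(g["Aa"], 4))
```

Summing AA plus half of Aa still gives p for any f, which shows that inbreeding redistributes alleles into homozygotes without changing allele frequencies.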

Opposite outcomes may arise from other evolutionary forces, such as mutation and migration. Migration implies the relocation of individuals from one population into another. Since migrants may carry alleles that are rare or absent in the recipient population, migration injects diversity into it while simultaneously reducing the differentiation between the two populations. Migration also offers an opportunity to break mating barriers determined by geography and language, among other factors.
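The homogenizing effect of migration can be sketched with a simple one-way (continent-island) model, assuming a constant migration rate m and no drift, selection or mutation. In each generation the local allele frequency moves a fraction m toward the migrants' frequency, so a locally absent allele is introduced while the difference between the two populations shrinks geometrically. The function names and parameter values are hypothetical.

```python
def migrate_one_generation(p: float, p_migrants: float, m: float) -> float:
    """One generation of one-way gene flow: a fraction m of the gene pool
    is replaced by migrants carrying the allele at frequency p_migrants."""
    return (1 - m) * p + m * p_migrants


def allele_frequency_after(p0: float, p_migrants: float, m: float, t: int) -> float:
    """Closed form of the recursion above: p_t = p_m + (p_0 - p_m) * (1 - m) ** t."""
    return p_migrants + (p0 - p_migrants) * (1 - m) ** t


# Hypothetical values: an allele absent locally (p0 = 0) but common among migrants.
p = 0.0
for _ in range(10):
    p = migrate_one_generation(p, p_migrants=0.5, m=0.1)
print(round(p, 4), round(allele_frequency_after(0.0, 0.5, 0.1, 10), 4))
```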

Mutations, regardless of their kind, are the primary source of variability and consequently of differences between individuals, populations and species. However, because the average mutation rate is fairly low, the role of mutation alone in evolutionary change is relatively modest. Recombination during meiosis produces new arrangements of maternally and paternally transmitted chromosomes, generating novel combinations of alleles. Together with mutation, recombination is an important contributor to increasing genetic diversity in a population.
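To see why mutation alone changes allele frequencies slowly, the sketch below assumes irreversible mutation from allele A to allele a at a hypothetical rate u per generation, ignoring back mutation, drift, selection and migration.

```python
def frequency_after_mutation(p0: float, u: float, t: int) -> float:
    """Frequency of allele A after t generations of one-way mutation A -> a
    at rate u per generation, with no other evolutionary force: p_t = p_0 * (1 - u) ** t."""
    return p0 * (1 - u) ** t


# Hypothetical rate u = 1e-5 per generation, starting from fixation of A.
print(round(frequency_after_mutation(1.0, 1e-5, 1000), 5))
```

With these assumed values, the frequency of A falls by only about 1% over 1,000 generations, whereas migration or drift can shift frequencies far more in the same time.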

Human Genetic Diversity

Humans have always been curious about their history, their migrations and the origin of Homo sapiens. Genetics has long been gaining importance in providing insights into many questions initially addressed mainly by archaeology, paleontology and linguistics.

The study of human genetic diversity grew considerably with the advances in molecular anthropology. One of the first milestones in this field was Landsteiner's description of the ABO blood group system at the beginning of the 20th century, which stirred interest in how diverse human populations could be at the molecular level and in how differences between human populations could be explained. With the advances in molecular biology, other tools able to measure genetic diversity were developed, notably the detection of blood protein polymorphisms by electrophoresis, pioneered by Pauling et al in 1949 and Harris in 1966. The wide-scale study of protein and blood group polymorphisms afforded the first comprehensive, though indirect and low-resolution, view of human genetic diversity, and provided the result, surprising at the time, that most of the total human genetic diversity was due to differences between individuals within a population (85%) rather than between human populations (15%) (Lewontin, 1972). These results would later be fully supported.
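The within- versus between-population split can be illustrated with the heterozygosity-based fixation index, F_ST = (H_T - H_S) / H_T, where H_S is the mean within-population heterozygosity and H_T the heterozygosity of the pooled allele frequencies. This is a simplified sketch with equally weighted populations and hypothetical frequencies; Lewontin's original apportionment used a different diversity statistic, so the code is not a reproduction of that analysis.

```python
from typing import Sequence, Tuple


def heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def apportion_diversity(populations: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (H_S, H_T, F_ST) for equally weighted populations.

    Each population is a list of allele frequencies in the same allele order.
    """
    k = len(populations)
    h_s = sum(heterozygosity(pop) for pop in populations) / k
    n_alleles = len(populations[0])
    pooled = [sum(pop[i] for pop in populations) / k for i in range(n_alleles)]
    h_t = heterozygosity(pooled)
    return h_s, h_t, (h_t - h_s) / h_t


# Hypothetical biallelic locus in two populations.
h_s, h_t, fst = apportion_diversity([[0.4, 0.6], [0.6, 0.4]])
print(round(h_s / h_t, 3), round(fst, 3))
```

In this toy example 96% of the total diversity lies within populations and F_ST = 0.04 between them, the same kind of partition that the protein studies revealed.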

In 1991, a work was published that represents one of the biggest revolutions in human population genetics. Using a fairly recent technology, Li & Sadler presented the first worldwide estimate of human genetic variation based on direct DNA screening, which relied on the amplification of DNA sequences by the polymerase chain reaction (PCR; Mullis et al, 1985). Although the results did not differ substantially from those provided by the previous generation of genetic markers, the study paved the way for exploring the most suitable markers in the genome to address the history of human populations. From the autosomal DNA markers originally studied, the set was soon expanded to the uniparental markers located in mitochondrial DNA and in the Non-Recombining Region of the Y chromosome (NRY). The unique characteristics of uniparental markers, namely the absence of recombination together with their low effective population size, make them highly sensitive to past episodes of gene flow between populations as well as to past bottlenecks and expansions, demographic events that deeply shaped the substructure of human populations and left traces in the patterns of genetic diversity that have persisted until today.

Since then, DNA markers have been widely used to investigate the worldwide patterns of genetic diversity in modern human populations, providing important clues to reconstruct human evolutionary history. Overall, genetic variation was found to be higher in African than in non-African populations; Africans were shown to possess the largest number of population-specific alleles, while non-African populations harbored a subset of the genetic diversity present in Africa (Tishkoff and Verrelli, 2003). It was also shown that genetic differentiation between populations is highly correlated with geographic distance, meaning that global human genetic variation is mainly clinal. Another striking finding was that geographic distance from East Africa (a probable cradle of anatomically modern humans) explained as much as 85% of the smooth decrease in gene diversity within human populations (reviewed in Handley et al, 2007). The increasingly available genetic evidence was widely used to fuel a long-standing debate in anthropology about where and when archaic hominins evolved into modern Homo sapiens, for which two main, oversimplified hypotheses exist: the multiregional model and the replacement model, also known as the out-of-Africa (OOA) model. Most genetic studies supported the OOA model, which holds that anatomically modern humans originated in Africa around 200 Kya and gave rise to all extant world populations through a major migration wave that left Africa around 60–70 Kya.

Although it is still controversial which and how many geographical routes modern humans took when leaving Africa, as well as the timing of the out-of-Africa event(s) (reviewed in López et al, 2015), it is widely assumed that the dispersal of humans throughout Eurasia involved a series of migrations accompanied by severe bottlenecks in the migrating groups, leading to a drastic decrease in genetic diversity before rapid increases in population size and expansion into unoccupied regions.
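The loss of diversity along a chain of bottlenecked migrations can be sketched with an idealized serial founder model: each founding event is treated as one generation of drift with a hypothetical number N of diploid founders, which retains a fraction 1 - 1/(2N) of the expected heterozygosity. Mutation, migration and the subsequent population growth are ignored, and the function name and values are illustrative only.

```python
from typing import List


def serial_founder_heterozygosity(h0: float, founders: int, events: int) -> List[float]:
    """Expected heterozygosity before and after each of `events` successive founder events."""
    retained = 1.0 - 1.0 / (2 * founders)  # fraction of H kept through one founding generation
    values = [h0]
    for _ in range(events):
        values.append(values[-1] * retained)
    return values


# Hypothetical: starting heterozygosity 0.8, 10 founders per event, 5 events.
print([round(h, 3) for h in serial_founder_heterozygosity(0.8, 10, 5)])
```

Each additional step away from the source loses a further fraction of diversity, which reproduces, in a very simplified way, the steady decline of diversity with distance from Africa described above.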

Y chromosome

The Y chromosome is the male-specific one of the two sex chromosomes, being transmitted only from father to son. The mammalian X and Y sex chromosomes evolved from a pair of ancestral autosomes, which were supposedly different in therians (placentals and marsupials) and in egg-laying monotremes. Sex chromosome differentiation began with events of recombination suppression, likely triggered by chromosomal rearrangements, accompanied by continuous gene loss and progressive deterioration of the proto-Y chromosome, leading to the roughly 60 Mb chromosome observed today in humans (Cortez et al, 2014).

The part of the Y chromosome considered most useful in population genetic studies is the so-called male-specific region of the Y chromosome (MSY), sometimes known as the non-recombining region of the Y (NRY), which in humans is flanked by two pseudoautosomal regions (PAR1 and PAR2, the latter found solely in Homo sapiens; Jobling, Pandya & Tyler-Smith, 1997). Together, PAR1 and PAR2 encompass about 3 Mb.

Apart from three euchromatic regions, designated the X-transposed, X-degenerate and ampliconic regions, around 60% of the human Y chromosome is composed of repetitive sequences that are largely confined to the heterochromatic portion of the long arm and to the pericentric region (Skaletsky et al, 2003).

Recombination events occur in most of the human genome, leading to new arrangements of maternal and paternal alleles on a given chromosome. However, the Y chromosome does not undergo recombination over most of its length (the MSY); recombination is restricted to the PARs, where homologous X-Y pairing occurs. For that reason, the PARs are essential regions of the Y, as they are responsible for the correct disjunction of the sex chromosomes during meiosis.

Owing to the absence of recombination, the genes located in the MSY are transmitted to the next generation as a block, meaning that the entire region constitutes a single haplotype. This haplotype is thus transmitted intact from one generation to the next unless a mutation occurs (Jobling & Tyler-Smith, 2003).

These properties of the MSY underlie the importance that the study of paternal lineages has achieved in forensic and evolutionary genetics. The use of Y-chromosome genetic markers in these fields has seen major improvements and wide adoption over the past decades (Karafet et al, 1999; Bortolini et al, 2003; Battaglia et al, 2013).

In forensic DNA analysis, the Y chromosome has proved to be a crucial tool in missing person and crime scene investigations, being particularly suitable in cases of sexual assault. It can also assist paternity investigations in which biological material from the alleged father is unavailable, since his paternal male relatives share his Y chromosome and can be tested instead. In this field, however, the properties of the Y chromosome bring a critical disadvantage: the inability to discriminate between individuals who belong to the same paternal lineage (Jobling & Tyler-Smith, 1997).

As for evolutionary and population genetics, Y-chromosome variation has also become a key research topic (Jobling and Tyler-Smith, 2017). Since this chromosome exists as a single copy and only in males, the effective population size of the Y is one-fourth that of the autosomes, making it highly susceptible to the effects of demographic events such as bottlenecks and founder events. This also explains the sharp genetic differences between populations around the world detected with Y-chromosome analyses (reviewed in Jobling, 2012).
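The one-fourth ratio and its consequence for drift can be sketched by counting gene copies in an idealized Wright-Fisher population with a 1:1 sex ratio, an assumption made here for illustration. Drift removes, per generation, a fraction of diversity equal to one over the number of gene copies, so the Y loses diversity about four times faster than an autosomal locus. The census size, starting diversity and function names are hypothetical.

```python
from typing import Dict


def gene_copies(n_individuals: int) -> Dict[str, float]:
    """Gene copies in an idealized population with a 1:1 sex ratio."""
    n_males = n_individuals / 2
    return {"autosome": 2 * n_individuals, "Y": n_males}


def diversity_after_drift(h0: float, copies: float, generations: int) -> float:
    """Expected diversity under drift alone: each generation loses a fraction 1/copies."""
    return h0 * (1 - 1 / copies) ** generations


copies = gene_copies(1000)  # hypothetical census size
print(copies["Y"] / copies["autosome"])  # 0.25
for marker, c in copies.items():
    print(marker, round(diversity_after_drift(0.9, c, 100), 3))
```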

Mutation and Polymorphisms

The basic definition of mutation is any permanent and heritable change in the genome that gives rise to differences, either in sequence or in length, between an ancestral and a descendant (derived) configuration. Since the Y chromosome is part of the nuclear genome, its average mutation rate is similar to the autosomal average, as all nuclear chromosomes are under the same regulatory mechanisms. However, as there is only a single copy of the Y chromosome per cell, mutations and silent alleles cannot remain hidden as they can in autosomes, which removes challenges that exist in autosomal analyses (Vicard et al, 2008; Slooten & Ricciardi, 2013).

Mutation is the source of genetic polymorphisms throughout the genome, including the Y chromosome. The next sections focus on the most commonly studied types of Y polymorphisms, namely the rapidly mutating short tandem repeats (STRs), which belong to the category of multiallelic markers, and the more slowly mutating single-nucleotide polymorphisms (SNPs), which are usually biallelic markers.

Multiallelic markers

Short tandem repeats, also known as microsatellites, are a type of length polymorphism found throughout the genome. STRs consist of a near-perfect repetition of a short sequence unit (1–6 bp), typically repeated 10 to 25 times (International Human Genome Sequencing Consortium, 2001). These markers are multiallelic, with each allele corresponding to a different number of tandem repeats at a locus, and are among the most polymorphic loci in the genome (Weber, 1990). The high degree of polymorphism of STRs is due to their high average mutation rate. Over the last decades, more than 200 Y-STRs have been identified. The first Y-STR was discovered in 1992 by Lutz Roewer (Roewer et al, 1992) and is now known as DYS19. This discovery led to an increasing use of Y-STRs in forensic casework, which in turn prompted intensive research on the characteristics of STRs on the paternally transmitted chromosome.
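As an illustration of how an STR allele corresponds to a number of tandem repeats, the sketch below counts the longest uninterrupted run of a repeat motif in a DNA string. The sequences and the TCTA motif are hypothetical; in forensic practice Y-STR alleles are usually sized by capillary electrophoresis, and official allele designations may count interrupted or compound repeats differently.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of consecutive, uninterrupted copies of `motif` in `sequence`."""
    k = len(motif)
    best = 0
    for start in range(len(sequence) - k + 1):
        run = 0
        # extend the run while the next k bases are another copy of the motif
        while sequence.startswith(motif, start + run * k):
            run += 1
        best = max(best, run)
    return best


# Hypothetical alleles: 11 uninterrupted repeats, and a run broken by a variant unit.
print(longest_tandem_run("GG" + "TCTA" * 11 + "CC", "TCTA"))
print(longest_tandem_run("TCTA" * 3 + "TCCA" + "TCTA" * 2, "TCTA"))
```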

Considerable effort has been devoted to estimating the mutation rate at the various Y-STR loci, and the results point to high variability in mutability among STRs. For the majority of STR markers currently used in forensic genetics, mutation rates were recently estimated to range between 1 × 10−4 and 1 × 10−3 per locus per generation (Ballantyne et al, 2010). However, a few Y-STRs showed mutation rates substantially above average, ranging between 1.19 × 10−2 and 7.73 × 10−2, and were thereafter designated rapidly mutating Y-STRs (Ballantyne et al, 2010; Pinto et al, 2014).
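The practical effect of these rates can be illustrated with a short calculation. The sketch below, which assumes that loci mutate independently of one another, gives the probability that a father and son differ at one or more loci of a haplotype. The locus counts and per-locus rates are hypothetical values picked from the ranges quoted above, not estimates for particular markers or kits.

```python
import math


def prob_any_mutation(rates):
    """Probability that at least one locus mutates in a single father-to-son
    transmission, assuming loci mutate independently: 1 - prod(1 - mu_i)."""
    return 1.0 - math.prod(1.0 - mu for mu in rates)


# Hypothetical per-generation rates, chosen inside the ranges quoted above:
# 20 "standard" Y-STRs at the upper bound (1e-3) and 7 rapidly mutating ones (2e-2).
standard = [1e-3] * 20
rapid = [2e-2] * 7

print(f"standard loci only:           {prob_any_mutation(standard):.3f}")
print(f"adding rapidly mutating loci: {prob_any_mutation(standard + rapid):.3f}")
```

Under these assumed rates, the chance of a father-son difference rises from roughly 2% to roughly 15% once the rapidly mutating loci are included, which is why such loci help separate unrelated males but also complicate the interpretation of paternal lineages.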

Other studies have addressed the mutation mechanism of Y-STRs. They demonstrated that, as observed for autosomal STRs, Y-STRs predominantly fit the stepwise mutation model, with most mutations being single-step changes that increase or decrease the allele by one repeat unit (Gusmão et al, 2005).
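The stepwise mutation model can be made concrete with a small simulation. The sketch below follows a single Y-STR allele down independent paternal lines, applying only ±1 repeat changes; the starting allele, mutation rate, number of generations and seed are hypothetical values chosen for illustration.

```python
import random


def smm_descendant(repeats, mu, generations, rng):
    """Follow one Y-STR allele down a paternal line under the strict stepwise
    mutation model: in each generation the allele mutates with probability mu,
    and every mutation adds or removes exactly one repeat unit with equal chance."""
    for _ in range(generations):
        if rng.random() < mu:
            repeats += rng.choice((-1, 1))
    return repeats


rng = random.Random(2018)  # fixed seed so the run is reproducible
# Hypothetical: ancestral allele of 14 repeats, rapidly mutating locus (mu = 2e-2),
# 100 generations, 10 independent paternal lineages.
alleles = [smm_descendant(14, 2e-2, 100, rng) for _ in range(10)]
print(alleles)
```

Because each step changes the allele by one unit, the difference in repeat number between two alleles is a rough proxy for the number of mutations separating them, the property that stepwise distances such as those underlying RST rely on. Real Y-STRs occasionally show multi-step changes, which this strict model ignores.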

In parallel, several databases of Y-STR data were launched, among which the Y Chromosome Haplotype Reference Database (YHRD; https://yhrd.org; Willuweit & Roewer, 2007) is currently considered the most valuable open-access online resource.

Figure 1 - Chromosomal locations for several of the commonly utilized Y-STRs (Butler, 2003)

One consequence of the introduction of Y-STR typing in forensic casework was the development of various multiplex kits allowing the simultaneous characterization of several STRs (Tirado et al, 2009; Wang et al, 2016). Many commercial kits are currently available, with primers that amplify different sets of Y-STRs. They range from the minimal set of 9 loci that define the so-called minimal Y-STR haplotype (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393 and DYS385a/b) to the most recently released commercial kit (Yfiler® Plus, Thermo Fisher Scientific), which comprises 27 Y-STRs, some of them rapidly mutating. Rapidly mutating Y-STRs have been shown to substantially increase the resolution to differentiate unrelated males (Ballantyne et al, 2014).

Biallelic markers

SNPs are by far the most abundant class of variation in the human genome (1000 Genomes Project Consortium, 2015), accounting for about 85% of total human genetic variation (1000 Genomes Project Consortium, 2012). SNPs usually arise from unique events, which implies that, although any of the four bases could occupy a given nucleotide position, most SNPs show only two alternative alleles, the ancestral and the derived (Underhill et al, 1997). This has two main causes. One is the low average mutation rate of SNPs, estimated at approximately 2.5 × 10−8 per nucleotide per generation (Nachman and Crowell, 2000), much lower than the average for STRs; the other is that, given the huge number of nucleotide sites, the probability that the same site mutates two or more times is extremely low. SNPs with three or even four alleles have been reported, but they are extremely rare compared with biallelic SNPs.

For forensic purposes, SNPs present some limitations compared with STRs, which are considered the markers of choice in the field, mainly due to their low mutation rate and limited resolution. Around 5 to 6 times more SNPs than conventionally used STRs are needed to achieve a similar discrimination power (Amorim and Pereira, 2005). Nevertheless, forensic interest in SNPs has grown steadily over the years, particularly for some specific applications (Sobrino et al, 2005; Xue & Tyler-Smith, 2010). For instance, their low mutation rate can improve the efficiency of paternity testing. In addition, SNPs can be analyzed in much shorter amplicons than those required for STRs, which makes them well suited for testing degraded samples (Hughes-Stamm et al, 2011).

SNPs on the Y chromosome define stable haplotypes known as haplogroups, i.e. groups of SNP-based haplotypes that share a common ancestor (Y Chromosome Consortium, 2002). They have been used to build the Y-chromosome phylogeny, whose resolution, as well as its time calibration, has been dramatically refined in the last few years (Jobling & Tyler-Smith, 2017). Although the diversity of Y-STRs within haplogroups was initially the resource used to infer their time depth, the increasing availability of Y-chromosome data from next-generation sequencing has enabled new approaches to calibrate the age of Y haplogroups.

Y-chromosome haplogroups show sharp geographic differentiation, with different haplogroups distributed across different regions of the world (Figure 2).

Over the past few decades, many population genetic studies based on Y-SNPs have provided important insights into past human migrations, processes of population admixture, male-mediated expansions, male-specific bottlenecks and many other demographic events that shaped the genetic diversity of extant human populations.

Figure 2- Distribution and diversity of the Y-Chromosome haplogroups around the world (adapted from http://www.transpacificproject.com/index.php/transpacific-migrations/)


Entrance in the Americas

The entrance of anatomically modern humans into the Americas is still a hotly debated topic. Researchers from various scientific areas are combining efforts to find answers by integrating population genetics, linguistic and anthropological evidence. The idea that the first American inhabitants were migrants from Asian populations was put forward by the Spanish Jesuit José de Acosta as early as the sixteenth century (de Acosta et al, 2002). In the following centuries, the questions surrounding the peopling of the Americas were fueled essentially by archeological findings, which in the early 20th century led Hrdlicka (1928) to propose a formal model according to which colonization resulted from a single, rapid migration throughout the continent after the crossing of the Bering land bridge around 16.5 thousand years ago. The most important archeological evidence supporting this model was the Clovis culture, found in the interior of North America and dated to circa 13,000 years ago. Meanwhile, older remains were discovered at the so-called pre-Clovis sites, such as Monte Verde in Chile, dated to 14,600 years ago (Dillehay et al, 2008). This caused confusion and cast doubt on the original assumption that the New World was colonized by a single migration, since the oldest archeological findings were expected to be located in North America.

The advent of molecular genetics opened a new chapter in the debate on the peopling of the Americas. In 1986, Greenberg et al published a highly controversial paper that, combining archaeological, linguistic and genetic evidence, proposed that the colonization of the Americas happened through three waves of migration, associated with the three linguistic groups spoken by Native Americans (Na-Dene, Eskimo-Aleut and Amerindian); these migrations, each with a different Asian origin, reached the New World separately after the .
In the following decades, a large number of studies addressed the subject, some supporting earlier models, others refining them and still others suggesting new interpretations. Once it was generally agreed that Beringia was indeed the entry door to the New World, as reflected in the widely disseminated "Out of Beringia" expression (Goebel et al, 2008), geological and paleo-environmental considerations were consistently brought into the debate. The archeological evidence indicated that the entire continent was already colonized by 13 kya, whereas other evidence showed that Beringia was isolated from North America until 14 thousand years ago, a date later than the 14.6-thousand-year-old archaeological finds at the Monte Verde site. This conflict led Kitchen et al (2008) to present a three-stage colonization model, positing that the proto-Amerind population was a sub-population derived from Central Asia that travelled towards Northeastern Siberia, reaching it around 36 kya after a journey accompanied by significant population growth. The divergence from the Central Asian source population involved a severe bottleneck, which is argued to account largely for the reduced levels of genetic diversity found among Native Americans. After entering Beringia, the founder population remained stable in size for a long time, possibly due to the lack of nutritional resources. Then, Amerinds rapidly expanded into the Americas around 15,000 years ago (Kitchen et al, 2007). Finally, the melting of the ice sheets in the interior of North America drove the proto-Amerinds into a rapid expansion throughout the entire continent. On the way south, Native Americans experienced an overall 16-fold population growth (Kitchen et al, 2008), with the whole journey deeply marked by a series of bottlenecks that followed multiple population splits. Moreover, crossing Central America southwards through the narrow Isthmus of Panama was certainly prone to additional bottleneck events, and it is also probable that only a small number of individuals succeeded in reaching the high-altitude Andean regions. This peopling process explains the low levels of genetic diversity typically found among Native American populations.
The detection of a subtle Australo-Melanesian genetic signal in Amerindian populations prompted a revival of the two-wave migration model, also known as the Paleoamerican model. Originally based on cranial morphology, this model hypothesized that two populations, distinct in time and source, colonized the Americas. An earlier population, which originated in Asia in the Late Pleistocene, gave rise to both the first Paleoamericans and present-day Australo-Melanesians. Later, the first Paleoamericans were largely replaced by the ancestors of present-day Amerindians, who descended from later-arriving Mongoloid populations (González-José et al, 2003). This view, however, was not supported when recently tested with genome-wide data (Raghavan et al, 2015).


Figure 3- Map depicting the early Homo sapiens migrations. (Adapted from http://news.bbc.co.uk/2/hi/science/nature/4435009.stm)

Nevertheless, the recent identification in South America of some Y-chromosome lineages that are absent from the North and Central regions of the continent but occur at high frequency in Asia, and that show signs of having been introduced into South America no more than 6,000 years ago, led to the hypothesis that trans-Pacific routes connecting East Asia and South America were established long after the settlement of South America by Native Americans (Roewer et al, 2013). Many uncertainties still persist about the peopling of the Americas, and among the several models, which differ in the timing, routes and ancestry of the migrants, the predominant view is of an early entry (~25,000 to 15,000 years ago) into the continent. Some argue that it occurred via a Pacific coastal migration, others that migrants mainly used interior ice-free corridors, and still others that both interior and coastal routes were used (reviewed in Potter et al, 2018). In historical times, the post-Columbian colonization of the New World by Europeans, which began in the late 15th century, and the transatlantic slave trade to the Americas in the following centuries dramatically reshaped the population landscape of the entire continent.


Uniparentally transmitted lineages in Native Americans

Due to their particular characteristics, mtDNA and the Y chromosome have been a primary source of genetic information on Amerindians (Fagundes et al, 2008; Perego et al, 2010). Several studies based on maternally inherited mitochondrial DNA demonstrated low genetic diversity in the Amerindian maternal pool. They also revealed that a number of mtDNA lineages (haplogroups) are almost exclusively found in Native Americans, with patterns of internal diversity fitting a scenario of recent population expansion. Only five haplogroups (A, B, C, D and X) have been described in Native American mtDNA, all of which, except X, occur at moderate frequencies in Asians (Fagundes et al, 2008). mtDNA-based evidence seems to support that a coastal route along the Pacific was taken before colonization through the interior ice-free corridor, which is also consistent with South American archaeological sites being dated as older than those found in North America (Llamas et al, 2016).

On the other hand, analysis of the Y chromosome also revealed reduced genetic diversity among Native American groups (Battaglia et al, 2013). In South America, strong genetic differentiation was found among the male pools of Native populations, and the male genetic structure followed an East-to-West clinal pattern, with a higher level of genetic diversity in the western than in the eastern part of southern South America (Tarazona-Santos et al, 2001). Contrary to what has been described for Europe (Roewer et al, 2005), neither geography nor language significantly accounted for the pattern of genetic structure among Native Americans (Roewer et al, 2013). In the patrilineal genome, only two Y-chromosome haplogroups, Q and C, were considered founder lineages in Native Americans (Bortolini et al, 2003; Zegura et al, 2003). Of the two, haplogroup Q is by far the most frequent and widespread founder lineage, mainly represented by the sub-lineage defined by the Y-SNP M3 (Q-M3). Several studies sought to improve its resolution by examining sub-lineages defined by SNPs phylogenetically downstream of M3. However, diversity within Q-M3 was found to be very limited, with most of the sub-lineages discovered so far showing very restricted geographical distributions (Bisso-Machado et al, 2011). This picture was further confirmed by Y-chromosome studies based on next-generation sequencing (NGS), which, despite providing an updated and detailed phylogeny of haplogroup Q, revealed that the newly detected lineages had restricted spatial distributions (Jota et al, 2016). In South Amerindians, Q-M3 is particularly well represented, reaching a frequency of 91% in the populations studied by Peña et al (1995), excluding the Mapuche, a group of Native Americans who have experienced intense admixture with Europeans. The overwhelming preponderance of this haplogroup in South America reflects the consequences of the serial founder events that preceded and accompanied the settlement of native populations in the region (Underhill et al, 1996).

Haplogroup C, the other founder lineage in Native Americans, is comparatively much rarer, accounting on average for about 6% of Native American Y chromosomes (Battaglia et al, 2013), and is geographically more restricted. The C-P39 sub-lineage is typically found in Natives from North America, the region to which it is reportedly confined (Karafet et al, 2008). It is absent from South America, where another C sub-lineage, C-M217, was detected instead, although only in a few native groups from Ecuador, namely the Kichwa (26%) and the Waorani (7.5%) (Geppert et al, 2011; Roewer et al, 2013), and from Colombia, namely the Wayuu (8%) (Zegura et al, 2004). C lineages have not yet been found in Central America (Zegura et al, 2004), but the coverage of native peoples from that region is still very scarce. While the C-P39 lineages in North America are believed to have been introduced with the first wave(s) of Asian migrants who crossed Beringia, the presence of C-M217 in South America (actually a more basal branch than C-P39 in the haplogroup C phylogeny) is thought to result from later connections with East Asians via the trans-Pacific routes mentioned above (Roewer et al, 2013). Besides the founder lineages, other maternal and paternal lineages found among Native Americans have been attributed to recent European or African ancestry. Since these lineages were presumably introduced in the Americas with the post-Columbian colonization, they have been widely used not only to estimate the degree of admixture between Amerindians and non-Amerindians, but also to illuminate how sex-biased such processes were (Zegura et al, 2004).


Figure 4- Phylogenetic tree of the Y chromosome according to Karafet et al, 2008. Adapted from Geppert et al, 2011.

Peru and the Asháninka

Peru is a South American country situated on the north-western coast of the continent. Given its location, the region corresponding to present-day Peru must have been one of the first to be colonized by Amerindians, particularly if they indeed followed the Pacific coastal route. The country covers an immense territory spanning a wide latitudinal range. This territorial vastness, combined with several other geographical factors, gave rise to a varied ecosystem comprising mountainous (Andean), coastal and forest (Amazonian) regions, the latter covering about 60% of the Peruvian territory. This diversity allowed native populations to settle in quite different natural environments while remaining isolated from one another, which ultimately fostered the local autonomy of each population. Hence, Peru is home to a vast number of Native American populations, such as the Asháninka, Bora, Aymara, Quechua and Awajún, among others, which differ greatly in language, traditions and many other characteristics.


Figure 5- Geographic location of Peru within South America (adapted from www.who.int)

The main focus of this work is the Asháninka population. The Asháninka speak a language of the Arawak (also known as Maipurean) family, which spans various languages and dialects spoken across South America.

There are many small Asháninka communities, spread over many territories in both Peru and Brazil. In the past, they were known to live in the forests of Junin, Pasco, Huanuco and part of Ucayali. Over time, however, they started to relocate to the valleys of the rivers Pichú, Perené and Apurimac-Tambo Ene, or even to the mountainous region of Gran Pejonal.

The discovery of bronze axes in part of the ancestral territory of the Asháninka indicates that they maintained cultural and commercial exchanges with Andean peoples from pre-Inca times, while in the Inca period such connections were maintained with groups of the same linguistic group (Granero, 1992). These relationships likely created conditions for gene flow with people from the Andean world. However, there are also accounts of rivalry between the Asháninka and other native populations, such as the Shipibo, even in pre-Hispanic times (Espinosa, 1993).

In 1635, Hispanic missionaries started to enter the Asháninka territories, opening a dire period of contacts and conflicts that would lead to a severe reduction in the number of Asháninka. According to the few available estimates, the population decreased by a ratio of 3.5 to 1 during the 18th century (Rojas Zolezzi, 1994). In the mid-18th century, the Asháninka and other Native groups took part in a rebellion against the Hispanic missionaries and conquerors. During this rebellion, the Asháninka made peace with Native populations then considered rivals (Weiss 2005), giving rise to a cultural flow that had until then been residual.

In the 20th century, when Europeans began to contest part of the lands of the Chanchamayo province inhabited by the Asháninka, several communities were displaced to the banks of the rivers Ene and Tambo (AIDESEP et al, 2000), and those resettled along the Tambo were later relocated to the Satipo province. Thus, throughout the past century, many Asháninka continued to be displaced from their homes. Many individuals perished during these forced displacements, drastically reducing the Asháninka population size; approximately 34 Asháninka communities are estimated to have disappeared during this period (CVR, 2003).

Nonetheless, the Asháninka are still one of the most numerous indigenous populations in Peru, with an estimated total of up to 70,000 people across South America.


Aims


Given the state of the art concerning Native Americans from South America, the main objective of this work is to characterize the male lineages of the Asháninka using a high-resolution approach for Y-chromosome haplogroup Q, since it has been shown to be a founder lineage in Native Americans.

To accomplish this goal, 59 Asháninka males were genotyped for 39 SNPs located in the non-recombining portion of the Y chromosome. In addition, the Y-STR data previously obtained by Tineo et al (2015) were used to compare the Asháninka with other South American Native populations, using data collected through an extensive bibliographic search. By combining these results, we aimed to obtain a better understanding of the processes underlying the human colonization of the New World.

To summarize, the aims of this study were the following:

1. To characterize the male lineages of the Native American population Asháninka.
2. To achieve a higher resolution of the characteristic Amerindian haplogroup Q.
3. To contribute to a deeper understanding of the colonization of the New World.


Materials and Methods


Sampling

A sample of 59 unrelated Asháninka males living in 42 different communities, all located along the banks of the Amazonian rivers Pichis and Palcazú in the Peruvian Andes, was considered for this work. The same sample had previously been genotyped for Y-STRs by Tineo et al (2015). The work presented here is part of a project that further aims to characterize the sample for autosomal ancestry-informative markers (AIMs) and mitochondrial DNA. All participants gave written informed consent to take part in this study, which was conducted under strict confidentiality. DNA had previously been extracted from bloodstains using the standard phenol-chloroform protocol, as described in Tineo et al (2015).

Construction of a new Multiplex Q

To develop a high-resolution multiplex system targeting haplogroup Q, an extensive bibliographic search was performed to compile the most recently described SNPs downstream of M3, the mutation whose presence defines haplogroup Q-M3. This lineage belongs to the clade typically found among Native Americans. A SNaPshot multiplex system (Multiplex Q) had previously been developed to genotype the Y-SNPs M242, P36.2, M346, M3, M19, M194 and M199 (Roewer et al, 2013). The criteria for adding new SNPs to that multiplex were based on two main points: on the one hand, the phylogenetic position, taking Q-M3 as the reference for the most basal clade and then picking the SNPs that would allow the characterization of its sub-branches; on the other hand, the allele frequencies of each SNP in the populations where it had been detected, excluding SNPs with a very low degree of polymorphism, i.e. variants observed only sporadically. Furthermore, for the SNPs already included in Multiplex Q, the bibliographic search was also updated to evaluate whether they should be kept in the newly developed multiplex. In the end, the redeveloped Multiplex Q comprised a total of 10 SNPs.

To design the primers for the multiplex PCR, GenBank was consulted to retrieve the 500 bp sequences flanking each SNP. The retrieved sequences were then submitted to Primer3 (Untergasser et al, 2012) to identify the most suitable primer pair (forward and reverse) for each polymorphism. Since several primer pairs would be used in a single PCR, the characteristics of the primers, with special attention to their melting temperature and the predicted amplicon size, were taken into account to achieve successful amplification of all desired products while allowing them to be separated in polyacrylamide gels. Finally, as this is a multiplex reaction, AutoDimer (Vallone & Butler, 2004) was used to check for unwanted interactions among all primers. All designed primers had a Tm of approximately 60 °C and were 20 nucleotides long, and the sizes of their amplification products ranged from 155 to 370 bp.

In addition to these PCR primers, primers for single-base extension (SBE) were also designed, since the samples were to be typed with a SNaPshot strategy. SBE primers were designed as described above for the PCR primers, with the condition that the 3′ end of each SBE primer be complementary to the base immediately before the interrogated SNP. Furthermore, polynucleotide tails of different lengths, non-homologous to the human genome, were added to the 5′ end of the SBE primers to allow size-based discrimination of the SNPs in the electropherograms.
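The SBE design rule can be sketched in a few lines: the primer is taken from the bases immediately flanking the SNP, so that its 3′ end stops one base short of the polymorphic position, and a tail is then added at the 5′ end. In the sketch below, the reference sequence, the SNP position, the function names and the poly-T tail are hypothetical; the text above states only that the tails were non-homologous polynucleotides of different lengths.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sbe_primer(template, snp_index, length=20, tail_length=0, strand="+"):
    """Return a 5'-tailed SBE primer whose 3' end lies next to the SNP.

    template:    reference sequence, 5'->3' on the forward strand (uppercase ACGT).
    snp_index:   0-based position of the SNP in template.
    strand '+':  primer copies the `length` bases just upstream of the SNP.
    strand '-':  primer is the reverse complement of the bases just downstream.
    tail_length: length of a hypothetical poly-T tail used only for size separation.
    """
    if strand == "+":
        start = snp_index - length
        if start < 0:
            raise ValueError("not enough upstream sequence")
        core = template[start:snp_index]
    elif strand == "-":
        end = snp_index + 1 + length
        if end > len(template):
            raise ValueError("not enough downstream sequence")
        core = reverse_complement(template[snp_index + 1:end])
    else:
        raise ValueError("strand must be '+' or '-'")
    return "T" * tail_length + core


# Hypothetical 41-nt reference with the SNP (here a G) at index 20.
upstream = "ACGTTGCAAGCTTGGCATCA"
downstream = "CCTAGGATCCGTACGATTGC"
ref = upstream + "G" + downstream

print(sbe_primer(ref, 20, tail_length=6))
print(sbe_primer(ref, 20, tail_length=12, strand="-"))
```

With a '+' primer the labelled ddNTP added during extension corresponds to the reference base at the SNP, whereas a '-' primer reports its complement. Giving each SNP a different `tail_length` is what places it in its own size window in the electropherogram.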

Lastly, to confirm primer specificity, an in-silico PCR was performed with the various primer pairs (https://genome.ucsc.edu/cgi-bin/hgPcr), to assess whether the primers would anneal only to the target DNA sequences, thus avoiding undesired amplification products. A BLAT search was also performed to test the specificity of the SBE primers (https://genome.ucsc.edu/cgi-bin/hgBlat?command=start).

DNA Amplification and Sequencing

To test the efficiency of the designed primer pairs, singleplex PCR amplifications were first performed. Two samples were tested for each primer pair, using 0.5 µL of each primer, an equal volume of DNA, 2.5 µL of QIAGEN® PCR kit and water to a total volume of 5 µL. Samples were subjected to an initial denaturation at 96 °C for 15 minutes, followed by 35 cycles of 95 °C for 30 seconds, 60 °C for 60 seconds and 72 °C for 90 seconds, and a final extension at 72 °C for 10 minutes. The resulting PCR products, together with an allelic ladder and a negative control, were then run on polyacrylamide gels (8.4%) stained with the standard silver staining method to confirm the correct amplification of all fragments.

Next, the multiplex PCR system was developed. A multiplex PCR is essentially a standard PCR amplification in which several primer pairs are used simultaneously in the reaction mix, leading to the co-amplification of several sequences from the same DNA template. In the system developed here, successful results were obtained with all primers at a final concentration of 0.2 µM each in the PCR mix. To correctly assign all samples to their respective haplogroups, 4 multiplexes with a total of 39 Y-SNPs were used. The samples that did not belong to haplogroup Q were genotyped with Multiplex 1 + M13 marker (Gomes et al, 2010a), Multiplex 2 and Multiplex E (Brion et al, 2005). The QIAGEN® Multiplex PCR Kit (QIAGEN©) was used as master mix, which includes Taq DNA polymerase, MgCl2 and deoxynucleotides (dNTPs) at optimized concentrations. The thermocycling conditions were the same as for the singleplex PCR. The PCR products were then subjected to electrophoresis, as described above, to confirm the successful amplification of the desired sequences.
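Reaching 0.2 µM of each primer in a small reaction is usually done through an intermediate primer mix, since pipetting each primer stock directly would require sub-microlitre volumes. The sketch below applies C1·V1 = C2·V2 to plan such a mix. Only the 0.2 µM final concentration and the 20 primers (10 SNPs × forward and reverse) come from the text; the 5 µL reaction volume is borrowed from the singleplex reactions as an assumption, and the stock concentration, mix volume per reaction and number of reactions are hypothetical.

```python
def primer_mix_plan(n_primers, final_uM, reaction_uL, mix_uL_per_reaction,
                    stock_uM, n_reactions):
    """Plan a primer mix so that each primer reaches final_uM in the PCR.

    Uses C1*V1 = C2*V2 per primer. Returns (concentration of each primer
    in the mix in uM, uL of each primer stock, uL of water).
    """
    # Each primer must be (reaction_uL / mix_uL_per_reaction) times more
    # concentrated in the mix than in the final reaction.
    mix_uM = final_uM * reaction_uL / mix_uL_per_reaction
    total_mix_uL = mix_uL_per_reaction * n_reactions
    stock_each_uL = mix_uM * total_mix_uL / stock_uM   # C1 V1 = C2 V2
    water_uL = total_mix_uL - n_primers * stock_each_uL
    if water_uL < 0:
        raise ValueError("primer stocks are too dilute for the requested mix")
    return mix_uM, stock_each_uL, water_uL


# Hypothetical plan: 20 primers, 0.2 uM final, 5 uL reactions, 0.5 uL of mix
# per reaction, 100 uM stocks, enough mix for 100 reactions.
mix_uM, stock_each_uL, water_uL = primer_mix_plan(20, 0.2, 5.0, 0.5, 100.0, 100)
print(f"{mix_uM:.1f} uM each in mix: {stock_each_uL:.2f} uL per primer + {water_uL:.1f} uL water")
```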

After confirming the successful amplification of all target sequences, the steps required for multiplex single-base extension were performed. First, the PCR products were purified by adding 0.5 µL of ExoSAP-IT™ (Applied Biosystems™) to each 1 µL of PCR product, although the volumes had to be adjusted in cases of weak PCR yield, as assessed by inspecting the polyacrylamide gels. ExoSAP-IT is a cleanup reagent that hydrolyzes excess primers and nucleotides, which, if not removed, could compromise the sequencing reaction. The purification consisted of an incubation at 37 °C for 15 minutes, during which the enzymes remove unused primers and dNTPs, followed by a 15-minute incubation at 85 °C to inactivate the enzymes.

The sequencing reaction was performed in a total volume of 5 µL, composed of 1 µL of SNaPshot® Multiplex Kit (Applied Biosystems), 1 µL of the SBE mix, the aforementioned 1.5 µL of purified PCR product, and water to complete the volume. Because of possible adjustments in the purification step, the water volume was modified accordingly so that the final volume remained 5 µL. The mix was then subjected to 25 cycles of denaturation at 96 °C for 10 seconds, annealing at 50 °C for 5 seconds and extension at 60 °C for 30 seconds. Immediately after the single-base extension reactions, the products were purified again by adding 1 µL of FastAP (Thermo Scientific) and incubating for 60 minutes at 37 °C to remove unwanted products, followed by 15 minutes at 85 °C to inactivate the enzyme.

The purified products were then combined with a mixture of LIZ120 size standard (9.25 µL of Hi-Di Formamide with 0.25 µL of LIZ120), in a 1:20 proportion, and the mix was subjected to capillary electrophoresis in an ABI 3130 sequencer (Applied Biosystems). The results were analyzed with GeneMapper® v4.0 software.

SNP          Primer    Sequence
Z19319       Forward   5’ TTTGCTGAAGTTGCCTGTCA 3’
             Reverse   3’ AGTTCCAGTCAGGGCAATCA 5’
SA01/M557    Forward   5’ AAGATCCCACCACTGCACTC 3’
             Reverse   3’ CTCTGGCCCCTAACAAACCT 5’
Z19483       Forward   5’ CCATGTAGGAGGAGGCAAAA 3’
             Reverse   3’ CATCACAAAAGCCAAAAGCA 5’
SA05         Forward   5’ GAACCAAAGCACAGCACTCA 3’
             Reverse   3’ ATGCTCATGGCCTACACCTC 5’

Table 1 - Primers that were added to the Multiplex Q described in Roewer et al (2013) to create the newly developed Multiplex Q.
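As a quick check on the Table 1 primers, the following sketch computes their GC content and a rough melting temperature with the Wallace rule (2 °C per A/T plus 4 °C per G/C). This crude rule is swapped in here for illustration only; Primer3 relies on nearest-neighbour thermodynamic calculations, so its Tm values will differ. Base composition does not depend on reading direction, so the 5′/3′ labels used in Table 1 do not affect these numbers.

```python
# Primer sequences as printed in Table 1.
TABLE1_PRIMERS = {
    ("Z19319", "Forward"): "TTTGCTGAAGTTGCCTGTCA",
    ("Z19319", "Reverse"): "AGTTCCAGTCAGGGCAATCA",
    ("SA01/M557", "Forward"): "AAGATCCCACCACTGCACTC",
    ("SA01/M557", "Reverse"): "CTCTGGCCCCTAACAAACCT",
    ("Z19483", "Forward"): "CCATGTAGGAGGAGGCAAAA",
    ("Z19483", "Reverse"): "CATCACAAAAGCCAAAAGCA",
    ("SA05", "Forward"): "GAACCAAAGCACAGCACTCA",
    ("SA05", "Reverse"): "ATGCTCATGGCCTACACCTC",
}


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm in degrees C by the Wallace rule: 2 per A/T + 4 per G/C.

    Only a first approximation for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for (snp, direction), seq in TABLE1_PRIMERS.items():
    print(f"{snp:10s} {direction:7s} len={len(seq)} "
          f"GC={gc_fraction(seq):.0%} Tm~{wallace_tm(seq)} C")
```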

Statistical Analysis

Figure 6 presents the phylogenetic tree of Y-chromosome haplogroup Q constructed with the 10 SNPs analyzed. In addition to the 10 SNPs typed with Multiplex Q, 29 other SNPs were used to assign the samples that did not belong to the indigenous haplogroups, as previously stated.

Figure 6- Phylogenetic tree of the SNPs included in Multiplex Q and inferred haplogroups
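Assigning a sample to a haplogroup amounts to finding the deepest node of the tree supported by derived alleles. The sketch below shows one way such an assignment could be automated, walking down from the root while the next marker is derived and flagging genotypes that contradict the tree. The tree it encodes is a simplified ordering assumed for illustration, built only from the SNPs of the original Multiplex Q (M242 > P36.2 > M346 > M3 > M19/M194/M199); the placement of the new SNPs should be taken from Figure 6. The 'D'/'A' encoding and the function names are hypothetical.

```python
# Simplified, illustrative subset of the haplogroup Q tree (child -> parent).
PARENT = {
    "P36.2": "M242",
    "M346": "P36.2",
    "M3": "M346",
    "M19": "M3",
    "M194": "M3",
    "M199": "M3",
}
CHILDREN = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)
TREE_NODES = set(PARENT) | set(PARENT.values())


def path_to_root(node):
    """List of markers from `node` up to the root of the tree."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def assign_haplogroup(genotypes, root="M242"):
    """genotypes maps SNP name -> 'D' (derived) or 'A' (ancestral).

    Walks down from the root following derived markers and returns the deepest
    derived node, marked with '*' (paragroup) when none of its downstream
    markers is derived. Raises ValueError for genotypes inconsistent with the tree.
    """
    if genotypes.get(root) != "D":
        return "non-Q"
    node = root
    while True:
        derived = [c for c in CHILDREN.get(node, []) if genotypes.get(c) == "D"]
        if len(derived) > 1:
            raise ValueError(f"derived alleles on parallel branches: {derived}")
        if not derived:
            break
        node = derived[0]
    on_path = set(path_to_root(node))
    stray = [s for s, g in genotypes.items()
             if g == "D" and s in TREE_NODES and s not in on_path]
    if stray:
        raise ValueError(f"derived SNPs off the inferred path: {stray}")
    return f"Q-{node}" + ("*" if node in CHILDREN else "")


sample = {"M242": "D", "P36.2": "D", "M346": "D", "M3": "D",
          "M19": "A", "M194": "A", "M199": "A"}
print(assign_haplogroup(sample))
```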


Haplogroup frequencies were estimated by direct counting. Y-STR and Y-SNP genetic diversities were computed with Arlequin v3.5.2.2 (Excoffier and Lischer, 2010), which was also used to calculate RST and FST. Since Tineo et al (2015) had already compared the genetic diversity of this sample with that of other Native American populations from Peru, the purpose of this study was to compare our sample with Amerindian populations from other South American countries. To that end, an extensive bibliographic search was carried out to assemble Y-STR and Y-SNP data from several Amerindian populations. The genetic comparisons were performed at three different resolution levels, consisting of 19, 15 and 7 Y-STRs, which included 5, 24 and 26 populations, respectively. The samples were then compared in terms of haplotype frequencies, intra-population genetic diversity, genetic distance (RST) and differentiation between populations. For the calculation of genetic distance and differentiation, all null and duplicated alleles were removed, and the repeat number of DYS389I was subtracted from that of DYS389II.
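The haplotype and haplogroup diversity reported by programs such as Arlequin is usually Nei's unbiased gene diversity, computed from the frequencies obtained by direct counting. A minimal sketch of both steps follows, together with the DYS389 adjustment described above (DYS389II includes the DYS389I repeats, so using DYS389II − DYS389I avoids counting them twice). The profile format, the example profiles and the choice to drop DYS385 as a duplicated locus are hypothetical illustrations.

```python
from collections import Counter


def frequencies(types):
    """Haplotype or haplogroup frequencies by direct counting."""
    n = len(types)
    return {t: c / n for t, c in Counter(types).items()}


def gene_diversity(types):
    """Nei's unbiased gene (haplotype) diversity: n / (n - 1) * (1 - sum(p_i^2))."""
    n = len(types)
    if n < 2:
        raise ValueError("at least two samples are needed")
    sum_p2 = sum(p * p for p in frequencies(types).values())
    return n / (n - 1) * (1.0 - sum_p2)


def prepare_haplotype(profile, excluded=("DYS385",)):
    """Turn a Y-STR profile (locus -> repeat number) into a hashable haplotype.

    DYS389II is replaced by DYS389II - DYS389I; loci listed in `excluded`
    (a hypothetical default) are dropped.
    """
    p = dict(profile)
    if "DYS389I" in p and "DYS389II" in p:
        p["DYS389II"] -= p["DYS389I"]
    for locus in excluded:
        p.pop(locus, None)
    return tuple(sorted(p.items()))


# Hypothetical profiles of three individuals.
profiles = [
    {"DYS19": 13, "DYS389I": 13, "DYS389II": 29, "DYS385": (11, 14)},
    {"DYS19": 13, "DYS389I": 13, "DYS389II": 29, "DYS385": (11, 15)},
    {"DYS19": 14, "DYS389I": 12, "DYS389II": 28, "DYS385": (11, 14)},
]
haplotypes = [prepare_haplotype(p) for p in profiles]
print(frequencies(haplotypes))
print(f"haplotype diversity: {gene_diversity(haplotypes):.3f}")
```

The same `gene_diversity` function applies unchanged to a list of haplogroup labels.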

For the genetic comparisons based on Y-SNPs, the resolution within haplogroup Q had to stop at Q-M3, since most studies did not genotype SNPs downstream of M3. Haplogroups other than Q were treated at the highest resolution available. For the SNP analysis, 18 populations were analyzed for intra-population genetic diversity, genetic distance and differentiation between populations. Genetic diversity was assessed at three resolution levels: first, the Asháninka alone, at the resolution provided by the redeveloped Multiplex Q; second, the 18 populations considering all haplogroups found, i.e. Amerindian and non-Amerindian; and lastly, the same 18 populations considering only the lineages within haplogroup Q.
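The effect of cutting the Q resolution at M3 can be sketched as follows. This is an illustration, not the Arlequin computation: it uses Nei's unbiased gene diversity, n/(n-1)(1 - Σp²), and Asháninka counts back-calculated from the Table 2 percentages with n = 54; the short haplogroup labels are simplifications of the Table 2 nomenclature. Sub-lineages carrying the derived M3 allele are merged into a single Q-M3 class, as required for comparison with studies that stopped at M3.

```python
from collections import Counter

# Lineages carrying the derived M3 allele, merged to match studies that stopped at M3.
BELOW_M3 = {"Q-M3*", "Q-Z19319", "Q-Z19483", "Q-SA05"}


def collapse_to_m3(haplogroup):
    """Map any lineage at or below M3 to plain Q-M3."""
    return "Q-M3" if haplogroup in BELOW_M3 else haplogroup


def gene_diversity(labels):
    """Unbiased gene (haplogroup) diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(labels)
    counts = Counter(labels)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


# Counts back-calculated from Table 2 (n = 54); labels simplified.
counts = {"E-M2": 1, "G-M201": 1, "K-M9": 2, "Q-P36.2": 1, "Q-M346": 1,
          "Q-M3*": 32, "Q-Z19319": 2, "Q-Z19483": 1, "Q-SA05": 12, "R-P25": 1}
labels = [hg for hg, c in counts.items() for _ in range(c)]

full = gene_diversity(labels)                                 # Multiplex Q resolution
all_m3 = gene_diversity([collapse_to_m3(h) for h in labels])  # all haplogroups, Q cut at M3
q_only = gene_diversity([collapse_to_m3(h) for h in labels if h.startswith("Q-")])
print(round(full, 3), round(all_m3, 3), round(q_only, 3))
```

Merging sub-lineages can only lower the estimate, which is why diversity values obtained at the Q-M3 level are not directly comparable with those obtained at the resolution of the redeveloped Multiplex Q.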

An AMOVA was also performed with Arlequin to compare different groups of populations in terms of their genetic relationships. These were quantified by means of RST and FST, taking into account the different evolutionary information conveyed by Y-STR and Y-SNP data. The groups were defined according to several criteria: country, altitude (high, medium and sea level), linguistic family (Equatorial-Tucanoan, Andean, Chibchan-Paezan and Ge-Pano-Carib) and geographic origin (South America divided into three sections along a Northwest to Southeast orientation).

IBM SPSS Statistics 25 was used to construct two-dimensional plots by Multidimensional Scaling (MDS), using the pairwise RST and FST distances obtained with Arlequin.
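SPSS provides its own MDS procedures; as a rough stand-in, the sketch below uses classical (Torgerson) MDS, a different and simpler algorithm, to show how a pairwise distance matrix is turned into two-dimensional coordinates. Clipping negative RST values to zero is an assumption, not something stated in the text. The three distances are the Asháninka, Chachapoya and Jivaro RST values from Table 4, used only as an example.

```python
import numpy as np


def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    d = np.clip(np.asarray(dist, dtype=float), 0.0, None)  # negative RST -> 0
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred matrix of squared distances (Gram matrix).
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_dims]           # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # drop tiny negative noise
    return eigvecs[:, top] * scale


# Asháninka, Chachapoya, Jivaro (RST from Table 4).
rst = [[0.00000, 0.11333, 0.07986],
       [0.11333, 0.00000, 0.15510],
       [0.07986, 0.15510, 0.00000]]
print(classical_mds(rst))
```

With only three populations the distances fit exactly in two dimensions; with 24 or 26 populations the 2-D plot is an approximation, and the axes have no intrinsic orientation.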

Median-joining networks were constructed for the Native American haplogroup Q with Network v5.0.0.0. In one of the networks, all Y-STR loci were used except DYS385 and DYF387S1, since these typically present two alleles. To obtain a less reticulated network, the reduced-median method (Bandelt et al, 1995) was applied before the median-joining method (Bandelt et al, 1999).

Three additional networks were constructed using the 15 Y-STRs common to all population samples. These networks also incorporated Y-SNP information for 4 SNPs within haplogroup Q (M242, P36.2, M346 and M3). In all network analyses, the weight of each marker was inversely proportional to its variance, following Qamar et al (2002).
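The inverse-variance weighting can be sketched as below. This is only an illustration under stated assumptions: the per-locus variance is the population variance of repeat numbers, the weight is 1/variance, invariant loci get weight 0 (a hypothetical choice), and no rescaling to the integer weights that Network accepts is applied, so the exact values used in this work may differ.

```python
import statistics


def inverse_variance_weights(haplotypes):
    """One weight per locus, inversely proportional to its repeat-number variance."""
    weights = []
    for locus_values in zip(*haplotypes):  # iterate over loci (columns)
        var = statistics.pvariance(locus_values)
        # Hypothetical choice: an invariant locus cannot separate haplotypes.
        weights.append(1.0 / var if var > 0 else 0.0)
    return weights


def weighted_distance(h1, h2, weights):
    """Weighted sum of absolute repeat differences between two haplotypes."""
    return sum(w * abs(a - b) for a, b, w in zip(h1, h2, weights))


haplotypes = [(13, 30), (14, 30), (15, 31), (13, 31)]  # toy 2-locus data
w = inverse_variance_weights(haplotypes)
print(w, weighted_distance(haplotypes[0], haplotypes[2], w))
```

Down-weighting fast-mutating loci reduces the number of parallel mutations the network must accommodate, which is the same goal as the reduced-median pre-processing step.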


Results and Discussion


Asháninka Genetic Diversity

Of the 59 samples available from the Asháninka, 54 were successfully genotyped here. The DNA quality of the remaining 5 was not sufficient for characterization with the redeveloped Multiplex Q. Since all samples had previously been typed for Y-STRs (Tineo et al, 2015), we were able to predict the haplogroup of each sample, and all of these 5 belonged to haplogroup Q.

In this work, the interrogation of 39 SNPs led to the identification of 10 distinct haplogroups, 6 of which are defined by mutations downstream of the most basal SNP in haplogroup Q (M242).

As expected, the bulk of the Asháninka Y-chromosome lineages belonged to haplogroup Q (91%), by far the most prevalent haplogroup in Native American populations. The remaining Y chromosomes fell into haplogroups E (2%), G (2%), K (4%) and R (2%). Since none of these is considered a Native American founder lineage, their presence in the Americas is readily explained by the post-Columbian colonization of the continent, which brought substantial European and African gene flow into the gene pool of Native Americans. In the Asháninka these lineages summed to 9.26%, a value that estimates the male-mediated admixture rate between the Asháninka and non-Amerindian people.

| Haplogroup | Relative Frequency (%) |
|---|---|
| E-M2 | 1,85 |
| G-M201 | 1,85 |
| K*-M9 | 3,71 |
| Q-P36.2 (xM346) | 1,85 |
| Q-M346 (xM3) | 1,85 |
| Q-M3 (xZ19319, Z19483, SA05, M557) | 59,26 |
| Q-Z19319 (xSA01) | 3,71 |
| Q-Z19483 | 1,85 |
| Q-SA05 | 22,22 |
| R-P25 | 1,85 |

Table 2- Relative frequency of the haplogroups detected in the Asháninka. The nomenclature of the haplogroups is according to van Oven et al, 2014.
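The direct-counting estimates in Table 2 and the male admixture rate can be reproduced with a few lines of Python. The absolute counts are back-calculated from the percentages and n = 54 (they are not listed in the text), "Q-M3*" stands for the Q-M3 (xZ19319, Z19483, SA05, M557) class, and treating every non-Q lineage as non-native is the assumption stated in the text. The recomputed values agree with Table 2 to within rounding (2/54 gives 3.70%).

```python
# Counts back-calculated from Table 2 percentages and n = 54.
counts = {
    "E-M2": 1, "G-M201": 1, "K*-M9": 2, "Q-P36.2 (xM346)": 1, "Q-M346 (xM3)": 1,
    "Q-M3*": 32, "Q-Z19319 (xSA01)": 2, "Q-Z19483": 1, "Q-SA05": 12, "R-P25": 1,
}


def relative_frequencies(counts):
    """Direct counting: each haplogroup count divided by the sample size, in %."""
    total = sum(counts.values())
    return {hg: 100 * c / total for hg, c in counts.items()}


def non_native_share(counts, is_native=lambda hg: hg.startswith("Q-")):
    """Percentage of chromosomes outside the Native American founder haplogroup Q."""
    total = sum(counts.values())
    return 100 * sum(c for hg, c in counts.items() if not is_native(hg)) / total


freqs = relative_frequencies(counts)
print({hg: round(f, 2) for hg, f in freqs.items()})
print(round(non_native_share(counts), 2))
```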


Figure 7- Relative frequency of haplogroups in Native Americans from several South American countries. Amerindian lineages are shown in shades of orange, European in shades of blue and African in shades of green.

Concerning the set of lineages of non-Amerindian ancestry (E, G, K and R), the only one ascribed an African origin is haplogroup E, which is typically present at high frequencies across the African continent and only moderately frequent in other regions of the Old World. The other 3 haplogroups are likely of European ancestry. Haplogroup G is most common in the Middle East, though it is also found at lower frequencies across Europe, especially in southern and eastern regions. Two of our samples were classified as K-M9 (xN1c, P, T), meaning that, among the major subclades within K-M9, they did not belong to N1c, P or T. However, based on their STR haplotypes, these two samples were predicted to fall in subclade L, defined by a mutation not included in any of the non-Q multiplex systems used in this work. Since the associated prediction score was 100%, we assumed that the 2 K-M9 (xN1c, P, T) samples indeed belonged to haplogroup L, which is present in the Middle East and less frequently in Southern Europe, Central Asia and Northern Africa. Finally, one individual harbored an R-P25 lineage with a Y-STR haplotype indicating that it likely belonged to haplogroup R1b1. This is the haplogroup most commonly associated with European male ancestry, since in most European populations it is found at frequencies of up to 50%.

Thus, the non-Amerindian component of the Asháninka male gene pool comprises 20% of lineages of likely African origin and 80% of likely European origin, introduced into this Peruvian indigenous group through the contacts with Africans and Europeans established since the beginning of the colonial period.

Previous assessments of Y-chromosome diversity in other Native American groups have allowed admixture rates to be estimated, and these vary widely across groups, as illustrated in Figure 7. For instance, no signs of admixture were detected in the native populations from Ecuador, whereas in some groups from Argentina admixture with non-Amerindians reaches up to 69%. Compared with other Amerindian groups, the Asháninka are among the least admixed. Notably, and similarly to what was observed in the Asháninka, the non-Amerindian component of admixed groups generally shows a much lower proportion of African than of European lineages, reflecting the demographic pattern of admixture. Some admixed native groups from Argentina even carry more European than Amerindian male lineages. Naturally, since haplogroup R dominates the pool of lineages in Europe, it is also the most common non-native clade found in Amerindians (Cárdenas et al, 2015; Sevini et al, 2013).

Q lineages

Until a few years ago, the resolution of haplogroup Q, the major male founder lineage of Native American populations, was rather limited. Downstream of the mutation that defines the haplogroup (the derived allele at M242), the well-known phylogeny ended at SNP M3. Meanwhile, some sub-clades downstream of Q-M3 were identified, most of them occurring at very low frequency among Native Americans. With the recent explosion of NGS (Next Generation Sequencing) approaches, knowledge of Y-chromosome diversity increased dramatically, and the novel SNPs discovered downstream of M3 allowed the resolution of haplogroup Q to be refined, opening new prospects for a better understanding of how the American continent was colonized (Jota et al, 2016; Geppert et al, 2015).

In this context, and since NGS is not yet accessible to all, we set out to redesign a previously developed Multiplex Q system (Roewer et al, 2013), discarding the less discriminating SNPs, namely M194 and M199, whose derived alleles had only been sporadically detected in Native Americans, and including newly discovered SNPs that, according to the literature, showed more reasonable levels of diversity among Native Americans: Z19319, Z19483, SA05 (Jota et al, 2016) and M557 (Battaglia et al, 2013).

Table 2 already presented the results for the six Q sub-lineages detected in this study. Of the 49 Asháninka Y chromosomes belonging to haplogroup Q, the majority (65.3%) fell in Q-M3*, 30.6% in sub-lineages within Q-M3, namely Q-Z19319 (4.1%), Q-Z19483 (2.0%) and especially Q-SA05 (24.5%), while only 4.1% were ascribed to sub-lineages upstream of M3, namely Q-P36.2 and Q-M346, each detected in a single chromosome.

Summing all lineages harboring the derived allele at M3 (Q-M3* + Q-Z19319 + Q-Z19483 + Q-SA05), the total frequency in the Asháninka reached 95.92%. This value fits well within the range previously reported for native groups from South America, revealing not only that haplogroup Q-M3 is widely disseminated throughout South America, but also that it typically represents the large majority of Y lineages, with values of up to 90% (Roewer et al, 2013; Di Corcia et al, 2017) or even 100%, as found, for instance, in two Amerindian groups from French Guiana (Mazières et al, 2008).
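The nested summation behind the 95.92% figure can be made explicit. This sketch uses Q-lineage counts back-calculated from Table 2 (49 chromosomes in haplogroup Q); the clade frequency is simply the sum of the counts of every lineage inside the clade divided by the number of Q chromosomes.

```python
# Q-lineage counts back-calculated from Table 2 (49 chromosomes in haplogroup Q).
q_counts = {"Q-P36.2": 1, "Q-M346": 1, "Q-M3*": 32,
            "Q-Z19319": 2, "Q-Z19483": 1, "Q-SA05": 12}

# Every lineage at or below M3 carries the derived M3 allele.
M3_DERIVED = {"Q-M3*", "Q-Z19319", "Q-Z19483", "Q-SA05"}


def clade_frequency(counts, members):
    """Percentage of chromosomes belonging to any lineage in `members`."""
    return 100 * sum(counts[hg] for hg in members) / sum(counts.values())


m3_total = clade_frequency(q_counts, M3_DERIVED)                # all M3-derived lineages
below_m3 = clade_frequency(q_counts, M3_DERIVED - {"Q-M3*"})    # sub-lineages within M3
print(round(m3_total, 2), round(below_m3, 1))
```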

Concerning the Q sub-lineages upstream of M3, Q-P36.2 and Q-M346, the latter has often been reported in groups from South America, mainly from regions north of the 30th parallel south, according to currently available data (see Roewer et al, 2013). That study reported a global moderate frequency of 6% for this haplogroup among more than 1000 individuals from 50 tribal populations in 81 settlements, although in certain small settlements Q-M346 reached atypically high values, probably due to drift effects or sampling bias. In the , a tribe from the Peruvian Amazon, Q-M346 is also present at 25% frequency (Di Corcia et al, 2017), and in the Lengua and Ayoreo from Paraguay Q-M346 accounted for >20% of lineages (Bailliet et al, 2009). In general, however, Q-M346 is considered a less frequent lineage in South America (Roewer et al, 2013; Bailliet et al, 2009).

Concerning Q-P36.2, this lineage was not detected in any of the studies mentioned above.

Within the Q-M3 branch, our approach could potentially discriminate 7 different sub-clades, 4 of which were detected in the Asháninka: Q-M3*, Q-Z19319, Q-Z19483 and Q-SA05. Unfortunately, the lower or different resolution used in most previous studies does not yet permit a fine dissection of the distribution pattern of these lineages. The only work with comparable data was that of Jota et al, 2016, who also found that most Native American lineages were Q-M3*, that is, they did not carry any of the screened mutations downstream of M3. The derived allele at SA05 was present in approximately 9-15% of the individuals harboring the derived allele at M3, who belonged to tribes mainly from Peru, but also from Bolivia. Q-Z19319 and Q-Z19483 were present in approximately 12-20% and 10-17%, respectively.

These findings indicate that at least the Q-SA05 branch within Q-M3 is common enough in Native American populations to provide information relevant to reconstructing the history of present-day populations.

Heterogeneity within Q-SA05 lineages

The analysis of the STR profiles of the Q-SA05 lineages detected here revealed that this haplogroup seems to encompass two clusters. One of the STRs analyzed was DYF387S1, a locus reported to show sporadic tri-allelic variants instead of its usual biallelic pattern (STRBase, https://strbase.nist.gov/tri). In the Asháninka, tri-allelic patterns were observed in five individuals (two 34,35,39; two 34,36,39; and one 34,35,40), all of them belonging to Q-SA05.

A network was constructed with the STR haplotypes of individuals assigned to haplogroup Q, excluding the two STRs with typically biallelic patterns, DYS385 and DYF387S1. In the network (Figure 8), Q-SA05 chromosomes were clearly separated from the remaining lineages and, in addition, showed a considerably higher degree of molecular relatedness among themselves than did Q-M3* chromosomes. This finding seems consistent with TMRCA estimates placing the origin of Q-M3 at ~26,300 years ago, whereas Q-SA05 arose much more recently, ~13,400 years ago (Jota et al, 2016).


Figure 8- Network of all Asháninka samples assigned to haplogroup Q. The samples seem to form two clusters: one aggregating all samples genotyped as Q-SA05, and a second grouping the samples from the remaining haplogroups, with little apparent differentiation among them.

Within the Q-SA05 cluster, two branches emerged; interestingly, one contained all 5 Q-SA05 chromosomes with tri-allelic patterns at DYF387S1, whilst the other contained none. Besides the 5 tri-allelic samples, only a single biallelic haplotype grouped in the first branch; it carried alleles 34 and 39 at DYF387S1, which are also present in 4 of the sister tri-allelic haplotypes.

To better dissect the diversity within Q-SA05, a network was then constructed with the 12 Q-SA05 lineages detected here plus the 44 reported by Jota et al (2016), which belonged to Amerindians from different populations from Peru and Bolivia. It was necessary to reduce the resolution to the Y-STRs common to both studies, totalling 15 loci, which did not include DYF387S1 since it was not examined by Jota et al (2016). To the best of our knowledge, no further data were available at the time.

The network (Figure 9) again revealed considerable heterogeneity within Q-SA05, and a cluster integrating the tri-allelic samples was rather well differentiated from the other lineages, which included all the DYF387S1 biallelic samples from this study except, once more, one sample.

All this points to the presence of two sub-branches within Q-SA05, a plausible hypothesis given that Q-SA05 is old enough to have given rise to younger branches, but one that still requires further studies to be confirmed.

Although Q-SA05 has so far only been found in populations from Peru and Bolivia, suggesting a restricted geographical distribution (Jota et al, 2016), it remains to be studied in most Native American groups from South America, making it difficult, for now, to predict how informative it can be for addressing the population history of the continent.

Another noteworthy aspect of the network was the absence of shared haplotypes between Amerindians from Peru and Bolivia, which might indicate scarce contact between native groups from the two regions, notwithstanding their geographical proximity. A single 15-STR haplotype was shared between 3 Asháninka studied here and 6 Peruvian Amerindians studied by Jota et al (2016), one of whom was also a member of an Asháninka community.
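Detecting haplotypes shared between two datasets reduces to intersecting their haplotype counts once both are coded at the same loci. The sketch below assumes each haplotype is a tuple of repeat numbers in a common locus order; the 3-locus toy data are hypothetical, and the real comparison would use the 15 Y-STRs common to both studies.

```python
from collections import Counter


def shared_haplotypes(sample_a, sample_b):
    """Haplotypes found in both samples, with the number of carriers in each."""
    count_a, count_b = Counter(sample_a), Counter(sample_b)
    return {h: (count_a[h], count_b[h]) for h in count_a.keys() & count_b.keys()}


# Hypothetical 3-locus haplotypes for illustration.
ours = [(13, 14, 17), (13, 14, 17), (13, 14, 17), (14, 12, 16)]
published = [(13, 14, 17), (13, 14, 17), (15, 12, 16)]
print(shared_haplotypes(ours, published))
```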

Figure 9- Network with the Q-SA05 Asháninka samples and the Q-SA05 individuals from Jota et al (2016).

Population Comparisons

To compare the Y-SNP results obtained in this work and the STR profiles previously provided by Tineo et al (2015) for the Asháninka with those of other Amerindian populations, data were collected from published works. However, few studies provided raw data for both Y-SNPs and Y-STRs and, in addition, the typing resolution varied considerably across studies. For this reason, the comparisons presented hereafter were performed at distinct levels of resolution in terms of the number of Native American populations and of markers considered. Naturally, increasing the number of populations required reducing the number of markers, and vice versa.

Y-STRs

In the comparative analysis based on Y-STRs, three resolution levels were used. The highest, in terms of number of loci, relied on 19 Y-STRs, for which data were available for only 4 additional Amerindian populations, all from Peru: the Chachapoya, Huancas, Jivaro and Cajamarca (Guevara et al, 2016). The intermediate resolution level consisted of 15 Y-STRs and 17 other Amerindian populations, namely the Cashibo and Shipibo (Di Corcia et al, 2017); the Terena, Awa-Guajá, Chimane, Trinitario, Pilaga, Wayuu and Bari Boxi (Roewer et al, 2013); the Aymara and Quechua (Gayà-Vidal et al, 2011); the Emberá-Chami, Guambiano and Coconuco (Xavier et al, 2015); the Waorani and Kichwa (Geppert et al, 2011); and the Yanesha (Barbieri et al, 2014). The third resolution level was based on 7 Y-STRs, adding two further populations: the Mapuche and Kolla (Blanco-Verea et al, 2010). All individuals used in these analyses belonged to haplogroup Q. Further information on these populations is presented in Supplemental Table 2.

The genetic diversity parameters of the different Amerindian populations are presented in Table 3. Overall diversity decreases as the number of STRs considered decreases, which is expected since a larger set of markers usually offers higher discrimination power.
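The two parameters in Table 3 can be sketched as follows, assuming that genetic (haplotype) diversity is Nei's unbiased estimator and that MNPD counts the loci at which two haplotypes differ, averaged over all pairs; Arlequin's implementation may differ in details. The toy data also illustrate why diversity falls when fewer STRs are typed: haplotypes that differ only at the dropped loci become identical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    freqs = [c / n for c in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mean_pairwise_differences(haplotypes):
    """Average number of loci at which two haplotypes differ, over all pairs."""
    pairs = list(combinations(haplotypes, 2))
    return sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)


haps = [(13, 14), (13, 14), (13, 15), (14, 15)]  # toy 2-locus haplotypes
one_locus = [h[:1] for h in haps]                # same chromosomes, fewer loci
print(haplotype_diversity(haps), haplotype_diversity(one_locus))
print(mean_pairwise_differences(haps))
```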

Regardless of the level of resolution, diversity values were rather high in most Amerindian groups, including the Asháninka. High haplotype diversity is usually observed with the STRs considered here, since all are highly polymorphic markers originally selected for forensic applications. However, the level of diversity in the male lineages of most Amerindian groups is remarkable in light of their recent population history. With the post-Columbian colonization, Native Americans were severely decimated by war, disease and forced labor. The indigenous population declined sharply in a very short time, with males being especially affected since they were regularly recruited for military campaigns and territorial conquests. Given this well-documented drastic population decline (Cook, 2011), the retention of high levels of male genetic diversity is quite surprising.

Departing from the general trend, 3 groups stood out for their reduced diversity: the Yurucare, the Chimane and the Zoé. For the first two, the small number of samples analyzed may have considerably distorted the estimated level of diversity. In the Zoé, the sample size was more reasonable, but being a small, very isolated tribe living deep in the Amazon rainforest of northern Brazil likely promoted genetic drift and a consequent reduction in diversity.

| Population | Genetic Diversity (19 STRs) | MNPD (19 STRs) | Genetic Diversity (15 STRs) | MNPD (15 STRs) | Genetic Diversity (7 STRs) | MNPD (7 STRs) |
|---|---|---|---|---|---|---|
| Asháninka | 0,970±0,012 | 9,330±4,361 | 0,965±0,013 | 6,453±3,103 | 0,943±0,015 | 2,815±1,509 |
| Chachapoya | 0,993±0,024 | 9,786±4,525 | 0,991±0,005 | 7,661±3,617 | 0,979±0,007 | 3,774±1,927 |
| Huancas | 0,927±0,067 | 9,309±4,636 | 0,891±0,074 | 6,745±3,445 | 0,823±0,071 | 3,164±1,772 |
| Jivaro | 0,884±0,067 | 6,468±3,195 | 0,847±0,079 | 5,211±2,631 | 0,813±0,080 | 2,591±1,451 |
| Cajamarca | 0,981±0,046 | 9,909±4,914 | 0,982±0,046 | 6,964±3,547 | 0,981±0,046 | 3,473±1,917 |
| Shipibo | - | - | 0,900±0,056 | 4,398±2,257 | 0,697±0,101 | 1,649±1,009 |
| Cashibo | - | - | 0,860±0,042 | 3,796±1,970 | 0,698±0,065 | 1,518±0,940 |
| Awa-Guajá | - | - | 0,829±0,041 | 2,112±1,199 | 0,599±0,061 | 1,008±0,690 |
| Terena | - | - | 0,915±0,033 | 3,821±1,973 | 0,752±0,068 | 1,645±0,995 |
| Trinitario | - | - | 0,975±0,011 | 8,073±3,837 | 0,966±0,013 | 3,309±1,741 |
| Yurucare | - | - | 0,524±0,209 | 2,762±1,658 | 0,285±0,196 | 1,428±0,984 |
| Chimane | - | - | 0,200±0,154 | 1,400±0,934 | 0,200±0,154 | 0,600±0,519 |
| Emberá-Chamí | - | - | 0,905±0,041 | 7,850±3,791 | 0,884±0,037 | 3,844±2,001 |
| Coconuco | - | - | 0,810±0,130 | 7,476±3,979 | 0,809±0,129 | 3,428±1,989 |
| Wao | - | - | 0,682±0,065 | 4,194±2,131 | 0,503±0,052 | 1,548±0,947 |
| Kichwa | - | - | 0,923±0,060 | 6,681±3,354 | 0,824±0,097 | 2,527±1,445 |
| Pilaga | - | - | 0,964±0,012 | 6,025±2,921 | 0,901±0,019 | 2,293±1,279 |
| Wayuu | - | - | 0,994±0,019 | 7,099±3,486 | 0,970±0,027 | 3,380±1,810 |
| Bari Boxi | - | - | 0,742±0,073 | 3,725±1,984 | 0,741±0,072 | 1,825±1,107 |
| Yanesha | - | - | 0,983±0,011 | 7,355±3,514 | 0,929±0,030 | 3,341±1,750 |
| Aymara | - | - | 0,979±0,009 | 6,276±3,023 | 0,903±0,029 | 3,322±1,731 |
| Quechua | - | - | 0,962±0,016 | 5,807±2,832 | 0,893±0,035 | 3,156±1,665 |
| Guambiano | - | - | 0,975±0,030 | 6,125±3,075 | 0,841±0,074 | 2,233±1,298 |
| Zoe | - | - | 0,297±0,115 | 0,313±0,332 | 0,080±0,072 | 0,080±0,156 |
| Mapuche | - | - | - | - | 0,961±0,023 | 2,952±1,609 |
| Kolla | - | - | - | - | 0,909±0,079 | 3,681±2,001 |

Table 3- Genetic diversity and mean number of pairwise differences (MNPD) of the Amerindian populations at the three resolution levels (19, 15 and 7 Y-STRs).

Pairwise Genetic Distances

To evaluate genetic distances between populations based on STR data, we used the RST statistic, an analogue of FST that assumes the stepwise mutation model, which is considered more appropriate for STRs than the conventional infinite-allele model (Slatkin, 1995). Whereas FST assumes the latter model and only takes into account differences in allele frequencies between populations, RST additionally accounts for how different the STR alleles are in number of repeats.

For the analysis with the largest set of STRs (19 markers), only 4 populations besides the Asháninka, all from Peru, were available. The pairwise RST values and corresponding P-values presented in Table 4 showed significant genetic differentiation in about half of the comparisons. The 5 groups under comparison live in the Peruvian highland region between the Andes and the Amazon, but whereas the Chachapoya, Huancas, Jivaro and Cajamarca speak languages from the Andean sub-group, the Asháninka language belongs to the Equatorial-Tucanoan sub-group. Yet, neither geography nor language appears to be a strong determinant of the pattern of differentiation among the Peruvian groups.

| | Asháninka | Chachapoya | Huancas | Jivaro | Cajamarca |
|---|---|---|---|---|---|
| Asháninka | - | >0,00000±0,0000 | 0,00594±0,0007 | 0,00604±0,0007 | 0,04673±0,0022 |
| Chachapoya | 0,11333 | - | 0,02138±0,00015 | >0,00000±0,0000 | 0,07187±0,0026 |
| Huancas | 0,11355 | 0,07407 | - | 0,00040±0,0002 | 0,02772±0,0018 |
| Jivaro | 0,07986 | 0,15510 | 0,22088 | - | 0,00614±0,0008 |
| Cajamarca | 0,05820 | 0,04161 | 0,11955 | 0,13351 | - |

Table 4- RST genetic distances (below the diagonal) and P-values (above the diagonal) between 5 Peruvian populations using 19 Y-STRs. Values not statistically significant are underlined in red, assuming the significance level of P<0.0125 according to the Bonferroni correction for multiple tests.
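The Bonferroni thresholds quoted for Tables 4 and 5 can be checked with simple arithmetic. Under the assumption that the number of tests is the number of comparisons involving each population (k - 1 for k populations), a nominal α = 0.05 gives 0.05/4 = 0.0125 for the 5 populations of Table 4 and 0.05/23 ≈ 0.0022 for the 24 populations of Table 5, matching both captions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


# Assumption: n_tests = comparisons involving each population, i.e. k - 1.
for k in (5, 24):
    print(k, round(bonferroni_threshold(0.05, k - 1), 4))
```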


Awa- Embera- Asháninka Chachapoya Huancas Jivaro Cajamarca Shipibo Cashibo Yanesha Terena Zoe Trinitario Yurucare Chimane Aymara Quechua Coconuco Guambiano Wao Kichwa Pilaga Wayuu Bari Boxi Guajá Chamí 0,00000 0,00000 0,00089 0,07603 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,01812 0,00000 0,00000 0,00000 0,00000 0,00000 0,00525 0,09841 0,00000 0,00149 0,00000 0,00911 0,00000 Asháninka - ±0,0000 ±0,0000 ±0,0003 ±0,0029 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0013 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0007 ±0,0027 ±0,0000 ±0,0004 ±0,0000 ±0,0009 ±0,0000 0,09306 0,00000 0,01574 0,00000 0,00000 0,00000 0,00000 0,00000 0,00040 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00168 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 Chachapoya 0,13033 - ±0,0028 ±0,0000 ±0,0011 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0002 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0004 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 0,00248 0,09573 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00020 0,00000 0,00010 0,00050 0,00000 0,00099 0,00337 0,00000 0,00000 0,00000 0,01723 0,00000 Huancas 0,16025 0,03749 - ±0,0005 ±0,0026 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001 ±0,0000 ±0,0001 ±0,0002 ±0,0000 ±0,0003 ±0,0005 ±0,0000 ±0,0000 ±0,0000 ±0,0014 ±0,0000 0,03762 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00149 0,00030 0,00000 0,00000 0,00000 0,00000 0,01129 0,01089 0,00000 0,01901 0,00000 0,00188 0,00000 Jivaro 0,11081 0,15288 0,23148 - ±0,0019 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0004 ±0,0002 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0010 ±0,0010 ±0,0000 ±0,0014 ±0,0000 ±0,0004 ±0,0000 0,00030 0,00050 0,64053 0,00000 0,00000 0,00000 0,23592 0,00931 0,00010 0,04544 0,01515 0,02693 0,04415 0,24047 0,00000 0,22186 0,32858 0,11989 0,00366 Cajamarca 0,04243 0,07688 0,08045 0,10198 - ±0,0002 ±0,0003 ±0,0052 ±0,0000 ±0,0000 ±0,0000 ±0,0049 ±0,0010 ±0,0001 ±0,0021 ±0,0012 ±0,0016 ±0,0020 ±0,0043 ±0,0000 ±0,0045 
±0,0047 ±0,0035 ±0,0006 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,04158 0,01158 0,00000 0,00000 0,00000 0,00000 0,00000 Shipibo 0,17295 0,26400 0,39504 0,22700 0,22297 - ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0021 ±0,0012 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00010 0,00040 0,00000 0,00000 0,00000 0,00000 0,00000 Cashibo 0,18972 0,25323 0,38935 0,28519 0,19855 0,16442 - ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001 ±0,0002 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 0,00000 0,00000 0,00000 0,00287 0,00040 0,00000 0,00000 0,00000 0,00000 0,00010 0,00475 0,00010 0,02208 0,00040 0,00010 0,00010 Yanesha 0,11185 0,18558 0,24387 0,18293 -0,01443 0,30172 0,27964 - ±0,0000 ±0,0000 ±0,0000 ±0,0006 ±0,0002 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001 ±0,0007 ±0,0001 ±0,0013 ±0,0002 ±0,0001 ±0,0001 0,30819 0,00000 0,09504 0,00010 0,01851 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00010 0,00000 0,00000 0,00000 Awa-Guajá 0,44742 0,46515 0,71348 0,58872 0,45805 0,66754 0,69908 0,36284 - ±0,0047 ±0,0000 ±0,0030 ±0,0001 ±0,0013 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001 ±0,0000 ±0,0000 ±0,0000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 Terena 0,39962 0,42336 0,57576 0,37526 0,39797 0,22887 0,35510 0,40617 0,71035 - ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 Zoe 0,40405 0,40559 0,70631 0,47054 0,55030 0,43073 0,62554 0,47682 0,85839 0,59941 - ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 0,00139 0,00000 0,00000 0,00000 0,02435 0,01653 0,64508 0,00000 
0,08940 0,00020 0,00010 0,00079 Trinitario 0,02957 0,16033 0,18669 0,09425 0,01710 0,16336 0,14451 0,05557 0,35908 0,29400 0,33713 - ±0,0003 ±0,0000 ±0,0000 ±0,0000 ±0,0017 ±0,0011 ±0,0047 ±0,0000 ±0,0026 ±0,0001 ±0,0001 ±0,0003 0,00089 0,00000 0,00010 0,00000 0,00238 0,00574 0,00000 0,00000 0,00000 0,06791 0,00010 Yurucare 0,31966 0,39436 0,52984 0,42985 0,23196 0,51884 0,52564 0,19597 0,76810 0,58015 0,91525 0,18997 - ±0,0003 ±0,0000 ±0,0001 ±0,0000 ±0,0005 ±0,0007 ±0,0000 ±0,0000 ±0,0000 ±0,0025 ±0,0001 0,00000 0,00000 0,00000 0,00000 0,00515 0,00000 0,00010 0,00000 0,28562 0,00010 Chimane 0,30152 0,42247 0,56945 0,50122 0,38467 0,51659 0,58352 0,39542 0,84690 0,56880 0,93912 0,24545 0,78971 - ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0007 ±0,0000 ±0,0001 ±0,0000 ±0,0048 ±0,0001 0,45025 0,00000 0,00000 0,00000 0,00000 0,00050 0,00000 0,00000 0,00000 Aymara 0,10621 0,20104 0,15702 0,20218 0,05619 0,28055 0,22686 0,12529 0,48094 0,39692 0,46436 0,10655 0,27358 0,42640 - ±0,0050 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0002 ±0,0000 ±0,0000 ±0,0000 0,00010 0,00010 0,00010 0,00000 0,00010 0,00000 0,00000 0,00000 Quechua 0,16404 0,24834 0,20976 0,25721 0,11256 0,33533 0,29558 0,17125 0,55250 0,44422 0,54850 0,15595 0,34234 0,51515 -0,00176 - ±0,0001 ±0,0001 ±0,0001 ±0,0000 ±0,0001 ±0,0000 ±0,0000 ±0,0000 Embera- 0,00337 0,11355 0,00000 0,00475 0,00000 0,00000 0,00000 0,10810 0,22483 0,23921 0,21356 0,09055 0,18318 0,21808 0,15265 0,44910 0,38017 0,42437 0,03957 0,36139 0,38576 0,13362 0,16021 - Chamí ±0,0006 ±0,0033 ±0,0000 ±0,0007 ±0,0000 ±0,0000 ±0,0000 0,12385 0,00139 0,00139 0,00000 0,18147 0,00000 Coconuco 0,14261 0,15994 0,25592 0,20303 0,12741 0,10882 0,15601 0,25926 0,66519 0,40923 0,59529 0,11876 0,40512 0,52676 0,27774 0,34500 0,18311 - ±0,0006 ±0,0003 ±0,0003 ±0,0000 ±0,0041 ±0,0000 0,00000 0,09692 0,00396 0,25235 0,00000 Guambiano 0,0372 0,14765 0,17621 0,11408 0,02532 0,12823 0,32700 0,09434 0,44972 0,28579 0,37517 -0,01214 0,27453 0,21334 0,17676 0,22809 0,03897 
0,07443 - ±0,0000 ±0,0031 ±0,0005 ±0,0051 ±0,0000 0,00000 0,00000 0,00000 0,00000 Wao 0,35950 0,33512 0,47794 0,43702 0,36774 0,38478 0,30988 0,37544 0,69517 0,53202 0,58021 0,26818 0,47352 0,60895 0,37010 0,43778 0,32910 0,32087 0,29981 - ±0,0000 ±0,0000 ±0,0000 ±0,0000 0,02554 0,05326 0,00000 Kichwa 0,10766 0,21474 0,28493 0,13016 0,02811 0,26247 0,28121 0,05775 0,54094 0,40088 0,48980 0,03492 0,35356 0,43526 0,14300 0,21705 0,13256 0,25191 0,05679 0,42678 - ±0,0015 ±0,0022 ±0,0000 0,00010 0,00000 Pilaga 0,11904 0,18266 0,29992 0,18223 0,00735 0,29268 0,17759 0,05810 0,34402 0,41355 0,47396 0,06288 0,39035 0,45559 0,17879 0,25293 0,15872 0,28484 0,09586 0,43312 0,05400 - ±0,0001 ±0,0000 0,00218 Wayuu 0,06353 0,16360 0,07915 0,08076 0,04143 0,15359 0,62128 0,12854 0,34972 0,17812 0,21937 0,05659 0,13295 0,02178 0,33276 0,19238 0,10696 0,07151 0,01229 0,28696 0,04569 0,13153 - ±0,0005 Bari Boxi 0,28999 0,31958 0,41594 0,19018 0,19018 0,59956 0,28460 0,17879 0,49852 0,62982 0,84870 0,26336 0,57817 0,67419 0,17829 0,41079 0,44305 0,54161 0,30856 0,66963 0,31317 0,27561 0,09697 -

Table 5- RST genetic distances and P-values between the 24 populations using 15 Y-STRs. Values not statistically significant are underlined in red, assuming the significance level of P<0.0022 according to the Bonferroni correction for multiple tests.


The genetic distances obtained with 15 Y-STRs among a total of 24 populations are presented in Table 5. Again, most RST distances were statistically significant, evidencing great heterogeneity across the male gene pools of Amerindian populations. To provide a visual representation of their genetic relationships, the two-dimensional MDS plot shown in Figure 10 was constructed. Most populations grouped in a large cluster in the center of the plot, surrounded by a number of groups very well differentiated from each other as well as from the central cluster. To gain insight into whether geography or language was related to the distribution of populations in the plot, the groups were labeled according to country (Figure 10A) and language (Figure 10B). However, neither geographic proximity nor the sharing of a linguistic clade was clearly connected with inter-group genetic affinities.


Figure 10- MDS plot based on RST genetic distances from 15 Y-STRs among the 24 populations analyzed. A) Populations grouped by country as follows: Peru (Red), Bolivia (Purple), Ecuador (Orange), Brazil (Green), Argentina (Blue), Venezuela (Pink), Colombia (Yellow). The Asháninka, from Peru, are marked in black. B) Populations grouped by linguistic clade as follows: Equatorial-Tucanoan (Red), Ge-Pano-Carib (Purple), Andean (Green), Chibchan-Paezan (Black). The Asháninka, from the Equatorial-Tucanoan language group, are marked in yellow.

Lastly, 26 populations were analyzed using the information from 7 Y-STRs. Despite the reduced resolution, the global pattern of genetic differentiation between populations did not change substantially (Figure 11).


Figure 11- MDS plot of the RST genetic distances between the 26 populations analyzed, based on 7 Y-STRs. Populations in this plot were grouped by country as follows: Peru (Red), Bolivia (Orange), Ecuador (Purple), Venezuela (Pink), Brazil (Green), Colombia (Yellow), Argentina (Blue). The Asháninka are marked in black.

To better assess the contribution of language and geography to the pattern of diversity across Amerindian groups, a hierarchical AMOVA (Analysis of Molecular Variance) was performed under different criteria for population grouping. Only the intermediate (15 Y-STRs) and lowest (7 Y-STRs) resolution levels were considered in this analysis, because the few populations characterized for 19 Y-STRs were all from Peru. AMOVA runs grouped the populations according to: linguistic clade (Equatorial-Tucanoan, Ge-Pano-Carib, Andean, Chibchan-Paezan, and Wao as an isolate); the altitude of the regions where the populations are most frequently found (high, medium and sea level); the country the tribes inhabit; and a broader geographical division of South America into Northwest, Central and Southeast.

Grouping            FCT (15 STR)   P-value (15 STR)      FCT (7 STR)   P-value (7 STR)
Country              0.02359       0.22287 ± 0.00407      0.03870      0.13644 ± 0.00330
Linguistic clade    -0.00525       0.23812 ± 0.00401      0.04811      0.06337 ± 0.00248
Altitude            -0.02214       0.60515 ± 0.00466      0.02872      0.12871 ± 0.00355
N-S cline           -0.06305       0.94941 ± 0.00220     -0.04020      0.94535 ± 0.00230

Table 6- FCT (variance between groups) and P-values yielded by AMOVA based on the information of 15 and 7 Y-STRs.


The FCT values obtained (Table 6) revealed that none of the variables tested accounted significantly for the fraction of total variation ascribed to differences between groups. All FCT values were very low and statistically non-significant, meaning that none of the factors had a clear impact on the genetic structure of native populations from South America. With 15 Y-STRs, the highest FCT was observed when populations were grouped by country, yet this explained only 2.36% of the total variance and was not statistically significant.
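FCT is the share of total molecular variance attributable to differences among groups of populations in a three-level AMOVA. The three levels are among groups, among populations within groups, and within populations. The sketch below implements the standard variance-component estimators for unequal sample sizes. It uses the same squared repeat-difference distance as RST, and it assesses significance by permuting whole populations among groups while keeping the group sizes fixed. It is a simplified illustration, not the Arlequin implementation. The data and function names are hypothetical, and at least one group must contain two or more populations.

```python
import random
from itertools import combinations

def sq_repeat_diff(h1, h2):
    """Sum over loci of squared differences in repeat number."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))

def ssd(haplotypes):
    """Sum of squared deviations: (1/n) * sum of pairwise distances."""
    n = len(haplotypes)
    return sum(sq_repeat_diff(a, b) for a, b in combinations(haplotypes, 2)) / n

def fct(groups):
    """F_CT from a three-level AMOVA.

    groups: list of groups; each group is a list of populations;
    each population is a list of haplotypes (tuples of repeat counts).
    """
    pops = [p for g in groups for p in g]
    n_total = sum(len(p) for p in pops)
    n_groups, n_pops = len(groups), len(pops)
    # Partition the sums of squared deviations across the three levels
    ssd_total = ssd([h for p in pops for h in p])
    ssd_within_pops = sum(ssd(p) for p in pops)
    ssd_within_groups = sum(ssd([h for p in g for h in p]) for g in groups)
    msd_among_groups = (ssd_total - ssd_within_groups) / (n_groups - 1)
    msd_among_pops = (ssd_within_groups - ssd_within_pops) / (n_pops - n_groups)
    msd_within_pops = ssd_within_pops / (n_total - n_pops)
    # Coefficients for unequal sample sizes
    group_sizes = [sum(len(p) for p in g) for g in groups]
    k = sum(sum(len(p) ** 2 for p in g) / size for g, size in zip(groups, group_sizes))
    n = (n_total - k) / (n_pops - n_groups)
    n1 = (k - sum(len(p) ** 2 for p in pops) / n_total) / (n_groups - 1)
    n2 = (n_total - sum(s ** 2 for s in group_sizes) / n_total) / (n_groups - 1)
    # Variance components: among groups (a), among pops within groups (b), within pops (c)
    sigma_c = msd_within_pops
    sigma_b = (msd_among_pops - sigma_c) / n
    sigma_a = (msd_among_groups - sigma_c - n1 * sigma_b) / n2
    return sigma_a / (sigma_a + sigma_b + sigma_c)

def fct_permutation_test(groups, n_perm=1000, seed=1):
    """P-value from permuting whole populations among groups (group sizes kept)."""
    observed = fct(groups)
    pops = [p for g in groups for p in g]
    sizes = [len(g) for g in groups]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pops)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pops[start:start + size])
            start += size
        if fct(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical single-locus haplotypes: two "regions", two populations each
groups = [
    [[(10,), (10,), (11,)], [(10,), (11,), (11,)]],
    [[(20,), (20,), (21,)], [(20,), (21,), (21,)]],
]
obs, p = fct_permutation_test(groups, n_perm=999)
print(f"FCT = {obs:.3f}, P = {p:.3f}")
```

With only four populations there are just three ways to split them into two pairs, so even this extreme toy grouping cannot reach significance, an analogous limitation to the one faced by AMOVAs with few populations per group. For haplogroup data (as in Table 8), a distance of 0 for identical haplogroups and 1 otherwise could be swapped in for `sq_repeat_diff`.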

Y-SNPs

To compare the Asháninka with other Amerindian populations, it was necessary to reduce the resolution of the Q lineages achieved with the redeveloped Multiplex Q, as most of the available data did not include lineages downstream of Q-M3. Additionally, several studies on Amerindians only provided data on the indigenous haplogroup Q and omitted information on the full gene pool of the population, which hampered the comparative analysis. Table 7 presents the gene diversity values inferred from SNP-defined Y lineages for the Asháninka and 17 other populations.

Population      Redeveloped Multiplex Q   Minimal resolution:       Minimal resolution:
                                          entire set of lineages    Q lineages only
Asháninka       0,6059±0,0645             0,2439±0,0775             0,0808±0,0531
Embera-Chamí    -                         0,3969±0,1097             0,2899±0,1028
Nasa            -                         0,8182±0,0703             0,0000±0,0000
Guambiano       -                         0,6842±0,0636             0,5250±0,0546
Cashibo         -                         0,0000±0,0000             0,0000±0,0000
Shipibo         -                         0,0000±0,0000             0,0000±0,0000
Mapuche         -                         0,6854±0,0340             0,0000±0,0000
Diaguitas       -                         0,0000±0,0000             0,0000±0,0000
Coconuco        -                         0,2333±0,1256             0,0000±0,0000
Kichwa          -                         0,1333±0,1123             0,0000±0,0000
Wao             -                         0,1359±0,0683             0,0000±0,0000
Yanesha         -                         0,5266±0,0432             0,0357±0,0357
Palikur         -                         0,1647±0,0822             0,0606±0,0561
Emerillon       -                         0,0000±0,0000             0,0000±0,0000
Aymara          -                         0,2498±0,0709             0,1948±0,0651
Quechua         -                         0,3771±0,0782             0,0000±0,0000
Kolla           -                         0,8039±0,0428             0,0000±0,0000
Chachapoya      -                         0,6339±0,0413             0,0000±0,0000

Table 7- Gene diversity of Y-SNP haplogroups in 18 Amerindian populations, measured at three resolutions. First, the redeveloped Multiplex Q (rMultiplex Q), available only for the Asháninka. Second, the minimal resolution, in which Q-M3 is the most downstream Q lineage considered, together with the other, non-indigenous haplogroups found. Third, the minimal resolution restricted to the diversity within haplogroup Q, in all populations.

Levels of diversity were remarkably disparate across Native Amerindians, ranging from 0,0000 (Emerillon, Shipibo, Cashibo, Diaguitas) to 0,8182 (Nasa). The groups where no diversity was found were those containing only Q lineages. The very low resolution of the Q lineages considered in this analysis led to all their members being classified into a single SNP-defined lineage, which readily explains the gene diversity values of 0. Naturally, a finer resolution discriminates more lineages and unveils diversity otherwise undetected. The Asháninka illustrate this: their gene diversity was 0,2439 at the minimal resolution but rose to 0,6059 with the finer resolution of Multiplex Q. For this reason, the differences in gene diversity across groups (assuming the entire set of lineages, column 3 in Table 7) are primarily due to the weight of non-Q haplogroups. This in fact reflects the varying degrees of admixture that occurred in each group during the colonial period. Not surprisingly, the rate of admixture was found to be strongly correlated with the level of genetic diversity in each population.
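The effect of resolution on gene diversity can be reproduced with Nei's unbiased estimator, h = n/(n-1)(1 - Σp²), together with its usual sampling variance. That this exact estimator was used for Table 7 is an assumption. The sketch below uses a hypothetical sample of ten men. It also uses a hypothetical mapping that collapses made-up Q-M3 sublineage labels into Q-M3, which mimics the minimal resolution of Table 7. An explicit mapping is used rather than a string-prefix test, so that a label such as Q-M346 (upstream of Q-M3) would not be collapsed by mistake.

```python
from collections import Counter
from math import sqrt

def gene_diversity(lineages):
    """Nei's unbiased gene diversity h and its standard error (requires n >= 2)."""
    n = len(lineages)
    freqs = [count / n for count in Counter(lineages).values()]
    sum_p2 = sum(p ** 2 for p in freqs)
    sum_p3 = sum(p ** 3 for p in freqs)
    h = n / (n - 1) * (1 - sum_p2)
    var = 2 / (n * (n - 1)) * (2 * (n - 2) * (sum_p3 - sum_p2 ** 2) + sum_p2 - sum_p2 ** 2)
    return h, sqrt(max(var, 0.0))

# Hypothetical labels: sublineages below Q-M3 reported as Q-M3 at minimal resolution
COLLAPSE_TO_Q_M3 = {"Q-M3*": "Q-M3", "Q-M3/sub-A": "Q-M3", "Q-M3/sub-B": "Q-M3"}

# Hypothetical sample of ten men (not the Asháninka data)
fine = ["Q-M3/sub-A"] * 4 + ["Q-M3/sub-B"] * 3 + ["Q-M3*"] * 2 + ["R1b"]
coarse = [COLLAPSE_TO_Q_M3.get(x, x) for x in fine]
q_only = [x for x in coarse if x.startswith("Q")]

for label, sample in [("fine", fine), ("coarse", coarse), ("Q only", q_only)]:
    h, se = gene_diversity(sample)
    print(f"{label:7s} h = {h:.4f} ± {se:.4f}")
```

In this toy sample, collapsing the sublineages lowers h from about 0.78 to 0.20, and the Q-only subset drops to 0. This mirrors, qualitatively, the Asháninka values and the groups with no detected diversity.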

Pairwise genetic distances between Amerindian groups based on Y-SNPs were calculated with the FST measure, which detected fewer significant differentiations than the Y-STRs: only about half of the FST values were statistically significant. Again, the low resolution assumed for the Q lineages explains the scarcity of significant differentiations between populations, especially between those where haplogroup Q was overwhelmingly prevalent. Consistent with this, most of the significant FST values involved populations that differed greatly in their degree of admixture.

This is well illustrated in the MDS plot obtained with the FST values (Figure 12). The Guambiano, Mapuche, Nasa and Kolla are clearly separated from a cluster encompassing the remaining populations. This cluster contains the groups where Q-M3 dominates, and those where Q-M3 was the only lineage found (gene diversity = 0, Table 7) are positioned furthest to the left along Dimension 1 of the MDS. The Guambiano, Mapuche, Nasa and Kolla are all characterized by elevated diversity (Table 7). However, the Mapuche, Nasa and Kolla carry a high proportion of non-Amerindian lineages, indicating substantial admixture. In the Guambiano, by contrast, the high diversity and differentiation from other groups stem mainly from the rather high frequency of lineage Q-M346, upstream of Q-M3, which in most Native Americans is only sporadically observed.


Figure 12- MDS plot of the FST genetic distances between the 18 populations analyzed, based on Y-SNP haplogroups. Populations in this plot were grouped by country as follows: Peru (Red), Bolivia (Orange), Ecuador (Purple), Venezuela (Pink), Brazil (Green), Colombia (Yellow), Argentina (Blue). The Asháninka are marked in black.

Table 8 presents the AMOVA results obtained with the Y-SNP data, applying the same population-grouping criteria used with the STRs.

Grouping             FCT        P-value
Countries            0.06848    0.03604 ± 0.00189
Language clusters    0.01463    0.28307 ± 0.00417
Altitude            -0.03646    0.79109 ± 0.00412
N-S cline            0.08162    0.01129 ± 0.00125

Table 8- FCT (variance between groups) and P-values of non-differentiation in an AMOVA based on the information of Y-SNPs.

Again, language and altitude did not account for the variability between groups. The country where the groups are located explained 6.8% of the total variability; although this proportion is small, it was marginally significant (P = 0.036). Country was also the variable that produced the highest FCT with the 15 Y-STR data, although there it did not reach statistical significance.

The north-south cline also contributed significantly, explaining 8.2% of the total variability. However, this result was likely influenced by the inclusion in the "South" group of the Mapuche and the Kolla, two admixed groups very well differentiated from other populations.

A network (Figure 13) was constructed with the Y lineages available for South Americans. Both Y-STRs (15 in total) and Y-SNPs were taken into account, using only Q chromosomes and considering Q-M3 as the most downstream level of resolution. Besides the Asháninka, 15 other populations were included: Wao, Pilaga, Awa-Guajá, Embera-Chamí, Trinitario, Bari Boxi, Zoe, Huambisa, Yanesha, Guambiano, Chimane, Terena, Wayuu, Yurucare and Kichwa.
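The median-joining method of Bandelt et al (1999), cited in the bibliography, combines minimum spanning trees into a network and adds inferred median vectors. The sketch below shows only the simpler ingredient: a minimum spanning tree over distinct Y-STR haplotypes, with distances counted as single-repeat steps under a stepwise mutation model. It adds no median vectors and is not the software used for Figures 13 and 14. The function names and haplotypes are hypothetical.

```python
def step_distance(h1, h2):
    """Single-step mutations separating two Y-STR haplotypes (stepwise model)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def minimum_spanning_tree(haplotypes):
    """Prim's algorithm over distinct haplotypes; returns (haplotype, haplotype, steps) edges."""
    nodes = list(dict.fromkeys(haplotypes))   # distinct haplotypes, first-seen order
    if not nodes:
        return []
    in_tree, edges = {0}, []
    while len(in_tree) < len(nodes):
        # Cheapest edge joining a node already in the tree to one outside it
        i, j = min(
            ((a, b) for a in in_tree for b in range(len(nodes)) if b not in in_tree),
            key=lambda pair: step_distance(nodes[pair[0]], nodes[pair[1]]),
        )
        edges.append((nodes[i], nodes[j], step_distance(nodes[i], nodes[j])))
        in_tree.add(j)
    return edges

# Hypothetical three-locus haplotypes (the first one is carried by two men)
sample = [(13, 14, 12), (13, 14, 13), (13, 15, 12), (16, 14, 12), (13, 14, 12)]
for a, b, steps in minimum_spanning_tree(sample):
    print(a, "->", b, f"({steps} steps)")
```

When several edges tie in length, a minimum spanning tree picks one arbitrarily. Median-joining networks instead keep the alternative connections, which is why they show reticulations.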

Notably, this network does not show a star-like pattern, suggesting that the differentiation of sub-lineages did not occur in loco. Alternatively, the most ancestral lineages may have been lost or may remain unsampled. The network further reveals that the Q lineages are molecularly very heterogeneous. Moreover, many lineages are strongly differentiated from the others, occurring predominantly as singletons at the periphery of the network. Since haplogroup Q-M3 is very old, there was enough time for it to accumulate high internal diversity. This allows two scenarios compatible with the pattern of Q lineages in Native Amerindians from South America: 1) when settling South America, different groups might have carried, by chance, distinct subsets of lineages that were already strongly differentiated; 2) after entering South America, the multiple splits that accompanied the dispersal throughout the continent, together with the recent decimation of indigenous populations during the colonial period, led to the random loss of lineages in different groups; in other words, each group currently retains a specific subset of the diversity held by its ancestors. These scenarios are not mutually exclusive, and both likely contributed to the high heterogeneity of lineages across South American Natives.

Another interesting result shown in the network is the scarcity of lineages shared between different groups, whereas shared lineages are common among individuals belonging to the same population. The extreme case of intra-group haplotype sharing is the Zoe, who present only three haplotypes, none of them shared with any other population. The few haplotypes shared between populations mostly involved populations from the same linguistic clade or from the same country.
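Counts of shared versus private haplotypes, like those discussed above, can be tallied directly from per-population haplotype lists. A minimal sketch follows; the population names, haplotypes and function names are hypothetical.

```python
from collections import Counter, defaultdict

def haplotype_sharing(samples):
    """Split haplotypes into those carried by more than one population and private ones.

    samples: dict mapping population name -> list of haplotypes (tuples).
    """
    carriers = defaultdict(set)
    for population, haplotypes in samples.items():
        for hap in haplotypes:
            carriers[hap].add(population)
    shared = {hap: pops for hap, pops in carriers.items() if len(pops) > 1}
    private = {hap: next(iter(pops)) for hap, pops in carriers.items() if len(pops) == 1}
    return shared, private

def within_population_repeats(haplotypes):
    """Haplotypes carried by more than one man of the same population, with counts."""
    return {hap: c for hap, c in Counter(haplotypes).items() if c > 1}

# Hypothetical populations and two-locus haplotypes
samples = {
    "Pop1": [(13, 14), (13, 14), (13, 15)],
    "Pop2": [(13, 15), (16, 14)],
    "Pop3": [(12, 12), (12, 12), (12, 12)],
}
shared, private = haplotype_sharing(samples)
print("shared:", shared)
print("private:", private)
print("repeated within Pop3:", within_population_repeats(samples["Pop3"]))
```

In this toy example, "Pop3" plays the role described for the Zoe: all of its men share haplotypes with each other, but none with other populations.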

Figure 13- Phylogenetic network of South Amerindians constructed with the information from 15 Y-STRs and 5 Y-SNPs within haplogroup Q.

Comparison between Peruvian populations

To further dissect the genetic structure of populations from Peru, we performed AMOVA using the STR data provided in Tineo et al (2015). The groupings used in this AMOVA were linguistic clade affiliation and geographical location (Andes or Amazon).

            Linguistic clade     Geographical region
FCT         0.16270              -0.03469
P-value     0.31426 ± 0.00440    0.28406 ± 0.00413

Table 9- FCT (variance between groups) and P-values of non-differentiation in an AMOVA based on the information of 15 Y-STRs of 11 Peruvian populations.


As in the previous AMOVAs, the P-values of the FCT were not statistically significant, meaning that neither language nor geography seems to contribute to the structure of Peruvian Amerindians. According to a recent genome-wide study (Harris et al 2018), the groups from the Andes, the Amazon and coastal Peru diverged rapidly, about 12,000 years ago, following the initial peopling of each region. Afterwards, migration was mainly dominated by gene flow from the Andes toward the Amazon and the coast. To evaluate whether such a trend was imprinted in the male lineages of Peruvian Amerindians, it would be necessary to compare populations from the three regions. This was impossible in this study, owing to the absence of data from coastal groups. However, the AMOVA results obtained here suggest that at least the Andes/Amazon barrier did not create differentiation between Andean and Amazonian groups.

A network was also constructed (Figure 14), including 8 populations besides the Asháninka: Cashibo (Di Corcia et al, 2017), Shipibo (Di Corcia et al, 2017), Shipibo-Conibo (Roewer et al, 2013), (Roewer et al, 2013), Chuquibamba (Roewer et al, 2013), Chumbivilca (Roewer et al 2013), Huambisa (Di Corcia et al, 2017) and a second sample of Asháninka (Di Corcia et al, 2017).

Figure 14- Phylogenetic network of Peruvian Native Americans constructed with the information from 15 Y-STRs and 5 Y-SNPs within haplogroup Q.

As in the previous network, this one did not show any central haplotype with radiating lineages. However, a small star-like structure was present in one of the sub-branches of the network (left part of Figure 14). It contained a central haplotype, shared by several populations, from which multiple haplotypes differing by a single step branched out. Interestingly, this group of haplotypes was only present among the Huanca, Chumbivilca, Yanesha and Chuquibamba, four populations living at high altitude in the Andes. The pattern observed might therefore indicate that this branch originated in the Andean region of present-day Peru, and that its expansion and diversification occurred in loco.
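A star-like structure of the kind described above can be flagged by searching for the haplotype that has the most neighbours one mutational step away. The sketch below is a simple heuristic under the stepwise mutation model, not a formal test of population expansion. The function names and haplotypes are hypothetical.

```python
def step_distance(h1, h2):
    """Single-step mutations separating two Y-STR haplotypes (stepwise model)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def star_center(haplotypes):
    """Distinct haplotype with the most one-step neighbours, and those neighbours."""
    distinct = sorted(set(haplotypes))
    neighbours = {c: [h for h in distinct if step_distance(c, h) == 1] for c in distinct}
    center = max(distinct, key=lambda c: len(neighbours[c]))
    return center, neighbours[center]

# Hypothetical sample: one central haplotype, three one-step derivatives, one outlier
sample = [(13, 14, 12), (13, 14, 12), (14, 14, 12), (13, 15, 12), (13, 14, 11), (17, 18, 15)]
center, leaves = star_center(sample)
print("center:", center)
print("one-step neighbours:", leaves)
```

In a real analysis, the populations carrying the centre and its neighbours would then be checked, as done above for the Huanca, Chumbivilca, Yanesha and Chuquibamba.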

Additionally, the network shows again that relatively few haplotypes are shared by more than one population and that the vast majority of haplotypes are unique. It also shows many haplotypes that are highly differentiated from each other. Taken as a whole, the results demonstrate that the heterogeneity of lineages in Peruvian Amerindians is not much lower than in the entire pool of South Americans.

In the network of Figure 14, the branches containing the haplotypes found here to fall into the two Q-SA05 sublineages are very well distinguished, and these haplotypes were not shared by other Peruvians. The Q-SA05 haplotypes of the Asháninka show no molecular affinities with any other haplotypes detected in Peru (Figure 14) or, more broadly, in South America (Figure 13). Together with the results of Jota et al (2016), who encountered such lineages only in groups from Peru and Bolivia, this suggests that Q-SA05 might have a quite restricted geographical and population distribution. However, better coverage of South American Amerindians is still needed to address this issue.

Conclusion


Until recently, the biggest question regarding Native Americans was how they colonized the Americas millennia ago. Nowadays, however, the focus is beginning to shift to more detailed unknowns, including how the genetic diversity of South Amerindians is structured.

The results obtained here were consistent with previous studies (Roewer et al, 2013; Rojas et al, 2010). This leads to the conclusion that the complex demographic past of Native Americans hinders clear insights into the process of dispersal throughout South America. This is assumed to have been a very rapid process, during which the first wave of Native Americans covered long distances and settled the entire continent, with groups subsequently remaining geographically isolated from one another. Moreover, the post-Columbian invasions strongly depleted the genetic diversity of pre-Columbian Amerindians, erasing signatures of ancestral patterns that could potentially give important insights into their past (Salzano & Callegari-Jacques, 1988; Crawford, 1998).

Although the Asháninka are well integrated in the context of male diversity of Amerindian populations, they did not show particular affinities with any of them. This agrees with the known panorama of South American populations, including those from Peru, which are proving to be extraordinarily heterogeneous, even if lacking a well-defined pattern of structuring (Cabana et al, 2016). In Peru, the difficulty in capturing the factors underlying the ancestral history of the Native Americans who currently live there may be due to the strong impact on Peruvian demography of two conquests. One was the Spanish conquest of South America. The other was the rise of the Incas, a civilization that flourished in ancient Peru and ultimately built the largest empire in pre-Columbian America; the Inca ruled the Peruvian region until the mid-16th century, forcing intense migration of other peoples (Sandoval et al, 2013).

A recent genome-wide study provided new insights into the indigenous peoples and migration dynamics of Peru. It showed that the initial inhabitants of Peru quickly diverged into groups that settled the three regions that dominate the country (the coast, the Amazon and the Andes), with subsequent migrations mostly descending from the Andes to both the Amazon and the coast (Harris et al, 2018). The study of uniparental lineages has not yet captured such a trend. Nevertheless, we were able to show that increasing the resolution of Y-chromosome markers uncovers layers of diversity that remain hidden with low-resolution panels. Exploring those layers has the potential to provide a better understanding of Amerindian genetic structure, namely by including newly discovered SNPs downstream of Q-M3 in multiplex systems, together with the use of a larger Y-STR set.

Jota et al (2016) reported several SNPs downstream of M3 whose frequencies vary greatly among populations. For instance, SA05 was found at moderate frequency in Peru and Bolivia but is very rare in Brazil, whereas other haplogroups occur at moderate frequency in Brazil but are rarely encountered in Peru. In this study, we found that SA05 likely encompasses two branches present in Peruvian Amerindians, and that within Q-M3 there is an STR-defined sub-branch that was also detected only in Peruvian Amerindians. These findings lead us to predict that an even finer analysis of SNPs downstream of M3 can open new avenues toward a better understanding of the colonization of the New World.


Bibliography


AIDESEP, FORMABIAP, FUNDACIÓN TELEFÓNICA (2000). El ojo verde. Cosmovisiones amazónicas. Lima: AIDESEP, FORMABIAP, Fundación Telefónica.


Altshuler, D. M., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Lacroute, P. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56–65. https://doi.org/10.1038/nature11632

Amorim, A., & Pereira, L. (2005). Pros and cons in the use of SNPs in forensic kinship investigation: A comparative analysis with STRs. Forensic Science International, 150(1), 17–21. https://doi.org/10.1016/j.forsciint.2004.06.018

Auton, A., Abecasis, G. R., Altshuler, D. M., Durbin, R. M., Bentley, D. R., Chakravarti, A., Schloss, J. A. (2015). A global reference for human genetic variation. Nature, 526(7571), 68–74. https://doi.org/10.1038/nature15393

Ballantyne, K. N., Goedbloed, M., Fang, R., Schaap, O., Lao, O., Wollstein, A., … Kayser, M. (2010). Mutability of Y-chromosomal microsatellites: Rates, characteristics, molecular bases, and forensic implications. American Journal of Human Genetics, 87(3), 341–353. https://doi.org/10.1016/j.ajhg.2010.08.006

Ballantyne, K. N., Ralf, A., Aboukhalid, R., Achakzai, N. M., Anjos, M. J., Ayub, Q., … Kayser, M. (2014). Toward male individualization with rapidly mutating Y-chromosomal short tandem repeats. Human Mutation, 35(8), 1021–1032. https://doi.org/10.1002/humu.22599

Bandelt, H. J., Forster, P., Sykes, B. C., & Richards, M. B. (1995). Mitochondrial portraits of human populations using median networks. Genetics, 141(2), 743–753.

Bandelt, H.-J., Forster, P., & Röhl, A. (1999). Median-joining networks for inferring intraspecific phylogenies. Molecular Biology and Evolution, 16(1), 37–48. https://doi.org/10.1093/oxfordjournals.molbev.a026036

Battaglia, V., Grugni, V., Perego, U. A., Angerhofer, N., Gomez-Palmieri, J. E., Woodward, S. R., … Semino, O. (2013). The First Peopling of South America: New Evidence from Y-Chromosome Haplogroup Q. PLoS ONE, 8(8). https://doi.org/10.1371/journal.pone.0071390

Bisso-Machado, R., Jota, M. S., Ramallo, V., Paixão-Côrtes, V. R., Lacerda, D. R., Salzano, F. M., … Bortolini, M. C. (2011). Distribution of Y-chromosome Q lineages in Native Americans. American Journal of Human Biology, 23(4), 563–566. https://doi.org/10.1002/ajhb.21173

Blanco-Verea, A., Jaime, J. C., Brión, M., & Carracedo, A. (2010). Y-chromosome lineages in native South American population. Forensic Science International: Genetics, 4(3), 187–193. https://doi.org/10.1016/j.fsigen.2009.08.008

Bortolini, M. C., Salzano, F. M., Thomas, M. G., Stuart, S., Nasanen, S. P. K., Bau, C. H. D., … Ruiz-Linares, A. (2003). Y-chromosome evidence for differing ancient demographic histories in the Americas. American Journal of Human Genetics, 73(3), 524–539.

Butler, J. M. (2003). Recent Developments in Y-Short Tandem Repeat and Y-Single Nucleotide Polymorphism Analysis. Forensic Science Review, 15(2), 91–111. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/26256727

Cabana, G. S., Lewis, C. M., Tito, R. Y., Covey, R. A., Cáceres, A. M., Cruz, A. F. D. La, … Stone, A. C. (n.d.). Population Genetic Structure of Traditional Populations in the Peruvian Central Andes and Implications for South American Population History.

Cárdenas, J. M., Heinz, T., Pardo-Seco, J., Álvarez-Iglesias, V., Taboada-Echalar, P., Sánchez-Diz, P., … Salas, A. (2015). The multiethnic ancestry of Bolivians as revealed by the analysis of Y-chromosome markers. Forensic Science International: Genetics, 14, 210–218. https://doi.org/10.1016/j.fsigen.2014.10.023

Cardoso, S., Alfonso-Sánchez, M. A., Valverde, L., Sánchez, D., Zarrabeitia, M. T., Odriozola, A., … De Pancorbo, M. M. (2012). Genetic uniqueness of the Waorani tribe from the Ecuadorian Amazon. Heredity, 108(6), 609–615. https://doi.org/10.1038/hdy.2011.131

CVR (2003). “Los pueblos indígenas y el caso de los asháninkas”. Informe Final de la Comisión de la Verdad y Reconciliación. Tomo V, capítulo 2.

The Y Chromosome Consortium (2002). A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Research, 12(2), 339–348. https://doi.org/10.1101/gr.217602

Cortez, D., Marin, R., Toledo-Flores, D., Froidevaux, L., Liechti, A., Waters, P. D., Kaessmann, H. (2014). Origins and functional evolution of y chromosomes across mammals. Nature, 508(7497), 488–493. https://doi.org/10.1038/nature13151

de Acosta, J., Mangan, J. E., Lopez-Morillas, F., & Mignolo, W. D. (2002). Natural and moral history of the Indies. Durham, NC: Duke University Press.

Di Corcia, T., Sanchez Mellado, C., Davila Francia, T. J., Ferri, G., Sarno, S., Luiselli, D., & Rickards, O. (2017). East of the Andes: The genetic profile of the Peruvian Amazon populations. American Journal of Physical Anthropology, 163(2), 328–338. https://doi.org/10.1002/ajpa.23209

Dillehay, T. D., Ramírez, C., Pino, M., Collins, M. B., & Rossen, J. (2008). Monte Verde: Seaweed, food, medicine, and the peopling of South America. Science, 320(5877), 784–786.

Espinosa, O. (1993). “Las rondas asháninka y la violencia política en la selva central”. América Indígena, número 4, 79–101.

Excoffier, L., & Lischer, H. E. L. (2010). Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564–567.

Fagundes, N. J. R., Kanitz, R., Eckert, R., Valls, A. C., Bogo, M. R., Salzano, F.M., Bonatto, S. L. (2008). Mitochondrial population genomics supports a single pre-Clovis origin with a coastal route for the peopling of the Americas. Am J Hum Genet, 82(3), 583–592. https://doi.org/10.1016/j.ajhg.2007.11.013.

Gayà-Vidal, M., Moral, P., Saenz-Ruales, N., Gerbault, P., Tonasso, L., Villena, M., … Dugoujon, J. M. (2011). MtDNA and Y-chromosome diversity in Aymaras and Quechuas from Bolivia: Different stories and special genetic traits of the Andean Altiplano populations. American Journal of Physical Anthropology, 145(2), 215–230. https://doi.org/10.1002/ajpa.21487

Geppert, M., Ayub, Q., Xue, Y., Santos, S., Ribeiro-Dos-Santos, Baeta, M., … Roewer, L. (2015). Identification of new SNPs in native South American populations by resequencing the y chromosome. Forensic Science International: Genetics, 15, 111– 114. https://doi.org/10.1016/j.fsigen.2014.09.014

Geppert, M., Baeta, M., Núñez, C., … Roewer, L. (2011). Hierarchical Y-SNP assay to study the hidden diversity and phylogenetic relationship of native populations in South America. Forensic Science International: Genetics, 5, 100–104. https://doi.org/10.1016/j.fsigen.2010.08.016

Goebel, T., Waters, M. R., & O’Rourke, D. H. (2008). The Late Pleistocene dispersal of modern humans in the Americas. Science, 319(5869), 1497–1502. https://doi.org/10.1126/science.1153569

González-José, R., González-Martín, A., Hernández, M., Pucciarelli, H. M., Sardi, M., Rosales, A., & van der Molen, S. (2003). Craniometric evidence for palaeoamerican survival in Baja California. Nature, 425, 62–65. https://doi.org/10.1038/nature01923

Greenberg, J. H., Turner, C. G., II, & Zegura, S. L. (1986). The Settlement of the Americas: A Comparison of the Linguistic, Dental, and Genetic Evidence. Current Anthropology, 27(5), 477. https://doi.org/10.1086/203472

Griffiths, A. J. F., Miller, J. H., Suzuki, D. T., et al. (2000). An introduction to genetic analysis (7th ed.). New York: W. H. Freeman.

Guevara, E. K., Palo, J. U., Guillén, S., & Sajantila, A. (2016). MtDNA and Y-chromosomal diversity in the Chachapoya, a population from the northeast Peruvian Andes-Amazon divide. American Journal of Human Biology, 28(6), 857–867. https://doi.org/10.1002/ajhb.22878

Gusmão, L., Sánchez-Diz, P., Calafell, F., Martín, P., Alonso, C. A., Álvarez- Fernández, F., … Amorim, A. (2005). Mutation rates at Y chromosome specific microsatellites. Human Mutation, 26(6), 520–528. https://doi.org/10.1002/humu.20254

Handley, L. J. L., Manica, A., Goudet, J., & Balloux, F. (2007). Going the distance: human population genetics in a clinal world. Trends in Genetics, 23(9), 432–439. https://doi.org/10.1016/j.tig.2007.07.002

Harris, D. N., Song, W., Shetty, A. C., Levano, K. S., Cáceres, O., Padilla, C., … Guio, H. (2018). Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire. Proceedings of the National Academy of Sciences, 115(28), E6526–E6535. https://doi.org/10.1073/pnas.1720798115

Harris, H. (1966). Enzyme polymorphisms in man. Proceedings of the Royal Society of London. Series B, Biological Sciences, 164, 298–310.

Hrdlicka, A. (1928). The origin and antiquity of the American Indian. Washington: U.S. Government Printing Office.

Hughes-Stamm, S. R., Ashton, K. J., & Van Daal, A. (2011). Assessment of DNA degradation and the genotyping success of highly degraded samples. International Journal of Legal Medicine, 125(3), 341–348. https://doi.org/10.1007/s00414-010-0455-3

Jobling, M. A., Pandya, A., & Tyler-Smith, C. (1997). The Y chromosome in forensic analysis and paternity testing. International Journal of Legal Medicine, 110(3), 118–124. https://doi.org/10.1007/s004140050050

Jobling, M. A. (2012). The impact of recent events on human genetic diversity. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1590), 793– 799. https://doi.org/10.1098/rstb.2011.0297

Jobling, M. A., & Tyler-Smith, C. (2003). The human Y chromosome: An evolutionary marker comes of age. Nature Reviews Genetics, 4(8), 598–612. https://doi.org/10.1038/nrg1124

Jobling, M. A., & Tyler-Smith, C. (2017). Human Y-chromosome variation in the genome-sequencing era. Nature Reviews Genetics, 18(8), 485–497. https://doi.org/10.1038/nrg.2017.36

Jota, M. S., Lacerda, D. R., Sandoval, J. R., Vieira, P. P. R., Ohasi, D., Santos-Júnior, J. E., … Santos, F. R. (2016). New native South American Y chromosome lineages. Journal of Human Genetics, 61(7), 1–11. https://doi.org/10.1038/jhg.2016.26

Karafet, T. M., Zegura, S. L., Posukh, O., Osipova, L., Bergen, A., Long, J., … Hammer, M. F. (1999). Ancestral Asian source(s) of new world Y-chromosome founder haplotypes. American Journal of Human Genetics, 64(3), 817–831. https://doi.org/10.1086/302282

Karafet, T. M., Mendez, F. L., Meilerman, M. B., Wei, W., Ayub, Q., Chen, Y., … Hammer, M. F. (2008). New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Research, 18(5), 830–838. https://doi.org/10.1101/gr.7172008

Kitchen, A., Miyamoto, M. M., & Mulligan, C. J. (2008). A three-stage colonization model for the peopling of the Americas. PLoS ONE, 3(2). https://doi.org/10.1371/journal.pone.0001596

Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., … International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921. https://doi.org/10.1038/35057062

Lewontin, R. C. (1972). The apportionment of human diversity. Evolutionary Biology, 6, 381–398.

Li, W. H., & Sadler, L. A. (1991). Low nucleotide diversity in man. Genetics, 129(2), 513–523.

Llamas, B., Fehren-Schmitz, L., Valverde, G., Soubrier, J., Mallick, S., Rohland, N., … Haak, W. (2016). Ancient mitochondrial DNA provides high-resolution timescale of the peopling of the Americas. Science Advances, 2(4), e1501385. https://doi.org/10.1126/sciadv.1501385

López, S., van Dorp, L., & Hellenthal, G. (2015). Human dispersal out of Africa: A lasting debate. Evolutionary Bioinformatics, 11, 57–68. https://doi.org/10.4137/EBo.s33489

Mazières, S., Sevin, A., Callegari-Jacques, S. M., Crubézy, E., Larrouy, G., Dugoujon, J. M., & Salzano, F. M. (2009). Population genetic dynamics in the French Guiana region. American Journal of Human Biology, 21(1), 113–117. https://doi.org/10.1002/ajhb.20835

Nachman, M. W., & Crowell, S. L. (2000). Estimate of the mutation rate per nucleotide in humans. Genetics, 156(1), 297–304.

Pena, S. D. J., Santos, F. R., Bianchi, N., Bravi, C. M., Carnese, F. R., Rothhammer, F., Gerelsaikhan, T., Munkhtuja, B., & Oyunsuren, T. (1995). Identification of a major founder Y-chromosome haplotype in Amerindians. Nature Genetics, 11, 15–16.

Perego, U. A., Angerhofer, N., Pala, M., Olivieri, A., Lancioni, H., Kashani, B. H., … Torroni, A. (2010). The initial peopling of the Americas: A growing number of founding mitochondrial genomes from Beringia. Genome Research, 20(9), 1174–1179. https://doi.org/10.1101/gr.109231.110

Pinto, N., Gusmão, L., & Amorim, A. (2014). Mutation and mutation rates at Y chromosome specific Short Tandem Repeat Polymorphisms (STRs): A reappraisal. Forensic Science International: Genetics, 9(1), 20–24. https://doi.org/10.1016/j.fsigen.2013.10.008

Potter, B. A., Baichtal, J. F., Beaudoin, A. B., Fehren-Schmitz, L., Haynes, C. V., Holliday, V. T., … Surovell, T. A. (2018). Current evidence allows multiple models for the peopling of the Americas. Science Advances, 4(8), eaat5473. https://doi.org/10.1126/sciadv.aat5473

Qamar, R., Ayub, Q., Mohyuddin, A., Helgason, A., Mazhar, K., Mansoor, A., Mehdi, S. Q. (2002). Y-Chromosomal DNA Variation in Pakistan. The American Journal of Human Genetics, 70(5), 1107–1124. https://doi.org/10.1086/339929

Raghavan, M., Steinrücken, M., Harris, K., Schiffels, S., Rasmussen, S., DeGiorgio, M., Willerslev, E. (2015). Genomic evidence for the Pleistocene and recent population history of Native Americans. Science, 349(6250), aab3884. https://doi.org/10.1126/science.aab3884

Roewer, L., Arnemann, J., Spurr, N. K., Grzeschik, K. H., & Epplen, J. T. (1992). Simple repeat sequences on the human Y chromosome are equally polymorphic as their autosomal counterparts. Human Genetics, 89(4), 389–394. https://doi.org/10.1007/BF00194309

Roewer, L., Croucher, P. J. P., Willuweit, S., Lu, T. T., Kayser, M., de Knijff, P., & Jobling, M. A. (2005). Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Human Genetics, 116(4), 279–291. https://doi.org/10.1007/s00439-004-1201-z

Roewer, L., Nothnagel, M., Gusmão, L., Corach, D., Sala, A., Alechine, E., … Ewart, E. (2013). Continent-Wide Decoupling of Y-Chromosomal Genetic Variation from Language and Geography in Native South Americans. PLoS Genetics, 9(4), e1003460. https://doi.org/10.1371/journal.pgen.1003460

Rojas Zolezzi, E. (1994). Los asháninka, un pueblo tras el bosque.

Saiki, R., Scharf, S., Faloona, F., Mullis, K., Horn, G., Erlich, H., & Arnheim, N. (1985). Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science, 230(4732), 1350–1354. https://doi.org/10.1126/science.2999980

Saks, M. J., & Koehler, J. J. (2005). The Coming Paradigm Shift in Forensic Identification Science. Science, 309, 892–895. https://doi.org/10.1126/science.1111565

Sandoval, J. R., Lacerda, D. R., Jota, M. S. A., Salazar-Granara, A., Vieira, P. P. R., Acosta, O., … Genographic Project Consortium. (2013). The genetic history of indigenous populations of the Peruvian and Bolivian Altiplano: The legacy of the Uros. PLoS ONE, 8(9). https://doi.org/10.1371/journal.pone.0073006

Santos Granero, F. (1992). Etnohistoria de la Alta Amazonía, Siglos XV al XVIII. Quito: Abya Yala.

Sevini, F., Yao, D. Y., Lomartire, L., Barbieri, A., Vianello, D., Ferri, G., … Franceschi, Z. A. (2013). Analysis of Population Substructure in Two Sympatric Populations of Gran Chaco, Argentina. PLoS ONE, 8(5). https://doi.org/10.1371/journal.pone.0064054

Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P. J., Cordum, H. S., Hillier, L., Brown, L. G., … Page, D. C. (2003). The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, 423, 825–837.

Slooten, K., & Ricciardi, F. (2013). Estimation of mutation probabilities for autosomal STR markers. Forensic Science International: Genetics, 7(3), 337–344. https://doi.org/10.1016/j.fsigen.2013.01.006

Sobrino, B., Brión, M., & Carracedo, A. (2005). SNPs in forensic genetics: A review on SNP typing methodologies. Forensic Science International, 154(2–3), 181–194. https://doi.org/10.1016/j.forsciint.2004.10.020

Stanciu, F., Cuţâr, V., Pîrlea, S., Stoian, V., Stoian, I. M., Sevastre, O., & Popescu, O. R. (2010). Population data for Y-chromosome haplotypes defined by 17 STRs in South-East Romania. Legal Medicine, 12(5), 259–264. https://doi.org/10.1016/j.legalmed.2010.05.007

Tarazona-Santos, E., Carvalho-Silva, D. R., Pettener, D., Luiselli, D., De Stefano, G. F., Labarga, C. M., … Tyler-Smith, C. (2001). Genetic Differentiation in South Amerindians Is Related to Environmental and Cultural Diversity: Evidence from the Y Chromosome. The American Journal of Human Genetics, 68(6), 1485–1496.

Tineo, D. H., Loiola, S., Paredes, F. V., Noli, L. R., Amaya, Y. C., Simão, F., … Gusmão, L. (2015). Genetic characterization of 27 Y-STR loci in the native population of Ashaninka from Peru. Forensic Science International: Genetics Supplement Series, 5, 220–222.

Tirado, M., López-Parra, A. M., Baeza, C., Bert, F., Corella, A., Pérez-Pérez, A., … Arroyo-Pardo, E. (2009). Y-chromosome haplotypes defined by 17 STRs included in AmpFlSTR Yfiler PCR Amplification Kit in a multi ethnical population from El Beni Department (North Bolivia). Legal Medicine, 11(2), 101–103. https://doi.org/10.1016/j.legalmed.2008.09.002

Tishkoff, S. A., & Verrelli, B. C. (2003). Patterns of Human Genetic Diversity: Implications for Human Evolutionary History and Disease. Annual Review of Genomics and Human Genetics, 4(1), 293–340. https://doi.org/10.1146/annurev.genom.4.070802.110226

Underhill, P. A., Jin, L., Zemans, R., Oefner, P. J., & Cavalli-Sforza, L. L. (1996). A pre-Columbian Y chromosome-specific transition and its implications for human evolutionary history. Proceedings of the National Academy of Sciences of the United States of America, 93(1), 196–200.

Underhill, P. A., Jin, L., Lin, A. A., Qasim Mehdi, S., Jenkins, T., Vollrath, D., … Oefner, P. J. (1997). Detection of numerous Y chromosome biallelic polymorphisms by denaturing high-performance liquid chromatography. Genome Research, 7(10), 996–1005. https://doi.org/10.1101/gr.7.10.996

Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B. C., Remm, M., & Rozen, S. G. (2012). Primer3-new capabilities and interfaces. Nucleic Acids Research, 40(15), e115. https://doi.org/10.1093/nar/gks596

Vallone, P. M., & Butler, J. M. (2004). AutoDimer: A screening tool for primer-dimer and hairpin structures. BioTechniques, 37(2), 226–231.

Vicard, P., Dawid, A. P., Mortera, J., & Lauritzen, S. L. (2008). Estimating mutation rates from paternity casework. Forensic Science International: Genetics, 2(1), 9–18. https://doi.org/10.1016/j.fsigen.2007.07.002

Wang, Y., Zhang, Y. J., Zhang, C. C., Li, R., Yang, Y., Ou, X. L., … Sun, H. Y. (2016). Genetic polymorphisms and mutation rates of 27 Y-chromosomal STRs in a Han population from Guangdong Province, Southern China. Forensic Science International: Genetics, 21, 5–9. https://doi.org/10.1016/j.fsigen.2015.09.013

Weiss, G. (2005). Campas Ribereños. In F. Santos Granero & F. Barclay (Eds.), Guía etnográfica de la Alta Amazonía (Vol. 5, pp. 1–74). Lima: IFEA, Smithsonian Tropical Research Institute.

Willuweit, S., & Roewer, L. (2007). Y chromosome haplotype reference database (YHRD): Update. Forensic Science International: Genetics, 1(2), 83–87. https://doi.org/10.1016/j.fsigen.2007.01.017

Xavier, C., Builes, J. J., Gomes, V., Ospino, J. M., Aquino, J., Parson, W., … Goios, A. (2015). Admixture and genetic diversity distribution patterns of non-recombining lineages of Native American ancestry in Colombian populations. PLoS ONE, 10(3), 1–13. https://doi.org/10.1371/journal.pone.0120155

Xue, Y., & Tyler-Smith, C. (2010). The hare and the tortoise: One small step for four SNPs, one giant leap for SNP-kind. Forensic Science International: Genetics, 4(2), 59–61. https://doi.org/10.1016/J.FSIGEN.2009.08.005

Zegura, S. L., Karafet, T. M., Zhivotovsky, L. A., & Hammer, M. F. (2004). High-Resolution SNPs and Microsatellite Haplotypes Point to a Single, Recent Entry of Native American Y Chromosomes into the Americas. Molecular Biology and Evolution, 21(1), 164–175. https://doi.org/10.1093/molbev/msh009

Web Resources

http://www.coha.org/the-ashaninka-illegal-logging-threatening-indigenous-rights-and-sustainable-development-in-the-peruvian-amazon/

https://genome.ucsc.edu/cgi-bin/hgBlat?command=start

http://www.transpacificproject.com/index.php/transpacific-migrations

www.who.int


Appendix


Supplemental Table 1 - Haplogroups and Y-STR profiles of the Asháninka samples


Supplemental Table 1 (cont.) - Haplogroups and Y-STR profiles of the Asháninka samples


| Population | Country | Linguistic Clade | Altitude | N-S Cline |
|---|---|---|---|---|
| Asháninka | Peru | Equatorial-Tucanoan | High | Central |
| Chachapoya | Peru | Andean | High | Central |
| Jivaro | Peru | Andean | High | Central |
| Huancas | Peru | Andean | High | Central |
| Cajamarca | Peru | Andean | High | Central |
| Yanesha | Peru | Equatorial-Tucanoan | High | Central |
| Shipibo | Peru | Ge-Pano-Carib | Sea Level | Central |
| Cashibo | Peru | Ge-Pano-Carib | Sea Level | Northwest |
| Embera-Chamí | Colombia | Chibcan-Paezan | Sea Level | Northwest |
| Guambiano | Colombia | Chibcan-Paezan | Intermediate | Northwest |
| Coconuco | Colombia | Chibcan-Paezan | Intermediate | Northwest |
| Aymara | Bolivia | Andean | Sea Level | Central |
| Quechua | Bolivia | Andean | Intermediate | Central |
| Yurucare | Bolivia | Equatorial-Tucanoan | Sea Level | Central |
| Chimane | Bolivia | Ge-Pano-Carib | Sea Level | Central |
| Trinitario | Bolivia | Equatorial-Tucanoan | Sea Level | Central |
| Awa-Guajá | Brazil | Equatorial-Tucanoan | Sea Level | Central |
| Terena | Brazil | Equatorial-Tucanoan | Sea Level | Central |
| Zoé | Brazil | Equatorial-Tucanoan | Sea Level | Central |
| Bari Boxi | Venezuela | Chibcan-Paezan | Sea Level | Northwest |
| Wayuu | Venezuela | Equatorial-Tucanoan | Sea Level | Northwest |
| Kichwa | Ecuador | Andean | Intermediate | Northwest |
| Waoroni | Ecuador | Isolate | Intermediate | Northwest |
| Pilaga | Argentina | Ge-Pano-Carib | Sea Level | Northwest |
| Mapuche | Argentina | Andean | Sea Level | Southeast |
| Diaguitas | Argentina | Isolate | Sea Level | Southeast |
| Kolla | Argentina | Andean | Sea Level | Southeast |
| Palikur | French Guiana | Equatorial-Tucanoan | Sea Level | Northwest |
| Emerillon | French Guiana | Equatorial-Tucanoan | Sea Level | Northwest |

Supplemental Table 2 - Classification of all populations in each grouping considered for the AMOVAs.


Column order: Asháninka, Yanesha, Chachapoya, Huancas, Jivaro, Cajamarca, Terena, Trinitario, Shipibo, Cashibo, Aymara, Awa-Guajá, Chimane, Zoe, Yurucare, Quechua, Wayuu, Bari Boxi, Embera-Chami, Coconuco, Wao, Guambiano, Kichwa, Pilaga, Kolla, Mapuche
Asháninka: - 0,00525 0,00000 0,00000 0,00584 0,94862 0,00000 0,00782 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,04059 0,00000 0,00000 0,01297 0,00000 0,00188 0,01366 0,00050 0,06326 0,00307 ±0,0008 ±0,0000 ±0,0000 ±0,0007 ±0,0025 ±0,0000 ±0,0009 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0018 ±0,0000 ±0,0000 ±0,0011 ±0,0000 ±0,0004 ±0,0011 ±0,0003 ±0,0026 ±0,0005
Yanesha: 0,05472 - 0,00000 0,00000 0,00010 0,49906 0,00000 0,07267 0,00010 0,00000 0,00000 0,00000 0,00000 0,00000 0,00198 0,00000 0,02812 0,00000 0,00040 0,00881 0,00000 0,01485 0,53688 0,01238 0,41560 0,01099 ±0,0000 ±0,0000 ±0,0001 ±0,0048 ±0,0000 ±0,0025 ±0,0001 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0004 ±0,0000 ±0,0017 ±0,0000 ±0,0002 ±0,0011 ±0,0000 ±0,0012 ±0,0043 ±0,0010 ±0,0043 ±0,0011
Chachapoya: 0,18143 0,28153 - 0,55519 0,01228 0,01891 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,04643 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 ±0,0053 ±0,0010 ±0,0014 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0021 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000
Huancas: 0,26253 0,33321 -0,01507 - 0,00208 0,09821 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00010 0,00000 0,00000 0,00020 0,08999 0,00000 0,00000 0,00000 0,00000 0,00277 0,00059 ±0,0005 ±0,0028 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001 ±0,0000 ±0,0000 ±0,0001 ±0,0027 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0005 ±0,0003
Jivaro: 0,07421 0,18866 0,07314 0,19643 - 0,26492 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00010 0,00000 0,00000 0,00475 0,00000 0,00010 0,00000 0,00000 0,00059 0,00000 ±0,0044 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001 ±0,0000 ±0,0000 ±0,0007 ±0,0000 ±0,0001 ±0,0000 ±0,0000 ±0,0002 ±0,0000
Cajamarca: -0,04328 -0,01196 0,10476 0,09974 0,01867 - 0,00040 0,62172 0,04406 0,00386 0,01634 0,00010 0,00901 0,00297 0,01445 0,00752 0,72914 0,00446 0,06673 0,42709 0,00099 0,41738 0,58143 0,75319 0,53510 0,50203 ±0,0002 ±0,0050 ±0,0021 ±0,0006 ±0,0013 ±0,0001 ±0,0009 ±0,0005 ±0,0012 ±0,0009 ±0,0040 ±0,0006 ±0,0023 ±0,0048 ±0,0003 ±0,0050 ±0,0048 ±0,0042 ±0,0047 ±0,0044
Terena: 0,31885 0,22601 0,43556 0,61995 0,40025 0,24011 - 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00010 0,00000 0,00000 0,00000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001 ±0,0000 ±0,0000 ±0,0000
Trinitario: 0,04724 0,02534 0,25282 0,31750 0,19167 -0,01798 0,25708 - 0,00040 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,01475 0,00000 0,00525 0,01426 0,00000 0,24087 0,55925 0,00703 0,10751 0,40729 ±0,0002 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0012 ±0,0000 ±0,0008 ±0,0012 ±0,0000 ±0,0037 ±0,0049 ±0,0009 ±0,0035 ±0,0050
Shipibo: 0,19677 0,21568 0,25620 0,39436 0,25124 0,10159 0,37115 0,12397 - 0,00010 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00020 0,00257 0,00366 0,00178 0,00020 0,00000 0,00000 0,00050 ±0,0001 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001 ±0,0005 ±0,0006 ±0,0005 ±0,0001 ±0,0000 ±0,0000 ±0,0002
Cashibo: 0,30474 0,30837 0,31687 0,43456 0,32129 0,19760 0,29435 0,20633 0,14925 - 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00059 0,01148 0,00000 0,00000 0,00000 0,00000 0,00010 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0002 ±0,0011 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001
Aymara: 0,13150 0,13878 0,27379 0,28975 0,19720 0,10691 0,30885 0,12631 0,25864 0,29379 - 0,00000 0,00000 0,00000 0,00030 0,24334 0,00000 0,00000 0,00010 0,00089 0,00000 0,00000 0,00030 0,00000 0,18909 0,00010 ±0,0000 ±0,0000 ±0,0000 ±0,0002 ±0,0043 ±0,0000 ±0,0000 ±0,0001 ±0,0003 ±0,0000 ±0,0000 ±0,0002 ±0,0000 ±0,0038 ±0,0001
Awa-Guajá: 0,45786 0,31307 0,49273 0,70140 0,70140 0,37302 0,76708 0,44055 0,75574 0,74794 0,52153 - 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000
Chimane: 0,46359 0,32255 0,48249 0,64252 0,67491 0,25449 0,65259 0,33628 0,58445 0,54205 0,52309 0,75566 - 0,00000 0,00030 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 ±0,0000 ±0,0002 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000
Zoe: 0,35750 0,36371 0,31705 0,54382 0,60681 0,26956 0,70296 0,24492 0,32893 0,35713 0,38040 0,86363 0,92958 - 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00059 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0002
Yurucare: 0,47546 0,25012 0,51971 0,62502 0,64797 0,25711 0,54616 0,36166 0,64800 0,58701 0,34585 0,79817 0,79905 0,93042 - 0,00040 0,00010 0,00000 0,00000 0,00089 0,00000 0,00000 0,00040 0,00000 0,00109 0,00020 ±0,0002 ±0,0001 ±0,0000 ±0,0000 ±0,0003 ±0,0000 ±0,0000 ±0,0002 ±0,0000 ±0,0003 ±0,0001
Quechua: 0,21232 0,20222 0,30655 0,28038 0,25078 0,16395 0,36263 0,19851 0,30602 0,33714 0,00586 0,54292 0,53362 0,40894 0,33685 - 0,00000 0,00000 0,00059 0,00307 0,00000 0,00000 0,00050 0,00000 0,08514 0,00010 ±0,0000 ±0,0000 ±0,0002 ±0,0006 ±0,0000 ±0,0000 ±0,0002 ±0,0000 ±0,0025 ±0,0001
Wayuu: 0,04619 0,06379 0,20800 0,31110 0,17155 -0,03454 0,33070 0,06851 0,27921 0,30504 0,22040 0,49941 0,39287 0,49714 0,43773 0,28906 - 0,00000 0,00000 0,04841 0,00000 0,00525 0,05891 0,08762 0,01970 0,03940 ±0,0000 ±0,0000 ±0,0022 ±0,0000 ±0,0007 ±0,0023 ±0,0029 ±0,0013 ±0,0022
Bari Boxi: 0,40648 0,29140 0,43528 0,57490 0,57555 0,24571 0,65896 0,43456 0,69368 0,67511 0,48517 0,47197 0,68226 0,85961 0,63893 0,49603 0,31144 - 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 0,00000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0000
Embera-Chami: 0,18105 0,14374 0,29164 0,28050 0,30294 0,08177 0,42630 0,42630 0,18696 0,29504 0,15568 0,50215 0,41684 0,21168 0,44072 0,16477 0,26036 0,53357 - 0,01139 0,00000 0,02861 0,01634 0,00000 0,03713 0,01564 ±0,0009 ±0,0000 ±0,0017 ±0,0011 ±0,0000 ±0,0019 ±0,0013
Coconuco: 0,14471 0,19300 0,10720 0,11909 0,23473 -0,01542 0,55209 0,55209 0,21788 0,30322 0,25537 0,69325 0,51921 0,40258 0,58230 0,28607 0,11657 0,55423 0,55423 - 0,00059 0,03465 0,00703 0,00327 0,01148 0,09623 ±0,0002 ±0,0017 ±0,0008 ±0,0005 ±0,0010 ±0,0030
Wao: 0,39165 0,37994 0,36907 0,50388 0,44053 0,30634 0,45091 0,27766 0,14878 0,09572 0,32212 0,79576 0,64649 0,35939 0,65726 0,35087 0,45527 0,76506 0,29840 0,37746 - 0,00000 0,00000 0,00000 0,00000 0,00010 ±0,0000 ±0,0000 ±0,0000 ±0,0000 ±0,0001
Guambiano: 0,09076 0,23748 0,35459 0,30296 -0,00811 0,44222 0,01535 0,14866 0,27837 0,25676 0,60416 0,45269 0,35045 0,62294 0,30709 0,13854 0,58311 0,09574 0,14074 0,36590 - 0,13127 0,01426 0,00594 0,33353 0,11273 ±0,0035 ±0,0013 ±0,0008 ±0,0046
Kichwa: -0,01115 0,27901 0,38225 0,28997 -0,03043 0,31386 0,00975 0,26250 0,30989 0,19610 0,47096 0,41606 0,54134 0,45392 0,25174 0,07149 0,43358 0,11949 0,24466 0,43693 0,04404 - 0,38471 0,23077 0,26423 0,08138 ±0,0047 ±0,0046 ±0,0043
Pilaga: 0,04628 0,24044 0,35871 0,21282 -0,02788 0,33879 0,04704 0,27494 0,32432 0,23883 0,37851 0,39904 0,40124 0,48300 0,31773 0,03410 0,37370 0,22113 0,21397 0,44217 0,07789 0,00123 - 0,01465 0,08969 0,06522 ±0,0012 ±0,0027
Kolla: -0,00145 0,23709 0,24481 0,17088 -0,02256 0,26974 0,04341 0,27261 0,30984 0,02450 0,46934 0,46324 0,50207 0,27622 0,06217 0,11347 0,35569 0,09516 0,20676 0,41170 0,16997 0,03111 0,09686 - 0,04782 0,05747 ±0,0021
Mapuche: 0,08235 0,20657 0,25986 0,21987 -0,01662 0,34741 0,00110 0,16764 0,18362 0,18314 0,48014 0,34334 0,23116 0,43842 0,24253 0,07108 0,45680 0,09900 0,08447 0,29377 0,00524 0,01639 0,03182 0,08897 - 0,09024

Supplemental Table 3 - Genetic diversity of the Amerindian populations according to the various resolution levels of 7 Y-STRs.