
Supporting Information Schroeder et al. “Origins and genetic legacies of the Taino” SI Text

1. Terminology

1.1 The term “Taino”. The term “Taino” is often used collectively to refer to the indigenous inhabitants of the Caribbean (1), but, as Keegan and Hofman (2) have pointed out, this assumes a level of homogeneity in the indigenous Caribbean that is unwarranted. At the time of contact, the Caribbean was a complex mosaic of different ethnic groups, which had considerable contact with each other and with the mainland (3, 4). Columbus simply referred to the people he met as “indios” in the mistaken belief that he had arrived in Asia. Later reports distinguish between the “peaceful Arawaks” who lived in the Greater Antilles and the Bahamas, where they were known as the Lucayans, and the more “warlike Caribs” who inhabited the Lesser Antilles. A third group known as the “Guanahatabey” (often mistakenly referred to as “Ciboney”) allegedly lived in western Cuba, where they had been pushed by the advancing Arawaks. The “name game” (2) has continued in more recent times; for example, the eminent Caribbean archaeologist Irving Rouse distinguished between the “Classic Taino”, who lived in eastern Cuba, Hispaniola, and Puerto Rico, and the “Eastern Taino” and “Western Taino”, who lived in the northernmost Lesser Antilles and in Jamaica and central Cuba, respectively (1). Bearing this in mind, we use the term “Taino” as a shorthand to refer collectively to the group of Arawakan speakers who inhabited large parts of the Caribbean at the time of European contact, and the term “Lucayan Taino”, or simply “Lucayan”, to refer to the indigenous inhabitants of the Bahamas.

1.2 Native American languages. Comparisons of linguistic and genetic variation can be a powerful tool for reconstructing human population history. In 1987, the American linguist Joseph Greenberg (5) proposed a classification scheme that split all Native American languages into three major groups, which he referred to as Amerind, Na-Dene, and Eskimo-Aleut.
Greenberg further suggested that this division represented three separate migrations into the Americas. Interestingly, genetic studies seem to support Greenberg’s three-way classification in that they suggest that modern Native Americans descend from at least three streams of Asian gene flow (6, 7). However, this does not imply that Greenberg’s categories and subdivisions are correct. In fact, while his classification scheme has been widely and uncritically used by human geneticists, it has been widely rejected by historical linguists who study Native American languages. As Bolnick et al. (8) pointed out, Greenberg’s method of “multilateral comparison” was fundamentally flawed, and its results are therefore utterly unreliable. The danger of using a classification system that has been almost universally rejected by specialists in the field is that it might obscure evolutionary relationships between groups or suggest links that do not exist. We therefore opted to break with tradition and adopt the more widely accepted language classifications proposed by Lyle Campbell and others (9–11). We note, however, that many of these classifications and proposed language families remain tentative.

2. Archaeological context

The archaeological site of Preacher’s Cave is located on the northern part of the island of Eleuthera in the Bahamas, next to Jean’s Bay and directly south of a reef system known as the Devil’s Backbone (Fig. S1). The cave received its moniker from a group of Puritans, known as the Eleutherian Adventurers, who were shipwrecked there in 1648 and sought refuge in the cave (12). Archaeological surveys and investigations at Preacher’s Cave revealed multiple phases of occupation dating to both the historic and the prehistoric period. The prehistoric inhabitants of the Bahamas were known as the Lucayans (13). The archipelago is thought to have been first settled around 800-900 AD (13, 14), although some culturally modified material and human remains from the site yielded earlier dates (see Table S1). By 1200 AD, the Lucayans and their customs reflected those of the Taino, a broader nexus of cultural traditions joined by powerful chiefdoms in the Greater Antilles, and from this period onward they are often referred to as Lucayan Taino (13, 14).

A total of six Lucayan primary burials were discovered within the cave. As for many peoples native to Mesoamerica and the Caribbean, caves were sacred spaces for the veneration of ancestors and were regarded as the origin of the cosmos and humankind (15). Of the six burials, three were well preserved and have been described elsewhere (15). These three burials belonged to two adult males and one female, aged 20-35 years at the time of death (15). Radiocarbon dating of the bones yielded dates between 320 and 1260 cal. AD (15). Two of the burials (burials 1 and 3) contained plaited matting, and the man in burial 3 was buried with a culturally modified triton shell and a cache containing 29 sunrise tellin shells, a nodule of red ochre, and a fish bone sacrifier (15). The cache may have been used for ritual body painting, and the triton shell may have been a status symbol. The plaited matting and grave furniture may indicate high-status individuals (15).
To date, the Lucayan Taino graves from Preacher’s Cave represent the most complete archaeologically documented prehistoric burials in the Bahamas and provide the greatest contribution to the understanding of Lucayan Taino deathways and mortuary practices (15–18).

3. Radiocarbon dating

To determine the age of sample PC537, we dated the specimen directly using radiocarbon dating. The most reliable method currently available is dating collagen from bone or dentine. However, in tropical environments collagen is often poorly preserved or not preserved at all (19), and as part of the dentine had been used for DNA analysis, the remaining sample was very small. Consequently, we turned to the enamel fraction. While collagen can be purified through various acid and base steps (20), separating exogenous from endogenous carbon in enamel is more difficult. Dental enamel is a carbonate-containing apatite (bioapatite), and the most likely source of contamination is expected to be groundwater carbonates, which are chemically indistinguishable from carbonates originally formed in the bioapatite. Chemical pre-treatment of bioapatite has therefore focused on removing carbonates in labile positions on the crystal surfaces and at grain boundaries (21–24). While the methodology is still far from being a standardised procedure, it is thought to provide a reliable terminus ante quem (21, 25, 26).

After DNA sampling, specimen PC537 was cleaned using an air-abrader with aluminium oxide powder at 40 psi and minimum powder flow. The remaining dentine fraction (55 mg) was sampled for radiocarbon dating of collagen with a diamond drill. Subsequently, the enamel fraction was removed from the cementum using a diamond drill (at less than 3000 rpm to avoid heating) in order to obtain a very fine powder. Research has shown that grinding or drilling to a fine powder increases the reliability of enamel dates, because contaminants at the grain boundaries are made accessible to chemical pre-treatment (21). Dating of the dentine fraction followed standard ORAU procedures for collagen extraction and dating (20). However, to avoid excessive sample loss, we skipped the ultrafiltration step. Finally, the sample was freeze-dried before undergoing combustion and graphitisation.

For enamel dating there is currently no standard protocol available, although several experimental protocols have been applied in the past (21, 25). Due to the very small sample size of 125.63 mg, experimental treatments with acetic acid, as frequently used on enamel and bone apatite, were deemed too risky: although they improve dating results, they cause a large amount of sample loss, requiring a starting sample weight of around 1-2 g (21). As research has shown that enamel dates, if inaccurate, are too young rather than too old (even in radiocarbon-depleted environments), it was important to maximise the likelihood that sufficient enamel powder would survive chemical pre-treatment for radiocarbon dating at ORAU (> 0.4 mg C) in order to establish a robust terminus ante quem. At the same time, contamination removal was important to obtain a date as close to the true age as possible. Treatments with HCl are easier to control and have been used previously with good results for the time period in question (26). The procedures presented in (26) were used as a guideline. The enamel powder was treated with 23 ml of 0.01 M HCl solution for 1 h at 4°C, then rinsed three times with Milli-Q water. The treatment time was reduced from the previously published 2 h, as experiments have shown that 1 h is sufficient to run the reaction to completion (as indicated by a neutral pH after treatment). Unfortunately, the small enamel sample size precluded the use of stronger acid solutions. After chemical pre-treatment, the sample was frozen and subsequently freeze-dried for 48 h. CO2 extraction followed the standard ORAU protocol for shell carbonates (20): the enamel sample was digested in vacuo in a Pyrex® reaction vessel alongside an IAEA-C1 marble standard with phosphoric acid (3 ml, 85%).
An in-house gas collection system, a Carlo Erba elemental analyzer and a Sercon stable isotope mass spectrometer were used to recycle the CO2. To optimize the conversion of CO2 from ‘very small’ samples to graphite, the desiccant magnesium perchlorate was added to the water trap of the reactor rigs as described by Motuzaite-Matuzeviciute et al. (27). While routine samples at ORAU of 0.8-1.8 mg C do not require this addition, it is critical for sample sizes of <0.8 mg C. The graphite targets were dated using the HVEE AMS system at ORAU (28). For subsequent radiocarbon calibration, IntCal13 (29) and OxCal 4.2 (30) were used.

Unfortunately, the collagen fraction dissolved completely during the ABA pre-treatment. However, the enamel fraction of 125.63 mg yielded 87.69 mg after the 0.01 M HCl treatment and produced a graphite target containing 0.46 mg C, which was sufficient for dating at ORAU. The minimum age is 1082 ± 29 BP (see Table S2 and Fig. S2), clearly placing the specimen in the pre-Columbian era. Considering that enamel radiocarbon ages provide minimum ages, even in radiocarbon-depleted environments (21), we can confidently place the specimen at the beginning of the 11th century AD or earlier. Previous experiments on dental enamel from similar, but slightly older, time periods showed offsets of around 10 to 100 years for chemically pre-treated fractions (for acetic acid and weak vacuum (21); for HCl (26)) and an offset of less than 110 years for untreated enamel (26). This means that the true age of enamel formation could be as much as a century older than the radiocarbon date obtained. Considering the young age of the sample and the exponential decay of radiocarbon, a larger offset seems unlikely. To visualise what this means in terms of calendar years, test radiocarbon ages with a 20, 40, 60, 80 and 100-year offset relative to the enamel date are calibrated in Fig. S3.
Due to the shape of the calibration curve in this time period (a plateau in the 9th century AD), all tests span at least one full century, and the variation between individual dates is small. The largest change occurs between the 20- and 40-year offsets, where the age range expands drastically as a consequence of the calibration curve plateau. As it is impossible to determine which offset is correct, the sum of all radiocarbon probability distributions between 1083 ± 29 and 1183 ± 29 BP was calculated to provide a more realistic, albeit conservative, calibrated age estimate for PC537 (see Fig. S4). It shows that the tooth was most likely formed in the 9th or 10th century AD, with a higher probability in the latter. This date clearly places the genome before the arrival of Columbus.
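The summation step can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it sums equally weighted normal distributions in radiocarbon-age space for offsets of 0 to 100 years added to 1083 ± 29 BP, in 1-year steps (the step size is our assumption), and it does not reproduce the calibration against IntCal13 in OxCal 4.2 that produced Fig. S4. The function names are hypothetical.

```python
import numpy as np


def normal_pdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """Normal probability density evaluated at each point of x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def summed_offset_density(measured_bp: float, sigma: float, max_offset: int,
                          step: int, grid: np.ndarray) -> np.ndarray:
    """Sum equally weighted normal densities centred on measured_bp + offset,
    for offset = 0, step, ..., max_offset (14C years), normalised to unit area.

    Works in radiocarbon-age space only; no calibration curve is applied.
    """
    offsets = np.arange(0, max_offset + step, step)
    density = np.zeros_like(grid, dtype=float)
    for offset in offsets:
        density += normal_pdf(grid, measured_bp + offset, sigma)
    dx = grid[1] - grid[0]                       # assumes a uniformly spaced grid
    return density / (density.sum() * dx)


grid = np.arange(900.0, 1400.0, 1.0)             # 14C years BP
density = summed_offset_density(1083, 29, 100, 1, grid)
mean_bp = float((grid * density).sum() * (grid[1] - grid[0]))
```

Because the offsets are spread evenly, the summed density is centred midway through the range (about 1133 BP); translating it into calendar years would additionally require the IntCal13 curve.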

4. Isotope analysis

Strontium isotope (87Sr/86Sr) analysis is a commonly employed method for archaeological provenance studies based on two main principles: 1) that geological 87Sr/86Sr is highly variable spatially owing to differences in the age and lithology of the underlying bedrock; and 2) that, owing to the relatively minor mass differential between the isotopes of strontium (Sr), it does not undergo substantial mass-dependent fractionation in most biochemical processes. The Sr isotope approach has proven to be particularly effective for sourcing the childhood (natal) origins of humans based on the analysis of 87Sr/86Sr in dental enamel (31). As enamel, unlike bone, is essentially an inert material after mineralization is complete, it preserves the isotopic signal of the biochemical environment in which it developed. Numerous studies (e.g., 32) have demonstrated that inner (core) enamel is relatively resistant to diagenetic contamination and that minor post-mortem alteration can be mitigated by standard treatment protocols. Several important factors need to be considered to apply the Sr isotope approach effectively, including assessing the extent of local (bioavailable) 87Sr/86Sr variation in order to distinguish locals from non-locals (33). Possible origins in the circum-Caribbean region can be explored by comparing the 87Sr/86Sr signals of non-locals with regional isotope data sets (34) or predictive models of spatial variation (35). However, geographic origins usually cannot be determined precisely with a high degree of reliability owing to the issue of equifinality, whereby multiple locales or regions share similar baseline (bioavailable) 87Sr/86Sr ranges.

In conjunction with 87Sr/86Sr, oxygen (δ18O) and carbon (δ13C) isotope analyses of dental enamel are routinely conducted to provide more robust identifications of non-locals and to contribute to more nuanced explorations of origins (e.g., 36). Natural variations in the oxygen isotope composition of precipitation and groundwater have been well established (37). Oxygen isotopes in fossil and skeletal remains have commonly been employed as a tool for paleoclimate reconstruction, and they formed the basis for the first studies that clearly demonstrated that the δ18O of bioapatite (δ18Oap) is highly correlated with δ18O in precipitation (δ18Oprec) (38, 39) and as such can be utilized for archaeological provenance studies. The basic premises are 1) that δ18Oprec is spatially highly variable in relation to climatic and geographic conditions (e.g., latitude, altitude, temperature, amount of precipitation, distance from the coast) and 2) that this variation is also reflected in the patterning of δ18Oap. Unlike strontium, oxygen does undergo substantial fractionation between consumed water and skeletal bioapatite (38). Therefore, these values are not directly comparable and require conversion (see refs. 40–42).

Carbon isotope analysis has a long history in archaeological research, primarily as a tool for dietary reconstruction (e.g., 43, 44). The principle of the method is based on the fact that large differences in the carbon isotope values (δ13C) of different food sources are passed on to the tissues of consumers. The primary sources of natural variation in the δ13C of plants relate to differences in isotopic fractionation during carbon fixation, whereby the δ13C values of plants utilizing the C3 versus the C4 (or CAM) photosynthetic pathways are highly distinct. In animal tissues, the primary sources of variation derive from isotopic enrichment related to trophic level effects and from differences in the background environmental δ13C of terrestrial versus marine ecosystems (43).
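The first step of such a conversion, re-expressing carbonate δ18O measured against VPDB on the VSMOW scale, can be sketched in Python using the widely used linear relation δ18O(VSMOW) = 1.03091 × δ18O(VPDB) + 30.91. The further carbonate-to-phosphate and phosphate-to-drinking-water equations of refs. 40–42 are deliberately not reproduced, and the 2‰ spread check mirrors the comparison made in Section 4.2. Function names are hypothetical.

```python
VPDB_TO_VSMOW_SLOPE = 1.03091
VPDB_TO_VSMOW_INTERCEPT = 30.91


def vpdb_to_vsmow(d18o_vpdb: float) -> float:
    """Re-express a carbonate delta-18O value (per mil) from VPDB to VSMOW."""
    return VPDB_TO_VSMOW_SLOPE * d18o_vpdb + VPDB_TO_VSMOW_INTERCEPT


def spread_within(values: list[float], max_spread: float = 2.0) -> bool:
    """True if max(values) - min(values) does not exceed max_spread (per mil)."""
    return max(values) - min(values) <= max_spread


# Range endpoints reported in Section 4.2 (VPDB); individual values are in Table S3.
endpoints_vpdb = [-3.6, -2.0]
endpoints_vsmow = [vpdb_to_vsmow(v) for v in endpoints_vpdb]
print([round(v, 2) for v in endpoints_vsmow])  # [27.2, 28.85]
print(spread_within(endpoints_vpdb))            # True
```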


4.1 Analytical procedure. For the present study, we carried out Sr, C and O isotope analyses on five dental enamel samples, following standard protocols (45, 46). The analyses were carried out at the Faculty of Earth and Life Sciences, VU University, Amsterdam. Crown fragments were first cleaned in ultra-pure H2O (Milli-Q). To remove extant organic residues and secondary carbonates that may be attached to the tooth surface, a modified version of the Bocherens et al. (45) pre-treatment protocol was applied, which entailed briefly soaking (4 hours) the enamel fragments in 2.5% NaOCl and then in 1M CH3COOH (Ca acetate buffer, pH=4.5). The enamel fragments were then mechanically abraded with a diamond-tipped rotary burr mounted on a dental drill to remove remaining surface deposits and to expose the inner core enamel, after which ~5 mg of powdered enamel was extracted. Strontium was purified from the sample matrix by dissolution in 3M HNO3 and passing the samples over Sr-specific resin loaded onto pre-cleaned quartz columns. 87Sr/86Sr was measured on a Thermo Finnigan MAT262 RPQplus thermal ionization mass spectrometer (TIMS). Long-term measurements of the standard reference material (NBS-987) produced a mean 87Sr/86Sr of 0.71025 ± 0.00003 (1σ), and the standard error (2SE) for each sample is <0.00001. Individual measurements were normalized and then corrected based on the generally accepted 87Sr/86Sr ratio of NBS-987 (0.71024). Sr concentrations of blanks are consistently negligible relative to the size of the loaded sample. Enamel carbon and oxygen isotope compositions were measured on a Finnigan DeltaPlus Isotope Ratio Mass Spectrometer (IRMS), following reaction of the sample with 100% H3PO4 (24 hours) and isolation of the produced carbon dioxide (CO2) with a Gasbench II universal automated interface. The long-term reproducibility of the standard reference material (NBS-19) is <0.2‰ for δ18O and <0.1‰ for δ13C. All δ18O and δ13C values referenced herein are reported in delta (δ) notation, in parts per thousand (‰) relative to the international PDB (Pee Dee Belemnite) standard, unless noted otherwise.
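The standard-based correction, and the comparison with the Bahamian reference range discussed in Section 4.2, can be sketched in Python. The text does not state the exact form of the correction; this sketch assumes a simple multiplicative scaling by the ratio of the accepted to the measured NBS-987 value, which is one common convention (an additive offset is another). The instrument's internal mass-fractionation normalization is not modelled, and the function names are hypothetical.

```python
ACCEPTED_NBS987 = 0.71024      # accepted 87Sr/86Sr of NBS-987 (from the text)
MEASURED_NBS987 = 0.71025      # long-term laboratory mean for NBS-987 (from the text)


def correct_to_standard(ratio: float,
                        measured_std: float = MEASURED_NBS987,
                        accepted_std: float = ACCEPTED_NBS987) -> float:
    """Scale a sample 87Sr/86Sr ratio so that the standard would read its accepted value.

    Assumes a multiplicative correction; the laboratory's exact procedure may differ.
    """
    return ratio * accepted_std / measured_std


def in_range(ratio: float, low: float, high: float) -> bool:
    """True if the ratio falls within the closed interval [low, high]."""
    return low <= ratio <= high


# Bahamian hardwood range reported by Ostapkowicz and colleagues (ref. 48)
BAHAMAS_WOOD = (0.70914, 0.70920)
SEAWATER = 0.70918

sample_mean = 0.70916          # reported (already corrected) mean for Preacher's Cave
print(in_range(sample_mean, *BAHAMAS_WOOD))      # True
print(round(abs(sample_mean - SEAWATER), 5))     # 2e-05
```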

4.2 Results. The 87Sr/86Sr values of all samples are homogeneous (mean = 0.70916 ± 0.00001, n = 5; see Table S3) and close to the reported average 87Sr/86Sr ratio of modern seawater (0.70918) (47). No local bioavailable 87Sr/86Sr data are yet available for the island of Eleuthera, but Ostapkowicz and colleagues (48) have recently reported wood 87Sr/86Sr values (mean = 0.70917, range = 0.70914-0.70920, n = 91) for several species of hardwood trees from other islands in the Bahamian archipelago (Long Island, Cat Island, and the Turks and Caicos) that are very similar to the ratios we obtained. We therefore conclude that the 87Sr/86Sr values from Preacher's Cave are consistent with a local (Bahamian) origin. The δ18O values are also fairly homogeneous, ranging from -3.6 to -2.0‰ (mean = -2.8 ± 0.6, n = 5; see Table S3). Although no human apatite δ18O data from the Bahamas are available for comparison, the Preacher's Cave δ18O results fall within the range of values reported for other populations in the ancient Antilles (46). Furthermore, the values fall within the 2‰ range that has been proposed as the "natural" range for a single, local population (49). As such, these values are also consistent with a local origin.

In contrast to the other isotope data, the δ13C values are less homogeneous, ranging from -11.5 to -5.6‰ (mean = -9.1 ± 2.0, n = 5; Table S3). Overall, the enamel δ13C results are consistently elevated compared to previously published Bahamian bone apatite δ13C values (50). Some recent studies have questioned whether enamel and bone δ13C ought to be considered isotopically equivalent (51), and differences between the two datasets may result from age-related differences in dietary practices or post-mortem alteration of bone apatite. However, this cannot account for the wide range of δ13C values within the dataset itself. A large database of prehistoric human enamel δ13C values from throughout the Antilles (46) displays a broad range, with >90% of (Ceramic Age) individuals falling between ca. -13‰ and -8‰, indicating mixed diets. Most of the Preacher’s Cave individuals fall within this range, except for sample PC8, whose δ13C value of -5.6‰ is one of the highest (enamel or bone) δ13C values recorded in the prehistoric Caribbean. Such a highly elevated δ13C value suggests a diet that differed from that of other native Antilleans through the consumption of relatively greater proportions of 13C-enriched foods (e.g., C4 plants and/or marine foods). The overall heterogeneity of the δ13C values is greater than expected for a single population and suggests that some other temporal, spatial, ecological, or social factor is contributing to the dietary variability within the Preacher’s Cave sample set.

In summary, the strontium and oxygen isotope results from the Preacher’s Cave site are fairly homogeneous and therefore consistent with a local (Bahamian) origin. The carbon isotope values are more heterogeneous, which we believe might reflect changing dietary patterns in the Bahamas over time. Although the Preacher’s Cave individuals are likely local (as suggested by the Sr and O isotope results), they are probably not all contemporaneous, as indicated by the wide temporal spread of radiocarbon dates obtained for the burials at the site (15).

5. DNA extraction and library preparation

5.1 DNA extraction. DNA was extracted using silica pellets, targeting the cementum-rich layer of the teeth and following a 10 min pre-digestion step, as described in (52). All pre-PCR steps were performed in the ancient DNA facilities at the Centre for Geogenetics in Copenhagen, Denmark. Initially, the samples were cleaned using a tissue dipped in 10% bleach solution and UV-irradiated for 2 min on each side to further reduce the amount of surface contamination. Subsequently, the tooth roots were cut off and crushed into small chunks using a pestle and mortar.
Between 100-200 mg of each root was then weighed into 5 ml Eppendorf tubes and incubated at 45°C for 10 min in 1 ml of an EDTA-based digestion buffer containing 0.25 mg/mL Proteinase K. Subsequently, the buffer was replaced with 4 ml of digestion buffer, and the samples were digested overnight at 45°C. The DNA was then purified using 50 μl of silica pellets and 40 ml of binding buffer containing 5 M Gu-HCl, 100 mM NaOAc, 20 mM NaCl and 30% isopropanol, as described in (53). The silica-bound DNA was washed twice in 80% ethanol and eluted in 60 μl EB.

5.2 Library preparation. Thirty μl of each DNA extract was built into NGS libraries using NEBNext modules E6050 and E6056 (New England Biolabs) and Illumina-specific adapters (54). Libraries were prepared according to the manufacturer’s instructions, with the following modifications. The initial nebulization step was skipped because of the fragmented nature of ancient DNA. End-repair was performed in 50 μl reactions using 30 μl of extract. The reactions were incubated for 20 min at 12°C and 15 min at 37°C, purified using MinElute columns (Qiagen), and eluted in 30 μl EB. Adapter ligation was performed in 50 μl reactions using 30 μl of end-repaired DNA and Illumina-specific adapters (54). The reactions were incubated for 15 min at 20°C, purified using MinElute columns, and eluted in 30 μl EB. Purified adapter-ligated DNA fragments were then filled in using Bst DNA polymerase (New England Biolabs) at 37°C for 15 min, followed by 20 min at 80°C to inactivate the enzyme. Ten μl of each DNA library was then amplified and indexed in 50 μl PCR reactions, using AmpliTaq Gold DNA polymerase (Applied Biosystems) and sample-specific barcodes, as described in (55). PCR conditions were as follows: 4 min at 94°C, followed by 8-12 cycles of 30 s at 94°C, 30 s at 60°C and 30 s at 72°C, and a final extension step of 5 min at 72°C.


6. Molecular decay

DNA preservation in the Caribbean is known to be poor (56–58). To investigate the molecular preservation at the site of Preacher’s Cave, we examined the declining part of the length distribution of the sample that generated the most reads (PC537), using a previously published framework (59). We find that the distribution (Fig. S5) displays periodicity, with local maxima roughly every 10 bp. This pattern has been observed before and probably reflects the preferential cleavage of the DNA molecule where it faces away from the nucleosomes (60). Further, we find that the declining part of the distribution follows an exponential decay curve (R² = 0.92), as expected for random fragmentation of DNA. Deagle et al. (61) showed that the decay constant (λ) in this exponential relationship expresses the damage fraction (the fraction of broken bonds in the DNA strand) and that 1/λ reflects the theoretical average fragment length. By fitting this relationship, we retrieved a DNA damage fraction (λ) of 1.6%, corresponding to a theoretical average fragment length (1/λ) of 63 bp. DNA fragmentation can be described as a rate, and the damage fraction (λ, per site) can be converted to a decay rate (k, per site per year) when the age of the sample is taken into account (59). Using an age estimate of 1,000 years for the Taino sample, the corresponding decay rate (k) becomes 1.6 × 10⁻⁵ breaks per bond per year, corresponding to a molecular half-life of 434 years for 100 bp fragments. This implies that after 434 years, half of all 100 bp stretches of DNA will have been broken at least once. This decay rate is considerably higher than those reported for other ancient genomes from Europe and North America (see Table S6) and highlights the negative effect that the tropical climate has on DNA preservation. That any DNA has survived at all can probably be ascribed to the fact that the sample was found in a cave, which has a more constant climate and lower average annual temperatures. Having said that, it should be noted that the other four samples from the cave were not as well preserved, suggesting that other factors might be at play.
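The fitting and conversion steps can be sketched in Python. This is a minimal illustration under assumptions rather than the framework of ref. 59: it fits a straight line to log-transformed read counts from the modal length onward (so the R² it reports is for the log-linear fit and need not equal the value quoted above), converts λ to k by dividing by the assumed sample age, and computes the half-life of an L bp fragment as ln 2 / (k·L). Function names are hypothetical.

```python
import numpy as np


def fit_decay_constant(lengths, counts):
    """Fit counts ~ A * exp(-lam * length) on the declining part of a length
    distribution (from the modal length onward) by linear regression on log(counts).

    Returns (lam, r_squared) for the log-linear fit.
    """
    lengths = np.asarray(lengths, dtype=float)
    counts = np.asarray(counts, dtype=float)
    start = int(np.argmax(counts))            # modal length: the decline starts here
    x, y = lengths[start:], counts[start:]
    keep = y > 0                              # log(0) is undefined; drop empty bins
    x, log_y = x[keep], np.log(y[keep])
    slope, intercept = np.polyfit(x, log_y, 1)
    residuals = log_y - (slope * x + intercept)
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((log_y - log_y.mean()) ** 2)
    return -slope, r_squared


def decay_summary(lam, age_years, fragment_bp=100):
    """Convert a damage fraction (per site) into mean length, decay rate and half-life."""
    mean_length = 1.0 / lam                       # theoretical average fragment length (bp)
    k = lam / age_years                           # breaks per bond per year
    half_life = np.log(2) / (k * fragment_bp)     # years until half of such fragments are broken
    return mean_length, k, half_life


# Values from the text: lam = 1.6% and an assumed age of 1,000 years
mean_length, k, half_life = decay_summary(0.016, 1000)
```

With these inputs the sketch gives a mean length of 62.5 bp (reported as 63 bp), k = 1.6 × 10⁻⁵ and a half-life of about 433 years; the small difference from the 434 years quoted above presumably reflects rounding of λ.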

7. Sequencing error estimates

We used ANGSD (62) to estimate overall and type-specific error rates. The method relies on the assumption that any given human genome should, on average, carry the same number of derived alleles relative to an outgroup, e.g. the chimpanzee. Any excess of derived alleles observed in the ancient sample (compared to a high-quality genome) can be assumed to be due to errors. We used sequence data for individual NA12778 from the 1000 Genomes Project as the high-quality genome and the chimpanzee genome (63) as the outgroup to determine the ancestral allele. Both the high-quality and the ancient genomes were filtered with a minimum base quality of 20 and a minimum mapping quality of 30. The type-specific error rates for the Taino genome are shown in Fig. S8. The overall error rate was estimated to be around 0.2%. This estimate is ca. 20 times higher than that obtained for a modern genome (64), but it is comparable to previously reported error rates for other ancient genomes, including the Siberian Mal’ta genome (65) and the Clovis genome (66). The main reason for the increased error rates in ancient genomes is the expected increase in C→T and G→A transitions caused by ancient DNA damage. Indeed, when C→T and G→A errors are excluded, the average error rate of the ancient genomes is comparable to those of modern genomes.
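The intuition behind this approach can be illustrated with a deliberately simplified Python sketch. It is not ANGSD's estimator, which models errors probabilistically; it simply counts, per substitution type, how many more derived alleles the ancient genome carries than the high-quality genome at the same sites and divides by the number of sites. The input format (one sampled base per genome per site, with the ancestral base taken from the outgroup) and the function name are hypothetical.

```python
from collections import Counter


def excess_derived_rates(sites):
    """Estimate error rates as the excess of derived alleles in an ancient genome.

    sites: iterable of (ancestral, high_quality_base, ancient_base) tuples, one
    sampled base per genome per site; the ancestral base comes from the outgroup.
    Returns (overall_rate, per_type), where per_type maps e.g. "C->T" to its rate.
    """
    n_sites = 0
    hq_derived = Counter()
    ancient_derived = Counter()
    for ancestral, hq_base, ancient_base in sites:
        n_sites += 1
        if hq_base != ancestral:
            hq_derived[f"{ancestral}->{hq_base}"] += 1
        if ancient_base != ancestral:
            ancient_derived[f"{ancestral}->{ancient_base}"] += 1
    if n_sites == 0:
        raise ValueError("no sites supplied")
    types = sorted(set(hq_derived) | set(ancient_derived))
    # Derived alleles in excess of the high-quality genome are attributed to errors.
    per_type = {t: (ancient_derived[t] - hq_derived[t]) / n_sites for t in types}
    overall = (sum(ancient_derived.values()) - sum(hq_derived.values())) / n_sites
    return overall, per_type


# Toy input: both genomes share ten genuine A->G derived alleles; the ancient
# genome additionally shows one damage-like C->T change among 1,000 sites.
sites = [("C", "C", "T")] + [("A", "G", "G")] * 10 + [("A", "A", "A")] * 989
overall, per_type = excess_derived_rates(sites)
without_damage = sum(v for t, v in per_type.items() if t not in ("C->T", "G->A"))
```

Dropping the C→T and G→A entries, as in the last line, mirrors the comparison made above in which damage-type errors are excluded.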

8. MapDamage analysis

Ancient DNA tends to be highly fragmented and chemically modified (67, 68). Both patterns are routinely used to assess the authenticity of ancient DNA datasets (69, 70). We examined the shotgun reads from the screening run with respect to three molecular features: i) average read length, ii) base composition before strand breaks, and iii) apparent C to T substitutions, specifically towards the ends of reads. The analyses were carried out using mapDamage 2.0 (71). While the average read length does not necessarily reflect the true average DNA fragment length in the extracts (see below), it does give some indication of the molecular preservation of the sample. We found that all 5 samples showed the features characteristic of ancient DNA, including i) short average read lengths (Fig. S6, Table S4), ii) an increased occurrence of purines before strand breaks (Fig. S5), which probably results from the depurination of the DNA followed by hydrolysis of the phosphate-sugar backbone (70), and iii) an increased frequency of apparent cytosine (C) to thymine (T) substitutions at 5′-ends (Fig. S7), which likely results from the deamination of cytosine residues that occurs primarily in the single-stranded overhangs of DNA fragments (70). When simulating the posterior distribution of three damage parameters using the Bayesian framework implemented in mapDamage 2.0 (71), we found that all departed from zero (λ: mean, 0.43; standard deviation, 0.007; δD: mean, 0.022; standard deviation, 0.0003; δS: mean, 0.40; standard deviation, 0.008), suggesting that the majority of the recovered DNA sequences are indeed ancient as opposed to modern contaminants.
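The position-specific C→T signal that mapDamage reports can be illustrated with a small Python sketch. It assumes gapless alignments supplied as (reference, read) string pairs already oriented 5′→3′ on the read strand, whereas mapDamage 2.0 works directly from BAM files, handles strand orientation and indels, and fits the Bayesian model mentioned above; the function name is hypothetical.

```python
def ct_frequency_by_position(alignments, max_pos=25):
    """Fraction of reference C bases read as T at each distance from the 5' end.

    alignments: iterable of (reference, read) strings of equal length, gapless,
    both written 5'->3' on the read strand. Positions where the reference has
    no C in any alignment are reported as 0.0.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in alignments:
        for i in range(min(max_pos, len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_total)]


# Toy reads: the first carries a C->T change at its first base
freqs = ct_frequency_by_position([("CCAC", "TCAC"), ("CAAC", "CAAC")], max_pos=4)
print(freqs)  # [0.5, 0.0, 0.0, 0.0]
```

In authentic ancient data this frequency is highest at position 0 and decays over the first few bases, which is the pattern summarised in Fig. S7.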

9. Contamination estimates

We used two different methods to estimate the amount of modern contamination in the Taino data. The first method (72) uses Markov chain Monte Carlo (MCMC) to estimate the contaminant fraction of mtDNA reads. The ancient reads were combined to form a consensus sequence and aligned together with 311 modern mtDNA sequences using MAFFT-v7.182 (73). We ran three MCMC chains of 50,000 iterations, discarding the first 10,000, and assessed convergence by visualizing the potential scale reduction factor (PSRF) and verifying that the median PSRF was below 1.01 in all cases (74). Using this method, we obtained a contamination estimate with a 95% credible interval of 0.3-0.8%. The second method provides a genome-wide estimate and relies on the MCMC algorithm implemented in DICE (75). Various human populations from the 1000 Genomes Project (76) were considered as potential contaminant sources, and the Yoruba (YRI) were defined as the anchor population. Using this approach, we obtained a contamination estimate with a 95% credible interval of 0.1-1.2%. Both contamination estimates are low and at a level where they are unlikely to affect any of the downstream analyses (75).

10. Molecular sexing

We used the ratio of reads mapping to the Y chromosome (chrY) and X chromosome (chrX) to determine the sex of each sample, as described in (77). We ran the script provided with the publication with default parameters to calculate Ry (defined as the fraction of reads that map to chrY out of the total number of reads mapping to both chrY and chrX), which in turn is used to assign the sample to either XX or XY. All specimens from Preacher's Cave could be assigned as female, except for specimen PC683, which turned out to be male. In the case of specimen PC549, which was sampled from burial 1, the molecular sexing result matches the sexing based on morphological traits (15).
In the case of specimen PC115, however, which is thought to belong to burial 3, the molecular result did not match the morphological assessment (15). It should be noted, though, that the latter was based exclusively on visual inspection of the skull, which is considered to be a less reliable indicator than the hip bone (78). For the remaining samples, no morphological sex assessment was available, as they were isolated finds.
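The Ry statistic and the assignment rule can be sketched in Python. This is not the published script: it computes Ry with a normal-approximation 95% confidence interval and uses thresholds of 0.016 (XX) and 0.077 (XY), which are assumed here to correspond to the defaults of ref. 77 and should be checked against that script before use. Read filtering (e.g. by mapping quality) is assumed to have been done beforehand, and the function name and read counts are hypothetical.

```python
import math


def ry_sex_assignment(n_chry, n_chrx, xx_max=0.016, xy_min=0.077):
    """Compute Ry = nY / (nX + nY) with a 95% CI and assign XX or XY.

    Threshold defaults are assumptions; check them against the published script.
    Returns (ry, (ci_low, ci_high), call).
    """
    n = n_chry + n_chrx
    if n == 0:
        raise ValueError("no reads on chrX or chrY")
    ry = n_chry / n
    se = math.sqrt(ry * (1.0 - ry) / n)          # binomial standard error
    ci_low, ci_high = ry - 1.96 * se, ry + 1.96 * se
    if ci_low > xy_min:
        call = "XY"
    elif ci_high < xx_max:
        call = "XX"
    else:
        call = "not assigned"
    return ry, (ci_low, ci_high), call


# Hypothetical read counts for illustration only
_, _, call_female = ry_sex_assignment(50, 10_000)
_, _, call_male = ry_sex_assignment(1_000, 10_000)
```

Requiring the whole confidence interval to clear a threshold, rather than the point estimate alone, leaves low-coverage samples unassigned instead of forcing a call.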


11. Mitochondrial DNA analysis

We used a two-step mapping procedure to determine the mtDNA consensus sequence for PC537. The aim was to exclude nuclear copies of mitochondrial genes (NUMTs), which could otherwise bias the consensus call (79). First, we used samtools-1.2.1 (81) view to retrieve all reads in the filtered BAM file that mapped to the revised Cambridge Reference Sequence (80). We then called a consensus with samtools-1.2.1 (81), using only reads with mapping quality above 30 and sites with base quality above 30. Next, we re-mapped the reads to the human nuclear genome (build 37.1) together with this consensus sequence. To reduce the number of NUMT-derived reads, we excluded reads with potential alternative mapping coordinates, identified from the XT, XA and X0 tags. Finally, we called a new consensus sequence using samtools-1.2.1 (81) and used HaploGrep 2 (82) to determine the haplogroup (B2).

We then compiled a total of 422 haplogroup B2 mitogenomes from the literature and used PAML 4.4 (83) to construct a phylogeny (Fig. S9). Split times were estimated using the revised mtDNA mutation rate of Soares et al. (84), which corresponds to one mutation every 3,624 years. We also used SAGA 2.1.7 (85) to map the geographic distribution of B2 frequencies across the Americas (Fig. S10). The frequency surface was interpolated using ordinary Kriging. Admixture levels differ among American populations. To account for this, B2 relative frequencies were computed over the total Native American component of each sample (haplogroups A2, B2, C1, D1, D4h3 and X2a). Only samples with n > 15 were used.

We also checked whether the ancient Taino lineage matches any of the extant haplotypes found in the Caribbean today. To do so, we compared the ancient sequence to 1,165 published modern mtDNA haplotypes from the region (ref. 86: Puerto Rico n = 27, Dominican Republic n = 27; ref. 87: Haiti n = 291; ref. 88: St Vincent n = 65, Trinidad n = 23; ref. 89: Puerto Rico n = 288, Vieques n = 38; ref. 90: Belize n = 28; ref. 91: Cuba n = 245; ref. 92: Jamaica n = 50; ref. 93: Dominican Republic n = 83). We found no match and no close phylogenetic neighbor. Overall, the B2 lineage appears to be relatively rare in Caribbean populations today, and it has not previously been found in ancient populations from the region (94–96). It is therefore possible that haplogroup B2 was also relatively rare in the Caribbean in the past. Alternatively, it may have been lost during the extreme bottleneck experienced by Caribbean populations after the arrival of Europeans.
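The frequency calculation and the conversion from mutation counts to years described above can be illustrated with a short Python sketch. This is not the pipeline used here: the phylogeny and split times came from PAML, and the map came from SAGA. The function names are hypothetical, and the sketch assumes that haplogroup labels have already been collapsed to the six Native American haplogroups listed above. The age function uses the simple ρ statistic (mean number of mutations from a node to its descendant tips), not a maximum-likelihood estimate.

```python
# Hypothetical helpers illustrating the B2 frequency and mtDNA clock arithmetic.

# Haplogroups counted as the Native American mtDNA component (from the text).
NATIVE_AMERICAN_HAPLOGROUPS = {"A2", "B2", "C1", "D1", "D4h3", "X2a"}

# Soares et al. (2009) clock as quoted in the text: one mutation per 3,624 years.
YEARS_PER_MUTATION = 3624


def b2_relative_frequency(haplogroup_counts, min_n=16):
    """Frequency of B2 among the Native American lineages of one population sample.

    haplogroup_counts maps a haplogroup label to the number of individuals
    carrying it. Non-Native American lineages (e.g. European or African ones)
    are left out of the denominator, so admixture does not dilute the estimate.
    Returns None when fewer than min_n Native American lineages are present
    (assumption: the n > 15 filter applies to this subtotal).
    """
    native_total = sum(
        count
        for haplogroup, count in haplogroup_counts.items()
        if haplogroup in NATIVE_AMERICAN_HAPLOGROUPS
    )
    if native_total < min_n:
        return None
    return haplogroup_counts.get("B2", 0) / native_total


def rho_age_years(mutations_to_tips):
    """Simple rho estimate: mean mutations from a node to its tips, times the clock."""
    rho = sum(mutations_to_tips) / len(mutations_to_tips)
    return rho * YEARS_PER_MUTATION


if __name__ == "__main__":
    sample = {"B2": 10, "A2": 5, "C1": 5, "H1": 20}  # H1 is a European lineage
    print(b2_relative_frequency(sample))  # 0.5
    print(rho_age_years([2, 4]))          # 10872.0
```

Excluding non-Native American lineages from the denominator is what makes frequencies comparable between heavily and lightly admixed populations.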

12. Masking

Masking was carried out following previously described procedures (97). The dataset was combined with continental reference panels representing European, Native American and African ancestry. The European and African reference panels consisted of 75 CEU and 75 YRI individuals from the 1000 Genomes Project (76). The Native American panel consisted of 75 individuals of Maya and Tepehuano ancestry (98). To mask the genotypes, we first phased the data using SHAPEIT v2.r837 (99). Local ancestry was then estimated using RFMix version 1.5.4 (100), with the phase-correction option enabled and no EM rounds performed. All other settings were left at their defaults. Local ancestry calls with a forward-backward probability below 0.95 were treated as unknown. In each individual genome, any region without a homozygous, high-quality Native American ancestry call was masked for subsequent analyses.
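The masking rule can be sketched in Python as follows. This illustrates the logic only and is not the pipeline of (97). The per-site input format, the ancestry label "NAT" and the function names are hypothetical. In practice, the calls would be parsed from RFMix output rather than built by hand.

```python
from typing import List, NamedTuple, Optional

NATIVE = "NAT"        # hypothetical label for Native American ancestry
MIN_POSTERIOR = 0.95  # calls below this forward-backward probability are unknown


class SiteCall(NamedTuple):
    genotype: int     # 0, 1 or 2 copies of the alternative allele
    anc_hap1: str     # local ancestry assigned to haplotype 1
    prob_hap1: float  # forward-backward probability of that assignment
    anc_hap2: str     # local ancestry assigned to haplotype 2
    prob_hap2: float


def confident_ancestry(label: str, prob: float) -> Optional[str]:
    """Return the ancestry label, or None if the call is treated as unknown."""
    return label if prob >= MIN_POSTERIOR else None


def mask_non_native(calls: List[SiteCall]) -> List[Optional[int]]:
    """Keep a genotype only where both haplotypes are confidently Native American."""
    masked = []
    for call in calls:
        h1 = confident_ancestry(call.anc_hap1, call.prob_hap1)
        h2 = confident_ancestry(call.anc_hap2, call.prob_hap2)
        masked.append(call.genotype if h1 == NATIVE and h2 == NATIVE else None)
    return masked


if __name__ == "__main__":
    calls = [
        SiteCall(0, "NAT", 0.99, "NAT", 0.97),  # kept
        SiteCall(1, "NAT", 0.99, "EUR", 0.99),  # mixed ancestry: masked
        SiteCall(2, "NAT", 0.99, "NAT", 0.90),  # low-confidence call: masked
    ]
    print(mask_non_native(calls))  # [0, None, None]
```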

13. Principal components analysis

To explore the relationship between the Taino and extant Native Americans, we performed principal component analysis (PCA) on a subset of 17 Native American populations from Reich et al. (101). These populations reflect part of the genetic variation present in Mesoamerica and South America. Prior to analysis, we excluded individuals with more than 10% missing genotypes (--mind 0.1). PCA was then performed using smartpca (102), and the ancient genome was projected onto the components using the lsqproject option (Fig. S11). There is clear structure among Mesoamerican and South American populations. The Taino plots closest to Arawakan and Cariban speakers from northern South America, including Arara, Guarani, Parakana and Jamamadi.
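The idea behind projecting a low-coverage ancient genome with lsqproject can be sketched with NumPy. Principal axes are computed from the modern genotypes only, and the ancient sample is then placed by least squares using just the sites where it has data. This is a simplified illustration, not smartpca itself. The function names are hypothetical, and smartpca's per-SNP normalization and outlier removal are omitted.

```python
import numpy as np


def fit_pca(genotypes, n_components=2):
    """PCA on a complete (individuals x SNPs) matrix of 0/1/2 genotypes.

    Returns the per-SNP means and the principal axes as a (SNPs x components)
    matrix with orthonormal columns. Unlike smartpca, SNPs are only
    mean-centred here, not scaled by their expected variance.
    """
    mean = genotypes.mean(axis=0)
    _, _, vt = np.linalg.svd(genotypes - mean, full_matrices=False)
    return mean, vt[:n_components].T


def lsq_project(sample, mean, axes):
    """Project one sample with missing sites (NaN) onto the axes by least squares."""
    observed = ~np.isnan(sample)
    coords, *_ = np.linalg.lstsq(
        axes[observed], sample[observed] - mean[observed], rcond=None
    )
    return coords


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    modern = rng.integers(0, 3, size=(30, 200)).astype(float)
    mean, axes = fit_pca(modern)

    # Synthetic "ancient" sample at (1.5, -0.5) in PC space, with most sites missing.
    ancient = mean + axes @ np.array([1.5, -0.5])
    ancient[rng.random(200) < 0.6] = np.nan
    print(lsq_project(ancient, mean, axes))  # close to [1.5, -0.5]
```

Because only observed sites enter the regression, missing data shrink the number of equations but do not bias the coordinates toward the origin, as zero-filling would.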

14. Runs of homozygosity

Genome-wide distributions of runs of homozygosity (ROH) are shaped by population history, both ancient and recent, and by cultural practices such as endogamy (103, 104). As a result, ROH patterns vary considerably among individuals from different geographical regions and cultures. Previous analyses have identified three approximate classes of ROH: short, medium and long (104). The total length of short and medium ROH (<1.6 Mb) tends to be strongly correlated with geography and increases with distance from East Africa, which reflects deep demographic history. In contrast, long ROH (≥1.6 Mb) tend to result from more recent inbreeding and lack this geographic correlation. Native American groups have previously been observed to possess some of the highest levels of genomic ROH coverage, in both the long and the short ROH length categories (103, 104). This reflects both ancient and recent reductions in effective population size and/or inbreeding. However, most ROH analyses in Native Americans have focused on isolated South and Central American populations (103). The current demographic model for the initial peopling of the Americas (97) suggests that, after initial entry via Beringia, people spread rapidly south through the landmass in a pattern of serial founder effects. If so, South and Central American populations are unlikely to represent the full range of genomic ROH patterns present in Native American populations. Furthermore, recent admixture, such as that following the European colonization of the Americas, can obscure past ROH patterns. Such patterns may then only be observable through the analysis of ancient individuals.
With this in mind, we explored the length distributions of genomic ROH in 37 modern (97, 105, 106) and two ancient Native Americans, alongside 72 modern individuals from Northeast Siberia (97, 105, 106). The two ancient Native Americans are a 12,600-year-old Clovis individual (66) and the Taino individual from the current study. The ancient genomes were merged with those of the 109 modern Siberian and Native American individuals (97, 105, 106), retaining only transversions with no missing genotypes. Restricting the analysis to transversions removes the confounding effect of post-mortem damage, which predominantly produces C→T and G→A transitions. We also filtered the dataset by minor allele frequency, based on a larger dataset of 187 individuals. This larger dataset included 76 East Asians and Central Siberians who were not used in the ROH analysis. Sites with fewer than three copies of the minor allele in this dataset were removed, leaving 583,623 SNPs for analysis. We then used PLINK 1.9 (107) to call ROH with the parameters specified in (108): --homozyg, --homozyg-density 50, --homozyg-gap 100, --homozyg-kb 500, --homozyg-snp 50, --homozyg-window-het 1, --homozyg-window-snp 50 and --homozyg-window-threshold 0.05.
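The site filters described above can be sketched in Python: transversions only, no missing genotypes, and at least three copies of the minor allele. For simplicity, the sketch applies both filters to a single list of genotypes. In the analysis, the minor allele count was computed over the larger 187-individual dataset. The function names and the 0/1/2 dosage encoding are assumptions for illustration.

```python
from typing import Optional, Sequence

TRANSITION_PAIRS = ({"A", "G"}, {"C", "T"})


def is_transversion(allele1: str, allele2: str) -> bool:
    """True for purine <-> pyrimidine changes, which deamination does not produce."""
    pair = {allele1.upper(), allele2.upper()}
    return len(pair) == 2 and pair not in TRANSITION_PAIRS


def minor_allele_count(dosages: Sequence[int]) -> int:
    """Copies of the rarer allele, given 0/1/2 alternative-allele dosages."""
    alt = sum(dosages)
    return min(alt, 2 * len(dosages) - alt)


def keep_site(allele1: str, allele2: str,
              dosages: Sequence[Optional[int]], min_minor: int = 3) -> bool:
    """Apply the no-missing, transversion-only and minor-allele-count filters."""
    if any(d is None for d in dosages):        # no missing genotypes allowed
        return False
    if not is_transversion(allele1, allele2):  # drop A/G and C/T sites
        return False
    return minor_allele_count(dosages) >= min_minor


if __name__ == "__main__":
    print(keep_site("A", "C", [0, 1, 2, 0]))  # True: transversion, 3 minor alleles
    print(keep_site("A", "G", [0, 1, 2, 0]))  # False: transition
```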

Following the analysis described in (103), ROH were placed into six size bins, and the total length of ROH in each bin was calculated for each individual. The silhouetted ROH distributions for selected populations, grouped by geography, are displayed in Figure 2 alongside those of the two ancient individuals. These groupings consist of 17 Aymara, Chane, Quechua, Cachi, Colla and Wichi individuals (Andes/Gran Chaco), 3 Piapoco and Yukpa individuals (Colombia/Venezuela), 5 Surui and Karitiana individuals (Amazon), 10 Huichol, Mayan, Mixe, Pima and Zapotec individuals (Mesoamerica/Arizona), 2 Athabaskans (Pacific Northwest) and 16 Koryak (Northeast Siberia). Furthermore, to explore the ROH classes identified in (104), ROH were split into two classes: short to medium (<1.6 Mb) and long (≥1.6 Mb). The summed length in each class was calculated for every individual. The two resulting values are plotted against each other for all 111 individuals, including additional northeast Siberian groups, in Fig. S18.
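The split into short-to-medium and long ROH, and the per-individual totals plotted in Fig. S18, amount to a simple aggregation. Below is a minimal Python sketch of that aggregation. It assumes the segments come from PLINK's whitespace-delimited .hom output, with IID and KB columns. The function names are hypothetical.

```python
from collections import defaultdict

LONG_ROH_MIN_MB = 1.6  # ROH >= 1.6 Mb are "long"; shorter ones are short-to-medium


def read_hom(path):
    """Yield (individual_id, length_mb) from a PLINK .hom file (KB column is in kb)."""
    with open(path) as handle:
        header = handle.readline().split()
        iid_col, kb_col = header.index("IID"), header.index("KB")
        for line in handle:
            fields = line.split()
            if fields:
                yield fields[iid_col], float(fields[kb_col]) / 1000.0


def roh_class_totals(segments):
    """Sum ROH length per individual into (short_to_medium_mb, long_mb)."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for individual, length_mb in segments:
        index = 1 if length_mb >= LONG_ROH_MIN_MB else 0
        totals[individual][index] += length_mb
    return {ind: tuple(pair) for ind, pair in totals.items()}
```

For example, `roh_class_totals(read_hom("plink.hom"))` would return one (short-to-medium, long) pair per individual, ready to plot against each other.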

15. Effective population size (Ne) estimate

We estimated the recent effective population size (Ne) of the Lucayans and other Native American groups in our dataset using the estimator of Palamara et al. (109). The estimate is based on the amount of IBD sharing observed in a specific length range. Using 1.6 Mb as a threshold, we obtain an Ne estimate of around 1,600 individuals for the Lucayan Taino genome (Table S13). The effective size can be expected to be several times smaller than the census population size. Reasons include the inclusion of children and elderly individuals in the census, variance in reproductive success, and other biological and cultural factors. Demographic arguments suggest that in modern human populations the effective size should be around one-third of the census size (110). This would put the size of the Lucayan population at around 5,000 individuals (3 × 1,600 = 4,800). However, we caution that genetics-based estimates of recent effective population size are highly dependent on sample size (111), and that the actual population size is likely to have been much higher.

16. Direct ancestry test

To test whether the Taino individual was directly ancestral to modern Puerto Ricans, we used a recently developed likelihood ratio test (112). The test measures the amount of genetic drift since the split of the population ancestral to the modern and ancient samples. It estimates drift times $t_1$ and $t_2$ leading to the modern and ancient samples, respectively. We were specifically interested in the Native American component in modern Puerto Ricans. We therefore used only sites that were identified as Native American in the local ancestry analysis (see section 12). However, this substantially reduced the sample size, which posed a challenge because the method relies on accurate allele frequency estimates in the modern population. We used a two-step procedure to help alleviate the problems caused by small sample sizes:

1. Estimate an underlying allele frequency distribution in the modern population by fitting a beta-binomial distribution to the allele counts.
2. Use the fitted beta distribution as a prior to average over uncertainty in the allele frequencies in the modern population.

For the first step, we found maximum likelihood estimates of the $\alpha$ and $\beta$ parameters of the beta distribution. We assumed that allele counts are binomial draws, with allele frequencies drawn from an underlying beta distribution. Thus, assuming that a segregating site $i$ has $n_i$ haplotypes, of which $k_i$ carry the derived allele, we maximize the likelihood

$$L(\alpha, \beta) = \prod_i \binom{n_i}{k_i} \frac{B(k_i + \alpha,\; n_i - k_i + \beta)}{B(\alpha, \beta)},$$

where $B(\alpha, \beta)$ is the usual beta function. We then used the maximum likelihood estimates of $\alpha$ and $\beta$ as a prior for the allele frequencies in the modern population. Consider an ancient sample of $m$ haplotypes (i.e. $m/2$ diploid individuals). The method requires a vector $h_m$ of length $m$ whose $j$th entry is $h_{m,j} = x^j (1 - x)^{m-j}$, where $x$ is the allele frequency in the modern population. However, large sample sizes are required to estimate $x$ well. We therefore replace this vector with its posterior expectation under the beta prior. Specifically, suppose that, as before, the derived allele at site $i$ is observed $k_i$ times out of $n_i$ haplotypes. Suppose also that we have maximum likelihood estimates $\hat{\alpha}$ and $\hat{\beta}$ of the underlying beta distribution of allele frequencies. Then the posterior distribution of the allele frequency at the site is also a beta distribution, with parameters $\alpha_i = \hat{\alpha} + k_i$ and $\beta_i = \hat{\beta} + n_i - k_i$. We can then compute the posterior expected value of each entry in $h_m$:

$$E(h_{m,j} \mid k_i, n_i) = \frac{\Gamma(\alpha_i + j)\,\Gamma(\beta_i + m - j)\,\Gamma(\alpha_i + \beta_i)}{\Gamma(\alpha_i)\,\Gamma(\beta_i)\,\Gamma(\alpha_i + \beta_i + m)}.$$

We fit a beta prior to all segregating sites with at least 15 haplotypes present. The maximum likelihood estimates of $\alpha$ and $\beta$ are given in Table S14. In all cases $\beta > \alpha$, indicating a skew toward rare derived alleles, as expected. In addition, we applied a coverage filter to the ancient genome, excluding sites in the lower and upper 2.5% tails of the coverage distribution. We then used the likelihood ratio test described in (112) to compute maximum likelihood estimates of the drift times for the Taino and four modern admixed Latino populations from the 1000 Genomes Project (76). The results are shown in Table S15, alongside 95% confidence intervals from 100 bootstrap replicates.
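The two-step procedure can be sketched in pure Python. The sketch maximizes the beta-binomial likelihood with a simple pattern search on $\log\alpha$ and $\log\beta$. That optimizer is an arbitrary choice, and any numerical optimizer would do. The sketch then evaluates the posterior expectation $E(h_{m,j} \mid k_i, n_i)$ in log space to avoid overflow. Function names are hypothetical. The fit does not correct for conditioning on segregating sites, and this is not the implementation of (112).

```python
import math
import random
from collections import Counter
from itertools import product


def log_beta(a, b):
    """log B(a, b) computed with log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def neg_log_likelihood(alpha, beta, site_counts):
    """Negative beta-binomial log-likelihood.

    site_counts maps (n_i, k_i) -> number of sites with that configuration,
    so identical sites are evaluated only once.
    """
    log_b0 = log_beta(alpha, beta)
    total = 0.0
    for (n, k), weight in site_counts.items():
        log_binom = math.log(math.comb(n, k))
        total += weight * (log_binom + log_beta(k + alpha, n - k + beta) - log_b0)
    return -total


def fit_beta_binomial(site_counts, tol=1e-6, max_moves=100_000):
    """Maximum likelihood (alpha, beta) via a pattern search on (log alpha, log beta)."""
    la, lb, step = 0.0, 0.0, 1.0
    best = neg_log_likelihood(1.0, 1.0, site_counts)
    directions = [d for d in product((-1, 0, 1), repeat=2) if d != (0, 0)]
    moves = 0
    while step > tol and moves < max_moves:
        for da, db in directions:
            cand_la, cand_lb = la + da * step, lb + db * step
            value = neg_log_likelihood(math.exp(cand_la), math.exp(cand_lb), site_counts)
            if value < best:
                la, lb, best = cand_la, cand_lb, value
                moves += 1
                break
        else:  # no neighbouring point improved the fit: shrink the step
            step /= 2
    return math.exp(la), math.exp(lb)


def expected_h(k, n, alpha_hat, beta_hat, m):
    """Posterior means E(h_{m,j} | k, n) of x^j (1 - x)^(m - j), for j = 0..m."""
    a, b = alpha_hat + k, beta_hat + n - k  # posterior Beta(a, b) parameters
    return [
        math.exp(
            math.lgamma(a + j) + math.lgamma(b + m - j) + math.lgamma(a + b)
            - math.lgamma(a) - math.lgamma(b) - math.lgamma(a + b + m)
        )
        for j in range(m + 1)
    ]


if __name__ == "__main__":
    # Simulated data: 3,000 sites, 20 haplotypes each, frequencies ~ Beta(0.5, 2).
    random.seed(1)
    sites = Counter()
    for _ in range(3000):
        x = random.betavariate(0.5, 2.0)
        sites[(20, sum(random.random() < x for _ in range(20)))] += 1
    alpha_hat, beta_hat = fit_beta_binomial(sites)
    print(alpha_hat, beta_hat)  # expected to be close to 0.5 and 2.0
    print(expected_h(3, 20, alpha_hat, beta_hat, m=2))
```

Grouping sites by their $(n_i, k_i)$ configuration keeps each likelihood evaluation cheap. When every site has the same number of haplotypes, only $n + 1$ distinct terms need to be evaluated.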

17. Admixture graphs

We used qpGraph (113) to explore different models of population history that might accommodate the patterns of allele frequencies observed in our dataset. As a starting point, we chose a simplified version of the model proposed by Reich et al. (6). This model depicts the relationships between 16 selected Native American populations with entirely Native American ancestry, together with two outgroups (Yoruba and Han). We then used the admixturegraph R package (114) to identify all possible branching points for both the Taino and modern Puerto Ricans, and fitted all extended graphs using qpGraph (113). Figure 3B shows a model that fits the data well, in the sense that no predicted f-statistic is more than three standard errors from its observed value (max|Z| = 2.6). In this model, the Taino and the masked Puerto Ricans form a clade that branches off the main South American lineage. Two other models, in which the Taino and Puerto Ricans are placed elsewhere on the South American lineage, fit the data equally well (Fig. S19A and B). By contrast, a model in which Puerto Ricans are added as direct descendants of the Taino does not fit the data. Several f-statistics predicted by this model are more than three standard errors from their observed values (max|Z| = 13.9; Fig. S19D).
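The fit criterion used above requires every predicted f-statistic to lie within three standard errors of its observed value. That criterion reduces to a maximum absolute Z-score. A minimal Python sketch follows. It assumes the observed values, model predictions and standard errors have already been extracted from qpGraph output. The function names are hypothetical, and the numbers below are made up for illustration.

```python
def max_abs_z(observed, predicted, standard_errors):
    """Largest |Z| over all f-statistics, with Z = (observed - predicted) / SE."""
    return max(
        abs(obs - pred) / se
        for obs, pred, se in zip(observed, predicted, standard_errors)
    )


def graph_fits(observed, predicted, standard_errors, threshold=3.0):
    """Accept a graph when no f-statistic deviates by more than `threshold` SEs."""
    return max_abs_z(observed, predicted, standard_errors) <= threshold


if __name__ == "__main__":
    # Made-up values: the worst statistic is off by about 2 SEs, so the graph is accepted.
    print(graph_fits([0.010, 0.020], [0.012, 0.020], [0.001, 0.005]))  # True
```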

References

1. Rouse I (1993) The Tainos: Rise and decline of the people who greeted Columbus (Yale University Press).
2. Keegan WF, Hofman CL (2017) The Caribbean before Columbus (Oxford University Press).
3. Hulme P (1993) Making sense of the native Caribbean. New West Indian Guide 67(3/4):189–220.
4. Wilson SM (1993) The cultural mosaic of the indigenous Caribbean. Proc Br Acad 81:37–66.
5. Greenberg JH (1987) Language in the Americas (Stanford University Press).
6. Reich D, et al. (2012) Reconstructing Native American population history. Nature 488(7411):370–374.
7. Achilli A, et al. (2013) Reconciling migration models to the Americas with the variation of North American native mitogenomes. Proc Natl Acad Sci USA 110(35):14308–14313.
8. Bolnick DAW, Shook BAS, Campbell L, Goddard I (2004) Problematic use of Greenberg’s linguistic classification of the Americas in studies of Native American genetic variation. Am J Hum Genet 75(3):519–522.
9. Campbell L (2000) American Indian Languages: The Historical Linguistics of Native America (Oxford University Press).
10. Campbell L, Grondona V (2012) The Indigenous Languages of South America: A Comprehensive Guide (Walter de Gruyter).
11. Aikhenvald AY (2012) Languages of the Amazon (OUP Oxford).
12. Craton M (1986) A History of the Bahamas (San Salvador Press, Waterloo).
13. Keegan WF (1992) The people who discovered Columbus: the prehistory of the Bahamas (University Press of Florida, Gainesville, FL).
14. Berman MJ, Gnivecki PL (1995) The colonization of the Bahama archipelago: A reappraisal. World Archaeol 26(3):421–441.
15. Schaffer WC, Carr RS, Day JS, Pateman MP (2012) Lucayan-Taíno burials from Preacher’s cave, Eleuthera, Bahamas. Int J Osteoarchaeol 22(1):45–69.
16. Keegan WF (1982) Lucayan Cave Burials from the Bahamas. J New World Archaeol 5:57–65.
17. Pateman MP (2007) Reconstructing Lucayan mortuary practices through skeletal analysis. J Baham Hist Soc 29:5–10.
18. Schaffer WC (2015) A reappraisal of prehistoric human skeletal remains from the Bahamas housed at the Yale Peabody Museum of Natural History. J Carib Archaeol 15:134–156.
19. Pestle WJ, Colvard M (2012) Bone collagen preservation in the tropics: a case study from ancient Puerto Rico. J Archaeol Sci 39:2079–2090.
20. Brock F, et al. (2010) Current pretreatment methods for AMS radiocarbon dating at the Oxford Radiocarbon Accelerator Unit (ORAU). Radiocarbon 52(1):103–112.
21. Zazzo A (2014) Bone and enamel carbonate diagenesis: A radiocarbon prospective. Palaeogeogr Palaeoclimatol Palaeoecol 416:168–178.
22. Sereno PC, et al. (2008) Lakeside cemeteries in the Sahara: 5000 years of population and environmental change. PLoS One 3(8):e2995.
23. Surovell TA (2000) Radiocarbon dating of bone apatite by step heating. Geoarchaeol 15(6):591–608.
24. Cherkinsky A (2009) Can we get a good radiocarbon age from “bad bone”? Determining the reliability of radiocarbon age from bioapatite. Radiocarbon 51(2):647–655.
25. Hedges RM, Lee-Thorp JA, Tuross NC (1995) Is tooth enamel carbonate a suitable material for radiocarbon dating? Radiocarbon 37:285–290.
26. Hopkins RJA, Snoeck C, Higham TFG (2016) When Dental Enamel is Put to the Acid Test: Pretreatment Effects and Radiocarbon Dating. Radiocarbon FirstView:1–12.
27. Motuzaite-Matuzeviciute G, Staff RA, Hunt HV, Liu X, Jones MK (2013) The early chronology of broomcorn millet (Panicum miliaceum) in Europe. Antiquity 87(338):1073–1085.
28. Bronk Ramsey C, Higham T, Bowles A, Hedges R (2004) Improvements to the Pretreatment of Bone at Oxford. Radiocarbon 46(01):155–163.
29. Reimer PJ, Bard E, Bayliss A, Beck JW (2013) IntCal13 and Marine13 radiocarbon age calibration curves 0-50,000 years cal BP. Radiocarbon 55(4):1869–1887.
30. Bronk Ramsey C (2009) Bayesian analysis of radiocarbon dates. Radiocarbon 51(1):337–360.
31. Bentley RA (2006) Strontium isotopes from the earth to the archaeological skeleton: a review. J Archaeol Meth Theor 13:135–187.
32. Lee-Thorp J, Sponheimer M (2003) Three case studies used to reassess the reliability of fossil bone and enamel isotope signals for paleodietary studies. J Anthropol Archaeol 22:208–216.
33. Price TD, Burton JH, Bentley RA (2002) The Characterization of Biologically Available Strontium Isotope Ratios for the Study of Prehistoric Migration. Archaeometry 44(1):117–135.
34. Laffoon JE, Davies GR, Hoogland M (2012) Spatial variation of biologically available strontium isotopes (⁸⁷Sr/⁸⁶Sr) in an archipelagic setting: a case study from the Caribbean. J Archaeol Sci 39:2371–2384.
35. Bataille CP, Laffoon J, Bowen GJ (2012) Mapping multiple source effects on the strontium isotopic signatures of ecosystems from the circum-Caribbean region. Ecosphere 3:art118.
36. Schroeder H, O’Connell TC, Evans JA (2009) Trans-Atlantic slavery: Isotopic evidence for forced migration to Barbados. Am J Phys Anthropol 139:547–557.
37. Dansgaard W (1964) Stable isotopes in precipitation. Tellus 16(4):436–468.
38. Longinelli A (1984) Oxygen isotopes in mammal bone phosphate: a new tool for paleohydrological and paleoclimatological research? Geochim Cosmochim Acta 48:385–390.
39. Luz B, Kolodny Y, Horowitz M (1984) Fractionation of oxygen isotopes between mammalian bone-phosphate and environmental drinking water. Geochim Cosmochim Acta 48:1689–1693.
40. Millard AR, Schroeder H (2010) “True British sailors”: a comment on the origin of the men of the Mary Rose. J Archaeol Sci 37:680–682.
41. Pollard AM, Pellegrini M, Lee-Thorp JA (2011) Technical note: some observations on the conversion of dental enamel δ¹⁸O(p) values to δ¹⁸O(w) to determine human mobility. Am J Phys Anthropol 145(3):499–504.
42. Lightfoot E, O’Connell TC (2016) On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations. PLoS One 11(4):e0153850.
43. Tauber H (1981) ¹³C evidence for dietary habits of prehistoric man in Denmark. Nature 292(5821):332–333.
44. van der Merwe NJ, Vogel JC (1978) ¹³C content of human collagen as a measure of prehistoric diet in woodland North America. Nature 276(5690):815–816.
45. Bocherens H, Sandrock O, Kullmer O (2011) Hominin palaeoecology in Late Pliocene Malawi: First insights from isotopes (¹³C, ¹⁸O) in mammal teeth. South Afr J Sci 107:1–6.
46. Laffoon JE, Rojas RV, Hofman CL (2013) Oxygen and carbon isotope analysis of human dental enamel from the Caribbean: Implications for investigating individual origins. Archaeometry 55(4):742–765.
47. Hess J, Bender ML, Schilling JG (1986) Seawater ⁸⁷Sr/⁸⁶Sr evolution from Cretaceous to present-Applications to paleoceanography. Science.
48. Ostapkowicz J, Ramsey CB, Brock F (2013) Birdmen, cemís and duhos: material studies and AMS ¹⁴C dating of Pre-Hispanic Caribbean wood sculptures in the British Museum. J Archaeol Sci 40:4675–4687.
49. White CD, Spence MW, Longstaffe FJ, Stuart-Williams H, Law KR (2002) Geographic Identities of the Sacrificial Victims from the Feathered Serpent Pyramid, Teotihuacan: Implications for the Nature of State Power. Lat Am Ant 13(2):217–236.
50. Stokes AV (1998) A biogeographic survey of prehistoric human diet in the West Indies using stable isotopes. Dissertation (University of Florida, Gainesville, FL).
51. Loftus E, Sealy J (2012) Technical note: interpreting stable carbon isotopes in human tooth enamel: an examination of tissue spacings from South Africa. Am J Phys Anthropol 147(3):499–507.
52. Damgaard PB, et al. (2015) Improving access to endogenous DNA in ancient bones and teeth. Sci Rep 5:11184.
53. Allentoft ME, et al. (2015) Population genomics of Bronze Age Eurasia. Nature 522(7555):167–172.
54. Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010(6):db.prot5448.
55. Kircher M, Sawyer S, Meyer M (2012) Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res 40(1):e3.
56. Brace S, Turvey ST, Weksler M, Hoogland MLP, Barnes I (2015) Unexpected evolutionary diversity in a recently extinct Caribbean mammal radiation. Proc Biol Sci 282(1807):20142371.
57. Schroeder H, et al. (2015) Genome-wide ancestry of 17th-century enslaved Africans from the Caribbean. Proc Natl Acad Sci USA 112(12):3669–3673.
58. Kehlmaier C, et al. (2017) Tropical ancient DNA reveals relationships of the extinct Bahamian giant tortoise Chelonoidis alburyorum. Proc Biol Sci 284(1846).
59. Allentoft ME, et al. (2012) The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc Biol Sci 279(1748):4724–4733.
60. Pedersen JS, et al. (2014) Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome. Genome Res 24(3):454–466.
61. Deagle BE, Eveson JP, Jarman SN (2006) Quantification of damage in DNA recovered from highly degraded samples--a case study on DNA in faeces. Front Zool 3:11.
62. Korneliussen TS, Albrechtsen A, Nielsen R (2014) ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics 15:356.
63. Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437(7055):69–87.
64. Meyer M, et al. (2012) A high-coverage genome sequence from an archaic Denisovan individual. Science 338(6104):222–226.
65. Raghavan M, et al. (2014) Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505(7481):87–91.
66. Rasmussen M, et al. (2014) The genome of a human from a Clovis burial site in western Montana. Nature 506(7487):225–229.
67. Hofreiter M, Serre D, Poinar HN, Kuch M, Pääbo S (2001) Ancient DNA. Nat Rev Genet 2(5):353–359.
68. Dabney J, Meyer M, Pääbo S (2013) Ancient DNA damage. Cold Spring Harb Perspect Biol 5(7):a012567.
69. Briggs AW, et al. (2009) Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325(5938):318–321.
70. Sawyer S, Krause J, Guschanski K, Savolainen V, Pääbo S (2012) Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS One 7(3):e34131.
71. Jónsson H, et al. (2013) mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29(13):1682–1684.
72. Fu Q, et al. (2013) A revised timescale for human evolution based on ancient mitochondrial genomes. Curr Biol 23(7):553–559.
73. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30(4):772–780.
74. Plummer M, Best N, Cowles K, Vines K (2006) CODA: convergence diagnosis and output analysis for MCMC. R News 6(1):7–11.
75. Racimo F, Renaud G, Slatkin M (2016) Joint Estimation of Contamination, Error and Demography for Nuclear DNA from Ancient Humans. PLoS Genet 12(4):e1005972.
76. 1000 Genomes Project Consortium, et al. (2015) A global reference for human genetic variation. Nature 526(7571):68–74.
77. Skoglund P, Storå J, Götherström A, Jakobsson M (2013) Accurate sex identification of ancient human remains using DNA shotgun sequencing. J Archaeol Sci 40(12):4477–4482.
78. Spradley MK, Jantz RL (2011) Sex estimation in forensic anthropology: skull versus postcranial elements. J Forensic Sci 56(2):289–296.
79. Li M, Schroeder R, Ko A, Stoneking M (2012) Fidelity of capture-enrichment for mtDNA genome sequencing: influence of NUMTs. Nucleic Acids Res 40(18):e137.
80. Andrews RM, et al. (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 23(2):147.
81. Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27(21):2987–2993.
82. Weissensteiner H, et al. (2016) HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res 44(W1):W58–63.
83. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8):1586–1591.
84. Soares P, et al. (2009) Correcting for purifying selection: an improved human mitochondrial molecular clock. Am J Hum Genet 84(6):740–759.
85. Conrad O, et al. (2015) System for automated geoscientific analyses (SAGA) v. 2.1.4. Geoscient Mod Develop 8(7):1991–2007.
86. Bryc K, et al. (2010) Genome-wide patterns of population structure and admixture among Hispanic/Latino populations. Proc Natl Acad Sci USA 107(Supplement 2):8954–8961.
87. Wilson JL, Saint-Louis V, Auguste JO, Jackson BA (2012) Forensic analysis of mtDNA haplotypes from two rural communities in Haiti reflects their population history. J Forensic Sci 57(6):1457–1466.
88. Benn Torres J, et al. (2015) Genetic Diversity in the Lesser Antilles and Its Implications for the Settlement of the Caribbean Basin. PLoS One 10(10):e0139192.
89. Vilar MG, et al. (2014) Genetic diversity in Puerto Rico and its implications for the peopling of the Island and the West Indies. Am J Phys Anthropol 155(3):352–368.
90. Monsalve MV, Hagelberg E (1997) Mitochondrial DNA polymorphisms in Carib people of Belize. Proc Biol Sci 264(1385):1217–1224.
91. Mendizabal I, et al. (2008) Genetic origin, admixture, and asymmetry in maternal and paternal human lineages in Cuba. BMC Evol Biol 8:213.
92. Madrilejo N, Lombard H, Torres JB (2015) Origins of marronage: Mitochondrial lineages of Jamaica’s Accompong Town Maroons. Am J Hum Biol 27(3):432–437.
93. Tajima A, et al. (2004) Genetic background of people in the Dominican Republic with or without obese type 2 diabetes revealed by mitochondrial DNA polymorphism. J Hum Genet 49(9):495–499.
94. Mendisco F, et al. (2015) Where are the Caribs? Ancient DNA from ceramic period human remains in the Lesser Antilles. Philos Trans R Soc Lond B Biol Sci 370(1660):20130388.
95. Lalueza-Fox C, Calderón F, Calafell F, Morera B, Bertranpetit J (2001) MtDNA from extinct Tainos and the peopling of the Caribbean. Ann Hum Genet 65(2):137–151.
96. Lalueza-Fox C, et al. (2003) Mitochondrial DNA from pre-Columbian Ciboneys from Cuba and the prehistoric colonization of the Caribbean. Am J Phys Anthropol 121(2):97–108.
97. Raghavan M, et al. (2015) Genomic evidence for the Pleistocene and recent population history of Native Americans. Science 349(6250):aab3884.
98. Moreno-Estrada A, et al. (2014) The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science 344(6189):1280–1285.
99. Delaneau O, Marchini J, Zagury J-F (2012) A linear complexity phasing method for thousands of genomes. Nat Methods 9(2):179–181.
100. Maples BK, Gravel S, Kenny EE, Bustamante CD (2013) RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet 93(2):278–288.
101. Reich D, et al. (2012) Reconstructing Native American population history. Nature 488(7411):370–374.
102. Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2(12):e190.
103. Kirin M, et al. (2010) Genomic runs of homozygosity record population history and consanguinity. PLoS One 5(11):e13996.
104. Pemberton TJ, et al. (2012) Genomic patterns of homozygosity in worldwide human populations. Am J Hum Genet 91(2):275–292.
105. Mallick S, et al. (2016) The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201–206.
106. Pagani L, et al. (2016) Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538(7624):238–242.
107. Chang CC, et al. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7.
108. Gamba C, et al. (2014) Genome flux and stasis in a five millennium transect of European prehistory. Nat Commun 5:5257.
109. Palamara PF, Lencz T, Darvasi A, Pe’er I (2012) Length distributions of identity by descent reveal fine-scale demographic history. Am J Hum Genet 91(5):809–822.
110. Felsenstein J (1971) Inbreeding and variance effective numbers in populations with overlapping generations. Genetics 68(4):581–597.
111. Browning SR, Browning BL (2015) Accurate Non-parametric Estimation of Recent Effective Population Size from Segments of Identity by Descent. Am J Hum Genet 97(3):404–418.
112. Schraiber J (2017) Assessing the relationship of ancient and modern populations. bioRxiv. Available at: http://biorxiv.org/content/early/2017/03/04/113779.abstract.
113. Patterson N, et al. (2012) Ancient admixture in human history. Genetics 192(3):1065–1093.
114. Leppälä K, Nielsen SV, Mailund T (2017) admixturegraph: an R package for admixture graph manipulation and fitting. Bioinformatics 33(11):1738–1740.
115. Lawson DJ, Hellenthal G, Myers S, Falush D (2012) Inference of population structure using dense haplotype data. PLoS Genet 8(1):e1002453.
116. Carr RS, Schaffer WC, Ransom JB, Pateman MP (2012) Ritual Cave Use in the Bahamas. Sacred Darkness: A Global Perspective on the Ritual Use of Caves, ed Moyes H (University of Colorado Press, Boulder, CO).
117. Olalde I, et al. (2015) A Common Genetic Origin for Early Farmers from Mediterranean Cardial and Central European LBK Cultures. Mol Biol Evol 32(12):3132–3142.
118. Sánchez-Quinto F, et al. (2012) Genomic affinities of two 7,000-year-old Iberian hunter-gatherers. Curr Biol 22(16):1494–1499.
119. Rasmussen M, et al. (2015) The ancestry and affiliations of Kennewick Man. Nature 523(7561):455–458.

Figure S1. Map showing the location of Preacher’s Cave on the island of Eleuthera in the Bahamas.

Figure S2. Radiocarbon date for specimen PC537. The probability distribution of the radiocarbon date of 1082 ± 29 BP at 2σ is shown in red. The Northern Hemisphere calibration curve IntCal13 (29) is shown in blue. The calibrated (posterior) probability distribution is shown in black; it gives ages of 894–930 calAD (28.3%) and 938–1017 calAD (67.1%) for the 95.4% confidence interval.

Figure S3. Comparison of the enamel radiocarbon date (green) with modelled ages (black), assuming offsets of 20, 40, 60, 80 and 100 years between the enamel date and the true age of the sample.

Figure S4. Revised age estimate for PC537, calculated as the sum of all radiocarbon ages between 1083 ± 29 and 1183 ± 29 BP. The sum gives a more reliable approximation of PC537’s ‘true’ calibrated age at the 95.4% confidence interval.

Figure S5. Declining part of the read length distribution for sample PC537 from Preacher’s Cave, Eleuthera, fitted to an exponential curve.
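A fit of this kind can be sketched by regressing the log of the read counts on read length over the declining part of the distribution, from the modal length onward. This assumes counts ∝ exp(−λL). The procedure is an assumed, simplified one in the spirit of Allentoft et al. (59), not necessarily the fitting method used for this figure. The function name is hypothetical.

```python
import math


def fit_decay_rate(lengths, counts):
    """Estimate lambda in counts ~ C * exp(-lambda * length) by log-linear least squares.

    Only the declining part of the distribution is used: points from the modal
    length onward with non-zero counts. lengths must be sorted in ascending order.
    """
    mode_index = max(range(len(counts)), key=counts.__getitem__)
    xs, ys = [], []
    for length, count in zip(lengths[mode_index:], counts[mode_index:]):
        if count > 0:
            xs.append(length)
            ys.append(math.log(count))
    if len(xs) < 2:
        raise ValueError("need at least two non-zero points from the mode onward")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope
```

Fitting on the log scale gives every length bin equal weight. A direct nonlinear fit on raw counts would instead be dominated by the most abundant short fragments near the mode.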

Figure S6. DNA fragment length distributions for five samples from Preacher’s Cave.

Figure S7. DNA fragmentation and nucleotide misincorporation patterns for five samples from Preacher’s Cave.

Figure S8. Phylogeny of Native American haplogroup B2, showing the geographical distribution of subclades in the Americas and coalescence ages (in ky) calculated using the calibrated mutation rate of Soares et al. (84).

Figure S9. Interpolated map showing the frequency distribution of mtDNA haplogroup B2.

Figure S10. PCA plot showing the genetic affinities of the Lucayan-Taino individual with South American populations. PC1 and PC2 have been switched to mirror the geographic location of the sample. The populations have been colored by language group: Je-Tupi-Carib (yellow), Macro-Arawakan (blue), Mataco-Guaicuru (purple), Mayan (red), Oto-Manguean (green), Uto-Aztecan (teal), Totozoquean (dark brown), Quechumaran (light brown).


Figure S11. Admixture proportions for the ancient Taino and a set of Native American populations (6) for which segments of European and African ancestry had been masked. Each panel shows the estimated ancestry proportions from K=2 to K=14. Each sample is represented by a vertical bar, and the colors indicate ancestry from different clusters.


Figure S12. D-statistics of the form D(Yoruba, X; Mixe, Surui/Taino/Clovis) showing that, unlike the Surui, the ancient Taino and Clovis genomes do not share an excess affinity with the Onge/Papuans. Thick and thin whiskers represent 1 and 3 standard errors, respectively.


Figure S13. Admixture proportions for the ancient Taino, seven populations from the 1000 Genomes Project (including Puerto Ricans) and a set of Native American populations (6). Each panel shows the estimated ancestry proportions from K=2 to K=14. Each sample is represented by a vertical bar, and the colors indicate different ancestry components.


Figure S14. Admixture proportions for the ancient Taino, four populations from the 1000 Genomes Project (including Puerto Ricans) and a set of Native American populations (6) after the first three (continental) ancestry components have been removed and the remaining components renormalized to sum to 1. Each panel shows the estimated ancestry proportions from K*=2 to K*=14, where K* is the number of clusters excluding the first three (continental) clusters. Each sample is represented by a vertical bar, and the colors indicate different ancestry components.
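The K* rescaling in Figures S14 and S15 can be sketched as below, assuming each individual's ancestry proportions are a list of K values and that the positions of the three continental components are known. The helper names `drop_and_renormalize` and `population_mean` are hypothetical; population averages, as plotted in Figure S15, are taken here as simple column means of the rescaled rows.

```python
def drop_and_renormalize(q_row, drop_indices):
    """Remove selected ancestry components and rescale the rest to sum to 1.

    q_row: ancestry proportions for one individual (K values summing to 1).
    drop_indices: positions of the components to remove (for example the
    three continental components), leaving K* = K - len(drop_indices) values.
    """
    drop = set(drop_indices)
    kept = [q for i, q in enumerate(q_row) if i not in drop]
    total = sum(kept)
    if total == 0:
        # The individual is fully explained by the removed components.
        return [float("nan")] * len(kept)
    return [q / total for q in kept]


def population_mean(rows):
    """Column-wise mean of rescaled ancestry rows for one population."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


# Example with K = 5: drop the first three components, keep K* = 2.
rescaled = drop_and_renormalize([0.5, 0.2, 0.1, 0.15, 0.05], [0, 1, 2])
```

Here the two remaining components (0.15 and 0.05) are rescaled to approximately 0.75 and 0.25. Individuals whose ancestry lies entirely in the removed components would need to be excluded before averaging, since they return NaN.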


Figure S15. Average admixture proportions for the ancient Taino, four populations from the 1000 Genomes Project (including Puerto Ricans) and a set of Native American populations (6) after the first three (continental) ancestry components have been removed and the remaining components renormalized to sum to 1. Each panel shows the estimated ancestry proportions from K*=2 to K*=14, where K* is the number of clusters excluding the first three (continental) clusters. Each population/language group is represented by a vertical bar, and the colors indicate different ancestry components.


Figure S16. D-statistics testing specific hypotheses about the relationship between the Taino and modern Puerto Ricans. (A) D-statistics of the form D(YRI, Taino; PUR, X), testing whether another population in our dataset is as closely related as the Taino to the Native American component of modern Puerto Ricans. (B) D-statistics of the form D(YRI, X; Taino, PUR), testing whether any gene flow occurred after the divergence of the ancestors of the Taino and modern Puerto Ricans. Thick and thin whiskers represent 1 and 3 standard errors, respectively.


Figure S17. Estimated length distribution of runs of homozygosity (ROH) for 111 Native American and northeast Siberian genomes, including two ancient individuals. The total length of short to medium (<1.6 Mb) ROH is plotted against the total length of long (≥1.6 Mb) ROH, as defined by (104).
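Each point in Figure S17 summarizes one genome by two totals. A minimal sketch of that summary, assuming ROH segments have already been called and their lengths are given in Mb; the 1.6 Mb cutoff follows the caption, while the function name `roh_totals` is illustrative.

```python
def roh_totals(roh_lengths_mb, threshold_mb=1.6):
    """Split one genome's ROH into the two classes plotted in Figure S17.

    Returns (total short-to-medium length, total long length) in Mb, where
    short-to-medium ROH are shorter than threshold_mb and long ROH are
    at least threshold_mb long.
    """
    short_medium = sum(x for x in roh_lengths_mb if x < threshold_mb)
    long_total = sum(x for x in roh_lengths_mb if x >= threshold_mb)
    return short_medium, long_total


# A segment of exactly 1.6 Mb counts as long, matching the >= 1.6 Mb class.
example = roh_totals([0.5, 1.0, 1.6, 3.0])
```

Many short-to-medium ROH typically reflect a small long-term population size, whereas long ROH point to recent inbreeding; plotting the two totals against each other separates these signals.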


Figure S18. Two models that fit the data equally well (A and B) and two that do not (C and D), in that their predicted f-statistics lie more than three standard errors from the observed values. Drift values are shown in units proportional to FST × 1,000.


Figure S19. Expected total length of chunk donations in 111 present-day Puerto Ricans from the 1000 Genomes Project (76), as revealed by ChromoPainter analysis (115). YRI, Yoruba; IBS, Iberians. Horizontal bars mark the mean ± one standard deviation.


Table S1. Calibrated radiocarbon dates on human bone, charcoal and shell from Preacher’s Cave, Eleuthera, Bahamas.

Beta # | Material | Location | Median date (cal AD) | Source
218509 | Burnt shell | ST N110/E240 | 1560 | (116)
218512 | Triton shell | ST 39 N207/E507 | 640 | (116)
218517 | Charred cob | Unit 9 | 1481; 1595 | (116)
518519 | Charcoal | Cremation deposit | 1295; 1365 | (116)
218520 | Charcoal | Feature 44, Unit 16 | 1540 | (116)
220176 | Charcoal | Cremation deposit | 1490; 1605 | (116)
242393 | Tellin shell | Burial 3 | 1250 | (15, 116)
242394 | Triton shell | Burial 3 | 1335 | (15, 116)
260751 | Human bone | Burial 1 | 320 | (15, 116)
260752 | Human bone | Burial 2 | 910 | (15, 116)
260753 | Human bone | Burial 3 | 1070; 1190 | (15, 116)


Table S2. Radiocarbon dating results for PC537.

OxA No. | Sample | Date (BP) | Age (cal BP) | Age (cal AD) | δ¹³C (‰)
OxA-X-2623-21 | PC537 | 1082 ± 29 | 1056-1021 (28.3%); 1012-933 (67.1%) | 894-930 (28.3%); 938-1017 (67.1%) | -8.3


Table S3. Results of strontium, carbon, and oxygen isotope analyses of human dental enamel from the Preacher’s Cave site, Eleuthera, The Bahamas.

Sample ID | Element | ⁸⁷Sr/⁸⁶Sr | δ¹³C ‰ (VPDB) | δ¹⁸O ‰ (VPDB)
PC8 | premolar | 0.70917 | -5.6 | -2.7
PC11 | premolar | 0.70915 | -10.5 | -2.0
PC107 | molar | 0.70916 | -11.2 | -2.6
PC537 | incisor | 0.70915 | -8.3 | -3.3
PC549 | molar | 0.70914 | -11.5 | -2.2
PC562 | molar | 0.70915 | -11 | -4.4
PC683 | premolar | 0.70917 | -9.6 | -3.6


Table S4. Low-coverage screening results for the five tooth samples from Preacher’s Cave, Eleuthera.

Sample | Total reads | Unique reads | Read length | Endogenous content | Clonality | 5′ C>T
PC8 | 2,999,961 | 3,782 | 48.0 bp | 0.2% | 0.2% | 27.4%
PC115 | 2,799,607 | 13,076 | 61.9 bp | 0.6% | 0.9% | 31.5%
PC537 | 3,726,183 | 477,966 | 67.5 bp | 13.2% | 0.3% | 16.2%
PC549 | 2,845,606 | 10,496 | 63.1 bp | 0.5% | 0.1% | 28.6%
PC683 | 4,435,818 | 16,745 | 54.9 bp | 0.4% | 0.2% | 23.1%


Table S5. Deep-sequencing results for sample PC537.

Nuclear coverage¹ | Nuclear depth² | Nuclear fragment length³ | Nuclear 5′ C>T⁴ | mtDNA coverage⁵ | mtDNA depth⁶ | mtDNA fragment length⁷ | mtDNA 5′ C>T⁸
88.5% | 12.4× | 73.9 | 16% | 98.5% | 167× | 99.7 | 10%

¹ Nuclear genome coverage in percent; ² average depth of coverage of the nuclear genome; ³ average length of nuclear DNA fragments; ⁴ fraction of nuclear reads showing 5′ C>T damage; ⁵ mtDNA genome coverage in percent; ⁶ average depth of coverage of the mtDNA genome; ⁷ average length of mtDNA fragments; ⁸ fraction of mtDNA reads showing 5′ C>T damage.
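The footnotes distinguish coverage in percent from depth of coverage. A minimal sketch of both, assuming "coverage in percent" means breadth (the share of reference positions covered by at least one read) and that a per-position depth profile including zero-depth positions is available (for example, the depth column of `samtools depth -a`). Averaging depth over all positions rather than only covered ones is our assumption, and `coverage_stats` is a hypothetical helper.

```python
def coverage_stats(depths):
    """Return (breadth in percent, mean depth) from per-position read depths.

    depths: read depth at every position of the reference, including
    zero-depth positions. Breadth is the percentage of positions covered by
    at least one read; mean depth is averaged over all positions.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d > 0)
    return 100.0 * covered / n, sum(depths) / n


# Five positions, three covered: 60% breadth at an average depth of 2x.
breadth, depth = coverage_stats([0, 2, 3, 0, 5])
```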


Table S6. Comparison of DNA decay in genomic data from samples from different environments. Approximate ages and estimated burial temperatures are listed. Lambda (λ) is the DNA damage fraction (per site) estimated directly from the declining part of the observed read length distribution. The decay rate (k, per site per year) can be obtained by dividing λ by the age of the sample. The fragment length (1/λ) is the theoretical average DNA fragment length in the extract. Molecular half-lives for 100 bp fragments are calculated as in (59).

Genome | Age (cal BP) | Temp (°C) | λ | k | Fragment length (1/λ) | Half-life, 100 bp (years) | Reference
Taino | 1,000 | 20 | 0.016 | 1.60 × 10⁻⁵ | 63 | 434 | This study
STM2 | 340 | 25 | 0.014 | 4.12 × 10⁻⁵ | 71 | 640 | (57)
CB13 | 7,400 | 15 | 0.027 | 3.65 × 10⁻⁶ | 37 | 1,900 | (117)
La Brana | 7,500 | 10 | 0.033 | 4.40 × 10⁻⁶ | 30 | 1,576 | (118)
Kennewick | 9,000 | 12 | 0.017 | 1.89 × 10⁻⁶ | 59 | 3,670 | (119)
Anzick | 12,800 | 5 | 0.018 | 1.41 × 10⁻⁶ | 56 | 4,916 | (66)
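The derived columns of Table S6 follow from λ and the sample age. A minimal sketch, assuming that a fragment of L bp is lost when any one of its sites breaks, so that its half-life is ln 2 / (kL); we assume this is the relation used in (59). With the Taino inputs it gives about 433 years, close to the tabulated 434 (λ is rounded in the table). The function name `decay_summary` is illustrative.

```python
import math


def decay_summary(lam, age_years, fragment_bp=100):
    """Derive decay rate, mean fragment length and half-life from lambda.

    lam: DNA damage fraction per site, from the read length distribution.
    age_years: sample age in years.
    Returns (k, 1/lambda, half-life of a fragment_bp-long fragment), assuming
    half-life = ln(2) / (k * fragment_bp).
    """
    k = lam / age_years                   # decay rate per site per year
    mean_length = 1.0 / lam               # theoretical mean fragment length (bp)
    half_life = math.log(2) / (k * fragment_bp)
    return k, mean_length, half_life


# Taino row: lambda = 0.016 at roughly 1,000 years.
k, mean_length, half_life = decay_summary(0.016, 1000)
```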


Table S7. Native American reference populations used in this study.

Population | N | Language group | SNP array | Reference
Aleutian | 8 | Eskimo–Aleut | Illumina 650Y | (11)
Algonquin | 5 | Algic | Illumina 610-Quad | (1)
Arara | 1 | Je–Tupi–Carib | Illumina 610-Quad | (1)
Arhuaco | 5 | Macro-Chibchan | Illumina 610-Quad | (1)
Aymara | 23 | Quechumaran | Illumina 610-Quad | (1)
Bribri | 4 | Macro-Chibchan | Illumina 610-Quad | (1)
Cabecar | 31 | Macro-Chibchan | Illumina 610-Quad | (1)
Chane | 2 | Macro-Arawakan | Illumina 610-Quad | (1)
Chilote | 8 | Indo-European | Illumina 610-Quad | (1)
Chipewyan | 15 | Na-Dene | Illumina 610-Quad | (1)
Chono | 4 | Unclassified | Illumina 610-Quad | (1)
Chorotega | 1 | Oto-Manguean | Illumina 610-Quad | (1)
Cree | 4 | Algic | Illumina 610-Quad | (1)
Diaguita | 5 | Unclassified | Illumina 610-Quad | (1)
Embera | 5 | Chocoan | Illumina 610-Quad | (1)
Guahibo | 6 | Macro-Arawakan | Illumina 610-Quad | (1)
Guarani | 6 | Je–Tupi–Carib | Illumina 610-Quad | (1)
Guaymi | 5 | Macro-Chibchan | Illumina 610-Quad | (1)
Huetar | 1 | Macro-Chibchan | Illumina 610-Quad | (1)
Hulliche | 4 | Araucanian (isolate) | Illumina 610-Quad | (1)
Inga | 9 | Quechumaran | Illumina 610-Quad | (1)
Inuit (EG) | 7 | Eskimo–Aleut | Illumina 650Y | (11)
Inuit (WG) | 8 | Eskimo–Aleut | Illumina 650Y | (11)
Jamamadi | 1 | Macro-Arawakan | Illumina 610-Quad | (1)
Kaingang | 2 | Je–Tupi–Carib | Illumina 610-Quad | (1)
Kaqchikel | 13 | Mayan | Illumina 610-Quad | (1)
Karitiana | 13 | Je–Tupi–Carib | Illumina 650Y | (2)
Kogi | 4 | Macro-Chibchan | Illumina 610-Quad | (1)
Maleku | 3 | Macro-Chibchan | Illumina 610-Quad | (1)
Maya | 49 | Mayan | Illumina 650Y/HumanHap550 | (2, 12)
Mixe | 17 | Totozoquean | Illumina 610-Quad | (1)
Mixtec | 5 | Oto-Manguean | Illumina 610-Quad | (1)
Ojibwa | 5 | Algic | Illumina 610-Quad | (1)
Palikur | 3 | Macro-Arawakan | Illumina 610-Quad | (1)
Parakana | 1 | Je–Tupi–Carib | Illumina 610-Quad | (1)
Piapoco | 7 | Macro-Arawakan | Illumina 650Y | (2)
Pima | 33 | Uto-Aztecan | Illumina 650Y | (2)
Purepecha | 1 | Purepecha (isolate) | Illumina 610-Quad | (1)
Quechua | 40 | Quechumaran | Illumina 610-Quad | (1)
Surui | 24 | Je–Tupi–Carib | Illumina 650Y | (2)
Tepehuano | 25 | Uto-Aztecan | Illumina HumanHap550 | (12)
Teribe | 3 | Macro-Chibchan | Illumina 610-Quad | (1)
Ticuna | 6 | Ticuna–Yuri (isolate) | Illumina 610-Quad | (1)
Toba | 4 | Mataco–Guaicuru | Illumina 610-Quad | (1)
Waunana | 3 | Chocoan | Illumina 610-Quad | (1)
Wayuu | 11 | Macro-Arawakan | Illumina 610-Quad | (1)
Wichi | 5 | Mataco–Guaicuru | Illumina 610-Quad | (1)
Yaghan | 4 | Yaghan (isolate) | Illumina 610-Quad | (1)
Yaqui | 1 | Uto-Aztecan | Illumina 610-Quad | (1)
Zapotec | 43 | Oto-Manguean | Illumina HumanHap550 | (12)


Table S8. Outgroup f3-statistics testing f3(Yoruba; Taino, X).

Population A | Population B | Outgroup | f3-statistic | Std. error | Z-score | SNPs
Aleutian | Taino | Yoruba | 0.2460 | 0.0034 | 71.4 | 7904
Algonquin | Taino | Yoruba | 0.2612 | 0.0020 | 129.4 | 20116
Arara | Taino | Yoruba | 0.2931 | 0.0024 | 121.1 | 23781
Arhuaco | Taino | Yoruba | 0.2855 | 0.0022 | 132.1 | 20945
Aymara | Taino | Yoruba | 0.2874 | 0.0018 | 160.9 | 15680
Bribri | Taino | Yoruba | 0.2866 | 0.0021 | 137.5 | 20384
Cabecar | Taino | Yoruba | 0.2883 | 0.0020 | 146.5 | 17209
Chane | Taino | Yoruba | 0.2902 | 0.0019 | 151.0 | 21747
Chilote | Taino | Yoruba | 0.2831 | 0.0020 | 141.8 | 20249
Chipewyan | Taino | Yoruba | 0.2590 | 0.0019 | 138.8 | 17059
Chono | Taino | Yoruba | 0.2854 | 0.0023 | 125.9 | 20709
Chorotega | Taino | Yoruba | 0.2689 | 0.0042 | 64.6 | 15053
Cree | Taino | Yoruba | 0.2635 | 0.0022 | 122.2 | 18654
Diaguita | Taino | Yoruba | 0.2834 | 0.0019 | 147.7 | 20185
EastGreenland | Taino | Yoruba | 0.2490 | 0.0020 | 125.4 | 18538
Embera | Taino | Yoruba | 0.2890 | 0.0020 | 145.6 | 19841
Guahibo | Taino | Yoruba | 0.2921 | 0.0017 | 170.1 | 18846
Guarani | Taino | Yoruba | 0.2904 | 0.0018 | 157.1 | 18635
Guaymi | Taino | Yoruba | 0.2897 | 0.0021 | 135.1 | 20318
Huetar | Taino | Yoruba | 0.2837 | 0.0037 | 77.6 | 13698
Hulliche | Taino | Yoruba | 0.2874 | 0.0021 | 136.5 | 20568
Inga | Taino | Yoruba | 0.2891 | 0.0018 | 158.6 | 18438
Jamamadi | Taino | Yoruba | 0.2939 | 0.0024 | 121.4 | 23837
Kaingang | Taino | Yoruba | 0.2898 | 0.0026 | 110.8 | 19964
Kaqchikel | Taino | Yoruba | 0.2842 | 0.0018 | 162.3 | 16223
Karitiana | Taino | Yoruba | 0.2910 | 0.0018 | 157.8 | 18997
Kogi | Taino | Yoruba | 0.2870 | 0.0022 | 133.4 | 20993
Maleku | Taino | Yoruba | 0.2873 | 0.0023 | 127.4 | 21885
Maya | Taino | Yoruba | 0.2835 | 0.0019 | 151.5 | 16741
Mixe | Taino | Yoruba | 0.2836 | 0.0018 | 155.2 | 16470
Mixtec | Taino | Yoruba | 0.2821 | 0.0019 | 149.9 | 18775
Ojibwa | Taino | Yoruba | 0.2660 | 0.0020 | 129.8 | 19595
Palikur | Taino | Yoruba | 0.2939 | 0.0021 | 143.0 | 21301
Parakana | Taino | Yoruba | 0.2918 | 0.0023 | 128.7 | 23453
Piapoco | Taino | Yoruba | 0.2932 | 0.0018 | 158.9 | 18524
Pima | Taino | Yoruba | 0.2795 | 0.0018 | 152.8 | 16226
Purepecha | Taino | Yoruba | 0.2776 | 0.0028 | 97.7 | 16972
Quechua | Taino | Yoruba | 0.2872 | 0.0018 | 162.9 | 14633
Surui | Taino | Yoruba | 0.2941 | 0.0020 | 150.8 | 19199
Tepehuano | Taino | Yoruba | 0.2804 | 0.0018 | 153.5 | 15404
Teribe | Taino | Yoruba | 0.2893 | 0.0023 | 125.3 | 21125
Ticuna | Taino | Yoruba | 0.2921 | 0.0018 | 163.9 | 19574
Toba | Taino | Yoruba | 0.2902 | 0.0019 | 150.6 | 19774
Waunana | Taino | Yoruba | 0.2875 | 0.0021 | 138.3 | 20804
Wayuu | Taino | Yoruba | 0.2898 | 0.0019 | 150.8 | 17604
WestGreenland | Taino | Yoruba | 0.2489 | 0.0021 | 117.5 | 18676
Wichi | Taino | Yoruba | 0.2897 | 0.0020 | 142.8 | 19992
Yaghan | Taino | Yoruba | 0.2866 | 0.0019 | 147.1 | 20994
Yaqui | Taino | Yoruba | 0.2740 | 0.0030 | 91.3 | 15055
Zapotec | Taino | Yoruba | 0.2823 | 0.0018 | 160.7 | 15185
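The outgroup f3-statistics in Tables S8 and S9 measure the genetic drift shared by two populations after their separation from an outgroup (here Yoruba); larger values indicate more shared ancestry. A minimal sketch, assuming aligned per-SNP allele frequencies and a plain delete-one-block jackknife over consecutive SNP blocks for the standard error. Defining blocks by a fixed number of SNPs is a simplification (real analyses usually use blocks of genomic distance), the sketch omits any finite-sample bias correction, and the function name `outgroup_f3` is hypothetical.

```python
import math


def outgroup_f3(o, a, b, block_size=100):
    """Outgroup f3(O; A, B) with a delete-one-block jackknife standard error.

    o, a, b: per-SNP allele frequencies (same SNP order) for the outgroup and
    the two test populations. SNPs are grouped into consecutive blocks of
    block_size SNPs. Returns (f3, standard error, Z-score).
    """
    terms = [(ai - oi) * (bi - oi) for oi, ai, bi in zip(o, a, b)]
    n = len(terms)
    f3 = sum(terms) / n
    blocks = [terms[i:i + block_size] for i in range(0, n, block_size)]
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two blocks for the jackknife")
    total = sum(terms)
    # Leave-one-block-out estimates of f3.
    loo = [(total - sum(blk)) / (n - len(blk)) for blk in blocks]
    loo_mean = sum(loo) / g
    var = (g - 1) / g * sum((x - loo_mean) ** 2 for x in loo)
    se = math.sqrt(var)
    # Z is undefined when the SE is zero; infinity is returned as a sentinel.
    z = f3 / se if se > 0 else float("inf")
    return f3, se, z
```

In Table S8, Population A plays the role of `a`, the Taino the role of `b` and Yoruba the role of `o`; the Z-score is the statistic divided by its jackknife standard error.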


Table S9. Outgroup f3-statistics testing f3(Yoruba; PUR, X).

Population A | Population B | Outgroup | f3-statistic | Std. error | Z-score | SNPs
Aleutian | PUR | Yoruba | 0.2389 | 0.0040 | 59.7 | 1582
Algonquin | PUR | Yoruba | 0.2568 | 0.0027 | 94.5 | 4158
Arara | PUR | Yoruba | 0.2924 | 0.0031 | 95.4 | 4895
Arhuaco | PUR | Yoruba | 0.2788 | 0.0028 | 101.2 | 4407
Aymara | PUR | Yoruba | 0.2819 | 0.0025 | 114.9 | 3417
Bribri | PUR | Yoruba | 0.2815 | 0.0025 | 111.0 | 4327
Cabecar | PUR | Yoruba | 0.2829 | 0.0025 | 113.6 | 3747
Chane | PUR | Yoruba | 0.2851 | 0.0027 | 104.6 | 4545
Chilote | PUR | Yoruba | 0.2778 | 0.0026 | 107.4 | 4245
Chipewyan | PUR | Yoruba | 0.2535 | 0.0023 | 108.5 | 3624
Chono | PUR | Yoruba | 0.2806 | 0.0027 | 104.9 | 4404
Chorotega | PUR | Yoruba | 0.2611 | 0.0041 | 64.1 | 3079
Cree | PUR | Yoruba | 0.2571 | 0.0029 | 87.8 | 3796
Diaguita | PUR | Yoruba | 0.2782 | 0.0027 | 103.4 | 4288
Inuit (EG) | PUR | Yoruba | 0.2423 | 0.0025 | 97.0 | 3886
Embera | PUR | Yoruba | 0.2832 | 0.0026 | 109.4 | 4206
Guahibo | PUR | Yoruba | 0.2851 | 0.0026 | 108.0 | 4031
Guarani | PUR | Yoruba | 0.2848 | 0.0025 | 114.1 | 4020
Guaymi | PUR | Yoruba | 0.2843 | 0.0026 | 109.2 | 4329
Huetar | PUR | Yoruba | 0.2759 | 0.0038 | 73.5 | 2883
Hulliche | PUR | Yoruba | 0.2798 | 0.0026 | 107.4 | 4324
Inga | PUR | Yoruba | 0.2831 | 0.0025 | 112.8 | 3980
Jamamadi | PUR | Yoruba | 0.2838 | 0.0030 | 95.7 | 4862
Kaingang | PUR | Yoruba | 0.2833 | 0.0031 | 92.3 | 4086
Kaqchikel | PUR | Yoruba | 0.2794 | 0.0024 | 115.3 | 3525
Karitiana | PUR | Yoruba | 0.2864 | 0.0026 | 108.5 | 4116
Kogi | PUR | Yoruba | 0.2816 | 0.0026 | 109.3 | 4388
Maleku | PUR | Yoruba | 0.2818 | 0.0027 | 106.2 | 4546
Maya | PUR | Yoruba | 0.2789 | 0.0024 | 116.0 | 3649
Mixe | PUR | Yoruba | 0.2786 | 0.0024 | 115.9 | 3565
Mixtec | PUR | Yoruba | 0.2774 | 0.0024 | 114.0 | 4047
Ojibwa | PUR | Yoruba | 0.2605 | 0.0026 | 100.1 | 4142
Palikur | PUR | Yoruba | 0.2889 | 0.0027 | 108.3 | 4484
Parakana | PUR | Yoruba | 0.2893 | 0.0030 | 96.5 | 4830
Piapoco | PUR | Yoruba | 0.2842 | 0.0025 | 112.9 | 3994
Pima | PUR | Yoruba | 0.2730 | 0.0024 | 112.9 | 3511
Purepecha | PUR | Yoruba | 0.2737 | 0.0034 | 81.3 | 3465
Quechua | PUR | Yoruba | 0.2809 | 0.0024 | 115.8 | 3212
Surui | PUR | Yoruba | 0.2889 | 0.0026 | 109.5 | 4127
Taino | PUR | Yoruba | 0.2930 | 0.0030 | 97.4 | 4188
Tepehuano | PUR | Yoruba | 0.2755 | 0.0024 | 116.1 | 3365
Teribe | PUR | Yoruba | 0.2833 | 0.0026 | 108.8 | 4429
Ticuna | PUR | Yoruba | 0.2844 | 0.0025 | 112.6 | 4189
Toba | PUR | Yoruba | 0.2826 | 0.0026 | 110.0 | 4230
Waunana | PUR | Yoruba | 0.2852 | 0.0026 | 110.7 | 4399
Wayuu | PUR | Yoruba | 0.2813 | 0.0025 | 114.2 | 3813
Inuit (WG) | PUR | Yoruba | 0.2432 | 0.0027 | 88.8 | 3835
Wichi | PUR | Yoruba | 0.2829 | 0.0025 | 111.6 | 4282
Yaghan | PUR | Yoruba | 0.2806 | 0.0027 | 105.7 | 4398
Yaqui | PUR | Yoruba | 0.2702 | 0.0037 | 72.9 | 3158
Zapotec | PUR | Yoruba | 0.2774 | 0.0024 | 115.6 | 3437


Table S10. D-statistics testing D(Yoruba, Taino; Palikur, X).

H1 | H2 | H3 | H4 | D-statistic | Std. error | Z-score
Yoruba | Taino | Palikur | Aleutian | -0.1160 | 0.007 | -16.2
Yoruba | Taino | Palikur | Algonquin | -0.0835 | 0.004 | -20.2
Yoruba | Taino | Palikur | Arara | -0.0053 | 0.005 | -1.0
Yoruba | Taino | Palikur | Arhuaco | -0.0219 | 0.004 | -5.2
Yoruba | Taino | Palikur | Aymara | -0.0176 | 0.003 | -5.2
Yoruba | Taino | Palikur | Bribri | -0.0196 | 0.004 | -4.7
Yoruba | Taino | Palikur | Cabecar | -0.0151 | 0.004 | -4.3
Yoruba | Taino | Palikur | Chane | -0.0103 | 0.004 | -2.5
Yoruba | Taino | Palikur | Chilote | -0.0279 | 0.004 | -6.9
Yoruba | Taino | Palikur | Chipewyan | -0.0897 | 0.004 | -25.5
Yoruba | Taino | Palikur | Chono | -0.0214 | 0.004 | -5.4
Yoruba | Taino | Palikur | Chorotega | -0.0601 | 0.010 | -6.3
Yoruba | Taino | Palikur | Cree | -0.0760 | 0.005 | -16.6
Yoruba | Taino | Palikur | Diaguita | -0.0278 | 0.004 | -7.1
Yoruba | Taino | Palikur | Embera | -0.0131 | 0.004 | -3.5
Yoruba | Taino | Palikur | Guahibo | -0.0050 | 0.004 | -1.3
Yoruba | Taino | Palikur | Guarani | -0.0095 | 0.003 | -3.0
Yoruba | Taino | Palikur | Guaymi | -0.0115 | 0.004 | -3.1
Yoruba | Taino | Palikur | Huetar | -0.0258 | 0.006 | -4.1
Yoruba | Taino | Palikur | Hulliche | -0.0177 | 0.004 | -4.8
Yoruba | Taino | Palikur | Inga | -0.0129 | 0.003 | -3.8
Yoruba | Taino | Palikur | Inuit (EG) | -0.1116 | 0.004 | -25.7
Yoruba | Taino | Palikur | Inuit (WG) | -0.1123 | 0.004 | -28.0
Yoruba | Taino | Palikur | Jamamadi | 0.0002 | 0.005 | 0.04
Yoruba | Taino | Palikur | Kaingang | -0.0109 | 0.005 | -2.2
Yoruba | Taino | Palikur | Kaqchikel | -0.0261 | 0.003 | -7.5
Yoruba | Taino | Palikur | Karitiana | -0.0079 | 0.004 | -2.3
Yoruba | Taino | Palikur | Kogi | -0.0186 | 0.004 | -4.7
Yoruba | Taino | Palikur | Maleku | -0.0180 | 0.005 | -3.8
Yoruba | Taino | Palikur | Maya | -0.0280 | 0.003 | -8.3
Yoruba | Taino | Palikur | Mixe | -0.0277 | 0.003 | -8.4
Yoruba | Taino | Palikur | Mixtec | -0.0317 | 0.004 | -9.0
Yoruba | Taino | Palikur | Ojibwa | -0.0703 | 0.004 | -17.3
Yoruba | Taino | Palikur | Parakana | -0.0057 | 0.005 | -1.1
Yoruba | Taino | Palikur | Piapoco | -0.0020 | 0.004 | -0.5
Yoruba | Taino | Palikur | Pima | -0.0384 | 0.004 | -10.7
Yoruba | Taino | Palikur | Purepecha | -0.0398 | 0.006 | -6.6
Yoruba | Taino | Palikur | Quechua | -0.0181 | 0.003 | -5.7
Yoruba | Taino | Palikur | Surui | 0.0007 | 0.004 | 0.2
Yoruba | Taino | Palikur | Tepehuano | -0.0361 | 0.003 | -11.4
Yoruba | Taino | Palikur | Teribe | -0.0125 | 0.004 | -2.9
Yoruba | Taino | Palikur | Ticuna | -0.0049 | 0.004 | -1.3
Yoruba | Taino | Palikur | Toba | -0.0100 | 0.004 | -2.6
Yoruba | Taino | Palikur | Waunana | -0.0173 | 0.004 | -4.4
Yoruba | Taino | Palikur | Wayuu | -0.0112 | 0.003 | -3.4
Yoruba | Taino | Palikur | Wichi | -0.0114 | 0.004 | -2.8
Yoruba | Taino | Palikur | Yaghan | -0.0198 | 0.004 | -5.4
Yoruba | Taino | Palikur | Yaqui | -0.0537 | 0.007 | -8.2
Yoruba | Taino | Palikur | Zapotec | -0.0311 | 0.003 | -9.3
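The D-statistics in Tables S10 to S12 compare allele sharing among four populations H1 to H4. A minimal sketch of the frequency-based estimator, assuming per-SNP allele frequencies aligned to the same SNPs and reference allele; we assume the ratio form of Patterson and colleagues (numerator: sum of (w − x)(y − z); denominator: sum of (w + x − 2wx)(y + z − 2yz)), and standard errors, which would come from a block jackknife, are omitted. Under this sign convention, excess allele sharing between H2 and H3 gives a negative D, so a negative D(Yoruba, Taino; Palikur, X) would indicate that the Taino share more alleles with Palikur than with X. The function name `d_statistic` is hypothetical.

```python
def d_statistic(w, x, y, z):
    """Frequency-based D(W, X; Y, Z) (ABBA-BABA) statistic.

    w, x, y, z: per-SNP allele frequencies in the four populations.
    Numerator: sum of (w - x)(y - z).
    Denominator: sum of (w + x - 2wx)(y + z - 2yz).
    """
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    return num / den


# Haploid toy data: three ABBA sites (X and Y share the derived allele)
# and one BABA site (X and Z share it).
d = d_statistic([0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1])
```

With 0/1 genotypes the estimator reduces to (BABA − ABBA)/(ABBA + BABA), so the toy example gives (1 − 3)/4.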


Table S11. D-statistics testing D(YRI, Taino; PUR, X).

H1 | H2 | H3 | H4 | D-statistic | Std. error | Z-score
Yoruba | Taino | PUR | Aleutian | -0.1273 | 0.0108 | -11.8
Yoruba | Taino | PUR | Algonquin | -0.0951 | 0.0065 | -14.5
Yoruba | Taino | PUR | Arara | -0.0128 | 0.0077 | -1.7
Yoruba | Taino | PUR | Arhuaco | -0.0326 | 0.0067 | -4.9
Yoruba | Taino | PUR | Aymara | -0.0296 | 0.0050 | -5.9
Yoruba | Taino | PUR | Bribri | -0.0336 | 0.0062 | -5.5
Yoruba | Taino | PUR | Cabecar | -0.0281 | 0.0056 | -5.0
Yoruba | Taino | PUR | Chane | -0.0190 | 0.0063 | -3.0
Yoruba | Taino | PUR | Chilote | -0.0420 | 0.0061 | -6.9
Yoruba | Taino | PUR | Chipewyan | -0.1012 | 0.0057 | -17.7
Yoruba | Taino | PUR | Chono | -0.0347 | 0.0067 | -5.1
Yoruba | Taino | PUR | Chorotega | -0.0766 | 0.0106 | -7.2
Yoruba | Taino | PUR | Cree | -0.0871 | 0.0069 | -12.6
Yoruba | Taino | PUR | Diaguita | -0.0377 | 0.0056 | -6.7
Yoruba | Taino | PUR | Inuit (EG) | -0.1267 | 0.0061 | -20.7
Yoruba | Taino | PUR | Embera | -0.0249 | 0.0058 | -4.3
Yoruba | Taino | PUR | Guahibo | -0.0134 | 0.0053 | -2.5
Yoruba | Taino | PUR | Guarani | -0.0209 | 0.0052 | -4.0
Yoruba | Taino | PUR | Guaymi | -0.0241 | 0.0063 | -3.8
Yoruba | Taino | PUR | Huetar | -0.0297 | 0.0099 | -3.0
Yoruba | Taino | PUR | Hulliche | -0.0314 | 0.0061 | -5.2
Yoruba | Taino | PUR | Inga | -0.0238 | 0.0054 | -4.4
Yoruba | Taino | PUR | Jamamadi | -0.0143 | 0.0073 | -2.0
Yoruba | Taino | PUR | Kaingang | -0.0255 | 0.0074 | -3.4
Yoruba | Taino | PUR | Kaqchikel | -0.0382 | 0.0049 | -7.8
Yoruba | Taino | PUR | Karitiana | -0.0155 | 0.0056 | -2.8
Yoruba | Taino | PUR | Kogi | -0.0322 | 0.0063 | -5.1
Yoruba | Taino | PUR | Maleku | -0.0300 | 0.0068 | -4.4
Yoruba | Taino | PUR | Maya | -0.0394 | 0.0051 | -7.7
Yoruba | Taino | PUR | Mixe | -0.0374 | 0.0051 | -7.4
Yoruba | Taino | PUR | Mixtec | -0.0410 | 0.0055 | -7.5
Yoruba | Taino | PUR | Ojibwa | -0.0760 | 0.0063 | -12.1
Yoruba | Taino | PUR | Palikur | -0.0107 | 0.0059 | -1.8
Yoruba | Taino | PUR | Parakana | -0.0185 | 0.0068 | -2.7
Yoruba | Taino | PUR | Piapoco | -0.0116 | 0.0054 | -2.2
Yoruba | Taino | PUR | Pima | -0.0495 | 0.0056 | -8.8
Yoruba | Taino | PUR | Purepecha | -0.0572 | 0.0082 | -7.0
Yoruba | Taino | PUR | Quechua | -0.0296 | 0.0050 | -6.0
Yoruba | Taino | PUR | Surui | -0.0103 | 0.0057 | -1.8
Yoruba | Taino | PUR | Tepehuano | -0.0490 | 0.0051 | -9.6
Yoruba | Taino | PUR | Teribe | -0.0254 | 0.0061 | -4.1
Yoruba | Taino | PUR | Ticuna | -0.0167 | 0.0055 | -3.0
Yoruba | Taino | PUR | Toba | -0.0227 | 0.0058 | -3.9
Yoruba | Taino | PUR | Waunana | -0.0300 | 0.0060 | -5.0
Yoruba | Taino | PUR | Wayuu | -0.0229 | 0.0054 | -4.3
Yoruba | Taino | PUR | Inuit (WG) | -0.1239 | 0.0064 | -19.4
Yoruba | Taino | PUR | Wichi | -0.0238 | 0.0059 | -4.1
Yoruba | Taino | PUR | Yaghan | -0.0314 | 0.0063 | -5.0
Yoruba | Taino | PUR | Yaqui | -0.0640 | 0.0093 | -6.9
Yoruba | Taino | PUR | Zapotec | -0.0437 | 0.0050 | -8.8


Table S12. D-statistics testing D(YRI, X; Taino, PUR).

H1 | H2 | H3 | H4 | D-statistic | Std. error | Z-score
Yoruba | Aleutian | Taino | PUR | 0.0017 | 0.0107 | 0.2
Yoruba | Algonquin | Taino | PUR | 0.0063 | 0.0059 | 1.1
Yoruba | Arara | Taino | PUR | 0.0113 | 0.0072 | 1.6
Yoruba | Arhuaco | Taino | PUR | -0.0021 | 0.0060 | -0.3
Yoruba | Aymara | Taino | PUR | 0.0026 | 0.0046 | 0.6
Yoruba | Bribri | Taino | PUR | 0.0045 | 0.0053 | 0.8
Yoruba | Cabecar | Taino | PUR | 0.0028 | 0.0050 | 0.6
Yoruba | Chane | Taino | PUR | -0.0008 | 0.0056 | -0.1
Yoruba | Chilote | Taino | PUR | 0.0052 | 0.0055 | 0.9
Yoruba | Chipewyan | Taino | PUR | 0.0010 | 0.0049 | 0.2
Yoruba | Chono | Taino | PUR | 0.0031 | 0.0066 | 0.5
Yoruba | Chorotega | Taino | PUR | 0.0047 | 0.0088 | 0.5
Yoruba | Cree | Taino | PUR | -0.0027 | 0.0063 | -0.4
Yoruba | Diaguita | Taino | PUR | 0.0017 | 0.0056 | 0.3
Yoruba | Inuit (EG) | Taino | PUR | 0.0037 | 0.0057 | 0.7
Yoruba | Embera | Taino | PUR | 0.0007 | 0.0053 | 0.1
Yoruba | Guahibo | Taino | PUR | -0.0041 | 0.0049 | -0.8
Yoruba | Guarani | Taino | PUR | 0.0009 | 0.0049 | 0.2
Yoruba | Guaymi | Taino | PUR | 0.0019 | 0.0057 | 0.3
Yoruba | Huetar | Taino | PUR | -0.0027 | 0.0096 | -0.3
Yoruba | Hulliche | Taino | PUR | -0.0009 | 0.0056 | -0.2
Yoruba | Inga | Taino | PUR | -0.0008 | 0.0049 | -0.2
Yoruba | Jamamadi | Taino | PUR | -0.0103 | 0.0069 | -1.5
Yoruba | Kaingang | Taino | PUR | -0.0024 | 0.0068 | -0.3
Yoruba | Kaqchikel | Taino | PUR | 0.0042 | 0.0043 | 1.0
Yoruba | Karitiana | Taino | PUR | -0.0005 | 0.0053 | -0.1
Yoruba | Kogi | Taino | PUR | 0.0042 | 0.0055 | 0.8
Yoruba | Maleku | Taino | PUR | 0.0020 | 0.0065 | 0.3
Yoruba | Maya | Taino | PUR | 0.0029 | 0.0046 | 0.6
Yoruba | Mixe | Taino | PUR | 0.0008 | 0.0047 | 0.2
Yoruba | Mixtec | Taino | PUR | 0.0023 | 0.0049 | 0.5
Yoruba | Ojibwa | Taino | PUR | -0.0046 | 0.0056 | -0.8
Yoruba | Palikur | Taino | PUR | 0.0020 | 0.0057 | 0.3
Yoruba | Parakana | Taino | PUR | 0.0107 | 0.0068 | 1.6
Yoruba | Piapoco | Taino | PUR | -0.0098 | 0.0047 | -2.1
Yoruba | Pima | Taino | PUR | -0.0017 | 0.0048 | -0.3
Yoruba | Purepecha | Taino | PUR | 0.0100 | 0.0079 | 1.3
Yoruba | Quechua | Taino | PUR | -0.0004 | 0.0044 | -0.1
Yoruba | Surui | Taino | PUR | 0.0002 | 0.0054 | 0.0
Yoruba | Tepehuano | Taino | PUR | 0.0044 | 0.0046 | 1.0
Yoruba | Teribe | Taino | PUR | 0.0016 | 0.0058 | 0.3
Yoruba | Ticuna | Taino | PUR | -0.0024 | 0.0050 | -0.5
Yoruba | Toba | Taino | PUR | -0.0028 | 0.0053 | -0.5
Yoruba | Waunana | Taino | PUR | 0.0107 | 0.0052 | 2.1
Yoruba | Wayuu | Taino | PUR | -0.0052 | 0.0047 | -1.1
Yoruba | Inuit (WG) | Taino | PUR | 0.0012 | 0.0059 | 0.2
Yoruba | Wichi | Taino | PUR | -0.0002 | 0.0053 | 0.0
Yoruba | Yaghan | Taino | PUR | 0.0012 | 0.0058 | 0.2
Yoruba | Yaqui | Taino | PUR | -0.0002 | 0.0084 | 0.0
Yoruba | Zapotec | Taino | PUR | 0.0036 | 0.0046 | 0.8


Table S13. Effective population size estimates (Ne) for selected Native American populations.

Population | n | Ne
Aleut | 2 | 2,663
Athabascan | 2 | 1,095
Aymara | 1 | 1,409
Chane | 1 | 3,462
Clovis | 1 | 539
Eskimo | 9 | 1,330
Han | 3 | 19,189
Huichol | 1 | 1,176
Karitiana | 3 | 472
Mayan | 2 | 1,967
Mixe | 3 | 1,383
Piapoco | 2 | 1,023
Pima | 2 | 1,019
Quechua | 3 | 3,191
Surui | 2 | 340
Taino | 1 | 1,634
Wichi | 4 | 698
Yukpa | 1 | 727
Zapotec | 2 | 3,823


Table S14. Maximum likelihood estimates of the parameters of the beta distribution of allele frequencies for each modern Native American population.

Test population | Alpha (MLE) | Beta (MLE)
CLM | 1.36 | 1.75
MXL | 1.06 | 1.42
PEL | 0.91 | 1.23
PUR | 1.41 | 1.81
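Table S14 reports maximum likelihood estimates of the two shape parameters of a beta distribution fitted to allele frequencies. A minimal, dependency-free sketch of such a fit, assuming frequencies strictly between 0 and 1 (monomorphic sites are dropped here). It exploits the fact that the beta log-likelihood is concave in (alpha, beta), so alternating one-dimensional golden-section maximizations converge to the MLE. This is not necessarily the optimizer used for Table S14; a library routine such as `scipy.stats.beta.fit(data, floc=0, fscale=1)` would be an alternative. All function names here are hypothetical.

```python
import math


def beta_log_likelihood(a, b, mean_log_x, mean_log_1mx, n):
    """Log-likelihood of n values under Beta(a, b), from sufficient statistics."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return n * (log_norm + (a - 1) * mean_log_x + (b - 1) * mean_log_1mx)


def golden_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d   # maximum lies in [lo, d]
        else:
            lo = c   # maximum lies in [c, hi]
    return (lo + hi) / 2


def fit_beta_mle(freqs, sweeps=100, upper=100.0):
    """Maximum likelihood (alpha, beta) for allele frequencies in (0, 1).

    Uses coordinate ascent; both parameters are searched on [1e-3, upper].
    """
    xs = [f for f in freqs if 0.0 < f < 1.0]  # log(0) is undefined
    n = len(xs)
    if n == 0:
        raise ValueError("no frequencies strictly between 0 and 1")
    mlx = sum(math.log(v) for v in xs) / n
    ml1x = sum(math.log(1.0 - v) for v in xs) / n
    a, b = 1.0, 1.0
    for _ in range(sweeps):
        a = golden_max(lambda t: beta_log_likelihood(t, b, mlx, ml1x, n), 1e-3, upper)
        b = golden_max(lambda t: beta_log_likelihood(a, t, mlx, ml1x, n), 1e-3, upper)
    return a, b
```

Because only the mean log frequency and mean log complementary frequency enter the likelihood, the data are reduced to two numbers before optimization, which keeps each sweep cheap.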


Table S15. Maximum likelihood estimates of drift times for the Taino and modern admixed Latino populations, in which segments of African and European ancestry had been masked. t1 is the drift time in the modern population and t2 the drift time in the ancient population. Values are given ± 95% confidence intervals from 100 bootstrap replicates.

Test population | t1 | t2
CLM | 0.008±0.003 | 0.060±0.005
MXL | 0.007±0.002 | 0.074±0.005
PEL | 0.015±0.002 | 0.061±0.004
PUR | 0.005±0.005 | 0.056±0.008
