Current Biology, Volume 27

Supplemental Information

Parallel Trajectories of Genetic and Linguistic Admixture in a Genetically Admixed Creole Population

Paul Verdu, Ethan M. Jewett, Trevor J. Pemberton, Noah A. Rosenberg, and Marlyse Baptista

[Figure S1 map: world map (scale bar 3000 km) with an inset of the Cape Verde islands (scale bar 50 km) showing Santo Antão, São Vicente, São Nicolau, Sal, Boa Vista, Maio, Santiago, Fogo, and Brava. Population labels appear under the regional headings Africa, Middle East, Europe, Central-South Asia, East Asia, Oceania, Central and South America, Caribbean, and USA.]

Population labels shown on the map: Han, Bedouin, Adygei, Xibo, Papuan, Maasai (MKK), Han (N. China), Druze, Toscani (TSI), Makrani, Tu, Melanesian, Luhya (LWK), Han (CHB), Pima, Palestinian, Tuscan, Balochi, Naxi, Maya, Mbuti, Han (CHD), Bantu (Kenya), Mozabite, Italian, Brahui, Lahu, Han (CHS), Colombian (CLM), Biaka, Sardinian, Hazara, Yi, Puerto Rican (PUR), Daur, Piapoco, Yoruba, French, Pathan, Dai, Barbados (ACB), Cambodian, Peruvian (PEL), Yoruba (YRI), Basque, Sindhi, Dai (CDX), Oroqen, Karitiana, San, Russian, Kalash, Mongolia, She, Surui, Bantu (S. ), Orcadian, Burusho, Tujia, Hezhen, Mandenka, Iberian (IBS), Gujarati (GIH), Miao, Japanese, Caucasian (CEU), Gambian (GWD), Finnish (FIN), Uygur, Yakut, Japanese (JPT), African American (ASW), Mende (MSL), British (GBR), Telugu (ITU), Kinh (KHV), Mexican American (MXL), Esan (ESN), Penjabi (PJL), Tamil (SLU), Bengali (BEB).

Figure S1. Locations of 79 populations that were combined with the Cape Verde data. Genetic data from these populations were used to produce the dataset used in Figure 1. Portuguese settlement of Cape Verde began in the 1400s, beginning on Santiago around 1460. The remaining islands were settled over the following 400 years.

[Figure S2 flowchart; steps and counts as recovered from the figure]

Stage 1
Start: 48 samples genotyped on the Illumina HumanOmni2.5, containing 2,379,855 markers
Recluster markers
Remove markers with ambiguous position: 7,241 markers
Remove markers that are in X, Y, XY, MT: 58,443 markers
Remove markers with call rate <90%: 6,728 markers
Calculate sample call rates and remove samples <95%: 0 samples
Recluster autosomal markers
Remove markers with cluster separation <0.2: 7 markers
Remove markers with call rate <95%: 2 markers
Gender discrepancies: 0 samples
Intermediate result: 2,307,434 markers (97.0%), 48 samples (100%)
Remove duplicate markers: 3,206 markers
Remove markers with >2 alleles: 0 markers
Result: 2,304,228 markers (96.8%), 48 samples (100%)

Stage 2
Remove markers that have ≥10% missing data in the data set: 4 markers
Remove markers that are monomorphic in the data set: 385,397 markers
Curate markers on Hardy-Weinberg disequilibrium: 115 markers
Remove non-SNP markers: 14 indels
Result: 1,918,698 SNPs (80.6%), 48 samples (100%)
Identify pairs of samples related closer than first cousins

Stage 3
Remove one sample from each identified relative pair: 4 samples
Remove markers that have ≥10% missing data in the data set: 3 markers
Curate markers on Hardy-Weinberg disequilibrium: 122 markers
Remove non-SNP markers: 34 indels
Result: 2,304,069 SNPs (96.8%), 44 samples (91.7%)

Stage 4
Reference data: 1000 Genomes (2,436 individuals, 2,166,414 SNPs) and HGDP-CEPH (938 individuals, 642,861 SNPs)
Merge of the Cape Verde data with 1000 Genomes:
Remove SNPs not in the other data set: 477,013 SNPs (Cape Verde), 339,358 SNPs (1000 Genomes)
Align strandness of alleles at SNPs shared by both data sets: 1,715,548 SNPs on the same strand, 111,508 SNPs on the opposite strand
Result: 1,827,056 SNPs, 2,480 samples
Merge with HGDP-CEPH:
Remove SNPs not in the other data set: 1,406,304 SNPs (merged data), 222,109 SNPs (HGDP-CEPH)
Align strandness of alleles at SNPs shared by both data sets: 408,409 SNPs on the same strand, 12,343 SNPs on the opposite strand
Result (WORLDunrelat dataset): 420,752 SNPs, 3,418 samples

Figure S2. Pipeline for preparation of the genotype dataset used in the study. The data resulting from this pipeline were used in the genetic data analysis in Figure 1.
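The marker filters named in Figure S2 can be illustrated with a minimal Python sketch. This is not the software used in the study: the genotype representation (a pair of alleles per sample, or `None` for a missing call), the function names, and the toy markers `m1` to `m3` are all assumptions made for illustration. The final lines check that the marker counts reported in the figure are mutually consistent; the assignment of each count to a stage is inferred from that arithmetic.

```python
def marker_missing_rate(genotypes):
    """Fraction of samples with a missing call (None) at one marker."""
    return sum(g is None for g in genotypes) / len(genotypes)


def is_monomorphic(genotypes):
    """True if all non-missing calls at a marker carry a single allele."""
    alleles = {allele for g in genotypes if g is not None for allele in g}
    return len(alleles) <= 1


def filter_markers(markers, max_missing=0.10, drop_monomorphic=True):
    """Drop markers with >= max_missing missing data, then monomorphic ones.

    `markers` maps a marker name to a list of genotype calls, one per sample.
    Returns the kept markers and the names removed by each filter.
    """
    kept = {}
    removed = {"missing": [], "monomorphic": []}
    for name, genotypes in markers.items():
        if marker_missing_rate(genotypes) >= max_missing:
            removed["missing"].append(name)
        elif drop_monomorphic and is_monomorphic(genotypes):
            removed["monomorphic"].append(name)
        else:
            kept[name] = genotypes
    return kept, removed


# Toy data (hypothetical): three markers typed in ten samples.
toy_markers = {
    "m1": [("A", "G")] * 5 + [("A", "A")] * 5,  # polymorphic, no missing calls
    "m2": [("C", "C")] * 10,                    # monomorphic
    "m3": [("A", "G")] * 9 + [None],            # 10% missing data
}
kept, removed = filter_markers(toy_markers)

# Consistency check of the marker counts reported in Figure S2.
stage1_mid = 2_379_855 - 7_241 - 58_443 - 6_728 - 7 - 2
stage1 = stage1_mid - 3_206 - 0
stage2 = stage1 - 4 - 385_397 - 115 - 14
stage3 = stage1 - 3 - 122 - 34
merged_1000g = stage3 - 477_013
world = merged_1000g - 1_406_304
```

Setting `drop_monomorphic=False` corresponds to the branch of the figure in which no monomorphic-marker removal is listed.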

[Figure S3 panels: Transcript, Counts, Count Matrix]

Transcript:
N ta lenbra sin. Na kel filmi, kel rapas staba xintadu si.. kel rapas panha kel masan...kel masan e ruma si... e ruma-l asi... e dura ku el ta ruma asi... dipos di kel, e staba xintadu pa si... e tene si kel otu kolega ki staba di la... aian, N odja si...

Counts (word: number of occurrences in the transcript):
n: 2, ta: 2, lenbra: 1, sin: 1, na: 1, kel: 7, filmi: 1, rapas: 2, staba: 3, xintadu: 2, si: 5, panha: 1, masan: 2, e: 5, ruma: 2, ruma-l: 1, asi: 2, dura: 1, ku: 1, el: 1, dipos: 1, pa: 1, tene: 1, otu: 1, kolega: 1, ki: 1, di: 2, ...

Count Matrix:
           abo  bu  ago  agora  agu  agua  ami  antigu  arvi  arvori  arvuri  dexa  dipos  es  faze  fika  furta  gente  gosta  ...
Subject A    2   0    0      5    1     0    1       0     0       0       3     0      0   0     2     0      0      5      0  ...
Subject B    0   3    1      0    0     3    2       1     1       0       0     8      0   0     2     4      0      0      2  ...
Subject C    0   2    2      0    0     1    0       0     2       0       5     1      2   0     0     1      2      0      6  ...
...

Figure S3. Word frequency profiles. The number of times each unique word appeared in a given transcript was recorded. These counts were then compiled into a matrix with rows corresponding to individual subjects and columns to words. This matrix was used in the linguistic data analyses in Figure 3.
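The construction described in the Figure S3 caption can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the tokenization rule (lowercasing, splitting on punctuation, keeping hyphenated forms such as ruma-l as one token), the alphabetical column order, the function names, and the second "toy" transcript are assumptions. The first transcript is the excerpt shown in the figure.

```python
import re
from collections import Counter


def tokenize(transcript):
    """Lowercase a transcript and split it into word tokens.

    Punctuation such as '...' separates tokens; hyphenated forms
    such as 'ruma-l' are kept as single tokens.
    """
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", transcript.lower())


def build_count_matrix(transcripts):
    """Compile per-subject word counts into a subjects-by-words matrix.

    `transcripts` maps a subject label to that subject's transcript text.
    Returns (subjects, vocabulary, matrix), where matrix[i][j] is the number
    of times vocabulary[j] occurs in the transcript of subjects[i].
    """
    counts = {s: Counter(tokenize(text)) for s, text in transcripts.items()}
    vocabulary = sorted(set().union(*counts.values()))  # assumed column order
    subjects = list(transcripts)
    matrix = [[counts[s][word] for word in vocabulary] for s in subjects]
    return subjects, vocabulary, matrix


transcripts = {
    # Excerpt shown in Figure S3.
    "excerpt": (
        "N ta lenbra sin. Na kel filmi, kel rapas staba xintadu si.. "
        "kel rapas panha kel masan...kel masan e ruma si... e ruma-l asi... "
        "e dura ku el ta ruma asi... dipos di kel, e staba xintadu pa si... "
        "e tene si kel otu kolega ki staba di la... aian, N odja si..."
    ),
    # Invented second transcript, only to give the matrix a second row.
    "toy": "kel rapas ta ruma",
}

subjects, vocabulary, matrix = build_count_matrix(transcripts)
excerpt_row = dict(zip(vocabulary, matrix[0]))
print(excerpt_row["kel"], excerpt_row["si"], excerpt_row["ruma-l"])  # 7 5 1
```

Under these tokenization assumptions, the counts for the excerpt agree with those shown in the Counts panel of the figure (for example, kel: 7 and si: 5).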

[Figure S4 panels: ADMIXTURE bar plots for K = 2 (50 runs), K = 3 (41 runs), and K = 4 (27 runs). Population labels: Mandenka, Gambian (GWD), Cape Verdean, Iberian (IBS), French, British (GBR).]

Figure S4. Downsampled ADMIXTURE analysis. A sample of 226 individuals is considered. The analysis and figure design follow Figure 1C.

Table S1. Identification of words of African origin. The table shows the list of words considered in the linguistic analysis in Figure 3.

Word | Category | Description
al | ** | Marker signaling desire and probability. From Portuguese há de ‘must do something’ and Wolof yálla, expression invoking God ([S1]: 98; [S6]: 134).
banbudu | * | Adj. “carried on one’s back”. From banbu ‘to carry a child on one’s back’ in Manding, Manjaku, and Mankañ ([S9]: 78).
djobe/i | * | ‘To look’. From Portuguese olhar and Mandinga juubee ([S4]: 150; [S6]: 218; [S9]).
es | ** | Third-person plural pronoun. From Wolof old form lees < li + ees ([S6]: 140). Wolof mëneesu ko ‘one can’t do it.’ Brito, writing in 1888, gives the pronoun as ēss, with a long vowel that is reminiscent of the Wolof form ([S3]: 343, 359). Lang proposes a double etymology from Portuguese eles and Wolof ees ([S6]: 141).
fepu | * | Quantifier ‘all/entirely’. From Wolof –épp ‘all/entirely’, Mandinka few ‘completely’ ([S2]: 122).
ka | ** | Negative marker. From Portuguese nunca and different negative markers from Manjaku, in addition to other African languages in the region ([S1]: 95; [S6]: 129).
kabesa | *** | Portuguese noun ‘head’. Used as a reflexive in the same way that Wolof bopp ‘head’ is used as a marker of reflexivization ([S6]: 184). [In our corpus, roba-l riba kabesa].
korpu | *** | Portuguese noun used as a reflexive. From the Mankañ expression with the same meaning, u-leef ‘the body’ ([S11]: 80).
kotxi | * | Verb ‘to crush (corn or rice)’. Comes from kócci in Mandinka ([S2]: 125; [S8]: 76).
ku | *** | Derived from the Portuguese preposition com ‘with’ but used as a conjunction of coordination meaning ‘and/with’, coordinating two nouns in much the same way that Wolof ak does ([S6]: 198).
kunpanheru | *** | Portuguese noun meaning ‘companion’ but marking reciprocity in the same way that Bambara does ([S6]: 184).
la | * | Deictic marker signaling distance. From Wolof classifier –l- combined with deictic morpheme –a ([S6]: 147).
li | * | Deictic marker signaling proximity. From Wolof classifier –l- combined with deictic morpheme –i ([S6]: 147).
ma | ** | Complementizer. From the older form kuma < Old Portuguese coma and Mandinka verb kuma ‘to talk’ and Mandinka morpheme kó, which combines the functions of a verb ‘to say’ and a subordinate conjunction ([S6]: 129, 159).
moku | ** | ‘Drunk’. From Portuguese mouco ‘drunk’ and Wolof mokk ‘to be pounded’ ([S4]: 453). The word has the same meaning in the Cape Verdean, Guinea Bissau, and Casamance Creoles ([S6]: 218; [S9]).
n | * | First-person singular pronoun. From Balanta and Mandinka ([S6]: 129).
nbonji | ***† | Type of beans.
nkontra | ***† | To meet.
nsoda | ***† | The meaning is unclear. Nsoda is not listed in the dictionaries that were consulted.
ntende | ***† | To understand.
ntende | ***† | To delegate.
ntrega | ***† | To fill up.
nu | * | First-person plural pronoun. From identical Wolof pronoun nu with the same person and number ([S6]: 139). Also influenced by Portuguese nos, pronounced nus at the time of creolization ([S6]: 140).
ta | *** | Aspectual marker. Functionally (semantically) equivalent to Wolof imperfective -y. Ta is viewed as an imperfective marker ([S6]: 161) that can be morphologically linked to Portuguese estar ([S6]: 162) but functionally behaves like the Wolof imperfective marker.
ten | *** | Verb ‘to have’. Semantically and morphologically associated with Wolof am/ame to express permanent versus temporary possession ([S6]: 132; [S7]: 55).
tene | *** | Verb ‘to have’. See ‘ten’.
-ba | ** | Anteriority marker. From Portuguese –va and Manjaku verb ba ‘to finish, to complete’ ([S1]: 94; [S6]: 129).
-du | ** | Adjectival or past participle morpheme. Derived from Portuguese suffix –do and from the Wolof derivative suffix –u ([S6]: 183).
-m | * | First-person object pronoun. From Wolof or Bambara ([S6]: 140).

*Words and morphemes that trace directly back to an African language. The 9 items marked with a single asterisk are the items included in an African-origin set of particularly high confidence.
**Words that have a dual etymology, corresponding to cases of conflation between the European and the African languages and attesting to the intense contact between the two sets of languages.
***Words or expressions that are clearly Portuguese, but whose semantics or phonology reveal substratal influences.
†Words characterized by prenasal consonants that have been argued to originate from African substratal phonology ([S8]: 81). Though the words themselves are Portuguese, their phonological realization reveals traces of African substrates.

SUPPLEMENTAL REFERENCES

[S1] Baptista M (2006). When substrates meet superstrate: the case of Cape Verdean Creole. In Cabo Verde: Origens da sua Sociedade e do seu Crioulo, eds Lang J, Holm J, Rougé J-L, Soares MJ (Gunter Narr, Tübingen), pp. 91-116.
[S2] Bartens A (2006). A contribuição do substrato africano para a génese dos crioulos caboverdianos: o caso dos ideofones. In Cabo Verde: Origens da sua Sociedade e do seu Crioulo, eds Lang J, Holm J, Rougé J-L, Soares MJ (Gunter Narr, Tübingen), pp. 117-131.
[S3] Brito A (1967). Apontamentos para a gramática do criolo que se fala na ilha de Santiago de Cabo Verde. In Estudos linguísticos crioulos: reedição de artigos publicados no Boletim da Sociedade de Geografia de Lisboa, ed Morais-Barbosa J (Academia Internacional da Cultura Portuguesa, Lisbon), pp. 329-404.
[S4] Brüser M, dos Reis Santos A, Dengler E, Blum A, Lang J (2002). Dicionário do Crioulo da Ilha de Santiago (Cabo Verde). Tübingen: Gunter Narr.
[S5] Lang J (2006). L’influence des Wolof et du wolof sur la formation du créole santiagais. In Cabo Verde: Origens da sua Sociedade e do seu Crioulo, eds Lang J, Holm J, Rougé J-L, Soares MJ (Gunter Narr, Tübingen), pp. 53-62.
[S6] Lang J (2009). Les Langues des Autres dans la Créolisation. Tübingen: Gunter Narr.
[S7] Quint N (2000). Le Cap Verdien: Origines et Devenir d’une Langue Métisse. Paris: L’Harmattan.
[S8] Quint N (2006). Un bref aperçu des racines Africaines de la langue cap-verdienne. In Cabo Verde: Origens da sua Sociedade e do seu Crioulo, eds Lang J, Holm J, Rougé J-L, Soares MJ (Gunter Narr, Tübingen), pp. 75-90.
[S9] Rougé J-L (1988). Petit Dictionnaire Étymologique de Kriol de Guiné Bissau et de Casamance. Bissau: INEP.
[S10] Rougé J-L (1994). A propos de la formation des créoles du Cap Vert et de Guinée. Papia 3:137-149.
[S11] Santos R (1979). Le Créole des Iles du Cap-Vert: Comparaison avec les Langues Africaines. Thesis, Université de Dakar, Sénégal.