Quick viewing(Text Mode)

Berbers and Arabs: Tracing the Genetic Diversity and History of Southern Tunisia Through Genome Wide Analysis

Berbers and Arabs: Tracing the Genetic Diversity and History of Southern Tunisia Through Genome Wide Analysis

Berbers and : tracing the genetic diversity and history of Southern through genome wide analysis

Paolo Anagnostou1,2*, Valentina Dominici1, Cinzia Battaggia1, Nouri Boukhchim3,4, Jaâfar Ben Nasr5, Ridha Boussoffara6, Emanuele Cancellieri7, Marwa Marnaoui7, Meriem Marzouki8, Hedi Bel Haj Brahim9, Mongi Bou Rass10, Savino Di Lernia7,11†and Giovanni Destro Bisol1,2 †

1 Dipartimento di Biologia Ambientale, Università di Roma “La Sapienza”, Roma, Italia 2 Istituto Italiano di Antropologia, Roma, Italia 3 Faculté des Lettres et Sciences Humaines, Université de , Kairouan, Tunisia 4 Laboratoire LMAIM, LR99ES01, Université de 5 Département d’Archéologie, UR 16 ES 01, Université de Kairouan, Tunisia 6 Institut National du Patrimoine, Tunis, Tunisia 7 Dipartimento di Scienze dell'Antichità, Università di Roma “La Sapienza”, Roma, Italia 8 Université de Tunis, Tunis, Tunisia 9 Artisanat du , , Tunisia 10 Musée Berbère de Tamezret, Tamezret, Tunisia 11 GAES, University of Witwatersrand, Johannesburg, South

* Corresponding author email: [email protected] †These authors contributed equally

Running title: Genome-wide diversity in Tunisia populations

ABSTRACT

Objectives Tunisia has been a crossroads for people from Africa, and the since prehistoric times. At present, it is inhabited by two main ethnic groups, Arabs and , and several minorities. This study aims to advance knowledge regarding their genetic structure using new population samplings and a genome-wide approach.

Materials and Methods We investigated genomic variation, estimated ancestry components and dated admixture events in three Berber and two Arab populations from Southern Tunisia, mining a dataset including Middle Eastern, sub-Saharan and European populations.

Results Differences in the proportion of North African, Arabian and European ancestries and the varying impact of admixture and isolation determined significant heterogeneity in the genetic structure of Southern Tunisia populations. Admixture time estimates show a multilayer pattern of admixture events, involving all three ethno-linguistic groups, which started around the mid XI century and lasted for nearly five centuries.

Discussion Our study provides evidence that the relationships between genetic and cultural diversity of old and new inhabitants of follow different patterns. Unlike what has been hypothesized for and , the Berbers of southern Tunisia seem to have preserved a significant part of their original genomic heritage despite and Arab cultural influence. Whereas Berber groups still share a common genetic background regardless of linguistic diversity and differential admixture, Arabs show clearly different genetic structures: the population from Douz bears multiple signatures of stratification and admixture, while the semi- R’Baya, who have maintained their original lifestyle, retain most of the genomic background of Arabian populations.

Keywords: North Africa, Berbers, population genomics, haplotype, admixture INTRODUCTION

Being placed at the centre of North Africa between two permeable barriers, the Sahara to the south and the Mediterranean Sea to the north, the area corresponding to present-day Tunisia is a natural crossroads between Africa, Europe and the Middle East. Due to this strategic location, Tunisia has been witness to a long parade of invaders, settlers and colonizers who have generated a complex and multilayer population history. The most ancient evidence of presence in this area dates back at least to the late Middle Pleistocene. Over the centuries, Tunisia has been peopled by human groups bearing different cultures, from the Acheulean (Belhouchet 2003; Sahnouni et al., 2013), the Middle Stone Age, dating back to at least 90 Kya (Aouadi-Abdeljaouad et Belhouchet, 2012) up to the Capsian culture (10-4.7 Kya)(Rahmani, 2004; Mulazzani, 2013; Di Lernia et al., 2017). The latter persisted even after the introduction of agricultural practices, which took place in North Africa as early as around 7400 kya, likely through contacts and cultural exchanges with different regions and people (Linstädter et al., 2012; Mulazzani et al., 2016). Starting from the second millennium BCE, Tunisia was occupied by the Phoenicians who were succeeded by Romans and, from the 5th century CE by other domains including , Byzantines, Arabs, Ottomans and finally French (Modéran, 2003; Lassere, 2015). All of these dominations left recognizable traits on the cultural heritage and, possibly, on the genetic makeup of Berbers (Brett et Fentress, 1996), the “autochthonous” inhabitants of North Africa (Collignon, 1886). The most significant and impacting population movement was that involved the peoples, who converted local populations to their Islamic religion and forced most Berber communities to adopt their Arabic culture and language. Starting in the mid- CE, the Arab invasions initially involved only a few thousand migrants (McEvedy and Jones, 1980). In the , a considerably larger Arab wave led by Bedouin tribes ( and Banu Sulaym) reached the area of the . This event is considered to be the most important of the entire medieval period in the Maghreb (Hess et Julien, 1971), counting from 150,000 to 300,000 (El Hareir et Mbaye, 2011). The Arab and Berber dynasties persisted until the 16th century, when the Ottoman took control of the country (Naylor, 2010; Guellouz et al 2010). This complex migratory and cultural history shaped the current ethno-linguistic diversity of today’s Tunisian populations (Leveau, 2018). The two major extant linguistic groups, Arabic and Berbers, are both branches of the Afro-asiatic family (Ruhlen, 1991). A variety of the former, known as Tunsi, is spoken by nearly 99% of the population (Sayahi, 2014; Battenburg, 1999). The latter, or better a series of dialects comprised into the Zenati language branch (Kossmann, 2013), are still spoken in small enclaves concentrated in the southern and southeastern areas of the country which count from 1% to 0.5% of the overall Tunisian population depending on different estimates (Garmadi, 1968, Penchoen 1968, Battenburg, 1999). The cultural diversity is greatest in the northern areas, where the Turco-Tunisians are settled in the coastal cities and on the island of Jerba, which is also home, along with the capital Tunis, of the Jewish community. The southern desertic areas of Tunisia are inhabited by nomadic and semi-nomadic populations of Berber and Bedouin origin (Dupire, 1963). To complete the picture, descendants of the , who were expelled from after the “” in 1609, occupy and quarters in the capital (Saadaoui, 1996, 2001). While Tunisian populations are known to be characterized by the remarkable cultural diversity outlined above, the structure of their genetic variation studies has yet to be exhaustively investigated. Most of the studies conducted so far have used unilinear polymorphisms, which may help trace specific demographic events but cannot investigate the genetic structure in all its aspects (Fadhlaoui-Zid et al., 2004; Loueslati et al., 2006; Cherni et al., 2009; Fadhlaoui-Zid et Benammar- Elgaaied, 2010; Ennafaa et al., 2011; Kefi et al., 2015; Frigi et al., 2017; Elkamel et al. 2017, Cherni, 2018). Furthermore, genomic approaches, which can define the relationships among individuals and groups more comprehensively, have been limited to Berbers (Henn et al., 2012; Arauna et al., 2017). Finally, some areas, more in particular the southern part of the country, have yet to be adequately investigated. Here we present the results of a new study of genome-wide variation carried out among Arabs, Berbers and R’Baya of Southern Tunisia. Applying an allele and haplotype-based approach, we assess their genomic relationships, reconstruct their admixture history and make inferences on the relationships between genetic and cultural diversity.

MATERIALS AND METHODS

Populations under study Our dataset includes new data relative to three ethno-linguistic groups of southern Tunisia and literature results pertaining to Berbers and populations from North and sub-Saharan Africa, the Arabian Peninsula, Middle East and Europe (see Supplementary Table S1 and Figure 1). We obtained saliva samples from a total of 80 individuals enrolled following the “grandparents” rule after obtaining informed consent and ethnological, linguistic and familiar information. Thirty- two samples of self-reported Tunisian Arabs were collected in the wider region of the Nefzaoua, mostly in the city of Douz (33° 27′ 0″ N, 9° 1′ 0″ E). Historical sources suggest that this area represents a possible settlement point for the Arab newcomers in the middle ages and also home for Arabized Berber communities (Ellefi, 2016). Fifteen samples of Arab-speaking self-reported Berbers in Matmata (33° 32′ 33.5″ N, 9° 58′ 0.5″ E) and 26 samples of Berber-speaking self-reported Berbers in Nouvelle (N=16; 33° 39' 40.1'' N, 9° 47' 34.7'' E ) and Tamezret (N=10; 33° 32' 00" N, 9° 52' 00" E) were sampled. We also gathered saliva samples from seven R’Baya individuals, a small group (nearly 14,000 members) of semi-nomadic herders whose oral chronicles set their origin in the Arabian Peninsula, more precisely in Yemen and their arrival in North Africa between the 11th and the 18th century AC (Dupire, 1963). All experimental protocols were approved by the Bioethic Committee of the University “La Sapienza” of Rome (Rome, Italy. Prot. N.259/19)

Genomic analyses Genomic DNA was extracted using the prepIT L2P kit (DNA Genotek) following the manufacturer's protocol, all samples were genotyped with the Illumina Human Omniexpress Exome v 8.1 array. Markers and individuals with less than a 98% call rate were filtered leaving a total of 937,563 SNPs and 79 individuals for intersection with additional data. These novel data were merged with literature results from 33 populations of six predefined geographical regions and then filtered using PLINK (Purcell et al 2007). To balance for sample sizes, we subsampled the Tuscan (ITA_C) and Palestinian (PAL) populations randomly selecting 30 individuals. Only the SNPs and individuals with a genotyping success rate > 90% were included. Relatedness to the 3rd generation (Identity by Descent, IBD > 0.125) was tested with the PLINK v1.9 software and only one sample from any detected relative pair was randomly chosen for the subsequent analysis. After the quality control procedure, our final dataset includes a total of 258,609 autosomal SNPs and 721 individuals. The data that support the findings of this study are openly available in Zenodo at http://doi.org/10.5281/zenodo.3546051.

Runs of Homozygosity, population structure and local ancestry assignment Runs of homozygosity (RoH) analysis was performed in order to investigate intra-population diversity. RoH were estimated using PLINK v1.9 (Purcell et al., 2007) under default settings (sliding window of 5 Mb, minimum of 50 SNPs, one heterozygous genotype and five missing calls allowed). We defined RoH as stretches of at least 1MB including a minimum number of SNPs set to 80. To explore patterns of population structure, we first performed a Principal Components analysis (PCA) using PLINK. Our five Tunisian populations were projected onto the PCs space obtained using the reference panel. Local ancestry was estimated using RFMix (Maples, Gravel, Kenny et Bustamante, 2013). This algorithm assigns probabilities that a certain genomic chunk derives from an ancestral population based on a random forest classifier for each chunk followed by smoothing by a conditional random field. We first jointly phased the 258,609 by means of the ShapeIt software (Delaneau, Howie, Cox, Zagury et Marchini, 2013) using 1000 Genomes data as a reference panel and then ran RFMix with the PopPhased option and default parameters. We did not make use of the expectation maximization (EM) option since it is highly time-consuming and does not increase inference accuracy (Xue, Lencz, Darvasi, Pe’er et Carmi, 2017). We tested the five Tunisian populations plus the Sened Berbers for ancestries from North (western Maghreb) and Sub-Saharan Africa (both eastern and western), the Middle East, Arabian Peninsula and Southern Europe using two populations from each geographic region. We also downsampled the above-mentioned reference populations to balance the size of Kenya (N=9). The SNPs with a lower assignment quality than 0.9 were filtered out. We also used ADMIXTURE v1.23 (Alexander, Novembre et Lange, 2009) to infer maximum likelihood estimation of individual ancestries using default values and testing from k=2 to 10 ancestral clusters using 5 different random seeds. The different outputs were than combined with CLUMPAK (Kopelman, Mayzel, Jakobsson, Rosenberg et Mayrose, 2015) using the LargeKGreedy algorithm. We assessed cross-validation errors for each value of K using the ADMIXTURE’s Cross Validation procedure.

Chromopainter, FineStructure and Globetrotter analysis To explore population structure at a fine-grain level and date admixture events, we made use of a set of LD-based methods, namely ChromoPainter (Lawson et al., 2012), FineStructure (Lawson et al., 2012) and Globetrotter (Hellenthal et al., 2014). Briefly, Chromopainter “paints” the chromosomes of recipient individuals as a mosaic of segments deriving from donor individuals, using the information on the allelic state of recipient and donors at each available position along the chromosome. We painted each individual using all of the others in the dataset as a donor. Following Busby et al (2015), we first inferred the global mutation probability and the switch rate for chromosomes 1, 5, 8, 12, 17 and 22 in 10 iterations of the EM (expectation maximization) algorithm. Due to the exhaustive nature of this estimation, we averaged these values across a subset of populations (Bedouins, Douz, Germans, , Palestinians, Saudi). We then used these parameters in the final Chromopainter run to infer the ChromoPainter coancestry matrix for each chromosome. Using ChromoCombine, we then combined the data output from ChromoPainter into a single final coancestry matrix. The coancestry matrix was used to hierarchically cluster individuals into homogeneous groups by using an MCMC procedure implemented in FineStructure. We ran FineStructure for 2 million MCMC iterations, discarding the first 1 million as “burn-in” and sampling from the posterior distribution every 10,000 iterations (Hellenthal et al., 2014; Leslie et al., 2015). We then built a tree relating these final clusters by pairwise merging similar clusters in a greedy fashion. Finally, we used Globetrotter to infer and date admixture events. This method uses the ChromoPainter output to infer admixture proportions through a multiple linear regression on the average proportion of DNA that each recipient copies from each donor group. Then, it estimates time of admixture by measuring the decay of association vs genetic distance between the chromosome segments copied from a given pair of donor groups (Busby et al., 2015; Hellenthal et al., 2014). We ran Globetrotter separately on eachFineStructure cluster composed only of individuals belonging to our three Tunisia ethnic groups plus the Sened Berbers while all the other clusters with at least two individuals were used as surrogate groups. For this analysis, we ran again ChromoPainter v2 in a separate analysis using the same switch rate and global mutation parameters cited above, but individuals were a priori classified into groups based on FineStructure clustering results. We ran 100 bootstrap iterations in order to estimate admixture dates and set the generation time to 28 years (Hellenthal et al., 2014).

RESULTS

As a first assessment of genetic diversity, we carried out a PC analysis using individual data both from the populations under study and literature results (Figure 2). The first component (3.6% of the total variance) clearly separates sub-Saharans, placed at the negative edge, from the Europeans and Middle Easterners at the other extreme, with the North African groups falling in between but closer to the latter. A subset of individuals fall in an intermediate position between the North and sub- Saharan African samples, a likely signature of a relatively recent admixture. The second component (0.6% of the total variance) distinguishes the Europeans from the Middle Eastern and North African populations, with individuals from the Arabian Peninsula found in the upper area of the plot. North Africans fall close to each other and near the Middle east/Arabian Peninsula groups. When the sub- Saharan African samples are removed from the plot, a substructure within North Africa becomes evident (see supplementary Figure S1); the second PC slightly differentiates Berbers (with the Algeria Mozabites more scattered) from the Arabs (Douz, and R’Baya) who are closer to Middle eastern/Arabian populations. In order to obtain information more directly concerned with the population genetic history, we estimated local ancestry. We found three components that were able to discriminate Berber from Arab populations (Figure 3). Due to their small sample size, the data obtained for R’Baya should be treated with caution. Firstly, the highest proportions of North African ancestry (in green) were observed among Berbers, especially the three southernmost populations (median values around 40% vs 31% for Sened), whereas the two Arab groups have considerably lower values (22.1% and 19.5% for Douz and R’Baya, respectively). Secondly, the Arabian ancestry (in pink) of Douz and R’Baya individuals (31.3% and 41.6%, respectively) was found to be higher than Berbers (ranging from 10.6% to 13.3% for Tamezret and Zraoua, respectively). Thirdly, Berbers also show a higher proportion of European ancestry (average 24.1%, in blue), almost doubling the value observed in both Arab populations. The other two ancestries tested did not evidence a clearcut diversity between the two groups. The highest proportion of Sub-Saharan African ancestry was observed in Douz (13.5%, in orange), with the western ancestry prevailing over the eastern one (10.9% vs 7.2%, see supplementary Figure S2). Interestingly, the Middle Eastern ancestry (in red) was relatively low and similar across all populations (around 15%). The general picture of local ancestry estimates produced by RFmix was similar to that obtained using the unsupervised ADMIXTURE algorithm (K=4; see Supplementary Figure S3), which reassured us regarding the robustness of the pattern observed. The differentiation between Tunisian Berbers and Arabs was found to be remarkable also at individual level. Removing outliers (1 Matmata and 2 Douz individuals with an outlying proportion of sub-Saharan African ancestry), the Arabian component was found to be higher than Berbers (47 individuals) in 30 out of 31 Arabs. For the North African component, the overlap involves just four Berbers with a lower proportion of North African ancestry than Arabs, while no overlap was observed for the European one. It should be noted that the overlap found between Berbers and Arabs regarding the North African and Arab components is limited to the northernmost Sened. In order to gain further insights into the populations’ genetic structure, we analyzed haplotype sharing and defined the hierarchical relationships between the clusters of genetically similar individuals. The extent of within-population haplotype sharing was found to be highest for the R’Baya followed by Zraoua and Tamezret (supplementary Figure S4) suggesting an increased inbreeding rate compared to the other populations of the same area, a possible signature of genetic isolation. Further support for this result comes from the analysis of the number and length of Runs of Homozygosity (supplementary Figure S5) which were found to be higher in Zraoua and R’Baya. A total of 83 clusters were found, which were assigned to three monophyletic and geographically well-defined clades (Figure 4A). The coancestry matrix and the detailed composition of each cluster are reported in Supplementary Figure S6 and Supplementary Table S2. Of the three main clades, the uppermost mostly comprises clusters of European individuals, whereas the lowermost includes all Sub-Saharan Africans and several small-sized clusters (1 or 2 individuals) of North African, both Berber and Arab, or Arabian subjects. The intermediate clade can be further divided into two sub- clades, one exclusively composed of western Maghreb Berber individuals and the other with a prevalence of Middle Eastern and Arabian clusters. Within the former sub-clade, individuals belonging to each population are mostly dispersed across multiple clusters, once more pointing to a high genetic diversity both between and within groups. Focusing on the three Tunisian ethnolinguistic groups, we found a noticeable homogeneity among Tunisian Berbers, with most individuals from Tamezret and Matmata, and all those from Zraoua falling inside -specific and closely related clusters. Differently, 31 out of 33 Arabs from Douz were distributed across 7 (closely related) clusters, which form a paraphyletic branch of the Berber one. The remaining two fell within the sub-Saharan African branch. Interestingly, all members of the DOUZ5 cluster (3 individuals) belong to the same family, while six out of eight (75%) of the DOUZ1 cluster come from a single clan, which suggests a certain degree of endogamy (see Supplementary Table S3). Finally, the R’Baya individuals constitute a single cluster that fall well within the middle eastern/Arabian sub-clade and which is distant from the North African one. Finally, we identified and dated signatures of admixture using pairwise coancestry curves. We were able to detect six one-date and five one-date-multiway admixture events. Unfortunately, no reliable results (p>0.05) were obtained for the Sened Berbers (cluster SEN). Overall, the admixture patterns were found to be highly complex, with multiple sources admixing at the same time (one- date multiway events; see the first event in Figure 4C) and/or source populations being already admixed, especially considering the more recent events (e.g. see the three last events in Figure 4C). The dates of these admixture events cover a time span of approximately five centuries (Figure 4B) and suggest three different migratory waves. The first wave may be related to two apparently simultaneous (multi-way) admixture events involving the Matmata cluster, dating at 1011 CE (978- 1044 CE). As shown in previous studies (Busby et al., 2015), if in a multi-way admixture, the two events share the same or a closely related source population, the pattern can be interpreted as a three-way admixture. Following this, present-day people from Matmata could be in continuity with Berber groups similar to contemporary populations from Morocco and Northern Tunisia, with some degree of introgression of genomic tracts resembling that of current Saudi and western sub-Saharan Africans (Figure 4C). The second wave might have occurred around the middle thirteenth century and include two events. The first, found for the DOUZ3 cluster (1247 CE [1240-1308 CE]), involves again Saudi and Yoruba as major and minor sources, respectively, whereas the second, dating at 1267 CE (1235-1301 CE), was found for one of the two Zraoua clusters, which suggests a three-way admixture with sources consistent with those inferred for Matmata but showing a lower extent of Arab introgression. The third wave begins around the mid-Fourteenth century. The most ancient event could represent the lower boundary of the arrival of the R’Baya (1354 CE [1340-1394 CE]), a group with an ancestry resembling current Saudi and some genetic contribution from eastern sub- Saharan Africans, followed by Tamezret (1387 CE [1360-14 43 CE]) and DOUZ 2 (1391 CE [1368-1410 CE]) with Berber/western sub-Saharan Africa and Saudi/eastern sub-Saharan Africa as major and minor parental sources. The remaining four events were found to be temporarily very close to each other and more complex than the previous ones, with source populations resulting highly admixed. They highlight the introgression of Berber genomes into the Douz gene pool, probably due to the assimilation of Arabized Berbers (clusters 1,4,5,6; see supplementary Table S4 for details on the admixture dates and source contributions).

DISCUSSION

Genetic data obtained so far have failed to show any clear distinction between Berber and Arab ethno-linguistic communities (K. Fadhlaoui-Zid et al., 2004; Karima Fadhlaoui-Zid et al., 2011; S. Frigi et al., 2017; Kefi et al., 2015). This finding would suggest a homogeneous genetic background and/or an extensive admixture between the two groups. However, the geographical distance and long separation between the ancestors of the two groups does not support the former hypothesis, while the persistence of linguistic and/or cultural differences between present-day Arabs and Berbers seems not to be in favor of the latter. Furthermore, a scenario of religious and cultural assimilation conducted by a small elite is at odds with the settlement in Tunisia of large groups belonging to the Bedouine Banu Hilal and Banu Sulaym tribes, as reported by the Arab historiographer and historian . Given these premises, we thought it would be useful to re-examine the issue of homogeneity/diversity between the two ethno-linguistic groups in relation to their demographic and admixture histories, studying a new sampling of ethnically-defined populations from southern Tunisia through the use of more powerful genomic tools than previously adopted. Our study shows that the Tunisian Berbers keep robust signatures of their original genetic legacy despite the religious assimilation imposed by Arabs and their cultural influences. Using data from 258,609 SNPs, we detected noticeable differences in the frequency of two ancestry components: one which traces back to the “autochthonous” north African genetic background, and the other close to that of present-day populations of the Arabian Peninsula. The former was found to be prevalent among Berber individuals, while the latter characterizes Arabs from the Nefzaoua region and semi-nomadic R’Baya. Although a clear-cut differentiation can also be appreciated for the European component, more frequent among Berbers than Arabs, this was found at a lower frequency than the other two mentioned previously. Sub-Saharan and Middle Eastern ancestries, which are known to have been brought to North Africa through a variety of ancient (pre- and neolithic) and recent (Arab conquest) migratory waves (Newman, 1995; Appiah et Gates, 2010; Henn et al., 2012; van de Loosdrecht et al., 2018; Fregel et al., 2018;Elkamel et al., 2018; D’Atanasio et al., 2018), were found not to vary substantially among the ethno-linguistic groups. We also provide evidence that Berber groups maintained a substantial genetic homogeneity, regardless of linguistic and historical differences. Tamezret and Matmata were found to be genomically the closest, which suggests that their linguistic diversity has not been an effective and/or prolonged enough barrier to obscure their common ancestry. Interestingly, studies carried out in other geographical contexts have shown that often linguistic diversity alone is not an effective factor of population differentiation at regional level (e.g. Anagnostou et al., 2017; Capocasa et al., 2013, 2014). Zraoua showed substantial genetic affinities with Matmata and Tamazret, despite its moderate signals of genetic isolation (for RoHs and intra-population haplotype sharing, see supplementary figures S4 and S7), a condition already observed for the Berbers (Arauna et al., 2017; Henn et al., 2012). Sened was found to be the most diverse, which may be due to its geographical distance from the other three Berber villages (average linear distance 122 Km) which is greater than that between the latter (≤ 21 Km) and the presence of a large seasonal salt lake (Chott el Jerid) along the path among villages that might have limited the contacts between the southern and Northern Berber communities. Furthermore, local ancestry estimates suggest that they admixed with Arabs to a larger extent than other Berbers. Admixture estimates suggest that the Arab and sub-Saharan Africa contribution to the genetic structure of Matmata occurred around the mid-eleventh century, which is close to the timing of the Hilalian invasion (1052 CE) (Laroui 1977, Abdel Waheb 2004); the Bedouin tribes were particularly aggressive over the conquered local Berber groups in order to significantly accelerate the process and thus linguistic assimilation (Abun-Nasr 1971). A quite different scenario may be put forward for Tamezret and Zraoua. Their admixture with foreign groups (sub-Saharans and Arabs, respectively) was less intense and took place from the mid-twelfth to the mid-fourteenth centuries. This period overlaps with the domination of the Berber Hafsids dynasty, who ruled a large area of North Africa from 1230 to 1574 CE, granting stability and prosperity thanks to the establishment and consolidation of routes with Europe, the Middle East and sub-Saharan Africa (Daghfous and Mahfoudh, 2013). Hafsids were also able to relax the relations between Arabs and Berbers, helping the latter to preserve their traditional language (Daghfous and Mahfoudh, 2013). Widening comparisons to Berbers from North Africa, our patterns are different from those unraveled in previous studies. A recent genomic investigation carried out among Arabs and Berbers from Algeria and Morocco detected no significant distinction between the two groups and a substantial differentiation between Berbers (Arauna et al 2017). However, the linguistic assimilation imposed on Berbers by the newcomers in this area makes it difficult to distinguish one ethno- linguistic group from another using spoken language as a proxy. Therefore, as the authors of the study admit, it cannot be ruled out that their results might have been confounded by difficulties in discriminating Arabized Berbers from Arabs (pp 325). Moving the focus on to the Arab populations, two clearly different patterns were observed. The Arabs from the region of Nefzaoua exhibit a rather complex genetic structure. Genomes were assigned to seven main clusters, whereas no more than two were found in other groups. The stratification of multiple migratory waves of Arab newcomers, bringing along their eastern sub- Saharan slaves or even enslaving western sub-Saharan Africans locally, might explain their heterogeneity. According to genomic data, this process began approximately in the mid-twelfth century and possibly led to the first settlement in the Douz oasis. It ended around the beginning of the sixteenth century when admixture with Berbers occurred and arabized Berbers were assimilated within the Douz population (ELLEFI M., 2016, Phd thesis). The R’Baya show the genomic features of a group who moved from the Arabian Peninsula and admixed to a limited extent with neighbouring groups after their settling in North Africa, which our data place after the mid-fourteenth century. Genomic evidence seems to match their cultural and social features, which indicate the maintenance of the traditional Bedouin habits, with the ensuing limits likely imposed by cultural isolation and the semi-nomadic lifestyle to contacts and admixture with new neighbors.

Concluding remarks We show that the interplay of migration, admixture and isolation produced a heterogeneous structure of genomic diversity among southern Tunisia ethno-linguistic groups. Similarly, the relations between cultural and genetic diversity do not seem to comply with a single model. Berbers still retain strong signatures of their original ancestry despite the Islamization and Arabization processes. In the population from Douz, the Arab culture coexists with considerable genetic diversity and evident signatures of admixture with Berbers or assimilation of Berbers. Differently, the other Arab population studied here, the R’Baya, retained most of their genomic background of Arabian groups keeping, at the same time, their original Bedouin lifestyle. Although other Tunisian populations still remain to be investigated, we believe that our study can advance our knowledge of the peopling of this area and hope it could provide a basis for further explorations of genomic diversity of present and past populations of North Africa.

ACKNOWLEDGEMENTS

This research is part of the activities of “The Archaeological Mission in the Sahara”, a joint project between INP-Institut National du Patrimoine (2014-present), Kairouan University (2014-2018) of Sapienza University of Rome (2014-present). We warmly thank the General Directors of INP and all the people that collaborated in the present study. We also would like to thank Ioannis Patramanis and Marco Morittu for helping with data analyses. The research is financially supported by Sapienza University of Rome (Grandi Scavi di Ateneo) and the Italian Ministry of Foreign Affairs (DGPS, Office VI), granted to SdL, Sapienza University of Rome (Progetti Ateneo 2017; prot. N° RP11715C7E0E853D) granted to PA. Partial support was granted by the Istituto Italiano di Antropologia.

AUTHOR’S CONTRIBUTIONS Study design: PA, GDB, SdL. Sample preparation and genetic analyses: CB. Data analyses: PA, VD. Sampling: PA, SdL, NB, JBN, MeM, MaM, RB, EC, HBHB, MBR. Wrote the manuscript: PA, GDB in collaboration with SdL. Critical revision of the article/discussion: PA, GDB, SdL, NB.

REFERENCES

Alexander, D. H., Novembre, J. et Lange, K. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19(9), 1655‑1664. doi:10.1101/gr.094052.109 Anagnostou, P., Dominici, V., Battaggia, C., Pagani, L., Vilar, M., Wells, R. S., … Tofanelli, S. (2017). Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset. Scientific Reports, 7, 41614. doi:10.1038/srep41614

Aouadi-Abdeljaouad, N. et Belhouchet, L. (2012). Middle Stone Age in Tunisia: Present Status of Knowledge and Recent Advances: A North African Perspective. Dans J.-J. Hublin et S. P. McPherron (dir.), Modern Origins (vol. 80, p. 143‑155). Dordrecht : Springer . doi:10.1007/978-94-007-2929-2_10

Appiah, A. et Gates, H. L. (2010). Encyclopedia of Africa. Oxford University Press.

Arauna, L. R., Mendoza-Revilla, J., Mas-Sandoval, A., Izaabel, H., Bekada, A., Benhamamouch, S., … Comas, D. (2017). Recent Historical Migrations Have Shaped the Gene Pool of Arabs and Berbers in North Africa. Molecular Biology and Evolution, 34(2), 318‑329. doi:10.1093/molbev/msw218

Battenburg, J. (1999). The Gradual Death of the Berber Language in Tunisia. International journal of the sociology of language, 137(1). doi:10.1515/ijsl.1999.137.147

Battenburg, J. (1999). The gradual death of Berber language communities in Tunisia. International Journal of the Sociology of Language 137. 147-161.

Belhouchet, L. (2003). Technologie lithique au paléolithique inférieur en Tunisie : approche de l'organisation de la production lithique à partir des industries du site de Sidi Zin (le Kef) (Unpublished doctoral dissertation or master’s thesis). Université d'Aix-, .

Brett, M. et Fentress, E. W. B. (1996). The Berbers. Blackwell.

Busby, G. B. J., Hellenthal, G., Montinaro, F., Tofanelli, S., Bulayeva, K., Rudan, I., … Capelli, C. (2015). The Role of Recent Admixture in Forming the Contemporary West Eurasian Genomic Landscape. Current Biology: CB, 25(19), 2518‑2526. doi:10.1016/j.cub.2015.08.007

Capocasa, M., Anagnostou, P., Bachis, V., Battaggia, C., Bertoncini, S., Biondi, G., … Destro Bisol, G. (2014). Linguistic, geographic and genetic isolation: a collaborative study of Italian populations. Journal of Anthropological Sciences = Rivista Di Antropologia: JASS / Istituto Italiano Di Antropologia, 92, 201‑231. doi:10.4436/JASS.92001

Capocasa, M., Battaggia, C., Anagnostou, P., Montinaro, F., Boschi, I., Ferri, G., … Destro Bisol, G. (2013). Detecting genetic isolation in human populations: a study of European language minorities. PloS One, 8(2), e56371. doi:10.1371/journal.pone.0056371

Cherni, L., Fernandes, V., Pereira, J. B., Costa, M. D., Goios, A., Frigi, S., … Pereira, L. (2009). Post-last glacial maximum expansion from Iberia to North Africa revealed by fine characterization of mtDNA H haplogroup in Tunisia. American Journal of Physical Anthropology, 139(2), 253‑260. doi:10.1002/ajpa.20979 Cherni, L., Loueslati, B. Y., Pereira, L., Ennafaâ, H., Amorim, A. et El Gaaied, A. B. A. (2005). Female gene pools of Berber and Arab neighboring communities in central Tunisia: microstructure of mtDNA variation in North Africa. Human Biology, 77(1), 61‑70.

Collignon, R. (1886). Ethnologie de la Tunisie. Bulletins de la Société d'anthropologie de Paris 9: 620– 622.

D’Atanasio, E., Trombetta, B., Bonito, M., Finocchio, A., Di Vito, G., Seghizzi, M., … Cruciani, F. (2018). The peopling of the last Green Sahara revealed by high-coverage resequencing of trans-Saharan patrilineages. Genome Biology, 19(1), 20. doi:10.1186/s13059-018-1393-5

Daghfous,R et Mahfoudh, M (2013). Histoire de la Tunisie Médiévale. CPU.

Delaneau, O., Howie, B., Cox, A. J., Zagury, J.-F. et Marchini, J. (2013). Haplotype estimation using sequencing reads. American Journal of Human Genetics, 93(4), 687‑696. doi:10.1016/j.ajhg.2013.09.002

Di Lernia, S., Anagnostou, P., Ben Fraj, T., Ben Nasr, J., Boukhchim, N., Boussoffara, R., Bel Haj Brahim, H., Cancellieri, E., Carpentieri, M., Castorina, F., Destro Bisol, G., Lucci, E., Manzi, G., Marnaoui, M., Monaco, A., Ouaja, M., Jaouadi, S., Tafuri, M. A. (2017). First archaeological investigations in the Chott el Jerid area, Southern Tunisia. Scienze dell'Antichità 23 (1): 3-19.

Dupire, M. (1963). Nomades et nomadisme au Sahara. Unesco.

El Hareir, I. et Mbaye, R. (2011). The Spread of Throughout the World. UNESCO.

Ellefi, M. (2016). L’Arzugitana: Recherches d’Histoire et d’Archéologie sur une entité africaine de l’antiquité tardive (Unpublished doctoral dissertation or master’s thesis). Faculté des lettres et des sciences humaine de , Tunisia.

Elkamel, S., Boussetta, S., Khodjet-El-Khil, H., Benammar Elgaaied, A. et Cherni, L. (2018). Ancient and recent Middle Eastern maternal genetic contribution to North Africa as viewed by mtDNA diversity in Tunisian Arab populations. American Journal of Human Biology: The Official Journal of the Human Biology Council, 30(3), e23100. doi:10.1002/ajhb.23100

Ennafaa, H., Fregel, R., Khodjet-El-Khil, H., González, A. M., Mahmoudi, H. A. E., Cabrera, V. M., … Benammar-Elgaaïed, A. (2011). Mitochondrial DNA and Y-chromosome microstructure in Tunisia. Journal of Human Genetics, 56(10), 734‑741. doi:10.1038/jhg.2011.92

Fadhlaoui-Zid, K., Martinez-Cruz, B., Khodjet-el-khil, H., Mendizabal, I., Benammar-Elgaaied, A. et Comas, D. (2011). Genetic structure of Tunisian ethnic groups revealed by paternal lineages. American Journal of Physical Anthropology, 146(2), 271‑280. doi:10.1002/ajpa.21581

Fadhlaoui-Zid, K., Plaza, S., Calafell, F., Ben Amor, M., Comas, D. et Bennamar El gaaied, A. (2004). Mitochondrial DNA heterogeneity in Tunisian Berbers. Annals of Human Genetics, 68(Pt 3), 222‑233. doi:10.1046/j.1529-8817.2004.00096.x Fregel, R., Méndez, F. L., Bokbot, Y., Martín-Socas, D., Camalich-Massieu, M. D., Santana, J., … Bustamante, C. D. (2018). Ancient genomes from North Africa evidence prehistoric migrations to the Maghreb from both the Levant and Europe. Proceedings of the National Academy of Sciences of the of America, 115(26), 6774‑6779. doi:10.1073/pnas.1800851115

Frigi, S., Cherni, L., Fadhlaoui-Zid, K. et Benammar-Elgaaied, A. (2010). Ancient local evolution of African mtDNA haplogroups in Tunisian Berber populations. Human Biology, 82(4), 367‑384. doi:10.3378/027.082.0402

Frigi, S., Mota-Vieira, L., Cherni, L., van Oven, M., Pires, R., Boussetta, S. et El-Gaaied, A. B. A. (2017). Mitochondrial DNA analysis of Tunisians reveals a mosaic genetic structure with recent population expansion. Homo: Internationale Zeitschrift Fur Die Vergleichende Forschung Am Menschen, 68(4), 298‑315. doi:10.1016/j.jchb.2017.05.003

Garmadi, S. (1968). La situation linguistique actuelle en Tunisie: problemes et perspectives. Revue Tunisienne des Sciences Sociales, 13, 13-24.

Guellouz A., Masmoudi A., Smida M., Saadaoui A. (2010). Histoire Générale de la Tunisie - TOME III: Les Temps modernes (941-1247H. /1534-1881). Sud Editions.

Hellenthal, G., Busby, G. B. J., Band, G., Wilson, J. F., Capelli, C., Falush, D. et Myers, S. (2014). A genetic of human admixture history. Science, 343(6172), 747‑751. doi:10.1126/science.1243518

Henn, B. M., Botigué, L. R., Gravel, S., Wang, W., Brisbin, A., Byrnes, J. K., … Comas, D. (2012). Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genetics, 8(1), e1002397. doi:10.1371/journal.pgen.1002397

Hess, A. C. et Julien, C.-A. (1971). --Tunisia Algeria, Morocco: From the Arab Conquest to 1830. The American Historical Review. doi:10.2307/1849334

Kefi, R., Hsouna, S., Ben Halim, N., Lasram, K., Romdhane, L., Messai, H. et Abdelhak, S. (2015). Phylogeny and genetic structure of Tunisians and their position within Mediterranean populations. Mitochondrial DNA, 26(4), 593‑604. doi:10.3109/19401736.2013.879649

Kopelman, N. M., Mayzel, J., Jakobsson, M., Rosenberg, N. A. et Mayrose, I. (2015). Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Molecular Ecology Resources, 15(5), 1179‑1191. doi:10.1111/1755-0998.12387

Kossmann, M. (2013). The Arabic Influence on Northern Berber. BRILL.

Laroui, A. (2015). The History of the Maghrib: An Interpretive Essay. Princeton University Press.

Lasserè, J.M. (2015). Africa, quasi Roma. CNRS Editions.

Lawson, D. J., Hellenthal, G., Myers, S. et Falush, D. (2012). Inference of population structure using dense haplotype data. PLoS Genetics, 8(1), e1002453. doi:10.1371/journal.pgen.1002453 Leslie, S., Winney, B., Hellenthal, G., Davison, D., Boumertit, A., Day, T., … Bodmer, W. (2015). The fine-scale genetic structure of the British population. Nature, 519(7543), 309‑314. doi:10.1038/nature14230

Leveau P., (2018). Climat, Société et environnement aux marges Sahariennes du Maghreb : Une approche historiographique . In S. Guedon (Ed.) Les Frontières Méridionales du Maghreb : Approches Croisées (Antiquité-Moyen Age), I. (pp. 19-106). Ausonius éditions.

Linstädter, J., Eiwanger, J., Mikdad, A. et Weniger, G.-C. (2012). Human occupation of Northwest Africa: A review of Middle Palaeolithic to sites in Morocco. Quaternary international: the journal of the International Union for Quaternary Research, 274, 158‑174. doi:10.1016/j.quaint.2012.02.017

Loueslati, B. Y., Cherni, L., Khodjet-Elkhil, H., Ennafaa, H., Pereira, L., Amorim, A., … Ben Ammar Elgaaied, A. (2006). Islands inside an island: reproductive isolates on Jerba island. American Journal of Human Biology: The Official Journal of the Human Biology Council, 18(1), 149‑153. doi:10.1002/ajhb.20473

Maples, B. K., Gravel, S., Kenny, E. E. et Bustamante, C. D. (2013). RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. American Journal of Human Genetics, 93(2), 278‑288. doi:10.1016/j.ajhg.2013.06.020

Modéran, Y. (2003). Les Maures et l’Afrique romaine (IVe-VIIe siècle). Publications de l’École française de Rome. doi:10.4000/books.efr.1395

Mulazzani, S. (2013). Le Capsien De (Tunisie): Culture, Environnement Et Economie. Africa Magna Verlag.

Mulazzani, S., Belhouchet, L., Salanova, L., Aouadi, N., Dridi, Y., Eddargach, W., … Zoughlami, J. (2016). The emergence of the Neolithic in North Africa: A new model for the Eastern Maghreb. Quaternary international: the journal of the International Union for Quaternary Research, 410, 123‑143. doi:10.1016/j.quaint.2015.11.089

Naylor, P. C. (2010). North Africa: A History from Antiquity to the Present. University of Texas Press.

Newman, J. L. (1995). The Peopling of Africa: A Geographic Interpretation. Yale University Press.

Penchoen, T. (1968). La langue berbère en Tunisie et la scolarisation des enfants berbérophones. Revue Tunisienne des Sciences Sociales, pp. 173-186.

Pickles, M. J. (2006). Tooth wear. In R. M. Duckworth (Ed.), The teeth and their environment (pp. 86– 104). Basel, Switzerland: Karger.

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A. R., Bender, D., … Sham, P. C. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 81(3), 559‑575. doi:10.1086/519795 Rahmani, N. (2004). Technological and Cultural Change Among the Last Hunter-Gatherers of the Maghreb: The Capsian (10,000–6000 B.P.). Journal of World Prehistory, 18(1), 57‑105. doi:10.1023/B:JOWO.0000038658.50738.eb

Ruhlen, M. (1991). A Guide to the World’s Languages: Classification. Stanford University Press.

Saadaoui, A. (1996). du XVIIe au XIXe siècle: histoire architecturale d’une ville morisque de Tunisie. Faculté des lettres de la .

Saadaoui, A. (2001). Tunis, ville ottomane: trois siècles d’urbanisme et d'architecture. Centre de publication universitaire. Repéré à https://play.google.com/store/books/details?id=6fvVAAAAMAAJ

Sahnouni, M., Rosell, J., van der Made, J., Vergès, J. M., Ollé, A., Kandi, N., … Medig, M. (2013). The first evidence of cut marks and usewear traces from the Plio-Pleistocene locality of El-Kherba (Ain Hanech), Algeria: implications for early hominin subsistence activities circa 1.8 Ma. Journal of Human Evolution, 64(2), 137‑150. doi:10.1016/j.jhevol.2012.10.007

Sayahi, L. (2014). Diglossia and Language Contact: Language Variation and Change in North Africa. Cambridge University Press. van de Loosdrecht, M., Bouzouggar, A., Humphrey, L., Posth, C., Barton, N., Aximu-Petri, A., … Krause, J. (2018). Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations. Science, 360(6388), 548‑552. doi:10.1126/science.aar8380

Xue, J., Lencz, T., Darvasi, A., Pe’er, I. et Carmi, S. (2017). The time and place of European admixture in Ashkenazi Jewish history. PLoS Genetics, 13(4), e1006644. doi:10.1371/journal.pgen.1006644

Figures

Figure 1. Map showing the geographic location of the Tunisian populations under study and the reference populations panel used for comparisons.

Figure 2. Principal components analysis of wider Mediterranean and sub-Saharan African populations with individuals of the three Tunisian ethno-linguistic groups (highlighted in dark olive green) projected onto the first two PCs. Population labels as in Supplementary Table S1.

Figure 3. Local ancestry proportions inferred for the six Tunisian populations. The ancestry proportion deriving from the two populations from each geographic region has been stacked with the order reported in the plot legend. Population labels as in Supplementary Table S1.

Figure 4. Fine-grain structure and admixture of Tunisian populations (A) FineStructure cluster distribution. Clusters are colored and named after the most represented geographic area and population, respectively. (B) Globetrotter admixture dates estimations for each Tunisian cluster. The vertical line represents the point estimate while the width of the box represents the 95% confidence interval. (C) Composition and proportion of the components inferred in each cluster that contribute as a major and minor source of admixture. One and two bars indicate one date two populations and a multiway admixture event, respectively. The colored boxes on the left and right of the bars report the two surrogate clusters. Historical dates, reported in the left blank boxes, were retrieved by subtracting the year estimates from the average date of birth of our Tunisia samples (1971).