bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Patterns of African and admixture in the Afrikaner population of South 2 3 4 N Hollfelder1*, JC Erasmus2*, R Hammaren1, M Vicente1, M Jakobsson1,3,4+, JM Greeff2#+, CM 5 Schlebusch1,3,4#+ 6 7 8 1 , Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 9 SE-752 36 Uppsala, Sweden 10 11 2 Department of , University of , Pretoria, 0002, 12 13 3 Science for Life Laboratory, Uppsala University, Norbyvägen 18C, SE-752 36 Uppsala, 14 Sweden 15 16 4 Centre for Anthropological Research & Department of and Development Studies, 17 University of , Johannesburg, South Africa 18 19 20 Running Title: Admixture into the Afrikaner population of South Africa 21 * Equal contribution 22 + Equal contribution 23 24 # Correspondence: Carina Schlebusch 25 Email: [email protected] 26 27 # Correspondence: Jaco Greeff 28 Email: [email protected] 29 30 31 32 Keywords: Afrikaner, South Africa, admixture, slave trade, colonial times 33 34 35

1

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

36 ABSTRACT 37 The Afrikaner population of South Africa are the descendants of European colonists who started 38 to colonize the in the 1600’s. In the early days of the colony, mixed unions 39 between European males and non-European females gave rise to admixed children who later 40 became incorporated into either the Afrikaner or the “Coloured" populations of South Africa. 41 Ancestry, social class, culture, sex ratio and geographic structure affected admixture patterns and 42 caused different ancestry and admixture patterns in Afrikaner and Coloured populations. The 43 Afrikaner population has a predominant European composition, whereas the Coloured population 44 has more diverse ancestries. Genealogical records estimated the non-European contributions into 45 the to 5.5%-7.2%. To investigate the genetic ancestry of the Afrikaner population today 46 (11-13 generations after initial colonization) we genotyped ~5 million genome-wide markers in 77 47 Afrikaner individuals and compared their genotypes to populations across the world to determine 48 parental source populations and admixture proportions. We found that the majority of Afrikaner 49 ancestry (average 95.3%) came from European populations (specifically northwestern European 50 populations), but that almost all Afrikaners had admixture from non-Europeans. The non-European 51 admixture originated mostly from people who were brought to South Africa as slaves and, to a 52 lesser extent, from local Khoe-San groups. Furthermore, despite a potentially small founding 53 population, there is no sign of a recent bottleneck in the Afrikaner compared to other European 54 populations. Admixture among diverse groups during early colonial times might have 55 counterbalanced the effects of a founding population with a small census size. 56 57 58 59 60 61 SIGNIFICANCE STATEMENT 62 Afrikaners are a southern African primarily descended from colonial 63 (population ~2.8–3.5 million). Genome-wide studies might offer interesting insights into their 64 ancestry, not the least due to South Africa's history of segregationist laws known as “”, 65 resulting in an expectation of low levels of admixture with other groups. Originating from a small 66 founder population, their genetic diversity is also interesting. In our genome-wide study of 77 67 Afrikaners we found their majority ancestry (average 95.3%) came from Europeans, but almost all 68 Afrikaners had admixture from non-Europeans (Africans and Asians). Despite their small 69 founding population, we found no signs of decreased genetic diversity. Admixture among diverse 70 groups during colonial times might have counterbalanced effects of a small founding population. 71 72

2

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

73 INTRODUCTION 74 The seventeenth century European colonization of the southern tip of Africa resulted in the influx 75 of two groups of people, European colonists and slaves. The subsequent admixture between these 76 external groups and the local southern African Khoe-San populations resulted in two admixed 77 populations – the Afrikaner population and the “Coloured” population of South Africa (1). (In this 78 article we use the term “Coloured” following the current-day continued use of the term as self- 79 identification (2)). 80 81 While both the Afrikaner and Coloured populations have ancestry from many populations from 82 different continents, the ancestry proportions differ substantially between the groups. The 83 admixture proportions of these populations do not reflect the historical local census sizes of the 84 parental populations (supplementary text), rather, ancestry, social class, culture, sex ratio and 85 geographic structure affected admixture patterns (3-7). 86 87 The most dominant contribution to the Afrikaner population came from European immigrants ((8- 88 10) and supplementary text), whereas the Coloured population has more diverse ancestries (11- 89 20). The colonization of started in 1652 when established a 90 refreshment station for the Dutch (DEIC) at the Cape of Good Hope (Cape 91 Town today). In 1657 a few employees of the DEIC were released from their services to start 92 farming (4) (numbering 142 adults and children in 1658). They were initially called free citizens 93 and later Afrikaners (supplementary text). The DEIC continued to support the settlement of Dutch 94 nationals and other Europeans by offering free passage to the Cape and granting land for farming 95 (4). The other major sources of immigrants were 156 French that arrived in 1688 and 96 an unknown number of German laborers and soldiers that were financially marginalized (4). 97 Estimates vary but Dutch, German and French respectively contributed 34%-37%, 27%-34% and 98 13%-26% ((8-10) and supplementary text). Immigrants from Europe continued to arrive at the 99 Cape from 1658 onwards, but the very rapid population growth of the local established population 100 (almost 3% per annum; (5, 21) and supplementary text) and lower fecundity in the new immigrants 101 (5) meant that the genetic contribution of later arrivals was less ((8) and supplementary text). 102 103 While the DEIC did not encourage admixture with local populations and slaves, the strongly male- 104 biased ratio of immigrants lead to mixed-ancestry unions (3), especially between European males 105 and non-European females (however there are recorded cases of European women marrying non- 106 European men) (7). The offspring from these unions were frequently absorbed into the Afrikaner 107 population (8). As time progressed relationships between Europeans and non-Europeans became 108 more infrequent (8) and as early as 1685, commissioner van Rheede outlawed marriages between 109 Europeans and non-Europeans (marriages to admixed individuals, with some European ancestry, 110 were still allowed though) (7). In early colonial times, mixed marriages were more acceptable than 111 later on, and due to the population’s fast growth rate, early unions likely contributed exponentially 112 more to the Afrikaner population. Elphick and (3) distinguish two admixture patterns in 113 Afrikaners based on historical records – in and the surrounding area admixture was 114 predominantly between European men and female slaves or former slaves, and in the outlying 115 areas between European pastoralist frontier farmers (“trekboere”) and Khoe-San women.

3

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

116 117 Admixture with slaves (and former slaves) resulted from informal as well as formal associations 118 (3). The church recorded many marriages between Europeans and manumitted slaves (7, 8). 119 Husbands-to-be frequently had to buy their future slave wife’s freedom (7). It is unclear what the 120 input of informal relationships into the Afrikaner gene pool was, as the outcome of these 121 relationships and the population affiliation of the resulting offspring is not clear. Informal liaisons 122 resulted, for example, from the slave lodge that served as a brothel for one hour a day for passing 123 sailors and other European men (3, 22). This practice was so extensive that many children in the 124 slave lodge clearly had European fathers (3/4 in 1671; 44/92 in 1685), prompting commissioner 125 van Rheede to decree that these children should be manumitted and baptized (3). A census of the 126 slave lodge in 1693 revealed 29/61 admixed school children and 23/38 admixed children under the 127 age of three (3). Many women that married at the Cape during the early years used the toponym 128 “van de Kaap” (meaning from the Cape) which may indicate a locally born slave. European men 129 from slave-owning families also sometimes had a “voorkind” (meaning “before child”) with a 130 slave in the household before they got married to a European woman (3). These children could 131 also have been absorbed into the Afrikaner population (as opposed to becoming part of the 132 Coloured population). 133 134 To understand the characteristics of the genetic contributions that slaves made it is necessary to 135 know from where and when they came to Cape Town and see that in the light of European male 136 partner choices. Shell (1994) (23) claimed that from 1658 to 1807, roughly a quarter of the slaves 137 in the came from Africa, , South Asia and Southeast Asia each. Slave 138 trade in the Cape was stopped in 1807 and slavery as such was stopped in 1834. Worden (24, 25) 139 estimated that more slaves came from Asia, specifically South Asia, and fewer from Madagascar 140 and Africa (supplementary text). Nevertheless, we do not expect an exact reflection of these ratios 141 in Afrikaners. European men had a clear preference for Asian and locally born slaves over African 142 and Madagascan women (3). Despite only two ships, containing west African slaves, that moored 143 at the Cape in 1685 (26), we can expect the west African per capita contribution to exceed later 144 arrivals because the fast population growth rate meant earlier contributions benefitted more from 145 the exponential growth. 146 147 The “trekboere” were European farmers and their descendants followed a nomadic lifestyle in 148 harsh conditions along the frontier. Informal unions with Khoe-San women were more frequent 149 among the trekboere, but it is unclear if children from these relationships were absorbed into the 150 Coloured and/or Afrikaner community (3, 4). Poor record keeping and a reduced presence of the 151 church on the frontier meant that recorded information is incomplete for this section of the 152 population. In the Cape, formal unions between European men and Khoe-San women were very 153 unusual with only one known example (27). 154 155 By using church records, genealogists calculated the contribution of non-Europeans to be between 156 5.5% and 7.2% ((8-10) and supplementary text). These estimates may be biased because the 157 registers; a) only reflect the Christian fraction of the population, b) were less complete at the 158 frontier where admixture may have been more frequent, c) could be incorrectly pieced together

4

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

159 from church records, and d) list people of unknown heritage, such as “van de Kaap”. In addition, 160 records may be incorrect or unrecorded for children born out of wedlock. Populations that would 161 have been excluded were a substantial Muslim community amongst manumitted slaves (3), a small 162 Chinese population resulting from exiles and banned political prisoners (28, 29) and the indigenous 163 Khoe-San who were not partial to the Christian religion (3). The presence of the Coloured 164 population compounded these difficulties as genes may have exchanged between the Coloured and 165 Afrikaner populations. 166 167 In order to clarify the patterns of ancestry and admixture fractions in current-day Afrikaners we 168 compared genome-wide genotype data from 77 Afrikaners to comparative data of potential donor 169 populations and tried to pinpoint the best possible sources of the admixture and the fractions of 170 admixture from these groups. 171

172 RESULTS 173 Population structure and admixture 174 We generated filtered genotype data for 77 Afrikaner individuals (Method section) and merged the 175 Afrikaner data with comparative data to create a dataset containing 1,747 individuals from 33 176 populations and 2,182,606 SNPs. We used this merged dataset to conduct population structure 177 analysis and to infer population summary statistics. 178 179 In the population structure analysis (Figure 1) (30), Afrikaners cluster with non-Africans (K=2 to 180 K=9) and specifically Europeans (K=3 to K=9) before receiving their own cluster at K=10. From 181 K=7 onwards, northern and southern Europeans cluster separately, with Finnish forming one 182 cluster (light blue) and southern Europeans (Tuscans and Iberians) the other cluster (light yellow). 183 British (GBR) and Utah residents of northwestern European descent (CEU) appear midway 184 between the Finnish and Southern European clusters. Afrikaners seem to contain slightly more 185 northern European (blue) ancestry-component compared to CEU and GBR. The specific 186 percentage of cluster assignments of Afrikaner individuals at K=6 and K=9, and the population 187 averages assigned to each cluster, are given in Table S1 and S2. 188 189 Assuming six clusters (K=6), where the major geographical ancestries are discernible; i.e. 190 aboriginal southern African (Khoe-San), West+East African, European, East Asian, South Asian, 191 and Native American, the level of admixture from these ancestries can be distinguished in the 192 Afrikaners (Table S1 and Figure 2). In addition to European ancestry (mean of 95.3% SD 3.8% - 193 blue cluster) Afrikaners have noticeable levels of ancestry from South Asians (1.7% - orange 194 cluster), Khoe-San (1.3% - red cluster), East Asians (0.9% - purple cluster), West/East Africans 195 (0.8% - green cluster), and very low levels from Native (0.1%). The small fraction from 196 Native Americans likely stems from common ancestry between Native Americans and Europeans 197 and from European admixture into Native Americans. The total amount of non-European ancestry, 198 at the K=6 level, is 4.8% (SD 3.8%) of which 2.1% are African ancestry and 2.7% Asian/Native 199 American ancestry. The individual with most non-European admixture had 24.9% non-European 200 admixture and only a single Afrikaner individual (out of 77) had no evidence of non-European

5

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

201 admixture (Table S1). Among the 77 Afrikaners investigated, 6.5% had above 10% non-European 202 admixture, 27.3% between 5 and 10%, 59.7% between 1 and 5% and 6.5% below 1%. 203 204 The fractions of admixture from the different non-European groups in Afrikaners (at K=6) are 205 generally correlated to each other (Figure S1), except for the West/East African admixture 206 fractions. 207 208 At K=9 (before Afrikaners form their own cluster at K=10), additional inferences can be made 209 regarding, Japanese vs. Chinese ancestry, East vs. West African Ancestry and northern vs. 210 southern European ancestry (Table S2). Accordingly, it seems Afrikaners received East Asian 211 ancestry from Chinese rather than Japanese individuals and slightly more West African ancestry 212 than East African ancestry. Southern and northern European ancestry is almost equal in the 213 Afrikaners but northern European ancestry is elevated compared to CEU and GBR. 214 215 Compared to the Afrikaners, the Coloured populations have more diverse origins. At K=6 the Cape 216 Coloured population from Wellington (within the region of the original Cape colony) had the 217 following ancestry fractions 30.1% Khoe-San, 24% European, 10.5% East Asian, 19.7% South 218 Asian, 15.6% West/East African, and 0.2% Native American (Figure 1). The Coloured populations 219 whom today are living further from the original Cape colony had different admixture patterns with 220 less Asian and more Khoe-San contribution than the Cape Coloured: Colesberg Coloured (48.6% 221 Khoe-San, 20% European, 2.9% East Asian, 6.7% South Asian, 21.6% West/East African, 0.2% 222 Native American), Askham Coloured (76.9% Khoe-San, 11.1% European, 0.9% East Asian, 3.9% 223 South Asian, 7.2% West/East African, 0% Native American). 224 225 In principle component analysis (PCA) (Figure 3 and S2) the first principal component (PC1) 226 explains 3.6% of the variation in the dataset and distinguish Africans from non-Africans (right to 227 left). PC2 explains 1.9% of the variation in the dataset and distinguishes Europeans from East 228 Asians (top to bottom). The distribution of Afrikaners along PC1 and PC2 indicates both African 229 and Asian admixture. Compared to northern Europeans (CEU and GBR) Afrikaners have more 230 African and East Asian admixture. From the PCA it appears that most of the Afrikaner group have 231 non-European ancestry at comparable levels to Iberians and Tuscans (IBS and TSI), however, 232 certain Afrikaner individuals show greater levels of both African and Asian ancestry (Figure 3). 233 Finnish populations show more East Asian but lower levels of African admixture than Afrikaners. 234 235 PC3 further defines variation within Africans and distinguishes between Khoe-San groups on the 236 one extreme and groups with West African ancestry on the other. On this PC it appears that the 237 African ancestry in Afrikaners has a greater affinity to Khoe-San variation (Figure S2). PC4 238 defines Native American ancestry and shows that the Afrikaner population is probably as similar 239 to Native American populations as West European populations. PC5 delineates Asian ancestry and 240 shows that the Afrikaner population may have genetic input from East Asian populations. PC6 241 distinguishes between West and East African ancestry and the Afrikaner population seems to be 242 closer related to West African populations. PC7 differentiates between Japanese and Chinese 243 ancestry, where we can confirm the affinity of the Afrikaner to the Chinese, as has been observed

6

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

244 in the ADMIXTURE analysis. PC8 is associated with variation between different Khoe-San 245 groups, where the Afrikaner cluster closer to the central and southern San than to the northern San. 246 Further PC’s do not seem to capture group-wise information but continue to separate out 247 individuals. 248 249 To look for the most likely sources of admixture in the Afrikaners within the comparative dataset 250 we did formal tests of admixture (f3, (31)) between all pairs of comparative populations each time 251 specifying (two) potential parental sources for the Afrikaner population. Results were sorted 252 ascendingly according to Z scores (low Z scores indicate significant admixture) and Z-scores are 253 visualized in a figure to aid interpretation (Figure S3), parental populations are colored according 254 to regional population. It is clear that when only two populations are considered to be parental 255 sources the most likely sources are always a European and either a Khoe-San or a West African 256 population (combination of blue and red or blue and green labels in Figure S3). Subsequently the 257 Coloured population can also be used to model a parental population in combination with 258 Europeans (blue and grey labels). This method is however limited by the fact that only two parental 259 sources can be tested at one time and might not be the best tests when multiple parental groups 260 admixed into a population, as is the case for the Afrikaner population. 261 262 We also used projected PCA analysis (32) by constructing principal components (PCs) based on 263 certain specified populations and then projecting Afrikaners on existing PC’s to further refine 264 specific ancestry components within bigger ancestral groups (Figures S4 and S5). To distinguish 265 the specific Khoe-San group who contributed to Afrikaner variation, we constructed a PCA based 266 on the variation of Karretjie People (Southern San), Ju|’hoansi (Northern San) and CEU 267 (northwestern Europeans) and projected Afrikaners on the PCA. The projection (Figure S4A-B), 268 clearly shows that Afrikaners have more Khoe-San admixture than CEU (see shift in the first PC) 269 and furthermore, that the admixture was from a Khoe-San population is more similar to Karretjie 270 People (southern San) than Ju|’hoansi (northern San) as observed in the skew towards Karretjie. 271 This trend towards southern San rather than northern San is even made clearer when YRI is used 272 in place of CEU in the projection (Figure S4C-D). This shift of AFR in PC1 is not only due to drift 273 and inherent population differences between populations of European origin, because when GBR 274 is added to the projection, GBR and CEU overlap with each other, with AFR still clearly shifted 275 towards Khoe-San (Figure S4E-F). Other southern Khoe-San groups (≠Khomani and Nama) show 276 similar behaviors in terms of being favored compared to northern San (Figure S5). 277 278 Alleles that are shared privately between combinations of the Afrikaner population with one 279 comparative population (Figure S6) show that the Afrikaner share the most private alleles with the 280 CEU population, which makes them a better parental source than the other European populations. 281 The ≠Khomani share the most private alleles with the Afrikaner out of all Khoe-San populations, 282 which is consistent with the previous observations. The most shared private alleles of the Afrikaner 283 with Asian populations can be found in the GIH (Gujarati Indian) followed by the CHB (Chinese) 284 and JPT (Japanese). The Afrikaner share more private alleles with the Niger-Congo speaking 285 Yoruba (YRI) from Nigeria than with the two eastern Bantu-speaking groups, Luhya (LWK) from 286 Kenya and southeastern Bantu-speakers from South Africa (Figure S7). This supports an

7

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

287 admixture from West-African slaves rather than from southern African Bantu-speakers into the 288 Afrikaner population. 289 290 For finer scale resolution of the European and the Asian components in the Afrikaners, the dataset 291 analyzed above was furthermore merged, respectively, with the POPRES dataset (33, 34) and 292 various datasets containing populations from East, South and Southeast Asia. These additional 293 European and Asian comparative datasets had much lower SNP densities but they contained many 294 more comparative populations and were used for fine scale resolution of European and Asian 295 components in Afrikaners. 296 297 The Afrikaner individuals were projected on a PCA, constructed based on European variation 298 present in the POPRES dataset (Figure S8). Afrikaner individuals seem to group within western 299 European variation. They are grouped in-between the French and German clusters on the PC plot. 300 When principal components were summarized by population averages and standard deviations – 301 they seem to be grouping closest to Swiss German, Swiss French and Belgian populations. This 302 positioning could also be explained as an intermediate position between German, Dutch and 303 French variation that link to each other in a clinal pattern (33, 34). 304 305 When analyzing Afrikaners with a dataset enriched for Asian populations it appears that the largest 306 contributing Asian component is from India (Figure S9). The orange component in Figure S9 is 307 the most prominent admixed component from Asian groups and this component is specifically 308 associated with Indo-European speaking Indian groups, i.e. Khatri, Gujarati Brahmin, West Bengal 309 Brahmin, Maratha; and Dravidian speaking Iyer (35). 310 311 Genome local ancestry and selection 312 To see if non-European ancestries were enriched in any specific part of the genome, we also 313 inferred local genomic ancestries in Afrikaner individuals (Figure S10B). Specific enrichment of 314 admixed ancestries in a specific region of the genome could be indicative of a possible adaptive 315 introgression event. There are two regions of interest, one on chromosome 2 and another on 316 chromosome 21, both enriched for Khoe-San ancestry. For the region on chromosome 21 between 317 positions 10,699,687-14,537,654 Afrikaners are 100% assigned to Khoe-San ancestry (Fig S10B). 318 The three genes associated with the region, TPTE, BAGE2 and ANKRD30BP2, all have exclusive 319 expression in the testis. The TPTE gene encodes a tyrosine phosphatase which may play a role in 320 the signal transduction pathways of the endocrine or spermatogenic function of the testis (UCSC 321 genome browser). The BAGE2 gene is from the B melanoma antigen family and is not expressed 322 in normal tissues except the testis. It is however also expressed in 22% of melanomas and have 323 been associated with Melanoma skin cancer (UCSC genome browser). ANKRD30BP2 is a non- 324 coding RNA pseudogene with exclusive expression in the testes (UCSC genome browser). This 325 region on chromosome 21 span the centromere of the chromosome, which could influence SNP 326 assignment, and results should be interpreted with caution (FigS10B). The region on chromosome 327 2 shows around 10% Khoe-San ancestry and is located between positions 2,974,989 – 3,058,838 328 in the genome. This region is located close to the telomere of chromosome 2, which might 329 influence results. There is one annotated human cDNA that spans this region, namely AK095310. 8

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

330 The gene function is not known but the highest expression is in the brain, with some expression in 331 the pituitary glands (UCSC genome browser). 332 333 We also analyzed the Afrikaner data for genome-wide signals of selection by scanning for regions 334 with extended haplotype homozygosity compared to other haplotypes within the same population 335 (iHS scans) and compared to haplotypes in a comparative population (XP-EHH scans). For XP- 336 EHH scans, Afrikaners were compared to the CEU population. Figure S11 shows the genome- 337 wide Manhattan plot of selection scan results for iHS and XP-EHH. Several peaks that might 338 indicate signals of selection in the Afrikaner group were observed. The top 5 peaks in each scan 339 are listed in Table S3 together with; position information, underlying/nearby genes and potential 340 function of underlying/nearby genes. From the top 5 iHS peaks only one had a gene directly 341 associated with the peak, the gene FGF2 is a fibroblast growth factor with a variety of functions 342 and was previously associated with cholesterol levels. The XP-EHH results were clearer and three 343 out of the five top peaks were directly associated with genes; CCBE1 – a gene previously 344 associated with lymphatic disease, ACTG2 – an enteric smooth muscle actin gene previously 345 associated with intestinal diseases and SUCLG2 – encoding a succinate-CoA ligase, previously 346 associated with glucose and fat metabolism. Interestingly the Afrikaner group does not show the 347 strong adaptation signals at the lactase persistence region on chromosome 2 and MHC region on 348 chromosome 6, which is strong and well-known for the CEU group. Although the CEU group have 349 significantly more of the lactase persistence associated allele (rs4988235-T) (71% in CEU vs. 52% 350 in the AFR, p= 0.000434), their predicted lactase persistence phenotypic status, based on 351 homozygote and heterozygote counts combined, is not significantly different (91% in CEU vs. 352 83% in AFR, p= 0.126514). 353 354 Dating of admixture 355 We constructed linkage disequilibrium (LD) decay curves, to date the various admixture events in 356 the Afrikaners. Clear LD decay curves are visible, indicating admixture, when CEU and an 357 admixing group are used as parental populations and Afrikaners as admixed population (Table S4, 358 Figure S12). A date of 9-10 generations ago was obtained with African populations as the second 359 parental population and around 15-16 generations ago with Asian populations as second parental 360 population. 361 362 Estimation of bottleneck effects 363 Runs of homozygosity were calculated for each individual of the dataset. Depending on their 364 length, runs of homozygosity are informative of historic population size or recent inbreeding in 365 populations (36). While we see striking differences between continental groups (Figure 4), there 366 is no strong difference between the Afrikaner and other European populations, except for the 367 Finish population that appears to have had a smaller historic effective population size (Figure 4). 368 The average inbreeding coefficients were also calculated for all populations (Table S5) and 369 Afrikaners had a lower average inbreeding coefficient than other European populations. 370

9

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

371 DISCUSSION 372 Genealogical records suggest that Afrikaners have their main ancestry components from 373 Europeans (Dutch, German and French) and estimate the non-European contributions to the 374 Afrikaner to be between 5.5% and 7.2% ((8, 9) and supplementary text). Our genetic study that 375 included 77 Afrikaners, inferred a slightly lower non-European contribution than predicted by 376 genealogical studies. From population structure analyzes we saw that Afrikaners have their main 377 ancestry component (95.3%) from European populations. The European component is a more 378 northwestern (than southern or eastern) European component (Figure 1 and S8), which is in 379 agreement with genealogical records of most ancestry coming from Dutch and German (61-71%), 380 intermediate from French (13-26%), with much smaller fractions from other European groups ((8) 381 and supplementary text). Of note, Afrikaners group separately from populations from the UK 382 (Figure S8) despite the fact that the Cape was a British colony from 1806 onwards. This confirms 383 the relatively small contributions from to the Afrikaner population as predicted by 384 genealogical records (8). 385 386 The non-European fraction in Afrikaners was estimated to 4.7% on average (Table S1). More of 387 the non-European admixture fraction appeared to have come from people who were brought to the 388 Cape as slaves (3.4%) during colonial times than from local Khoe- (1.3%). Indeed 389 historical records of the early Cape Colony record more instances of unions between European 390 men and slaves or former slaves than to local Khoe-San women (3). Only one example of a Khoe- 391 San - European union in the Cape colony is known. A local woman from the 392 Goringhaicona group, Eva (or ) van de Kaap was an interpreter and ambassador between 393 the colonists and Khoekhoe people and married Pieter Van Meerhof in 1664 (27, 37). Since unions 394 between Khoe-San women and the frontier farmers were thought to be more frequent, it may 395 account for the 1.3% observed admixture in the Afrikaner population. The 1.3% observed Khoe- 396 San ancestry calculates to 26.6 Khoe-San women out of 2048 ancestors 11 generations ago. 397 However, we know that one Afrikaner had for example only 299 ancestors in colonial times (10) 398 because many Afrikaner ancestors enter pedigrees multiple times (8, 10, 38). These 26.6 Khoe- 399 San women that contributed to the average Afrikaner should thus not be seen as 26 separate women 400 (i.e. the same woman could have contributed many times). The Khoe-San admixture component 401 is the most ubiquitous non-European admixture component and only 6 out of the 77 Afrikaners 402 had no Khoe-San ancestry (Table S1). The Khoe-San ancestry in Afrikaners appears to have come 403 from Southern Khoe-San groups and not Northern Khoe-San (Figure S5 and S6). This is not 404 surprising since in pre-historic and historic times, the Southern Khoe-San inhabited current-day 405 South Africa while the Northern San lived in northern and southern (12, 16, 39- 406 41). 407 408 South and East Asians contribute cumulatively to an average of 2.6% of the Afrikaner ancestry 409 (53.2 out of 2048 ancestors 11 generations ago). Elpick and Shell (1989) (3) noted that European 410 men more often mixed with Asian and locally born slaves than African and Madagascan women. 411 Although many other additional factors might have played a role in the resultant current-day 412 Afrikaner admixture fractions, the genetic admixture fractions of South and East Asians was higher 413 in current-day Afrikaners than Khoe-San fractions (1.3%) and West/East African fractions (0.8%),

10

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

414 and slightly higher than the combined African fraction (2.1%). South Asian contributions outweigh 415 East Asian contributions (p-value of <0.00001, paired Wilcox test) (Table S2). The South Asian 416 contribution seems to have come predominantly from Indian populations (Figure S9). 417 418 West/East Africans contributed an average of 0.8% of the Afrikaner ancestry (Table S1) (16.3 out 419 of 2048 ancestors 11 generations ago). Shell (23) estimated that about 63,000 slaves arrived in the 420 Cape colony between 1658-1807 and a quarter came from West/East coastal Africa (26.4%, east 421 coast and only 2.5% from ). Only two ships brought West African slaves to the Cape 422 in 1685 (26). When one takes into account that only 2.5% of African slaves came from West 423 Africa, it is fascinating that just over half of this signal is from West Africans rather than East 424 Africans (Table S2). This discrepancy could possibly be explained by West Africans arriving four 425 generations earlier than East Africans (see Supplementary text). More frequent admixture during 426 early years, and fast population growth could have caused the genetic footprint of West Africans 427 to exceed that of East Africans. While admixture fractions between East Asians, South Asians and 428 Khoe-San correlate well with each other in Afrikaner individuals (Figure S1), West/East African 429 fractions do not correlate significantly with South Asian and East Asian fractions and a high 430 number of Afrikaners had no West/East African admixture (26/77). These patterns could possibly 431 be explained by the fact that, there were relatively few West African slaves at the Cape, that the 432 arrival of West African slaves was contained in a very limited time period, and that East African 433 slaves arrived later in time. The West African fraction in the Afrikaners is presumed to come 434 mostly from these West African slaves and not from southern African Bantu-speakers. This is 435 confirmed by shared allele analysis (Figure S7) where Afrikaners share more alleles with the West 436 African Yoruba from Nigeria than with local South African Bantu-speakers (southeast Bantu- 437 speakers). Although current-day South African Bantu-speakers trace the majority of their ancestry 438 (80%) to West Africa (12, 16, 40, 42) (Figure 1), there were no Bantu-speakers present in this part 439 of southern Africa during colonial times (22). Bantu-speakers originated from West Africa and 440 started to expand to the rest of sub-Saharan Africa around 4,000 years ago, reaching the northern 441 parts of current-day South Africa around 1,800 years ago. During the 1600s the edge of the Bantu- 442 expansion (specifically Xhosa speakers) reached as far south as the Fish River in the 443 Province of current day South Africa (approximately 1000 km from Cape Town). Therefore, 444 Bantu-speakers were only encountered at a much later stage during the frontier wars in 1779-1851 445 by expanding Afrikaner frontier farmers (22). Although initial interactions between the frontier 446 farmers and Bantu-speaking Xhosa populations were peaceful with trade between groups, relations 447 quickly turned aggressive and conflicts erupted. It is unlikely that much gene flow occurred 448 between South African Bantu-speakers and Afrikaners, since during this relatively late time 449 period, the Afrikaner population identity had already culminated, which resulted in social hinders 450 to relationships between groups. 451 452 Genome local ancestry analysis revealed two possible adaptive introgression signals in the 453 Afrikaners, on chromosome 2 and 21. In both cases Khoe-San ancestry was enriched for these 454 genomic regions. The chromosome 21 region was 100% assigned to Khoe-San ancestry and the 455 three genes in this region are exclusively expressed in the testes and might indicate an adaptation 456 linked to reproduction or male fertility in the Afrikaners. Since the implicated regions are close to

11

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

457 centromeres or telomeres results should be interpreted with caution. Scans for selection in the 458 Afrikaners implicated several genes associated with diet, i.e. intestinal function, lipid and glucose 459 metabolism, possibly indicating adaptation to modified or novel food sources. 460 461 The estimated times of admixture differs between African and Asian populations (Table S4) and 462 was estimated for around 9-10 generations for West Africans and Khoe-San and around 15-16 463 generations for Asians. In the Afrikaner population the average generation time for men was 32.92 464 years whereas for women it was 27.36 (43), suggesting that the Afrikaner population is on average 465 between 11 and 13 generations old. Since many admixing events occurred during early colonial 466 times, genetic dating of admixture times is expected to fall in this range as well. The admixture 467 times for Asian populations appeared older though. Some possible explanations for this could be 468 that some of the slaves may already have been admixed when they arrived so that a generation or 469 two could be added; previous admixture into Europeans might play a role; there could have been 470 more recent admixture events with Africans in Afrikaners that could contribute to more recent 471 dates; the differences in effective population sizes between Africans and non-Africans might 472 influence estimates. 473 474 It is interesting to observe that Afrikaners do not present a signal of a population bottleneck 475 compared to European groups even though they had a very small founding population (Figure 4 476 and Table S5). This could be explained by the fact that even though the initial founding population 477 of the Afrikaners was small, they were from diverse origins and many of the initial unions resulted 478 in admixed children who were incorporated in the resultant population. For example, one Afrikaner 479 individual (JMG - (10)) had 125 common ancestors, but these were so distant from each other that 480 his inbreeding coefficient is only 0.0019. Until recently, most humans were sedentary and 481 populations were small so that inbreeding due to distant relations was not unusual. However, a 482 number of founder effects for specific diseases have been identified in Afrikaners ((44, 45) and 483 supplementary text). These founder effects however need not imply inbreeding but rather suggest 484 a sampling effect, i.e. some disease alleles were present in original founders and were amplified 485 through exponential population growth. 486 487

488 CONCLUSION 489 Although Afrikaners have the majority of their ancestry from northwestern Europe, non-European 490 admixture signals are ubiquitous in the Afrikaner population. Interesting patterns and similarities 491 could be seen between genealogical predictions and genetic inferences. The non-European 492 admixture fractions resulting from people that were brought to southern Africa as slaves and local 493 Khoe-San populations might have been one of the factors that helped to counteract the adverse 494 effect of a small founding population size and inbreeding, since Afrikaners today have comparable 495 inbreeding levels to current-day European populations. 496 497

12

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

498 MATERIALS AND METHODS 499 500 Sample collection and genotyping 501 502 Ethical clearance for this study was obtained from the ethics committee from the Natural and 503 Agricultural Science of the EC11912-065. All subjects were given a study 504 information sheet and gave their written informed consent for the collection of samples and 505 subsequent analysis. 506 507 The 77 individuals included in this study form part of parallel studies on non-paternity (43, 46) 508 and on the mitochondrial DNA heritage of self-identified Afrikaners. Fifty three samples came 509 from 17 groups of men bearing the same surnames (an average of 3.12 individuals per family with 510 the same surname). Males with the same surname are separated by an average of 15.8 generations 511 along their patrilines. Twenty four samples are from unrelated patrilines, either having unique 512 surnames or stemming from different founders. 513 514 Samples were collected with Oragene® DNA Saliva collection kits (DNA Genotek, Kanata, 515 ) and whole genome DNA was extracted according to the manufacturer’s instructions. Final 516 concentrations were adjusted to 50 ng/µl. Genotyping was performed by the SNP&SEQ 517 Technology Platform in Uppsala, Sweden (www.genotyping.se) using the Human Omni 5M SNP- 518 array. Results were analyzed using the software GenomeStudio 2011.1, and the data were exported 519 to Plink format and aligned to hg19. 520 521 Genotype filtering and merging with Comparative data 522 SNP data quality filtering and merging to comparative data was done with PLINK v1.90b3 (47). 523 A 10% genotype missingness threshold was applied and the HWE rejection confidence level was 524 set to 0.001. SNPs with a chromosome position of 0, indels, duplicate-, mitochondrial- and sex 525 chromosome SNPs were removed. All individuals passed a missingness threshold of 15% and a 526 pairwise IBS threshold of 0.25 (for identification of potential relatives). 527 528 The resultant dataset of 4,154,029 SNPs and 77 individuals were phased using fastPHASE (48), 529 with 25 haplotype clusters, 25 runs of the EM-algorithm, and 10% assumed missingness. 530 Subsequently the data was merged with data from (16), containing 2,286,795 quality filtered 531 autosomal SNPs typed in 117 southern African Khoe-San and Bantu-speakers. Before merging the 532 datasets, AT and CG SNPs were removed from the datasets. During the merge the strands of 533 mismatching SNPs were flipped once and remaining mismatches were removed, only the 534 overlapping positions between the datasets were kept. 535 536 To get a more extensive set of African and non-African comparative data, we furthermore 537 downloaded SNP data from the 1000 Genomes Project website, at 538 ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20120131_omni_genotypes_and_intensiti 539 es (49). The 1000 genomes genotype data were quality filtered using the same thresholds as used 540 in our datasets (described above). The following populations were included from the 1000 541 genomes dataset: YRI and LWK (Yoruba and Luhya - West African ancestry), MKK (Maasai -

13

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

542 East African), ACB and ASW (African-Americans in the Caribbeans and southwest USA), TSI, 543 IBS, CEU, GBR, and FIN (Tuscans, Iberians, northwest European ancestry, British, Finnish – 544 European), JPT, GIH, CHB, CHS, CDX, KHV (Japanese, Indian ancestry, Han Chinese Bejing, 545 Han Chinese South, Dai Chinese, Vietnamese – Asian), PEL, PUR, MXL, CLM (Peruvians, Puerto 546 Ricans, Mexican ancestry, Colombians – Native American (admixed)). All populations were 547 randomly down-sampled to 80 individuals. This merged dataset included a total of 2,182,606 high 548 quality SNPs in 1,747 individuals from 33 populations. 549 550 For finer scale resolution of the European component and the Asian component in Afrikaners, this 551 dataset was furthermore merged with 1) the POPRES dataset (33, 34) and 2) various datasets 552 containing populations from east, south and southeast Asia (35, 50-56). These European and Asian 553 comparative datasets were quality filtered and phased with the same thresholds and parameters as 554 used in the previous datasets. Although these datasets had much lower SNP densities (149,365 555 SNPs for the European and 313,790 SNPs for the Asian dataset) they contained many more 556 comparative populations (37 European comparative populations for the European dataset and 63 557 Asian comparative populations for the Asian dataset). 558 559 Population structure analysis 560 Population genetic analysis was conducted for the main merged dataset, containing 1747 561 individuals from 33 populations and 2,182,606 SNPs. We inferred admixture fractions (30) in 562 order to investigate genomic relationships among individuals based on the SNP genotypes. Default 563 settings and a random seed were used. Between 2 and 10 clusters (K) were tested. A total of 100 564 iterations of ADMIXTURE were run for each value of K and the iterations were analyzed using 565 CLUMPP (57) for each K to identify common modes among replicates. Pairs of replicates yielding 566 a symmetric coefficient G’>=0.9 were considered to belong to common modes. The most frequent 567 common modes were selected and visualized with DISTRUCT (58). For the Asian extended 568 dataset similar settings were used as described above, however clustering was done for K=2 to 569 K=15 due to the higher number of populations in the dataset. 570 571 PCA was performed with EIGENSOFT (32, 59) with the following parameters: r2 threshold of 572 0.1, population size limit of 80, and 10 iterations of outlier removal. Projected PCA analysis was 573 done using EIGENSOFT by constructing Principal Components (PCs) based on certain specified 574 populations and then projecting Afrikaners on existing PC’s. 575 576 Formal f3 tests of admixture (31) were done between all pairs of comparative populations 577 specifying (two) potential parental sources of the Afrikaner population. We estimated the 578 admixture time based on linkage disequilibrium (LD) decay due to admixture (ROLLOFF - (31)), 579 default parameters were used. The standard error was estimated with a jackknife procedure. 580 Generations were converted to years using 29 years per generation. Shared private alleles were 581 inferred using ADZE (60) for all pairwise population combinations of populations with at least 15 582 individuals. Runs of homozygosity were calculated using PLINK (47) using the following 583 parameters (--homozyg --homozyg-window-kb 5000 --homozyg-window-het 1 --homozyg-

14

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

584 window-threshold 0.05 --homozyg-kb 500 --homozyg-snp 25 --homozyg-density 50 --homozyg- 585 gap 100). Inbreeding coefficients were calculated using PLINK (--het). 586 587 Local ancestry analysis 588 We inferred genome local ancestry for the Afrikaner individuals using RFMix version 1.5.4 (61). 589 The following populations were used as putative sources: CEU, CDX, YRI and Khoe-San groups 590 (combined !Xun, ≠Khomani, Karretjie and Ju|huansi). RFMix was run with the following settings: 591 RFMix_v1.5.4/RunRFMix.py --forward-backward -e 2 infilename. Other settings were left as 592 default. The results were visualized using the ancestry pipeline by Alicia Martin available at 593 https://github.com/armartin/ancestry_pipeline/blob/master/plot_karyogram.py and a Python 594 plotting script making use of the plotting library Bokeh. The first 2 Mbp from each end of the 595 chromosomes were filtered to avoid telomere regions. 596 597 To scan for signals of genome-wide selection in the Afrikaner group, integrated haplotype scores 598 (iHS) and the cross population extended haplotype homozygosity (XP-EHH) were analyzed using 599 R package REHH (62). The ancestral state was identified by its presence in the chimpanzee, 600 gorilla, orangutan and human genomes (downloaded from UCSC). Based on this requirement, we 601 performed selection analyzes on 1,759,008 SNPs. iHS and XP-EHH were calculated with 602 maximum distance between two SNPs of 200,000 bp. For the XP-EHH we compared the 603 Afrikaners (AFR) haplotype homozygosity with Northwest European ancestry individuals (CEU). 604 605 Data availability 606 The anonymized genome-wide SNP data of 75 of the 77 Afrikaner individuals who consented to 607 have their data shared electronically, will be made available for research use through the 608 ArrayExpress database (https://www.ebi.ac.uk/arrayexpress), access number XX. 609 610 611 ACKNOWLEDGEMENTS 612 We thank Karen Harris, Michele Ramsay, Cesar Fortes-Lima and Maximillian Larena for their 613 useful comments and Nigel Worden for invaluable information. We thank all the participants in 614 the study. The POPRES data were obtained from dbGaP (accession no. phs000145.v1.p1). This 615 work is based upon research supported by the National Research Foundation of South Africa (grant 616 77256 to JMG) the Genomics Research Institute of the University of Pretoria (to JMG) and by the 617 Swedish Research Council (no. 621-2014-5211 to CS and 642-2013-8019 to MJ) and Knut and 618 Alice Wallenberg foundation (to MJ). JCE was supported by an NRF scarce-skills PhD 619 scholarship, a University of Pretoria study abroad bursary and bursary allocations from JMG.

620 621 REFERENCES 622 1. Elphick R & Giliomee H (1989) The Shaping of South African , 1652-1840 623 (Maskew Miller Longman (Pty) Ltd, Cape Town) 2nd Ed. 624 2. Wikipedia (2017) “”. Retrieved, from https://en.wikipedia.org/wiki/Coloureds, 625 December 12, 2017.

15

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

626 3. Elphick R & Shell R (1989) Intergroup relations: Khoikhoi,settlers, slaves ans free 627 blacks, 1652-1795. The Shaping of South African Society, 1652-1840, eds Elphick R & 628 Giliomee H (Maskew Miller Longman (Pty) Ltd, Cape Town), 2nd Ed, pp 184-239. 629 4. Guelke L (1989) Freehold farmers and frontier settlers, 1657-1780. The Shaping of South 630 African Society, 1652-1840, eds Elphick R & Giliomee H (Maskew Miller Longman 631 (Pty) Ltd, Cape Town), 2nd Ed, pp 66-108. 632 5. Ross R (1993) The “white” population of the Cape colony in the eighteenth century. 633 Beyond the Pale, ed Ross R (Wesleyan University Press), pp 125-137. 634 6. Terreblanche S (2002) A history of inequality in South Africa, 1652-2002 (University of 635 Natal Press, ). 636 7. Heese HF (2005) Groep Sonder Grense (Protea Boekhuis, Pretoria ). 637 8. Heese JA (1971) Die Herkoms van die Afrikaner, 1657–1867 (A.A. Balkema, Kaapstad). 638 9. de Bruyn GFC (1976) Die samestelling van die Afrikanervolk. [The composition of the 639 Afrikaner population]. Tydskrif vir Geesteswetenskap 15:39–42. 640 10. Greeff JM (2007) Deconstructing Jaco: genetic heritage of an Afrikaner. Ann Hum Genet 641 71(Pt 5):674-688. 642 11. de Wit E, et al. (2010) Genome-wide analysis of the structure of the South African 643 Coloured Population in the . Human Genetics 128(2):145-153. 644 12. Schlebusch CM (2010) PhD Thesis: Genetic variation in -speaking populations 645 from southern Africa. PhD (University of the Witwatersrand, Johannesburg). 646 13. Naidoo T, et al. (2010) Development of a single base extension method to resolve Y 647 chromosome haplogroups in sub-Saharan African populations. Investig Genet 1(1):6. 648 14. Schlebusch CM, de Jongh M, & Soodyall H (2011) Different contributions of ancient 649 mitochondrial and Y-chromosomal lineages in 'Karretjie people' of the Great in 650 South Africa. J Hum Genet 56(9):623-630. 651 15. Schlebusch CM & Soodyall H (2012) Extensive population structure in San, Khoe and 652 Mixed Ancestry populations from southern Africa revealed by 44 short 5-SNP 653 haplotypes. Human Biology Dec(Dec). 654 16. Schlebusch CM, et al. (2012) Genomic Variation in Seven Khoe-San Groups Reveals 655 Adaptation and Complex African History. Science 338(6105):374-379. 656 17. Schlebusch CM, Lombard M, & Soodyall H (2013) MtDNA control region variation 657 affirms diversity and deep sub-structure in populations from Southern Africa. BMC Evol 658 Biol 13(1):56. 659 18. Quintana-Murci L, et al. (2010) Strong maternal Khoisan contribution to the South 660 African coloured population: a case of gender-biased admixture. Am J Hum Genet 661 86(4):611-620. 662 19. Patterson N, et al. (2010) Genetic structure of a unique admixed population: implications 663 for medical research. Hum Mol Genet 19(3):411-419. 664 20. Petersen DC, et al. (2013) Complex patterns of genomic admixture within southern 665 Africa. PLoS Genet 9(3):e1003309. 666 21. Gouws NB (1981) Bevolkingstendense van Blankes in Suid-Afrika voor 1820. (Randse 667 Afrikaanse Universiteit, Johannesburg). 668 22. Giliomee H & Mbenga B (2007) New (Tafelberg, Cape Town). 669 23. Shell RCH (1994) Children of bondage: A social history of the slave society at the Cape 670 of Good Hope, 1652-1838 (Wesleyan University Press published by University Press of 671 New England).

16

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

672 24. Worden N (2016) slaves in Cape Town, 1695 – 1807. Journal of South 673 African Studies 42(3):389-408. 674 25. Worden N (2016) The changing nature of Indian Ocean slave trading to the Cape colony, 675 1652 – 1807. Slave Trade in the Indian Ocean and Indonesian Archipelago Worlds (16th 676 to 19th Century). New Research, Results and Comparisons International Workshop (10- 677 11 Nov 2016). 678 26. Armstrong JC & Worden NA (1989) The slaves, 1657-1834. The Shaping of South 679 African Society, 1652-1840, eds Elphick R & Giliomee H (Maskew Miller Longman 680 (Pty) Ltd, Cape Town), 2nd Ed, pp 109-183. 681 27. Elphick R & Malherbe VC (1989) The Khoisan to 1828. The Shaping of South African 682 Society, 1652-1840, eds Elphick R & Giliomee H (Maskew Miller Longman (Pty) Ltd, 683 Cape Town), 2nd Ed, pp 3-65. 684 28. Harris KL (2009) The Chinese in the early Cape colony: a significant cultural minority. 685 Suid Afrikaanse Tydskrif vir Kultuurgeskiedenis 23(2):1-18. 686 29. Armstrong JC (2012) The Chinese exiles. Cape Town between East and West: Social 687 Identities in a Dutch Colonial Town, ed Worden NA (Jacana Media (Pty) Ltd, 688 Johannesburg), pp 101-127. 689 30. Alexander DH, Novembre J, & Lange K (2009) Fast model-based estimation of ancestry 690 in unrelated individuals. Genome Res 19(9):1655-1664. 691 31. Patterson N, et al. (2012) Ancient admixture in human history. Genetics 192(3):1065- 692 1093. 693 32. Patterson N, Price AL, & Reich D (2006) Population structure and eigenanalysis. PLoS 694 Genet 2(12):e190. 695 33. Nelson MR, et al. (2008) The Population Reference Sample, POPRES: a resource for 696 population, disease, and pharmacological genetics research. Am J Hum Genet 83(3):347- 697 358. 698 34. Novembre J, et al. (2008) Genes mirror geography within Europe. Nature 456(7218):98- 699 101. 700 35. Basu A, Sarkar-Roy N, & Majumder PP (2016) Genomic reconstruction of the history of 701 extant populations of India reveals five distinct ancestral components and a complex 702 structure. Proc Natl Acad Sci U S A 113(6):1594-1599. 703 36. Kirin M, et al. (2010) Genomic runs of homozygosity record population history and 704 consanguinity. PLoS One 5(11):e13996. 705 37. Wikipedia (2017) “Krotoa”. Retrieved from https://en.wikipedia.org/wiki/Krotoa, 706 December 04, 2017. 707 38. Philpott DM (2012) Analysis of the Heritage and Genetic Diversity of Influential 708 Afrikaners. (University of Pretoria, Pretoria). 709 39. Schlebusch CM, Prins F, Lombard M, Jakobsson M, & Soodyall H (2016) The 710 disappearing San of southeastern Africa and their genetic affinities. Hum Genet 711 135(12):1365-1373. 712 40. Schlebusch CM & Jakobsson M (2018) Tales of Human Migration, Admixture, and 713 Selection in Africa. Annu Rev Genomics Hum Genet. 714 41. Schlebusch CM, et al. (2017) Southern African ancient genomes estimate modern human 715 divergence to 350,000 to 260,000 years ago. Science 358(6363):652-655. 716 42. Li S, Schlebusch CM, & Jakobsson M (2012) On the migration process during the Bantu- 717 expansion of peoples from West Africa. PhD (Uppsala University, Uppsala).

17

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

718 43. Greeff JM & Erasmus JC (2015) Three hundred years of low non-paternity in a human 719 population. Heredity (Edinb) 115(5):396-404. 720 44. Botha MC & Beighton P (1983) Inherited disorders in the Afrikaner population of 721 southern Africa. Part II. Skeletal, dermal and haematological conditions; the Afrikaners 722 of Gamkaskloof; demographic considerations. S Afr Med J 64(17):664-667. 723 45. Botha MC & Beighton P (1983) Inherited disorders in the Afrikaner population of 724 southern Africa. Part I. Historical and demographic background, cardiovascular, 725 neurological, metabolic and intestinal conditions. S Afr Med J 64(16):609-612. 726 46. Greeff JM & Erasmus JC (2013) Appel Botha Cornelitz: The abc of a three hundred year 727 old divorce case. Forensic Sci Int Genet 7(5):550-554. 728 47. Chang CC, et al. (2015) Second-generation PLINK: rising to the challenge of larger and 729 richer datasets. Gigascience 4:7. 730 48. Scheet P & Stephens M (2006) A fast and flexible statistical model for large-scale 731 population genotype data: applications to inferring missing genotypes and haplotypic 732 phase. Am J Hum Genet 78(4):629-644. 733 49. Auton A, et al. (2015) A global reference for human genetic variation. Nature 734 526(7571):68-74. 735 50. Brucato N, et al. (2016) Malagasy Genetic Ancestry Comes from an Historical Malay 736 Trading Post in Southeast Borneo. Mol Biol Evol 33(9):2396-2400. 737 51. Cox MP, et al. (2016) Small Traditional Human Communities Sustain Genomic Diversity 738 over Microgeographic Scales despite Linguistic Isolation. Mol Biol Evol 33(9):2273- 739 2284. 740 52. Kusuma P, et al. (2017) The last sea of the Indonesian archipelago: genomic 741 origins and dispersal. Eur J Hum Genet 25(8):1004-1010. 742 53. Kusuma P, et al. (2016) Contrasting Linguistic and Genetic Origins of the Asian Source 743 Populations of Malagasy. Sci Rep 6:26066. 744 54. Morseburg A, et al. (2016) Multi-layered population structure in Island Southeast Asians. 745 Eur J Hum Genet 24(11):1605-1611. 746 55. Mallick S, et al. (2016) The Simons Genome Diversity Project: 300 genomes from 142 747 diverse populations. Nature 538(7624):201-206. 748 56. Pierron D, et al. (2014) Genome-wide evidence of Austronesian-Bantu admixture and 749 cultural reversion in a hunter-gatherer group of Madagascar. Proc Natl Acad Sci U S A 750 111(3):936-941. 751 57. Jakobsson M & Rosenberg NA (2007) CLUMPP: a cluster matching and permutation 752 program for dealing with label switching and multimodality in analysis of population 753 structure. Bioinformatics 23(14):1801-1806. 754 58. Rosenberg NA (2002) Distruct: a program for the graphical display of structure results. 755 59. Price AL, et al. (2006) Principal components analysis corrects for stratification in 756 genome-wide association studies. Nat Genet 38(8):904-909. 757 60. Szpiech ZA, Jakobsson M, & Rosenberg NA (2008) ADZE: a rarefaction approach for 758 counting alleles private to combinations of populations. Bioinformatics 24(21):2498– 759 2504. 760 61. Maples BK, Gravel S, Kenny EE, & Bustamante CD (2013) RFMix: a discriminative 761 modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet 762 93(2):278-288.

18

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

763 62. Gautier M, Klassmann A, & Vitalis R (2016) rehh 2.0: a reimplementation of the R 764 package rehh to detect positive selection from haplotype structure. Mol Ecol Resour 765 17(1):78-90. 766 767

768 Figure legends: 769 Figure 1: Admixture analysis of Afrikaners and comparative data. The number of allowed 770 clusters are shown on the left, the statistical support on the right. Abbreviations: YRI-Yoruba 771 from Nigeria, ACB and ASW-African American, LWK-Luhya from Kenya, MKK-Maasai from 772 Kenya, AFR-Afrikaner from South Africa, TSI-Tuscan from Italy, IBS-Iberian from Spain, 773 GBR-British from , CEU-Northwest European ancestry from Utah, FIN-Finnish, 774 KHV-Vietnamese, CDX-Dai from China, CHS-Han from southern China, CHB-Han from 775 Beijing, JPT-Japanese, MXL-Mexican, PUR-Puerto Rican, CLM-Colombian, PEL-Peruvian. 776 777 Figure 2: Admixture proportions of the Afrikaner at K=6. (A) Magnification of the Afrikaner 778 population in the ADMIXTURE analyzes. (B) Non-European admixture fraction in the 779 Afrikaner, sorted by total non-European admixture fraction. Dotted lines indicate the mean (top 780 line) and median (bottom line) (C) Individual non-European admixture fractions sorted by total 781 non-European admixture fraction (gray line). 782 783 Figure 3: Principal component analyzes showing PC1 and PC2. (A) Full Figure, (B) shows a 784 zoom-in on the Afrikaner (dark blue) population. 785 786 Figure 4: (A) Runs of Homozygosity. Populations are colored according to regional group. (B) 787 Magnification of runs of homozygosity for European populations and the Afrikaner.

788 789

19

bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Khoe-San West/East African European South Asian East Asian Native American ancestry ancestry ancestry ancestry ancestry ancestry

K=2 100%

K=3 96%

K=4 54%

K=5 86%

K=6 98%

K=7 36%

K=8 40%

K=9 66%

K=10 36%

YRI TSI IBS FIN Xun ACB GBR CEU GIH KHV CDX CHS CHB JPT MXL PUR CLM PEL KhweNama ASW LWK MKK Karretjie Juhoansi Khomani SEBantuSWBantu AFR GuiGhanaKgal ColouredAskham ColouredColesbergColouredWellington bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B Native American South− Asian East Asian Khoe−San West/East African European Admixture fraction Admixture Non − European 0.00 0.05 0.10 0.15 0.20

Afrikaner Individuals (sorted by total non−European admixture fraction) WK TSI IBS MKK AFR GBR CEU C South-Asian Khoe−San East−Asian West/East African

● 0.10 0.15 0.10 0.15 0.10 0.15 0.10 0.15 ●

● ● Admixture Fraction Admixture Admixture Fraction Admixture Admixture Fraction Admixture Admixture Fraction Admixture ● 0.05 0.05 0.05 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ●●●●●●● ● ● ● ● ●●● ●● ●● ●●● ● ●● ●● ●●●● ●● ●●●●●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ●●●● ●●● ●●●● 0.00 0.00 0.00 0.00

0 20 40 60 80 0 20 40 60 80 0 20 40 60 80 0 20 40 60 80

Individual Individual Individual Individual bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B Ju/hoansi !Xun Khwe /Gui and //Gana Nama − 0.010 Khomani Karretjie People Coloured (Askham) Coloured (Colesberg) Coloured (Wellington)

− 0.015 SE Bantu−speakers SW Bantu−speakers Yoruba (YRI) African Caribbean (ACB) African American (ASW)

− 0.020 Luhya (LWK) Maasai (MKK) PC2: 1.918% PC2: 1.918% Tuscan (TSI) Iberian (IBS) Great Britain (GBR) Northwestern European (CEU) − 0.025 Finnish (FIN)

− 0.02 0.00 0.02 0.04 Gujarati Indian (GIH) Vietnamese (KHV) Dai Chinese (CDX) Southern Han Chinese (CHS)

− 0.030 Beijing Han Chinese (CHB) −0.02 −0.01 0.00 0.01 0.02 0.03 0.04 0.05 −0.016 −0.014 −0.012 −0.010 −0.008 −0.006 Japanese (JPT) Mexicans (MXL) PC1: 3.586% PC1: 3.586% Puerto Rican (PUR) Colombians (CLM) Peruvians (PEL) AfikanerAfrikaners (AFR) (AFR) bioRxiv preprint doi: https://doi.org/10.1101/542761; this version posted February 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B

Juhoansi ColouredWellington TSI CHB AFR IBS CEU Xun SEBantu IBS JPT TSI GBR FIN Khwe SWBantu GBR MXL GuiGhanaKgal YRI CEU PUR 150 Nama ACB FIN CLM Khomani ASW GIH PEL Karretjie LWK KHV ColouredAskham MKK CDX 80 100 ColouredColesberg AFR CHS 60 40 50 100 mean total RoH length (Mb) mean total RoH length (Mb) 20 0 0

0.5−1 1−2 2−4 4−8 8−16 >16 0.5−1 1−2 2−4 4−8 8−16 >16 RoH length category (Mb) RoH length category (Mb)