Language, Culture and Genes in Bantu: a Multidisciplinary Approach of the Bantu-speaking Populations of Africa (OMLL – 01-JA27: 01-B07/01-S08/01-V01)

The Case of

Lolke J. Van der Veen (Project Leader), Jaume Bertranpetit (Principal Investigator), David Comas, Lluis Quintana-Murci, Mark Stoneking (Principal Investigator)

Introduction

This interdisciplinary EUROCORES OMLL project started at the beginning of 2002 as an extended version of a similar project on and genes. The latter officially commenced, under Lolke Van der Veen’s supervision, in July 2000 as part of the “Origine de l’Homme, du Langage et des Langues” (OHLL) programme launched and funded by the French CNRS¹. The present Collaborative Research Project (CRP) consists of one main project submitted by the research laboratory “Dynamique du Langage” (UMR 5596, Lyon, France) and two joint projects submitted by research teams from Germany and Spain respectively (see below). This detailed outline highlights the project’s goals and specific features and summarises the state of the art.

The LCGB project: objectives and approach

The ultimate goal of the “LCGB” project is to elaborate a solidly based multidisciplinary theory of the origin and expansion of Bantu and the Bantu-speaking populations, on the basis of correlations between linguistic, biological and anthropological markers. Do historical linguistics, population genetics, cultural anthropology and archaeology all tell the same story? Combining linguistics and population genetics is still rather controversial, but applying this enlarged multidisciplinary approach to the study of the Bantu expansion may contribute positively to the debate and offers promising perspectives for a new synthesis in this field of investigation. As a first step, existing solid, linguistically based phylogenetic classifications are to be compared with biologically based classifications, which are in the process of being constructed. To this end, at the start of the project, very well-defined linguistically based diachronic inferences were submitted for close examination to the geneticists who had agreed to collaborate with our team of linguists. The OMLL programme has allowed the development of collaborations with geneticists from France (a team directed by Dr. Lluis Quintana-Murci, Institut Pasteur, Paris, CNRS URA 1961), from Spain (a team directed by Prof. Dr. Jaume Bertranpetit and Dr. David Comas, Unitat de Biologia Evolutiva, Facultat de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, Barcelona) and from Germany (a team directed by Prof. Dr. Mark Stoneking, Max Planck Institute, Leipzig), but also from Gabon (Dr. Lucas Sica, Centre International de Recherches Médicales Franceville) and the USA (Dr. Sarah Tishkoff, University of Maryland). The French team (directed by Prof. Dr. Lolke Van der Veen) and the Spanish and German teams have been granted funding by their respective national funding agencies. Without this innovative interdisciplinary research

1 The main lines of this OHLL project were presented at the 32nd Annual Conference on African Languages held in Berkeley, March 2001. Van der Veen & Hombert (forthcoming).

OMLL (Eurocores, ESF) — Language, Culture and Genes in Bantu: a Multidisciplinary Approach of the Bantu-speaking populations of Africa

programme, it would have been impossible to develop such a variety of extensive collaborations and to create networking activities between thematically related projects. The combination of the different sources of information we are working with will most certainly reveal both similarities and discrepancies between the disciplinary approaches. Both outcomes (presence or absence of correlations) will be equally interesting. The results of this study will subsequently be compared with the results obtained so far (or still to be obtained) in other fields of research such as cultural anthropology, archaeology and history. In a wider perspective, the outcome of the “LCGB” project will allow a comparison with the results of similar OMLL projects on Berber (project leader: Prof. J.-M. Dugoujon), Eurasian (project leader: Dr. A. Sajantila), Himalayan (project leader: Dr. P. de Knijff) and Island Melanesia (project leader: Prof. S. C. Levinson), also presented in the present volume.

Main specific features of LCGB

Over the last few decades, population genetic studies of the African continent have mainly tried to establish parallels between genetic and linguistic classifications at a fairly general and thus (historically) deep level. These studies have focussed on the four major linguistic phyla of the continent (i.e. NIGER-CONGO, NILO-SAHARAN, AFRO-ASIATIC and KHOI-SAN²). Several of them have made strong claims about African prehistory on the basis of specific genetic markers. Both traditional and molecular markers reveal correlations between these major linguistic phyla and genetic data (cf. Cavalli-Sforza & al. 1994; Excoffier & al. 1987 and 1991; Hammer 1994; Soodyall & al. 1993 and 1996; Poloni & al. 1997; Watson & al. 1996 and 1997; Melton & al. 1997; Stoneking & al. 1997; Scozzari & al. 1994 and 1999; Spedini 1999; Salas & al. 2002, etc.). At the start of the “LCGB” project, Bantu had mainly been studied as a whole. NIGER-CONGO populations in general and Bantu populations in particular were shown to have the lowest level of internal genetic diversity. This is of course compatible with the hypothesis of a “recent” (i.e. 5 000 BP) and rapid (although gradual) expansion of these populations (Excoffier & al. 1987 and 1991 (traditional markers); Poloni & al. 1997 (Y-chromosome haplotypes); Scozzari & al. 1999 (biallelic and microsatellite Y-chromosome polymorphisms); Spedini & al. 1999 (suggesting an expansion through minor migration movements)). However, virtually nothing was known about the internal relationships between Bantu-speaking populations. The criteria used for sampling individuals were mostly imprecise. All available studies suffered from a lack of accuracy in linguistic labelling (undifferentiated lumping, errors related to narrow linguistic classification, etc.) as well as from a lack of representativity (e.g. central African Bantu is still very badly underrepresented nowadays, cf. Salas & al. 2002³). Carefully checked ethnolinguistic data were rarely taken into account, and geneticists usually clung to linguistic classifications that were more or less outdated (e.g. Greenberg’s 1963 classification). There clearly was a pressing need for a more accurate and more rigorous approach, and for close collaboration between geneticists and linguists. The “LCGB” project aims at developing such an approach, in particular for the study of phenomena with a lesser time depth, by working out rigorous criteria for sampling and analysis, by submitting up-to-date linguistic classifications and hypotheses⁴, and by

2 For a detailed up-to-date presentation of these phyla and the languages they comprise, see Williamson & Blench (2000).
3 For this study, data from only three central African Bantu languages were available, and only two Pygmy languages.
4 Considerable progress has been made in the field of African linguistics over the last three decades. These advances cannot be ignored.

drawing benefit from a large-scale multidisciplinary approach that also includes archaeologists, cultural anthropologists and historians (see also below). It should be underlined here that the Bantu languages (a group of about 600 languages and language varieties, forming part of the much larger NIGER-CONGO phylum and covering about one third of the African continent), along with the SINO-TIBETAN languages, are among the best-studied languages of the world, just after the INDO-EUROPEAN languages. Important aspects of the proto-language (i.e. “Proto-Bantu”, a language variety or spoken most probably about 5 000 BP in the borderland between present-day Nigeria and Cameroon) have been reconstructed (Meeussen 1967 and 1969; Guthrie 1967-71) and solid historical inferences have been made concerning the origin and expansion of Bantu. Although more or less competing theories exist as far as the role of the forest area in the expansion process is concerned (gradual penetration into the forest, following waterways and/or mountain ridges, or northern “migration” bypassing the forest⁵, or both⁶), and also concerning some secondary convergence and/or expansion zones, a general consensus exists among scholars about the main lines of the expansion as well as several much more local and detailed phenomena. (See map 1 below for an up-to-date overview.) The Bantu expansion probably coincided with the end of the Neolithic Age and was, at least at some stage, related to the diffusion of agriculture and iron metallurgy (Phillipson 1985, De Maret 1982, Nsuka-Nkutsi 1989, Clist 1995, Oslisly 1996, Holden 2002, Diamond & Bellwood 2003). Environmental, demographic, social and economic factors must have played an important role in this gradual and wave-like expansion.
For the present project, three geographically strategic zones were chosen in order to test the linguistically based hypotheses: the Gabon area (i.e. the enlarged “Ogooué-Ivindo-Ngounié”⁷ area), the Kenya-Tanzania area and the Angola-Namibia area. These three areas are known to have played an important role in the Bantu expansion and have also undergone, to a certain extent, convergence phenomena (linguistic homoplasy). The present chapter mainly concerns the first area, for reasons that will become clear below.

Fieldwork, a crucial aspect of this kind of research, is presently ongoing. Sample collection (7 ml of blood per individual) started in 2002 in the Gabon area, for two reasons. Firstly, extensive linguistic fieldwork has been conducted over the last 20 years by our (DDL) research team, so that our knowledge of the languages of this area, of the area itself and of its inhabitants is now solidly founded. Secondly, on the basis of detailed linguistic studies, we now know that the central part of Gabon is a place where different “migration” paths have met and led to convergence phenomena (linguistic homoplasy). Several languages show clear signs of admixture. Some of the ethnic groups speaking these “mixed” languages have been retained for sampling and testing, e.g. the Viya and Makina (Shiwa) groups (see below). Studies in comparative linguistics based on phonological, morphological and lexical markers have allowed us to elaborate a fairly accurate picture of the groupings as well as of the main expansion patterns. (See maps 2 & 3 below.) The project also benefits from existing close scientific collaborations with several Gabonese institutions such as the “Laboratoire Universitaire de la Tradition Orale” (LUTO) and the “Laboratoire d’Anthropologie” (LABAN) of the Omar Bongo University of Libreville.

5 Cf. Coupez & al. (1975) and also Phillipson (1985).
6 Scholars nowadays increasingly agree about the role played by the rain forest in the expansion process(es). Cf. Nurse & Philippson (2003:164-167); also Vansina (1990 and 1995).
7 The three major waterways of the Gabon area.


Although French is the official language and exerts increasing pressure on the local languages, some 50 Bantu language varieties are still spoken in Gabon nowadays. These varieties can be grouped into a dozen clusters (A30 (BUBE-BENGA), A70 (YAUNDE-FANG), A80 (MAKAA-NTEM), B10 (MYENE), B20 (KELE), B30 (TSOGO), B40 (SHIRA), B50 (NJABI), B60 (MBAAMA), B70 (TEKE) and H10 (KONGO)⁸) and belong to three different, though adjacent, geolinguistic zones (A, B and H). The Bantu languages of Gabon all belong to what has recently been called “Forest Bantu”, which is a part of “Western Bantu”, as opposed to “Westcentral Bantu” and “Eastern Bantu” (cf. Nurse & Philippson 2003, Bastin & al. 1999). For Bastin & al. (1999), zone A languages together with B10-B20-B30 constitute what they call “Northwestern Bantu”, as opposed to the rest of the western Bantu languages. This implies the presence of a relatively important linguistic “barrier” within Gabon (see map 2). Apart from the Bakao Pygmies, who live in the surroundings of Minvoul and speak an Ubangian (i.e. non-Bantu) language, all Pygmy groups have adopted one or several of the languages spoken by the Bantu populations in whose vicinity they live. Their lifestyle has become semi-nomadic in recent times. On the basis of shared phonological, morphological, morphosyntactic and lexical innovations, one can establish at least four higher-level Bantu clusters: B10+B30 (MYENE-TSOGO), B40+H10 (SHIRA-Vili, the latter being part of KONGO), B50+B60+B70 (NJABI-MBAAMA-TEKE), and the dialect cluster A75 (Fang). For a detailed discussion of the status and complex history of the MYENE-TSOGO group, see Mouguiama-Daouda & Van der Veen (forthcoming). This cluster is probably an ancient branch of Western Bantu, but it also presents signs of homoplasy, probably due to a prolonged period of contact that occurred some time after the initial separation between the two groups.
The cluster (and B30 in particular) furthermore shows some (ancient) affinities with languages belonging to zone A. The B20 group clearly appears as a floating group in most recent linguistic classifications (cf. Bastin & al. 1999). It may cluster with the MYENE-TSOGO group, but also with other groups.
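The group labels used above follow Guthrie's reference system: a zone letter, a tens digit for the group and a units digit for the individual language. The sketch below is only an illustration of how such codes can be decomposed and mapped onto the four higher-level clusters proposed in the text. The function and dictionary names are assumptions made for this example, only plain two-digit codes are handled, and H10 is mapped as a whole although the text restricts the SHIRA-Vili cluster to the Vili part of KONGO.

```python
import re

# Higher-level clusters as proposed in the text. The dictionary layout is an
# illustrative assumption; H10 is mapped as a whole for simplicity.
HIGHER_LEVEL_CLUSTERS = {
    "MYENE-TSOGO": {"B10", "B30"},
    "SHIRA-Vili": {"B40", "H10"},
    "NJABI-MBAAMA-TEKE": {"B50", "B60", "B70"},
}


def parse_guthrie(code):
    """Split a two-digit Guthrie code such as 'A75' into (zone, group, language)."""
    match = re.fullmatch(r"([A-Z])(\d)(\d)", code)
    if match is None:
        raise ValueError(f"expected a zone letter plus two digits, got {code!r}")
    zone, tens, units = match.groups()
    group = f"{zone}{tens}0"
    # A units digit of 0 denotes the group as a whole, not a single language.
    language = code if units != "0" else None
    return zone, group, language


def higher_level_cluster(code):
    """Return the higher-level cluster a code falls into, or None."""
    _, group, language = parse_guthrie(code)
    if language == "A75":  # the Fang dialect cluster stands on its own
        return "A75 (Fang)"
    for name, groups in HIGHER_LEVEL_CLUSTERS.items():
        if group in groups:
            return name
    return None
```

For instance, `higher_level_cluster("B41")` yields `"SHIRA-Vili"`, while the "floating" B20 group yields `None`, since the text leaves its affiliation open.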

One can presently assume with a fair degree of certainty that several of the linguistic groups of Gabon must have arrived from more northern regions (the MYENE-TSOGO group and, much later in history, the Fang A75), others from regions located to the east (the NJABI-MBAAMA-TEKE group) and still others from regions located to the south (the SHIRA-Vili group) (cf. map 3 below for major expansion patterns and relative chronology). The well-defined hypotheses submitted to the geneticists are founded on these data and have determined the choice of the populations to be retained for sampling.

In addition to the multidisciplinary character of our project and the well-established working hypotheses it is based on, one other important and rather unique feature of our project to be stressed here is the rigour of the criteria used for sampling. The choice of the populations to be sampled is based on very precise linguistic criteria and is not made randomly. Only male individuals are sampled (in order to obtain information from both the mitochondrial DNA and the Y-chromosome variation), aged 35 or older in order to reduce the influence of (recent) admixture, with at least 50 individuals per population and, if possible, at least 2 populations for each ethnolinguistic cluster. Both parents of the sampled individuals should belong to the same ethnic group. Furthermore, the declared (ethnic) origin of the individuals is systematically verified by means of an ethnolinguistic and

8 Malcolm Guthrie’s reference system (Guthrie 1948). Guthrie divided the Bantu territory up into 15 geolinguistic zones (A, B, C, etc.). Tens refer to groups and units to specific languages.

anthropological questionnaire designed to collect details concerning place of birth, lineage(s), ethnic origin of both parents, grandparents, etc., and language(s) spoken over at least four generations (see photograph, last page of this paper). A team of Gabonese anthropologists (LUTO / LABAN) will extend our research with an extensive study of the possible impact of kinship systems, mating patterns, matrilineal or patrilineal descent, patrilocality, polygamy, the role of local slavery in history, as well as the role of social stratification, cultural and political organisation, economy, trade and technology within the different ethnic groups. Blood sampling is performed only with the individual’s consent, exclusively in collaboration with the local Health Services, and of course in accordance with the research permits granted by the Gabonese government. Blood samples are never taken from blood banks. This would of course be much simpler from a practical point of view, but would make the verification of linguistic and ethnic origin extremely difficult, if not impossible.
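The sampling criteria described above can be summarised as a simple eligibility check. The following is a minimal sketch under stated assumptions: the record layout and all names (`Candidate`, `is_eligible`) are hypothetical, and the requirement that both parents belong to the same ethnic group is interpreted here as both parents sharing the individual's declared group. It does not represent the questionnaire or the database actually used by the project.

```python
from dataclasses import dataclass

MIN_AGE = 35                 # minimum age stated in the text
TARGET_PER_POPULATION = 50   # minimum number of individuals aimed at per population


@dataclass
class Candidate:
    """Hypothetical record built from the ethnolinguistic questionnaire."""
    sex: str            # "M" or "F"
    age: int
    ethnic_group: str   # declared ethnic group of the individual
    father_group: str
    mother_group: str
    consent: bool


def is_eligible(c: Candidate) -> bool:
    """Apply the stated criteria: consenting male, 35 or older, and
    (assumption) both parents from the individual's declared group."""
    return (
        c.consent
        and c.sex == "M"
        and c.age >= MIN_AGE
        and c.father_group == c.ethnic_group
        and c.mother_group == c.ethnic_group
    )


def target_reached(candidates) -> bool:
    """True when a population has enough eligible individuals."""
    return sum(is_eligible(c) for c in candidates) >= TARGET_PER_POPULATION
```

In practice the verification also covers grandparents and the languages spoken over several generations, which this sketch leaves out.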

Procedures for data collection

Sample and ethnolinguistic data collection is conducted in the field, not in Europe, in close collaboration with Gabonese researchers working in the fields of linguistics, population genetics and cultural anthropology. This work is performed by several teams working simultaneously in different parts of the country, all using the same procedure, during several periods of the year. This necessarily requires very well-planned coordination and the use of efficient communication systems. Each team is composed of linguists and cultural anthropologists. On arrival in the villages, the local authorities (both civil and moral) are informed. With their consent, an information campaign is organised in the presence of the elders of the village. Individuals are sampled in accordance with the rigorous criteria described in the previous section. During blood sampling sessions, usually organised in health centres, consenting individuals fill out a form and several questionnaires, and a small quantity of blood (7 ml) is taken. Each individual receives in return a set of essential pharmaceutical products as well as some refreshment. The blood samples are kept at a temperature of +4 °C and sent by car, then by plane, within 48 hours to the “Centre International de Recherches Médicales Franceville” (CIRMF) in Franceville, in the southeast of Gabon, where DNA extraction is performed. Blood samples as such may under no circumstances be exported from Gabon. A small quantity of blood is used by the CIRMF (Dr. Lucas Sica) for the study of drepanocytosis (sickle-cell disease), a blood disease affecting a considerable number of Gabonese individuals. Purified DNA samples (of excellent quality) are sent on a regular basis by the CIRMF to the Institut Pasteur in Paris for preservation, sequencing, typing and analysis. The analysis of the mitochondrial DNA (mtDNA) and the Y-chromosome variation is being performed in European laboratories (see below).
Ethnolinguistic and anthropological data are processed in Lyon and recorded in a large database for further analysis.


State of the art

The Gabon area

Extensive fieldwork in the Gabon area has made it possible to collect over 950 blood samples (representing almost exclusively male individuals, most of them at least 35 years of age) in 14 different places spread all over the country⁹. The last of a total of four field missions¹⁰ was organised in February 2004. We now have access to genetic data of excellent quality from 21 of the 50 Gabonese populations¹¹, whereas the initial objective was to sample only 6 populations¹².
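The figures quoted above can be cross-checked against the per-population counts listed in footnote 12. The short sketch below does only that: it is an arithmetic check on the published numbers, with variable and function names chosen for illustration.

```python
# Number of individuals sampled per population, as listed in footnote 12.
SAMPLED = {
    "Benga": 53, "Fang": 70, "Makina-Shiwa": 50, "Bekwil": 5, "Galwa": 51,
    "Rungu": 42, "Kele": 50, "Kota": 59, "Mbangwe": 6, "Shake": 52,
    "Tsogo": 66, "Viya": 38, "Kande": 8, "Shira": 53, "Punu": 52,
    "Nzebi (Njabi)": 63, "Duma": 49, "Obamba (Mbaama)": 54, "Ndumu": 44,
    "Teke": 56, "Bakao Pygmies": 39,
}
TARGET = 50  # minimum number of individuals aimed at per population


def summarise(sampled, target=TARGET):
    """Return (number of populations, total samples, populations at target)."""
    reached = sorted(name for name, n in sampled.items() if n >= target)
    return len(sampled), sum(sampled.values()), reached


n_populations, total, reached = summarise(SAMPLED)
print(n_populations, total, len(reached))
```

The listed counts cover 21 populations and add up to 960 individuals, consistent with the "over 950" samples reported, and 13 of the 21 populations reach the target of 50.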

Molecular genetics has shown over the last decades that our genome reflects not only the processes intrinsic to its molecular nature, such as recombination and mutation, but also the demographic processes that have shaped its composition, such as migrations, admixture, population expansions and founder effects. This is of course crucial for the kind of research we intend to perform. The molecular part of the present project focuses on the analysis of two genomic regions, the mitochondrial DNA (mtDNA) and the Y-chromosome, whose particular properties make them suitable for capturing the footprints of demographic events in our genome. MtDNA analysis is in progress at the “Institut Pasteur” in Paris under the supervision of Dr. Lluis Quintana-Murci, and the Y-chromosome analysis (joint project), which started in Barcelona in November 2003, is directed by Prof. Jaume Bertranpetit (Principal Investigator) and conducted by Dr. David Comas (Executive Researcher). These uniparentally inherited genetic markers will be used to reconstruct the history of several populations in Central Africa and, in a broader context, of the African continent. Given that the mtDNA and the Y-chromosome do not undergo recombination, a clear phylogeny of both molecules has been established. Moreover, the lineages within these phylogenies are not distributed randomly across human populations: the clear geographic structure of the uniparentally inherited lineages allows us to reconstruct the phylogeography of the individuals analysed.

Y-chromosome analysis (team directed by Jaume Bertranpetit)

The Y-chromosome determines maleness in our species through the presence of a single gene, SRY (sex-determining region Y). Zygotes with an X-chromosome and a Y-chromosome yield a male embryo, whereas zygotes with two X-chromosomes form a female one. Males therefore inherit their Y-chromosome from their fathers, who received it from the paternal grandfathers, and so on. The lack of recombination along the major part of the Y-chromosome allows us to trace back the history of the chromosome.

9 These places are Booué, Cap Estérias, Fougamou, Franceville, Lambaréné, Libreville, La Lopé, Lastoursville, Malibé, Minvoul, Mouila, Ntoum, Port-Gentil and Sindara.
10 Field missions were conducted in 2002 (one mission), 2003 (two missions) and 2004 (one mission).
11 As indicated above, the target was a minimum of 50 male individuals per population. For most of the 21 ethnic groups this target has been achieved.
12 List of populations for which both linguistic and genetic data are available (with the number of individuals sampled): Benga (53), Fang (70), Makina-Shiwa (50), Bekwil (5), Galwa (51), Rungu (42), Kele (50), Kota (59), Mbangwe (6), Shake (52), Tsogo (66), Viya (38), Kande (8), Shira (53), Punu (52), Nzebi (Njabi) (63), Duma (49), Obamba (Mbaama) (54), Ndumu (44), Teke (56), Bakao Pygmies (39). The latter speak an Ubangian language (i.e. non-Bantu). (Populations for which the target has been achieved completely have been underlined.)


To date, a large number of genetic markers have been analysed to define Y-chromosome diversity in humans. There are some polymorphic insertions in the chromosome, such as certain Alu elements, and a large number of tandem repeats, such as the STRs (short tandem repeats). Nonetheless, the most promising markers of recent years are SNPs (single nucleotide polymorphisms), which consist of a base substitution in which one base is exchanged for another. The present Y-chromosome study of the expansion of Bantu languages and populations will focus on a total of 18 STRs and around 50 SNPs, to be typed in the same sample set as that used for the mtDNA. The information yielded by these markers will allow us to classify the samples into haplogroups within the Y-chromosome phylogeny. The mutation rates of the two classes of markers are substantially different: STRs have a mutation rate of around 10⁻³ to 10⁻⁴ per locus per generation, whereas SNPs have a much lower mutation rate of around 10⁻⁸. This difference has profound implications for the Y-chromosome analysis. Owing to their low mutation rate, SNPs are unique events that define the major lineages of the Y-chromosome. The STRs, on the other hand, are highly variable and allow us to characterise the diversity found within the major lineages defined by the SNPs. The combination of both kinds of markers will thus provide a fine resolution of Y-chromosome diversity. The STRs are analysed by amplifying the regions where the markers lie and then determining the number of repeats of each STR by capillary electrophoresis. The 18 STRs are determined in three multiplex reactions, each of which resolves the genotypes of six STRs. The SNPs are genotyped in a single reaction, a SNPlex reaction, in which the allelic state of around 50 SNPs is determined.
After the genotyping of both kinds of markers, the Y-chromosomes for each individual will be classified into haplogroups, groups of lineages that have a common origin and are defined by their SNP composition. After haplogroup classification, the diversity of each haplogroup will be assessed by the results given by the STRs. Since the phylogeography of the Y-chromosome is available, we will be able to trace back the origin of the Y-lineages found and compare the results to those found for the mtDNA. This will allow us to reconstruct demographic events in our population set.
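The two-step logic described above (SNPs assign each chromosome to a haplogroup, STRs measure the diversity within each haplogroup) can be sketched as follows. This is an illustration only: the SNP names and haplogroup labels are hypothetical placeholders, and Nei's unbiased haplotype diversity is used as one common diversity measure, since the text does not specify which statistic the team applies.

```python
from collections import Counter, defaultdict

# Hypothetical SNP names and haplogroup labels, ordered from the most derived
# branch to the most basal one. The real panel comprises around 50 SNPs.
HAPLOGROUP_DEFINING_SNPS = [
    ("HgC", "snp3"),
    ("HgB", "snp2"),
    ("HgA", "snp1"),
]


def assign_haplogroup(derived_snps):
    """Return the most derived haplogroup whose defining SNP is present."""
    for haplogroup, snp in HAPLOGROUP_DEFINING_SNPS:
        if snp in derived_snps:
            return haplogroup
    return "unclassified"


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i squared)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def diversity_by_haplogroup(samples):
    """samples: iterable of (set of derived SNPs, tuple of STR repeat counts)."""
    grouped = defaultdict(list)
    for derived_snps, str_haplotype in samples:
        grouped[assign_haplogroup(derived_snps)].append(str_haplotype)
    # Diversity is undefined for a haplogroup observed only once.
    return {hg: haplotype_diversity(h) for hg, h in grouped.items() if len(h) > 1}
```

A haplogroup in which all chromosomes share the same STR haplotype gets a diversity of 0, while one in which every chromosome differs gets a diversity of 1.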

MtDNA analysis (team directed by Lluis Quintana-Murci)

Towards the end of the 1980s, the publication of the first paper on the implications of mtDNA variation for understanding the origin of modern humans (Cann & al. 1987) opened a new era in the study of human genetic variation and evolution. Since mtDNA polymorphisms were the first DNA markers used for evolutionary purposes, the last decade has been characterised by a large number of studies on mtDNA variation, which have offered new perspectives on the origin of modern humans. Because of the strict maternal inheritance of this molecule, the geographic distribution of neutral mtDNA mutations reflects the prehistory of women, in a manner analogous to that of the Y-chromosome for men. Since mtDNA mutations accumulate sequentially along radiating female lineages, individual mtDNA lineages diverged as women migrated into the different parts of the world. To define mtDNA variation in our population set, we follow two parallel approaches: sequencing of the HVS-I region and genotyping of a number of coding-region SNPs in order to identify haplogroups properly. The two sets of data differ in mutation rate: SNPs in the coding region are more stable and rarely present recurrent mutations. Conversely, HVS-I sequence variation is often affected by recurrent mutation (homoplasy), since this region has a higher mutation rate. Thus, coding-region SNPs define the main lineages of the phylogenetic tree, and HVS-I sequence variation is very useful for assessing the internal variation of

the haplogroups (defined by the coding-region mutations). The internal haplogroup variation can be translated into expansion times of the main lineages. All samples are sequenced for the HVS-I region of the D-loop from nucleotide position (np) 16023 to np 16368. As to the coding region, all samples are typed using Victor technology, based on fluorescence polarisation, for the SNP T3594C, which separates haplogroups L1-L2 from L3, for T10810C, defining haplogroup L1, and for the RFLP +16390HaeIII, defining haplogroup L2. In addition, we have used a hierarchical approach to subdivide L2 and L3 samples into their internal derivative lineages. All L2 samples have been typed for +13803/+13958 HaeII (defining L2a), +4157AluI (L2b), -13803/-13958HaeII (L2c) and -3693MboI (L2d). L3 samples have been typed for +10086TaqI (L3b), -8618MboI (L3d) and +2352 (L3e). The samples not belonging to these three sub-haplogroups internal to L3 have been assigned to L3f, L3g or L3* according to their HVS-I motif, as in Salas & al. (2002). Samples assigned to L3 that did not belong to any of the African-specific branches of L3 were also tested for +10871MnlI (N) and +10397AluI (M).
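The hierarchical typing scheme described above amounts to a small decision tree. The sketch below encodes it under explicit assumptions: marker labels are copied as printed in the text, a label present in `observed` means that the state reported as defining the lineage was found, and the outcome of the T3594C typing is passed in as a ready-made call (`is_l3`) because the text does not state which allele corresponds to which side of the split. The function names are hypothetical, and the final steps (HVS-I motif assignment, N and M tests) are left as an unresolved category.

```python
# Marker labels as printed in the text, tested in the order in which the
# text lists them.
L2_SUBCLADES = [
    ("L2a", "+13803/+13958 HaeII"),
    ("L2b", "+4157AluI"),
    ("L2c", "-13803/-13958HaeII"),
    ("L2d", "-3693MboI"),
]
L3_SUBCLADES = [
    ("L3b", "+10086TaqI"),
    ("L3d", "-8618MboI"),
    ("L3e", "+2352"),
]


def first_match(subclades, observed):
    """Return the first sub-haplogroup whose defining marker was observed."""
    for name, marker in subclades:
        if marker in observed:
            return name
    return None


def assign_mtdna_haplogroup(is_l3, observed):
    """Assign a sample to a haplogroup following the hierarchy in the text.

    is_l3    -- call from the T3594C typing (True: L3 side, False: L1-L2 side)
    observed -- set of marker labels whose lineage-defining state was found
    """
    if not is_l3:
        if "T10810C" in observed:
            return "L1"
        if "+16390HaeIII" in observed:
            return first_match(L2_SUBCLADES, observed) or "L2 (not subdivided)"
        return "L1-L2 (unresolved)"
    # L3 side: sub-haplogroups first; the remainder is resolved from the
    # HVS-I motif (L3f, L3g, L3*) and the N and M tests, not modelled here.
    return first_match(L3_SUBCLADES, observed) or "L3 (unresolved)"
```

Keeping the unresolved categories explicit reflects the fact that coding-region markers alone do not settle every assignment in the procedure described.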

Statistical analyses

Methods of phylogeny reconstruction, such as median-joining networks, will be employed to establish and refine the topology of the phylogeny of both molecules, and the time depths of the main lineages will be estimated. The apportionment of genetic variation between and within populations will be estimated by means of the analysis of molecular variance (AMOVA). Moreover, a spatial analysis of molecular variance (SAMOVA) will be performed by presetting different numbers of population groups. This approach defines groups of populations that are geographically homogeneous and that maximise the proportion of total genetic variance due to differences between groups. In order to establish the genetic relationships between the populations analysed, a principal component analysis based on haplogroup frequencies will be performed. Other descriptive statistics, such as Tajima’s D and Fu’s Fs neutrality tests, will be computed, when relevant, to infer the demographic past (expanding versus stationary size) of the different populations.
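Of the statistics mentioned above, Tajima's D is compact enough to illustrate directly. The sketch below implements the standard formula (Tajima 1989) for a set of aligned sequences of equal length. It is meant as an illustration of the statistic, not as a description of the software used by the project, and it ignores practical issues such as gaps and missing data.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned sequences; returns None without polymorphism."""
    n = len(sequences)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants depending only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

An excess of rare variants gives a negative D, which is commonly read as a possible sign of population expansion, whereas values close to zero are expected under a stationary population size. Significance has to be assessed separately, for instance by simulation.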

Sex-biased population history?

As stated above, mtDNA and the Y-chromosome are unique in their lack of recombination at meiosis and in being inherited through only one sex. The study of both molecules in a given population will therefore give a complete panorama of its maternal and paternal genetic history. However, there are a number of examples where the two sets of data give distinct information, and this has been interpreted as reflecting differences in female and male social behaviour throughout history, such as polygamy, patrilocal exogamy, etc. The historical fact that women, at marriage, relocate(d) in most cases to join the male partner makes them genetically more mobile than men. Through their movements, women have contributed to masking the original differences between populations and have thus “homogenised” their gene pool. Y-chromosome variants, on the other hand, turn out to be more geographically clustered than mtDNA variants as a consequence of lower historical male mobility. Since men have been less mobile, their gene pool has helped to preserve these differences, which are still recognisable in Y-chromosome studies. This suggests that women’s movements were very important in the past, leading to the conclusion that females have had a much higher migration rate than males (Seielstad & al. 1998). Nevertheless, although patrilocality must explain the differences between the Y-chromosome and mtDNA patterns, it operates on a local scale and is not thought to be responsible for the continental and worldwide patterns of diversity. Some details are thus still unclear and will certainly be the object of further studies. In this context, a careful comparison in our population

dataset of mtDNA and Y-chromosome variation will help to examine the relative contribution of males and females in shaping the patterns of diversity of central Africa and, more generally, will shed some light on the proportion and extent to which males and females were involved in the Bantu expansions from (western) central Africa southwards and eastwards.

Preliminary results

The results of the study of mtDNA variation obtained thus far make it possible to begin addressing several issues seriously: the origin of the prevailing haplotypes and the role of these genetic lineages in the Bantu expansion; the confirmation or invalidation of the linguistically based groupings¹³, i.e. a rather solid “northern” cluster (B10+B30, MYENE+TSOGO, probably with at least a part of B20, KELE) and two “southern” clusters¹⁴ (B40+part of H10 and B50-B60-B70), which are certainly related to each other from a comparative point of view, but more loosely than B10 and B30; the impact of genetic admixture (convergence); the status of several specific endangered minority groups (e.g. the Viya and the Makina/Shiwa populations) in the process of being assimilated by surrounding populations; and also the origin of the Fang population. The Viya, for instance, speak a language that basically belongs to the TSOGO (B30) group, but its lexicon and phonology have undergone fairly strong influence from a language (B41) spoken by the neighbouring Shira population. As mentioned before, the TSOGO group in turn clusters with the MYENE group (B10) on the basis of a variety of structural traits. Whether this clustering (also claimed by local oral tradition) is due to shared heritage or to convergence through prolonged contact is still a matter of debate, but see Mouguiama-Daouda & Van der Veen (forthcoming) for new perspectives. The preliminary results of the analysis of mtDNA and Y-chromosome variation seem to confirm Viya’s affiliation with TSOGO, as well as the MYENE+TSOGO cluster. The Makina people speak Shiwa (A83), but their language and cultural tradition are nowadays yielding to the fast-growing influence of the Fang. The latter, whose language (A75) belongs to the A70 (i.e. YAUNDE-FANG) cluster, claim an Egyptian (or Sudanic) origin. (This rather surprising origin has been claimed by a considerable number of members of this community since the publication of Rev. Trilles’ theory on this issue at the beginning of the 20th century.) The first results of the mtDNA analysis support the position of those who, on the basis of linguistic and more general methodological considerations, reject this theory. However, Y-chromosome data as well as a more detailed study based on cautious interpretation are needed to clarify more substantially the migration/expansion history of this population¹⁵. Whatever the final outcome of this issue may be, from a linguistic point of view there is absolutely no doubt that Fang is a Bantu language. This dialect cluster clearly displays the defining characteristics of the YAUNDE-FANG group. Last but certainly not least, this ongoing study also sheds important new light on the history of the slave trade and the origin of African-Americans. An accurate account of the results of the mtDNA analysis of these individuals and populations performed by Dr. Lluis Quintana-Murci and his collaborators is forthcoming. The preliminary results of the Y-chromosome analysis based on the same samples were presented

[13] Taking into account the most up-to-date linguistic studies (published or unpublished).
[14] The “northern” vs. “southern” distinction is based more on the presumed historical origin of the language groups than on their present geographic distribution.
[15] It is a historical fact that groups of Fang reached the Estuary of Gabon during the second half of the 19th century AD (1874).

and discussed by Dr. David Comas during an interdisciplinary workshop held in November 2004 at the Institut Pasteur in Paris. Both types of analyses are ongoing.

The other regions

So far, no fieldwork has been undertaken in Tanzania, for administrative reasons (difficulty of obtaining research permits), because of funding problems (see below), and because of the considerable overlap between the objectives of the present project and fieldwork already carried out by other researchers from the US who are not directly engaged in the project. The original objective was to obtain samples for genetic analysis from the Mbugu and neighbouring groups in Tanzania, in order to test hypotheses about the origins of the Mbugu (see “Some problematic issues” hereafter). It would of course be neither scientifically nor ethically sensible to resample the groups we are interested in.

To compensate for this absence of on-site research in this other important region, a special two-day “East African Workshop” was organised in May 2003. It brought together several geneticists (i.a. Dr. Sarah Tishkoff, University of Maryland; Dr. Lluis Quintana-Murci, Institut Pasteur; Dr. Richard Cordaux, Max Planck Institute, Leipzig) and nine linguists, most of them specialists of the languages spoken in this area (i.a. Gérard Philippson (DDL, Lyon), Derek Nurse (Memorial University of Newfoundland), Maarten Mous (University of Leiden) and Gerrit Dimmendaal (University of Cologne)). The aim of this workshop was to define new research areas for fieldwork on the basis of specific linguistic criteria and, especially, to develop close scientific collaborations between linguists and geneticists who have already been working in this area. Extensive blood sampling has already been carried out by Dr. Sarah Tishkoff and her team, but much remains to be done, especially the interpretation of the results. Linguistic hypotheses will prove essential. Some of these concern mixed languages (cf. the joint project submitted by Maarten Mous and Mark Stoneking, Max Planck Institute, Leipzig; see also “Some problematic issues” hereafter).

Given the present situation, however, a new strategy will have to be chosen. Instead of investing energy in fieldwork (and, before that, in obtaining research permits), new workshops will be organised as soon as Dr. Sarah Tishkoff’s database becomes available on the Web (i.e. towards the end of 2004). Linguists, geneticists, archaeologists, historians and cultural anthropologists will be able to discuss and compare the results from their respective fields, and begin preparing a multidisciplinary publication.

Extending our project to the third area (Namibia-Angola) has not been possible either, for political reasons and for lack of local contacts, although specialists of the languages spoken there are ready to collaborate. Recently, however, genetic data from Angola have become available (Plaza & al. forthcoming), which can be integrated into the present study. This extension will be extremely important for a fine-grained examination of the western Bantu expansion. Plaza’s study, a contribution of the Spanish team together with other geneticists of renown, clearly demonstrates that gene flow took place between southeastern (data mainly from Mozambique) and southwestern Bantu populations, and thus corroborates the linguistic classifications (eastern linguistic traits in southwestern languages; cf. Soodyall & al. 1996). The eastern and western Bantu expansion routes were clearly not independent. This study, which is based on the analysis of the mtDNA variation of 44 Bantu-speaking individuals from Angola (analysis of the two hypervariable segments, HVSI and HVSII, of the mtDNA control region and of the 9-bp deletion in the COII/tRNALys intergenic region), also shows that there are no traces of Khoisan lineages in the extant Bantu-speaking population

from Angola, which suggests that the Bantu expansion provoked the southward migration of the Khoisan-speaking people without substantial admixture. Since the phylogeny of the Y-chromosome lineages has been well established and some data are available from other African samples, it will be possible to determine migration processes, admixture with neighbouring populations and differential modes of dispersal between men and women by comparison with the results yielded by the mtDNA analysis performed on the same individuals. All these analyses will contribute to the reconstruction of the population history of Bantu-speaking populations along the western expansion route.

Some problematic issues

Unfortunately, the joint project concerning the Ma’a/Mbugu language community of Tanzania (a mixed language presumably due to Bantu-speaking women mixing with Cushitic-speaking men) was not retained for funding by the Dutch funding agency, in spite of its quality. However, this highly interesting hypothesis has since been kept within the main project for testing, and will consequently be financed as far as possible from other available funds. Maarten Mous, a specialist of Nilotic as well as of Eastern Bantu, will closely collaborate with Dr. Sarah Tishkoff on the study of this specific issue.

For some time there was uncertainty about funding for the joint projects submitted by Spain and Germany. It has now been clearly established that funding has been granted in both participating countries. The Spanish project (i.e. the Y-chromosome analysis of the Gabonese data) started in November 2003. The German project, which initially concerned the Tanzania area, may instead focus on the Bantu populations of Zambia. This possible extension is presently being examined.

Short overview of presentations, publications & other realisations

This section provides a concise overview of the main presentations and publications produced thus far, as well as of the other activities developed since the beginning of the OHLL and OMLL programmes.

Reports, presentations and publications

As part of the usual evaluation procedure, written scientific reports have been submitted on an annual basis to the French CNRS, with regard to both the French OHLL programme and the European OMLL programme (2001, 2002, 2003, 2004). Oral presentations supported by slides were requested in order to summarise the results obtained and to enable exchanges between related projects. All the reports produced to date may be consulted freely.

Following the first field mission, the anthropologists of the LUTO/LABAN (Omar Bongo University, Libreville) submitted a preliminary written report to the linguists and geneticists, based on the genealogical data obtained during this mission, with details about lineages covering a time span of four generations, in preparation for an extensive study of local mating patterns. Constructive comments were made in order to render further fieldwork still more efficient.

Two important (oral) presentations should be mentioned here. During the first “East African Workshop” (May 2003), Sarah Tishkoff (University of Maryland, USA) made a detailed presentation of the state of the art in the population genetics of Tanzania. This main presentation was followed by a series of presentations on several related linguistic topics by Derek Nurse (Memorial University of Newfoundland, Canada), Maarten Mous (University of Leiden, the Netherlands), Gerrit Dimmendaal (Cologne University, Germany) and Gérard Philippson (DDL, Lyon, France). These presentations aimed at defining new domains of investigation for East Africa. The other important presentation was part of a popularisation

effort: Christophe Coupé and Lolke Van der Veen presented a paper entitled “Quand les langues rencontrent les gènes : une histoire des populations du Gabon” (June 25th, 2004) at the monthly seminar of the “Institut des Sciences de l’Homme” (ISH, Lyon, France).

Francesca Luca of the Institut Pasteur (Paris) presented a detailed poster at the 2nd DNA Polymorphisms in Human Population International Symposium held in Paris in December 2003. Her poster, prepared in close collaboration with Lluis Quintana-Murci, was entitled “MtDNA variation in Central Africa: a microevolutionary study in Bantu-speaking populations from Gabon”.

In collaboration with other geneticists, the Barcelona team prepared a highly interesting publication on mtDNA variation in Bantu-speaking populations of Angola, offering new insights into the western Bantu expansion (Plaza & al. forthcoming). Their contribution has been submitted to Human Genetics.

Two major contributions were made by Lolke Van der Veen to the description and classification of the Bantu languages of Gabon, i.e. a bilingual dictionary of the endangered Viya language (Van der Veen 2002) and a chapter on the B30 language group published in The Bantu Languages. This important reference book was edited by Gérard Philippson and Derek Nurse and published in 2003 (London, Routledge). Both contributions were initiated before the start of the OMLL programme, but significantly improve the understanding of the language groups of Gabon.

Finally, Patrick Mouguiama-Daouda (DDL, CNRS) and Lolke Van der Veen wrote a chapter entitled “B10-B30: conglomérat phylogénétique ou produit d’une hybridation ?” for a Festschrift dedicated to two outstanding specialists of Bantu, viz. Claire Grégoire and Yvonne Bastin (MRAC, Tervuren, Belgium). This chapter is an important contribution to the understanding of the rather complex relationships between these two linguistic groups and their history.

Interdisciplinary workshops

In addition to the above-mentioned first “East African Workshop” held in May 2003, several other interdisciplinary workshops have been organised. During a workshop held on October 31st 2003, Lluis Quintana-Murci gave a detailed presentation of the first official results of the mtDNA analysis, based on 308 DNA samples collected from seven different ethnic groups of Gabon.

Another workshop, on the state of the art of anthropological research in Gabon, was organised on March 26th 2004. Specialists such as Louis Perrois, Philippe Laburthe-Tolra and Raymond Mayer participated in this important meeting. Its objective was to initiate the identification of cultural markers that may be significant for the classification of the Bantu (and non-Bantu) populations of the area.

The next workshop, on the state of the art of archaeological research in Gabon, was organised on April 23rd 2004, with the participation of Richard Oslisly (WCS, Libreville, Gabon), Bernard Clist, Raymond Lanfranchi and Bernard Peyrot. The objective of this meeting was to start examining possible parallels between the linguistic inferences about Bantu history and the findings of archaeology in the Gabon area.

Finally, linguists and geneticists met in Paris in November 2004 in order to present the ongoing genetic analyses and discuss the new results. The first results regarding Y-chromosome variation were presented by David Comas. Some interesting parallels with the results of the mtDNA analysis have been detected, but also a certain number of striking differences. However, these observations are too preliminary to be discussed here.


Databases

An ethnolinguistic database integrating all data collected during the four field missions is nearing completion. It comprises information about the sampled individuals: the individual’s lineage, the lineages of his parents, the individual’s date and place of birth, the language(s) spoken by the individual and by his parents, etc. Meanwhile, the anthropologists have started to produce an up-to-date account of the available multifactorial ethnological data for the sampled Gabonese populations. A first draft of this paper, providing specific information for a dozen ethnic groups, was submitted to the French geneticists in March 2004, and a much more detailed version is being prepared.

Additional linguistic fieldwork

More linguistic fieldwork was carried out during the summer of 2004 by Dr. Patrick Mouguiama-Daouda (DDL, associated researcher) and Pascale Paulin (DDL, graduate student) in order to complete our linguistic database. Patrick Mouguiama-Daouda checked and collected lexical data from B20, B30 and B60 languages, and Pascale Paulin studied the Baka (Pygmy) language and culture in the extreme north of Gabon (viz. the Minvoul area).

Ongoing research and perspectives

We shall conclude this detailed outline with a short overview of the activities in progress or to be developed in the near future.

Further research on languages and ongoing analyses in genetics

The DNA samples as well as the ethnolinguistic data collected during the last research mission in the Gabon area (February 2004, see above) are currently being processed. The aim of this (provisionally) final mission was to obtain additional blood samples from 400 (male) individuals in order to enlarge sample sizes, and also to allow for finer anthropological and genetic analyses that take the lineage rather than the ethnic group as the relevant level (the concept of ethnic group being rather vague and therefore much more difficult to define). This objective has been achieved, but some samples have been lost to haemolysis. In spite of increasing difficulties in exporting DNA samples from Gabon, we very recently (i.e. in September and November 2004) managed to transfer the remaining samples of purified DNA to the Institut Pasteur in France.

Since all samples from Gabon are presently in Paris (a total of some 900 samples), in-depth diversity and phylogeographic analyses of central African mtDNA can be undertaken, and the results of these studies will be integrated into the global African context in order to determine genetic barriers. The relative dating of the branches will also have to be examined. These analyses will continue during the second half of 2004 in order to refine the results previously obtained. The most interesting preliminary results of this ongoing analysis were presented at the OMLL Conference in Leipzig, April 4-6th 2004.

Y-chromosome analysis is also ongoing. The remaining samples (see above) will be analysed during the second half of 2004 and the beginning of 2005. A first account of the progress made in this analysis was presented by the Spanish team at the OMLL Conference in Leipzig, April 4-6th 2004. Some results were presented during a workshop organised in November 2004 (see above).

Other genetic markers may be explored subsequently. The study of autosomal markers is one of the options currently being examined.


Interpretation of the results of the genetic analyses

Only a very careful, critical and fine-grained comparison of the results of the genetic analyses with linguistic, anthropological, archaeological and available historical data can serve as a basis for the interpretation of the recently acquired data. This stage of the project, initiated in March 2004, is clearly the most crucial (and delicate!) of all. What kind of diachronic information (i.e. historical scenarios) can legitimately be inferred from the synchronic genetic distances? What has been the impact of mating patterns (endogamous or exogamous strategies), birth rate, group-internal social, cultural and political stratification, habitat, residence strategies, geographical proximity, and prolonged contact (trade, exchange of technologies, etc.) between ethnic groups? What is the historical time depth of the observed genetic phenomena, which are probably highly complex? Close collaboration with cultural anthropologists and specialists of other fields will be essential here. For instance, a detailed atlas of the lineages attested in Gabon (being prepared by Prof. Raymond Mayer, LABAN, Libreville) as well as a solidly based inventory of cultural markers will be indispensable tools for the task of interpreting the data.
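One simple way to put a number on how far genetic and linguistic distances agree is a matrix correlation with a permutation test (commonly known as a Mantel test). The sketch below is only an illustration and not a procedure described in this outline: the four populations and all distance values are invented, and the function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den


def upper_triangle(matrix, order):
    """Off-diagonal distances, with rows and columns reordered by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel(a, b, permutations=999, seed=0):
    """Return (correlation, one-sided permutation p-value) for two distance matrices."""
    identity = list(range(len(a)))
    b_vec = upper_triangle(b, identity)
    observed = pearson(upper_triangle(a, identity), b_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        # Relabelling the populations of one matrix breaks any real association.
        if pearson(upper_triangle(a, perm), b_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Invented symmetric distances between four hypothetical populations P1..P4.
GENETIC = [
    [0.00, 0.02, 0.08, 0.09],
    [0.02, 0.00, 0.07, 0.10],
    [0.08, 0.07, 0.00, 0.03],
    [0.09, 0.10, 0.03, 0.00],
]
LINGUISTIC = [
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.4, 0.6],
    [0.5, 0.4, 0.0, 0.2],
    [0.6, 0.6, 0.2, 0.0],
]

if __name__ == "__main__":
    r, p = mantel(GENETIC, LINGUISTIC)
    print(f"matrix correlation r = {r:.3f}, permutation p = {p:.3f}")
```

With only four populations there are just 24 possible relabellings, so the p-value cannot become small; a meaningful test needs many more populations. A high correlation would in any case say nothing about whether shared heritage or prolonged contact produced it, which is why the anthropological and historical evidence remains essential.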

Application of phylogenetically based methods to linguistic data

The limits of lexicostatistics as a tool of historical linguistics have been recognised by the vast majority of linguists. Therefore, phylogenetically based methods, and in particular the cladistic method (maximum parsimony, maximum likelihood), will be applied experimentally to the classification of the languages of the Gabon area, taking into account the available phonological, morphological and lexical material, in close collaboration with several specialists of this field, with Mahé Ben Hamed (DDL, Lyon, and INSERM, Paris) as main consultant. These much more sophisticated methods may also prove useful for the study of the anthropological traits. Collaborations are being developed with John Nerbonne of the University of Groningen (RUG, the Netherlands), and also with Russell Gray and Claire Holden (University of Auckland, New Zealand), in order to apply, with their help, different newly developed methods to the available lexical data, e.g. the Neighbor-Net method (Bryant & Moulton 2004), and to elaborate more realistic classifications of the languages that are being studied. The Neighbor-Net method, for instance, makes it possible to reveal conflicting signals in the comparative data and thus to render their complexity more adequately. The conflicting signals in the data are often due to convergence phenomena. This method may also provide useful information about possible dialect chains.
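What a “conflicting signal” looks like in distance data can be sketched with a much simpler diagnostic than Neighbor-Net itself. For any four languages, the three ways of pairing them give three distance sums; if the data fit a tree perfectly, the two largest sums are equal (the four-point condition). The quartet score below, often called a delta score, is 0 for a perfectly tree-like quartet and approaches 1 as the conflict grows. This is an illustration under stated assumptions, not an implementation of Neighbor-Net: the language labels, the cognate coding and all function names are invented.

```python
from itertools import combinations

# Invented cognate coding: 1 = the language has a reflex of the cognate set, 0 = it does not.
COGNATES = {
    "LangA": [1, 1, 0, 1, 0, 1, 1, 0],
    "LangB": [1, 1, 0, 1, 1, 1, 0, 0],
    "LangC": [0, 1, 1, 0, 1, 1, 0, 1],
    "LangD": [0, 0, 1, 0, 1, 1, 1, 1],
    "LangE": [1, 0, 1, 1, 1, 0, 0, 1],
}


def hamming_distance(a, b):
    """Proportion of cognate sets on which two languages differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_table(data):
    """Distances for every ordered pair of languages, keyed by (name, name)."""
    return {(p, q): hamming_distance(data[p], data[q]) for p in data for q in data}


def delta_score(d, quartet):
    """0.0 when the quartet satisfies the four-point condition, up to 1.0 otherwise."""
    w, x, y, z = quartet
    small, mid, large = sorted(
        [d[w, x] + d[y, z], d[w, y] + d[x, z], d[w, z] + d[x, y]]
    )
    if large == small:  # all three pairings agree: no resolution, but no conflict either
        return 0.0
    return (large - mid) / (large - small)


def mean_delta(d, names):
    """Average quartet score over all sets of four languages (needs at least four)."""
    scores = [delta_score(d, q) for q in combinations(names, 4)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    table = distance_table(COGNATES)
    print(f"mean quartet score: {mean_delta(table, list(COGNATES)):.3f}")
```

Quartets with high scores point to languages whose relationships cannot be drawn as a single tree, for example because of borrowing between neighbours; a network method such as Neighbor-Net displays exactly such cases as box-like structures rather than forcing them into a branching diagram.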

Publications in preparation

Several publications are currently being prepared for internationally renowned journals of population genetics, human biology and/or anthropology, as well as new publications on the description and the internal classification of the Gabonese languages, aiming to provide a still more accurate picture of the linguistic situation of this area (cf. Mouguiama-Daouda & Van der Veen forthcoming; also Mouguiama-Daouda in press, Mouguiama-Daouda & Hombert forthcoming). The editing of a multidisciplinary book presenting the main results of our investigations is one of the major objectives of the LCGB project. This publication will include chapters on linguistics, genetics, anthropology, archaeology and history, as well as a critical comparison of the results from these different approaches. It will also summarise the procedures we have followed during fieldwork (useful for further research of this kind). Furthermore, a thorough reflection on ways of communicating the results to the local populations, who are often eager to learn more about their group’s history, is imperative. This ethical aspect of the project should not be underestimated. The results of the project

may easily be misinterpreted or misappropriated for ideological reasons, and can in such cases have dramatic consequences!

Extension of data collection/research to other regions

Once the results for the Gabon area are available, research will have to be extended to other regions, whether geographically adjacent or not. This can be done in two ways, the first being the more plausible: (1) the comparison of our results with those obtained by teams presently working in neighbouring countries; (2) new fieldwork.

Two crucial studies offer highly interesting perspectives for comparison. One concerns the genetic data of Bantu-speakers of Angola that have recently become available in a publication prepared by the Barcelona team in collaboration with Antonio Salas (Plaza & al. forthcoming). This issue is crucial for the study of the western Bantu expansion (see also below). The available data mainly concern the male lineages (Y-chromosome). The other study focuses on data being collected and analysed for the Cameroon area by a multidisciplinary research team directed by Alain Froment (IRD (ORSTOM), ERMES, Orleans), Evelyne Heyer (CNRS, Musée de l’Homme, Paris) and Serge Bahuchet (CNRS, Musée national d’Histoire naturelle, France). Comparing their results with those our project will yield will allow us, inter alia, to improve the study of the genetic affiliation of the Fang population (related to the Ewondo and other population groups of Cameroon) as well as of the Baka (and other) Pygmies of Gabon (the Baka are probably related to the Aka population of the and most certainly to the Baka Pygmies living in Cameroon). A doctoral thesis was recently started by Pascale Paulin (DDL, Lyon) on this Pygmy population and its language in Gabon, under the supervision of Lolke Van der Veen.

The constantly growing genetic database concerning central Africa will also make it possible to envisage, in the near future, the study of health-related phenomena, e.g. the analysis of genes involved in the immune system in Bantu-speaking farmers vs. in Pygmies (nomadic or semi-nomadic lifestyle), in order to examine possible cultural bases of health risk.

The option of new fieldwork is a priori less plausible. Fieldwork requires time-consuming preparation and contact persons in the field, ideally some existing agreement and collaboration between scientific organisations to facilitate the administrative formalities, and if possible some political stability. The extension of sample collecting activities to the Congo area seems compromised for the time being, partly for safety reasons, partly because of a lack of representatives (i.e. potential contact persons) in this highly interesting region.

New possibilities may however exist for Tanzania. Excellent linguistic data have recently become available for the Makonde, a rather isolated population, formerly practising matrilocality, living in the south-eastern part of Tanzania. This ethnic group moved into the area from more southern regions, whereas the other Bantu-speaking populations arrived from the west. A clear linguistic barrier exists between Makonde and the other languages. This linguistically based hypothesis is worth testing, but obtaining research permits may be difficult. Official authorities are becoming increasingly reticent with regard to blood sampling activities by foreigners. (linguist), and a number of medical researchers of the Anatomy Department of the Muhimbili Hospital University have shown real interest. If research permits are granted, fieldwork will probably be carried out by Mark Stoneking and his team, in close collaboration with Sophie Manus, who is a specialist of this language community.

Whatever the outcome of this new opportunity may be, linguists, geneticists, archaeologists, historians and cultural anthropologists will have to get together in the near future in order to compare the results obtained in their respective fields for the Kenya-Tanzania area. A series of workshops will be organised on the theme of “East African languages, cultures and genes” once Dr. Sarah Tishkoff’s data have become available on the Internet for discussion. A “Second (Extended) East African Workshop” will probably be organised during the first half of 2005.

Conclusion

For the time being, the main focus of the “LCGB” project will be on the western Bantu expansion. Three important reasons justify this choice. Firstly, this region is the least studied of sub-Saharan Africa from the genetic point of view; sample collection and analysis fill in the gaps. Secondly, this region is of particular historical interest. The western Bantu expansion is probably one of the oldest movements in the spread of Bantu and has, in the southwest (Angola-Namibia), undergone admixture as a result of exchanges with populations that arrived from the east (?southeast, ?northeast). In many cases, good linguistic documentation is available. And finally, our investigations started in this region for the reasons indicated above and have so far given the best results. It is important to build on this newly acquired basis, and we trust that this study will improve our understanding of the nature of this expansion as well as of its underlying mechanisms. More generally, the array of different though related scientific activities assembled here for one specific purpose will allow us to considerably increase our knowledge and improve our understanding of the population genetics of Central and East Africa, and also to verify to what extent a multidisciplinary project such as “Language, Culture and Genes in Bantu” may contribute to the ongoing study of African history and prehistory.

References

Bastin, Y., Coupez, A. & Mann, M. (1999). Continuity and divergence in the Bantu languages: perspectives from a lexicostatistic study. Annales Sciences Humaines du Musée royal de l’Afrique Centrale de Tervuren vol. 162. Tervuren.
Bryant, D. & Moulton, V. (2004). Neighbor-Net: an agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution 21(2):255-265.
Cann, R., Stoneking, M. & Wilson, A. C. (1987). Mitochondrial DNA and Human Evolution. Nature 325:31-36.
Cavalli-Sforza, L. L. & al. (1994). The History and Geography of Human Genes. Princeton, NJ: Princeton University Press.
Clist, B. (1995). Gabon : 100 000 ans d’Histoire. Libreville, CCF-Sépia.
Coupez, A., Evrard, E. & Vansina, J. (1975). Classification d’un échantillon de langues bantu d’après la lexicostatistique. In Africana Linguistica 6. Tervuren. 133-158.
De Maret, P. (1982). The Iron Age in the West and the South. In Van Noten, F. (ed.) The Archaeology of Central Africa. Graz, Akademische Druck- u. Verlagsanstalt. 77-96.
Diamond, J. & Bellwood, P. (2003). Farmers and their Languages: the First Expansions. Science April 25 2003. 300:597-603.
Excoffier, L. & al. (1987). Genetics and history of sub-Saharan Africa. Yearbook of Physical Anthropology 30:151-194.
Excoffier, L. & al. (1991). Spatial differentiation of RH and GM haplotype frequencies in sub-Saharan Africa and its relation to linguistic affinities. Human Biology 63(3):273-307.
Greenberg, J. (1963). The . The Hague, Mouton.
Grégoire, C. (2003). The Bantu languages of the forest. In Nurse, D. & Philippson, G. (eds.). The Bantu Languages. London, Routledge. 349-370.


Guthrie, M. (1948). The Classification of the Bantu Languages. London, International African Institute (IAI).
Guthrie, M. (1967-1971). Comparative Bantu. Four volumes. Farnborough, Gregg International Publishers.
Hammer, M. (1994). A recent insertion of an Alu element on the Y chromosome is a useful marker for Human Population studies. Mol. Biol. Evol. 11(5):749-761.
Holden, C. J. (2002). Bantu language trees reflect the spread of farming across sub-Saharan Africa: a maximum-parsimony analysis. Proc. R. Soc. Lond. Biol. Sci. April 22 2002. 269(1493):793-9.
Lowe, J. B. & Schadeberg, T. C. (1997). Bantu MapMaker. v3.1 (computer program). Berkeley and Leiden. [CBOLD]
Meeussen, A. E. (1967). Bantu Grammatical Reconstructions. Africana Linguistica III. Tervuren.
Meeussen, A. E. (1969). Bantu Lexical Reconstructions. Pro manuscripto. Tervuren.
Melton, T. & al. (1997). Extent of heterogeneity in mitochondrial DNA of sub-Saharan African populations. Journal of Forensic Sciences 42(4):582-592.
Mouguiama-Daouda, P. (in press). Reconstruction du vocabulaire culturel et irrégularités phonologiques. Diachronica.
Mouguiama-Daouda, P. & Hombert, J.-M. (forthcoming). Les noms des mammifères dans les langues du Gabon : reconstruction et hypothèses historiques.
Nsuka-Nkutsi, F. (1989). Apport des structures morpho-syntaxiques aux problèmes ayant trait à l’expansion des peuples bantu. In Obenga, T. (ed.) Les peuples bantu : Migrations, expansion et identité culturelle, tome 1. 60-62.
Nurse, D. & Philippson, G. (eds.) (2003). The Bantu Languages. London, Routledge.
Nurse, D. & Philippson, G. (2003). Towards a historical classification of the Bantu languages. In Nurse, D. & Philippson, G. (eds.) The Bantu Languages. London, Routledge. 164-181.
Oslisly, R. (1996). The middle Ogooué valley [Gabon]: cultural changes and palaeoclimatic implications of the last four millennia. In Sutton, J. (ed.) ‘The growth of farming communities in Africa from the equator southwards’. Azania (Nairobi). vols. 29-30:324-331.
Phillipson, D. W. (1985). African Archaeology. Cambridge, Cambridge University Press.
Poloni, E. S. & al. (1997). Human genetic affinities for Y-chromosome p49a,f/TaqI haplotypes show strong correspondence with linguistics. American Journal of Human Genetics 61:1015-1035.
Salas, A. & al. (2002). The Making of the African mtDNA Landscape. American Journal of Human Genetics. November 2002. 71(5):1082-1111.
Scozzari, R. & al. (1994). Genetic studies in Cameroon: Mitochondrial DNA Polymorphisms in Bamileke. Human Biology 66(1):1-12.
Scozzari, R. & al. (1999). Combined use of biallelic and microsatellite Y-chromosome polymorphisms to infer affinities among African Populations. American Journal of Human Genetics 65:829-846.
Seielstad, M. T., Minch, E. & Cavalli-Sforza, L. L. (1998). Genetic evidence for a higher female migration rate in humans. Nat Genet 20:278-80.
Soodyall, H. (1993). Mitochondrial DNA polymorphisms in southern African populations. PhD thesis. University of Witwatersrand, Johannesburg.
Soodyall, H. & Jenkins, T. (1993). Mitochondrial DNA polymorphisms in Negroid populations from Namibia : new light on the origins of Dama, Herero and Ambo. Ann. Hum. Biol. 20:477-485.


Soodyall, H. & al. (1996). MtDNA control-region sequence variation suggests multiple independent origins of an “Asian-specific” 9-bp deletion in sub-Saharan Africans. American Journal of Human Genetics 58:595-608.
Spedini, G. & al. (1999). The peopling of sub-Saharan Africa: the case study of Cameroon. American Journal of Physical Anthropology 110(2):143-162.
Spurdle, A. B. & al. (1992). Y chromosome probe p49a detects complex PvuII haplotypes and many new TaqI haplotypes in southern African populations. Am. J. Hum. Genet. 50:107-125.
Spurdle, A. B. & al. (1996). The origins of the Lemba “Black Jews” of southern Africa: evidence from p12F2 and other Y-chromosome markers. Am. J. Hum. Genet. 59:1126-1133.
Tishkoff, S. A. & Williams, S. M. (2002). Genetic Analysis of African Populations: Human Evolution and Complex Disease. Nature Reviews Genetics 3:611-621.
Trilles, H. (1912). Quinze ans au pays des Fang. Paris, Desclée.
Van der Veen, L. J. & Hombert, J.-M. (forthcoming). On the origin and diffusion of Bantu: a multidisciplinary approach. Proceedings of the 32nd Annual Conference on African Languages (ACAL 32). [March 2001 Berkeley.]
Vansina, J. (1990). Paths in the Rainforests: Toward a History of Political Tradition in Equatorial Africa. Madison, University of Wisconsin Press.
Vansina, J. (1995). New linguistic evidence and ‘The Bantu Expansion’. Journal of African History 36:173-195.
Watson, E. & al. (1996). MtDNA sequence diversity in Africa. American Journal of Human Genetics 59:437-444.
Watson, E. & al. (1997). Mitochondrial footprints of human expansion in Africa. American Journal of Human Genetics 61:691-704.
Williamson, K. & Blench, R. (2000). Niger-Congo. In Heine, B. & Nurse, D. (eds.) African Languages, An Introduction. Cambridge, Cambridge University Press.

Major (recent) publications of the Principal Investigators

Jaume Bertranpetit & David Comas (and associated researchers)
Plaza, S., Salas, A., Calafell, F., Corte-Real, F., Bertranpetit, J., Carracedo, A., Comas, D. (forthcoming). Insights into the western Bantu dispersal: mtDNA lineage analysis in Angola. [Submitted to Human Genetics.]
Fadhlaoui-Zid, K., Plaza, S., Calafell, F., Ben Amor, M., Comas, D., Bennamar Elgaaied, A. (2004). Mitochondrial DNA heterogeneity in Tunisian Berbers. Ann Hum Genet.
Comas, D., Plaza, S., Wells, R. S., Yuldasheva, N., Lao, O., Calafell, F., Bertranpetit, J. (2004). Admixture, migrations, and dispersals in Central Asia: evidence from maternal DNA lineages. Eur J Hum Genet.
Plaza, S., Calafell, F., Helal, A., Bouzerna, N., Lefranc, G., Bertranpetit, J., Comas, D. (2003). Joining the Pillars of Hercules: mtDNA sequences show multidirectional gene flow in the Western Mediterranean. Ann Hum Genet 67(4):312-328.
Osier, M. V., Pakstis, A. J., Soodyall, H., Comas, D., Goldman, D., Odunsi, A., Okonofua, F., Parnas, J., Schulz, L. O., Bertranpetit, J., Bonne-Tamir, B., Lu, R. B., Kidd, J. R., Kidd, K. K. (2002). A global perspective on genetic variation at the ADH genes reveals unusual patterns of linkage disequilibrium and diversity. Am J Hum Genet 71(1):84-99.

Mark Stoneking (and associated researchers)
Kayser, M., Kittler, R., Erler, A., Hedman, M., Lee, A. C., Mohyuddin, A., Mehdi, S. Q., Rosser, Z., Stoneking, M., Jobling, M. A., Sajantila, A., Tyler-Smith, C. (forthcoming). A comprehensive survey of human Y-chromosomal microsatellites. American Journal of Human Genetics. [Accepted.]
Cordaux, R. & Stoneking, M. (2003). South Asia, the Andamanese and the genetic evidence for an “early” human dispersal out of Africa. American Journal of Human Genetics 72:1586-1590.
Kayser, M., Brauer, S., Schädlich, H., Prinz, M., Batzer, M. A., Zimmerman, P. A., Boatin, B. A., Stoneking, M. (2003). Y chromosome STR haplotypes and the genetic structure of U.S. populations of African, European, and Hispanic ancestry. Genome Research 13:624-634.
Pakendorf, B., Morar, B., Tarskaia, L. A., Kayser, M., Soodyall, H., Rodewald, A., Stoneking, M. (2002). Y-chromosomal evidence for a severe reduction in male population size of Yakuts. Human Genetics 110:198-200.
Stoneking, M. & al. (1997). Alu insertion polymorphisms and human evolution: evidence for a larger population size in Africa. Genome Research 7:1061-1071. Cold Spring Harbor Laboratory Press.

Lluis Quintana-Murci (and associated researchers)
Quintana-Murci, L., Chaix, R., Wells, R. S., Behar, D. M., Sayar, H., Scozzari, R., Rengo, C., Al-Zahery, N., Semino, O., Santachiara-Benerecetti, A. S., Coppa, A., Ayub, Q., Mohyuddin, A., Tyler-Smith, C., Mehdi, S. Q., Torroni, A., McElreavey, K. (2004). Where West meets East: The complex mtDNA landscape of the Southwest and Central Asian corridor. American Journal of Human Genetics 74:827-845.
Quintana-Murci, L., Bigham, A., Rouba, H., Barakat, A., McElreavey, K., Hammer, M. (2004). Y-chromosomal STR haplotypes in Berber and Arabic-speaking populations from Morocco. Forensic Sci Int 140(1):113-115.
Zei, G., Lisa, A., Fiorani, O., Magri, C., Quintana-Murci, L., Semino, O., Santachiara-Benerecetti, A. S. (2003). From surnames to the history of Y-chromosomes: the Sardinian population as a paradigm. Eur J Hum Genet 11:802-807.
Quintana-Murci, L., Veitia, R., Fellous, M., Semino, O., Poloni, E. S. (2003). Genetic structure of Mediterranean populations revealed by Y-chromosome haplotype analysis. Am J Phys Anthropol 121(2):157-171.
Manni, F., Leonardi, P., Barakat, A., Rouba, H., Heyer, E., Klintschar, M., McElreavey, K., Quintana-Murci, L. (2002). Y-chromosome analysis reveals a genetic regional continuity in north-eastern Africa. Human Biology 74:645-658.

Lolke Van der Veen (and associated researchers)
Mouguiama-Daouda, P. & Van der Veen, L. J. (forthcoming). B10-B30 : conglomérat phylogénétique ou produit d’une hybridation. [Accepted. To be published in 2005 in a Festschrift for Claire Grégoire and Yvonne Bastin.]
Van der Veen, L. J. (2003). The B30 languages. In Nurse, D. & Philippson, G. (eds.). The Bantu Languages. London, Routledge. 371-391.
Van der Veen, L. J. & Bodinga-bwa-Bodinga, S. (2002). Gedandedi sa geviya, dictionnaire geviya-français. Collection Langues et littératures de l’Afrique Noire (Philippson, G. ed.). XII. Louvain/Paris, Peeters Publishers. 569 pp.
Luca, F., Van der Veen, L., André, E., Mouguiama-Daouda, P., Sica, L., Hombert, J.-M., Quintana-Murci, L. (2003). MtDNA variation in Central Africa: a microevolutionary study in Bantu-speaking populations from Gabon. Poster presented by Francesca Luca of the Institut Pasteur (Paris) at the 2nd DNA Polymorphisms in Human Population International Symposium held in Paris, December 2003.
Mouguiama-Daouda, P. (in press). Contribution de la linguistique à l’histoire du Gabon : la méthode comparative et son application au domaine bantou. Paris, Editions CNRS. [Accepted.]

[Map 1 labels: ±5000 BP?; ±3500 BP?; ±2500 BP; ±1500 BP; Ambo (R22)]

Map 1. Bantu Expansion: Presumed expansion routes. Zones retained for hypothesis testing.

Presumed major expansion routes, convergence zones, and secondary expansion zones. The Bantu homeland is situated in the Northwest (the Nigeria-Cameroon borderland). From there, the Bantu populations gradually moved across the rainforest to the East (one major movement) and to the South (the other major movement). Some of the eastern populations continued their southward spread. Convergence between eastern and western populations occurred at different times and places (cf. the Ambo population, R22 in Guthrie’s classification). The different shades (colours) of the arrows differentiate major regional “migration” movements.

Map created with Bantu Map Maker (T. Schadeberg, Leiden University, The Netherlands).

[Map 2 labels: Bakao Pygmies, Fang, Kota, Galwa, Rungu, Kele, Viya, Mbaama, Shira, Tsogo, Punu, Njabi, Teke, Vili; language group codes A75, B10, B20, B30, B40, B50, B60, B70, H12b; cluster numbers 1-5; GABON]

References according to Guthrie (1948).

Map 2. The Gabon area (enlarged “Ogooué-Ivindo-Ngounié” region): Major linguistic groups

1. = the B10+B30+?B20 (MYENE-TSOGO-?KELE) cluster (in green)
2. = the B40-H12b (SHIRA-Vili) cluster (in red)
3. = the B50-B60-B70 (NJABI-MBAAMA-TEKE) cluster (in brown)
4. = A75 (Fang) (in violet)
5. = (in black)

(E.g. Kele = language name, B10 = language group.)

[Map 3 labels: A75 Fang; B10-B30 (B20?) Myene-Tsogo-(Kele?); B50?-B60-B70 Njabi?-Mbete-Teke; B40-H12a Shira-Vili; Main Convergence Zone; route numbers 1-5; GABON]

Reference numbers according to Maho (2003).

Map 3. The Gabon area (enlarged “Ogooué-Ivindo-Ngounié” region): presumed major (gradual) expansion routes and the central convergence zone. [Linguistically based inferences.]

1. = Main north-south Western expansion route
2. = B10+B30 (MYENE-TSOGO) branch splitting off
3. = B40+part of H10 (SHIRA-Vili) branch
4. = B50+B60+B70 (NJABI-MBAMBA (OBAMBA)-TEKE) branch
5. = A75 (Fang)
[Most probable chronological order.]

(Language group references according to Guthrie 1948.)

Field mission July-August 2003. Thierry Nzamba-Nzamba, cultural anthropology student, filling out the ethnolinguistic questionnaire in the presence of the main representative of the Viya community. (Photograph: Gisèle Teil-Dautrey (DDL).)
