LINGUISTIC AND GENETIC RELATIONSHIPS IN NORTHERN CAMEROON
Brett C. Haberstick1
Erin Shay2
Eric Johnston2
Gary L. Stetler1
John K. Hewitt1
Andrew Smolen1
Zygmunt Frajzyngier2
1 Institute for Behavioral Genetics, University of Colorado, Boulder, CO 80309-0447
2 Department of Linguistics, University of Colorado, Boulder, CO 80309-0295
The authors wish to thank the Butcher Foundation for their support of the Northern
Cameroon Language and Genes Project [NCLGP] as well as Brad Pemberton and Taylor
Roy for their technical assistance related to this project.
Introduction
The goal of this study is to explore the correlation between the relationships among languages and the genetic relationships of the populations that speak them. The narrower goal is to examine whether speakers of languages from the same language family, or from the same branch of a given family, exhibit a closer genetic relationship than speakers of languages belonging to different language groups or subgroups. For this purpose, genotypes for 28 autosomal genetic markers were determined from samples obtained from about 30 speakers of each of six different languages belonging to two different language families spoken in Northern
Cameroon. Genetic relationships established using the genetic data were correlated with
established relationships among languages spoken by those who provided samples.
Background
Several prior studies have sought correlations between linguistic classification and
genetic distance among the language populations of Cameroon. Like the present study,
these studies take for granted the established linguistic classifications. Because the
methods of genetic sampling and analysis differ, the results of these studies are not always
comparable. There have been very few studies dedicated to the Chadic-language
populations of Cameroon.
In two studies, Spedini and colleagues [1999, 2001] analyzed the distribution of ten
protein genetic polymorphisms in eighteen populations belonging to three linguistic families
represented in Cameroon, namely Afroasiatic; Nilo-Saharan; and the West-Atlantic,
Adamawa eastern, and Benué-Congo branches of Niger-Kordofanian. The Afroasiatic family
is represented by the Chadic languages and by Shua Arabic. Among the Chadic languages,
Spedini et al. have examined Daba, Giziga (Guiziga, in their spelling), Mafa, Mada, Uldeme,
and Podoko (Podowko, in their spelling), belonging to the Central Chadic branch, and Masa
(Massa, in their spelling), which belongs to the Masa branch as per the Newman 1977 classification. Spedini et al. postulate a partial correlation between linguistic distance and genetic distance, concluding that ‘the language-family relationship between populations contributes more than their geographic location to the genetic differentiation among Chadic
speakers (but not among Niger-Kordofanian)’ (Spedini et al. 1999: 156).
Černý et al. [2004] examined mitochondrial DNA sequences for Hdi (which they call
Hide), Kotoko, Mafa (all Chadic languages of the Central branch), and Masa (Masa branch).
The data are compared with published findings for other populations in Africa. The authors conclude that speakers of the four Chadic languages in their study are more closely related to populations in East Africa than to populations in West Africa, pointing out that such similarities may be due to prehistoric migrations or to more recent interactions between the populations.
Linguistic Relationships
A language family is a group of languages thought to be descended from a common
ancestor. The members of a language family may be grouped into branches and sub-
branches whose members are thought to be more closely related to one another than to
members of other branches or sub-branches.
The current study is based on genetic data gathered from speakers of six different
language groups [five from the Chadic family, one from the Niger-Congo family] in a
relatively small area of Northern Cameroon. Within the Chadic family, four languages [Gidar,
Mina, Hdi, and Mafa] belong to the Central (also called the Biu Mandara) branch while one
language [Peve] belongs to the Masa branch. From the Niger-Congo language group, we
sampled speakers of Mambay. Previous research has established the linguistic relationships
among these languages using the standard comparative method [Newman (1977) for the
Chadic grouping, Boyd (1989) for the Niger-Congo grouping]. For the purposes of the
current study, we take these relationships as given.
Basis for language group selection
In order to maximize the detection of common population genetic structure between groups, language groups were selected first by language family and then by geography. For example, within the Central branch of the Chadic language family, Hdi and Mafa were sampled because they are spoken within the immediate vicinity of each other; thus these languages are close both linguistically and geographically. Gidar and Mina language groups, also of the Central branch, are also spoken in the immediate vicinity of one another, though each belongs to a different sub-branch within the Central Chadic language family grouping.
A comparison of the genetic relationships between Hdi and Mafa on the one hand and Gidar
and Mina on the other hand may reveal whether linguistic distance, geographical proximity
or both are reflected in the genetic structuring of these respective populations.
Peve, which belongs to the Masa branch of the Chadic family, was chosen because it
is only distantly related to the other four Chadic languages selected for participation.
Mambay was chosen because it belongs to the Niger-Congo family and thus is not
linguistically related to any other languages in the study [Boyd, 1989]. Because Mambay is
spoken in the geographical area close to where Peve is spoken, including these two
language groups allows an examination of the genetic relatedness when two groups are
geographically proximate and linguistically unrelated. Comparisons between Peve and the
other Chadic languages examined here allow an examination of the genetic relatedness
when two groups are linguistically related but geographically remote. Taken together, the
six distinct language groups identified for participation allowed us to ask whether the
complete absence of linguistic relatedness is also accompanied by population genetic divergence.
One further consideration in selecting the language groups was that several of the
investigators have previously published work on the Hdi, Gidar, and Mina languages
[Frajzyngier, in press; Frajzyngier et al, 2005; Frajzyngier and Shay, 2002] and have developed working familiarity with the chosen groups. Furthermore, the investigators were able to employ a speaker of Peve who is interested in studying his own language and who assisted in the collection of the genetic samples used in the current study.
Social relationships and historical interactions
The language populations from which genetic samples were obtained have had varying degrees of interaction over time. Hdi is a relatively small language, with estimates varying between 15,000 and 30,000 speakers. Mafa, the largest language group selected for study here, numbers more than 100,000 speakers, occupies a large area of the extreme
Northern Province of Cameroon, and surrounds the Hdi population on three sides. There
has been much commercial trading and intermarriage between the Hdi and Mafa
populations, with Mafa women sometimes marrying into the Hdi community and Hdi women
marrying into the Mafa community. The primary factors determining intermarriage between
these two populations are economic rather than cultural [e.g. there is no linguistic exogamy
requirement].
Some Mina and Gidar settlements are separated by as little as 15 kilometers, but the
historical and cultural centers of the Gidar and Mina populations are a considerable distance
apart [see #39 and #61 on Figure 1]. The Mina population, whose speakers number roughly
11,000, was a dominant military force in the area during the 1800s. There are more Gidar
speakers than Mina speakers, with estimates ranging from 40,000 to 70,000 [Ethnologue].
According to Podlewski [1965], there has been a considerable degree of intermarriage or
admixture between Gidar speakers and other populations [not specified by Podlewski].
Peve, the smallest language group selected for our study, is spoken by
approximately 5,720 speakers and is a dialect of Zime, a language spoken by over 100,000
speakers in Cameroon and nearby Chad. The Peve settlement borders the area where
Mundang [#56 in Figure 1] is spoken. Although Peve and Mambay [spoken by roughly 8,000 speakers] belong to different language families, both languages border on the area where
Mundang, a member of the Niger-Congo family, is spoken. Peve speakers have had considerable linguistic and social contact with Mundang, and many Peve speakers speak
Mundang as a second language. The degree of mutual understanding between Mambay and Mundang has been estimated at about 47% [Hamm, 2002]. There are occasional intermarriages among Mambay and Peve speakers. In those situations, Peve men have settled among Mambay, with fewer cases of Mambay men settling among Peve.
Goals of the study
Our objective was to determine whether the degrees of relatedness determined by
our genetic data would reflect the degrees of relatedness as determined by the linguistic
relationships described above. Thus we hypothesized that speakers of languages belonging to
the Chadic language family would be more genetically similar to each other than to speakers
of the Niger-Congo language, Mambay. In particular, we hypothesized that Hdi, Mafa, and
Mina speakers would be more closely related to one another than any one of them is to
Gidar. We also hypothesized that the Hdi, Mafa, Mina, and Gidar groups would be more
closely related to each other than any of them to Peve.
Methods
Sample
Thirty native speakers from each of six villages or settlements representing six distinct
language groups [Gidar, Mina, Peve, Mambay, Hdi, Mafa] in Northern Cameroon were
asked to participate in the investigation. In each population, the leader of the village was
informed about the study and the need to obtain buccal samples from speakers who were
not biologically related. Although considerable effort was made to collect DNA samples only
from speakers not biologically related to one another, some samples may have inadvertently been collected from relatives. Participants in the study were paid approximately $6 (USD) each for their time.
For the Gidar, Mina, Peve, and Hdi language groups, the villages where samples were obtained were the main or central settlements. This was not true for the Mambay and
Mafa language groups, where samples were collected from villages that are peripheral to the main settlement but are geographical neighbors to other languages included in the current study. For example, the Hdi and Peve language groups closely neighbored Mafa and Mambay, respectively.
The genetic samples for Hdi speakers came from Tourou, the cultural and historical
center of the Hdi population. For the Mafa language group, DNA samples were collected
from residents living approximately 15 kilometers (km) away from Tourou. The samples for
Gidar came from the village of Lam, considered to be the religious, cultural, and political
center of the Gidar population. The samples collected from Mina speakers came from a
settlement about 150 km from the Gidar village of Lam and about 200 km from the Hdi and
Mafa sample populations. The Peve samples came from the village of Mayo-Lopé, roughly
200 km from the other Chadic populations [Gidar, Mina, Hdi, Mafa] surveyed and
about 45 to 50 km from the village of Bikallé, where our DNA samples were collected from
Mambay speakers.
DNA collection and genotyping
Buccal cell DNA was collected following signed informed consent. Buccal cells were
collected using two cotton-tipped swabs that were placed in 0.5 ml of lysis buffer (0.5%
SDS, 10 mM Tris-EDTA, pH 8.0), and stored at ambient conditions until shipped to the
Institute for Behavioral Genetics [IBG; University of Colorado, Boulder, Colorado, USA].
Upon arrival in the laboratory, two ml of lysis buffer were added to the samples. Genomic
DNA was extracted using proteinase K treatment followed by isopropyl alcohol and ethanol precipitations. DNA pellets were resuspended in Tris-EDTA buffer at a concentration of 10 ng/µl.
PCR
Three multiplex PCR reactions were used to amplify the 28 Short Tandem Repeat
(STR) loci analyzed. IBG-Hvar1 [Table 1a] is a 12-plex PCR that we have used extensively for zygosity determinations. IBG-Hvar2 [Table 1b] is a 14-plex PCR based on the CODIS
[Combined DNA Index System; Budowle et al, 1999] panel, modified for routine use by replacing four of the loci (D21S11, TH01, D18S51, and FGA) with D4S2639, D9S934,
D20S470, and D15S657, respectively. The replaced CODIS loci were analyzed in a separate five-plex PCR reaction [Table 1c]. In practice, the members of each replacement pair (e.g., D21S11 and D4S2639) can be substituted for one another in any combination if desired, since the four replacement loci were chosen for similar amplicon size, similar primer melting temperature, and lack of interactions among primers. The sources of primer sequences are given in the footnotes to the tables. The heterozygosity values for each locus
were obtained from the Invitrogen web site (http://mp.invitrogen.com/resources/apps/mappairs/).
They are given for illustrative purposes only, and do not
represent the observed heterozygosities for these Cameroon populations, which may be
found accompanying the appropriate tables at the website for the Northern Cameroon
Language and Genetics Project [NCLGP; http://ibgwww.colorado.edu/genotyping_lab/NCLGP].
Each 20 µl PCR reaction contained 1 µl of DNA (1-10 ng), 4.4 µl of primer mix
composed as described in the tables, 2.0 µl of GoldSTAR buffer [Promega, Madison, WI]
and two units of AmpliTaq® Gold DNA polymerase [Applied Biosystems, Foster City, CA].
Cycling conditions were as given in Krenke et al [2002]. The alleles were separated and
detected using an ABI PRISM® 3100 Genetic Analyzer. All plates contained at least one control [CEPH 1347-02] and allelic ladders for the CODIS loci. Electropherograms were reviewed by two investigators independently, and discrepancies were resolved by reanalysis of the loci using single-plex reactions.
[Insert Tables 1a, 1b, 1c about here]
Statistical Analyses
Allele frequencies for the 28 STR markers were determined by direct counting as implemented in CONVERT [Glaubitz, 2004]. Tests for deviations from Hardy-Weinberg equilibrium (HWE) were conducted with Fisher’s exact probability test [Guo and
Thompson, 1992] as implemented in the statistical package Arlequin [Version 3.01; Excoffier et al, 2005].
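The direct-counting step and the Hardy-Weinberg expectation it feeds can be sketched as follows. This is only an illustration under stated assumptions: the genotypes are hypothetical repeat counts for a single locus, the function names are ours, and the sketch computes expected genotype counts only. It does not reproduce the exact test of Guo and Thompson or the internals of CONVERT or Arlequin.

```python
from collections import Counter
from itertools import combinations_with_replacement


def allele_frequencies(genotypes):
    """Direct-count allele frequencies from diploid genotypes (allele pairs)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # two gene copies per individual
    return {allele: n / total for allele, n in counts.items()}


def hwe_expected_counts(genotypes):
    """Expected genotype counts under Hardy-Weinberg proportions."""
    freqs = allele_frequencies(genotypes)
    n_individuals = len(genotypes)
    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        product = freqs[a] * freqs[b]
        # Homozygotes: p^2; heterozygotes: 2pq.
        expected[(a, b)] = n_individuals * (product if a == b else 2 * product)
    return expected


# Hypothetical STR genotypes (repeat counts) for one locus in ten individuals.
sample = [(9, 9), (9, 10), (9, 10), (10, 10), (9, 11),
          (10, 11), (9, 9), (10, 10), (9, 10), (11, 11)]
freqs = allele_frequencies(sample)
expected = hwe_expected_counts(sample)
```

Comparing `expected` with the observed genotype counts is the quantity that an exact test evaluates; the test itself is left to dedicated software.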
Unbiased genetic distances were calculated between all pairs of populations using
published methods [Reynolds, Weir, and Cockerham, 1983]. Analysis of molecular variance
(AMOVA) was conducted using data from all 28 STR markers as implemented in Arlequin.
AMOVA enables the partition of genetic variation at a locus or several loci into variation
between and within populations [Excoffier, 2001]. In addition, AMOVA can be used for
hierarchical analyses of genetic differences due to: (1) variation between individuals within a
population, (2) between populations within groups, (3) between groups. Significance of
AMOVA values was estimated using 10,100 permutations. The extent of population division
was measured using the fixation index or coancestry coefficient, Fst [Weir and Cockerham,
1984; Excoffier, Smouse, and Quattro, 1992; Weir, 1996]. Fst values range between 0
(no population subdivision, random mating, and no genetic divergence within a
population) and 1 (population isolation), with values below 0.05 suggesting little to no genetic
differentiation [Adeyemo, Chen, Chen, and Rotimi, 2005; Tishkoff and Williams, 2004]. The
significance of the genetic contribution to variation among populations within groups, Fsc,
and among groups, Fct, was examined using permutation procedures.
Genetic structuring of the six Northern Cameroon populations was also assessed using Structure [Version 2.1; Pritchard, Stephens, and Donnelly, 2000], a model-based clustering method for inferring populations using unlinked markers. The model assumes there are K populations, each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are probabilistically assigned to populations, or jointly to two or more populations if their genotypes indicate they are admixed. An advantage of Structure is that it can be applied to most types of genetic markers, including single nucleotide polymorphisms, microsatellites, and Restriction
Fragment-Length Polymorphisms (RFLPs). Genetic data from the 28 STR markers were
examined by specifying an admixture model that assumed correlated allele frequencies
between populations [Falush, Stephens, and Pritchard, 2003; Pritchard et al, 2000]. This
approach has been shown to improve clustering in populations that are weakly differentiated
(e.g. recently admixed) and also allows inference of the pattern of genetic drift or population
divergence [Falush et al, 2003; Rosenberg et al, 2005]. For each run the number of
populations, K, varied between 1 and 7, with a burn-in period of 100,000 and a run length of
500,000 iterations. At each value of K, runs were conducted multiple times to ensure the
consistency of the results.
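The permutation logic used to judge the significance of a fixation index can be illustrated with a small sketch. It rests on several assumptions that are ours rather than the study's: the estimator is a simple heterozygosity ratio, (H_T - H_S) / H_T, for a single locus rather than the Weir and Cockerham estimator implemented in Arlequin; the genotypes are hypothetical; and the default of 1,000 permutations is chosen only to keep the example fast.

```python
import random


def expected_het(alleles):
    """Expected heterozygosity, 1 - sum(p_i^2), for a list of allele copies."""
    n = len(alleles)
    counts = {}
    for allele in alleles:
        counts[allele] = counts.get(allele, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def simple_fst(pops):
    """(H_T - H_S) / H_T for one locus; pops is a list of genotype lists."""
    per_pop = [[a for genotype in pop for a in genotype] for pop in pops]
    total_n = sum(len(p) for p in per_pop)
    h_s = sum(len(p) * expected_het(p) for p in per_pop) / total_n
    h_t = expected_het([a for p in per_pop for a in p])
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_p_value(pops, n_perm=1000, seed=0):
    """Shuffle individuals among populations and compare to the observed value."""
    rng = random.Random(seed)
    observed = simple_fst(pops)
    pooled = [genotype for pop in pops for genotype in pop]
    sizes = [len(pop) for pop in pops]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if simple_fst(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical, strongly differentiated populations at one locus.
pop_a = [(9, 9)] * 8 + [(9, 10)] * 2
pop_b = [(10, 10)] * 8 + [(9, 10)] * 2
observed_fst, p_value = permutation_p_value([pop_a, pop_b])
```

Individuals, not single alleles, are shuffled here, so each permuted data set keeps the original genotypes and population sizes.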
Results and Discussion
Tests of HWE for each of the 28 STR markers, polymorphic information content, and
power of discrimination statistics for each locus examined across all populations have been
previously reported [Haberstick et al, submitted]. Tests for deviations from HWE for each of
the 28 markers within the six Northern Cameroon language groups are detailed elsewhere
(http://ibgwww.colorado.edu/genotyping_lab/NCLGP). Number of observed alleles and most
common alleles at each of the 28 loci along with the average gene diversity over the loci
within the six language groups are provided in Tables 2a, 2b, 2c, and 2d. For most loci, the most common allele (MCA) was shared across 2 to 6 language groups. Variation in the number of observed alleles, together with their heterozygosity values (0.4286-0.9643), suggests that the 28 STR loci characterized in these populations are highly polymorphic and that within-population variability is high, with a mean genetic diversity of 0.7387 [Table 3].
[Insert Tables 2a, 2b, 2c, 2d about here]
[Insert Table 3 about here]
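For readers unfamiliar with the diversity statistics summarized above, the following sketch computes observed heterozygosity and an unbiased gene diversity for a single locus. It assumes the common definition n/(n-1) × (1 - Σp²), with n the number of gene copies. Whether this matches the exact computation behind Table 3 is an assumption on our part, and the genotypes are hypothetical.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def gene_diversity(genotypes):
    """Unbiased gene diversity: n/(n-1) * (1 - sum p_i^2), n = gene copies."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    n = len(alleles)
    counts = Counter(alleles)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def mean_gene_diversity(loci):
    """Average gene diversity over a dict mapping locus name to genotypes."""
    return sum(gene_diversity(g) for g in loci.values()) / len(loci)


# Hypothetical genotypes for one locus in four individuals.
locus_a = [(9, 10), (9, 9), (10, 11), (9, 11)]
h_obs = observed_heterozygosity(locus_a)
h_exp = gene_diversity(locus_a)
h_mean = mean_gene_diversity({"locus_a": locus_a})
```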
In order to characterize the extent of the genetic structure among the six language groups studied, Fst values were calculated for each language group separately. Population
specific Fst, which measures the degree of population differentiation, ranged between 0.0201
and 0.0216 with an overall Fst value of 0.0208 and suggested little population differentiation.
Analysis of molecular variance results indicated that when allele frequencies were
analyzed without grouping the six populations, the highest fraction of variability was due to
within population differences (97.9%). Pairwise coancestry coefficients based on this model
were low and significant for four of the six language groups. As shown in Table 4, this
suggested that there is a slight but statistically significant differentiation between speakers of
the North-Central language groups (Gidar, Mina) and South-Central language groups
(Mambay, Peve). Though non-significant, the near-zero coancestry coefficients for the two
North-Western language groups (Hdi, Mafa) suggest that there are few restrictions upon
mating between these two populations.
[Insert Table 4 about here]
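The two summary figures reported above are consistent with one another, as a short arithmetic check shows. In an analysis without higher-level grouping, Fst corresponds to the share of variance lying among populations, so a within-population share of 97.9% implies an Fst close to the reported 0.0208. The sketch also converts Fst to a coancestry distance using D = -ln(1 - Fst), a commonly used form associated with Reynolds et al. That this is the exact variant computed in the study is an assumption, and the variable names are ours.

```python
import math

# Values quoted in the text.
within_population_share = 0.979
reported_fst = 0.0208

# Share of variance among populations when no grouping is imposed.
fst_from_amova = 1.0 - within_population_share


def reynolds_distance(fst):
    """Coancestry distance D = -ln(1 - Fst)."""
    return -math.log(1.0 - fst)


distance = reynolds_distance(reported_fst)
```

For values of Fst this small, the distance is nearly equal to Fst itself, which is why low pairwise coefficients translate directly into short genetic distances.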
To understand better how the between population variation was distributed, we conducted three hierarchical analysis of molecular variance tests. For these tests, language groups were arranged into (1) geography; North-Eastern (Mambay, Peve), North-Central
(Mina, Gidar), and North-Western (Mafa, Hdi), (2) linguistic family; Chadic (Gidar, Mafa, Hdi,
Mina, Peve) and Niger-Congo (Mambay); and (3) linguistic sub-family; Central/Biu Mandara
1 (Hdi, Mafa, Mina) and 2 (Gidar), Masa (Peve), Niger-Congo (Mambay). For each of the three hierarchical models, the genetic variance between groups ranged from -0.95 to 0.95
and was similar when arranged along geographic and linguistic sub-family membership. For
none of the three tests was the percentage of variation accounted for by the between groups
parameter statistically significant.
[Insert Table 5 about here]
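The three grouping schemes can be written down explicitly, together with the standard relations between variance components and fixation indices in a hierarchical analysis (Fct = σa²/σT², Fsc = σb²/(σb² + σc²), Fst = (σa² + σb²)/σT²). The groupings below are taken from the text. The variance components are hypothetical placeholders rather than the values in Table 5, and the function and variable names are ours.

```python
ALL_GROUPS = {"Gidar", "Mina", "Peve", "Mambay", "Hdi", "Mafa"}

GROUPINGS = {
    "geography": {
        "North-Eastern": ["Mambay", "Peve"],
        "North-Central": ["Mina", "Gidar"],
        "North-Western": ["Mafa", "Hdi"],
    },
    "linguistic_family": {
        "Chadic": ["Gidar", "Mafa", "Hdi", "Mina", "Peve"],
        "Niger-Congo": ["Mambay"],
    },
    "linguistic_sub_family": {
        "Central/Biu Mandara 1": ["Hdi", "Mafa", "Mina"],
        "Central/Biu Mandara 2": ["Gidar"],
        "Masa": ["Peve"],
        "Niger-Congo": ["Mambay"],
    },
}


def is_partition(grouping):
    """True if every language group appears in exactly one higher-level group."""
    members = [g for groups in grouping.values() for g in groups]
    return len(members) == len(ALL_GROUPS) and set(members) == ALL_GROUPS


def fixation_indices(var_among_groups, var_among_pops, var_within_pops):
    """Fixation indices from the three hierarchical variance components."""
    total = var_among_groups + var_among_pops + var_within_pops
    return {
        "Fct": var_among_groups / total,
        "Fsc": var_among_pops / (var_among_pops + var_within_pops),
        "Fst": (var_among_groups + var_among_pops) / total,
    }


# Hypothetical variance components, for illustration only.
indices = fixation_indices(0.01, 0.02, 0.97)
```

A useful sanity check on any such decomposition is that (1 - Fst) equals (1 - Fsc)(1 - Fct).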
Because the degree of genetic differentiation between these six groups was low, we
further investigated the genetic population structure using the Bayesian clustering algorithm
implemented in Structure. Results from a model specifying admixture with correlated allele
frequencies were indeterminate, as the likelihoods that individuals descended from two (K = 2;
log likelihood = -17,020.6), three (K = 3; -17,044.6), four (K = 4; -17,053.8), and five (K = 5;
-17,071.2) populations were similar. There was no support for each of the groups belonging to the same
genetic population (K =1; log likelihood = -17,378.2) or belonging to six different populations
(K = 6; -17,238.6). This suggested some genetic structuring within these six language
groups. Table 6 describes the proportion of individuals assigned to each of the K clusters.
As shown, across all levels of K no language group could be completely assigned to a single
group, which suggested a shared ancestry among these populations. Speakers from the
Gidar, Mina, and Peve groups evidenced higher rates of membership with each other than with Mambay and Hdi. The exception was Mafa, with whom individuals from each of the five other groups shared membership.
[Insert Table 6 about here]
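Summaries of the kind shown in Table 6 are typically obtained by averaging individual membership coefficients within each sampled group. The sketch below shows that averaging step only. The membership values are hypothetical and are not taken from Table 6, the function name is ours, and the sketch does not read Structure output files.

```python
def mean_membership(q_rows, labels):
    """Average per-individual cluster memberships within each labelled group."""
    sums, counts = {}, {}
    for row, label in zip(q_rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, q in enumerate(row):
            acc[k] += q
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


# Hypothetical membership coefficients for K = 2 (each row sums to 1).
q_rows = [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5], [0.7, 0.3]]
labels = ["Hdi", "Hdi", "Mafa", "Mafa"]
group_means = mean_membership(q_rows, labels)
```

When no group's mean membership approaches 1 for any single cluster, as in this toy example, the groups cannot be cleanly assigned to separate clusters.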
Lastly, to put these results in context, we compared the observed allele frequencies with those previously reported from Cameroon samples. Five of the STR loci characterized here were also genotyped in the Bamileke of the western plateau and the Ewondo of the central-southern areas of Cameroon [Destro-Bisol et al, 2000]. Table 7 provides the published allele frequencies and those of our sample. As shown, the number of observed
alleles for three loci (TH01, TPOX, vWA) was identical across all groups. For two loci
(D18S51 and D21S11), however, greater genetic diversity was observed in the Northern
Cameroon populations than in the Bamileke and Ewondo. In comparison with the Northern
Cameroon populations, all three groups shared the most common allele for two markers
(TH01 and D21S11) and completely differed for one (vWA). While interpretation is limited
without formal inclusion of the Destro-Bisol et al [2000] data, these results lend evidence to
the notion that there is a limited degree of genetic diversity among these eight groups.
[Insert Table 7 about here]
Conclusions
In this report, the allelic diversity for 28 STR loci in six discrete language populations
of Northern Cameroon is reported, many for the first time. A number of characteristics of
STR loci make them useful for the study of migration history, population substructure, and
controlling for the confounding effects of admixture in association-based studies of complex
traits [Barnholtz-Sloan et al, 2005; Jorde et al, 1997; Reddy et al, 2001; Reed and Tishkoff, 2006]. These features include a typically large range in allele sizes, high heterozygosity (Ho) values, relative abundance within the genome, and the relative ease with which STR loci can be characterized [Perez-Miranda et al, 2005]. While the use of autosomal STR loci is common among studies of world-wide migration patterns [Rosenberg et al, 2002; Bastos-
Rodrigues et al, 2006] and large numbers of markers have been characterized in limited numbers of African populations, there are few studies that have focused on Cameroon in general and Northern Cameroon specifically.
Based on previous linguistic study [Frajzyngier, in press; Frajzyngier et al, 2005;
Frajzyngier and Shay, 2002], we sought to elucidate the extent of genetic differentiation
among six language groups and determine the correlation between language group
membership and genetic substructure. The most significant inference that could be drawn
from these results was the lack of genetic differentiation among these populations, despite their being
highly differentiated linguistically. While small differences were detected between
populations, they were not along hypothesized lines. The fact that these populations have a
shared genetic substructure [alleles in one population were found in each of the remaining
five populations] suggests that these six groups descended from a common ancestor and
their divergence is somewhat recent.
In these samples, the within-population variation accounted for more than 97.0% of
the genetic diversity. This observation was consistent across hierarchical analyses based on
geography, language family, and linguistic sub-family. While not examined along linguistic
lines, similar estimates have been observed in previous studies based on microsatellite
markers [Adeyemo et al, 2005; Rosenberg et al, 2002] and insertion/deletion polymorphisms
[Bastos-Rodrigues et al, 2006]. One potential reason for this result could be that these six
groups live within a 300 kilometer radius of one another. Furthermore, there are no major
mountain ranges, bodies of water, or other landscape features that would prevent admixture. As described earlier, despite the linguistic differences, these groups have limited contact with one another in the form of trade and marriage partners.
Although the different methodological approaches employed here converge on the notion of limited population substructure among these groups, it is important to keep in mind a number of limitations. First, many of the STR loci characterized within language groups were out of HWE. While this suggests the effects of non-random mating, it violates an assumption of the clustering algorithm implemented in Structure [Pritchard et al, 2000]. As such, the proportion of membership reported here may be biased. Second, while our choice
of markers overlapped with one previous study [Destro-Bisol et al, 2000] and extended the
available genotypes for larger population based studies, their number may not have been
sufficiently informative. However, the number of loci characterized here exceeds many
studies of other human populations around the world. Third, we sampled a total of 180
individuals (30 speakers from each of six groups), which may have limited our ability to detect
the extent of population genetic substructure. Samples of 200 – 500 individuals per group or
more have been shown to be helpful in determining “clusteredness” [Bamshad et al, 2003;
Rosenberg et al, 2005]. We hope that our future efforts to understand the degree of
relationship between the six language groups examined here and population genetic
structure will address many of these limitations and expand to include the use of
mitochondrial DNA and Y-Chromosome haplotype information.
References
A.A. Adeyemo, G. Chen, Y. Chen, C. Rotimi, ‘Genetic structure in four West African population groups’. BMC Genetics 6 (2005), 1-9.
M.J. Bamshad, S. Wooding, W.S. Watkins, C.T. Ostler, M.A. Batzer, L.B. Jorde, ‘Human population genetic structure and inference of group membership’. American Journal of
Human Genetics 72 (2003), 578-589.
J.S. Barnholtz-Sloan, R. Chakraborty, T.A. Sellers, A.G. Schwartz, ‘Examining population stratification via individual ancestry estimates versus self-reported race.’ Cancer
Epidemiology, Biomarkers & Prevention 14 (2005) 1545-1551.
R. Boyd, In J. Bendor-Samuel and R.L. Hartell (eds.). The Niger-Congo languages: A classification and description of Africa’s largest language family. (Lanham, MD: University
Press of America, 1989), 178-215.
B. Budowle, T.R. Moretti, A.L. Baumstark, D.A. Defenbaugh, K.M. Keys, ‘Population data on the thirteen CODIS core short tandem repeat loci in African Americans, U.S. Caucasians,
Hispanics, Bahamians, Jamaicans, and Trinidadians.’ Journal of Forensic Sciences, 44
(1999), 1277-1286.
V.M. Černý, R. Hájek, J. Čmejla, J. Brůžek, R. Brdička, ‘mtDNA sequences of Chadic-speaking populations from northern Cameroon suggest their affinities with eastern Africa.’
Annals of Human Biology, 5 (2004) 554-569.
G. Destro-Bisol, I. Boschi, A. Cagila, S. Tofanelli, V. Pascali, G. Paoli, G. Spedini,
‘Microsatellite variation in Central Africa: an analysis of intrapopulation and interpopulation genetic diversity’. American Journal of Physical Anthropology 112 (2000) 319-337.
L. Excoffier, Analysis of population subdivision. Handbook of Statistical Genetics. Eds: D.J.
Balding, M. Bishop, C. Cannings (Chichester: John Wiley & Sons, 2001).
L. Excoffier, G. Laval, S. Schneider, ‘Arlequin (version 3.0): An integrated software package for population genetics data analysis,’ Evolutionary Bioinformatics, 1 (2005) 47-50.
D. Falush, M. Stephens, J.K. Pritchard, ‘Inference of population structure using multilocus
genotype data: Linked loci and correlated allele frequencies,’ Genetics, 164 (2003) 1567-
1587.
Z. Frajzyngier, A Grammar of Gidar. (Frankfurt: Peter Lang, in press).
Z. Frajzyngier, E. Johnston, A. Edwards, A Grammar of Mina. (Berlin/New York: Mouton de
Gruyter, 2005).
Z. Frajzyngier, E. Shay, A Grammar of Hdi. (Berlin/New York: Mouton de Gruyter, 2002).
S.W. Guo, E.A. Thompson, ‘Performing the exact test of Hardy-Weinberg proportion for
multiple alleles,’ Biometrics, 48 (1992) 361-372.
J.C. Glaubitz, ‘CONVERT: A user-friendly program to reformat diploid genotypic data for commonly used population genetic software packages,’ Molecular Ecology Notes, 4 (2004)
309-310.
B.C. Haberstick, G.L. Stetler, B. Pemberton, E. Johnston, J.K. Hewitt, Z. Frajzyngier, E.
Shay, A. Smolen, ‘Northern Cameroon population data on 28 STR loci,’ Journal of Forensic
Sciences, submitted.
C. Hamm, ‘A sociolinguistic survey of the Mambay language of Chad and Cameroon.’ SIL
Electronic Survey Reports SILESR, (2002) 39.
B.E. Krenke, A. Tereba, S.J. Anderson, E. Buel, S. Culhane, C.J. Finis, C.S. Tomsey, J.M.
Zachetti, A. Masibay, D.R. Rabbach, E.A. Amiott, C.J. Sprecher, ‘Validation of a 16-Locus
Fluorescent Multiplex System,’ Journal of Forensic Sciences, 47 (2002) 773-85.
M. Mizutani, T. Yamamoto, K. Torii, H. Kawase, T. Yoshimoto, R. Uchihi, M. Tanaka, K.
Tamaki, Y. Katsumata, ‘Analysis of 168 short tandem repeat loci in the Japanese
population, using a screening set for human genetic mapping,’ Journal of Human Genetics
46 (2001) 448-455.
P. Newman, ‘Chadic classification and reconstructions,’ Afroasiatic Linguistics, 51 (1977) 1-
42.
A.M. Podlewski, La dynamique des principales populations du Nord-Cameroun. (Yaoundé:
Institut de Recherches Scientifiques du Cameroun, 1965).
J.K. Pritchard, M. Stephens, P. Donnelly, ‘Inference of population structure using multilocus genotype data,’ Genetics, 155 (2000) 945-959.
F.A. Reed, S.A. Tishkoff, ‘African human diversity, origins and migrations,’ Current
Opinion in Genetics & Development, 16 (2006) 597-605.
J. Reynolds, B.S. Weir, C.C. Cockerham, ‘Estimation of the coancestry coefficient: basis for a short-term genetic distance,’ Genetics, 105 (1983) 767-779.
N.A. Rosenberg, S. Mahajan, S. Ramachandran, C. Zhao, J.K. Pritchard, M.W. Feldman,
‘Clines, clusters, and the effect of study design on the inference of human population
structure,’ PLoS Genetics, 6 (2005) 660-671.
G. Spedini, G. Destro-Bisol, S. Mondovi, L. Kaptué, L. Taglioli, G. Paoli, ‘The peopling of Sub-Saharan Africa: The case study of Cameroon,’ American Journal of Physical Anthropology, 110 (1999) 143-162.
G. Spedini, S. Mondovi, G. Paoli, G. Destro-Bisol, ‘Biological and cultural contradictions? A
reply to MacEachern,’ American Journal of Physical Anthropology, 114 (2001) 361-364.
S.A. Tishkoff, S.M. Williams, ‘Genetic analysis of African populations: Human evolution and
complex disease,’ Nature Reviews Genetics, 3 (2004) 611-621.
A. Urquhart, N.J. Oldroyd, C.P. Kimpton, P. Gill, ‘Highly discriminating heptaplex short
tandem repeat PCR system for forensic identification,’ Biotechniques, 18 (1995) 116-121.
B.S. Weir, Genetic Data Analysis II: Methods for Discrete Population Genetic Data.
(Sinauer Associates, Inc., Sunderland, MA, USA, 1996).
B.S. Weir, C.C. Cockerham, ‘Estimating F-statistics for the analysis of population structure,’ Evolution, 38 (1984) 1358-1370.
Table 1a. Primer sequences and concentrations for IBG-Hvar1 12-plex PCR
Locus (Het) †     Size range (base pairs)  Primer sequences (5’ to 3’) and dye labels ‡   Primer concentration (µM)
Amelogenin¹       103-109                  F NED™-CCCTGGGCTCTGTAAAGAATAGTG                0.08
                                           R ATCAGAGCTTAAACTGGGAAGCTG                     0.08
D2S1384 (0.67)    121-165                  F NED™-AATAGAGGGCCCTTGCTTAA                    0.60
                                           R TTTGGGATAAAAGGTATTTTGC                       0.60
D13S796 (0.77)    136-176                  F 6FAM™-CATGGATGCAGAATCACAG                    0.20
                                           R TCATCTCCCTGTTTGGTAGC                         0.20
D1S1679 (0.84)    136-176                  F HEX™-GCCATCAAGAAAACTAGACTGC                  0.60
                                           R ACCATGGTACTCAGCAGTGC                         0.60
D8S1119 (0.81)    170-200                  F NED™-TCAAAGCAGGTTACTCTCACG                   1.40
                                           R TAAATATGGGAAGGCAGCAG                         1.40
D4S1627 (0.69)    177-201                  F 6FAM™-AGCATTAGCATTTGTCCTGG                   0.30
                                           R GACTAACCTGACTCCCCCTC                         0.30
D9S301 (0.75)     205-237                  F 6FAM™-AGTTTTCATAACACAAAAGAGAACA              0.50
                                           R ACCTAAATGTTCATCAAAAGAGG                      0.50
D3S1766 (0.86)    200-228                  F HEX™-ACCACATGAGCCAATTCTGT                    0.75
                                           R ACCCAATTATGGTGTTGTTACC                       0.75
D20S481 (0.81)    215-249                  F NED™-TGGGTTATGAGTGCACACAG                    0.40
                                           R AACAGCAAAAAGACACACAGC                        0.40
D7S1808 (0.81)    252-280                  F 6FAM™-CAGAACAAACAAATGGGGAG                   0.50
                                           R CCAAATAAGACTCAGGACGC                         0.50
D15S652 (0.81)    282-312                  F NED™-GCAGCACTTGGCAAATACTC                    1.40
                                           R CATCACTCAAGGCTCAAGGT                         1.40
D6S1277 (0.69)    278-322                  F HEX™-ACACTGCAGGGTAAGACAGC                    0.60
                                           R AAGACAGTGTCTAAGCTGTCACA                      0.60
† (Het), heterozygosity values from http://mp.invitrogen.com/resources/apps/mappairs/.
‡ Primer sequences from the GDB Human Genome Database [www.gdb.org].
¹ Primer sequences from [Krenke et al, 2002].
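One property that can be read off Table 1a is that loci sharing a dye label occupy non-overlapping size ranges, which is the usual requirement for resolving fragments in a single multiplex run. The following Python sketch checks that property for the size ranges listed above. It is an illustration only, not part of the study's protocol: the function name `same_dye_overlaps` is hypothetical, and the dye for D9S301 is taken to be 6FAM.

```python
from itertools import combinations

# (locus, dye, min_bp, max_bp) transcribed from Table 1a.
# Assumption: the D9S301 forward primer carries the 6FAM label.
PANEL = [
    ("Amelogenin", "NED", 103, 109),
    ("D2S1384", "NED", 121, 165),
    ("D13S796", "6FAM", 136, 176),
    ("D1S1679", "HEX", 136, 176),
    ("D8S1119", "NED", 170, 200),
    ("D4S1627", "6FAM", 177, 201),
    ("D9S301", "6FAM", 205, 237),
    ("D3S1766", "HEX", 200, 228),
    ("D20S481", "NED", 215, 249),
    ("D7S1808", "6FAM", 252, 280),
    ("D15S652", "NED", 282, 312),
    ("D6S1277", "HEX", 278, 322),
]


def same_dye_overlaps(panel):
    """Return pairs of loci that share a dye and whose size ranges overlap."""
    clashes = []
    for (l1, d1, lo1, hi1), (l2, d2, lo2, hi2) in combinations(panel, 2):
        # Two closed intervals overlap when each starts before the other ends.
        if d1 == d2 and lo1 <= hi2 and lo2 <= hi1:
            clashes.append((l1, l2))
    return clashes


clashes = same_dye_overlaps(PANEL)
print(clashes or "no same-dye overlaps")
```

Loci with different dyes may share a size window (for example D13S796 and D1S1679, both 136-176 bp), since they are separated by colour rather than by fragment length.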
Table 1b. Primer sequences and concentrations for IBG-Hvar2 14-plex PCR
Locus (Het) †     Size range (base pairs)  Primer sequences (5’ to 3’) and dye labels ‡   Primer concentration (µM)
Amelogenin        103-109                  F 6FAM™-CCCTGGGCTCTGTAAAGAATAGTG               0.10
                                           R ATCAGAGCTTAAACTGGGAAGCTG                     0.10
D3S1358 (0.79)    101-147                  F HEX™-ACTGCAGTCCAATCTGGGT                     0.20
                                           R ATGAAATCAACAGAGGCTTGC                        0.20
D5S818 (0.70)     119-155                  F GGTGATTTTCCTCTTTGGTATCC                      0.25
                                           R NED™-AGCCACAGTTTACAACATTTGTATCT              0.25
vWA (0.81)        122-182                  F 6FAM™-CCCTAGTGGATGATAAGAATAATCAGTATG         0.18
                                           R GGACAGATGATAAATACATAGGATGGATGG               0.18
D4S2639¹ (0.88)   152-192                  F HEX™-AAGGTTCCAGGACACATTCA                    0.25
                                           R CTTGAAAGCTCCATAATCATACG                      0.25
D13S317 (0.79)    157-201                  F ATTACAGAAGTCTGGGATGTGGAGGA                   0.30
                                           R NED™-GGCAGCCCAAAAAGACAGA                     0.30
D9S934² (0.56)    198-238                  F HEX™-TTTCCTAGTAGCTCAAGTAAAGAGG               0.25
                                           R AGACTTGGACTGAATTACACTGC                      0.25
D8S1179 (0.82)    203-251                  F ATTGCAACTTATATGTATTTTTGTATTTCATG             0.50
                                           R FAM™-ACCAAATTGTGTTCATGAGTATAGTTTC            0.50
D7S820 (0.82)     211-251                  F NED™-ATGTTGGTCAGGCTGACTATG                   0.60
                                           R GATTCCACATTTATCCTCATTGAC                     0.60
TPOX (0.64)       258-294                  F GCACAGAACAGGCACTTAGG                         0.40
                                           R 6FAM™-CGCTCAAACGTGAGGTTG                     0.40
D16S539 (0.75)    264-304                  F GGGGGTCTAAGAGCTTGTAAAAAG                     0.40
                                           R NED™-GTTTGTGTGTGCATCTGTAAGCATGTATC           0.40
D20S470³ (0.94)   161-313                  F HEX™-CCTTGGGGGATATAGCCTAA                    0.25
                                           R TGAGTGACAGAGTGATACCATG                       0.25
CSF1PO (0.72)     291-331                  F NED™-CCGGAGGTAAAGGTGTCTTAAAGT                0.25
                                           R ATTTCCTGTGTCAGACCCTGTT                       0.25
D15S657⁴ (0.72)   332-360                  F 6FAM™-TCTACATTGGACAGAAATGGG                  0.25
                                           R GATACACATTCTGATTCATGCG                       0.25
† (Het), heterozygosity values from http://mp.invitrogen.com/resources/apps/mappairs/.
‡ Primer sequences from [Krenke et al, 2002].
¹ Replaces TH01; primer sequences from the GDB Human Genome Database [www.gdb.org].
² Replaces D21S11; primer sequences from the GDB Human Genome Database.
³ Replaces D18S51; primer sequences from the GDB Human Genome Database.
⁴ Replaces FGA; primer sequences from the GDB Human Genome Database.
Table 1c. Primer sequences and concentrations for IBG-Hvar3 5-plex PCR
Locus (Het) †     Size range (base pairs)  Primer sequences (5’ to 3’) and dye labels ‡   Primer concentration (µM)
Amelogenin        103-109                  F 6FAM™-CCCTGGGCTCTGTAAAGAATAGTG               0.10
                                           R ATCAGAGCTTAAACTGGGAAGCTG                     0.10
TH01 (0.77)       152-196                  F HEX™-GTGGGCTGAAAAGCTCCCGATTAT                0.60
                                           R GTGATTCCCATTGGCCTGTTCCTC                     0.60
D21S11 (0.84)     203-261                  F HEX™-ATATGTGAGTCAATTCCCCAAG                  0.50
                                           R TGTATTAGTCAATGTTCTCCAG                       0.50
D18S51¹ (0.88)    262-342                  F HEX™-CAAACCCGACTACCAGCAAC                    0.25
                                           R GAGCCATGTTCATGCCACTG                         0.25
FGA (0.86)        308-464                  F 6FAM™-GGCTGCAGGGCATAACATTA                   0.20
                                           R ATTCTATGACTTTGCGCTTCAGGA                     0.20
† (Het), heterozygosity values from http://mp.invitrogen.com/resources/apps/mappairs/.
‡ Primer sequences from the GDB Human Genome Database [www.gdb.org].
¹ Primer sequences from Urquhart et al [1995].
Table 2a. Allele diversity at 7 of 28 STR loci describing the extent of variation within six language groups of Northern Cameroon.
            CSF1PO ²         TH01 ²           TPOX ²           D5S818 ²         D7S820 ²         D13S317 ²        D16S539 ²
Population  Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA
Gidar       7        12      6        7       7        11      7        13      9        11      7        11      7        9
Mina        7        12      6        7       7        8,9     7        11,13   9        11      8        11      6        11
Peve        8        11      5        7       7        9       7        12      7        11      6        11      7        11
Mambay      8        12      6        7       7        9       8        12      8        11      8        10      8        11
Hdi         7        12      6        7       7        8,9     7        12,13   9        11      9        10,11   7        11,12
Mafa        6        11      5        7       7        9       7        12      7        11      7        10      8        12
Note: ² , CODIS marker; Alleles, # of observed alleles; MCA, most common allele(s).
Table 2b. Allele diversity at 7 of 28 STR loci describing the extent of variation within six language groups of Northern Cameroon.
            FGA ²            vWA ²            D3S1358 ²        D18S51 ²         D21S11 ²         D8S1179 ²        D8S1119
Population  Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA
Gidar       14       21,23   7        16      6        16      8        17      12       28      8        14      9        5,9
Mina        12       23      6        16      5        17      12       17      12       28      9        13,15   8        9
Peve        11       23,24   7        15      4        16      11       17      11       28      5        14      7        9
Mambay      12       22      8        17      4        16      10       17      14       30,31   7        14      6        4
Hdi         16       22      6        15,16   5        16      8        16,18   12       30      6        14      9        4,5
Mafa        13       23.2    7        16,17   6        15,16   12       17      11       29      6        13      7        4
Note: ² , CODIS marker; Alleles, # of observed alleles; MCA, most common allele(s).
Table 2c. Allele diversity at 7 of 28 STR loci describing the extent of variation within six language groups of Northern Cameroon.
            D1S1679          D2S1384          D13S796          D4S1627          D4S2639          D9S301           D3S1766
Population  Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA
Gidar       11       8,9     5        8       7        8       6        8       9        6       10       7       10       6
Mina        9        8       7        8,9     9        5,7     8        8       9        5       9        7       4        6
Peve        8        7       5        8       9        7       7        6,7     7        6       9        7       7        6
Mambay      9        8       8        6       10       3,5     6        7,8     8        6       9        7       6        6
Hdi         11       8       9        6,7     7        7       7        6,7,8   9        6       9        7       6        5,6
Mafa        10       8       7        6       7        4,5     6        7,8     10       6       7        7,8     4        6,7
Note: Alleles, # of observed alleles; MCA, most common allele(s).
Table 2d. Allele diversity at 7 of 28 STR loci describing the extent of variation within six language groups of Northern Cameroon.
            D9S934           D20S481          D7S1808          D20S470          D6S1277          D15S652          D15S657
Population  Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA
Gidar       4        6       9        7       6        5       10       4,12    9        6       7        2       7        3
Mina        5        6       8        6,7     7        5       10       9       8        5       7        2,5     4        3
Peve        4        6       8        8       7        3,4,5   9        11      6        4       9        2       6        3
Mambay      5        6       8        3       8        6       9        9,10    8        6       9        4       5        3
Hdi         4        6       7        3       7        5,6     9        12      8        6       7        4,5     5        3
Mafa        6        5       6        5       6        6       10       9,11    8        6       8        2       5        3
Note: Alleles, # of observed alleles; MCA, most common allele(s).
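The two quantities reported in Tables 2a-2d can be derived from raw genotype calls by simple counting. The Python sketch below shows one way to do so, keeping ties for the most common allele in the same way the tables list entries such as "8,9". The genotypes are invented for illustration and are not data from this study; the function name `allele_summary` is likewise hypothetical.

```python
from collections import Counter


def allele_summary(genotypes):
    """Return (number of observed alleles, most common allele(s)) for one locus.

    genotypes: list of (allele_a, allele_b) pairs, one pair per individual.
    """
    # Pool both alleles of every individual and count each distinct allele.
    counts = Counter(allele for pair in genotypes for allele in pair)
    top = max(counts.values())
    # Keep every allele that reaches the top count, so ties are reported.
    most_common = sorted(a for a, n in counts.items() if n == top)
    return len(counts), most_common


# Hypothetical repeat-number genotypes for five individuals at one locus.
example = [(11, 12), (12, 12), (10, 11), (11, 13), (9, 9)]
n_alleles, mca = allele_summary(example)
print(n_alleles, ",".join(str(a) for a in mca))
```

In this invented sample five distinct alleles are observed, and alleles 11 and 12 tie as the most common, which would be tabulated as "11,12".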
Table 3. Genetic diversity calculated from 28 STR markers.
Population  Genetic diversity index (S.E.)
Gidar       0.7413 (0.386)
Mina        0.7500 (0.376)
Peve        0.7320 (0.377)
Mambay      0.7502 (0.376)
Hdi         0.7166 (0.381)
Mafa        0.7423 (0.377)
Note: S.E., standard error (+/-).
Table 4. Pairwise coancestry coefficients between populations and their significances. ²
Populations  1      2      3      4      5      6
1. Gidar            -      +      +      +      +
2. Mina      0.006         +      +      +      +
3. Peve      0.015  0.007         +      +      +
4. Mambay    0.034  0.039  0.031         -      -
5. Hdi       0.020  0.028  0.017  0.001         -
6. Mafa      0.035  0.041  0.033  0.003  0.002
² Significance of Fst values is shown above the leading diagonal. Population pairwise Fst values are shown below the diagonal. Genetic distances are based on 110 permutations.
Table 5. Analysis of Molecular Variance (AMOVA)
Grouping             Source of variation                  df    Sum of squares  Variance components  Percentage of variation
Geography            Between populations                  2     36.62           0.596                0.95
                     Between population within language   3     33.47           0.832                1.32
                     Within populations                   354   2181.85         6.163                97.73
Language             Between populations                  1     15.14           0.011                0.18
                     Between population within language   4     54.95           0.127                2.02
                     Within populations                   359   2181.85         6.302                97.81
Language sub-family  Between populations                  3     36.56           -0.059               -0.95
                     Between population within language   2     33.53           0.179                2.84
                     Within populations                   354   2181.85         6.163                98.10
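Under the usual AMOVA convention, each entry in the "Percentage of Variation" column is a variance component divided by the sum of the components at all levels. The Python sketch below applies that convention to the Language sub-family grouping in Table 5. It is a minimal illustration rather than the software used in the study, and the function name `percent_of_variation` is hypothetical. Because the tabulated components are rounded to three decimals, the recomputed percentages are expected to agree with the table only approximately.

```python
def percent_of_variation(components):
    """Convert variance components into percentages of the total variance."""
    total = sum(components.values())
    return {source: 100.0 * value / total for source, value in components.items()}


# Variance components for the "Language sub-family" grouping in Table 5.
sub_family = {
    "Between populations": -0.059,
    "Between population within language": 0.179,
    "Within populations": 6.163,
}

pct = percent_of_variation(sub_family)
for source, value in pct.items():
    print(f"{source}: {value:.2f}%")
```

A slightly negative component, as in the first row of this grouping, can arise in AMOVA estimates and is usually read as no detectable variance at that level.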
Table 6. Proportion of membership of each of the six language groups in each inferred cluster (1 to 6) for K = 2 to K = 6, assuming admixture and correlated allele frequencies.
K      Population  1      2      3      4      5      6
K = 2  Gidar       0.063  0.937
       Mina        0.060  0.940
       Peve        0.074  0.926
       Mambay      0.853  0.147
       Hdi         0.723  0.277
       Mafa        0.843  0.157
K = 3  Gidar       0.065  0.382  0.554
       Mina        0.053  0.250  0.697
       Peve        0.069  0.313  0.619
       Mambay      0.692  0.252  0.056
       Hdi         0.600  0.248  0.152
       Mafa        0.721  0.193  0.085
K = 4  Gidar       0.046  0.261  0.408  0.286
       Mina        0.039  0.232  0.488  0.241
       Peve        0.051  0.520  0.229  0.200
       Mambay      0.680  0.053  0.060  0.207
       Hdi         0.575  0.118  0.129  0.178
       Mafa        0.704  0.067  0.077  0.152
K = 5  Gidar       0.259  0.387  0.038  0.037  0.279
       Mina        0.224  0.481  0.039  0.027  0.229
       Peve        0.513  0.212  0.049  0.046  0.180
       Mambay      0.055  0.061  0.384  0.320  0.179
       Hdi         0.107  0.122  0.259  0.360  0.152
       Mafa        0.066  0.068  0.486  0.267  0.112
K = 6  Gidar       0.033  0.194  0.031  0.295  0.231  0.215
       Mina        0.034  0.159  0.022  0.385  0.210  0.190
       Peve        0.041  0.200  0.038  0.224  0.342  0.154
       Mambay      0.374  0.089  0.305  0.045  0.063  0.125
       Hdi         0.242  0.125  0.344  0.075  0.092  0.122
       Mafa        0.473  0.083  0.246  0.052  0.062  0.083
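One simple way to read Table 6 is to assign each language group to the cluster in which it has the largest membership proportion. The Python sketch below does this for the K = 2 rows. It is an illustrative reading aid, not a step of the clustering analysis itself, and the function name `dominant_cluster` is hypothetical.

```python
# Membership proportions for K = 2, transcribed from Table 6
# (cluster 1, cluster 2).
K2 = {
    "Gidar": (0.063, 0.937),
    "Mina": (0.060, 0.940),
    "Peve": (0.074, 0.926),
    "Mambay": (0.853, 0.147),
    "Hdi": (0.723, 0.277),
    "Mafa": (0.843, 0.157),
}


def dominant_cluster(memberships):
    """Map each population to the 1-based index of its largest proportion."""
    return {
        pop: max(range(len(props)), key=props.__getitem__) + 1
        for pop, props in memberships.items()
    }


# Group the populations by their dominant cluster.
groups = {}
for pop, cluster in dominant_cluster(K2).items():
    groups.setdefault(cluster, []).append(pop)

for cluster in sorted(groups):
    print(cluster, groups[cluster])
```

At K = 2 this places Mambay, Hdi, and Mafa in cluster 1 and Gidar, Mina, and Peve in cluster 2. At higher K the largest proportions are often well below one half, so a hard assignment of this kind discards much of the admixture information in the table.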
Table 7. Allele diversity for 5 STR loci in Northern Cameroon, Bamileke, and Ewondo language groups.
               TH01             TPOX             vWA              D18S51           D21S11
Population     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA     Alleles  MCA
N. Cameroon ²  6        7       7        9       8        15,16   16       17      21       28
Bamileke ³     6        7       7        8       9        15,16   10       15,17   13       28
Ewondo ³       5        7       6        8       9        17      11       15      14       28
Note: ² , represents the overall allele frequencies across the Gidar, Mina, Peve, Mambay, Hdi, and Mafa language groups examined here; ³ , reported in Destro-Bisol et al [2000]; Alleles, # of observed alleles; MCA, most common allele(s).
Figure Captions
Figure 1. Linguistic map of Northern Cameroon [Ethnologue].