Copyright 0 1997 by the Genetics Society of America

The Geographic Distribution of Human Y Variation

M. F. Hammer,* A. B. Spurdle? T. Karafet,* M. R. Banner,* E. T. Wood,* A. Novelletto,’ P. Malaspina,: R. J. Mitchell,§ S. Horai,** T. Jenkinst and S. L. Zeguratt

*Laboratory of Molecular Systematics and Evolution and ttDepartment of Anthropology, University of Arizona, Tucson, Arizona 85721, tDepartment of Human Genetics, South Afiican Institute for Medical Research and the University of Witwatersrand, Johannesburg 2000, South Africa, ‘Departimento di Biologia, Universita degli Studi “Tor Vergata”, Rome 001 73, Italy, §Department of Genetics and Human Variation, La Trobe University, Melbourne, Victoria 3038, Australia, and **Department of Human Genetics, National Institute of Genetics, Mishima 41 1, Japan Manuscript received July 7, 1996 Accepted for publication November 27, 1996

ABSTRACT We examined variation on the nonrecombining portion of the human Y chromosome to investigate human evolution during the last 200,000 years. The Y-specific polymorphic sites included the Y Alu insertional polymorphism or “YAP” element (DYS287), the poly(A) tail associated with theYAF’ element, three point mutations in close association with the YAP insertion site, an A-G polymorphic transition (DYS271), and a tetranucleotide microsatellite (DYSl9). Globalvariation atthe fivebi-allelic sites (DYS271, DYS287, and the three point mutations) gave rise to five “YAP haplotypes” in 60 populations from Africa, Europe, Asia, Australasia, and the New World (n = 1500). Combining the multi-allelic variation at the microsatellite loci (poly(A) tail and DYS19) with the YAP haplotypes resulted in a total of 27 “combination haplotypes”. All five of the YAP haplotypes and 21 of the 27 combination haplotypes were found in African populations, which had greater haplotype diversity than did populations from other geographical locations. Only subsets of the five YAP haplotypes were found outside of Africa. Patterns of observed variation were compatible with a variety of hypotheses, including multiple human migrations and range expansions.

ENETIC data have played a key role in current largest nonrecombining region in the , G debates about the origin of modern humans. In the male-specific portion of the Y chromosome. Al- particular, human mitochondrial DNA data (CANN et though the firstY-linked restriction fragment length al. 1987; VIGILANTet al. 1991; STONEKING1993; PENNY polymorphisms were identified in 1985 (CASANOVA et et al. 1995) have been interpreted to support “Recent al. 1985;LUCOTTE and NGO1985), the paucity of known Out of Africa” models (MINUGH-PURVIS1995) rather variation on the Y chromosome throughout the late than the “Multiregional Evolution” model (FRAYERet 1980’s limited the role of this chromosome in testing al. 1993). The realization that mtDNA represents a sin- hypotheses about the origin and migration patterns of gle has led to the suggestion that accurate infer- human populations. However, recent methodological ences about human population history require the ex- advances in the detection of both single nucleotide site amination of many loci (STONEKING1993; SZATHMARY and microsatellite variation on the Y chromosome 1993; TEMPLETON1993). Indeed, extensive batteries of (HAMMER 1995;JOBLING and TYLER-SMITH 1995; WHIT- multilocus data have been used to portray population FIELD et al. 1995; UNDERHILL et al. 1996) have facilitated affinities within Homo sapiens (BATZERet al. 1994; BOW- studies encompassing such diverse topics as paternal COCK et al. 1994; MOUNTAINand CAVALLI-SFORZA1994). migration at the continentallevel (TORRONIet al. 1994a; However, as TEMPLETON(1993, 1996) has pointed out, HAMMERand HORAI1995; KARAFET et al. 1997; UN- there have been no critical tests of the two major com- DERHILL et al. 1996), the demographichistories of single peting human origin(s) models. Moreover, there are populations and the genetic affinities of closely related inherent advantages in studying genetic systems without groups (ROEWERet al. 1996),the number of male recombination. For instance, they provide opportuni- founders in isolated populations (NYSTUENet al. 1995), ties to (1) determine character state polarity, (2) con- levels of population admixture (HAMMERand HORAI struct trees showing the phylogenetic history of individ- 1995; KARAFET et al. 1997), andmale and female repro- ual haplotypes, and (3) estimate the time frame for the ductive patterns (LUCOTTEet al. 1994; SALEMet al. branching events in the trees. 1996). Over the last decadeattention has turned to the Although our paper is not concerned with testing hypotheses associated withRecent Out of Africamodels or Multiregional Evolution, it does address possible Corresponding author: Michael Hammer, Department EEB, Biosci- ences West, University of Arizona, Tucson, AZ 85721. early and subsequenthuman migrations from both E-mail: [email protected] global and regional perspectives. Mitochondrial DNA

Genetics 145: 787-805 (March, 1997) 788 M. I?. Hammer et al.

TABLE 1 YAP haplotype and DYSl9 allele frequencies in 60 populations (n = 1500)

~~~ YAP haplotype DYSl9 allele Population n 1 23 45ZABc DE Khoisan (KHO) Sekele 34 32 24 0 12 32 9 1515 12 37 12 Tsumkwe 31 51 42 0 3 3 27 16 3 19 19 16 Nama 13 62 8 0 15 15 0 23 8 23 15 31 Pygmies (PYG) Mbuti 14 50 0 14 0 36 7 2921 21 7 14 Biaka 14 42 000 58 0 0 0 21 36 43 West Africans (WAF) Gambians" 48 15 0 17 10 58 0 8 14 29 29 19 Nigerians" 8 0 0 13 0 87 0 0 63 13 13 13 East Africans (EAF) Bagandans 29 10 0 10 0 80 0 038 13 28 21 Kenyans" 18 6 0 11 0 83 0044 17 33 6 Ethiopians" 3 67 0 0 33 0 0 033 33 33 0 East Bantus (EBA) Nguni" 41 22 0 15 2 61 0049 17 29 5 Zulu 28 21 0 18 4 57 0 039 22 32 7 Ndebele 7 14 0 0 0 86 0 0 0 71 29 0 Xhosa 4 25 0 25 0 50 0 050 25 25 0 Swazi 2 50 0 0 050 0 0 0 100 0 0 Sotho/Tswana" 37 30 05 5 60 0 5 8 59 19 9 Sotho 15 47 0 7 33 13 0 13 60 14 13 0 Tswana 13 15 08 0 77 0 0 8 61 23 8 Pedi 9 22 000 78 0 0 0 56 22 22 Tsonga 6 50 0 0 33 17 0 17 0 66 17 0 Venda 2 50 0 0 0 50 0 0 0 50 50 0 West Bantus (WBA) Herero/Himba" 29 28 0 0 0 72 0 0 38 14 34 14 Herero 27 26 000 74 0 034 15 36 15 Himba 2 50 0 0 0 50 0 0 50 50 0 0 Ambo 26 4 0 8 8 80 0 853 12 15 12 Enigmatic Southern Africans (ESA) Dama 22 21 59 5 60 0 5 40 36 5 14 Lemba 34 67 0 3 6 24 0 624 61 9 0 North Africans (NAF) Egyptians" 93 51 0 1 46 2 1 1921 45 12 2 West Asians (WAS) Saudi Arabians" 22 91 005 5 0 523 55 18 United Arab Emirates" 20 90 0 0 5 5 0 515 65 15 Omanis" 11 100 00 0000 36 64 0 Iranians" 5 80 00 0 20 0080 20 0 East Asians (US) Japanese" 132 58 0 42 0 0 0 8 3 49 25 15 Aomoris 26 61 0 39 0 0 077 55 24 7 Shizuokans 61 67 0 33 0 0 0 8 2 57 21 12 Okinawans 45 45 0 55 0 0 072 36 32 23 Tibetans" 30 53 0 47 0 0 0067 23 10 0 Koreans" 27 100 0 0 0 0 0 0 26 22 41 11 Taiwanese" 76 99 0 1 0 0 1 438 28 25 4 South Chinese" 48 100 000 0 0 452 10 25 8 South Asians (SAS) Indians" 39 100 000 0 0 051 23 21 5 Southeast Asians" 47 100 0000 0 2 7 55 23 13 Vietnamese 13 100 0 0 0 0 0 8 8 46 23 15 Philippinos 10 100 0 0 0 0 0060 10 10 20 Malaysians 10 100 0 0 0 0 0 0 0 70 10 20 Laotians 7 100 0 0 0 0 0 0 0 43 57 0 Indonesians 4 100 0 0 0 0 000 50 50 0 Cambodians 3 100 0 0 0 0 0033 67 0 -0 Global Human Y Haplotype Variation 789

TABLE 1 Continued

YAP haplotype DYSl9 allele Population n 1 23 4 5ZAB C DE South Europeans (SEU) Greeks" 75 83 0 0 25 0 0 22 27 37 13 1 Mainland Greeks" 62 42 0 0 38 0 0 31 38 17 12 2 Island Greeks'' 82 17 0 0 18 0 0 12 35 41 12 0 Cretans 92 24 0 0 8 0 0 1233 38 17 0 Italians" 86 208 0 1 13 0 0 14 37 31 10 8 General Italians" 52 83 0 0 17 0 0 19 46 23 6 6 Sardinians 55 93 0 0 7 0 0 7 25 33 15 20 Calabrians 90 29 0 0 10 0 0 741 45 4 3 Venetians 89 27 0 0 11 0 0 1126 41 19 3 Puglians 21 81 0 0 19 0 0 2429 38 9 0 Sicilians 79 24 0417 0 0 21 25 42 4 8 North Europeans (NEU) Germans" 93 30 0 0 7 0 0 730 50 10 3 British" 98 43 0 0 2 0 0 12 63 23 2 0 South African Europeans" 95 20 0 0 5 0 0 5 60 15 15 5 Australasians (AUS) Australian Aboriginal People" 98 43 0 0 20 0 0 23 35 33 9 Great Sandy Desert 100 36 0000 0 036 17 36 11 Western Australians" 7 86 0 0 14 0 0 0 57 29 14 0 Northern Australians" 100 19 0 0 0 0 0 026 32 11 32 Papua New Guineans 100 48 0 0 000 0 46 33 15 6 Native Americans (NAM) Navajos 100 47 0000 0 57 32 11 0 0 Frequencies were rounded off to the nearest percent. As a result some haplotype rows do not sum to 100%. Composite samples. has been used to trace maternally based genetic trails A-G transition (SEIELSTADet al. 1994) and the DYS19 from Asia both to the Pacific (MELTONet al. 1995) and microsatellite (ROEWERet al. 1992) loci was investigated. to the Americas (MERRIWETHERet al. 1995). Likewise, The combination of data from these loci with the YAP Ychromosome (paternal) genetic trails have been used haplotypes resulted in a total of 27 "combination" hap to test hypotheses about the peopling of Japan (HAM- lotypes and formed the basis for our exploration of MER and HORAI1995) and to formulate theories about global Y chromosome variation. the peopling of the Americas (WETet al. 1997; UN- DERHILL et al. 1996). Unfortunately, there has been only MATERIALSAND METHODS one published study that qualifies as a global survey of Sample composition: We analyzed a total of males human Y chromosome haplotype variation, JOBLING 1500 from 60 populations. These populations were categorized into and TYLER-SMITH'S(1995) excellent review paper, 15 major groups based on geographic, linguistic, and ethno- which also presents new compound haplotype data. historical criteria (Table 1). A total of 78 samples came from HAMMER(1995) performed a sequencing survey of Khoisan-speakers originating from southern Africa (SPURDLE 16 geographically diverse humans for a 2.6-kb region andJENKINs 1992). This grouping was composed of the Khoi (referred to as the "YAP" region) on the long arm of (13 Nama from Namibia) and the !Kung San (34 Sekele and 31 Tsumkwe from Namibia). We studied 28 Pygmies (Bow- the human Y chromosome and constructed a tree with COCK et al. 1994) including 14 Mbuti Pygmies from Zaire and five Y chromosome haplotypes. The mean time to the 14 Biaka Pygmies from the Central African Republic. Biaka common Y chromosome ancestor was estimated to be Pygmies speak languages belonging to the Niger Kordofanian - 188 kya (thousand years ago) with a 95% confidence family, whereas, Mbuti Pygmies speak centralSudanic lan- interval from 51 to 411 kya. Other estimates of the age guages belonging to the Nilo-Saharan language family (CA- VALLI-SFORZAet UL. 1994). The West African sample consisted of our common Y chromosome ancestor concur in sug- of 48 diverse Gambians (14Wolof, 13 Mandinka, seven Serere, gesting that analyses of Y-linkedvariation will eventually four Manjago, three Jola, two Fula and five other) (SOODYALL be useful for testing hypotheses about human evolution et al. 1996), and eight Nigerians (WAINSCOATet al. 1986). The during the last 200,000 years (Fu and LI 1996; WEISS East African sample consisted of 29 Bagandans from Uganda, and VON HAESELER1996). The present study surveys 18 Kenyans (WAINSCOATet al. 1986;MATHIAS et al. 1994), and three Ethiopians. The 141 samples from South African Bantu- theaforementioned five YAP haplotypes in a large speaking groups representeddifferent chiefdoms, eachspeak- worldwide sample. In addition, variation at the DYS271 ing a different Bantu language (SPURDLEandJENKlNs 1992). 790 M. F. Hammer e/ nl.

The southeastern antl so~~thwesternBantu-spcakel-s are postw DYSl9 DYS271 DYS287 I;~rctl to havr followed dilTerent migratol-\’ paths from west- P E-== I MI I Im ccntl-;d Africa, a generalsorlthbound course (southeastern B;ultu) antl a southwsterly route;Icross central Africa toward the wstern parts of the continent (southwesternBantu) (SI~IXDI.I;.and JETKISS 1992). Eighty-six southeastcrn Bantu speakers (referred to as “East Rantus”) were grouped into 41 Nguni (28 Zulu, sewn Nclcl~ele,four Xlnosa, and two Segment 1 Segment 2 Segment 3 Swzi), 37 Sotho/Tswana (1.5 Sotho, I3 Tswana, andnine PNl poly(A) PN3 PN2 I’cdi), six Tsonga, and two I‘cnda. The .5.5 southwestern Bantu speakcrs (rcferretl to as “West Rantus”)included 27 from * polymorphic nucleotide + PCR oligonucleotide If poly(A) lengthvarlatlon the I-lercro, 26 from the Ambo, and two from the Himha - hybridization probe -0.5 kb chicldoms. Two cnignlatic southern African groups were in- clutletl in this study. The 34 I.cmha are eastern Bantu speakers FIGURE l.”Schematicrepresent;ltion of the Ychromosomr (\'entia) with a possible Semitic origin (S1v’RI)I.E andJENKINS and the specific genetic regions studied. The short arm (p), 199(i),and the 22 Dama are from a Khoi-speaking group, who the centromere (shaded o\.nI),antl the long arm (q) of the Y are biologically more closely related to non-Khoisan African chromosome are indicated in the top illustration. The three groups (SIY’RIM.Ilw)I.E and JESKINS 1992). the 3‘ end of the YAP element and hybridizes to a number The Austml;1sian sample incldctl 43 Australian Aboriginal of Nu element5 in the genome; whereas, the F1070 primer Pcoplr (36 from thc northern edgeof the Great Sandy Desert is derived from a flanking singlecopysequence. Genomic and srwn lionl thewest coast ofAustralia) (PERNAe/ nl. 1992), DNAs were amplified in a 100 pl final volume containing 12.5 19intligctnous northern Australians, and 48 Papria New Chin- ng genomic DNA, 0.12 p\t each primer, 0.2 m\I each dNTP, cans (1% from the coxt antl 30 from the highlands) (STONE- 2..5 mM MgCI.,, 50 mb1 KCI, IOm\t Tris-HCI (pH 8.3),and 0.5 and 14’11.~0s1989). A single sample of47 Navajos from U AmpliTaq DNA polymerase (Perkin-Elmer). The cycling Norrhern Arizona mtl New Mexico represented Native Ameri- conditions were 94” for 2 min, antl then 30 cycles of 94” for c;\ns. 1 min, 51” for 1 min, and 72” for I min. A fourth segment DNA extraction: DNA samplrs provided by other investiga- encompassing the insertion site of the YAP element (not tors were as follows: seven Nigerians and three Kenyans by shown in Figure 1) was amplified as tlescribed in HMNER I. S. M’..\ISS(:O;Y~;14 Riaka Pygmies, 14 Mhuti Pygmies, three and HORN (1995). Ethiopians, and three Cambodians by K. KIDI),J. knn, and The DY.7271 and DYSf 9 loci were amplified and genotyped J.ROGF.RS; 29 Baganchns by I,. I,O~;IE;21 Egyptians by Y. GAD; according to the procedures of SF.IIe/ crl. (1994) and 30 Germans by G. R,\PIY)I.I>;27 South Chinese, 43 Australian HAMMER andHORN (199.5), respectively, Polymorphism in Aboriginal Pcople, antl 48 Papua New Guineans by M. SrmE- the length of the poly(A) tail on the 3’ end of the A111 se- KIS(;; and 48 G;unbians by A. V. S. HIt.1.. Blood samples from quence was analyzed in the following manner. PCR products Taiwan w~eprovitled by I.. I,. HSIEII.”hole blood DNA was were mixed with loading dye and resolved on 20 X 3.5 X Global Human Y Haplotype Variation 79 1

0.15 cm 6% polyacrylamide/O.l5% bisacrylamide gels in TBE unconstrained branchlengths and assume both additivity and buffer (50 mM Tris-borate/EDTA pH 8.3) for 8 hr at 300 V. independent errors. The UPGMA option constructs a tree Following electrophoresis, the gel was stained with ethidium by successive agglomerative mergers using an average-linkage bromide and the fragments were visualized by UV light. method and assumes that an “evolutionary clock” is valid. Site-specificoligonucleotide (SSO) hybridization: SSO Accordingly, the true branch lengths from the root of the probes were designed according to the methods described tree to each tip are the same and the expected amount of by IKUTAet al. (1987). Figure 1 shows the location of the evolution in any lineage is proportional to elapsed time. The hybridization probes to detect the polymorphic base substitu- CONTML program was used to estimate phylogenies based tions in segments 1 and 3. The sequences of these probes on restricted maximum likelihood using Y chromosome hap- are as follows: Probe lA, 5’-GAGAGCCTTTTGTCT3’; lotype frequencies (FEISENSTEIN1981). This program takes probe lB, 5’-TTAAGACAAAAAGCTCTCG3’; probe 2A, 5’- haplotype frequencies as input and performs a square-root AAACTAATGCCTTCTCCTC-3’; probe 2B,5”GGAGGA- transformation onthe data. Thenthe Brownian motion GAAAGCATTAGTT-3’; probe 3A, 5”GCTTGAAGGGGAGG model is used on the resulting coordinates. It assumes that TAAAT-3’; and probe 3B, 5’-AATTTACCTCCCCCTCAAG3’. each locus evolves independently via genetic drift. The CON- The probes were labeled with [yP2P]-ATP (Amersham) to a SENSE program was used to read the trees produced from the specific activity of at least lo8 cpm/pg DNA. Approximately CONTML, NEIGHBOR, and FITCH programs and compute 200 ng (5 pl) of each amplified DNA was added to denatur- majority-rule consensus trees. ationbuffer (0.4 NaOH, 25 mM EDTA) anddotted on a nylon membrane (Amersham). The DNAwas fixed to the membrane by UV irradiation with a Stratalinker UV cross- RESULTS linker (Stratagene). Membranes were prehybridizedin hy- Variation at four polymorphic sites within the bridization solution (5X SSPE, 5X Denhardt’s, 0.5% SDS) for YAP 30 min at 53”. Labeled SSO probes were then added directly region (DYS287): The frequency of Y to the hybridization solution to a concentration of 2 pmol/ with the YAP element (YAP’) varied greatlyamong geo- ml, and hybridization was carried out for 2 hr at 53”. Mem- graphical regions. Consistent with previous surveys branes were rinsed in wash solution (2X SSPE, 0.1% SDS) at (HAMMER 1994; SEIELSTADet al. 1994; SPURDLEet al. room temperature, washed at 53 for 30 min, and exposed to 1994),YAP+ chromosomes were found at high frequen- film for 2-48 hr. Data analysis: For each individual, the combination of se- cies in samples throughoutthe African continent quence variants (both point and lengthmutations) observed (mean frequency = 61.7%): EastAfricans (88.0%), across the three YAP segments and the DYS271 and DYS19 West Africans (87.5%), South African Bantu-speakers loci is referred to as a Y chromosome haplotype. An unbiased (76.6%), Pygmies (53.5%), and North Africans estimate of haplotype diversity (h) equivalent to heterozygos- (49.5%). TheKhoisan sample had the lowest frequency ity in each population was calculated using Equation 8.5 of the NEI (1987). The variance of h was determined as described of YAP’ chromosomes (26.9%) of all African sam- in NEI (1978). The diversityvalues can theoretically range ples. Although YAP’ chromosomes were not found in from 0 to 1, with 0 representing the minimum value (ie., a the SouthAsian or New World samples, they werefound single haplotype fixed in a population) and 1 representing at extremely low frequencies in the Australasian sample the maximum possible genetic variability (ie., an equal fre- (1.0%), at low frequencies in the West Asian (8.6%) quency of an infinitely large number of haplotypes). The same and European (13.8%) samples, and at moderate fre- measure was used to estimate diversity carried by each of the fiveYAP haplotypes. Fst analyses were performed using the quencies in the East Asian sample (22.3%). computer program Antana (HARPENDING and ROGERS1984) Direct DNA sequencing of theYAP element in a small for the YAP data. For the microsatellite locus, MICHALAK~S sample of YAP+ chromosomes demonstrated that the and EXCOFFIER’S(1996) PHIst withAMOVA procedure was number of adenine residues at the 3’ end of the Alu adopted. These PHIst values are related to SLATKIN’S(1995) sequence(poly(A) tail) is variable (HAMMER1995). Rst (GOLDSTEINrt al. 1995). Associations between YAP haplo- types and alleles at the DYSlY locus were investigated using Three poly(A) tail length alleles were identified at this the linkage disequilibrium (0)method of WEIR(1990). Nor- mononucleotide microsatellite site: long (L: 46 bp), malized disequilibrium values (D’) between -1 and +1 were short (S: 26 bp), andvery short(VS: 19 bp). Oligonucle- determined by dividing D by the appropriate denominator otide primers were designed to amplify an -220-bp given on p. 97 in WEIR(1990). DNA fragment encompassing the poly(A) tail of the The PHYLIP Package (FELSENSTEIN 1993)was used to com- pute genetic distances for use in distance matrix programs YAP element (Figure 1, segment 2). After PCR amplifi- and forconstructing maximum likelihood phylogeny esti- cation of this segment from all 437 YAP’ chromosomes mates based on Y chromosome haplotype frequencies. Ge- in this survey, we observed the three known length al- netic distances were generated using two different methods: leles (Figure 2A) as well as a new poly(A) allele interme- CAVAI.I.I-SFORZA’Schord distance (4D)(CAVALLISFORZA and diate in size (36 bp) between the L and S alleles. We BODMER1971) and REYNOLDS et aL’s (1983) genetic distance (Fst). These methods assume that there is no new mutation refer to this as the medium (M) allele (not shown in and that all frequency changes are due to genetic drift. Figure 2A). The L allele was associated with88.496, Constant and equal population sizes are not assumed. The 9.9%, and 2.2% of the East Asian, African, and Euro- neighbor-joining (NJ) (SAITOUand NEI 1987) and the Un- pean YAP’ chromosomes, respectively. The S allele (ab- weighted Pair Group Method Using Arithmetic Averages sent in East Asia) was associated with 97.8% and 90.1% (UPGMA) (SOKAI.and MICHENER1958) options of the pro- gram NEIGHBOR and the Fitch and Margoliash (FM) option of the European and African YAP’ chromosomes, re- (FITCH andMARGOIJASH 1967) of the program FITCH were spectively. The VS and M alleles were identified only in used to construct branching diagrams from matrices of pair- association with East Asian YAP’ chromosomes (7.1% wise distances. The NJ and FM programs fit a tree that has and 4.5%, respectively). 792 M. F. Hammer et nl.

YAP- YAP+ %: 69.3 1.5 6.7 7.9 14.5 h0.763 : 0.617 0.716 0.490 0.657 YAP: 1 2 3a 3b 3c 4 5 DYS19: ZABCDE BCDEFCD CE ZABC ABCDE

Y haplotype (-188 kya)7 FIGURE3.- Ychromosome haplotype tree based on allelic associations at the YAP, DYS271, and DYS19 loci. Haplotypes carrying the YAP element are referred to as YAP' and those lacking theYAP element are referred to asYAP- (the ancestral state). The thick solid lines represent the branching order and relative time of appcarancc of lineages based onthe YAP insertion and single nucleotide mutational changeswithin the YAP region (HM!NI:.R199.5) and the DYS271 locr~s.Thin solid lines refer to lineages created solely by changes in the length FIGL~REZ.-Detection and rapid genotvpingof alleles atthe sf the poly(A) tail associated with the YAP element (M, VS). poly(A) tail of the YAP element and the three polvmorphic Mutational changes at the variable nucleotide (PNI-3 and nucleotide sites in the YAP region. (A) Polyactylamide gel DYS.271) and poly(A) tail (L, M, S, VS) sites are indicated by electrophoresis of the segment 2 PCR product from 1.5 YAP' short solid bars across branches of the tree. Dotted lines de- males. Lanes 2 and 12 contain samples with the polv(A)-VS pict lineageswith different DY.Sl9alleles, and the uncertainty allele; lanes 1, 3, 4, 7-11, 13, and 15 contain samples with in order and timing of origination of these lineages is illus- the polv(A)-S allele, and lanes 5, 6,and 14 contain samples trated by the use of dotted lines that remain detached from with the polv(A)-Lallele. Lanes marked with M contain a the YAP haplotype lineages. For each YAP haplotype (bold molecular weight standard (pBR-Msj, 1: 242,238,217, and 201 numerals) the following are shown above the tree: frequency bp). (B) Autoradiograph of SSO hybridization with probes 2A (76). diversity (II), and associations with DYS19allelic states. and 2B. Seventeen DNA samples were hybridized in turn with the 2A and 2B probes. The hybridization signals seen in the ated with YAP' chromosomes produced three subtypes top two rows indicate detection of the C allele bv probe 2A, of haplotype 3, referred to as 3a, 3b, and 3c (Figure 3). while signals inthe bottom two rows indicate detection of the T allele bv probe 2B. Because the S allele was associated with the PNI-T and PN2-T variant.. in allcases in this survey, it was not possible to determine the order of occurrence of the SSO typing of variation at three polymorphic nucleo- poly(A)L-S or PN2 GT mutational evens (Figure 3). tide sites (PNl -PNS) in the YAP region (Figure 1) was Variation at the Dl3271 locus: The A-G transition at performed on 16 individuals of known sequence (HXM- the DYS271 STSlocus was typed in 352 individualsrepre- MER 1995) to verify the accuracy of the method. For senting all 15 major groups in Table 1.Initially, the each of the three polymorphic sites (PNl, GT; PN2, DYS27I-G allele was found to be associated only with GT; PN3, GA), specific hybridization with only the cor- YAP haplotype 5; therefore, most of the subsequent typ respondingprobe was observed (Figure 2B). Subse- ing was performed on haplotypes 4 and 5. All 225 haplo- quently, all 1500 individuals were typed at PNl, PN2, type 5 chromosomes typed had the DYS271-G allele. The and PN3 to determine the frequencies of the YAP re- DY.5271-A allele was found associated with all haplotype gion sequence variants. The PNl-T variant was found 4 chromosomes (n= 81) and all the typed chromosomes in Africa at a frequency of 42.8% and in West Asia at a with haplotypes 1-3 (n = 46). These data suggest that frequency of 5.2%. The PN2-T variant, absent in the the DYS271 A-G transition occurred on the lineage lead- Asian samples, was found at a frequencies of 55.5.% ing to YAP haplotype 5, after the occurrence of the PN2 and 13.5% in African and European populations, re- GT transition (Figure 3). Because we did not find either spectively. The PN3-A variant was found only in Khoisan a haplotype 4 chromosome associated with the DY.5271- populations, where it had a frequency of 28.2%, and in G allele or a haplotype 5 chromosome associated with a single Dama individual. There was maximal linkage the DyS271-A allele, it was not possible to determine disequilibrium among all of the polymorphic loci in the order of occurrence of the PNl GT and DYS271 the YAP region, consistent with the hypothesis that each mutational events. Although additional survevs may un- of the polymorphisms originated a single time in hu- cover a chromosome with one of these intermediate hap man evolution. Variation at the four polymorphic sites lotypes, inthe Ychromosomes surveyed here the DSY271 resulted in five YAP haplotypes (Figure 3, thick solid A-G transition did not add a new lineage to the YAP lines) (HAMMER1995). The four length alleles associ- haplotype tree shown in Figure 3. Global Human Y Haplotype Variation 793

Variation at the DYS19microsatellite locus: Polymor- phism at the DYSl9 locus on the short arm of the Y chromosome is due to differences in the number of GATA tandem repeats (ROEWERet al. 1992). Primers designed to unique sequences flanking the repeated region amplify fragments ranging in size from 178 to 206 bp in four-nucleotide increments reflecting eight to 15 copies of the GATA motif (ROEWERet al. 1992; CIMINELLIet al. 1995; SANTOS et al. 1996). In previous surveys five alleles (A-E) corresponding to the 186- to 202-bp fragments were found to becommon in human populations (ROEWERet al. 1992; SANTOSet al. 1993; GOMOLKAet al. 1994; MULLERet al. 1994). Recently, three rare allele classes have been identified: the 178- bp (178-bp allele) ( CIMINELLIet al. 1995), the 182-bp (Z allele) (SANTOSet al. 1996), and the206-bp (F allele) (HAMMER and HOW 1995). In this survey, we found alleles Z-F at the following worldwide frequencies: 0.7%, 10.076, 27.1%, 35.5%, FIGURE 4.-Geographical distribution of Y chromosome 18.l%, 8.4%, and O.l%, respectively. Although allele haplotype frequencies. Nine of the 15 major groups in Table 1 are shown by large circles.The 15 smaller circlesare compo- profiles for most samples studied were unimodal, the nent populations of the other six major groups.The frequen- highest frequency allele vaned among groups. Similar cies of the five YAP haplotypes are shown ascolored pie charts to previous studies (GOMOLKAet al. 1994; MULLERet al. (see left insert at bottom of figure for color codes). The pro- 1994; SANTOS et al. 1996), the B allele was found to be portions of the different DYS19 alleles associated with each the most frequent variant in northern Europe (58.1%) YAP haplotype are indicated by solid lines withinthe colored portions of the pie chart (see right insert for orientation of (Table 1).The B allele was also the modal allele in the DYSl9 alleles within each YAP haplotype). When six DYSlB South European sample (mean frequency = 36.8%); alleles are present, the order of appearance is as shown in however, when the Italian and Greek samples were bro- the insert; however, when fewer alleles are present, consult ken down into subpopulations there was a great deal Figure 5A for their identity and order. Three-letter codes not of variation among these lower-level units. Some of this defined in Table 1 include: Dam, Dama; Lem, Lemba; Jap, Japanese; Tib, Tibetans; Kor, Koreans; Tai, Taiwanese; Sch, intruregional variation has been attributed to isolation South Chinese; Ind, Indians; Sea, Southeast Asians; Png, Pa- and genetic drift (CIMINELLIet al. 1995). In contrast, pua New Guineans; Aap, Australian Aboriginal People; Bri, the C allele was the modal allele in all East and South British; Ger, Germans; Ita, Italians; Gre, Greeks. Asian samples (mean frequencies = 46.3% and 53.5%, respectively) except for theKoreans and Laotians where 4, see below), haplotype 1 was the only YAP haplotype the D allele was more common. The C allele did, how- found in theNew World, South Asian, and Australasian ever, predominate in a larger sample of Koreans not samples. Furthermore, within East Asia it was the only reported in this paper (n = 247, C = 38.9%, WOOK haplotype found in the Korean, South Chinese, and KIM,personal communication). This predominance of Taiwanese samples (except fora single Taiwanese indi- the C allele is similar to the majority of the results from vidual with haplotype 3). Haplotype 2 (filled-green in other surveysof Asian populations (GOMOLKAet al. Figure 4) was limited to the Khoisan-speaking popula- 1994; MULLERet al. 1994; HAMMER and HOW 1995). tions and varied in frequency from 4.5% in the Dama The C allele was also the modal allele in the Australasian to 41.9% in the Tsumkwe San. Except for the Khoisan, sample (38.2%) and inmostAfrican populations (Table haplotype 3 (filled-red in Figure 4) was found in all 1). Theabsence of the A allele in theAustralasian sam- African samples (mean African frequency = 6.1%) at ple distinguished it from the East and South Asian sam- frequencies ranging from 3.6% (WestBantus) to 16.1% ples. Consistent with previous studies of Native Ameri- (WestAfricans). Haplotype 3 was not found in theEuro- can populations (PENA et al. 1995; SANTOS et al. 1996; pean samples (except in a single Sicilian not apparent UNDERHILLet al. 1996), the A allele was found to pre- in Figure 4 due to its low frequency) but was the only dominate in the Navajo sample. YAPf haplotype in the East Asian samples (mean fre- YAP haplotype frequencies: The frequencies of the quency = 22.4%). Within East Asia haplotype 3 was five YAP haplotypes in all 60 populations examined in found at high frequencies only in the Japanese and this study are given in Table 1 and are shown for 24 Tibetan samples, and ina single Taiwanese male (Table selected groups in Figure 4 (color). Haplotype 1 (filled- 1).Haplotype 4 (filled-blue in Figure 4) was widespread white in the pie chart of Figure 4) was found in all in Africa (except in the Pygmy samples) and ranged in populations and was the most frequent worldwide frequency from 2.0% in East Africa to 46.2% in North (mean frequency = 69.2%). Other than a single West- Africa (mean African frequency = 12.7%). This haplo- ern Australian with haplotype 4 (not shown in Figure type had a distribution pattern opposite to that of hap 794 M. F. Hammer et al. lotype 3 outside of Africa: it was present as the major alleles C (D’ = 0.930), D (D‘ = 1.O) , and E (D‘ = 1.O) , YAP’ haplotype in Europe (13.5%) and was absent in as well as between haplotype 4 and allele A (D’= 0.82). the Asian samples. Haplotype 4 was also present in a TwoAfrican groups,the Dama and Lemba, were single aboriginal male from Western Australia; however, termed“enigmatic” because of unclear origins and this occurrence is likely due to European paternal ad- possible extensive admixture. Their combinationhaplo- mixture. Haplotype 5 (filled-yellow in Figure 4) was type distributions (Figure 5A) were indicative of pater- limited almost entirely to sub-Saharan Africa where it nal admixture. The Khoi-speaking Dama shared haplo- was the most frequent YAP haplotype (mean frequency type 2 with the Khoisan and haplotype 3 with non- = 52.1 %) . The frequency of haplotype 5 ranged from Khoisan Africangroups. Moreover, their overall profiles 17.9% in the Khoisan sample to 76.4% in the West for haplotypes 1 and 5 were more similar to those of Bantu sample. This haplotype was also found at low the non-Khoisan populations substantiating other bio- frequencies in the West Asian (5.2%) and the North logical evidence for thekinship of the Dama with these African (2.2%) samples. non-Khoisan groups (SPURDLEand JENKINS 1992). The YAP/DYS19 combination haplotype frequencies: The Venda-speaking Lemba exhibitedan extremely high association of the fiveYAP haplotypes (including poly(A) frequency of haplotype 1B (58.8%), which was also tail subtypes) with alleles at DYS19 produced 27 Y-chro- found at high frequencies in WestAsia (48.3%) and mosome combination haplotypes (Figure 3). Histogram North Africa (22.6%), but was absentin other East representations of the 27 combination haplotype fre- Bantu speakers. This pattern may reflect the retention quencies are given for the 15major groups in Figure 5A. of Semitic Ychromosome haplotypes by this EastBantu- YAP- haplotypes lC, lB, and 1D were the most common speaking population characterized by its rudimentary worldwide withfrequencies of 25.9%,22.9%, and11.6%, Semitic culture(SPURDLE and JENKINS 1992, 1996). respectively. The most common YAP+haplotypes in this Haplotype 5 in the Lemba, however, clearly had a sub- survey were haplotypes 5C (6.5%), 4A (5.2%), and 5D Saharan African origin supporting the hypothesis of a (5.0%). All other haplotypes were relatively rare with significant male contribution from neighboring South frequencies under 5%. There was a nearly continuous African Bantu-speaking populations. DYS19 diversity: The regional differences in patterns series of DYSl9 allele sizes associated witheach YAP hap of variation at the DYSl9 locus were reflected in differ- lotype (Figure 3, dotted lines),while DYSl9allelesC and ent diversity values (h-+ standard error) among the 15 D, and C and E were associated with haplotypes3b and major groups (data not shown). Low diversityvalues 3c, respectively (Figure 3, thin solid lines). Alleles at the were found in the North European sample due to the variable nucleotide and the poly(A) tail sites appear to overwhelming prevalence of the B allele (0.600 ? have originated a single time on the haplotype tree in 0.005). Other diversity values were 0.620% 0.005,0.644 Figure 3 (shortsolid bars). However, several allelesat the ? 0.004, 0.695 f 0.001, 0.715 -+ 0.003, 0.718 2 0.002, DYSl9locus probablyoriginated independently multiple and 0.736 2 0.001 for the West Asian, South Asian, East times. For example, the Z allele, associated with haplo- Asian, North African, Australasian, and South Euro- types 1, 2, and 4, probably arose from the A allele on at pean samples, respectively. The lowest diversity values least three occasions. A minimum of 21 changes in allele in this survey were found in the Navajo sample (0.569 size of one repeat unit parallel changes) are required (15 f 0.007) and in the enigmatic southern African group, to account for the distribution of DYSl9 alleles on the the Lemba (0.569 -+ 0.013). haplotype tree. This is consistent with the (single) step This study represents the first large-scale survey of wise-mutation model and high rate of evolution of short variation at the DYS19 locus in African populations. tandem repeats (WEBERand WONG1993). The DYS19 allelic diversity was greater in sub-Saharan Despite the convergent pattern of evolution at the African populations (0.763 t 0.001) thanin Austral- DYSl9locus, there were severalexamples of strong asso- asian (0.718 ? 0.002),European (0.712 ? 0.001), or ciations between YAP haplotypes and DYSl9 alleles. In Asian (0.695 5 0.001) populations (Bonferroni Test, P sub-Saharan Africa, haplotype 1 was negatively associ- < 0.001). Within Africa, diversity values were 0.631 5 ated with DYSl9alleles Z and A (D‘= -0.63 and -0.79, 0.005, 0.708 ? 0.006, 0.719 ? 0.005, and 0.769 2 0.004 respectively), haplotype 2 was positively associated with in the East Bantu, West Bantu, East African, and West DYSlPZ(D’ = 0.88), and haplotype 4 was positively African samples, respectively. associated withDYSl9A (D’ = 0.71). Haplotype 3aB The most unusual profiles were seen in the Khoisan was widelydistributed inAfrica (Pygmies, West Africans, and Pygmy populations where there were roughly bi- East Bantus, West Bantus, Dama, and Lemba) (Figure modal distributions of alleles of widely different sizes. 5A). Haplotype 3a was associated with other DYSl9 al- This pattern of widespread allelic variation was re- leles only in West Africa(C andD) and East Africa (C). flected in higherdiversity values than those seen in the In contrast, haplotype 3 in East Asia was never found rest of the world (0.820 -+ 0.002 and 0.823 5 0.006, with the DYS19 B allele, and haplotype 1 was strongly respectively, Bonferroni Test, P < 0.001). The Khoisan associated with alleles A and B (D’ = 1.0). In Europe were found to have a relatively high frequency of the there were strong associations between haplotype 1 and unusual Z allele. This allele, not observed in worldwide G lobal Human Global VariationY Haplotype 795

A Haplotype I Hap.l Haplotype 3 Haplotype 5 Japanese

4 407 Tibetans

30- m West Africans Koreans 20" 10" 4 0 30- Taiwanese East Africans 20- a , 30" South Chinese East Bantus

I YAP I 2 3a 3b 3c 4 5 40- - "- - West Bantus DVSl9 2 A U C D E 2ABUCDECDCE AUC ABCDE 20.- FIGURE5. - Continued o.::::::-:::::::::=::::,, Dama san surveyed, as well as in single Pygmy, Egyptian, and "+:.;::::::::.I::: Taiwanese males. Lernba YAP haplotypediversity: The numbers of the five YAP haplotypes found in eachmajor population group I..... ranged from one in South Asia and North America to four in all African populations except thePygmies (Ta- ble l and Figure 4 color). Only one population, the n possibly admixed enigmatic Dama, was found to con- D 60, X Asians West tain all five YAP haplotypes. Most non-African popula- tions hadjust two haplotypes (East Asians, North I Europeans, and Australasians), although the South Eu- East Asians ropean and West Asian populations had low frequen- cies of a third haplotype. These patterns of variation in .I YAP haplotype number andfrequency were reflected in South Asians significantly greater haplotype diversity in sub-Saharan

I Africa (0.630 ? 0.001) compared withAsia (0.283 f South Europeans 0.001) and Europe (0.239 f 0.001) (data not shown; Bonferroni Test, P < 0.001). I,,;:::::,YAP/DYS19 combination haplotype diversity: In Fig- North Europeans ure 4 the proportions of the different DYS19 alleles associated with each YAP haplotype are indicated by -. . , . . . I . , solid lines within the colored portions of the pie chart. When considering the 27 combination haplotypes, the number found ineach major group ranged from three in the Native American sample to 14 in the East Asian Native Americans sample (Figure 5A). When the populations making up the East Asian sample were examined separately, the I YAP I 2 3a 3b 3c 4 5 maximum number of haplotypes ranged from four in - "- ~ DVS19 2 A B C D E ZAU BCDECDCE AUC ABCDE the Koreans to 13 in the Japanese(Figure 5B). InAfrica FIGURE5.-Histogram representations of YAP/DYSI9com- the number ranged from nine in the Bantu-speaking bination haplotype frequencies.(A) Twenty-five combination populations to 11 in the Pygmy, Khoisan, and North haplotype frequencies for the 15 major groups (the two re- African populations. Nine combination haplotypes maining combination haplotypes are not shown: haplotype 3aF found in a single Japanesemale and haplotype 42 found were found in South Europe,seven in West Asia, sixin in a single Egyptian male. (B) Combination haplotype fre- North Europe, and five in South Asia and Australasia. quencies for the five component populations of East Asia. Haplotype diversities determined on thebasis of the number and relative abundance of each of the 27 com- populations studied previously, was recently reported bination haplotypes are shown in Figure 6A forthe in a single !Kungcell line (SANTOS et al. 1996). We individual populations while those for the major group- identified the Z allele in eight outof 78 (10.3%) Khoi- ings are shown in Figure 6B. For the total sample of 796 M. F. Hammer Pt 01.

FIGURE6.-Combina- tion haplotype diversi- 0.400 ties. Haplotype diversi- ties (h) are given on the i vertical ( Y) axis and pop ulationnames, codes, and sample sizes are shown on the horizontal (X) axis. For each popu- lation the topof the gray bar represents the sam- ple mean and the two Africans WAS EAS SAS EAS WASAfricans SEU NEU AUS NAN horizontal lines repre- B sent the95% confidence interval. When the stan- dard error is especially small, these symbols oc- - casionally become indis- 0.900 -.- """"""""""""-"""""" tinct. (A) The 53 popula- t .I tions in Table 1 with n > 5. (B) The 15 major groups without the enig- matic southern Africans.

1500 individuals the worldwide haplotype diversity (r= 0.022; slope = 17.7; %intercept = 89.0). Levels of equaled 0.852 2 0.000,The mean combination haplo- Y chromosome haplotype diversity ranged from 0.509 type diversity was significantly higher for subSaharan t 0.031 in the Omanis to 0.945 2 0.010 in the Mbuti Africans(0.878 2 0.000) than for Asians(0.779 2 Pygmies (Figure 6A). Regions with the highest haplo- 0.001), Europeans (0.736 t 0.001), or Australasians type diversities(h > 0.700) were Africa, East Asia,South (0.718 2 0.002) (Bonferroni Test, P < 0.001). There Europe, and Australasia (Figure 6B). The New World were no statistically significant correlations between and North Europe were characterized by low levels of sample size (Y)and haplotype diversity (X)for the 53 haplotype diversity. South Asia and West Asia had inter- populations listed in Table 1 where n > 5 (r= 0.133; mediate levels of haplotype diversity. Within Africathe slope = 23.1; Y-intercept = 11.0) or for the 15 major Pygmies and Khoisan had the highest diversity levels groupings excluding the enigmatic southern Africans and the East Bantus and West Bantus had the lowest. Global Human Y Haplotype Variation 797

TABLE 2 Diversity of five YAP haplotypes in six geographic regions

YAP haplotype 1 2 3 4 5

SAF 0.730 ? 0.002 0.617 ? 0.013 0.493 ? 0.017 0.416 ? 0.019 0.656 ? 0.001 (122) (23) (29) (23) (122) (22) (213) - NAF 0.667 ? 0.005 - - 0.614 ? 0.006 (47) (43) B AS1 0.639 ? 0.001 - 0.715 ? 0.004 - - (382) (70) EUR 0.662 ? 0.001 - - 0.275 ? 0.011 - (331) (52) AUS 0.718 ? 0.002 - - - - (109) NAM 0.569 ? 0.007 - - - - (47)

Values are h ? SE, with sample size in parentheses. SAF, sub-Saharan Africa;NAF, North Africa; MI, Asia; EUR, Europe; AUS, Australasia; NAM, North America. “ Haplotype not present. Diversity values not indicated when haplotype is present in fewer than three individuals.

The high levels of haplotype diversity in East Asia haplotype 5 had arelatively high diversity valuebut was were mainly due to the high haplotype diversity of the more widespread within sub-Saharan Africa and only Japanese (0.835 ? 0.002). This diversity value is equiva- occurred at low frequencies in North Africa and West lent to the North African and East African values. It is Asia. interesting that the island of Okinawa had one of the Fstand PHIst analyses: The overall worldwide Fst highest haplotype diversities inthe world (0.876 5 value based on the 15 major groups in Table 1 (exclud- 0.003), lower onlythan thevalues for theMbuti Pygmies ing the enigmatic southern Africans) was 0.302 for the and Sekele San (t-tests, P < 0.0001). The haplotype YAP data. This Fst estimate is concordant with expecta- diversity of the East Asian sample excluding the Japa- tions based on the fourfold lower effective population nese was 0.752 ? 0.001. This value lies between those size of the Ychromosome relative to autosomes (HA” from South Europe and Australasia. MER 1995). Worldwide Fst valuesfor avariety of autoso- Regionalvariation in diversity by YAP haplotype: mal genetic datasources as well asfor craniometric data The genetic diversity carried by each of the five YAP sets cluster around 0.1 (RELETHFORD 1995). In contrast, haplotypes was determined based on the number and the worldwide PHIst value for the DYS19 data was only relative abundance of DYSl9alleles associated with each 0.163. This lower thanexpected figure may be ex- haplotype in six global regions (Table 2). Haplotype 1 plained by higher mutation and convergence rates for diversity was greatest in sub-Saharan Africa followed by the DYSl9 locus relative to the YAP nucleotide substitu- Australasia, Asia, North Africa, Europe, and North tional sites. CIMINELLIet al. (1995) found similar results America (all painvise comparisons were statistically sig- in “Caucasoid” populations and concluded that con- nificant atthe 0.001level except the North Africa/ vergent evolution of DYS19 alleles in different popula- Europe comparison where P > 0.05). Sub-Saharan Afri- tions may result in the underestimation of the among- can populations were characterized by widely divergent group variance for microsatellite loci. diversity levels (data not shown). These values ranged Populationtrees: Maximum likelihood, neighbor- from 0.083 ? 0.015 in the East Bantus (the lowest of joining, Fitch-Margoliash, and UPGMA methods were all the 15 major groups in Table 1) to 0.772 t 0.032 employed to analyze the frequencies of the 27 YAP/ in the West Bantus and 0.833 5 0.020 in the Pygmies DYS19 combination haplotypes. Neighbor-joining (NJ), (the two highest values of the 15 major groups). Al- Fitch-Margoliash (FM), and UPGMA analyses were based though haplotype 3 diversity was greater in Asians than on genetic distances generated from both the Chord in sub-Saharan Africans (Bonferroni Test, P < 0.001) (a)and REYNOLDS et aL’s (Fst) equations. For heuristic (Table 2), the EastAsian (0.715 5 0.004) and West purposes, a majority-rule consensus tree based on the African (0.722 ? 0.032) diversity values did not differ sevenpossible combinations of tree-building and dis- significantly from each other (Bonferroni Test, P > tance measures (ML, NJ/@, NJ/Fst,FM/4D, FM/Fst, 0.05). Haplotype 4, present in three of the six geo- UPGMA/4n, and UPGMA/Fst) is shown in Figure 7. All graphic regions, reached its highest diversity value in seven methods consistently grouped the North African/ North Africa. Although haplotype 2 was confined to the non-African populations into a single cluster (l),and all Khoisan, its diversity valuewas relatively high. Likewise, sub-Saharan African populations (except the Khoisan) 798 M. F. Hammer et al. - 7EAS mon with other genetic data sets. The Fst value at the YAP locus in the sample of 14 world population group 5 ings of 0.302, equivalent to an Fst of 0.098 for nuclear SAS 3 , is similar to estimates from classical genetic poly- morphisms (0.09-0.11: NEI and ROYCHOUDHURY1982), AUS 100 DNA polymorphisms (0.14: BOWCOCKet al. 1991), 4 -AUS and dinucleotide microsatellites (0.11: DEKA et al. NEU 1995). Population trees based on Y-chromosome com- 6 bination haplotypes show that sub-Saharan African pop- ulations are clearly differentiated from the rest of the 7 4 WAS world (Figure 7). This finding is similar to results from classical genetic markers (CAVALLI-SFORZAet al. 1988; LSEUSEU NEI and ROYCHOUDHURY1993), mtDNA (CANNet al. 1987; VIGILANTet al. 1991), and microsatellites (BOW- I COCK et al. 1994). Furthermore, sub-Saharan African populations exhibit the highest within-group diversity levels for the YAP system, the DYS19 locus, and the b combination haplotype analyses (Figure 6B). Finally, I 3 EAF- the global analysis of Y-chromosome haplotypes shows 6 WBA that paternally inherited variation outside of Africa is a subset of the variation existing within Africa (Figure 4). 7 WAF This parallels results from surveys at other loci (CANN et al. 1987; CHENet al. 1995; TISHKOFFet al. 1996). 6 These within-group and among-group patterns of ge- . IJ EBA netic variation have commonly been interpreted as sup porting a modelin which an (older)African population 5 I PYG gave rise toa founder population(s) thatcarried subsets of genetic diversity to Asia and Europe (RELETHFORD KHO 1995). Thehigher diversity in Africa is claimed to sup - port this model because it is assumed thatgreater within-group variation reflects a larger accumulation of FIGURE7.-Majority-rule consensus treefor 15 major mutations in Africa, and therefore greater time depth groups (without enigmaticsouthern Africans and Native (STONEKINGand CANN 1989). Anothercritical assump- Americans) based on the frequencies forthe 27 combination haplotypes. Seven trees were generated using combinations tion for this view is that newly formed daughterpopula- of tree-building and distance measures. The numbers at the tions must experience a bottleneck to reduce levels of nodes indicate the number of methods in which the group within-group variation outside of Africa (ROGERSand consisting of the composite populations to the right of that JORDE 1995). In addition, proponentsof Recent Out of node occurred. Numbers to the right of the tree represent Africa models generally argue that the genetic distinc- the two primary clusters. tiveness of African populations reflects a simple branching process in whichAfrican populations di- into another cluster (2). Only the UPGMA/4D method verged before non-African populations. On the whole, placed the Khoisan in cluster 1. When data from the five these propositions underscore thedifficulty indisentan- YAP haplotypes were analyzed separately, the consensus gling the effects of population history on global diver- tree topology was similar to that of Figure 7 (tree not sity patterns from influences due to population struc- shown). The occasional placement of the Khoisan with ture components. the rest of the world replicates the results of SPURDLE Thus, a straightforward interpretation of genetic di- andJENKINs (1992). These authors postulated small sam- versity patterns may beoverly simplistic. As demon- ple sizeand genetic drift as reasons for the lack of consis- strated by RELETHFORD (1995), the fourlines of genetic tent clustering of the Khoisan with the rest of subsa- evidence traditionally used to support Recent Out of haran Africa based on Y chromosomai polymorphisms, Africa models do notnecessarily do so. Specifically, nei- while CAVALI.I-SFORZAet al. (1988) pointed to admixture ther therelatively low amount of among-group variation with “Caucasoids” as a possible cause for this phenome- within our species, northe relatively high levelsof non based on autosomal data. within-group variation in sub-Saharan Africa, nor the marked genetic divergence of sub-Saharan African pop DISCUSSION ulations confirm the recent African origin models en- Global patterns of Y chromosome variation: The tailing a migration of a small population out of Africa global patterns of human Y chromosome haplotype (RELETHFORD 1995). As postulated by the version variation presented here have several features in com- known as the “Weak Garden of Eden” model, theensu- Global HumanVariation Y Haplotype 799 ing expansions in effective population size may have distributional data forms the basis for our hypotheses occurred much later than this origin (HARPENDING et about migrations and range expansions. The ancestral al. 1993). Also, the recent age of our common mito- states for the polymorphic sites in the YAP region al- chondrial DNA and Y chromosome ancestors may indi- lowed HAMMER (1995) to determine the relative ages cate a small effective population size throughout the of four of the five YAP haplotypes: haplotype 1 with Late Middle Pleistocene, rather than the time of origin its basal position on the haplotype tree represents the of anatomically modern humans. Even within the con- ancestral YAP haplotype (- 188 kya), and haplotype 5 is text of a recent African origin model, a single emigra- the most derived or recenthaplotype. The intermediate tion from a homogeneous African source population is positions of haplotypes 3 and 4 in Figure 3 reflect their seen as being too simplistic. For example, as discussed intermediate times of origin. We also received the fol- by LAHR and FOLEY(1995), human behavioral, linguis- lowing approximate absolute dates for the mutational tic, morphological, and genetic diversity patterns can events in Figure 3 from R. C. GRIFFITHS(personal com- also be interpreted in terms of multiple dispersals from munication) calculated by coalescence-based methods a variable population in Africa. These multiple dispers- explained in GRIFFITHSand TAVARE(1994): YAP inser- als probably originated from different parts of Africa tional event (-135 kya), PN2 (-90 kya), PNl (-30 at different times and left Africa through at least two kya) , and PN3 (-25 kya). In the context of these abso- independent routes. lute ages it is important to recall that the time of origin Numerous explanations are possible for our global of polymorphisms is generally thought to precede the Y-chromosome haplotype patterns, some of which are dates of population splits and any subsequent expan- not mutually exclusive (TAKAHATA1994). In addition sions (LI and GRAUR1991; HARPENDINGet al. 1993; to the recent African origin model with synchronic or STONEKING1993). Theglobal distribution of haplotype delayed population size expansion and the multiple Af- 1 us. the primarily sub-Saharan African distribution of rican dispersal hypothesis mentioned above, other haplotype 5 areconsistent with their postulated relative models deserve serious consideration. For instance, sce- ages. In accord with their intermediate temporal posi- narios based on the following evolutionary processes tions in Figure 3, haplotype 3 is found in both Africa are all quite plausible: (1) restricted, recurrent gene and Asia (north of the Himalayas), while haplotype 4 flow via isolation by distance, (2) long-distance coloniza- occurs in Africa and West Asia, as welt as throughout tion followed by long-range gene flow, or (3) one or Europe. Haplotype 2 may be a private polymorphic hap- more migrations followed by extensive genetic drift due lotype limited to the Khoisan. Although it is not useful to founder effects and small population sizes leading for discussing intercontinental migration patterns, it to random haplotype extinctions. Although we cannot may turn outto be useful for measuringlevels ofadmix- yet exclude any of these hypotheses based on rigorous ture between the Khoisan and neighboring populations inferential statistical tests, we propose that a multiple (e.g., the Dama) . migration explanation appearsto be eminently compat- Because haplotype 1 is the oldest haplotype and is ible with our data. distributed globally, we cannot determine whetherthis In theaggregate, the above models and scenarios pattern reflects long-term migration drift and/or range employ a continuum for the amountsof gene flow and expansions. We postulate (but cannot prove) that hap- genetic drift responsible for global diversity patterns. lotype 1 traces a genetic trail of the first modern hu- Our picture of global Y-chromosome variation may re- mans out of Africa. A southern Asian expansion from sult from multiple evolutionary processes including Africa resulting in the occupation of Australia and Pa- both short range (1) and long range (2) migration, as pua New Guinea 50-60 kyais thought to precede well as genetic drift and extinction (3). The difficulty expansions to Europe and North Asia (NEI and ROY- in distinguishing these evolutionary processes and their CHOUDHURY 1993; CAVALLI-SFORZA et al. 1994; LAHR and effects is underscored by clear-cut examples in which FOLEY 1995).Conversely, haplotype 5 is the most de- genetic drift can obscure population history (RELETH- rived haplotype in Figure 3 and its distribution may FORD 1996) or where branching and migration models reflect a recent out of Africa event by a southern route can produce the same pattern of genetic distances (RE- very similar to the one just postulated for haplotype 1: LETHFORD 1995).The potentially complicating phe- going first to WestAsia and, according to new data, nomenon of recurrent mutationinvolving single nucle- continuing all the way to India (A. PANDYA,L. SINGH,C. otide and Alu insertion polymorphisms is less likely to TYLER-SMITHand M. F. HAMMER, unpublished results). be involved; nevertheless, it cannot be ignored. The distribution of haplotype 4 exhibits a frequency Origins of the YAP haplotypes: We now consider the gradient from North Africa (46.2%) to Northwest Eu- genetic and geographicconnections associated with rope (4.3%). The implications of this pattern may be each of the five YAP haplotypes. Figure 4 shows that all twofold. First, because of its stochastic nature, genetic five YAP haplotypes are found in sub-Saharan Africa. drift is not expected to produce clinal distributions. Only subsets of these five haplotypes are found outside Thus, haplotype 4 may provide suggestive evidence ar- of Africa. The combination of information about the guing against the random haplotype extinction model relative age of each of the major YAP haplotypes with (3).Second, this cline may chronicle any and/or all of 800 M. F. Hammer et al. the following population movements to and through Saharan Africa. One of the consequences of a putative Europe: (1) the Levantine expansion of anatomically Asian origin for haplotype 3 is that all YAP+ chromo- modern humans (ie., -40 kya) involving the eventual somes (Figure 3, see YAP insertion leading to haplo- demise of theNeanderthals (STRINGERand GAMBLE types 3-5) also must have an Asian origin contra HAM- 1993), (2) the slightly later pre-agricultural migration MER (1994). New data from a different locus on the Y (i.e., -25 kya) detected by mtDNA data (RICHARDS et n1. chromosome in which the Asian form of haplotype 3 is 1996), or (3)the even more recent Neolithic expansion shown to be ancestral to the African form of haplotype 3 (ie., - 10 kya) associated with the spread of agriculture also support this Asian origin hypothesis (T. ALTHEIDE as postulated by AMMERMAN and CAVAILI-SFORZA’S and M. F. HAMMER,unpublished data). (1984) demic diffusion model. An alternative to an Asian origin of haplotype 3 is an In each of the preceding cases, an argument could African origin hypothesis coupled with a subsequent be made for migrations that originated within Africa at reduction in the diversity associated with haplotype 3 different times and places, and that exited Africa via in Africa. This hypothesis is supported by the finding the two routes proposed by LAHR and FOLEY(1995): a of equivalent levels of haplotype 3 diversity in West Af- southern route through the hornof Africa to West Asia, rica and East Asia, and much lower levels of diversity South Asia, and eventually to Australasia, and a north- associated with haplotype 3 in other surveyed regions of erly route through North Africa, the Levant, and even- sub-Saharan Africa (Figure 5A). The pattern of higher tually to Europe. The diversity of each YAP haplotype diversity in West Africacompared with the Bantu-speak- may also be used to bolster hypotheses regarding the ing populations in southern and eastern Africa (Figure origins and ages of the migrational events discussed 6B) may be due to founder effects associated with the above. Haplotype 1 diversity was greatest in sub-Saharan Bantu expansion (CAVALLI-SFORZAet ul. 1994). Thelack Africa, which can provisionally be interpreted to sup- of close clustering of the East and West Bantus as well port an African origin model (Table 2). Australasia ex- as the inconsistent placement of the West Bantus in hibits the second highest level of haplotype 1 diversity; Figure 7 could also be due to the effects of genetic thus, an argument can be made that Australasia repre- drift. sents the terminus of an early out of Africa migration. YAP haplotype 3 and East Asian population affini- Haplotype 2 was confined to the Khoisan and its high ties: We initially grouped the Japanese, Tibetans, Tai- diversity value may reflect either great antiquity in Af- wanese, Koreans, and South Chinese into asingle major rica, a wider geographic distribution (CAVALLI-SFORZAEast Asian group based on geographic and ethnohistor- et nl. 1994), and/ora larger effective population size for ical criteria. The Y chromosome data presented here the ancestral Khoisan. Haplotype 4 reached its highest provide evidence for significant differences among pop- diversity level in North Africa. Although haplotype 5 ulations within this grouping. For example, the highest occurred at low frequencies in West Asia and North world frequencies of YAP haplotype 3 are found in Ja- Africa, it attained higher frequencies, greater diversity, pan (41.7%) and Tibet (46.7%);whereas, the frequency and was more widespread in sub-Saharan Africa. Al- of this haplotype is <2% in all other East Asian popula- though the presentdistributions of haplotypes 2,4, and tions (Table 1).Also, when the frequencies of the com- 5 are compatible with African origin models, the haplo- bination haplotypes are viewed separately for these five type 3 data areindicative of a possible Asian origin (see populations (Figure 5B), it can be seen that the Japa- below) while the global occurrence of haplotype 1 is nese and Tibetans share several haplotypes that are not consistent with any Old World geographic origin. present in the other groups (exceptfor a single Taiwan- African us. Asian originof YAP haplotype 3: As in ese withhaplotype 3aC). Thedistinctiveness of the Japa- the cases of haplotypes 4 and 5, the world distribution nese and Tibetans with respect to the other East Asian of haplotype 3 (ie., widespread in Africa and present groups is also apparent in dendrograms based on the in localized regions outside of Africa) can be interpre- 27 combination haplotypes (data not shown). Thesimi- ted to fit a sub-Saharan African origin model. However, larity between the Tibetan and Japanese populations unlike the cases of haplotypes 4 and 5, the diversity data makes sense in the context of other studies showing a in Table 2 lead to more complicated scenarios. The northern Asian (ie., Mongolian) origin for Tibetans statistically significant higher level of diversity associ- (NEI andROYCHOLJDHURY 1993; TORRONIel ul. 199413). ated with haplotype 3 in Asia compared with sub-Sa- CAVAILI-SFORZAet nl. (1994) suggest that populations haran Africa (Bonferroni Test, P < 0.001) may be inter- between the Yangtse and Yellow Rivers in China form preted in favor of an Asian origin of haplotype 3. the dividing line between Northeast Asians and South- Because haplotypes 4 and 5 arederived from haplotype east Asians. The Koreans (who are situated north of the 3 and account for the majority (55.8%) of Y chromo- Yangtse River) are generally considered to be a North somes in Africa, this hypothesis implies that a major Asian group (NEI and ROYCHOUDHURY1993; CAVAILI- component of African diversityis derived from Asia and SFORZAet nl. 1994). This would lead us to expect that contradicts the African origin hypothesis. Also, the fre- the Koreans would be more similar to the Japaneseand quency of haplotype 3 in EastAsia (22.4%) is three Tibetans than to the South Chinese and Taiwanese in times higher than its average frequency (7.1%) in sub- Y chromosome haplotype frequencies. Also, the Yayoi Global Human Y Haplotype Variation 801 migration entered Japan through theKorean peninsula sion associated with the spread of agriculture across 2300 years ago (HAMMER and HOW 1995). However, Europe toward the British Isles. Other examples of ex- both the haplotype 3 and combination haplotype data tremely low diversity in Figure 6A probably have differ- place the Koreans with the South Chinese and Taiwan- ent explanations [ie., the Omanis and Malaysians are ese. One explanation for this unexpected affinity is that characterized by especially smallsample sizes of 11 and a recent range expansion from South China has re- 10, respectively, whilesuspected recent founder events/ sulted in the replacement of haplotype 3 by haplotype population fissions could explain the Navajo (GREEN- 1 in Korea. BERG et al. 1986) and Lemba values]. Lemba males are Globalpatterns of Y chromosomediversity: The thought to have a Semitic origin (SPURDLEandJENWNs highest values for Y chromosome combination haplo- 1996) while mtDNA analyses point to African ancestry type diversity are generally found in sub-Saharan Africa for Lemba females (SOODYALLet al. 1996). Overall, our (Figure 6B). This pattern is similar to results from other data imply a small founding population of males from kinds of genetic data as mentioned above and can be the Middle East consistent with SPURDLEand JENKINS’ attributed to great population antiquity and/or large (1996) conclusions. effective population size in subSaharan Africa. Exten- Figure 6, A and B, are based upon combination hap- sive admixture can also lead to high diversity levels as lotypes; however, the componentsof these combination in the case of the North African sample that comes haplotypes differ in their tempoand mode of evolution. from a fairly wide area around Cairo. Urban popula- The YAP sites represent rare single mutational events tions from North African countries are thought to be and fit the infinite sites model (LI 1977). In contrast, admixed with contributions from Berbers and Bedou- the DYS19 microsatellite system fits the stepwise muta- ins, aswell as from Mediterranean and sub-Saharan tion model, evolves rapidly, and is subject to widespread African populations (CAVALLI-SFORZAet al. 1994). High convergence leading to homoplasy (SHRIVERet al. 1993; levels of diversity outside of Africa are concentrated in ESTOUPet al. 1995; GOLDSTEINet al. 1995). The legti- East Asia, South Europe, and Australasia. Although a macy of combining these two different types of genetic claim can be made that the Australasian diversity levels data is open toquestion. As a result we undertook diver- reflect remnants of an old populationsystem, the South sity analyses ofeach system separately (data not shown). Europe diversity levelsare more likely a product of ex- As in the combination haplotype analysis, subSaharan tensive admixture as exemplified by the Sicilians who Africa had the highest diversity levels in both the YAP have the highestdiversity valueof all European popula- and DYS19 individual analyses. Then the YAP diversity tions (Figure 6A). values (Y) were plotted against DYS19 diversity values The case ofEast Asia is even morecomplex. For (X) for14 major groups in Table 1 (the enigmatic instance, the high Japanese diversity levels may be due southern Africans were omitted). A statistically signifi- to a combination of an ancient population (Jomon) cant correlation (T = 0.697, P < 0.01) was obtained subjected to recent admixture (Yayoi) as discussed in between these two sets of diversity values. Interestingly, HAMMERand How (1995). This mixing effect is fur- the three most obvious outliers were the Australasians, ther illustrated by the finding of higher diversity levels South Europeans,and East Bantus. The first two groups in the extreme southern and northern regions of the are characterized by high DYSl9 diversity values with Japanese archipelago, where remnants of the ancient respect to YAP values and it is this microsatellite diver- Jomon gene pool are thought to survive. Because diver- sity that causes their relatively high combination haplo- sity levels may not have reached equilibrium in popula- type diversity values in Figure 6B. On the other hand, tions that have undergone large expansions in size the East Bantus exhibit relatively low DYS19 diversity within the last 10,000 years, some rather large popula- resulting in alower combination value. A differentsitua- tions may exhibit lower thanexpected within-group tion obtains for the low West Bantu combination value variation (ROGERS and JORDE 1995). Thus, the unex- that reflects concordantly low YAP and DYSl9 values. pectedly low diversity level seen in the South Chinese In fact, the two Bantu groups exhibit the lowest levels sample may be a productof a homogenizing influence of Y-chromosome diversity within Africaconsistent with associated with the agriculturally based Holocene popu- the hypothesis of a comparatively recent expansion of lation expansions originating in Southeast Asia (CA- Bantu-speaking populations (EXCOFFIERet al. 1987; CA- VALLI-SFORZAet al. 1994). This trend toward lower diver- VALLI-SFORZAet al. 1994). Also, because these results sity also includes Southeast Asia and India. conflict with those derived from mtDNA and autosomal The lowest diversity valuefor any major region in the data from some of the same populations, SPURDLEand Old World is found in North Europe (Figure 6B), and JENKINS (1993) hypothesized that Y-specific diversity val- within that region, the British sample surprisingly has ues may have been reduced dueto the widespread prac- the third lowest diversity valueof the 53 populations in tice of polygamy in some African populations (HAMMER Figure 6A. Both the low diversity values and the clinal and ZEGURA1996). distribution of haplotype 4 are concordantwith A”ER- Theoreticalconsiderations: At this junctureone MAN and CAVALLI-SFORZA’S(1984) demic diffusion should recall RELETHFORD’S (1995) important caveat model involving the relatively recent Neolithic expan- that among-group patterns of variation, rather than re- 802 M. F. Hammer et al. flecting the branchinghistory and age of human popula- raphy. His two principal explanatory categories are pop- tions, can be explained by alternative demographic pa- ulation structure and population history. In this con- rameters, especially effective population size. Because of text, population structureincludes restricted, recurrent the effects of differing population sizes on the magni- gene flow via isolation by distance, while range expan- tude of genetic drift, often one cannot distinguish be- sions reflect population history. The first human data tween patterns that reflect population structure and set analyzed using TEMPLETON’S(1993) approach was those contingent on population history (RELETHFORD EXCOFFIER’S(1990) mtDNA data from 12 Old World 1996; RELETHFORD and HARPENDING 1994), nor are in- populations (n= 1218). This analysis did not lead to a terpretations of within-group patterns of genetic diversity dichotomous separation between explanations derived straightforward. For example, the finding of greater from population history us. population structure con- within-group variation in subSaharan Africa has been siderations. Within higher-order clades, restricted and used to claim that subSaharan African populations are recurrent gene flow was the principal explanation for older (CANN et al. 1987; VIGILANTet al. 1991; TISHKOFF the data patterns. At the lowest level in the hierarchy, et al. 1996).As pointed out by several authors, divergence the data led to the detection of two regional (but no time is not the only factor that can affect within-group global) range expansions. One of these expansion variation (EXCOFFIERand LANGANEY 1989; HARPENDING events involved Arabs and sub-Saharan Africans, while et al. 1993; STONEKING1993; TEMPLETON1993; RELETH- the otherinvolved an expansion across Europe. As TEM- FORD and HARPENDING 1994; MARTINSON et al. 1995; RE- PLETON (1993) pointed out, the current distribution of LETHFORD 1995;ROGERS and JORDE 1995; AOKI and mtDNA variation that he analyzed was influenced both SHIDA1996). Demographic considerations that influ- by restricted gene flow and by population expansion. ence levels of within-population variation include popu- The critical point is his explicit statement thathis analy- lation bottlenecks and size expansions, migration pat- sis does not treat population expansion or restricted terns and reticulation, and degree of population gene flowas mutually exclusive alternatives. Indeed, subdivision. Other factors that could specifically influ- TEMPLETONet al. (1995) recently found evidence for ence Y-chromosomevariation include selectivesweeps restricted gene flow via isolation by distance as well as and genetic hitchhiking (MAYNARD-SMITH1990), back- for range expansions in mtDNA data from the tiger ground selection (CHARLESWORTHet al. 1993), and high salamander, Ambystoma tignnum. From these examples, variance in male reproductive success (LUCO~Eet al. it is clear that even portions of the same data set can 1994; PENAet al. 1995). An additional complication is show the differential import of factors associated with the admonition that one cannot use standard measures population history and population structure. of within-group population variation (ie., diversity val- The graphical display of the geographic distribution ues) to test geographic origin hypotheses because the of our Y chromosome data in Figure 4 is similar to effects ofmutation rates, inbreeding effective population methods used in intraspecific phylogeography (AVISE size, and population age are all conflated with past and 1994) and especially in contemporary human mtDNA present gene flow inthe diversityvalues, themselves research (MELTONet al. 1995). TEMPLETONet al. (1995, (TEMPLETON 1993). difficultyThe in empirical studies is p. 779) provide acogent caveat about such displays determining which combination of the above factors is when they state: “Such pictorial representations are an responsible for differences in the levels of within- and excellent exploratory tool for formulating hypotheses, among-group variation. This dilemma can sometimes be but are inappropriate inference tools.” Our paper is addressed by supplemental demographic and ethnohis- about hypothesis compatibility sensu STONEKING(1994) torical data (BLETHFORD1996), aswell by innovative rather than about hypothesis testing sensu TEMPLETON population genetic analyses such as those used for the (1994). We employed exploratory graphical analyses to comparison of mtDNA mismatch and intermatch distri- generate hypotheses to explain the geographic distribu- butions that often demonstrate a temporal lag between tion of our Y chromosome data. TEMPLETON(1993) population origin and expansion/bottleneck (WEND-was fortunate to have 76 human mtDNA haplotypes ING et al. 1993; ROGERS and JORDE 1995). distributed over 18 geographic locations, while TEM- Methodological considerations:The most promising PLETON et al. (1995) used 23 mtDNA haplotypes distrib- analytical technique for trying to resolve some of these uted over 53 geographic regions in thesalamander theoretical issues is that of TEMPLETON(1993). His analysis. We, however, only have fivehuman Ychromo- method that attemptsto disentangle the effects ofpopu- some haplotypes distributed over 60 geographic loca- lation structure and population history on genetic varia- tions. It is doubtful that this represents enough genetic tion is based on a nested cladistic analysis of the geo- variation to carry out a nested cladistic haplotype test graphic distribution of mtDNAhaplotype data. The null (A. R. TEMPLETON,personal communication). hypothesis is that there is no geographic association of Conclusions: An important feature of our results is clades. This hypothesis is tested using a random permu- the numberof polymorphisms that appearto haveorigi- tation procedure. If the null hypothesis is rejected, nated in Africa (ie., PN1, PN2, PN3, DYS271 A-G, and TEMPLETON(1993) seeks to clarify the causes for these the poly(A) S allele). This may reflect a greater African nonrandom associations between haplotypes and geog- population size and/or an African ascertainment bias Global HumanVariation Y Haplotype 803 in the polymorphic sites surveyed. On the other hand, AVISE,J. C., 1994 Molecular Markers, Natural History and Evolution. Chapman & Hall, New York. the YAP element insertion and the poly(A) M and VS BATZER,M. A,, M. STONEKING,M. ALEGRIA-HARTMAN,H. BAZAN, D. H. alleles may have originated in Asia. Overall, our combi- KAss et al., 1994 African origin of human-specific polymorphic nation haplotype data are compatible with either an Alu insertions. Proc. Natl. Acad. Sci. USA 91: 12288-12292. BOWCOCK,A. M., J. R. BDD,J. L. MOUNTAIN, M. J. HERBERT,L. WO- African origin, an extremely heterogeneous source pop- TENUTO et al., 1991 Drift, admixture, and selection in human ulation in Africa, the aforementioned larger effective evolution: a study with DNA polymorphisms. Proc. Natl. Acad. African population size, and/or various gene flow and Sci. USA 88: 839-843. BOWCOCK,A. M., A. RUIZ-LINARES,J. TOMFOHRDE,E. MINCH,J. R. genetic drift-based scenarios. Our contribution differs BDDet al,, 1994 High resolution of human evolutionary trees from previous genetic studies that have supported a with polymorphic microsatellites. Nature 368: 455-457. model based on a single migration out of Africa (CANN CANN, R. L., M. STONEIUNGand A. C. WILSON,1987 Mitochondrial et al. 1987; VIGILANTet al. 1991; TISHKOFFet al. 1996). DNA and human evolution. Nature 325: 31-36. CASANOVA M., P. LEROY,C. BOUCEKKINE,J. WEISSENBACH, C. BISHOP For instance, in our opinion the TISHKOFFet al. (1996) et aL, 1985 A human Y-linked DNA polymorphism and its po- microsatellite data may have traced a more recent mi- tential for estimating genetic and evolutionary distance. Science gration as exemplified by haplotypes 4 and 5 in our Y 230: 1403-1406. CAVAIL-SFORZA, L.L., and W.F. BODMER,1971 The Genetics of Hu- chromosome tree (Figure 3). Separate analyses of the man Populations. W. H. Freeman, San Francisco. patterns of genetic variation associated with each of the CAVAILI-SFORZA,L. L., A. PIAZZA, P.MENOZZI andJ. MOUNTAIN, 1988 five YAP haplotypes point to the conjecture that our Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl. Acad. Sci. USA data contain evidence consistent with multiple dispers- 85: 6002-6006. als, some emanating from Africa and one perhapscom- CAVALI.I-SFORZA,L. L., P. MENOZZIand A. PIAZZA,1994 The History ing from Asia. It is also possible that our microsatellite and Geography of Human Genes. Princeton University Press, Princeton. data reflect comparatively recent demographicparame- CHAFU~ESWORTH,B., M. T. MORGANand D. C~RLESWORTH,1993 ters and microevolutionary events, while the rest of our The effect of deleterious mutations on neutral molecular varia- data reflect much earlier population dynamics. tion. Genetics 134: 1289-1303. Although we have treated our polymorphic sites as CHEN,Y.-S., A. TORRONI,L. EXCOFFIER,A. S. SANTACHIARA-BENERE- CE~Iand D. C.WALLACE, 1995 Analysis of mtDNA variation different loci, in actuality the nonrecombining portion in African populations reveals the most ancient ofall human of the human Y chromosome represents a single locus. continent-specific haplogroups. Am. J. Hum. Genet. 57: 133- The same stricture holds for mtDNA. Although more 149. CIMINELLI,B., F. POMPEI,P. MALASPINA,M. F. H~VMER,F. PERSI- information can be potentially obtained from multilo- CHETTI et al., 1995 Recurrent simple tandem repeat mutations cus data, different questions can be addressed utilizing during humanY chromosome radiation in Caucasian subpopula- single locus haploid systems. Differences between ma- tions. J. Mol. Evol. 41: 966-976. DEKA,R., L. JIN, M.D. SHRIVER,L. M. Yu, S. DECROOet al., 1995 ternal and paternal histories may be more informative Population genetics of dinucleotide (dCdA),,-(dGdT),,polymor- for disentangling multiple migration events that have phisms in world populations. Am. J. Hum. Genet. 56: 461-474. left palimpsest-like genetic and morphological patterns. ESTOUP,A,, L. GARNERY,M. SOIJGNACand J.-M. CORNUET,1995 Microsatellite variation in Honey Bee (A@ Mellifkra L.) popula- In addition, theability to construct haplotype trees with tions: hierarchical genetic structure and test of the infinite allele determinable ancestral and derived states and to esti- and stepwise mutation models. Genetics 140: 679-695. mate a time frame for the branchingevents in the trees EXCOFFIER,L., 1990 Evolution of human mitochondrial DNA evi- dence for departure from a pure neutral model of populations provides a heuristic framework for the futuretesting of at equilibrium. J. Mol. Evol. 30: 125-139. hypotheses about modern human origins. Additional Y EXCOFFIER,L., and A. LANGANEY, 1989 Origin and differentiation chromosome data from a greater number of polymor- of human mitochondrial DNA. Am. J. Hum. Genet. 44 73-85. phic sites and human populations will be necessary to EXCOFFIER,L., B. PEI.I.EGRINI,A. SANCHEZ-MUAS,C. SIMONand A. L+NGANEY,1987 Genetics and history of sub-Saharan Africa. perform model-based tests of these hypotheses (TEM- Yrbk. Phys. Anthropol. 30: 151-194. PLETON 1993, 1996, personal communication). FELSENSTEIN,J., 1981 Evolutionary trees from DNA sequences:a maximum likelihood approach. J. Mol. Evol. 17: 368-376. We thank JOHN CONAWAY,JAMES TUGGLE,JARED RAGWD,JENNIFER FELSENSTEIN,J., 1993 PHYLIP: Phylogeny Inference Package, Ver- VUTURO-BRADY,AI-WEI Ho, andCHARLES HOEFFER for excellent tech- sion 3.5~.Joseph Felsenstein and the University of Washington, nical assistance. Special thanks go to oneof the anonymous reviewers Seattle. for his/her comments that were both very thoughtful and helpful. FITCH,W. M., and E. MARGOLIASH,1967 Construction of phyloge- Also, we thank LESLIELOUIE and CHRISTYI~ER-SMITH for contributing netic trees. Science 155: 279-284. DNA samples. This work was supported by National Science Founda- FRAYER, D. W.,M. H. WOI.POFF,A. G. THORNE,F. H. SMITHand G. G. tion grant OPP-9423429 and National Institutes of Health grant GM- POPE, 1993 Theories of modern human origins: the paleonto- logical test. Am. Anthropol. 95: 14-50. 53566 to M.F.H. Fu, Y.-X., and W.-H. LI, 1996 Estimating the age of the common ancestor of men from the ZJY intron. Science 272: 1356-1357. LITERATURE CITED GOLDSTEIN,D. B., A.R. LINARES,L. L. CAVAIL-SFORZAand M.W. FELDMAN,1995 An evaluation of genetic distances for use with AMMERMAN,A. J., and L. L. CAVAI.I.I-SFORZA,1984 Neolithic Transition microsatellite loci. Genetics 139: 463-471. and the Genetics ofPopulations in Europe. Princeton University Press, GOMOLKA,M., J. HUNDRIESER,P. NURNBERG, L. ROEWER,J. T. EPPLEN Princeton. et al., 1994 Selected di- and tetranucleotide microsatellites from ACXI, K., and M. SHIDA,1996 AMonte Carlo simulation study of chromosomes 7, 12, 14, and Y in various Eurasian populations. coalescence times in a successive colonization model with migra- Hum. Genet. 93: 592-596. tion, pp. 66-71 in PrehistoricMongoloid Dispersals, edited by T. GREENBERG,J. H., C. G. TURNER andS. L. ZEGURA,1986 The settle- AKAZAWA and E. J. E. SZATHMARY.Oxford University Press, Ox- ment of the Americas: a comparison of the linguistic, dental, ford. and genetic evidence. Current Anthropol. 27: 477-497. 804 M. F. Hammer et al.

GRIFFITHS,R. C., and S. TAVARE,1994 Ancestral inference in popula- NEI, M., 1987 Molecular Euolutionaq Genetics. Columbia University tion genetics. Statistical Sci. 9 307-319. Press, New York. WMER,M. F., 1994 A recent insertion of an Nu element on the NEI, M., and A. K. ROYCHOUDHURY,1982 Genetic relationships and Y chromosome is a useful marker for humanpopulation studies. evolution of human races. Evol. Biol. 14: 1-59. Mol. Biol. Evol. 11: 749-761. NEI,M., and A. K. ROYCHOUDHURY,1993 Evolutionary relationships HAMMER,M. F., 1995 A recent common ancestry for humanY chro- of human populations on a global scale. Mol. Biol. Evol. 10927- mosomes. Nature 378: 37G378. 943. HAMMER,M. F., and S. HORAI,1995 Y chromosomal DNA variation NYSTUEN, A., T. BUSINGA,R. CARMI, G. M. Dmand V. C. SHEFFIEID, and the peopling of Japan. Am. J. Hum. Genet. 56: 951-962. 1995 Identification of polymorphic markers for the Y chromo- HAMMER,M. F., and S. L. ZEGURA,1997 The role of the Y chromo- some and the use of these markers to determine the number of some in human evolutionary studies. Evol. Anthropol. (inpress). male founders in isolated populations. Am. J. Hum. Genet. 57 HARPENDING,H., and A. ROGERS,1984 Antana: A Package for Multi- (Suppl.): A26. variate Data Analysis. Harpending & Rogers, Bosque Farms. PENA,S. D. J., F. R. SANTOS,N. 0. BIANCHI,C. M. BRAVI,F. R. CARNESE HARPENDING,H. C., S. T. SHERRY,A. R. ROGERSand M. STONEKING, et al., 1995 A major founder Y-chromosome haplotype in Amer- 1993 The genetic structure of ancienthuman populations. indians. Nature Genet. 11: 15-16. Curr. Anthropol. 34 483-496. PENNY,D., M. STEEI.,P. J. WADDEIJ.and M. D. HENDY,1995 Im- IKUTA,S., K. TAKAGI,R. B. WAI.IACEand K. ITAKURA,1987 Dissocia- proved analyses of human mitochondrial DNA sequences sup- tion kinetics of 19 base paired oligonucleotide-DNA duplexes port a recent African origin for Homo sapias. Mol. Biol. Evol. containing different single mismatched base pairs. Nucleic Acids 12: 863-882. Res. 15: 797-811. PERNA,N. T., M. A. BARER,P. L. DEININGERand M. STONEKING,1992 JOBI.ING, M. A,, and C. TYLER-SMITH,1995 Fathers and sons: the Y Alu insertion polymorphism: a new type of marker for human chromosome and humanevolution. Trends Genet. 11: 449-456. population studies. Hum. Biol. 64 641 -648. WET,T., S. L. ZEGURA,J. VUTURO-BRADY,0. POSUKH, L. OSIPOVA PERSICHETTI,F., P. BLASI,M. HAMMER,P. MAL~SPINA, C. IODICEet et al., 1997 Y chromosome markers and trans-Bering Strait dis- al., 1992 Disequilibrium of multiple markers on the human Y persals. Am. J. Phys. Anthropol. (inpress). chromosome. Ann. Hum. Genet. 56: 303-310. LAHIRI, D. K., and J. I. NURNBERGER,1991 A rapid nonenzymatic RELETHFORD,J. H., 1995 Genetics and modern human origins. Evol. method for the preparation of HMW DNA from blood for RFLP Anthropol. 4 53-63. studies. Nucleic Acids Res. 19: 5444. RELETHFORII,J. H., 1996 Genetic drift can obscure population his- LAHR, M. M., and R. FOLEY,1995 Multiple dispersals and modern tory: problem and solution. Hum. Biol. 68: 29-44. human origins. Evol. Anthropol. 3: 48-60. REI.ETHFORD,J. H., and H. C. HARPENDING,1994 Craniometric vari- LI, W.-H., 1977 Distribution of nucleotide differences between two ation, genetic theory, and modern human origins. Am. J. Phys. randomly chosen cistrons in a finite population. Genetics 85: Anthropol. 95: 249-270. 331-337. RELETHFORD, J. H.,and H. C. HARPENDING,1995 Ancient differ- LI, W.-i., and D. GRAUR,1991 Fundamtals of Molecular Evolution. ences in population size can mimic a recent African origin of Sinauer Associates, Inc., Sunderland, MA. modern humans. Curr. Anthropol. 36: 667-674. LUCOTTE,G., and N.Y. NGO, 1985 p49f, a highly polymorphic RE.YNOI.DS, J., B. S. WEIRand C.C. COCKERHAM,1983 Estimation probe, that detects Taql RFLPs on the human Y chromosome. of coancestry coefficient: Basis for short-term genetic distance. Nucleic Acids Res. 13: 82-85. Genetics 105: 767-779. LUCDTTE,G., N. GERARD,R. KRISHNAMOORTHY, F. DAVID,A. AOUIZER- RICHARDS,B., J. SKOLETSKY,A. P. SHUBER,R. BALFOUR,R. C. STERN ATE et al., 1994 Reduced variability in Y-chromosome-specific Pt nl., 1993 Multiplex PCR amplification from the GTR gene haplotypes for some Central African populations. Hum. Biol. 66: using DNA preparedfrom buccal brushes/swabs. Hum. Mol. 519-526. Genet. 2: 159-163. MARTINSON,J. J., L. EXCOFFIER,C. SWNBURN,A. J. BOYCE,R. M. HAR- RICHARIIS,M., H. CoR'rE-Rw., P. FORSTER,V. MACAUIAY,H. WILKIN- DING et al., 1995 High diversity ofa-globin haplotypes in a Sene- SON-HERBOTSet al., 1996 Paleolithic and Neolithic lineages in galese population, including many previously unreported vari- the European mitochondrial gene pool. Am.J. Hum. Genet. 59: ants. Am. J. Hum. Genet. 57: 1186-1198. 185-203. MATHIAS, N., M. BAYESand C. 'I~ER-SMMITH,1994 Highly informa- ROEWER,i., J. ARNEMANN,N. K SPURR,K.-H. GRZESCHIKand J. T. tive compound haplotypes for the human Y chromosome. Hum. EPPIXN,1992 Simple repeat sequences on the human Y chrw Mol. Genet. 3: 115-123. mosome are equally polymorphic as their autosomal counter- MAYNARD-SMITH,J., 1990 The Y of human relationships. Nature 344: parts. Hum. Genet. 89: 389-394. 591-592. ROEWER,L., M. KAYSER, P. DIELTJES,M. NAGY,E. BAKKERet al., 1996 MELTON, T., R. PETERSON,A. J. REDD, N. SAHA,A. S. M. SOFROet al., Analysis of molecular variance (AMOVA) of Y-chromosome-spe- 1995 Polynesian genetic affinities with southeast Asian popula- cific microsatellites intwo closely related human populations. tions as identified by mtDNA analysis. Am. J. Hum. Genet. 57: Hum. Mol. Genet. 5: 1029-1033. 403-414. ROGERS,A. R., and 1.. B. JORDE, 1995 Genetic evidence on modern MERRIWETHER,D. A,,F. ROTHHAMMERand R. E. FERRELL,1995 Distri- human origins. Hum. Biol. 67: 1-36. bution of the four foundinglineage haplotypes in Native Ameri- SAITOU,N., and M. NPI, 1987 The neighborjoining method: a new cans suggests a single wave of migration for the New World. Am. method for reconstructing phylogenetic trees. Mol. Biol. Evol. J. Phys. Anthropol. 98: 411-430. 4: 406-425. MICHAIAKIS,Y., and L. EXCOFFIER,1996 A generic estimation of SALEM,A.-H., F. M. BADR,M. F. GABALIAHand S. PAABO,1996 The population subdivision using distances between alleles with spe- genetics of traditional living: Y chromosomal and mitochondrial cial references for microsatellite loci. Genetics 142 1061-1064. lineages in the Sinai Peninsula. Am. J. Hum. Genet. 59 741- MINUGH-PURVIS,N., 1995 The modern human origins controversy: 743. 1984-1994. Evol. Anthropol. 4 140-147. SANTOS,F. R., S. D. J. PENA andJ. T. EPPLEN, 1993 Geneticand pop- MITCHFLI.,R. J., L. EARL andJ. W. WILLIAMS,1993 Two Y-chromo- ulation study of a Y-linked tetranucleotide repeat DNA polymor- some-specific restriction fragment length polymorphisms (DYSll phism with a simple non-isotopic technique. Hum. Genet. 90: and DYZ8) in Italian and Greek migrants toAustralia. Hum. Biol. 655-656. 65: 387-399. SANTOS,F. R., T. GERELSAIKHAN,B. MUNKHTUIA, T. OYUNSUREN,E. J. MOUNTAIN,J. L., and L. L. CAVALLI-SFORZA,1994 Inference of hu- EPPLENet al., 1996 Geographicdifferences in the allele fre- man evolution through cladistic analysis of nuclear DNA restric- quencies of the human Y-linked tetranucleotide polymorphism tion polymorphisms. Proc. Natl. Acad. Sci. USA 91: 6515-6519. DYSIY. Hum. Genet. 97: 309-313. MULLER,S., M. GOMOLKAand H. WALTER, 1994 The Y-specific SSLP SEIEISTAD,M. T., J. M. HEBERT,A. A. LIN, P. A. UNDERHII.~.,M. of the locus DYSlY in four different European samples. Hum. IBRAHIMet al., 1994 Construction of human Y-chromosomal Hered. 44: 298-300. haplotypes using a new polymorphic A to G transition. Hum. NEI, M., 1978 Estimation of average heterozygosity and genetic dis- Mol. Genet. 3: 2159-2161. tance from small number of individuals. Genetics 89: 583-590. SHRIVF,R,M. D., i.JIN, R. CHAKRABORT(and E. BOERWINKLE,1993 Global Human Y Haplotype Variation 805

VNTR allele frequency distributions under thestepwise mutation TAKAHATA,N., 1994 Repeated failures that led to the eventual suc- model: acomputer simulation approach. Genetics 134 983-993. cess in human evolution. Mol. Biol. Evol. 11: 803-805. SIATKIN,M., 1995 A measure of population subdivision based on TEMPLETON,A. R., 1993 The “Eve” hypothesis: a geneticcritique microsatellite allele frequencies. Genetics 139: 457-462. and reanalysis. Am. Anthropol. 95 51-72. Sow., R. R., and C. D. MKHENER,1958 A statistical methodfor TEMPI.ETON,A. R., 1994 “Eve”: hypothesis compatibility us. hypoth- evaluating systematic relationships. Univ.Kansas Sci. Bull. 28: esis testing. Am. Anthropol. 96 141-147. 1409-1438. TEMPLKTON,A. R., 1996 Gene lineages and human evolution. Sci- Soo~uiu.~,H., L. VIGILANT,A.V. HILI.,M. STONEKINGand T.JENKINS, ence 272: 1363. 1996 mtDNA control-region sequence variation suggests multi- TEMPI.ETON,A. R., E. ROUTMANand C. A. PHII.IIPS,1995 Separating ple independent origins of an “Asian-specific” 9-bp deletion in population structure from population history: a cladistic analysis sub-Saharan Africans. Am. J. Hum. Genet. 58 595-608. of the geographical distribution of mitochondrial haplotypes in SPURI>I.E,A. B., and T. JENKINS, 1992 Y chromosome probe p49a the tiger salamander, Ambystoma tigrinurn. Genetics 140: 762- detects complex Puu I1 haplotypes and many new Taq I haplo- 782. types in southern African populations. Am. J. Hum. Genet. 50: TISHKOFF,S. A., E. DIETZSCH,W. SPEED,A.J. PAKSSIIS,J. R. KIDD et al., 107-125. 1996 Global patterns of linkage disequilibrium at the CD4 locus SPl‘ml.E, A. B., and T. JENKINS, 1993 Complex polymorphisms are and modern human origins. Science 271: 1380-1387. revealed by Y chromosome probe 49a with BglII, Hind 111, PstI TORRONI,A.,Y.-S. CHEN, 0.SEMINO, A. s. SANT..\(:HI~-BENE(:ER~;~l, and BtI. Ann. Hum. Genet. 57: 41-53. C. R. SCOTI et al., 1994a mtDNA and Y-chromosome polymor- SPI!RDI.L,A.B., and T. JLNKINS, 1996 The originsof the Lemba phisms in four Native American populations from southern Mex- “Black Jews’’ of southern Africa: evidence from pZ2f2 and other ico. Am. J. Hum. Genet. 54: 303-318. Y chromosome markers. Am. J. Hum. Genet. 59: 1126-1133. TORRONI,A,, J. MIILER, L. G. MOORE:,S. Z.AMVI)IO,J. ZHUANGrt nl., SPURJX.~,A,, M. F. HAMMERand T. JENKINS, 1994 The Y Alu poly- 1994b Mitochondrial DNA analysis in Tibet: implications for morphism in Southern African populations, and its relationship the origin of the Tibetan population and its adaptation to high to other Y-specific polymorphisms. Am. J. Hum. Genet. 54 319- altitude. Am. J. Phys. Anthropol. 93 189-199. 330. UNDERHILL,P. A,, L. JIN, R. ZEMANS,P. J. OEFNEKand L. L. CAVAILI- STONF.KINC,M., 1993 DNA and recent human evolution. Evol. An- SFORZA,1996 A pre-Columbian Y chromosome-specific transi- thropol. 2: 60-73. tion and its implications for human evolutionaly histoly. Proc. STONEKING,M., 1994 In defense of “Eve”-a responseto Tem- Natl. Acad. Sci. USA 93: 196-200. pleton’s critique. Am. Anthropol. 96: 131-141. VIGIIANT,L., M. STONEKING,H. HARPENDINC., K HAWKESand A. C. STONEKING,M., and R. L. CANN,1989 African origin of human mito- WILSON,1991 African populations and the evolution of human chondrial DNA, pp. 17-30 in The Human Revolution: Behavioural mitochondrial DNA. Science 253: 1503-1507. WAINSCOAT,J. S., A. V. S. HILI.,A. L. BOYGE,J. FLINT,M. HERNANDEZ and Biological Pmpectiues on the Orips of Moda Humans, edited et al., 1986 Evolutionary relationships of human populations by P. MEI.IARS and C. STRIKER.Princeton University Press, from an analysis of nuclear DNA polymorphisms. Nature 319: Princeton. 491 -493. STONF.KINU(;,M., and A. C. WILSON,1989 Mitochondrial DNA, pp. WEISS, G., and A. VON HAESELER, 1996 Estimating the age of the 215-245 in The Colonization of the Pacijic: A Genetic Trail, edited common ancestor of men from the 7!Y intron. Science 272: by A. V. S. H11.1. and S. W. SERJWNTSON. OxfordUniversity Press, 1359-1360. New York. WEBER,J. L., and C. WONC,1993 Mutation in human short tandem STRINGER,C., and C. GAMBI.E,1993 In Search of thr Neanderthals: repeats. Hum. Mol. Genet. 2: 1123-1128. Solving the Puzzle of Human Origins. Thames and Hudson, Inc., WEIR, B. S., 1990 Genetic Data Ana@. Sinauer Associates, Inc., Sun- New York. derland, MA. SZATMMAKY,E. J. E., 1993 Genetics of aboriginal North Americans. Evol. Anthropol. 1: 202-220. Communicating editor: A. G. CIARK