Copyright 0 1990 by the Society of America

Geographic Variation in Mitochondrial DNA from Papua New Guinea

Mark Stoneking,*” Lynn B. Jorde,? Kuldeep Bhatia* andAllan C. Wilson* *Division ofBiochemistry and Molecular Biology, University of Calqornia, Berkeley, Calqornia 94720, ?Department ofHuman Genetics, University of Utah School $Medicine, Salt Lake City, Utah 84132 and $Papua New Guinea Institute of Medical Research, Goroka, Eastern Highlands Province, Papua New Guinea Manuscript received September 16, 1989 Accepted for publication December 5, 1989

ABSTRACT High resolution mitochondrial DNA (mtDNA) restriction maps, consisting of an average of 370 sites per mtDNA map, were constructed for 119 people from 25 localities in Papua New Guinea (PNG). Comparison of these PNG restriction maps to published maps from Australian, Caucasian, Asian and African mtDNAs reveals that PNG has the lowest amount of mtDNA variation, and that PNG mtDNA lineages originated from Southeast Asia. The statisticalsignificance of geographic structuring of populations with respect to mtDNA was assessed by comparing observed Gsr values to a distribution of Gsr values generated by random resampling of the data. These analyses show that there is significant structuring of mtDNA variation among worldwide populations, between highland and coastal PNGpopulations, and even betweentwo highland PNG populations located approximately 200 km apart. However, coastal PNG populations are essentially panmictic, despite being spread over several hundred kilometers. Highland PNG populations also have more mtDNA variability and more mtDNA types represented per founding lineage than coastal PNG populations. All of these observa- tions are consistent with a more ancient, restricted origin of highland PNG populations, internal isolation of highland PNG populations from one another and from coastal populations, and more recent and extensive population movements through coastal PNG. An apparent linguistic effect on PNG mtDNA variation disappeared when geography was taken into account. The high resolution technique for examining mtDNA variation, coupled with extensive geographic sampling within a single defined area, leads to an enhanced understanding of the influence of geography on mtDNA variation in human populations.

ITOCHONDRIAL DNA (mtDNA) has several 1989).This rapid evolutionof mtDNA sequences M properties that make it a useful molecule for meansthat local populationdifferentiation will be inferring the genetic structure and evolutionary his- accelerated,further enhancing the usefulness of tory of human populations. First, mtDNA is readily mtDNAin studying closely relatedpopulations. purified and analyzed because of its high copy numberFourth, theavailability of the complete sequence from and localization to a cytoplasmic organelle. Second, one individual(ANDERSON et al. 1981)enables the the strictly maternal, haploid mode of inheritance of construction of high resolutionrestriction maps by mtDNA (GILESet al. 1980) has two important conse- the sequence-comparison method: by comparing ob- quences:(1) phylogenetic trees that relate mtDNA servedfragment patterns to the fragment pattern types can be interpreted as genealogies reflecting the expected from the sequence, the location (and often maternal history of a population or species (WILSON the exact nature) of nucleotide substitutions respon- et al. 1985; AVISE 1986); and (2) the effective popu- sible for restriction site polymorphisms canusually be lation size for mtDNA is about one-fourth precisely deduced (CANN, BROWN andWILSON 1984; that for nucleargenes (BIRKY, MARUYAMA and STONEKING, BHATIAand WILSON 1986a). It thus be- FUERST1983), leading to a higher rate oflocal differ- comes feasible to map hundreds of cleavage sites per entiation of mtDNA by random genetic drift. Third, mtDNA, allowing many mtDNA types (even closely mtDNA evolves five to ten times faster than nuclear related ones) to be distinguished.’ DNA, with mtDNA mutations appearing to accumu- late in a clocklike manner (BROWN, GEORGEand WIL- ’ The reliability of the sequence-comparison methodof mapping has been questioned by SINCH,NECKELMANN and WALLACE (1987), who discovered SON 1979; BROWNet al. 1982; WILSON et al. 1985, that base substitutions in humannitDNA can occasionally produce small alterations in the electrophoretic mobility of restriction fragments in poly- acrylamide gels. While such conformational mutations can lead to incorrect The publication costs of this article were partly defrayedby the payment map assignments of restriction site polymorphisms, these errors areunlikely ofpage charges. This article must therefore be hereby marked“advertisement” to alter estimates of sequence divergence, mtDNA diversity, or phylogeny, in accordance with 18 U.S.C. 51734 solely to indicate this fact. as discussed by VIGILANT,STONEKING and WILSON(1988). In particular, I Current address: Department of Human Genetics, Cetus Corporation, conformationalmutations will usually notbe mistaken for restriction site 140053rd Street, Emeryville,California 94608. After August 1, 1990: polymorphisms; thus the criticisms of SINCH, NECKELMANN and WALLACE Department of ,The Pennsylvania State University, University [ 1987) have little bearing on theseor previous analyses of nitDNA restriction Park, Pennsylvania 16802. site variation.

Genetics 124: 717-733 [March, 1990) 718 M. Stoneking et al.

The above properties of mtDNA have motivated Papuan or non-Austronesian, consisting of over 750 global surveys of human mtDNA variation (BROWN distinct languages andconfined almost entirelyto 1980; JOHNSON et al. 1983; CANN, STONEKINGand New Guinea and adjacent islands. All of the highland WILSON1987), resulting innew and controversial and most of the coastal PNG populations speak non- perspectives concerningthe evolutionary history of Austronesian (NAN) languages, while Austronesian . Chief among these is the proposal that all (AN) speakers are limited to a few coastal areas and mtDNAs in modern human populations trace back to the offshore islands. a single common ancestor who probably lived in Af- Thisextraordinary linguistic situation has rica some 200,000 yr ago(CANN, STONEKINGand prompted numerous investigations into the genetic, WILSON1987; WILSONet al. 1987;STONEKING and biological and cultural diversity of PNG populations CANN1989). WHITTAMet al. (1986) have also used (KIRK 1980,1982a, b; BHATIAet al. 1981,1988, the high resolution restriction maps to show that most 1989; FROEHLICHand GILES1981; MOURANTet al. of the mtDNA genetic diversity in human populations 1982; WOOD et d. 1982; WURM 1982, 1983; HOPE, fits the predictions of the neutral model and that gene GOLSONand ALLEN 1983; SERJEANTSON, KIRK and flow between different worldwide populations is prob- BOOTH 1983;BHATIA, GOROGOand KOKI 1984; ably limited. CRANEet al. 1985; OHTSUKAet al. 1985; SERJEANTSON The small effective population size and rapid sub- 1985; WOOD,SMOUSE and LONG1985; FOLEY1986). stitution rate suggest that mtDNA should reveal new The resulting wealth of information provides a valu- perspectives on the genetic structure and evolutionary able background with which to compare the results of history of human populations, not on just a global our in-depth study of mtDNA variation in PNG. We scale but also on a more localized level; with that aim report:(1) external and internal isolation of PNG in mind we have been investigating mtDNA variation populations with respect to mtDNA; (2) a probable in Papua New Guinea (PNG).STONEKING, BHATIA Southeast Asian origin of PNG mtDNA lineages, with and WILSON (1986a) have previously described no particularly close phylogenetic relationship be- mtDNA variation in the Eastern Highlands of PNG tween Australian and PNG mtDNA lineages; and (3) and utilized estimates of sequence divergence within an unusual relationship between mtDNA genetic dis- and between PNG-specific clusters of mtDNA types tance and geographic distance separating PNG popu- to derive an intraspecific calibration of the rate of lations that probably reflects ancestral colonization human mtDNA evolution (STONEKING,BHATIA and routes through PNG. WILSON1986b). Here, we present analyses based on a widespread geographic sampling of mtDNA varia- MATERIALS AND METHODS tion from 25 PNG localities, in which the extensive Samples: mtDNAs from 1 19 individuals from Papua New information on the biological and cultural diversity of Guinea were purified from placental tissue by differential PNG (summarized below) provides a valuable back- ultracentrifugation through cesium chloride density gra- ground with which to compare the mtDNA results. dients (BROWN,GEORGE and WILSON 1979; STONEKING 1986). Purified mtDNAs were mapped with 12 restriction The principal geographic features of PNG include enzymes (APPENDIX) by the sequence-comparison method a rugged, mountainous interior (the highlands) and (CANN,BROWN and WILSON1984; STONEKING, BHATIAand lowland tropical rainforests and swamps (the coastal WILSON1986a). The geographic origin, language group, region). The highlands constituteone of the most mtDNA type andmtDNA clan (defined below) of each densely populatedareas, with a total population in individual from Papua New Guinea are presented in Table 1 ; a map of the geographic locations is in Figure 1. Restric- excess of one million people. Highland villages gen- tion maps for 55 of these PNG mtDNAs were previously erally consistof fewer than300 people, located in published by STONEKING,BHATIA and WILSON(1986b) and river valleys at elevations from 1000 to 2500 m be- CANN,STONEKING and WILSON(1 987). Additional mtDNA tween impassable mountain ranges that reach 3000- restriction maps included in some analyses came from four 4000 m in elevation, a situation leading to external other populations (African, Asian, Caucasian and Austra- lian) and have been described by CANN,STONEKING and and internal isolation of PNG populations (ALLEN WILSON(1 987). 1983). Larger settlements (up to 1000 people) exist mtDNA genetic variation within populations:For each on the river systems and estuaries of the coastal re- pair of mtDNA restriction maps, maximum-likelihood esti- gion. mates of the amount of sequence divergence (number of The single most indicative feature of this extreme nucleotidesubstitutions per site)were obtained by the method of NEI and TAJIMA(1 983). This method uses the isolation is the enormous linguistic diversity in PNG. ratio of shared sites to total sites for two restriction maps, Three principal languagegroups comprising over and the averagelength of the restriction enzyme recognition 1500 languages (about one third of the known lan- sequences, to compute an initial estimate of the probability guages of the world) are found in the South Pacific that the two mtDNAs differ at a given nucleotide position. (FOLEY1986): Australoid, consisting of about 200 Starting with this initial estimate, equation 28 of NEI and TAJIMA(1 983)was solved iteratively and the sequence di- languages confined entirely to Australia; Melanesian vergence calculated from equation 2 1. computerA program or Austronesian, consisting of over 600 languages supplied by J. C. STEPHENSwas modified to perform these distributedfrom Southeast Asia to Polynesia; and calculations. Geographic Variation in Human mtDNA 719

TABLE 1 Origin, language group, mtDNAtype, and mtDNA clanof 119 Papua New Guineans

~ -r Language mtDNA mtDNA Language mtDNA mtDNA Individual Origin group type clan Individual Origin group type clan " __ 1 1 NAN 152 18 61 2 NAN 27 5 2 1 NAN 89 10 62 2 NAN 98 10 3 I NAN 39 5 63 13 AN 96 10 4 1 NAN 26 4 64 3 NAN 27 5 5 I NAN 101 IO 65 24 AN 111 10 6 I NAN 105 10 66 24 AN 130 16 7 1 NAN 89 10 67 24 AN 11 1 8 1 NAN 27 5 68 24 AN 130 16 9 1 NAN 109 10 69 18 - 69 9 IO 1 NAN 27 5 70 18 AN 119 11 11 1 NAN 100 IO 71 18 AN 129 16 12 1 NAN 65 8 72 18 AN 93 IO 13 I NAN 27 5 73 18 AN 69 9 14 1 NAN 99 10 74 18 AN I19 11 15 1 NAN 107 IO 75 21 NAN 1 I9 11 16 I NAN 24 2 76 21 NAN 106 10 17 1 NAN 34 5 77 21 NAN 27 5 18 1 NAN 107 10 78 21 NAN 43 6 19 1 NAN 103 10 79 20 NAN 68 9 20 1 NAN 107 10 80 20 NAN 130 16 21 1 NAN 107 10 81 21 NAN 43 6 22 1 NAN 107 10 82 18 NAN 110 10 23 1 NAN 89 10 83 15 NAN 130 16 24 1 NAN 107 10 84 17 AN 119 11 25 1 NAN 114 10 85 19 AN 67 9 26 2 NAN 40 5 86 17 AN 119 11 27 2 NAN 37 5 87 16 AN 27 5 28 2 NAN 108 IO 88 17 AN I27 15 29 2 NAN 27 5 89 19 NAN 130 16 30 2 NAN 115 10 90 17 AN 119 11 31 2 NAN 115 10 91 16 AN I I9 11 32 2 NAN 27 5 92 16 AN 119 11 33 2 NAN 102 10 93 17 AN 150 17 34 2 NAN 45 7 94 19 NAN 130 16 35 2 NAN I I3 10 95 7 AN 33 5 36 2 NAN 116 10 96 22 AN 120 12 37 2 NAN 27 5 97 25 AN 94 10 38 2 NAN 26 4 98 12 AN 32 5 39 2 NAN 27 5 99 6 AN 27 5 40 2 NAN 1 I6 IO 100 23 AN 35 5 41 2 NAN 27 5 101 4 AN I30 16 42 2 NAN 46 7 102 5 NAN 27 5 43 2 NAN 97 10 103 6 - 31 5 44 2 NAN 46 7 104 6 - 95 IO 45 2 NAN 115 IO 105 11 AN I I9 11 46 2 NAN 29 5 106 13 AN 92 10 47 2 NAN 27 5 107 9 NAN 27 5 48 2 NAN 27 5 108 6 AN 26 4 49 2 NAN 112 IO 109 10 AN 130 16 50 2 NAN 27 5 I10 9 NAN 25 3 51 2 NAN 1 I5 10 111 5 NAN 104 IO 52 2 NAN 88 10 112 9 AN 119 11 53 2 NAN 41 5 113 6 NAN 30 5 54 2 NAN 38 5 114 3 NAN 27 5 55 2 NAN 27 5 I15 13 AN 114 IO 56 2 NAN 27 5 I I6 6 - 36 5 57 2 NAN 28 5 1 I7 8 AN 121 13 58 2 NAN 42 5 118 5 NAN 27 5 59 2 NAN 90 10 119 14 AN 122 14 60 2 NAN 91 IO Origin numbers correspond to the map in Figure 1. Language groups are Austronesian (AN), non-Austronesian (NAN), or unknown (-). mtDNA types were determined by restriction site polymorphisms (APPENDIX) and mtDNA clans were determined by phylogenetic analysis as described in the text; mtDNA type numbers and clan numbers correspond to those in Figure3. 720 M. Stoneking et al.

PAPUA NEW GUINEA

FIGURE1 .-Map of Papua New Guinea showing the sampling localities included in this study. Circled numbers are keyed to Table 1. Shaded areas indicate that many villages were sampled within each region.

Estimates of mean “heterozygosity” and the associated PALUMBI1985). G.w valuesbased on mtDNA types were variance were obtained using equations 1 1 and of2 ENGELS1 calculated according to equations 7.39, 7.40, and 8.27 of (1981). While there are no heterozygotes for mtDNA re- NEI (1987) by treating the entire mtDNA molecule as a striction site polymorphisms, ENGELS’definition of mean single locus, with the number of mtDNA types equivalent heterozygosity is nonetheless a meaningful and useful meas- tothe number ofalleles. GST valuesbased on mtDNA ure of mtDNA variability, for the following reasons: (1) it restriction maps, treating each restriction site as a separate corrects for an ascertainment bias that arises when using locus with two alleles(presence or absence of the site), were restriction enzymes to measure variability; (2) it is free of calculated according to TAKAHATAand PALUMBI(1985); assumptions concerning the evolutionary process or the their approach is designed specifically for mtDNA restric- genetic structure of the population; and (3) thevariance of tion site polymorphisms and takes into account the ascer- the estimate of heterozygosity can be easily determined, tainment bias noted by ENGEM(1 98 1). allowing evaluation of the statistical significance of differ- Since the variance of the GST statistic is unknown, the ences in levels of heterozygosity between populations. By approximate statistical significance of observed G.w values contrast, the variance of the mean sequence divergence for was estimated by a variation of the bootstrap procedure. a population is extremely difficult to calculate, because the Individuals weresampled without replacement and ran- estimate of mean sequence divergence is obtained by aver- domly assigned to populations such that equivalent sample aging allpairwise comparisons between individuals and sizes were maintained; this procedure was then repeated hence these observations are not statistically independent 1000 times to generate a distribution of G.v values, permit- (TAKAHATAand NEI 1985; M. NEI, personal communica- ting an ad hoc assessment of the statistical significanceof an tion). observed Gsr value (seeRESULTS). mtDNA geneticvariation between populations: The Three different measures of mtDNA genetic distance proportion of the total mtDNA variation attributable to between populations were used. A measure of genetic dis- between-population differences was measured using the GsT tance based on mean sequence divergence (NEI and TAJIMA statistic of NEI (1973). CsT is a convenient measure of 1983) is population differentiation becauseit reaches equilibrium = d, (d, 4)/2 (1) quickly (CROWand AOKI 1984) and is relatively insensitive DNT - + to the number of population subdivisions (TAKAHATAand where d, and d, are the mean sequence divergence within Geographic Variation in Human mtDNA 72 1

TABLE 2 mtDNA variation in five human populations

Sequence Sample Polymorphicdivergence mtDNA Heterozygosity Populationsites size ‘Y pes (%) (k 1 SD) African 20 71 20 0.47 0.47 f 0.03 34 78 Asian 34 34 0.35 0.35 f 0.02 CdUCdSfdIl 4743 80 0.23 0.27 f 0.02 Austrdlian 21 48 20 0.25 0.27 * 0.02 PNG 119 85 65 0.21 0.22 f 0.01 24 1 230Worldwide 1 24 182 0.27 0.32 f 0.01 populations x and y, d, is the observed mean sequence age PHYLIP (FELSENSTEIN1987). Fitch-Margoliash trees divergence between these two populations, and DNT is thus were computed from the three measures of genetic distance mean sequence divergence corrected for intrapopulation described above by the PHYLIP program FITCH. variation (6 WILSON et al. 1985). A measure of genetic distance based on the probability of allelic identity of RESULTS mtDNA restriction sites (TAKAHATAand PALUMBI 1985) is Restriction site polymorphisms: The 12restriction DTI, = -lOge(J/I) (2) enzymes detect an average of nearly 370 sites per where J is the probability of allelic identity between two human mtDNA, which represents approximately 9% populations and I is the geometric mean of the probability of the human mtDNA . There were85 poly- of allelicidentity within each population. The third measure morphic sites in the sampleof 1 19 PNG mtDNAs and of genetic distance is derived from the genetic covariance these defined 65 mtDNA types (APPENDIX),where an or coefficient of kinship (HARPENDING andJENKINS 1973), defined as mtDNAtype consists of a distinctcombination of polymorphic sites. ThemtDNA type of each New rq = (Pz- P)(PJ- P)/P(l - P) Guinean appears in Table 1. No individual appeared where P, and Pj are thefrequencies of a particular restriction to possess more than one mtDNA type, and none of site polymorphism in populations i and j and P is the mean these 65 PNG mtDNA types has been found in any frequency of the polymorphism in the combined population. The ri, values are averaged over all polymorphic sites, re- other survey of human mtDNA variation that utilized sulting in the following measure of genetic distance: the same (or largely the same) set of restriction en- zymes (HORAI and MATSUNACA1986; CANN,STONE- Dl, = ri, + rj, - 2ri, (3) KING and WILSON 1987). where rli and ru are the genetic variances withinpopulations mtDNA variation in worldwide populations:The i and j and r,, is the genetic covariance between populations sample size, number of polymorphic restriction sites, i and j. Correlation between distance measures: The statistical number of mtDNA types, mean sequence divergence, significance of correlation coeffkients derived from com- and mean heterozygosityare given in Table 2 for each paring various distance measures (genetic, geographic and of the five populations and for the total sample. In linguistic) cannot be easily assessed, becausethe observations generalthe estimates for meanheterozygosity are consist of all possible pairwise comparisons of the set of nearly the same as (although somewhat larger than) populations and hence are not independent. We therefore used the MANTEL (1967) testof matrix correspondence the mean sequence divergence estimates, in accord- (SMOUSE, LONGand SOKAL1986) to determine thestatistical ance with theoretical expectations if each restriction significance of correlation coeffkients involving distance site polymorphismis due to change aat single nucleo- matrices. To assess the significance of a correlation coeffi- tide position (ENGELS1981). Using the standard er- cient derived by comparing two distance matrices, Monte rors of the meanheterozygosities to construct approx- Carlo randomization is used to construct a null distribution by holding one matrix constant while randomly permuting imate95% confidence intervals, we findthat: (1) the rows and columns of the other matrix. The statistical Africans are significantly more variable than any other significance is then determined by comparing the observed population; (2) Asians are significantly more variable correlation coefficient to the randomly generated distribu- than the remaining populations; and (3) Caucasians tion of correlation coeffkients. and Australiansdo not differ from each other butare Phylogeneticanalysis: Maximum-parsimony trees de- picting the genealogical relationships of the mtDNA types significantly more variable than the PNG population. defined by restriction site polymorphisms were constructed PNG is thuscharacterized by the lowestlevels of using the computer program PAUP (SWOFFORD1985). The mtDNA variability. network produced by PAUP was rooted by the midpoint A GsT value of 0.31 was calculated from theaverage method and hypothetical ancestral nodes were positioned to reflect the average sequence divergence (NEI and TAJIMA probability of allelic identity of mtDNA restriction 1983) between the descendant mtDNA types. maps (TAKAHATAand PALUMBI 1985) within and be- Maximum-likelihood trees relating populations were gen- tweenthese five populations (Table 3). This value erated using the CONTML program in the computer pack- meansthat roughly 30% of thetotal variance in 722 M. Stoneking et al.

TABLE 3 GSTvalues for human mtDNA, removing each population in turn

Population removed PI - PJ HO HT GST None0.836 0.895 0.105 0.152 0.31 0.898 0.838 0.102 African0.838 0.898 0.147 0.31 Asian 0.1070.893 0.828 0.31 0.156 Caucasian 0.897 0.3 0.8360.149 0.103 1 Australian 0.888 0.827 0.1 12 0.158 0.29 PNG 0.900 0.850 0.100 0.137 0.27 P, is the average probability of allelic identity within each population and P, is the average probability of allelic identity between each pair of populations; Ho and HT refer to the average variability within individual populations and within the total population, respectively. All values were calculated according to the equationsin TAKAHATAand PALUMBI (1985).

150

100 .c, C 3 0 0 50 Observed A- h Value

r 0.26 0.27 0.28 0.29 0.30 0.31

FIGURE2,"Histogram of G,STvalues obtained by random resampling of the rntDNA data. Individuals were assigned at random without replacement to five groups with sample sizes identical to the five worldwide populations in the study. G,YTvalues were calculated by the method Of TAKAHATA and PALUMBI(1985) and the above procedurewas repeated 1000 times to generate the distributionof G,YTvalues. The observed value, 0.31, was greater than any of the randomly-generated values and hence indicates geographic structuring of IntDNA in worldwide populations at an approximate significance level of P < 0.001, mtDNA restriction maps distinguishes human popu- 1000 times, generating a distribution of GST values lations and about 70% occurs within populations. (Figure 2). The mean of these randomly-generated Each population was removed in turn from the GsT values was 0.26 and none was as large as the analysis and the G.yT value recalculated, to investigate observed value of 0.31. Since by this methodthe the relative contribution of each population to theGsT probability of observing a GST value as high as 0.31 is value (Table 3). Removing Africans, Asians or Cau- less than one in 1000, we conclude thatthere is casians did not alter the Gsr value, whereas removing statistically significant geographicstructuring of Australians decreased it by 6.5% and removing PNG mtDNA restriction site polymorphisms with respect decreased it by 13%. Thus, PNG and, toa lesser to these five populations. extent, Australia are isolated with respect to the other Further evidence forthe significant geographic populations. structure of human populations based on mtDNA To assess the statistical significance of the observed comes fromthe distribution of mtDNA types, In- GST value, individuals from the worldwide sample of dividuals fromdifferent populations always had 241 mtDNAs were assigned at random without re- differentmtDNA types, and individuals with iden- placement to five populations with sample sizes cor- tical mtDNA types always came from the same pop- responding to the five surveyed populations and the ulation. GsTvalue was calculated; this procedure was repeated Tree of mtDNA types: A relating Geographic Variation in Human mtDNA 723

0 African o Asian a Australian A New Guinean Caucasian

0- 0.2 0.4 0.6 0.0- 0.4 0.2 0 % Sequence Olvorgence % Sequonco Olvergonce FIGURE3,"Phylogenetic tree relating 182 mtDNA types found in 241 individuals from five worldwide populations. The branching order was obtained from 109 phylogenetically informative restriction site polymorphisms using the PAUP computer program (SWOFFORD1985). This parsimony tree, which was rooted by the midpoint method, has a length of 363 changes at these informative sites and a consistency index of 0.3. Although no shorter trees were found, fora data set of this size it is impossible to guarantee that shorter trees do notexist, and the number of trees of equal (or nearly equal) length is probably large. Inferred ancestral nodes are positioned approximately with respect to the scale of sequence divergence. Asterisks indicate rntDNA types found in more than one individual, white circled numbers identify the 18 New Guinean n1tDNA clans. the 182 mtDNA types found in our worldwide sample example, PNG mtDNAs appear at 18 separate loca- of 241 humans appears in Figure 3. This tree, like its tions in the tree. We have used the term "clan" to predecessors that were based on fewer individuals refer to a group of mtDNA types, all from one geo- (STONEKING,BHATIA and WILSON 1986b;CANN, graphic region, whose nearest relative is from a dif- STONEKINCand WILSON 1987;WRISCHNIK et al. ferentgeographic region (WILSONet al. 1987). As 1987), possesses two main features. First, thetree indicated in Figure 3, thereare 18 PNG mtDNA contains two primary branches, one consisting solely clans. We further assume that each clan represents of African mtDNAs and the other consisting of all of the descendants of a different mtDNA lineage that the other mtDNAs, including some Africans. This is colonized a particular geographic region; the number one line of evidence that led to the hypothesis of an of mtDNA clans is thus an estimate of the minimum African origin for human mtDNA (CANN, STONEKINCnumber of females that colonized that region. For and WILSON1987; WILSONet al. 1987;STONEKING example, the above clan analysis indicates that a min- and CANN 1989). imum of 18 females colonized PNG. We have used The second feature of the tree in Figure 3 is that this concept of an mtDNA clan to derive estimates of mtDNAs from each population do not form a single the rate of human mtDNA evolution (STONEKINC, clade but rather appear scattered across the tree. For BHATIA and WILSON1986b; WILSONet al. 1987; 724 M. Stoneking et al. STONEKINGand CANN 1989) and the rateof dispersal of human populations (STONEKINGand WILSON1989); other insights that derive fromthe concept of mtDNA clans are discussed below. Geographic origin of founding lineages: The phy- logenetic tree of mtDNA types (Figure 3) can be used toexamine the geographic origin of themtDNA lineages that colonized PNG. The geographic distri- bution of the nearest relative(s) of the 18 PNG clans is shown in Figure 4. For every clan the most closely related mtDNA type(s) from outside PNG are either exclusively Asian or include Asian mtDNA types. By contrast, Australian, African, and Caucasian mtDNA types arethe nearest relative (together withAsian mtDNA types) of only two, six and three PNG clans, Asian AustralianAfrican Caucasian respectively. A x* analysis reveals that this distribution In of nearest relatives differs significantly from the ex- Nearest Relative of Clan pected distribution based on the sample sizes of these populations (P 0.001). This statistical support for FIGURE4,"Histogram of the number of occurrences of each < geographic population as a nearest relative of a PNG mtDNA clan. an Asian origin of PNG clans is retained even when the 18 PNG clans in Figure 3 are grouped intolarger HighlandsI is not simply due to more mtDNA types aggregations of closely related clans (data not shown). being present in the Highlands, as the ratio of sample Furthermore, five of the six cases in whichan African size to mtDNA types is essentially identical in the mtDNA type is anearest relative of aPNG clan Highland and Coast populations (Table 4). However, involve type 124 (Figure 3). This African-American the ratio of mtDNA types to mtDNA clans is over mtDNA type carries an Asian-specific deletion (WRIS- twice as high in the Highland population as in the CHNIK et al. 1987) and is thus probably of maternal Coast population, because the Highland population Amerindian origin, since the deletion occurs in Amer- has only half as many mtDNA clans represented. On indians (M. STONEKING,S. PAABo, L. VIGILANT,C. E. average, fewer founding maternallineages are present BECK, C. ORREGOand A.C. WILSON,unpublished in the Highlands, but each founding lineage has dif- results). Asia is thus overwhelmingly indicated as the ferentiated into many more types than is true for the source of PNG mtDNA lineages. Coast population. This is consistent with an earlier, mtDNA variation in PNG populations: The 1 19 morerestricted origin of the Highland population PNG individuals came from25 different localities (see DISCUSSION). (Figure l), which were groupedinto the following mtDNA differentiation betweenPNG popula- five geographic subdivisions or "populations" in order tions: The GST value for the five PNG populations is to increase sample sizes to statistically meaningful 0.28 for mtDNA sites and 0.08 for types (Table 5), levels yet still retain geographically relevant divisions: indicating thatabout 72% of the total variance in Eastern Highlands (EH), locality 1;Southern High- mtDNA restriction maps and 92% of the total vari- lands (SH), localities 2and 3; North Coast (NC), ance in mtDNA types in PNG occurs within these five localities 4-6, 9, 12 and 13; South Coast (SC), locali- populations. Random resampling of the data reveals ties 15-21; and offshore islands (IS), localities 7, 8, that these GsT values are statistically significant (P < 10, 11, 14 and 22-25. The sample sizes, number of 0.001), indicating that there is geographic structure mtDNA types and clans, ratio of mtDNA types to in PNG populations with respect to mtDNA. Removal clans, and mean heterozygosity for these five PNG of each population in turn from the GST analysis does populations are given in Table 4 (estimates of mean not reveal any significant patterns (data not shown). sequence divergencewere identical or nearly so to the However, GSTvalues for the Highland-Coast compar- estimates of mean heterozygosity and therefore are ison and for the EH-SH comparison are also statisti- not given). The two highland populations are themost cally significant, but the GsT values for a comparison variable, with the NC next most variable and the SC of the three coastal populations are not (Table 5). and IS populations having the least variability. Divid- Thus while the highland and coastal populations, and ing the total sample into a Highland population (con- even the two highland populations, are strongly dif- sisting of the EH and SH groups) and a Coast popu- ferentiated with respect to mtDNA, the three coastal lation (consisting of the NC, SC and IS groups) reveals populations essentially comprise a single panmictic thatthe Highland population is significantly more population. variable than the Coast population. Three measures of mtDNA genetic distance be- The greater amount of mtDNA variability in the tween each pair of PNG populations are in Table 6; Geographic VariationGeographic in Human mtDNA 725

TABLE 4 mtDNA variation in PNG populations and language groups

(4 (4 (4 Sample Heterozygosity mtDNA Ratio mtDNA Heterozygosity Sample mtDNA Ratio Population size (k 1 SD) types clans alb blc EH 0.24 25 f 0.02 16 6 1.6 2.7 SH 0.21 39 f 0.01 22 5.5 4 1.8 SC 26 8 0.17 f 0.02 13 2.0 1.6 NC 17 0.19 f 0.02 14 6 1.2 2.3 IS 0.17 12 f 0.02 10 8 1.2 1.2 Highland 64 7 0.24 f 0.01 36 1.8 5.1 Coast 0.19 55 f 0.01 32 14 1.7 2.3 NAN 0.23 81 f 0.01 45 12 1.8 3.7 AN 34 0.15 f 0.01 22 12 1.5 1.a

TABLE 5 GSTvalues for PNC populations and language groups

mtDNA sites mtDNA types Comuarison Observed Random Observed Random EH-SH-NC-SC-IS 0.24 0.08***0.28*** 0.04 Highland-Coast 0.16 0.21*** 0.01 0.03*** EH-SH 0.02 0.14 0.19** 0.04** IS-SC-NC 0.21 0.20 0.04 0.06 NAN-AN 0.16 0.04*** 0.23*** 0.01 Highland-Coast (NAN only) 0.22*** 0.16 0.02 0.03* NAN-AN (Codst only) 0.15 0.140.05 0.02 GSTvalues were calculated for mtDNA restriction site polymorphisms and mtDNA types as discussed in the text. Random G,,T values represent the mean of 1000 random resamplings of the data for each comparison. *, ** and *** indicate that fewer than 50, 10 and 1, respectively, of the 1000 randomly generated Gsr values exceeded the observed value, and thus roughly correspond to P < 0.05, P < 0.01 and P < 0.00 I. branching diagrams constructed from these genetic tween each pair of PNG populations appears in Figure distances are in Figure 5. While these branching dia- 6. The correlation for a linear relationship between grams will be referred to as trees, it should be empha- genetic distance and geographic distance was not sta- sized that they are not truephylogenies (ie., indicating tistically significant by the MANTELmatrix regression only sharedancestry). Rather, they serve as useful approach (R2= 0.006, P = 0.86); however, the cor- illustrations of overall similarity between PNG popu- relation fora quadratic model that included the lations, with such similarity reflectingboth shared square of geographic distance was significant (R2 = ancestry and migration. 0.663, P < 0.01). Correlations for linearand quadratic The trees in Figure 5 agree in placing the IS and models that included linguistic distance either alone SC populations together but they differ in the place- or in conjunction with geographic distance were not ment of the NC population. The NC population significant (data not shown). groups with the coastal populations in the tree based Inspection of the plot in Figure 6 reveals the reason on TAKAHATA-PALUMBI(DTp) distances and with the for this significant quadratic relationship,namely pop- highland populations in theother trees, actually ulations that are either close together or far apart grouping more closely with the SH population than have a relatively low genetic distance and populations does the EH population in the trees based on NEI- intermediate in geographic distance have a relatively TAJIMA(D,~T) and HARPENDING-JENKINS(DHJ) dis- high genetic distance. In particular, the two popula- tances. This discrepancy may reflect differences in the tions with the smallest genetic distances overall (Table assumptions underlying each distance measure, as dis- 6), the SC and IS populations, are also separated by cussed below. the largest geographic distance. The otherpopulation mtDNA variation and geographic distance: Hav- pairs with relatively low genetic distances include the ing demonstrated significant differences in the distri- three comparisons involving theEH, SH and NC bution of mtDNA polymorphisms and types between populations. Although the number of comparisons is geographic areas of PNG, we can investigate the re- small, this quadratic relationship between genetic and lationship between geographic distance separating geographic distance may actually reflect the historical PNG populationsand mtDNA geneticdistance. A plot settlement pattern of PNG (see DISCUSSION). of genetic distance versus geographic distance be- Distribution of identical mtDNA types: Twelve of 726 M. Stoneking et al.

TABLE 6 Genetic and geographic distances betweenPNG populations

Geographic distance Populations DNT DTP DHJ (W EH-SH 0.000 190.15952 0.03049 220 EH-NC 0.0001 2 0.02552 0.13427 240 EH-SC 0.00032 0.02249 0.19152 360 EH-IS 0.00023 4400.02480 0.20556 SH-NC -0.00003 0.03 126 0.10267 200 SH-SC 0.00033 0.03408 0.18076 560 SH-IS 0.00027 0.02881 0.17392 360 NC-SC 0.151530.00022 0.01826 620 NC-IS 0.00016 0.01540 0.14873 160 sc-IS -0.00004 0.0 1326 800 0.1 1654

DNr, &I’ and DI~Jwere calculated according to Equations 1, 2, and 3, respectively. Q

C FIGURE5.-Four unrooted, branchingnetworks (ie.,distance trees) illustrating patterns of similarity between five PNG populations. These distance trees were constructed using the PHYLIP package of computer programs (FELSENSTEIN1987). A, Maximum-likelihood tree; B, Fitch-Margoliash (FM) tree of DNr distances from Table 6 (with 0.0005 added to each value to avoid negative numbers); C, FM tree of DTp distances from Table 6; D, FM tree of D,distances from Table 6. The populations include two from the highlands (EH and SH, circles), two from the coast (IS and SC, squares), and the NC population (triangle), which groups with the highland populations in some trees and with the coastal populations in other trees.

the 65 PNG mtDNA types occurred more than once; geographic distance onmtDNA, we comparedthe the distribution of these 12 types within the five PNG distribution of geographic distances separating pairs populations is shown in Figure 7. The probability of of individuals with identical mtDNA types tothe allelic identity, based on mtDNA types, is 0.1238 distribution of geographic distances separating all within the five PNG populations, 0.0012 among the pairs of PNG individuals (Figure 8). These two distri- five PNG populations, and 0.0 between PNG and butions are significantly different (x2= 183.85, d.f. = other worldwide populations (ie., none of the PNG 10, P < 0.001), with a much higher proportion of mtDNA types have been found in any other popula- individuals with identical mtDNA types living within tion). The probability of identity thus decreases by 100 km of each other. The average geographic dis- two orders of magnitude when going from compari- tance between individuals with identical mtDNA types sons within PNG populations to comparisons between is 239 km, much less than the overall average distance PNG populations. of 383 km between PNG individuals in this study. As a means of further investigating the influence of mtDNA variation and language groups: The lan- Geographic Variatioln in Human mtDNA 727

o-22 0 Oa20 -t 0 1 0 K 0 3 U 0.4 2 LL

0.2

Geographic Distance (km) 0 .o FIGURE fi."l'lot of.grnc.tic tlist;lnce (D//,)vs. geogr;lphic dist;lnce for five PSG popu1;ttions. guage spoken by the mothers of 115 of the 1 19 PNG MtDNA type individuals was known and could be classified as either FICI~RE7.-Frequencies of mtDNA types that occurred in more Austronesian (AN) or non-Austronesian (NAN); the th;w one individual. Types are numbered according to the tree in sample sizes, number of mtDNA types and clans, ratio Figure 1. The height of each shaded bar indicates the frequency of of mtDNA types to clans, and mean variability for that type in that particular population. these two groups are in Table 4. The NAN speakers DISCUSSION are significantly more variable than the AN speakers and have more mtDNA types represented per clan. Significant geographic structure in worldwide The GST value for the comparison of these two lan- populations: The analysis of mtDNA variation in the guage groups is 0.23 for mtDNA restriction sites and worldwide sample from five geographic localities in- 0.04 for mtDNA types (Table 5). Based on random dicated statistically significant structuring of these five resampling, both of these values are statistically sig- populations with respect to mtDNA; the observed GST nificant (P< 0.00 l), indicating that thereis significant value of 0.31 indicates that about a third of the total differentiation between these two languagegroups variation is attributable to between-population varia- with respect to mtDNA. tion. From a subset of thesedata WHITTAMet al. However, a caveat to this last finding is that the (1 986) calculated a Gsrvalue of 0.063. The difference analysis is confounded by a geographic effect: all of between their estimate and our estimate is probably the AN speakers were from the coast, whereas 64 of due to the differentdefinitions of loci and alleles; the the NAN speakers were from the highlands and only TAKAHATAand PALUMRI (1 985) method we used es- 17 were from the coast. Since we have already dem- sentially treats each cleavage site as a locus with two onstrated that the highland and coastal populations alleles, while WHITTAMet al. (1986) used each func- are significantly differentiated with respect to tional region of the mtDNA genome as a locus, defin- mtDNA, it is necessary to account for this geographic ing alleles on the basis of distinct combinations of distinction. Two additional analyses in Table 3 dem- cleavage sites mapping within each region. onstrate that the significant differences observed be- The GST value for nuclear genes is about 0.10 (NEI tween NAN and AN speakers with respect to mtDNA andROYCHOUDHURY 1982), which is considerably are in fact due to this geographic distinction. First, if smaller than the mtDNA Gsr valueof 0.31. The one considers only the NAN speakers, the Gs7 values greater between-population differentiation displayed forthe comparison of highland NAN speakers to by mtDNA is in accord with expectations as it presum- coastal NAN speakers are still statistically significant, ably reflects the faster evolution and smaller effective indicating that the geographic distinction between population size of mtDNA (due to the maternal, hap- highland and coastal PNG mtDNAs persists within loid inheritance). TAKAHATAand PALUMRI (1 985) de- NAN speakers. Second, if one considers only the scribe how the GsT value for mtDNA can be used to coastal mtDNAs, the GSTvalues for the comparison of derive an estimate of the minimum number of female NAN speakers to AN speakers are no longer statisti- migrants exchanged on average between populations cally significant. Thus geography, not language, ap- per generation (ATtmc),while the Gsr value for nuclear pears to be the primary factor influencing the distri- DNA gives a similar estimate of the total number of bution of mtDNA variation in PNG. migrants (male plus female). From the G,sT values given 728 M.Stoneking et al. Other evidence bearing on the relatedness of Aus- 50 1 tralian and PNG populations does not give a clear-cut picture. The linguistic evidence for any Australian- PNG association is tenuous at best (FOLEY 1986). Analyses of blood group, serum protein, and redcell enzyme variation do tend to associate Australia and PNG (KEATS1977; KIRK 1979; NEI and ROYCHOUD- HURY 1982; KAMBOHand KIRK 1983a, b, 1984;KAM- BOH and KIRWOOD 1984; KAMBOH, RANFORDand KIRK 1984; NEI 1985; CAVALLI-SFORZAet al., 1988; CHEN et al. 1990). However, the geneticdistance separating Australian and PNG populations tends to bequite large, comparable to the average genetic distance separating Asian and Caucasian populations (e.g., NEI 1985; CAVALLI-SFORZAet al. 1988). In par- ticular, CAVALLI-SFORZAet al. (1988)found that a bootstrap analysis supported the association of Aus-

0000 tralia 000and PNG only about 50% of the time. Further- 000 0000 more, HLA haplotypes show virtually no association 000 lI Inn,,0000 ,?? ??W ? ???? between Australia and PNG (SERJEANTSON,RYAN and 000000 0 00 I A 00 000 0 000 -Nlilnl mem a3 hmo THOMPSON1982; CRANE et al. 1985; SERJEANTSON In 1985). NEI (1985) suggests that the large genetic distance Geographic distance(km) between Australia and PNG is caused by inbreeding FIGURE8,”Distribution of geographic distances between pairs anddrift due to isolation andreduced population of New Guineans with identical rntDNA types (solid bars) and sizes, and that HLA has been subject to selection, between all pairs of New Guineans (hollow bars). thereby distorting ancestral relationships. While drift above, N,m, is 1.11 for mtDNA and 2.25 for nuclear and inbreeding(and selection on HLA)have undoubt- DNA. Similarly, an estimate of the effective popula- edly played arole, the mtDNA results pointto a tion size of human females is about 6,000 (WILSONet different explanation: additional migrations (primar- al. 1985), which is about half the estimate of 10,000 ily through PNG) in the 8,000 yr since the physical from nuclear DNA for males and females (NEI and separation of Australia and New Guinea have de- GRAUR 1984). It shouldbe emphasized that these are creased the genetic association between Australia and only rough estimates, since the assumptions used to PNG. derive them (e.g., equilibrium) are no doubt violated It is known from linguistic and archaeological evi- by human populations. Furthermore, differences in dencethat major population movements occurred the methods used to sample genes and calculate Gsr through both the highland andcoastal regions of PNG values can have a large influence on the results, as overthe past 10,000 yr (WURMet al. 1975; KIRK when our results are compared tothose of WHITTAM 1980, 1982a, b; WURM 1982, 1983). The last major et al. (1986).A high-resolution sampling of both migration of NAN speakers through the highlands mtDNA and nuclear DNA genes from thesame world- occurred about 5,000 yr ago, while there were two wide populations, with G.qT values calculated by similar major migrations of AN speakers through the coastal methods, is needed to verify the above findings. regions 5,000 yr ago and 3,500yr ago (SERJEANTSON, Lack of association between Australian andPNG KIRK and BOOTH1983; WURM 1983; FOLEY1986). mtDNA lineages: The phylogenetic analysis of PNG These migrations may have considerably altered the mtDNA lineages points to a Southeast Asian origin genetic structure of PNG populations, accounting for and does not support aparticularly close phylogenetic the lack of a close genetic relationship between Aus- relationship between PNG and Australian mtDNA tralia and PNG mtDNA lineages. lineages. This lack of a strong Australia-PNG associ- Indeed, a 9-bp deletionin a small noncoding region ation is surprising, since at the time of the first colo- of human mtDNA may provide a markerfor the most nization of thearea by humans,approximately recent population movement alongthe coast of PNG. 40,000-50,000 yr ago (WHITEand O’CONNELL1982; Phylogenetic analysis indicates that this deletion arose GROUBEet al. 1986), Australia and New Guinea were only once and it appears to occur only in populations a single land mass, Sahul, and were separated by rising of Asian origin (WRISCHNIKet al. 1987; STONEKING sea levels only 8,000 yr ago (WHITEand O’CONNELL and WILSON 1989). The deletion occurs at high fre- 1982). This presumably would not be sufficient time quency in coastal PNG but is absent from highland to erase the association between Australian and PNG PNG andfrom Australia (HERTZBERCet al. 1989; populations. STONEKINCand WILSON1989). It also occurs in In- Geographic Variation in Human mtDNA 729 donesia (M. STONEKING,A. SOFROand A. C. WILSON, occur between highland and coastal populations, with unpublished results) and is nearly fixed in Polynesia F,yT values (similar to G.$T values) ranging from 0.02 to (HERTZBERGet al. 1989). Thus, thedistribution of the 0.04 (summarized in JORDE 1980). deletion mirrors the presumed path of the last AN A caveat tothe overall agreement between the migrationfrom Southeast Asia through Indonesia, mtDNA results and the above studies of variation in along the coast ofNew Guinea, and then through the products of nuclear genes concerns differencesin Melanesia into Polynesia (BELLWOOD1987). The lack methodology. A more direct comparison is provided of the mtDNA deletion in both highland PNG and by the study of restriction site polymorphism in nu- Australia indicates a stronger genetic association be- clear DNA by HILL (1986), who provided allele fre- tween these two regions, as suggested previously quencies at five unlinked loci for five highland and (KIRK 1980, 1982a, b). Studies of mtDNA variation three coastal populations. From his data we calculate in any highlandPNG populations that are derived GsT values of 0.03 for the highland populations and from the primary colonization of Sahul and isolated 0.01 for the coastal populations. Thus, as with the from later migrationsmight reveal even closer genetic mtDNA results, highland populations aremore associations between Australia and PNG. strongly differentiated from one another thancoastal Geographic structuring of coastal and highland populations are differentiated from one another.Fur- populations: The analysis of mtDNA variation within thermore, PNG populations show more differentia- PNG populations showed that highland and coastal tion with respect to mtDNA than nuclear DNA, as populations are strongly differentiated with respect to was the case previously with GsT analyses based on the mtDNA.Highland populations have significantly global survey of human populations. more mtDNA variability than coastal populations and The genetic distance analyses of the PNG popula- each highland mtDNA lineage is represented on av- tions (summarized in the branching diagrams in Fig- erage by nearly twice as many mtDNA types as each ure 5) were not completely concordant, contrary to coastal mtDNA lineage (Table 4). Theseresults prob- the results commonly obtained when various distance ably reflect the older colonization of the highlands measures are applied tothe same data set UORDE and more frequentrecolonization of the coast, for the 1985). The discrepancy basically concerns the place- following reason: assuming no additional migration ment of the NC population: the TAKAHATA-PALUMBI following colonization, mtDNA lineages will be lost distances place this population with theother two through time due tostochastic processes (AVISE,NEI- coastal populations (SC and IS), whereas the NEI- GEL and ARNOLD1984; STONEKINGand WILSON TAJIMAand HARPENDING~ENKINSdistances, and the 1989), whilenew mtDNA types will be gained via maximum-likelihood analysis, place the NC popula- mutation.Thus, the number of mtDNA clans will tion with the two highland populations (EH and SH). decrease through timewhile the ratioof mtDNA types One possible explanationconcerns the assumptions to clans will increase through time. Additional migra- underlying each of the methods. The DNT, DHJ and tions through the coastal region (discussed above) will maximum-likelihood analyses all basicallyassume that increase the numberof clans but not the ratioof types populations are evolving independently by mutation to clans. and drift; only the DTPanalysis explicitly assumes that Highland populations are sufficiently isolated that migration is occurring as well. The assumptions of the even the Eastern and Southern Highland populations, latter method would thus appear to be more realistic located about 200 km apart, are significantly differ- for the present situation. entiated with respect to mtDNA. By contrast, coastal Unusualrelationship between genetic and geo- populations located over a rangein excess of1000 km graphicdistance: The analysisof genetic distance are not significantly differentiated with respect to with respect togeographic distance between PNG mtDNA, presumably because there has notbeen populations showed an unusual quadratic relationship enoughtime and/or because migration between with populations located either close together or far coastal populations is sufficient to prohibit such dif- apart being more similar genetically than populations ferentiation. located at intermediate distances (Figure 6). It is im- These results are in excellent agreement with stud- portantto keep in mind that this peculiar pattern ies of blood group, serum protein, red cell enzyme, might simply be due to chance, since the number of andHLA variation in PNG populations. It is well- populations compared is relatively small.Also, the known that PNG populations tend to exhibit large Euclidean measure of geographic distance probably between-populationdifferences (GILES, OGANand does not accurately reflect the actual travel distances STEINBERG 1965; SIMMONSet aZ. 1972; BOOTH1974; between PNG populations. However,the pattern WIESENFELDand GADJUSEK1976; KEATS 1977; KIRK shown in Figure 6 is consistent with the genetic dis- 1980, 1982a, b; MOURANTet al. 1982; WOODet al. tance analyses in that the two distinctly coastal popu- 1982; SERJEANTSON,RYAN andTHOMPSON 1982; SER- lations (IS and SC) are among themost closely related JEANTSON, KIRKand BOOTH1983; CRANE et al. 1985; populations,despite being located several hundred SERJEANTSON1985). The most extremedifferences kilometers apart. It appears that this reflects the pat- 730 M. Stoneking et al. tern of migrations to and across PNG. The earlier We thank N. DAS, K. EDWARDSand R. L. KIRK for assistance colonization of the highlands permitted more differ- with the collection of tissues, S. KURTZ fortechnical assistance, K. entiation between highland populations. Later, sepa- HESTIRfor statistical advice, J. C. STEPHENSand D. SWOFFORDfor computational assistance, and M. P. ALPERS,R. L. CANN,W. R. rate migrations tothe coastal regionsresulted in ENGELS, U. GYLLENSTEN,H. OCHMAN, E. M. PRAGER,A. ROGERS, greater differentiation between highland and coastal V. M. SARICH, G.THOMSON and L. VIGILANTfor useful discussion. populations but little or no differentiation between This research received support from National Science Foundation various coastal populations. and National Institutes of Healthgrants to A.C.W., and from In addition to geography, we also investigated the National Institutes of Health grant GM-39245 andNational Science Foundation grant BNS-8720330 to L.B.J. M.S. was supported by a contribution of linguistic differences to mtDNA dif- National Institutes of Health postdoctoral fellowship. ferentiation between PNG populations, andfound thatthe apparently significant differences between NAN mtDNAs and AN mtDNAs disappeared when LITERATURE CITED geography was considered. Previous studies of nuclear ALLEN,B., 1983Human geography of Papua New Guinea. J. DNA markers originally claimed significant allele fre- Hum. Evol. 12: 3-23. quency distributions between NAN and AN groups ANDERSON,S., A. T. BANKIER,B. G. BARRELL,M. H. L. DE BRUIJN, (GILES,OGAN and STEINBERG1965; BOOTH 1974; A. R. COULSON,J. DROUIN, I.C. EPERON,D. P. NIERLICH,B. A. ROE,F. SANGER,P. H. SCHREIER,A. J. H. SMITH,R. STADEN SCHANFIELD,GILES and GERSHOWITZ1975); more andI. G.YOUNG, 1981Sequence and organization of the thorough studies involving more loci and taking into human mitochondrial genome. Nature 290 457-465. account geography have failed to find any consistent AVISE,J. C., 1986 Mitochondrial DNA and the evolutionary ge- differences between NAN and AN groups (BOYCEet netics of higher animals. Philos. Trans. R. SOC.Lond. B 312: al. 1978; KIRK 1980; SERJEANTSON,KIRK and BOOTH 325-342. AVISE,J. C.,J. E. NEIGELand J. ARNOLD,1984 Demographic 1983). The mtDNA results are thus in excellent agree- influences on mitochondrial DNA lineage survivorship in ani- ment with the nuclear DNA results. mal populations. J. Mol. Evol. 20: 99-105. Geography has a profound influence on the distri- BELLWOOD,P., 1987 The Polynesians, Prehistory ojan Island People. bution of mtDNA types in PNG. The probability of Thames and Hudson, London. BHATIA,K., M. GOROGOand G. KOKI, 1984 HLA-A,B,C and DR allelic identity based on mtDNA types was two orders antigens in Asaro speakers of Papua New Guinea. Hum. Im- of magnitude larger within PNG populations than it munol. 9: 189-200. was between PNG populations. Furthermore, the av- BHATIA,K., N. M. BLAKE,S. W. SERJEANTSONand R. L. KIRK, erage geographicdistance separating PNGindividuals 198 1 Frequency of private electrophoretic variants and indi- with identical mtDNA types was significantly less than rect estimates of the mutation ratein Papua New Guinea. Am. J. Hum. Genet. 33: 112-122. the average distance between any two individuals in BHATIA,K., M. PRASAD,G. BARNISHand G. KOKI, 1988 Antigen this study. As shown in Figure 8, 44% of the pairwise and haplotype frequencies at three human leucocyte antigen comparisons of geographic distance were less than loci (HLA-A, -B and -C) in the Pawaia of Papua New Guinea. 100 km for individuals with identical mtDNA types, Am. J. Phys. Anthropol. 75: 329-340. versus just 17% for the entiresample of individuals. BHATIA,K., C. JENKINS, M. PRASAD,G. KOKI and J. LOMBANGE, 1989 Immunogenetic studies of two recently contacted pop- Previously, the only observation that could be made ulations from Papua New Guinea. Hum. Biol. 61: 45-64. concerning theeffect of geography onthe distribution BIRKY,C. W., T. MARUYAMA andP. FUERST,1983 An approach of mtDNA types was that individuals from different to population andevolutionary genetic theory for genes in continentsnever shared identical mtDNA types mitochondria andchloroplasts, and someresults. Genetics 103: (CANN,STONEKING and WILSON1987; STONEKING 5 13-527. BOOTH, P. B., 1974Genetic distancesbetween certain New and WILSON1989). Even this relatively simple obser- Guinea populations studied under the International Biological vation required high resolution restriction maps that Programme. Philos. Trans. R. SOC.Lond. B 268: 257-267. allowedclosely relatedmtDNA types to be distin- BOYCE,A. J., G. A. HARRISON,C. M. PLATT,R. W. HORNABROOK, guished; the previously employed low resolution tech- S. W. SERJEANTSON,R. L. KIRKand P. B. BOOTH, niques had found mtDNAtypes to be shared between 1978 Migration and genetic diversity in an island population: Karkar, Papua New Guinea. Proc. R. SOC.Lond. B 202: 269- continents (e.g., JOHNSON et al. 1983). The limited 295. geographic sampling of theabove studies also re- BROWN,W. M., 1980 Polymorphism in mitochondrial DNA of stricted our understanding of the influence of geog- humans as revealed by restriction endonuclease analysis. Proc. raphyon mtDNA variation within continents. Our Natl. Acad. Sci. USA 77: 3605-3609. BROWN,W. M., M. GEORGEand A. C. WILSON, 1979 Rapid extensive geographic sampling of 119 individuals evolution of animal mitochondrial DNA. Proc. Natl. Acad. Sci. from PNG has enhanced our appreciation of the in- USA 76: 1967-1971. fluence of geography. As future studies obtain even BROWN,W. M., E. M. PRAGER,A. WANGand A. C. WILSON, morethorough geographic sampling, using higher 1982 Mitochondrial DNA sequences of primates: tempo and resolution techniques to examine mtDNA variation mode of evolution. J.Mol. Evol. 18: 225-239. CANN,R. L., W. M. BROWNand A. C. WILSON,1984 Polymorphic (e.g., VIGILANTet al. 1989), we expect to attaina sites and the mechanism of evolution in human mitochondrial deeperunderstanding of the relationship between DNA. Genetics 106: 479-499. geography and patterns of mtDNA variability. CANN, R. L., M. STONEKINGand A. C. WILSON, 1987 Geographic Variation in Human mtDNA 73 1

Mitochondrial DNA and . Nature 325: 31- ulations: evidence for the existence of a new subtype Tf(C6). 36. Hum. Hered. 33: 237-243. CAVALLI-SFORZA,L. L., A. PIAZZA, P.MENOZZI and J. MOUNTAIN, KAMBOH,M. I., and R. L. KIRK, 1984 Genetic studies of PGMl 1988 Reconstruction of human evolution: bringing together subtypes: population data from the Asian-Pacific area. Ann. genetic, archaeological, and linguistic data. Proc. Natl. Acad. Hum. Biol. 11: 211-219. Sci. USA 85: 6002-6006. KAMBOH,M. I., and C. KIRWOOD,1984 Genetic polymorphism of CHEN,L. Z., S. EASTEAL, P. G.BOARD, K. M. SUMMERS,K. BHATIA the thyroxin-binding globulin (TBG) in the Pacific area. Am. and R. L. KIRK, 1990 Albumin-Vitamin Dbinding protein J. Hum. Genet. 36: 646-654. haplotypes in Asian-Pacific populations. Hum. Genet.(in press). KAMROH,M. I., P. R. RANFoRDand R. L. KIRK, 1984 Population CRANE,G., K. BHATIA,M. HONEYMAN,T. DORAN, N.MESSEL, G. genetics of the vitamin D binding protein (GC) subtypes in the HAKOS,D. TARLINTON,D. B. AMOSand H. BASHIR,1985 Asian-Pacific area: description of new alleles at the GC locus. HLA studies of Highland and Coastal New Guineans. Hum. Hum. Genet. 67: 378-384. Immunol. 12: 247-260. KEATS,B., 1977 Genetic structure of the indigenous populations CROW,J. F., and K. AOKI, 1984 Group selection for a polygenic in Australia and New Guinea. J. Hum. Evol. 6: 3 19-339. behavioral trait: estimating the degree of population subdivi- KIRK, R. L., 1979 Geneticdifferentiation in Australia andthe sion. Proc. Natl. Acad. Sci. USA 81: 6073-6077. Western Pacific and its bearingon the originof the first ENCELS,W. R., 1981 Estimating geneticdivergence and genetic Americans, pp. 21 1-237 in The First Americans: Origins, Afin- variability with restrictionendonucleases. Proc. Natl. Acad. ities,and Adaptations, edited by W. S. LAUGHLIN.Gustav Sci. USA 78: 6329-6333. Fischer, New York. FELSENSTEIN,J., 1987 PHYLIP 3.1 Manual. University of Califor- KIRK,R. L., 1980 Language, genes and people in the Pacific, pp. nia Herbarium, Berkeley. 1 13- 137 in Population Structure and Genetic Disorders, edited FOLEY,W. A,, 1986 The Papuan Languages of NewGuinea. Cam- by A. N.ERIKSSON, H. FORSIUS,H. R. NEVANLINNA,P. L. bridge University Press, Cambridge. WORKMAN andP. K. NORIO. Academic Press, New York. FROEHLICH,J. W.,and E. GILES, 1981 A multivariate approach KIRK,R. L., 1982a Linguistic, ecological, and genetic differentia- to fingerprint variation in Papua New Guinea: perspectives on tion in New Guinea and the Western Pacific, pp. 229-253 in the evolutionary stability of dermatoglyphic markers. Am. J. Current Developments in Anthropological Genetics, Val. 2: Ecology Phys. Anthropol. 54: 93-106. andPopulation Structure, edited by J. H. MIELKEand M. H. GILES,E., E. OGAN andA. G. STEINBERG,1965 Gamma-globulin CRAWFORD.Plenum Press, New York. fictorc (Gm and Inv) in New Guinea: anthropological signifi- KIRK,R. L., 1982b Microevolution and migration in the Pacific, cance. Science 150: 1158-1 160. pp. 2 15-225 in Human Genetics, Part A: The Unfolding Genome, GILES,R. E., H.BLANC, H. M. CANNand D. C.WALLACE, edited byB. BONN~-TAMIR,T. COHEN and R. M. GOODMAN. 1980 Maternalinheritance of humanmitochondrial DNA. Alan R. Liss, New York. Proc. Natl. Acad. Sci. USA 77: 6715-6719. MANTEL,N. A., 1967 The detection of disease clustering and a GROURE,I.., J. CHAPPELL, MUKEJ. and D. PRICE,1986 A 40,000 generalized regression approach. Cancer Res. 27: 209-220. year-old human occupation site at HuonPeninsula, Papua New MOURANT,A. E., D. TILLS,A. C. KOPEC, A. WARLOW,P. TEESDALE, Guinea. Nature 324: 453-455. P. B. BOOTHand R. W. HORNABROOK,1982 Red cell antigen, HARPENDING, H., andT. JENKINS, 1973 Genetic distance among serum protein and red cell enzyme polymorphisms in Eastern southern African populations,pp. 177-199 in Methodsand Highlanders of New Guinea. Hum. Hered. 32: 374-384. Theories of Anthropological Genetics, edited by M. H. CRAWFORD NEI, M., 1973 Analysis of gene diversity in subdivided popula- and P. L. WORKMAN.University of New Mexico Press, Albu- tions. Proc. Natl. Acad. Sci. USA 70: 3321-3323. querque. NEI, M., 1985 Human evolution at the ~nolecular level, pp. 41- HERTZRERC,M., K.N. P. MICKLESON,S. W. SERJEANTSON,J. F. 64 in and Molecular Evolution, edited by T. PRIORand R.J. TRENT,1989 An Asian-specific 9-bp deletion OHTA andK. AOKI.Springer-Verlag, Berlin. of mitochondrial DNA is frequently found in Polynesians. Am. NEI, M.,1987 MolecularEvolutionary Genetics. ColumbiaUniver- J. Hum. Genet. 44: 504-510. sity Press, New York. HILL,A. V. S., 1986 The population genetics of alpha-thalassemia NEI, M., and D. GRAUR,1984 Extent of protein polymorphism and the malaria hypothesis. Cold Spring Harbor Symp. Quant. and the neutral mutation theory. Evol. Biol. 17: 73-1 18. Biol. 51: 489-498. NEI, M., and A. K. ROYCHOUDHURY,1982 Geneticrelationship HOPE,G. S., J. GOLSONand J. ALLEN,1983 Paleoecology and and evolution of human races. Evol. Biol. 14: 1-59. prehistory in New Guinea. J. Hum. Evol. 12: 37-60. NEI, M., and F. TAJIMA,1983 Maximum likelihood estimation of HORAI,S., and E. MATSUNAGA,1986 MitochondrialDNA poly- the number of nucleotide substitutions from restriction sites morphism in Japanese. 11. Analysis with restriction enzymes of data. Genetics 105: 207-21 7. four or five base pair recognition. Hum. Genet. 72: 105-1 17. OHTSUKA, R.,T. KAWABE,T. INAOKA,T. AKIMICHI andT. SUZUKI, JOHNSON, M. J., D. C. WALLACE, S. D. FERRIS,M. C. RATTAZZIand 1985 Inter-and intra-population migrationof theGidra in I.. L. CAVALLI-SFORZA,1983 Radiationof human mitochon- lowland Papua: a population-ecological analysis. Hum. Biol. 57: dria DNA types analyzed by restriction endonuclease cleavage 33-45. patterns. J. Mol. Evol. 19 25.5271. SCHANFIELD,M. S., E. GILESand H. GERSHOWITZ,1975 Genetic JORDE, L. B., 1980 The geneticstructure ofsubdivided human studies in the Markham Valley, northeastern Papa New populations, pp. 133-208 in CurrentDevelopments in Anthro- Guinea: gamma globulin (Gm and Inv), group specific compo- pologicalGenetics, Val, 1: Theory andMethods, edited by J. H. nent (Gc) and ceruloplasmin (Cp) typing. Am.J. Phys. Anthro- MIELKEand M. H. CRAWFORD. PlenumPress, New York. pol. 42: 1-8. JORDE, L. B., 1985 Human genetic distance studies: present status SERJEANTSON,S. W., 1985 Migration and admixture in the Pacific: and future prospects. Annu. Rev. Anthropol. 14: 343-373. insights provided by human leucocyte antigens, pp. 133-145 KAMROH,M. I., and R. L. KIRK,1983a Alpha-I-antitrypsin(PI) in Out of Asia-Peopling the Americas and the Pacijic, edited by types in Asian, Pacific, and aboriginal Australian populations. R. L. KIRK and E. SZATHMARY.Journal of Pacific History, Dis. Markers 1: 33-42. Australian National University, Canberra. KAMROH,M. I., and R. L. KIRK, 1983b Distribution oftransferrin SERJEANTSON,S. W., R. 1.. KIRKand P. B. BOOTH,1983 Linguistic (Tf) subtypes in Asian, Pacific and Australian aboriginal pop- and genetic differentiation in New Guinea. J. Hum. Evol. 12: 732 M. Stoneking et al.

77-92. GYLLENSTEN,K. M. HELM-BYCHOWSKI,R. G. HIGUCHI,S. R. SERJEANTSON,S. W., D. P. RYAN andA. R. THOMPSON, 1982 The PALUMRI,E. M. PRAGER,R. D. SAGEand M. STONEKING, colonization of the Pacific: the story according to human leu- 1985 Mitochondrial DNA and two perspectives on evolution- kocyte antigens. Am. J. Hum. Genet. 34: 904-918. ary genetics. Biol. J. Linn. SOC.26: 375-400. SIMMONS,R. T., J.J. GRAYWN,D. C. GADJUSEK,M. P. ALPERSand WILSON,A. C., M. STONEKING, R.L. CANN,E. M. PRAGER,S. D. R. W. HORNABROOK,1972 Genetic studies in relation to FERRIS, L. A. WRISCHNIKand R. G. HIGUCHI,1987 Kuru. 11. Blood-group genetic patterns in Kuru patients and Mitochondrial clans and the age of our common mother, pp. populations of the Eastern Highlands of New Guinea. Am. J. 158- 164 in Human Genetics, Proceedings of the Seventh Interna- Hum. Genet. 24: 539-571. tional Congress, Berlin 1986, edited by F. VOGELand K. SPER- SINGH,G.,N. NECKELMANN and D. C. WALLACE,1987 LING. Springer-Verlag, Berlin. Conformational mutations in human mitochondrial DNA. Na- WILSON,A. C., E. A. ZIMMER,E. M. PRAGER andT. D. KOCHER, ture 329: 270-272. 1989 Restriction mapping in the molecular systematics of SMOUSE,P. E., J. C.LONG andR. R. SOKAL, 1986 Multiple mammals: a retrospective salute, pp.407-41 9 in The Hierarchy regression and correlation extensions of the Mantel test of of Life, edited by B. FERNHOLM,K. BREMER andH. JORNVALL. matrix correspondence. Syst. Zool. 35: 627-632. Elsevier Science B. V. (Biomedical Division), Amsterdam. STONEKING,M., 1986Human mitochondrial DNAevolution in WOOD,J. W., P. E. SMOUSEand J. C.LONG, 1985 Sex-specific Papua New Guinea.Ph.D. thesis, Universityof California, dispersal patterns in two human populations of highland New Berkeley. Guinea. Am. Nat. 125: 747-768. STONEKING,M., K. BHATIA and WILSON,C.A. 1986a WOOD,J. W., P. L. JOHNSON,R. L. KIRK, K. MCLOUGHLIN,N. M. Mitochondrial DNA variation in Eastern Highlanders of Papua BLAKE andF. A. MATHESON, 1982 The genetic demography New Guinea, pp. 87-100 in Genetic Variation and Its Mainte- of the Gainj of Papua New Guinea. I. Local differentiation of nance, edited by D. F. ROBERTSand G. F. DE STEFANO. Cam- blood group, red cell enzyme, and serum protein allele fre- bridge University Press, Cambridge. quencies. Am. J. Phys. Anthropol. 57: 15-25. STONEKING,M., K. BHATIAand A. C. WILSON,1986b Rate of WRISCHNIK,L. A,, R. G. HIGUCHI,M. STONEKING,H. A. ERLICH, sequence divergence estimated from restriction maps of mito- N. ARNHEIMand A. C. WILSON,1987 Length mutations in chondrial DNAs from Papua New Guinea. Cold Spring Harbor human mitochondrial DNA: direct sequencingof enzymatically Symp. Quant. Biol. 51: 433-439. amplified DNA. Nucleic Acids Res. 15: 529-542. STONEKING,M., and R. L. CANN, 1989 African origin of human WURM,S. A., 1982 Papuan Languages of Oceania. Gunter Narr, mitochondrial DNA, pp. 17-30 in The Human Revolution, Be- Tubingen. havioural and BiologacalPerspectives on the Origins of Modern WURM,S. A., 1983 Linguistic prehistory in the New Guinea area. Humans, edited by P. MELLARSand C. STRINGER. Edinburgh J. Hum. Evol. 12: 25-35. University Press, Edinburgh. WURM,S. A., D. C. LAYCOCK,C. L. VooRHoEvEand T. E. DUTTON, STONEKING,M., and A. C. WILSON, 1989 Mitochondrial DNA, 1975 Papuan linguistic history and past language migrations pp. 215-245 in The Colonization of the Pacijic, A Genetic Trail, in the New Guinea area,pp. 935-960 in New Guinea Area edited by A. V. S. HILLand S. W. SERJEANTSON.Oxford Languages and Language Study, Vol. I: Papuan Languages and University Press, Oxford. the New Guinea Linguistic Scene, edited by S. A. WURM.Pacific SWOFFORD,D. L.,1985 Phylogenetic Analysis Using Parsimony Linguistics, Australian National University, Canberra. (PAUP), Version 2.4. Illinois Natural History Survey,Cham- Communicating editor: A. G. CLARK paign. TAKAHATA,N., and M. NEI, 1985 Gene genealogy and variance APPENDIX of interpopulational nucleotidedifferences. Genetics 110 325- 344. A list of the state of each polymorphic restriction site in each TAKAHATA,N., and S. R. PALUMBI, 1985 Extranuclear differen- mtDNA type is given in Figure 9. mtDNA types are numbered tiation and gene flow in the finite island model. Genetics 109: according to the tree in Figure 3. A “1” indicates that the site 441-457. is present and a ”0” indicates that the site is absent, except in VIGILANT,L., M. STONEKINGandA. C. WILSON, 1988 the case of Region V (the small noncoding region between the Conformational mutationin human mtDNA detectedby direct COII and lysine tRNA genes), where “1” indicates one copy of sequencingof enzymatically amplified DNA. Nucleic Acids the 9-bp repeat and“2” indicates two copies of the repeat. The Res. 16: 5945-5955. nomenclature for designatingsites follows that of CANN, VIGILANT,L., R.PENNINGTON, H. HARPENDING, T. D. KOCHER STONEKINGand WILSON(1987): sites are numbered according and A. C. WILSON, 1989Mitochondrial DNAsequences in to the published sequence (ANDERSONet al. 1981), with itali- single hairs from a southern African population. Proc. Natl. cized numbers indicating the gain of a site absent from the Acad. Sci. USA 86: 9350-9354. published sequence and nonitalicized numbers indicating the WHITTAM,T. S., A. G. CLARK,M. STONEKING,R. L. CANN andA. loss of a site present in the published sequence.The 12 restric- C.WILSON, 1986 Allelic variation in humanmitochondrial tion enzymes are designatedby the following single-letter code: genes based on patterns of restriction site polymorphism. Proc. AM, a; AvaII, b; DdeI, c; FnuDII, d; HaeIII, e; HhaI, f; Hinfl, Natl. Acad. Sci. USA 83: 9611-9615. g; HincII, h; HpaII, i; MboI, j; RsaI, k; TaqI, I. Italicized sites WHITE,J. P., andJ. F. O’CONNELL,1982 A Prehistory of Australia, separatedby commas indicate alternative placements of in- New Guinea, and Sahul. Academic Press, New York. ferred site gains. Sites separatedby a slash indicate simultaneous WIESENFELD,S. L., and D. C. GADJUSEK,1976 Genetic structure gain of a site for one enzyme and loss of a site for a different and hetero7ygosity in the Kuru region, Eastern Highlands of enzyme, due to a single inferred nucleotide substitution; these New Guinea. Am. J. Phys. Anthropol. 45: 177-190. sites are counted once in the analyses as the gain of the new WILSON,A. C.,R. L. CANN,S. M. CARR,M. GEORGE,U. B. restriction site. Geographic Variation in Human mtDNA 733

11111111111111111111111111 12222223333333333444444666688999999999900000000001111111122222355 t VDB 145678eQll1456LBPP1W657898941l3456789P123456789PU7456901279002 8e 00000000000000000000000000000000000000000000000000000000010000000 64,16494 i 00000000000000001000000000000000000000000000000000000011000000000 163 k 00000000000000000000000000010000000000000000000000000000000000000 207 h 00000000000000000000000000000001111111011111111111111000000000000 267 1 00000000000000000011000000000000000000000000000000000000000000000 625, 626 e 00000000000000000000000000000000000001000000000000000000000000000 748 b 00000000000000000000000100000000000000000000000000000000000000000 1043 c 00000000000000000000000000100000000000000000000000000000000000000 1403,1448 a 11111111111111111111111111100000000000000000000000000000000000000 1715 c 11111111111111111001111111111000011111111111111111111111111111111 1893 a 11111111111111111111111111111111111111111111111111111111101111111 1939 f 00000000001000000000000000000000000000000000000000000000000000000 2390 j 00000000000000000000000110000000000000000000000000000000000000000 3337 k 11111111111111111111111111111111111111111111111101111111111111111 3391 e 00000000000001000000000000000000000000000000000000000000000000000 3624,3833,9253 e 00000000000000000000011000000000000000000000000000000000000000000 3698 f 11111111111111111111111111111111111111110001011111111111111111111 3714,3744 e 00000000100000000000000000000000000000000000000000000000000000000 3944 1 10111111111111111111111111111111111111111111111111111111111111111 4464 k 11111111111111111111111111111111111011111111111111111111111111111 4810 g 11111111111111111111111111111101111111111111111111111111111111111 5125 1 00000100000000000000000000000000000000000000000000000000000000000 5176 a 11111111111111111111110111111111111111111111111111111111111111111 5351 f 00000000000000000000001000010000000000000000000000000000000000000

5538 f 00000000000000000000000000000000000000001000000000000000000000000~~~~~~ 5971 f 11111111111111111111111111101111111111111111111111111111111111111 5985 k/5983 g 00000000000000000000000000000000000000000000011010000000000000000 6260 e 11011111111111111111111111111111111111111111111111111111111111110 6610 g 010000000000001000000000000000000000000000000000000000000000000c0 6904 j 11111111111111111111111111111111111111111111111111111111111110011 6915 k 00000000000000000000000100000000000000000000000000000000000000000 7025 a 11011111111111111111111111111111111111111111111111111111111111111 7103 c 11111111111111111111111111111111011111111111111111111111111111111 7325 e 00000001000000000000000000000000000000000000000000000000000000000 7598 f 11111111111111111111111100011111111111111111111111111111111111111 8150 i 11111111101111111111111111111111111111111111111111111111111111111 8249,8250 b/8250 e 00000000000000000000000000000000000000000010000000000000000000000

R466"" a- 00000000000000000000000010000000000000000000000000000000000000000...... ~~~~~~~~~~ ~~ ~ ~ ~ ~ ~ ~ ~~~.~..~~~~~ ~ ~ ~ ~ ~ ~ ~~ ~~ ~ ~~ ~ ~~~ ~ ~ 8515 c 11111111111111111011111111111111111111111111111111111111111111111 8568,8569 c/8572 e 00000000000000000000000000000000000000000000000000000111000000000 8572 e 11111101111111111111111111111111111111111111111111111111111111111 8729 j 11111111111111111111111101111111111111111111111111111111111111111 8838 e 11111111111111111111111111111111111110111111111111111111111111111 8852,8854,8856,8858 f 11111111111111111111111111111111111111111111111111111000111111111 9053 11111111111111111111111111111111111111111111111111111111111111100 91 81 00000000000000000011000000000000000000000000000000000000000000000 9266 11111111111111111111111011111111111111111111111111111111111111111 9342 11111111111111111011111111111111111111111111111111111111111111111 9429 00000000000000000000000000000000000000000000000000011000000000000 9553 11111111111111111111111111111111111111111111111011111111111111101 10252 1 00000000000000000000000000000001000000000000000000000000000000000 10394 c 11111111111111111111111111100010000000000000000000000000000000001 10830 g 11111111111111111111111110011111111111111111111111111111111111111 11390,11633 e 00000000000000000000000000000010000000000000000000000000000000000 11806 a 00000000000000000000000000001000000000000000000000000000000000000 12282 a 11111111111111111111111111111111111111111111111111111111111011111 12345,12350,12528 k 00000000000000000000000000000000011111111111111111111000000000000

13018 e OOOOOOOOOOOOOOOOOOOOOOOOOOOlOOOOOOOOOOOOOOOOC ~~~~~...... 13065 c 01111111111111111111111111111111111111111111111111111111111111 13068 a 00000000000000000000000000000010000000010000000000000000000000000 13096 k 00000000000000001000000000000000000000000000000000000000000000000 13268 g 11011111111111111111111111110111111110111111111111111111111111111 13467 c 00000000000010000000000000000010000000000000000000000000000000000 14648,15765 h 00000000000000000000000000000000001000000000000000000000000000000 14678,14680,14695 d 00000000000000000000000000000010000000000000000000000000000000000 14899 e 00000000000010000000000000000000000000000000000000000000000000000 15047 e 11111111111111111111111111111111011111111111111111111111111111111 15060 j 11111111111111111111111111101111111111111111111111111111111111111 15606 a 00000000000000000000000000011111111111111111111111111111000000000 15882 b/15883 e 00000000000000100000000000000000000000000000000000000000000000000 15897 k 00000000000000000000000000000000000100000000000000000000000000000 16049 k 11111111111111111111111101111111111111111111111111111111111111111 16089 k 00000000000000000000000000000000000000000000000000000000001000000 16096 k 11111111111111111111111101111111111111111111111111111111111111111 16178 1 00001111111111111111000000000000000000000000000000000000000000000 16208 k 11111111111110111111000111111111111111111111111110011111111111111 16217 1 00000000000000000000000000000000000010000000000000000000000000001

16254 a 00000000000000000000000000000000000000100010000000000000000000000~~~~~~~~ ~~~~~~~...... 16303 k 11111111111111111111111111111111111111111100011111011111111111101 16310 k 01000000000000000000001111111111111111111011110001110110111111100 16389 g/16390 b 00000000000000000000000011100000000000000000000000000000000000000 16517 e 10000000000000010000000101111100000000000000000000000111111101111 Region V 22222222222222222222222222222222222222222222222222222222111111122 FIGURE9.-Polymorphic restriction sites,