
Proc. Nat. Acad. Sci. USA
Vol. 72, No. 7, pp. 2758-2760, July 1975

Interlocus variation of [...] and the neutral theory
(protein [...]/computer simulation)

[...] AND YOSHIO TATENO
Center for Demographic and [...], University of Texas at Houston, Texas 77025

Communicated by [...], April 25, 1975

Genetics: Nei and Tateno, Proc. Nat. Acad. Sci. USA 72 (1975)

ABSTRACT Theoretical distributions of genetic distance between reproductively isolated taxa are derived by means of computer simulation, taking into account mutation and random genetic drift. The distributions obtained are in good agreement with the observed distributions of interracial and interspecific genetic distances for enzyme loci in Drosophila. This indicates that gene substitution at enzyme loci can be explained by the neutral mutation theory.

It has recently been noted that the genetic distance between two taxa varies greatly among loci (1-3). Particularly noteworthy is the extensive study by Ayala and his associates (1, 2), who showed, by using enzyme loci in five species of the Drosophila willistoni group, that the distribution of normalized identity of genes or genetic identity (4) between races is inverse L-shaped, whereas the genetic identity between species shows a U-shaped distribution. Namely, the [...] at a [...] in two different species were virtually identical with each other or entirely different at a majority of loci. They regarded this as an indication that at some loci gene substitution proceeds quickly by the aid of directional selection, whereas at other loci the similarity of genes is maintained by balancing selection. Ayala and Gilpin (5) computed the theoretical distribution of Rogers' (6) genetic distance under the assumption of no selection and no mutation. The distribution obtained was drastically different from the observed ones in Drosophila. They then took this as evidence against the neutral mutation hypothesis of [...] and polymorphism (7, 8).

Ayala and Gilpin's computation is, however, based on two unrealistic assumptions. First, they assumed that the initial gene frequencies at the time of population splitting are the same for all loci. This assumption is clearly incorrect, since in any natural population gene frequencies vary greatly with locus, some loci being completely monomorphic and some other loci being highly polymorphic. Second, they assumed no mutation. While this assumption is acceptable for a short-term change of population, the effect of mutation cannot be neglected in long-term evolution such as speciation. The fact that most Drosophila species show the same degree of average heterozygosity for protein loci suggests that natural populations are at or near equilibrium with respect to the effects of mutation, genetic drift, and selection, if any.

The purpose of this paper is to compute the distribution of genetic distance and genetic identity under the hypothesis of neutral mutation, removing the above two unrealistic assumptions, and to see whether it agrees with the actual observations or not. Since it is difficult to derive the distribution of genetic distance analytically, we have used a Monte Carlo simulation.

MODELS AND PARAMETERS

Our basic model of genetic differentiation of populations is that of Nei (4) and Latter (9). A foundation stock is split into two populations X and Y and thereafter no migration occurs between the two populations. The differentiation of the two populations occurs due to mutation and random genetic drift. All new mutations are assumed to be unique and thus different from the preexisting ones in the populations. The effective population size (N) is assumed to be the same for both X and Y as well as for the foundation stock, and in each population the balance between mutation and genetic drift is maintained throughout the evolutionary process. Since we are interested in a test of the neutral mutation hypothesis, it is essential to assume no selection.

Our recent study (10) has indicated that in the above model the variance of genetic distance is determined almost exclusively by M = 4Nv and the time (t) after divergence between two populations, where v is the mutation rate per locus per generation. In the present study, we used M = 0.1, since this is the representative value for many different organisms (11). In the D. willistoni group, M appears to be about 0.2, since the average heterozygosity has been estimated to be 0.177. However, this difference does not appear to affect our conclusion appreciably, as will be discussed later.

The mutation rate for electrophoretically detectable alleles has been estimated to be $10^{-7}$ per locus per year (11, 12). Therefore, the mutation rate per locus per generation for Drosophila species may be about $10^{-8}$. With M = 4Nv about 0.2, this suggests that the effective population size for a Drosophila species is about $5 \times 10^{6}$. This number is certainly much smaller than the current size for many Drosophila species, but it should be noted that if a population goes through bottlenecks occasionally in the evolutionary process, the effective size is drastically reduced (13). In our computer simulation we used N = 50 and v = 0.0005 to save computer time. However, since 4Nv = 0.1, the results obtained should be applicable to many organisms.

The initial gene frequencies at the time of population splitting were derived from an equilibrium population under the effects of mutation and genetic drift and varied from locus to locus. The number of alleles at a locus and gene frequencies in the two descendant populations were recorded in specified generations. The actual procedure of computer simulation was as follows: Since the population size was 50, there were 100 genes at a locus in each of populations X and Y. Each gene was subjected to a mutational event with an equal probability. The number of mutations occurring in a population followed a Poisson distribution with mean 2Nv = 0.05 in each generation. After mutation occurred, 2N genes were sampled at random by using pseudorandom numbers from the parental gene pool to produce the next generation. This process was repeated for 500 (10N) generations for each of populations X and Y. One set of these computations represented the evolutionary change of one locus. To study the distribution of genetic distance among loci, we studied 200 loci, i.e., 200 replications of the above set. The initial gene frequencies at a locus (replication) were those at the last (500th) generation in population X at the previous locus (replication). The results for the first set (replication) of 500 generations were discarded to eliminate the effect of arbitrariness of our initial gene frequencies. We note that in the present case there is virtually no correlation in gene frequencies between generations 0 and 500 (10). Following Ayala and Gilpin, we computed the following Rogers' distance for each locus.

$$D_R = \left[\tfrac{1}{2}\sum_i (x_i - y_i)^2\right]^{1/2}$$

where $x_i$ and $y_i$ are the gene frequencies of the ith allele in populations X and Y, respectively. We also computed the genetic identity (4) defined by

$$I_j = \frac{\sum_i x_i y_i}{\left(\sum_i x_i^2 \sum_i y_i^2\right)^{1/2}}$$

RESULTS AND DISCUSSION

(1) Reliability of Simulation. To see the reliability of our simulation, we checked a number of population parameters whose theoretical expectations are already known. The expected mean and standard deviation of heterozygosity in an equilibrium population are given by $M/(M + 1)$ (14) and $[2M/\{(M + 1)^2(M + 2)(M + 3)\}]^{1/2}$ (15, 10), respectively, whereas the expected number of alleles per locus is $\int_{1/(2N)}^{1} M(1 - x)^{M-1}x^{-1}\,dx$ (16-18). Furthermore, Nei's (4) genetic distance, D, has the expectation 2vt. These theoretical values are given in Table 1 together with the observed values from simulation. It is clear that the agreement between the expected and observed values is quite satisfactory in all parameters in view of the large standard errors expected for these parameters (10).

Table 1. Theoretical and observed values of average heterozygosity (H), standard deviation of heterozygosity ($\sigma_H$), average number of alleles per locus (k), and genetic distance (D) in computer simulation

                                   Generation
Parameter         Theoretical      N       2N      4N      6N      10N
                  value
H                 0.091            0.078   0.083   0.083   0.096   0.083
sigma_H           0.159            0.151   0.154   0.156   0.166   0.149
k                 1.44             1.45    1.44    1.42    1.46    1.50
D (theoretical)                    0.050   0.100   0.200   0.300   0.500
D (observed)                       0.046   0.111   0.209   0.343   0.518

All observed values are based on gene frequency data for 200 loci. The values of H, $\sigma_H$, and k refer to population Y.

(2) Distribution of Genetic Distance. The distributions of $D_R$ for generations 0.5N (t = 25) and N (t = 50) are presented in Fig. 1. The difference between these distributions and Ayala and Gilpin's is striking. Our distributions are L-shaped, whereas Ayala and Gilpin's are bell-shaped. Thus, if we remove the two unrealistic assumptions mentioned earlier, we get entirely different distributions. In our model of population differentiation, the distribution of $D_R$ is initially L-shaped and, as time goes on, it becomes gradually U-shaped, thus agreeing with the observed distributions for enzyme loci. In the present case, however, comparison of the theoretical and observed distributions can be studied more appropriately by using the genetic identity $I_j$.

FIG. 1. Distributions of Rogers' genetic distance in generations 0.5N and N in computer simulation. These distributions are based on gene frequency data for 200 loci. [Figure not reproduced. Panel labels: T = 0.5N, $D_R$ = 0.0535; T = N, $D_R$ = 0.0905. Horizontal axis: $D_R$, 0 to 1.0.]

The high frequency of class $D_R$ < 0.05 in Fig. 1 is due to the fact that the initial population contains a large proportion of monomorphic loci (about 60% in the present case) and the genetic differentiation of these loci occurs only after new mutations are introduced. Nei and Li (19) have shown that these loci stay monomorphic for the same allele for a surprisingly long time in related species.

The distribution of $I_j$ in computer simulation is given in Fig. 2 for generations N, 2N, 4N, and 10N. This distribution is inverse L-shaped in the early generations but gradually becomes U-shaped. These distributions can be compared with the observed distributions of $I_j$ in Drosophila. Comparison of Fig. 2 with Fig. 2 in the Ayala et al. (2) paper indicates that the distribution of $I_j$ between local races of Drosophila is close to that for generations N and 2N in our simulation, though the mean of $I_j$ in Drosophila is slightly larger than that in our simulation. On the other hand, the distribution of $I_j$ for generation 10N is very similar to those between sibling species of Drosophila (Fig. 4 in ref. 2). In this case, the mean of $I_j$ for Drosophila (0.51 to 0.67) is also of the same order of magnitude as that for our simulation (0.60). This suggests that the divergence time for the sibling species of the Drosophila willistoni group is about 10N generations. This can be converted to chronological time by using genetic distance D = 2vt. If the mutation rate is $10^{-7}$ per locus per year, as mentioned earlier, the divergence time of these sibling species is estimated to be about 3 million years (11). This estimate is not unreasonable in view of the species life in evolution in other [...]. At any rate, our simulation study indicates that the distribution of genetic distance and genetic identity among enzyme loci is quite consistent with the neutral mutation theory.

FIG. 2. Distributions of $I_j$ in generations N, 2N, 4N, and 10N in computer simulation. These distributions are based on gene frequency data for 200 loci. [Figure not reproduced. Panel labels: T = N, $I_j$ = 0.9547 ± 0.0095; T = 2N, $I_j$ = 0.8953 ± 0.0163; T = 4N, $I_j$ = 0.8114 ± 0.0245; T = 10N, $I_j$ = 0.5957 ± 0.0331. Horizontal axis: $I_j$, 0 to 1.0.]

As mentioned earlier, the value of M in the D. willistoni group appears to be about 0.2 rather than 0.1. This is expected to increase the frequency of intermediate values of genetic identity (0 < $I_j$ < 1) but only slightly. This can be seen by examining the expected frequency of $I_j$ = 1. When $t \gg 2N(1 + M)$, this frequency is approximately given by

$$P_{IM} = (1 + M)\,q^{M}e^{-2vt},$$

where q = 0.01 in the present case (18). The value of $P_{IM}$ is 0.414 for M = 0.1 and 0.391 for M = 0.2 when t = 4N, whereas it is 0.307 for M = 0.1 and 0.290 for M = 0.2 when t = 10N. This suggests that the shapes of the distributions of $I_j$ for M = 0.1 and M = 0.2 are virtually the same.

Our mathematical model depends on the assumption that new mutations are always different from the preexisting alleles in the population. This assumption may not hold for electrophoretic data, since there is some chance of back mutation with respect to the net charge of a protein. However, this is expected to increase the frequency of class $I_j$ = 1 rather than the frequencies of intermediate values of $I_j$. Therefore, our conclusion will remain unaffected.

This work was supported by U.S. Public Health Research Grant GM 20293.

1. Ayala, F. J. & Tracey, M. L. (1974) Proc. Nat. Acad. Sci. USA 71, 999-1003.
2. Ayala, F. J., Tracey, M. L., Barr, L. G., McDonald, J. F. & Perez-Salas, S. (1974) Genetics 77, 343-384.
3. Nei, M. & Roychoudhury, A. K. (1974) Am. J. Hum. Genet. 26, 421-443.
4. Nei, M. (1972) Am. Nat. 106, 283-292.
5. Ayala, F. J. & Gilpin, M. E. (1974) Proc. Nat. Acad. Sci. USA 71, 4847-4849.
6. Rogers, J. S. (1972) Studies in Genetics VII, Univ. Texas Publ. 7213, 145-153.
7. Kimura, M. (1968) Nature 217, 624-626.
8. King, J. L. & Jukes, T. H. (1969) Science 164, 788-798.
9. Latter, B. D. H. (1972) Genetics 70, 475-490.
10. Li, W.-H. & Nei, M. (1975) Genet. Res., in press.
11. Nei, M. (1975) Molecular Population Genetics and Evolution (North-Holland, Amsterdam).
12. Kimura, M. & Ohta, T. (1971) Nature 229, 467-469.
13. Nei, M., Maruyama, T. & Chakraborty, R. (1975) Evolution 29, 1-10.
14. Kimura, M. & Crow, J. F. (1964) Genetics 49, 725-738.
15. Stewart, F. M. (1975) Theor. Popul. Biol., in press.
16. Kimura, M. (1968) Genet. Res. 11, 247-269.
17. Wright, S. (1948) in Encyclopedia Britannica (Encyclopedia Britannica, Inc., Chicago), Vol. 10, pp. 111-112.
18. Ewens, W. J. (1964) Genetics 50, 891-898.
19. Nei, M. & Li, W.-H. (1975) Genet. Res., in press.
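The simulation procedure described under MODELS AND PARAMETERS can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' program: the function names, the monomorphic starting pool, Knuth's method for the Poisson draw, and the choice to record only the final generation are all assumptions made here. The values N = 50 and v = 0.0005, the Poisson number of mutations with mean 2Nv, the resampling of 2N genes, the chaining of loci through the final state of population X, and the discarded first replication follow the text.

```python
import itertools
import math
import random
from collections import Counter

N = 50            # effective population size used in the paper
V = 0.0005        # mutation rate per locus per generation used in the paper
GENES = 2 * N     # gene copies per locus in each population


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def next_generation(pool, rng, new_allele):
    """Apply mutation, then draw 2N genes at random (with replacement) from the pool."""
    pool = list(pool)
    for _ in range(poisson(rng, GENES * V)):
        pool[rng.randrange(GENES)] = next(new_allele)  # each mutant is a new, unique allele
    return rng.choices(pool, k=GENES)


def frequencies(pool):
    """Map each allele label to its frequency in the pool."""
    return {allele: n / len(pool) for allele, n in Counter(pool).items()}


def rogers_distance(x, y):
    """Rogers' distance for one locus: sqrt(0.5 * sum (x_i - y_i)^2)."""
    alleles = set(x) | set(y)
    return math.sqrt(0.5 * sum((x.get(a, 0.0) - y.get(a, 0.0)) ** 2 for a in alleles))


def nei_identity(x, y):
    """Genetic identity for one locus: sum x_i y_i / sqrt(sum x_i^2 * sum y_i^2)."""
    jxy = sum(f * y.get(a, 0.0) for a, f in x.items())
    jx = sum(f * f for f in x.values())
    jy = sum(f * f for f in y.values())
    return jxy / math.sqrt(jx * jy)


def simulate(n_loci, generations, seed=0):
    """Return (D_R, I_j) per locus after `generations` of divergence between X and Y."""
    rng = random.Random(seed)
    new_allele = itertools.count(1)
    start = [0] * GENES               # arbitrary monomorphic starting pool (an assumption)
    results = []
    for replicate in range(n_loci + 1):
        x, y = list(start), list(start)
        for _ in range(generations):
            x = next_generation(x, rng, new_allele)
            y = next_generation(y, rng, new_allele)
        if replicate > 0:             # the first replicate is discarded as burn-in
            fx, fy = frequencies(x), frequencies(y)
            results.append((rogers_distance(fx, fy), nei_identity(fx, fy)))
        start = x                     # the next locus starts from the final state of X
    return results


# Small demonstration run; the paper's setting would be n_loci=200, generations=500.
results = simulate(n_loci=20, generations=100, seed=1)
mean_identity = sum(i for _, i in results) / len(results)
print(f"mean I_j over {len(results)} loci: {mean_identity:.3f}")
```

Because the burn-in in this sketch lasts only as long as `generations`, short runs do not start from an equilibrium population. Calling `simulate(200, 500)` corresponds more closely to the 10N case in the text, and a histogram of the second element of each result pair would then be the analogue of the last panel of Fig. 2.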
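The theoretical column of Table 1 and the conversion of genetic distance to chronological time can be checked numerically. The sketch below is an illustration only; the function names and the midpoint-rule integration are assumptions made here, while the formulas M/(M + 1), the standard deviation of heterozygosity, the integral for the expected number of alleles, and D = 2vt are those quoted in the text. For the integral, the substitution u = (1 - x)^M is used to remove the endpoint singularity at x = 1.

```python
import math


def mean_heterozygosity(M):
    """Expected heterozygosity at equilibrium: M / (M + 1)."""
    return M / (M + 1)


def sd_heterozygosity(M):
    """Standard deviation of heterozygosity: [2M / ((M+1)^2 (M+2) (M+3))]^(1/2)."""
    return math.sqrt(2 * M / ((M + 1) ** 2 * (M + 2) * (M + 3)))


def expected_alleles(M, N, steps=200_000):
    """Integrate M (1-x)^(M-1) / x over [1/(2N), 1] by the midpoint rule.

    With u = (1-x)^M the integral becomes du / (1 - u^(1/M)) over
    [0, (1 - 1/(2N))^M], which has no singularity at the endpoints.
    """
    upper = (1 - 1 / (2 * N)) ** M
    h = upper / steps
    return sum(h / (1 - ((i + 0.5) * h) ** (1 / M)) for i in range(steps))


def expected_distance(v, t):
    """Expectation of Nei's genetic distance after t generations: D = 2vt."""
    return 2 * v * t


def divergence_time(D, v):
    """Invert D = 2vt; the unit of time matches the unit of the mutation rate v."""
    return D / (2 * v)


M, N, v = 0.1, 50, 0.0005             # parameter values used in the simulation
print(f"H       = {mean_heterozygosity(M):.3f}")
print(f"sigma_H = {sd_heterozygosity(M):.3f}")
print(f"k       = {expected_alleles(M, N):.2f}")
print("D       =", [round(expected_distance(v, g * N), 3) for g in (1, 2, 4, 6, 10)])

# D = 0.518 is the observed simulation value at 10N in Table 1; v is per year here.
print(f"t       = {divergence_time(0.518, 1e-7):.3g} years")
```

With M = 0.1 and N = 50 the first three values come out close to the theoretical column of Table 1. The last line gives roughly 2.6 million years, of the same order as the figure of about 3 million years quoted in the text.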