<<

Copyright 0 1991 by the Genetics Society of America

A Study on a Nearly Neutral Model in Finite Populations

Hidenori Tachida National Institute of Genetics, Mishima, Shizuoka-ken 41 1,Japan Manuscript received August 16, 1990 Accepted for publication January 19, 199 1

ABSTRACT As a nearly neutral mutation model, the house-of-cards model is studied in finite populations using computer simulations. The distribution of the mutant effect is assumed to be normal. The behavior is mainly determined by the product of the population size, N, and the standard deviation, u, of the distribution of the mutant effect. If 4Nu is large compared to one, a few advantageous mutants are quickly fixed in early generations. Then most mutation becomes deleterious and very slow increase of the average follows. It takesvery long for the population to reach the equilibrium state. Substitutions of alleles occur very infrequently in the later stage. If 4Na is the order of one or less, the behavior is qualitatively similar to that of the strict neutral case. Gradual increase of the average selection coefficient occurs and in generations of several times the inverse of the mutation rate the population almost reaches the equilibrium state. Both advantageous and neutral (including slightly deleterious) are fixed. Except in the early stage, an increase of the standard deviation of the distribution of the mutant effect decreases the average heterozygosity. The substitution rate is reduced as 4Nu is increased. Three tests of neutrality, one using the relationship between the average and the variance of heterozygosity, another using the relationship between the average heterozygosity and the average number of substitutions and Watterson’s homozygosity test are applied to the consequences of the present model. It is found that deviation from the neutral expectation becomes apparent only when 4Na is more than two. Also a simple approximation for the model is developed which works well when the mutation rate is very small.

he mechanismof protein has been one severe competition used in the experiment. Another T of the most debated issues in the study of evo- observation is that of AQUADRO,LADO and NOON lution. KIMURA (1968) emphasized the effectof ran- (1988). They estimated DNA sequence variation ofa dom and the neutral theorywhich states region including rosy locus in both Drosophilamela- that the main cause of evolutionary change at the nogaster and Drosophilasimulans. In contrast to the molecular level is random fixation of selectively neu- resultfor protein polymorphism which shows that tral or very nearly neutral mutants rather than posi- both species have almost the same amountof protein tive Darwinian selection is gaining support from most variation(CHOUDHARY and SINCH, 1987),they ob- molecular genetic data rapidly accumulating these ten served that D. simulans has several times more DNA years (see KIMURA 1983, 1987). Although some dis- variation than D. melanogaster has. They hypothesize crepancies between the prediction of the strict neutral that D. simulans has a larger population size and that theory with the assumption of a constant mutation this causes the increase of complete neutral variation rateand observations may exist (GILLESPIE 1987; (DNA variation) in this species while slightly deleteri- TAKAHATA1987), we can say at this point that the ous variation (protein variation) does not increase due effect of random genetic drift is very important in tomore effectiveoperation of selection in larger molecularevolution and should be considered in populations. Finally, GOJOBORI(1 982) observed that models of protein evolution. someof the substrate-specificenzymes involved in On the other hand there are observations which mainpathways or singlepathways have lower vari- indicatethe importance of very weakselection in ancesof heterozygosity than those expected from proteinevolution. DEAN, DYKHIZEN and HARTL complete neutrality. Although statistical significances (1988) have shown that new mutations which caused were not examined and another study which consid- replacements of amino acids in @-galactosidasein Esch- ered statistical significances did not find any discrep- erichia coli have small effects. Effects of many ancy fromthat expected from complete neutrality of them are not detectableby their method, but some (FUERST,CHAKRABORTY andNEI 1977), Gojoborisus- caused detectable increase or decrease of fitness. Al- pects that slightly deleterious mutations (OHTA1973) though the detected selection coefficients are in the have occurred in these enzymes. Thus,is worthwhile it order of one percent, they speculate that the selection to investigate models which incorporate both random coefficients in nature are much smaller because of the genetic drift andweak selection.

Genetics 128: 183-192 (May, 1991) 184 H. Tachida

OHTA and TACHIDA(1990) proposed a model of MODEL protein evolution in which effects of random genetic Consider a random matingpopulation of N diploid drift and very weak selection are incorporated.In this individuals. The standard Wright-Fisher model in nearly neutral mutation model,the distribution ofthe is assumed (see, forexample, effect of mutant allele on selection coefficient is fixed. More precisely, the fitness effect of any mutation is a CROWand KIMURA1970). A mutation occurs with a randomnumber sampled from a fixed distribution rate u per generation per . If mutation occurs, regardless of the original state of the allele. This the selection coefficient of the mutatedgene is a model is differentfrom previous models of nearly random numbers drawn from afixed distributionf(s) neutral mutation in which the distribution of mutant regardless of the original state of thegene. This effect is shifted so thatthe difference between the mutation model is the house-of-cards model (KING- original and the mutant alleles has a constant distri- MAN 1977). The fitness of a genotype AiAj is 1 + si + bution (see OHTA 1977;KIMURA 1979). A motivation sJ where si is the selection coefficient of the allele i. In for the model of OHTA and TACHIDA(1 990) is our the presentstudy, we assume that f(s) is normally biological intuition that there must be a limit in the distributed with mean zero and variance a*. Because improvement of a protein and that after major im- we are interested in mutations with very small effects, provements there would be some fine tuning of the the magnitude of u is assumed to be 0(1/N) through- function. In this model, the proportion of advanta- out the paper. The assumption of mean zero is not geous mutations decreases as the population accumu- restrictive since changing the mean by m corresponds lates advantageous mutations and in consequence has to changing the initial selection coefficients by -m in higher average fitness. Such behavior may explain the the zero-mean case. pattern of amino acids substitution in the globin gene Since it is difficult to obtain analytical results for family where adaptations leading to responses to new this model,I used mainly computer simulation to chemical stimuli have evolved by only a few amino investigate theproperties of the model. However, acid substitutions in key positions (PERUTZ 1983).The before going into simulation studies, I try to obtain fixed mutation model is the same as the "house-of- some results using approximations to guide the simu- cards" model of KINCMAN(1978). The model is also lation studies. First 4Nu << 1 is assumed. Then, the adopted in the studies of evolution of quantitative population is mostly monomorphic experiencing in- characters and selection limits (COCKERHAMand frequent transitions among monomorphic states. Let TACHIDA1987; ZENG, TACHIDAand COCKERHAM p(s, t)be the density function of the population being 1989). fixed with an allele whose selection coefficient is s at The behavior of the house-of-cards model in finite time t. From (A2) in APPENDIX, the equilibrium dis- populations with finite allelic states was studied in the tribution p(s) = p(s, ") is equilibrium state by ZENG, TACHIDAand COCKERHAM (1989). However, we are also interested in the tran- sient state since adaptation of protein occurs in this stage. Moreover, their concern was directed toward Puttingf(s) = (&a)" exp(-s2/2a2) into (l), we ob- selection limits in quantitative characters. Thus, quan- tain tities of interest in molecular evolution such as substi- tutionrates and heterozygosities were notstudied p(s) = (&a)"exp[-(s - (2) there. Some aspect of the transient state was studied At equilibrium the effect, s, of the allele fixed in the in OHTAand TACHIDA(1990) in conjunction with population is normally distributed with mean (4Na)a molecular evolution. However, their model incorpo- and variance u2. The distribution does not dependon rates population structures assuming spatial fluctua- tion of selection coefficient and thus systematic study the mutation rate. In other words, the distribution of was not performed due to the computational limita- the allelic effect is shifted upward by 4Na times the tion. In the present study, I investigate the house-of- standard deviation fromthat of theneutral case. cards model in finite panmictic populations. I first Thus, selection is very effective in bringing the pop- examine how long does it take for the population to ulation fitness to a high value if 4Na is, say, greater reach the equilibrium state in this model. Then var- than four. Next we consider the substitution rate still ious properties of evolutionary interest such as substi- assuming 4Nu << 1. In this case, the substitution rate tutionrate and heterozygosity are investigated. Fi- is approximately computed by multiplying the total nally, three tests of neutrality are applied to this model number of mutants in one generation by the fixation to see whether these tests can distinguish between probabilities of them (KIMURA 1983). Since the equi- complete neutrality and theaction of very weak selec- librium distribution is represented by (l), the substi- tion. tution rate per generation in the equilibrium state is Nearly Neutral Mutation Model 185 having a completely new allelic state and numbered 1 as such. To compute the substitution rate, each gene has a counter which records the total number of mutations which occurred in the descent of the gene. When a new mutant is created, the counter of the gene is increased by one. Finally, the alleles in which If we note that the expression (3) is an expectation of mutations have occurred are determined. For each a function of the difference of two independent nor- mutant, the allele is determined randomly, the prob- mal random variables, it can be rewritten as ability of one allele being chosen is proportional to its frequency, x, and 1/2N is deductedfrom the fre- quency. If the allele frequency is less than 1/2N, the allele frequency is made zero and a uniform number where E is an expectation operator and Z is a normal is drawn to determine another allele from which fre- random variable with mean -4Na2 and variance 2. quency 1/2N - x is deducted.This procedure is Therefore, if 4Na is large, say larger than four, when continued sequentially for M mutants. Selection is the distribution of Z is mostly in negative region and performed in a deterministic way. Gene frequencies the equilibrium distribution and the mutant distribu- after selection are determined by the standard for- tion virtually do not overlap, there will be very little mula (see Chapter 5 of CROW andKIMURA 1970). substitution in the equilibriumstate. When 4Nu is In order to save the computational time, the tele- much smaller than one, an approximate simulation scoping method of sampling multiple alleles (KIMURA method described in APPENDIX can be used to com- and TAKAHATA1983) is used to perform the random pute the expected number of substitutions in t gen- sampling stage. Instead of drawing 2N uniform ran- erations. dom numbers to produce binomial a random number, If protein evolution occurs in a manner similar to this method uses a uniform random number with the this model, is it reasonable to assume that populations same variance as that of thebinomial random number. have been in the equilibrium state? So the next ques- In my simulation, I followed KIMURAand TAKAHATA'S tion ishow long does it take for the population to method for sampling multiple alleles one by one ex- reach the equilibriumstate and what is the time cept that I used the infinite allele model and that the course. For very small u such that 4Na << 1, the final alleles otherthan the one whose frequency is the rate of approach to the equilibriumvalue of the mean highest before sampling are sampled to maximally fitness is shown to be u per generation for general avoid using the recipe for nk d 20 [see KIMURAand values of u (Z. ZENG and H. TACHIDA,unpublished TAKAHATA(1983) for the method and the meaning results). So the equilibrium is reached in the order of of nh]. l/u generations. This is expected because this case is Each simulation is continued for 1O/u generations. almost the same as the neutral case. Except for this The population is monitored every I/( 1OOu) genera- case, we could not obtain an explicit expression for tions for the first 1/u generations and every l /( l Ou) the rate of approach to the equilibrium. However,as generations in the remaining period. At these gener- shown in the APPENDIX, for very small mutation rate ations the average selection coefficient, the total num- such that 4Nu << 1, therate of approachto the ber of substitutions and the average heterozygosity equilibrium is proportional to the mutation rate u if are computed. The total number of substitutions is we measure time in generations. computed by averaging the counter numbers of all in the population. In addition, the numbers of SIMULATION EXPERIMENTS advantageous substitutions are estimatedapproxi- Runs of simulation were performed to investigate mately as follows. If a gene frequency of an allele the dynamics of the house-of-cards model.In the exceeds 0.90 for the first time, I regard that afixation simulation,gene frequencies are changed by three has occurred at that generation. If a fixation occurs, causes, mutation, selection and random sampling of the selection coefficient, sf, of the fixed allele and that, gametes in this order. So one generation consists of sp, of the previously dominant allele were compared. three stages, that is, those of mutation, selection and If their difference, sf - sp, is larger than 1/2N, the random sampling. fixation is regarded to be advantageous. The criterion For mutation, first the number, M, of mutations in is taken because if the difference is less than 1/2N or the population is determined by using a Poisson ran- so the changes of gene frequencies are mostly deter- dom number with mean 2Nu. Then, M normal ran- mined by random genetic drift.All other substitutions dom numbers with mean zero and variance u2 are are considered to be neutral in the following. This sampled and assigned as selection coefficients of new class includes slightly deleterious substitutions. This mutants. Since I assume the infinite allele model (KI- procedure will cause inaccuracy from two reasons. MURA and CROW 1964), each mutant is identified as First, the allele whose frequency exceeds 0.90 may 186 H. Tachida

0.4 0.02

0.016

0.012

0 008

8 0.1 0 004 f! f! f 4 0 0 0 2 4 e 8 10 4000 6000 0 2000 8000 10000 Generation x u Generation FIGURE 2.-Changes of the average selection coefficient through FIGURE 1 .-Changes of the average selection coefficient through time for various u’s. Generations are scaled by l/u. Values at * on time for various 4Nu’s. Values at :on the horizontal axis are the the horizontal axis are the mean values during the period from mean values during the period from 1,000,000 to 1,500,000th 1OOO/u to lOOO/u + 500,000thgeneration. Four values, u = generation. Five values, 0.2, 1.0, 2.0, 10.0 and 20.0, of 4Nu are 0.0002,0.0005,0.001and 0.002, are used. Other parameters used used. Other parameters used are u = 0.001, 2N = 100.The are 4Nu = 2.0, 2N = 100. The selection coefficient of the initial selection coefficient of the initial allele fixed in the population is allele fixed in the population is zero. zero not be fixed. Second, if the mutation rate is high and 10,000th generation, the population is still far from the equilibrium.In fact, even after 1,OOO/u 500,000 the population is highly polymorphic, next substitu- + = 1,500,000 generations, the population may not tion processes may start before the present fixation have reached the equilibrium since the average equi- process is finished. Thus, virtual substitutions may librium selection coefficients computed from the for- occur without the maximum frequency of alleles ex- mula in the previous section are 0.5 for 4Na = 10.0 ceeding 0.9. The former leads to an overestimation and 2.0 for 4Na = 20.0 (see Table 1). Thus,the andthe latter leads toan underestimation in this approach to the equilibrium is extremely slow when counting of advantageous substitutions. However, un- 4Na is large. This slow approach is due to the extreme less the population is very polymorphic, the error is rareness of high fitness alleles in the present normal not large. distribution model. The population size used in all simulations is 2N = Effects of different mutation rates on the changeof 100. Several values of a and u are chosen. By an the average selection coefficient are shown in Figure appropriate time scaling, the results can be extended 2. Here time is measured in units of l/u generations. to cases with the same values of 4Na and 4Nu. Initial 4Na is 2.0. Difference of mutation rate causes a pro- populations are assumed to be monomorphic.For portional slowdown. Thus, by this scaling, changes of each parameter set 1000 replications are performed the average selection coefficient through time with and the average and the variance over these replica- different mutation rates are very similar. For larger tions are obtained. In orderto see the long-term 4Na, this tendency is more pronounced and graphs consequences, another set of simulations is carried out overlap almost completely. Both approximations for in which generationsfrom lOOO/u to lOOO/u + small mutationrates and for weak selection in the 500,000 are observed. In these simulations, only 25 previous section predicted that speed of the approach replications are made. to the equilibrium is proportional to u. This seems to In order to monitor the approach of the population hold for general parameter values. The average selec- to the equilibrium state, the average selection coeffi- tion coefficient is smaller for larger mutation rates cient of the population is observed through time. In because mutation is working against selection in the Figure 1, the changes of the average selection coeff- final stage. Most mutations are deleterious at this cient with various values of 4Na are shown when the stage. mutation rate is 0.001. The initial average selection Different initial values change the behavior of the coefficient is zero. As noted in the previous section, average selection coefficient only in the early stage of the average selection coefficient becomes very close the evolution. Results of the simulations in which the to the equilibrium values in several times l/u = 1000 initial populations were monomorphic for alleles with generations if 4Na is small (4Na = 0.2, 1.0, 2.0). selection coefficients -a, 0 and a with u = 0.001 and However, for larger 4Na, the behavior becomes dif- 4Na = 2.0 show that after 2/u = 2000 generations, ferent.Though the average selection coefficient the average selection coefficients are almost the same quickly increases in the first l/u generations, a slow- for the threeinitial values (data not shown). Thus, the down of increase occurs after that and at the 1O/u = previous results seem to hold true for general values Nearly Neutral Mutation Model 187

0 2000 4000 6000 8000 10000 Generation FIGURE3.-Numbers of substitutions as functions of time. Ad- vantageous (broken lines) and total substitutions (solid lines) are 2345676910 separately plotted. For the definitionof advantageous substitutions, 4No see the text. Three values, 0.2, 2.0 and 20.0, of4Nu are used. FIGURE4.-Numbers of total and advantageous substitutions in Other parameters are 2N= 100, u = 0.001 and the initial selection 1,500,000 generations as a function of 4Nu. Advantageous (broken coefficient is zero. lines) and total (solid lines) substitutions areplotted separately. of initially fixed alleles except at the initial stage. Other parameters areu = 0.001 and 2N= 100. The initial selection coefficients are all zero. The total number of substitutions are plotted

I 1 I I I against time in Figure 3. Solid lines represent the total 0.2 numbers of substitutions and the dotted lines repre- sentthe numbers of advantageoussubstitutions. Three values of 4Na are used. The mutation rate is u = 0.001. When 4Na = 0.2, all fixations are neutral and the accumulated number increases linearly with 0.12 time. The behavior is very similar to that expected fromthe neutral case. In theneutral case, theex- 0.01 pected number of substitutions in 10,000 generations is 10 and that for 4Na = 0.2 is 9.737 f 0.200. When 4Na = 2.0, a significant proportion of the substitutions 0.04 is due to advantageous mutations. A retardation of advantageous substitution rate can be seen in the later n 0 2000 4000 6000 8000 10000 generations. The neutralsubstitutions occur at an . Generation almost constant rate. When 4Na = 20.0, almost all fixations are advantageous and their occurrences are FIGURE 5,"Changesof the averageheterozygosity through time. Values at t on the horizontal axis are the mean values during the mostly in the early generations. Very few substitutions period from 1,000,000 to 1,500,000th generation.Five values, 0.2, occur after one thousand generations. After one and 1.0, 2.0, 10.0 and 20.0, of 4Nu are used. Other parameters used a half million generations, the numbers of advanta- are u = 0.001, 2N = 100. The selection coefficient of the initial geous and the total substitutionsare 5.32 & 0.49 and allele fixed in the population is zero. 6.44 k 0.76, respectively. The retardation of the are used. The average heterozygosity approaches to a substitution rate in the later stage when 4Na is large constant in less than one thousand generations. With in thepresent model isin sharpcontrast with the small a's (4Na G 2.0), the increase to the constapt is constancy of the substitution rate in the shift model monotoneand the value reached inless than one of OHTA(1 977) and KIMURA(1 979). In this range of thousand generations is very close to the equilibrium a, the substitution rate thusdepends on the initial value. However, with larger values of a (4Na 3 10.0), condition. the average heterozygosity first increases and then Since a drastic change of behavior with regard to decreases quickly. Afterwards the decrease becomes the number of substitutions occurs between the cases gradual. The initial increase is due to the fixation of with 4Na = 2.0 and 10.0, a more detailed study was advantageous mutations. In the course of fixation, the carried out and the result is shown in Figure 4. Total population becomes very polymorphic. However, numbers of substitutions after one and ahalf million after the populationfitness becomes high, the propor- (1500/u) generations areshown. 4Na is changed from tion of mutations which can contribute to heterozy- 2.0 to 10.0 with an incrementof one. From this figure, gosity decreases and the average heterozygosity be- it can be seen thata rapid reduction of the total comes very small. In these cases, the gradual decrease number of substitutions occurs between 4Na = 2.0 to continues for a long period. 5.0 In the previous section and APPENDIX, approximate Changes of the average heterozygosity are shown methods are developed to compute the equilibrium in Figure 5. The mutation rate is 0.001 and five a's distribution of alleles fixed in the population and the 188 H. Tachida

TABLE 1 .-5 0.06 VI Comparison of approximations and simulation results: average 0 selection coefficient a 0.05 E Mean Variance e 0.04 Q) C 4Nu 4No App." Sirn.' APP Sim. Q) E 0.03 r 0.04 1.0 0.0050 0.0046 0.000025 0.000023 0 0.1 0.0050 0.0043 0.0000250.0000220.00430.0050 0.1 Q) 0.02 0.2 0.0050 0.0041 0.0000250.0000220.00410.0050 0.2 0 0.4 0.0050 0.0037 0.0000250.0000190.00370.0050 0.4 c .g 0.01 0.04 0.2 0.00020.00020.000001 0.000001 z 2.0 0.0200 0.0161 0.000100 0.000072 '0 10.0 0.5000 0.1225 0.002500 0.000703 0 0.05 0.1 0.15 0.25 0.2 0.3 20.0 2.0000 0.2600 0.010000 0.002350 Average heterozygosity Means and variances of the average selection coefficient at the FIGURE6.--The variance of heterozygosity plotted against the lO/uth generation are shown. Numbers of replications are 1000 average heterozygosity. The solid line represents the relationship for all simulations. 2N = 100. expected from complete neutrality. For each 4Nu value, four values a App., approximations computed from (2). of 4Nu, 0.04, 0.1, 0.2, 0.4,are used. The population size is 2N = Sin).,simulation results. ' 100. TABLE 2 are tabulated. For these quantities, the approximation Comparison of approximations and simulation results: substitutions works quite well and we find discrepancies only when 4Nu is fairly large (4Nu = 0.4). Thus, theapproximate Total" Adv.' methods which are derived by regarding the fixation 4Nu 4No App.' App. Sim. process as instantaneous are appropriate when 4Nu is small, say less than 0.1. 1.0 7.91 8.05 0.59 0.57 0.59 8.050.04 7.91 1.0 As a test for neutrality of alleles, plots of the vari- 0.1 8.03 8.14 0.53 0.57 0.53 8.14 8.03 0.1 8.01 8.24 0.53 0.52 0.530.2 8.24 8.01 ances of heterozygosity against the average heterozy- 0.4 0.51 0.538.00 8.53 gosity have been used in the literature(FUERST, CHAK- 0.04 0.2 9.85 9.87 0.0 0.0 0.0 9.87 9.85 0.2 0.04 RABORTY and NEI 1977; GOJOBORI1982). so this 2.0 5.07 5.38 1.33 1.44 relationship was also investigated in thepresent 10.0 2.67 2.69 2.33 2.35 model. The variances of heterozygosity among 1,000 20.0 2.85 2.87 2.78 2.71 replicate populations are computed every I /( 1Ou) Mean numbers of the total and the advantageous substitutions generationfrom 2/u to lO/uth generation andthe in 1O/u generations are shown. Numbers of replications are 1000 for all simulations. 2A' = 100. average over this period was calculated. The earlier a Total, mean number of total substitutions. generations are removed because of the peculiar be- ' Adv., mean number of advantageous substitutions. havior of theaverage heterozygosity in this phase ' App., approximate simulations using (A6). Sim., simulation results. when 4Na is large. The result is shown in Figure 6. Four values of 4Nu, 0.04,O. 1,0.2,0.4,and five values expected number of substitutions when 4Nu is much of 4Na, 0.2, 1.0, 2.0, 10.0, 20.0 are used and all 20 smaller than one. Here, we briefly check the validity combinations are tried. The solid line represents val- of these approximations. From (2), the equilibrium ues expected from complete neutrality. For 4Na less mean and variance of the average selection coefficient than 2.0, points obtained from the simulation are on are approximately computed to be 4Na2 and a, re- the line expected from complete neutrality. Thus, it spectively, when 4Nu << 1. The approximate values is very difficult to distinguish the complete neutral computed from these and values computed from the case and those with 4Na 2.0 with this test. For simulation at lO/uth generation are tabulated in greater values of 4Na, points are under the expected Table 1. If 4Na and 4Nu are small, values computed line. Similar observations were made in LI (1 978) and from both methods agree fairly well as expected. The KIMURA and TAKAHATA(1 983) in their slightly dele- reason whywe do not have good agreements when terious mutation models. 4Na is large even though 4Nu is small is thatthe Another type of test for neutrality uses the relation- population is not in the equilibrium state at lO/uth ship between the variation within and between popu- generation in these cases while the approximatevalues lations (WARDand SKIBINSKI1985; KREITMANand are for the equilibrium.The expected numbers of the AGUADE 1986; HUDSON, KREITMAN and AGUADE total and the advantageous substitutionsare computed 1987). In these tests, the observed variation between by the simulation method using (A5) and (A6) of populations is compared to that expected from the APPENDIX. They are compared with those from the neutral case with the same amount of variation within simulation used in the present work (Table 2). The population. One measure of the variation between expected numbersof substitutions in 1O/u generations populations is the total number of substitutions which Nearly Neutral Mutation Model 189

25 I I I I I TABLE 3 4 No I Results of WATTERSON'Shomozygosity test I 1'1-ohability" 4h'n e!..:,% H,, 297.5% No. t(.stck x 10.0 - 0 20.0 0.041 0.2 0.959 0.000 562 1.0 0.016 0.984 0.000 567 - 2.0 0.019 0.981 0.000 483 10.0 0.000 1.000 0.000 25 1 20.0 0.006 0.994 0.000 164 I I I I 0 0 0.05 0.1 0.15 0.2 0.25 0.3 Tests are performed at the 10,000th generation. Other param- Average heterozygosity eters are 2N = 100 and 4Nu = 0.2. a 52.5%, 297.5% and H,, are the probabilities of the hotnozy- FIGURE7.-The relationship between the average heterozygos- gosity to be less than or equal to 2.5% point, greater than or equal ity and the total number of substitutions. The average heterozygos- to 97.5% point and between these two points, respectively. ity is measured at 10,000th generation and the total number of Numbers of tests performed in the 1000 replications. substitutions are counted in the period from 0 to 10,000th gener- ation. The solid line representsthe relationship expected from replications are also shown. We can see that this test complete neutrality. For each 4No value, four values of 4Nu, 0.04, does not discriminate the present model from com- 0.1, 0.2, 0.4, are used. The population size is 2N = 100. plete neutralityfor anyvalue of 4Na used in our have occurred since the divergence of the populations. study. This may be expected because WAI-I'ERSON'S Here I investigate the relationship between the total test is not powerful when thenumber of' alleles is number, K, of substitutions in T generations and the small. For large 4Na, the number of alleles becomes heterozygosity, H. Under the completely neutral case, very small except for the early stage. the relationship is expressed as In summary, we can identify three types of'behavior which emerge as 4Na is changed. If 4Na is very small, E[K] = WHI say less than 0.2, the behavior is very similar to that 4N( 1 - E[H]) of theneutral case (the almost neutral case). The average selection coefficient does not go up and all where E denotes the expectation operator. The ob- substitutions are neutral. If 4Na becomes ilrtcrmedi- served relationship between E[K]and E[H]is shown ate (0.2 < 4Na < 3 to 5), the behavior is nearly neutral. in Figure 7 with the same combinations of u and a as Here a gradual increase of the average selection coef- in Figure 6. 2N = 100 and T = 10,000 are used. The ficient occurs and both neutral (including slightly solid line shows the relationship under complete neu- deleterious) and advantageousmutations are fixed. trality represented by (5).Again, when 4Na is equal The total number of substitutions and the average to orsmaller than two, points are on the line expected heterozygosity are reduced from those in the complete fromcomplete neutrality. However, when 4Na is neutral case as 4Na increases. In both the almost and greater than ten, points are above the line. the nearly neutral cases, the population reaches the Finally, WATTERSON'S(1978) homozygosity test was equilibriumstate in several times 1/u generations. applied to the simulation results. This test is based on Also the substitution rate is almost constant through the observation that the distribution of sample allele time. The variance of heterozygosity has the same frequencies conditioned on the number of alleles in relationship with the average heterozygosity as in the the sample is free of the nuisance parameter 4Nu complete neutral case. The relationship between the under neutrality (EWENS1972). Thus, conditioned on variation within and between populations is the same the number of alleles in the sample, the distribution as that of complete neutrality (Equation 5). If 4Na is of the sample homozygosity defined as the sum of bigger than three to five, the behavior is conlpletely squares of the sample gene frequencies does not de- different. The average selection coefficimt quickly pend on 4Nu. WATTERSON(1978) tabulates conserv- goes up in early generations (up to I/u generations). ative percent points for the neutral sample homozy- Then the increase becomes gradual. Even after 1000/ gosity and I used 2.5% and 97.5% points of them. u generations, the population has not yet reachcd the One hundred genes are sampled randomly from sim- equilibrium state. Several advantageous substitutions ulated populations at the 10,000th generation with occur in early generations and very few sulxtitutions replacement. The sample homozygosity is computed occur afterward. Neutral substitutions do not occur and the numbers of cases where the homozygosity is in early generations. When advantageous substitutions less than or equal to 2.5% point, more than or equal occur, the average heterozygosity increases \.el.); rap- to 97.5% point and in between were counted. Some idly. Later it goes down quickly and becomes very of the results are tabulated in Table 3. The test can small compared to that of the almost and the near be carried out only when there are more than one neutral cases. The variance of heterozygosity is allele and the numbers of tests performed in 1000 smaller compared to thatexpected fro^^^ corllplete 190 H. Tachida neutrality and more substitutions than those expected adaptation is expected to occur in less than one to ten from the completely neutral case with the same aver- million years. According to our model, this part of age heterozygosity occur. proteins has not reached the final equilibrium state because even 1OOO/u = one billion to ten billion years DISCUSSION is not enough to achieve the equilibrium state. The retardation in the later stage and the dependencywith In the present model, the house-of-cards model of regard to the initial condition of the substitution rate mutation was used instead of the shift model previ- when 4Nu is large discriminate the present model ously studied (OHTA1977; KIMURA1979). In theshift from the shift model which has constancy and inde- model, the fitness difference of the mutant gene and pendence of the substitution rate. the original gene has a constant distribution. One of For the second part with intermediate u (nearly the major differences of the consequences of these neutral case), substitutions occur at a fairly constant two models is that in the shift models there is no limit rate for fixed u and 4Nu and most of them would be in the increase or the decrease of fitness while in the nowin the equilibrium state. Thus, in the present house-of-cards model there is a limit. Thus, if some model, if enoughgenerations have passed and the proportion of mutations is advantageous, very rapid proportion of nearly and almost neutral sites is large substitutions occur in large populations in the shift so that contributions from quick adaptation is negli- model (KIMURA 1979). This is contrary to observa- gible, substitution occurs at a constant rate in a pop- tions. However, in the house-of-cards model, only ulation with a fixed size. The behavior of mutants in several advantageous fixations bring the population this partdepends on the population size, andthe fitness to a high value for larger 4Nu and in this state substitution rate decreases by bringing the population most mutations become deleterious as shown in this size larger. Thus the evolutionary pattern is similar to study. Thus, unrealistically rapid substitutions do not that underthe slightly deleteriousmutation model occur in the house-of-cards model. Furthermore, in (OHTA1973). It is possible that the value of u becomes the shift model continuous deterioration of the pop- smaller for larger population size because of the av- ulation fitness results if there is no advantageous mu- eraging effects of fluctuating selection intensity tation. In the house-of-cards model, there is a stochas- (OHTA1972; OHTAand TACHIDA1990). Then the tic equilibrium to which the population approaches value of 4Nu would not be strictly proportional to N and the population fitness goes up and down through but would be mildly dependent on N. Thus,the time according to this distribution in the equilibrium increase of the population size by say ten times would (for an example see Equation 2). Thus, indefinite not bring this class of sites to the large class in a deterioration of the population fitness does not occur u structured population with fluctuating selection inten- in this model. The existence of advantageous muta- tions was demonstrated by DEAN,DYKHUIZEN and sity. The mutations in the third part arealmost neutral, HARTL(1988) although only one such mutant was and the substitution pattern is very similar to that of foundthus far. So consideration of models which incorporate advantageous mutationsand which do not the completely neutral case. However, in our model, contradict observations seems important. the change of population size causes changes of sub- Many proteins are now known to be encoded by stitution rate and even leads to a shift of genes belonging to some gene family. They are con- sites from one class to another. Thus,different behav- sidered to have been created by gene duplication and ior from that of the complete neutral case will show subsequent adaptations (OHTA1988). Our study may up even for this class of sites when the population size be relevant to the stage after these duplications.Only is changed. a small number of sites are considered to be apparently In conclusion, three types of behavior are identified contributingto adaptations of proteins (PERUTZ in the present model depending on the parameter, 1983). At many other sites, replacements of amino 4Nu. Though thebehavior is very different when 4Nu acids do not cause drastic changes of the structure is large from that of the completely neutral case, the and activity of the protein (BOWIEet al. 1990). Thus qualitative behaviors of the nearly neutral and almost we may divide the amino acid sequence of a protein neutral case are very similar to that of the completely into three parts, one with a large u, one with inter- neutral case and can not be distinguished from the mediate u and the other with small u. latter using three types of neutrality tests studied here In the first part, adaptation quickly occurs in less which use relative relationships among observed quan- than l/u generations and then very slow fine tuning tities. However there are differences of consequences occurs. At present not many data have accumulated between the presentmodel and thecompletely neutral to estimate the mutation rate. However in both human case. For example, the absolute values of the average and Drosophila, estimated values of mutation rate are heterozygosity and the substitution rate are reduced in theorder of to 10-7 per year (MUKAI and in the nearly neutral case compared to the values of COCKERHAM1977; NEELet al. 1986). Thus, this quick those under complete neutrality with the same muta- Nearly Neutral Mutation Model 191 KREITMAN,M., and M. AGUADE,1986 Excess polymorphism at tion rate. Also the change of populationsizes leads to the Adh locus in Drosophila melanogaster. Genetics 114: 93- a shift of amino acid sites from one class to another. 110. Thus, the evolutionary consequences of themvery are LI, W. -H., 1978 Maintenance of genetic variability underthe different. Furtheranalyses of molecular data arenec- joint effect of mutation, selection, and random genetic drift. Genetics 90: 349-382. essary in order to clarify the extent of nearly neutral MUKAI,T., and C. C. COCKERHAM,1977 Spontaneous mutation and almost neutral amino acid sites in these smaller u ratesat enzyme loci in Drosophilamelanogaster. Proc.Natl. sites. Acad. Sci. USA 74: 2514-2517. NEEL,J. V., C. SATOH,K. GORIKI,M. FUJITA, N. TAKAHASHI,J. ASAKAWAand R. HAMAZAWA,1986 The rate with which I thank T. OHTAfor giving me a motivation of the problem and spontaneousmutation alters the electrophoretic mobility of M. KIMURA, T.OHTA, Z. B. ZENG and two anonymous reviewers polypeptides. Proc. Natl. Acad. Sci. USA 83: 389-393. for very helpfulsuggestions. The simple simulation methodde- OHTA, T., 1972 Population size andrate of evolution. J. Mol. scribed in APPENDIX was suggested by one of the reviewers. This Evol. 1: 305-314. work is supported by a grant-in-aid from the Ministry of Education, OHTA, T., 1973 Slightly deleterious mutant substitutions in evo- Science and Culture of Japan. Contribution No. 1858 from the lution. Nature 246: 96-98. National Institute of Genetics, Mishima, 41 1 Japan. OHTA,T., 1977 Extension to the neutral mutation random drift hypothesis, pp. 148-167 in Molecular Evolution and Polymor- phism, edited by M. KIMURA.National Institute of Genetics, LITERATURE CITED Mishima. OHTA, T., 1988 Multigene andsupergene families. Oxf.Surv. AQUADRO,C. F., K.M. LADOand W. A. NOON,1988 The rosy Evol. Biol. 5: 41-65. region of Drosophila melanogaster and Drosophila simulans. 1. OHTA,T., and H. TACHIDA, 1990 Theoretical study of near Contrasting levels of naturally occurring DNA restriction map neutrality. I. Heterozygosity and rate of mutant substitution. variation and divergence. Genetics119: 875-888. Genetics 126: 219-229. BOWIE,J. U.,J. F. REIDHAAR-OLSON,W. A. LIM and R. T. SAUER, PERUTZ,M. F.. 1983 Species adaptation in a protein molecule. 1990Deciphering the message in protein sequences: toler- Mol. Biol. Evol. 1: 1-28. ance to aminoacid substitutions. Science 247: 1306-1 3 10. TAKAHATA,N., 1987 On the overdispersed molecular dock. Ge- CHOUDHARY,M., and R. S. SINGH, 1987 A comprehensive study netics 116: 169-179. of genic variation in natural populations of Drosophila melano- WARD,R. D., and D. 0. F. SKIBINSKI,1985 Observed relationships gaster. 111. Variations in geneticstructure and their causes between protein heterozygosity and protein genetic distance between Drosophila melanogaster and its sibling speciesDrosoph- and comparisons with neutral expectations. Genet. Res. 45: ila simulans. Genetics 117: 697-7 10. 3 15-340. COCKERHAM,C. C., and H. TACHIDA,1987 Evolution and main- WATTERSON,G. A., 1978The homozygosity test of neutrality. tenance of quantitative genetic variation by mutations. Proc. Genetics 88: 405-4 17. Natl. Acad. Sci. USA 84 6205-6209. ZENG, Z. -B., H. TACHIDAand C. C. COCKERHAM, 1989Effects of CROW, J.F., and M. KIMURA, 1970 An Introduction to Population mutation on selection limits in finite populations with multiple Genetic Theory. Harper and Row, New York. alleles. Genetics 122: 977-984. DEAN,A. M., D.E. DYKHUIZENand D. HARTL,1988 Fitness effects of amino acid replacements in the @-galactosidase of Communicating editor: E. THOMPSON Escherichia coli. Mol. Biol. Evol. 5: 469-485. EWENS,W. J., 1972 The sampling theory of selectively neutral APPENDIX alleles. Theor. Popul. Biol. 3: 87-1 12. FUERST,P. A., R.CHAKRABORTY and M. NEI,1977 Statistical studies onprotein polymorphism in naturalpopulations. I. We first computethe equilibrium distribution of Distribution of single locus heterozygosity. Genetics 86: 455- alleles when there are k alleles and the mutation rate 483. from the ith alleleto thejth allele is u, using the weak GILLESPIE,J. H., 1987 Molecular evolution and the neutral allele theory. Oxf. Surv. Evol. Biol. 4 10-37. mutationapproximation of ZENG, TACHIDAand GOJOBORI,T., 1982 Means and variances of heterozygosity and COCKERHAM(1989). The selection coefficient of the protein function, pp. 137-148 in Molecular Evolution, Protein ith allele is assumed to be si. If four times the product Polymorphismand the NeutralTheory, edited byM. KIMURA. Springer-Verlag, Berlin. of the mutation rate and the population size, 4Nu, is HUDSON,R. R., M. KREITMANand M. AGUADE,1987 Atest of small compared to one, the populationis mostly mon- neutral molecular evolution based on data.Genetics omorphic. Let p, be the probability of the population 116 153- 159. KIMURA,K., 1962 On the probability of fixation of mutant genes being monomorphicwith the allele i. To compute the in a population. Genetics 47: 713-719. equilibrium distribution, consider acase where all uj’s KIMURA,K., 1968 E,volutionary rate at the molecular level. Na- can be expressed by rationals n,/d, where n’s and d’s ture 217: 624-626. KIMURA,M., 1979 Amodel of effectively neutralmutations in are integers. Thiscase corresponds to the Etl n,,,d,n, which selective constraint is incorporated. Proc.Natl. Acad. allelecase with equalmutation rates among them. Sci. USA 76: 3440-3444. There are alleles with selection coefficient st. KIMURA,M., 1983 TheNeutral Theory of MolecularEvolution. n,,&zj Cambridge University Press, London. Therefore, from the equation just above Equation 7 KIMURA,M., 1987 Molecular evolutionary clock and the neutral of ZENG, TACHIDAand COCKERHAM(198Y), theory. J. Mol. Evol. 26: 24-33. KIMURA,M., and J. F. CROW, 1964 The number of alleles that u,exp(4Ns,) can be nraintained in a finite population. Genetics 49: 725- pr= k 738. KIMURA,M., and N. TAKAHATA,1983 Selective constraint in c u,exp(4l\is,) I protein polynrorphisnr: study of the effectively neutral muta- ,= tionmodel by using animproved pseudosampling method. Since real numbers can be approximated by rationals Proc. Natl. Acad. Sci. USA 80: 1048-1052. KINGMAN,J. F. C., 1978 A simple model for the balance between at any degree of accuracy, the above equation holds selection and mutation. J. Appl. Probab. 15: 1-12. for any set of ut’s. Now consider the case where the 192 H. Tachida

effect of mutation on selection coefficient is distrib- a differential equation, uted with f(s). This distribution can be approximated by the finite allele case with unequal mutation rates. dP(s7 tf Divide the s axis with intervals of length ds. Then, the dt equilibrium probability of the population being mon- omorphic with the alleles whose selection coefficients are between s and s + ds is proportional to f(s)ds exp(4Ns) from (Al). Therefore, the equilibrium den- 2(r - s) sity P(s) is computed to be - lI1 - exp[-4N(r - s)] We could not solve this equation analytically. How- ever two observations can be made from this expres- sion. First, the equilibrium solution (A2) satisfies the steady state form of (A4). Secondly, the equation does Next we consider the transient state. We continue not have u as a parameter if we use l/u generations to assume that 4Nu << 1. If we take l/u generations as a unit time. Thus, the rateof the process is propor- as a unit time, the time required for one allele to be tional to u if we measure time in generation so long fixed in the population is very short under this con- as 4Nu is much smaller than one. dition since even for a neutral mutant, the average When 4Nu << 1, the number of substitutions in t time for fixation is 4Nu which is assumed to be small generations can be computed using an approximate simulation method (suggested by one of the review- compared to one. We approximate this process by ers).In this case, thepopulstion is mostly mono- regarding each fixation of a new allele as instanta- morphic and we can define the selection coefficient, neous. Then the process becomes a jump process in Sk, of the monomorphic allele when the Kth mutation which transition occurs among monomorphic states. occurs. Define h(x, y) = 2(x - y)/(l - exp[-4N(x - Let p(s, t)be the probability density of the population y)]). Then by regarding the fixation of an allele as being monomorphic for an allele with selection coef- instantaneous, the transition of Sh is described by ficient s at time t. Since the fixation probability of a gene with selection coefficient r appearing in a popu- Sk+l = ZkX(h(Zn, Sa)) -k Sk(1 - X(h(Zk, Sk))) (A5) lation otherwise monomorphic for anallele with selec- where Zk (the selection coefficient of the kth mutation) tion coefficient s is approximately (KIMURA 1962) and x( P)are independent randomvariables, Zh having a normal distribution with mean 0 and and x(P) 2(r - s) a* being an indicator random variable such that 1 - exp[-W(r - s)]' I with probability P when 1 r - s I << 1 and 2Nu.(r)drnew mutations whose X(P) = {0 otherwise. selection coefficients are between r and r + dr occur every generation, With this representation,the sequence IS,) can be easily simulated. The expected number, E[K(t)], of P(s, t + 4 substitutions in t generations can be written as = (1 - 'LNu)P(s, t) E[K(t)] = E[h(Zl, SI) + ~(ZB,Sz) + . . . + h(Z~,t),~'AW))], (A61 where N(t) has a Poisson distribution with mean 2Nut. The expected number of advantageous substitutions can be calculated in a similar way with h replaced by h,(x, y) = h(x, y)Z(x, y) where I is an indicator function such that 1 x - y > 1/2N Dividing both sides by u and letting u * 0, we obtain