J. Genet., Vol. 75, Number 1, April 1996, pp. 33-48. © Indian Academy of Sciences

Effects of the shape of distribution of mutant effect in nearly neutral mutation models

HIDENORI TACHIDA Department of Biology, Faculty of Science, Kyushu University 33, Fukuoka 812, Japan

Abstract. Models of the theory of nearly neutral mutation incorporate a continuous distribution of mutation effects, in contrast to the theory of purely neutral mutation, which allows no mutations with intermediate effects. Previous studies of one such model, namely the house-of-cards mutation model, assumed a normal distribution of mutation effect. Here I study the house-of-cards mutation model in random-mating finite populations using the weak-mutation approximation, paying attention to the effects of the distribution of mutant effects. The average selection coefficient, substitution rate and average heterozygosity in the equilibrium and transient states were studied mainly by computer simulation. The main findings are: (i) Very rapid decrease of the substitution rate and very slow approach to equilibrium as selection becomes stronger are characteristics of assuming a normal distribution of mutant effect. If the right tail of the mutation distribution decays more rapidly than that of the normal distribution, the decrease of substitution rate becomes slower and equilibrium is achieved more quickly. (ii) The dispersion index becomes smaller or larger than 1 depending on the time and the intensity of selection. (iii) Let N be the population size. When selection is strong, the ratio of 4N times the substitution rate to the average heterozygosity, which is expected to be 1 under neutrality, is larger than 1 in earlier generations but becomes less than 1 in later generations. These findings show the importance of the distribution of mutant effect and time in determination of the behaviour of various statistics frequently used in the study of molecular evolution.

Keywords. Protein evolution; substitution rate; average heterozygosity; population genetics; dispersion index.

1. Introduction

Understanding the mechanism of evolution is one of the most important problems in biology. Kimura (1968) proposed the neutral theory, which states that the main cause of evolutionary change at the molecular level is random fixation of selectively neutral or very nearly neutral mutations rather than Darwinian selection. In its pure form (Kimura 1987 and later), the neutral theory assumes that there are two discrete classes of mutations depending on their selection coefficient, $s$. One class is neutral mutations whose selection coefficients satisfy $N_e|s| \ll 1$, where $N_e$ is the effective population size. The other class is deleterious mutations whose selection coefficients satisfy $-N_e s \gg 1$, and these do not contribute to molecular evolution. The theory allows no mutations with intermediate effects. The proposal aroused a lot of controversy (see Lewontin 1974 and Kimura 1983 for review) but now the theory is widely accepted as an explanation for most DNA changes in noncoding regions and synonymous sites. However, the mechanism for amino acid substitutions (replacement substitutions) is still controversial (Gillespie 1991). Ohta (1972) proposed the nearly neutral mutation theory to explain replacement substitutions. In this theory mutations with intermediate effects, i.e. $N_e s \sim O(1)$, are permitted and the distribution of mutation effect is considered somewhat continuous (see Ohta 1992a for review). The first model that explicitly incorporated a continuous distribution of mutation effects was that by Ohta (1977). In this model an exponential distribution was employed as a distribution of mutation effects. More precisely, the difference between the fitnesses of the mutated and original alleles was assumed to be distributed exponentially. Such models with fixed distributions for differences between the mutated and the original alleles are in a class called the shift model (Ohta and Tachida 1990). In the shift model the existence of advantageous mutations was considered unlikely because, if they exist, the substitution rate becomes extremely large in large populations, which is not observed in molecular data (Kimura 1979). Ohta and Tachida (1990) introduced another type of nearly neutral mutation model, called the fixed model, in which the fitness of the mutant allele, not the difference between the fitnesses of the mutated and the original alleles as in the shift model, has a fixed distribution. This mutation model is the same as the house-of-cards model of Kingman (1978), and we call the model by this name from now on. The feature that most distinguishes this model from the shift model is that the distribution of the fitness difference between the mutated and original alleles changes as the population evolves. Thus, when the average fitness of the population is low, most mutations are advantageous; when the average fitness of the population is high, most mutations become deleterious. Even though advantageous mutations are allowed in this model, the substitution rate does not become large in large populations because such populations have high average fitnesses and most mutations are deleterious. This is in accord with the traditional view of mutation, namely that mutations are mostly deleterious after evolution.

Ohta and Tachida (1990) and Ohta (1992b) considered the behaviour of this mutation model in geographically structured populations while Tachida (1991) considered a simpler case of random-mating populations. Tachida's (1991) conclusions are, in summary: (i) The behaviour is mainly determined by the product $\alpha = 4N_e\sigma$, where $\sigma$ is the standard deviation of the mutant distribution. (ii) If $\alpha$ is less than 0.2, the behaviour is almost the same as that of the neutral model. (iii) If $\alpha$ is between 0.2 and 3-5, both selection and random drift contribute to the gene evolution. Although the long-term substitution rate is reduced as $\alpha$ increases, it is difficult to distinguish its behaviour from that of the neutral model using conventional neutrality tests such as that using the average and the variance of heterozygosity. (iv) If $\alpha$ is larger than 5, selection dominates the process. The average fitness quickly goes up and after that very few substitutions occur. However, it takes a long time to reach equilibrium. In all these studies, the distribution of mutant fitness is assumed to be normal. However, there is no biological reason for assuming normality of the mutant fitness distribution. Thus it is necessary to investigate the effects of the shape of the distribution on the evolution. In the present paper I investigate the house-of-cards model in finite panmictic populations employing various mutant distributions. Specifically, I investigate the effects of the distribution shape on average fitness, substitution rate, heterozygosity and dispersion index. To save computer time, I use a jump-process approximation that can be used when the mutation rate is small. I first describe the model and explain this approximation, and then summarize results obtained by this approximation. The appropriateness of the approximation was checked by comparison with the results of the simulation study of Tachida (1991) on the Wright-Fisher model.

2. Analysis

We assume the diploid Wright-Fisher model with fixed population size $N$ (see Crow and Kimura 1970). Consider a locus with an infinite number of alleles (the infinite-allele model of Kimura and Crow 1964). Mutation always results in a new allele, labelled by a random variable $S$. We call $S$ the selection coefficient of the allele. The fitness of an individual that has two alleles labelled by $S_1$ and $S_2$ is $1 + S_1 + S_2$. For different alleles, $S$ is independently distributed with the same density, $f_0(s)$. We designate its variance by $\sigma^2$. This mutation model is the house-of-cards model of Kingman (1978). In a finite population the frequency changes become a multidimensional Markov process and it is difficult to deal directly with this process (but see Ethier and Kurtz 1994). So we use an approximation that is valid for small mutation rates.

2.1 Weak-mutation approximation

Let $u$ be the mutation rate per generation. If $Nu$ is much smaller than 1, the population is expected to be mostly monomorphic, and occasionally fixations of new alleles occur very quickly. This can be approximated by a one-dimensional Markovian jump process (Zeng et al. 1989; Tachida 1991). We briefly explain this jump process. In this process the population is characterized by the selection coefficient of the allele currently fixed. Let $p_0(s; t)$ be the density of the population currently fixed by an allele with the selection coefficient $s$ at time $t$. A transition from a state $s$ to another state with a selection coefficient between $r$ and $r + dr$ in a very short time $\Delta t$ occurs with a probability

$$2Nu\,\Delta t\,\frac{2(r-s)}{1-\exp[-4N(r-s)]}\,f_0(r)\,dr, \tag{1}$$

because $2Nu f_0(r)\,dr\,\Delta t$ mutations within this range occur and the probability of fixation for such mutations is $2(r-s)/\{1-\exp[-4N(r-s)]\}$ (Kimura 1962). Thus, if we measure time in units of $1/u$ generations, the transition equation for $p_0(s; t)$ is written as (Tachida 1991)

$$\frac{dp_0(s,t)}{dt} = 2N\left\{\int_{-\infty}^{\infty}\frac{2(s-r)}{1-\exp[-4N(s-r)]}\,f_0(s)\,p_0(r,t)\,dr - \int_{-\infty}^{\infty}\frac{2(r-s)}{1-\exp[-4N(r-s)]}\,f_0(r)\,p_0(s,t)\,dr\right\}. \tag{2}$$

If we scale $s$ and $r$ by the standard deviation $\sigma$ of $f_0(s)$, the equation can be written as

$$\frac{dp(s,t)}{dt} = \alpha\left\{\int_{-\infty}^{\infty}\frac{(s-r)}{1-\exp[-\alpha(s-r)]}\,f(s)\,p(r,t)\,dr - \int_{-\infty}^{\infty}\frac{(r-s)}{1-\exp[-\alpha(r-s)]}\,f(r)\,p(s,t)\,dr\right\}, \tag{3}$$

where $\alpha = 4N\sigma$, and $p(s, t)$ and $f(s)$ are densities of the population and mutation effect respectively for scaled variables. Thus the process is characterized by $\alpha$ and the shape of the standardized mutation density $f(s)$. Previous studies (Ohta and Tachida 1990; Tachida 1991; Gillespie 1994b) used the normal distribution $f(s) = \exp(-s^2/2)/\sqrt{2\pi}$ as the distribution of mutant effect. To investigate the effect of the shape of $f(s)$, we use the following family, $g_n(s)$, of densities ($n \ge 1$):

$$g_n(s) = \frac{n\,\Gamma(3/n)^{1/2}}{2\,[\Gamma(1/n)]^{3/2}}\exp\left[-\left(\frac{\Gamma(3/n)^{1/2}\,|s|}{\Gamma(1/n)^{1/2}}\right)^{n}\right], \tag{4}$$

where $\Gamma$ is a gamma function. The standard deviation is 1. The main reason for choosing this family is that by changing $n$ we can modulate the tail of the distribution, which is considered to have large effects, especially on the equilibrium behaviour. This family includes the double-exponential ($n = 1$), normal ($n = 2$) and uniform ($n = \infty$) distributions. Their densities are expressed as

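$$g_1(s) = \frac{1}{\sqrt{2}}\exp(-\sqrt{2}\,|s|) \quad (n = 1),$$

$$g_2(s) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{s^2}{2}\right) \quad (n = 2), \tag{5}$$

$$g_\infty(s) = \begin{cases} \dfrac{1}{2\sqrt{3}} & (-\sqrt{3} \le s \le \sqrt{3}) \\ 0 & \text{(otherwise)} \end{cases} \quad (n = \infty).$$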

As n becomes large the decay of the tail becomes faster.
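The family in equation (4) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names `g_n` and `moment` and the midpoint integration rule are illustrative choices, and the code covers only finite $n$ (the uniform limit is not implemented).

```python
import math

def g_n(s, n):
    """Density g_n(s) of equation (4) for a finite exponent n >= 1."""
    g1 = math.gamma(1.0 / n)
    g3 = math.gamma(3.0 / n)
    norm = n * math.sqrt(g3) / (2.0 * g1 ** 1.5)
    return norm * math.exp(-(math.sqrt(g3 / g1) * abs(s)) ** n)

def moment(n, power, half_width=40.0, steps=40_000):
    """Midpoint-rule estimate of the integral of s**power * g_n(s) ds.

    The integration window and step count are assumptions of this sketch.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        s = -half_width + (i + 0.5) * h
        total += s ** power * g_n(s, n)
    return total * h

if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        # Both the total mass and the variance should be close to 1.
        print(n, moment(n, 0), moment(n, 2))
```

For $n = 1$ and $n = 2$ the function reduces to the double-exponential and normal densities of equation (5), and the zeroth and second moments come out close to 1 for each $n$ tried here.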

2.2 Equilibrium state

The transition equation (3) has a detailed balance equilibrium solution $p(s)$ (Tachida 1991):

$$p(s) = \frac{f(s)\exp(\alpha s)}{\int_{-\infty}^{\infty} f(x)\exp(\alpha x)\,dx}. \tag{6}$$

For $n = 1, 2, \infty$, we can compute the denominator explicitly, and the equilibrium distributions are

$$p(s) = \begin{cases} \dfrac{(2-\alpha^2)\exp[-(\sqrt{2}\,|s| - \alpha s)]}{2\sqrt{2}} & (n = 1,\ \alpha < \sqrt{2}) \\[2ex] \dfrac{\exp[-(s-\alpha)^2/2]}{\sqrt{2\pi}} & (n = 2) \\[2ex] \dfrac{\alpha\exp(\alpha s)}{\exp(\sqrt{3}\alpha) - \exp(-\sqrt{3}\alpha)} & (n = \infty,\ -\sqrt{3} \le s \le \sqrt{3}). \end{cases} \tag{7}$$

For other $n$ we need to compute the integral of equation (6) numerically.

2.2a Average selection coefficient

For the double-exponential distribution ($n = 1$) and $\alpha \ge \sqrt{2}$, there is no equilibrium distribution and the average selection coefficient increases indefinitely. For $\alpha < \sqrt{2}$, the equilibrium distribution exists and the average selection coefficient is $2\alpha/(2 - \alpha^2)$. If the mutant distribution is normal ($n = 2$) the average selection coefficient in the equilibrium is $\alpha$ (Tachida 1991). As can be seen from the expression in (7), the equilibrium distribution is also a normal distribution with the same variance but its mean shifted $\alpha$ upward. If the distribution is uniform ($n = \infty$) the average selection coefficient in the equilibrium is

$$\sqrt{3}\,\frac{1 + \exp(-2\sqrt{3}\alpha)}{1 - \exp(-2\sqrt{3}\alpha)} - \frac{1}{\alpha} = \sqrt{3} - \frac{1}{\alpha} + O[\exp(-2\sqrt{3}\alpha)]. \tag{8}$$

As the right-hand side of the equation shows, the average selection coefficient converges to $\sqrt{3}$ with a rate $1/\alpha$ when $\alpha$ becomes large. This approach to the limit is in striking contrast to those in the double-exponential and normal cases where the average selection coefficient increases linearly with $\alpha$. For medium $n$, intermediate behaviour is seen (data not shown). This difference in the average selection coefficient has large effects on the equilibrium substitution rate and heterozygosity as shown below.
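The closed-form means above can be compared with a direct numerical evaluation of equation (6). The sketch below is illustrative only: the function names, the midpoint rule, the integration limits, and the use of log densities to avoid overflow are assumptions of the example, not the procedure used in the paper.

```python
import math

SQRT3 = math.sqrt(3.0)

def mean_double_exponential(alpha):
    """Equilibrium mean for n = 1; defined only for alpha < sqrt(2)."""
    if alpha >= math.sqrt(2.0):
        raise ValueError("no equilibrium distribution for alpha >= sqrt(2)")
    return 2.0 * alpha / (2.0 - alpha ** 2)

def mean_uniform(alpha):
    """Equilibrium mean for the uniform case: sqrt(3)*coth(sqrt(3)*alpha) - 1/alpha."""
    return SQRT3 / math.tanh(SQRT3 * alpha) - 1.0 / alpha

def equilibrium_mean(log_f, alpha, lo, hi, steps=100_000):
    """Mean of p(s) proportional to f(s)*exp(alpha*s), by the midpoint rule on [lo, hi].

    log_f(s) is log f(s) up to an additive constant, which cancels in the ratio.
    """
    h = (hi - lo) / steps
    grid = [lo + (i + 0.5) * h for i in range(steps)]
    log_w = [log_f(s) + alpha * s for s in grid]
    top = max(log_w)                       # subtract the maximum to avoid overflow
    w = [math.exp(v - top) for v in log_w]
    return sum(s * wi for s, wi in zip(grid, w)) / sum(w)

if __name__ == "__main__":
    print(mean_uniform(20.0))                                            # uniform, alpha = 20
    print(equilibrium_mean(lambda s: 0.0, 20.0, -SQRT3, SQRT3))          # same, numerically
    print(equilibrium_mean(lambda s: -0.5 * s * s, 20.0, 8.0, 32.0))     # normal, alpha = 20
```

The integration limits must be chosen by the caller so that they cover the bulk of $f(s)\exp(\alpha s)$; for the uniform case they are simply the support $[-\sqrt{3}, \sqrt{3}]$.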

2.2b Substitution rate

Since a total of $2Nu$ mutations arise in the population per generation, the substitution rate $k$ per generation is represented as

$$\frac{k}{u} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{\alpha(r-s)}{1-\exp[-\alpha(r-s)]}\,f(r)\,p(s)\,dr\,ds. \tag{9}$$

If the alleles are neutral the right-hand side of the equation is 1. As $\alpha$ increases, it becomes smaller. The equilibrium substitution rates were computed numerically for various distributions ($n = 1, 2, 4, 6, \infty$) and plotted as functions of $\alpha$ (figure 1). As can be seen from the figure, the substitution rate is always a decreasing function of $\alpha$. However, if the distribution of mutant effect is changed, some differences are seen. For the double-exponential and normal distributions, $k$ decreases very rapidly as pointed out by Tachida (1991) and Gillespie (1994b) and, for the normal, is almost zero when $\alpha$ is more than 4. On the other hand, for the uniform distribution ($n = \infty$), the decrease is slow, and even when $\alpha$ is 10, $k/u$ is about 0.1. For intermediate values of $n$, the decrease rates are in between. In the following, I compute $k/u$ for the two extreme cases of $n = 2$ and $n = \infty$ approximately using a double inequality,

$$z \le \frac{z}{1-\exp(-z)} \le 1 + z, \tag{10}$$

for $z \ge 0$. For the normal distribution, because the difference between two normal random variables is again normal, the substitution rate is written as

$$k = \frac{u\alpha}{2\sqrt{\pi}}\int_{-\infty}^{\infty} G(z)\,dz, \tag{11}$$

where

$$G(z) = \frac{z}{1-\exp(-\alpha z)}\exp\left[-\frac{(z+\alpha)^2}{4}\right]. \tag{12}$$

Figure 1. The equilibrium substitution rate assuming various mutation distributions. The substitution rate divided by the mutation rate is shown as a function of $\alpha$: Double exponential, double-exponential distribution; Uniform, uniform distribution; Normal, normal distribution; g(n), $g_n(s)$ (see equation (4)).

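Using the inequality (10) and the symmetry $G(z) = G(-z)$, we can show the double inequality

$$\left[\frac{4}{\sqrt{\pi}\,\alpha} + O\left(\left(\frac{1}{\alpha}\right)^{3}\right)\right]\exp\left(-\frac{\alpha^2}{4}\right) \le \frac{k}{u} \le \frac{6}{\sqrt{\pi}\,\alpha}\exp\left(-\frac{\alpha^2}{4}\right). \tag{13}$$

To derive the inequality, an asymptotic expansion of the normal distribution for large values,

$$\int_a^{\infty}\exp\left(-\frac{x^2}{2}\right)dx = \frac{\exp(-a^2/2)}{a}\left[1 - \frac{1}{a^2} + O\left(\frac{1}{a^4}\right)\right], \tag{14}$$

was used (see, for example, Kendall et al. 1987). For the uniform distribution, the substitution rate is expressed as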

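$$k = Cu\int_{-\sqrt{3}\alpha}^{\sqrt{3}\alpha}\int_{-\sqrt{3}\alpha}^{\sqrt{3}\alpha}\frac{(x-y)\exp(y)}{1-\exp[-(x-y)]}\,dx\,dy$$

$$\;\; = 2Cu\left[\exp(\sqrt{3}\alpha)\int_0^{2\sqrt{3}\alpha}\frac{z\exp(-z)}{1-\exp(-z)}\,dz - \exp(-\sqrt{3}\alpha)\int_0^{2\sqrt{3}\alpha}\frac{z}{1-\exp(-z)}\,dz\right], \tag{15}$$

where

$$C = \{2\sqrt{3}\alpha\,[\exp(\sqrt{3}\alpha) - \exp(-\sqrt{3}\alpha)]\}^{-1}. \tag{16}$$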

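Again using the second inequality of (10), we can show that

$$0 < \int_0^{2\sqrt{3}\alpha}\frac{z}{1-\exp(-z)}\,dz \le 2\sqrt{3}\alpha\,(1 + \sqrt{3}\alpha),$$

$$0 < \int_{2\sqrt{3}\alpha}^{\infty}\frac{z\exp(-z)}{1-\exp(-z)}\,dz \le 2(\sqrt{3}\alpha + 1)\exp(-2\sqrt{3}\alpha). \tag{17}$$

Putting these into equation (15), and noting that $\int_0^{\infty} z\exp(-z)/[1-\exp(-z)]\,dz = \pi^2/6$, we obtain

$$\left|\frac{k}{u} - 2C\exp(\sqrt{3}\alpha)\,\frac{\pi^2}{6}\right| \le 4C(1 + \sqrt{3}\alpha)^2\exp(-\sqrt{3}\alpha), \tag{18}$$

which implies, for large $\alpha$, the following relation:

$$\frac{k}{u} \approx \frac{\pi^2}{6\sqrt{3}\,\alpha}. \tag{19}$$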

The two relations (13) and (19) show that, while $k/u$ decreases very rapidly for the normal distribution, the decrease is not so rapid for the uniform distribution as $\alpha$ becomes large.
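The contrast between the two extreme cases can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the computation used for figure 1: the function names, the midpoint rule, and the integration windows are choices made for this example. For the normal case it uses the fact that, at equilibrium, $r - s$ is normal with mean $-\alpha$ and variance 2; for the uniform case it uses the one-dimensional reduction of the double integral.

```python
import math

def fix_kernel(x):
    """x / (1 - exp(-x)): the factor multiplying f(r)p(s) in the substitution-rate
    integral, with x = alpha * (r - s)."""
    if abs(x) < 1e-8:
        return 1.0 + 0.5 * x          # series value near x = 0
    if x < -700.0:
        return 0.0                    # value underflows; also avoids overflow in expm1
    return x / -math.expm1(-x)

def midpoint(func, lo, hi, steps=200_000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

def k_over_u_normal(alpha):
    """Normal mutant effects: z = r - s is normal with mean -alpha and variance 2."""
    def integrand(z):
        density = math.exp(-(z + alpha) ** 2 / 4.0) / math.sqrt(4.0 * math.pi)
        return fix_kernel(alpha * z) * density
    return midpoint(integrand, -alpha - 12.0, 12.0)

def k_over_u_uniform(alpha):
    """Uniform mutant effects, using a single integral over z in [0, 2*sqrt(3)*alpha]."""
    a = math.sqrt(3.0) * alpha
    tail = math.exp(-2.0 * a)
    integral = midpoint(lambda z: fix_kernel(z) * (math.exp(-z) - tail), 0.0, 2.0 * a)
    return integral / (a * (1.0 - tail))

if __name__ == "__main__":
    for alpha in (0.2, 1.0, 4.0, 10.0):
        print(alpha, k_over_u_normal(alpha), k_over_u_uniform(alpha))
```

At $\alpha = 10$ this sketch gives values in line with those quoted in the text, about 0.1 for the uniform distribution and about $3.7 \times 10^{-12}$ for the normal distribution, and both functions approach 1 as $\alpha$ approaches 0.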

2.2c Heterozygosity

Since the weak-mutation approximation employed here assumes monomorphism of the population, it seems contradictory to try to obtain heterozygosity using this approximation. However, if the mutation rate is very small, the heterozygosity may be calculated as a sum of heterozygosity contributions from all mutations. We use this approach to obtain approximate values of the average heterozygosity when $4Nu$ is much smaller than 1. An exact formula for this quantity using moments of homozygosity under neutrality was obtained by Ethier (S. N. Ethier, unpublished result). Kimura (1969) showed that total heterozygosity contributed by a mutant with a selective advantage $s$ over the presently dominant allele is

$$\frac{2}{s}\left(\frac{2s}{1-\exp(-4Ns)} - \frac{1}{2N}\right). \tag{20}$$

Since $2Nu$ new mutations arise every generation, the average heterozygosity $H$ is represented by

$$H = 8Nu\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left\{\frac{1}{1-\exp[-\alpha(r-s)]} - \frac{1}{\alpha(r-s)}\right\}f(r)\,p(s)\,dr\,ds \tag{21}$$

after standardization. Note that the average heterozygosity is a function of $Nu$, $\alpha$, and the shape of the mutation distribution $f(s)$. Since this is a very rough argument we need some justification of this procedure, and a comparison with the results of a Wright-Fisher simulation in transient states is shown in the next section, where the procedure to compute the transient heterozygosity is described (see also table 1). Generally this approximation gives fairly good estimates for $4Nu < 0.04$.

Figure 2. The average heterozygosity in equilibrium assuming various distributions of mutant effect. The average heterozygosity divided by $8Nu$ is plotted against $\alpha$. The notations for the distributions are the same as in figure 1.

The average heterozygosity was computed numerically using equation (21) for various mutant distributions [$n = 1$ (double exponential), 2 (normal), 4, 6, $\infty$ (uniform)] and the results are shown in figure 2. To standardize the results, $H/(8Nu)$ is plotted against $\alpha$. As in the case of the substitution rate, the average heterozygosity decreases as $\alpha$ increases and the decrease is slower for larger $n$. Note, however, that the decrease of the average heterozygosity is much slower than that of the substitution rate in the equilibrium state. Comparison of equations (9) and (21) indicates why this is so. Firstly, the integrands of the right-hand sides of the two equations differ only in the function, denoted here by $K(x)$ with $x = \alpha(r - s)$, that multiplies $f(r)p(s)$. These functions show very different behaviours when the absolute value of the variable is large. Specifically, for very small $x$ (large absolute values with negative signs), $K(x) \approx |x|\exp(-|x|)$ for the substitution rate and $K(x) \approx 1/|x|$ for the average heterozygosity. Thus the former decreases much more rapidly than the latter as $x \to -\infty$. Secondly, recall that most mutations are deleterious in the equilibrium state. This means that $r - s$ in the integrands of equations (9) and (21) is mostly negative with regard to the probability measure. Therefore, as $\alpha$ becomes large, the substitution rate decreases more rapidly than the average heterozygosity. This fact has an implication for the neutrality test that uses a plot of the total number of substitutions against the average heterozygosity (Ward and Skibinski 1985). Tachida (1991) showed that the total numbers of substitutions in cases of large $\alpha$ at the $(10/u)$th generation were larger than those expected from the neutral model in the plot. However, since in the equilibrium state the substitution rate decreases at a much quicker rate than the average heterozygosity as $\alpha$ becomes large, the total number of substitutions is less than the neutral expectation. More quantitative discussion is given in the next section.
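The different tail behaviour of the two kernels can be seen directly in a few lines of code. This is a hedged sketch for the normal case only: the names `k_sub`, `k_het` and `normal_average`, the series cut-offs, and the integration window are assumptions of the example. It averages each kernel over $z = r - s$, which at equilibrium is normal with mean $-\alpha$ and variance 2.

```python
import math

def k_sub(x):
    """Substitution-rate kernel: x / (1 - exp(-x)); equals 1 at x = 0."""
    if abs(x) < 1e-8:
        return 1.0 + 0.5 * x
    if x < -700.0:
        return 0.0                     # underflows; avoids overflow in expm1
    return x / -math.expm1(-x)

def k_het(x):
    """Heterozygosity kernel: 1/(1 - exp(-x)) - 1/x; equals 1/2 at x = 0."""
    if abs(x) < 1e-5:
        return 0.5 + x / 12.0          # series value near x = 0
    if x < -700.0:
        return -1.0 / x                # first term is negligible here
    return 1.0 / -math.expm1(-x) - 1.0 / x

def normal_average(kernel, alpha, steps=200_000):
    """Average of kernel(alpha*z) over z ~ Normal(mean=-alpha, variance=2), midpoint rule."""
    lo, hi = -alpha - 12.0, 12.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        density = math.exp(-(z + alpha) ** 2 / 4.0) / math.sqrt(4.0 * math.pi)
        total += kernel(alpha * z) * density
    return total * h

if __name__ == "__main__":
    for alpha in (0.2, 2.0, 10.0):
        k_ratio = normal_average(k_sub, alpha)          # k / u
        h_ratio = normal_average(k_het, alpha)          # H / (8Nu)
        print(alpha, k_ratio, h_ratio)
```

Under neutrality the two averages are 1 and 1/2 respectively; as $\alpha$ grows, the first collapses exponentially while the second falls only roughly as $1/\alpha^2$ in this normal case, which is the asymmetry described in the text.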

2.3 Transient state

Even with the weak-mutation approximation it is difficult to analyse the transient behaviour of the house-of-cards model in finite populations. So computer simulation was carried out. The program simulates the jump process of the weak-mutation approximation. This speeds up the computation tremendously and one can investigate long-term behaviours of the process. The method of the simulation is the same as that described in the appendix of Tachida (1991) and is explained briefly here. Let $Z_k$ and $S_k$ be the selection coefficients of the $k$th mutant allele counted from time zero and of the dominant allele when the $k$th mutation occurs, respectively. $Z_k$ is independently distributed with the mutant distribution function $f_0(z)$ and $S_{k+1}$ is determined by the following rule:

$$S_{k+1} = \begin{cases} Z_k & \text{with a probability } p \\ S_k & \text{with a probability } 1 - p, \end{cases} \tag{22}$$

where

$$p = \frac{2(Z_k - S_k)}{1-\exp[-4N(Z_k - S_k)]}.$$

The total number of substitutions is the number of times $S_k$ is changed and the average heterozygosity is the sum of the functions,

$$4\left(\frac{1}{1-\exp[-4N(Z_k - S_k)]} - \frac{1}{4N(Z_k - S_k)}\right), \tag{23}$$

of all mutations that occurred in one generation. These are recorded in the simulation at specified intervals. Time is measured in units of 2Nu generations. All populations started from $S_0 = 0$ and $2N = 100$. One thousand replications were run for each parameter set.

Table 1. Accuracy of the weak-mutation approximation for the average selection coefficient and heterozygosity.

                   Average s              Heterozygosity
4Nu     α          W-F (a)    App. (b)    W-F (a)    App. (b)
0.04    0.2        0.0001     0.0002      0.0350     0.0399
        1.0        0.0480     0.0050      0.0334     0.0342
        2.0        0.0163     0.0164      0.0221     0.0242
        10.0       0.1208     0.1235      0.0065     0.0051
        20.0       3.2607     0.2697      0.0034     0.0024
0.1     1.0        0.0046     0.0050      0.0755     0.0854
0.2     1.0        0.0040     0.0050      0.1488     0.1708
0.4     1.0        0.0038     0.0050      0.2520     0.3415

(a) The Wright-Fisher simulation of Tachida (1991).
(b) The weak-mutation approximation.
Values at (10/u)th generations are shown.

Figure 3. The change of the average selection coefficient. The average selection coefficient divided by the standard deviation of the distribution of mutant effect is plotted against time. Time is measured in units of 1/u generations, and α = 20. The notations for the distributions are the same as in figure 1.

I investigated the approach to the equilibrium, the dispersion index, and the ratio of the substitution rate to the average heterozygosity assuming various mutant distributions. Before describing these results, I present a comparison with the results of the Wright-Fisher simulation of Tachida (1991) to check the accuracy of the approximation. Table 1 shows the comparison for average selection coefficient and average heterozygosity. Although discrepancies become large for larger values of $4Nu$ ($4Nu = 0.2, 0.4$), the approximation works fairly well for small $4Nu$. The substitution rate obtained by this approximation was also close to that obtained by the Wright-Fisher simulation as shown in Tachida (1991). Thus we can use this approximation when $4Nu$ is small compared to 1 as expected.
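The jump-process rule of equation (22) can be sketched in a few lines. This is not the author's program: the function names, the fixed number of mutations per run (the expected $2N \times t$ for a time $t$ measured in units of $1/u$ generations), and the clamping of the fixation probability at 1 are all assumptions made for this illustration, and heterozygosity is not recorded.

```python
import math
import random

def fixation_probability(diff, two_n):
    """Kimura (1962): 2*diff / (1 - exp(-4N*diff)) for a new mutant with advantage diff."""
    x = 2.0 * two_n * diff                 # 4N * diff
    if abs(x) < 1e-9:
        return 1.0 / two_n                 # neutral limit 1/(2N)
    if x < -700.0:
        return 0.0
    # Clamping at 1 is an assumption of this sketch for very large advantages.
    return min(1.0, 2.0 * diff / -math.expm1(-x))

def draw_normal(rng):
    """Standardized normal mutant effect (n = 2)."""
    return rng.gauss(0.0, 1.0)

def draw_uniform(rng):
    """Standardized uniform mutant effect (n = infinity)."""
    return rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))

def simulate(alpha, t_end, draw, two_n=100, seed=None):
    """One replicate up to time t_end (units of 1/u generations), with alpha > 0.

    Returns (final selection coefficient in units of sigma, number of substitutions).
    """
    rng = random.Random(seed)
    sigma = alpha / (2.0 * two_n)          # alpha = 4 * N * sigma
    n_mutations = round(two_n * t_end)     # 2Nu per generation times t_end/u generations
    s, substitutions = 0.0, 0              # all populations start from S_0 = 0
    for _ in range(n_mutations):
        z = sigma * draw(rng)              # selection coefficient of the new mutant
        if rng.random() < fixation_probability(z - s, two_n):
            s, substitutions = z, substitutions + 1
    return s / sigma, substitutions

if __name__ == "__main__":
    runs = [simulate(20.0, 10.0, draw_uniform, seed=i) for i in range(200)]
    print(sum(s for s, _ in runs) / len(runs))   # average scaled selection coefficient
    print(sum(m for _, m in runs) / len(runs))   # average number of substitutions
```

Averaging the returned values over many seeds gives estimates of the average selection coefficient and of $E[M(t)]$; keeping the per-replicate counts also allows the variance of $M(t)$ to be estimated.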

2.3a Approach to equilibrium

To monitor the approach to equilibrium, the change of average selection coefficient was followed. In the presentation of the data, time is measured in units of $1/u$ generations. If $\alpha$ is smaller than 2, the equilibrium is almost attained in less than $1/u$ generations in the cases of the uniform ($n = \infty$), $g_4$ ($n = 4$) and $g_6$ ($n = 6$) distributions (data not shown). This is a bit quicker than in the case of the normal distribution, where a few times $1/u$ generations are required to almost attain the equilibrium state (see Tachida 1991). This difference of the approach rate between the normal and other distributions is more pronounced when $\alpha$ is large. Figure 3 shows the change of the average selection coefficient divided by the standard deviation for the four mutant distributions when $\alpha = 20$. For the normal, $g_4$, $g_6$ and uniform distributions, the equilibrium selection coefficients computed from the equilibrium distributions are 20.0, 3.51, 2.51 and 1.68 respectively. Although the average selection coefficient is far from equilibrium at the $(1000/u)$th generation for the normal distribution, for the other three distributions the values at the $(1000/u)$th generation are close to the respective equilibrium values, especially for the uniform distribution, where the equilibrium is attained in $10/u$ generations. Thus the very slow rate of approach to equilibrium is a characteristic of the cases with small $n$. The reason for the slow approach in the case of the normal distribution is the very small substitution rate near the equilibrium state [see inequality (13)] for large $\alpha$, since the change of the average selection coefficient occurs only when substitutions occur.

Table 2. Dispersion indices at t = 1/u, t = 1000/u and t = 10000/u for various distributions of mutant effect.

α         Normal     g_4(s)     g_6(s)     Uniform

(a) t = 1/u
0.2       1.048      1.069      0.949      1.015
1.0       0.931      0.895      0.920      0.935
2.0       0.750      0.675      0.709      0.747
10.0      0.326      0.275      0.296      0.283
20.0      0.336      0.323      0.302      0.292
100.0     0.352      0.337      0.340      0.336

(b) t = 1000/u
0.2       0.995      0.990      1.075      1.075
1.0       1.732      1.446      1.422      1.279
2.0       7.075      2.264      1.890      1.639
10.0      0.898      3.353      4.394      1.683
20.0      0.489      0.803      1.497      1.513
100.0     0.426      0.444      0.448      1.204

(c) t = 10000/u
10.0      1.303      11.659     10.234     1.742
20.0      0.543      1.563      4.748      1.609
100.0     0.430      0.486      0.474      1.708

2.3b Dispersion index

The dispersion index, $I$, measures the regularity of the occurrences of the substitutions in time and is defined as (see, for example, Takahata 1987)

$$I(t) = \frac{\mathrm{Var}[M(t)]}{E[M(t)]}, \tag{24}$$

where $M(t)$ represents the number of substitutions that occurred in a time length $t$. If the substitution process is Poissonian as expected under the neutral mutation model, $I(t)$ is 1. The dispersion index is less or more than 1 if the occurrences of substitutions are less or more clustered, respectively, compared to those in the neutral case. Estimates of $I(t)$ were obtained by estimating $E[M(t)]$ and $\mathrm{Var}[M(t)]$ and using equation (24) in the simulation. Some of the results are shown in table 2. The dispersion indices at the $(1/u)$th and $(1000/u)$th generations are presented, and for $\alpha \ge 10.0$ indices at the $(10000/u)$th generation are also shown because it takes a longer time to approach equilibrium with larger $\alpha$. If $\alpha$ is 0.2, the dispersion index is almost 1 as expected. However, if $\alpha$ is larger, very complex patterns appear. In the earlier generations ($t = 1/u$) the dispersion index becomes less than 1 and this trend is especially pronounced for larger values of $\alpha$. Such underdispersion was observed by Gillespie (1993, 1994a) for other substitution models. However, as the generations proceed, the dispersion index becomes more than 1 (overdispersion) for some $\alpha$, as shown by Iwasa (1993) for discrete distributions with finite numbers of alleles and Gillespie (1994b) for the normal distribution. Which $\alpha$ produces overdispersion depends on the shape of the distribution of mutant effect. The largest dispersion index in the table is when $\alpha$ is 2.0 for the normal distribution and 10.0 for the other distributions. The maxima are quite large for the normal, $g_4$ ($n = 4$) and $g_6$ ($n = 6$) distributions. The dispersion indices become very small when $\alpha$ becomes large for these three distributions. On the other hand, the dispersion index is mildly large for all values of $\alpha > 0.2$ examined in the case of the uniform distribution. Large dispersion indices are thought to arise when lineages are diversified into two classes, one in which substitutions rarely occur because of the high average fitness and the other in which substitutions occur fairly often. Since the proportions of these two classes and differences of substitution rates between them depend on the shape of the distribution of mutant effect, the magnitude of the dispersion index depends much on the shape of the distribution. In summary, the behaviour of the dispersion index is very complex, depending upon the time, $\alpha$, and the shape of the distribution of mutant effect.
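Estimating the dispersion index from replicate counts is a one-line ratio, sketched below. The function names are illustrative, and the use of the unbiased sample variance is an assumption of this example; the text does not state which variance estimator was used. A small Poisson generator is included only to show the neutral reference value of about 1.

```python
import random
import statistics

def dispersion_index(counts):
    """Estimate I(t) = Var[M(t)] / E[M(t)] from substitution counts of independent replicates."""
    mean = statistics.fmean(counts)
    if mean == 0:
        raise ValueError("no substitutions observed; I(t) is undefined")
    return statistics.variance(counts) / mean   # sample variance (n - 1 denominator)

def poisson_count(rate, t, rng):
    """Number of events of a Poisson process with the given rate in time t,
    generated from exponential waiting times."""
    n, clock = 0, rng.expovariate(rate)
    while clock <= t:
        n += 1
        clock += rng.expovariate(rate)
    return n

if __name__ == "__main__":
    rng = random.Random(1)
    neutral_like = [poisson_count(1.0, 10.0, rng) for _ in range(5000)]
    print(dispersion_index(neutral_like))             # close to 1 for a Poisson process
    print(dispersion_index([10, 10, 11, 10, 9, 10]))  # very regular counts give I < 1
```

Counts that are more regular than Poisson give an index below 1 (underdispersion), and clustered counts give an index above 1 (overdispersion).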

2.3c Ratio of substitution rate to heterozygosity

Under the neutral hypothesis, the substitution rate $k$ and the average heterozygosity $H$ in the equilibrium state are both functions of the mutation rate and are $u$ and $4Nu/(1 + 4Nu)$ respectively. The relation between substitution rate and heterozygosity was used to test the neutral hypothesis (Ward and Skibinski 1985; Skibinski et al. 1993; see also Sawyer and Hartl 1992 for other tests comparing variations within and between species). Tachida (1991) investigated how the relation is affected by the presence of nearly neutral mutations. It was found that if we assume the normal mutation distribution the relation is very similar to that under the neutral hypothesis for $\alpha$ less than 2 or 3, but for larger $\alpha$ the substitution rate becomes larger than expected under neutrality. Here I investigate the generality of these observations. Define the following quantity $R$:

$$R = \frac{4N\,E[M(t)]}{E[H]\,t}. \tag{25}$$

Table 3. Relation between substitution rate and heterozygosity for various distributions of mutant effect. Estimates of R at t = 10/u and t = 1000/u are tabulated (see text).

α         Normal     g_4(s)     g_6(s)     Uniform

(a) t = 10/u
0.2       1.002      0.994      0.996      1.014
1.0       0.936      0.925      0.916      0.945
2.0       0.853      0.841      0.826      0.850
10.0      2.127      1.617      1.416      1.206
20.0      4.465      3.110      2.524      1.839
100.0     33.940     19.946     14.030     6.306

(b) t = 1000/u
0.2       0.993      0.995      0.996      0.994
1.0       0.854      0.866      0.870      0.871
2.0       0.553      0.649      0.672      0.693
10.0      0.099      0.117      0.187      0.410
20.0      0.183      0.125      0.127      0.360
100.0     1.207      0.670      0.514      0.362

Under the neutral hypothesis, $E[M(t)] = ut$ and in the equilibrium state with very small $Nu$, $E[H] = 4Nu$. Thus $R$ is 1 under the neutral hypothesis. Table 3 shows the behaviour of $R$ in our simulation. Tachida (1991) investigated this relation at the $(10/u)$th generation and our result shows $R \approx 1$ for other distributions of mutant effect with $\alpha \le 2$ at this generation. Also, for larger $\alpha$, $R$ is larger than 1 in all distributions although the magnitudes differ. However, $R$ is a decreasing function of time after that (data not shown). At later generations (i.e. the $(1000/u)$th generation), $R$ is a bit larger than 0.5 for $\alpha = 2.0$, and for $\alpha \ge 10$ it is much smaller than 1 except for the normal distribution with $\alpha = 100$. In fact, in this exceptional case $R = 0.166$ at the $(10000/u)$th generation, and the exception is due to the slow approach to equilibrium when the mutation distribution is normal. Thus, for large $\alpha$, $R$ is larger than 1 in the earlier generations but becomes less than 1 in later generations. The behaviour in the earlier generations is explained by very quick substitutions of advantageous mutations in the earlier phase, which make the numerator of $R$ large. The behaviour in the later generations can be explained by the argument, described in the preceding section on the equilibrium state, that the substitution rate decreases much more quickly than the heterozygosity as $\alpha$ becomes large.

3. Discussion

The present paper has investigated the house-of-cards model in finite populations paying attention to the effects of the shape of the distributions of mutant effects with different tails. Several features of the model not noted in previous studies were found. Firstly, the very quick decrease of the substitution rate k as c~ becomes large was found to be a peculiar feature of the model with normal distribution of mutant effect. For the normal distribution, k oc ~-1 exp(_0~z/4) and this decrease is very quick compared to that for the uniform distribution (/~ cx: c~- *). This peculiarity of the normal distribution is considered to come from its tail which vanishes very quickly but still makes the average selection coefficient large as c~ becomes large. Gillespie (1994b, 1995) criticized the house-of-cards model because the substitution rate becomes almost zero above cz = 4. However, since this is a characteristic due to employing the normal distribution as a distribution of mutant effect, this alone does not exclude the house-of-cards mutation model as a candidate mechanism for molecular evolution. Of course, even with the uniform distribution, k decreases more quickly than in the optimum selection model of Foley (1987) or the shift model (Kimura 1979) with a gamma distribution of mutant effect where k is proportional to (v/~)- 1 Secondly, although the approach to equilibrium is very slow if the mutant distribu- tion is normal with large ~ (Tachida 1991), the approach becomes quicker as the right tail of the mutant distribution vanishes more quickly (n larger). In the extreme case of the uniform distribution (n = co) the equilibrium is attained in lO/u generations for = 20. As pointed out before, the quick approach in cases of distributions with large n. is due to an appreciable magnitude of the substitution rate in the equilibrium state and this influences the time-dependent behaviour of various statistics discussed below. 
Thus the shape of the mutant distribution, especially its tail, is an important determinant of the behaviour of the substitution process.

Thirdly, the dispersion index in the house-of-cards model shows a complex behaviour. For α ≥ 1.0, I(t) is less than 1 in earlier generations but becomes larger than 1 for intermediate values of α. Iwasa (1993) and Gillespie (1994b) noted large I(t) values in the house-of-cards model with some parameter ranges, and Gillespie (1993, 1994a) conjectured and showed that many symmetrical substitution models, including models of rapidly fluctuating environments and overdominance, show I(t) less than 1. The reason for small values of I(t) in earlier generations and in later generations with large α is probably similar to that given by Gillespie (see p. 1108 of Gillespie 1994a). If a substitution occurs, it becomes more difficult for subsequent mutations to get fixed because the average selection coefficient becomes larger. Thus a slowdown of the accumulation of substitutions occurs in a population that has accumulated more substitutions, resulting in a regularity of the substitution process. According to the classification of substitution processes as point processes by Takahata (1987, 1991), the house-of-cards model is one type of semi-Markov process. Takahata presented examples of such processes with large I(t). Probably the reason for large I(t) in a certain parameter range is infrequent fixations of deleterious mutations followed by a burst of advantageous fixations, as suggested by Gillespie (1994b). However, if selection is too strong (large α), I(t) is small. Thus there is a delicate balance for large I(t) to be observed, and that is why the range of α for large I(t) differs among cases with different distributions of mutant effect.

Fourthly, the ratio R of substitution rate to average heterozygosity changes as a function of time.
In earlier generations (up to 10/u generations) R is around 1 for α ≤ 2 and larger than 1 for large α. Then R decreases as time proceeds and becomes smaller than 1 for α ≥ 2.0. Tachida (1991) stated that R is around 1 for nearly neutral cases (α ≤ 3.5) and is larger than 1 for large α, but this is not always true. Thus caution is necessary in drawing conclusions from estimates of R when neutrality is rejected, since values of R depend on time.

The large effects of the shape of the distribution of mutant fitness on the behaviour of the house-of-cards model raise the question of what fitness distribution the mutants of real proteins have. We do not have any definite answer at present. However, we may be able to exclude some distributions by the following consideration. The present study shows that some peculiar behaviour is observed when selection moves the average fitness of the population to a high value. In this state the substitution rate is extremely low but an improvement of fitness can still occur. This situation occurs after a long time and when the order of the exponent, n, in equation (4) is small (i.e. the double-exponential and normal distributions). For example, if we assume the normal distribution and α = 10, the substitution rate divided by the mutation rate is 3.7 × 10⁻¹² in the equilibrium state. This must be larger than the proportion of advantageous or neutral mutations in this state. If we consider a protein with, say, 300 amino acids, the total number of states to which the protein can mutate in one step is 6000 (roughly 20 alternatives at each of 300 sites). Then the mutation rate to one allelic state cannot be less than 1/6000 of the total mutation rate, which is many orders of magnitude larger than 3.7 × 10⁻¹². Thus the normal distribution with large α produces a rather artificial situation. This shows that the normal and double-exponential distributions are unlikely candidates as distributions of mutant effects, especially in the equilibrium state with large α. However, the other extreme case of the uniform distribution also seems artificial.
The true situation might be in between, but how quickly the right tail of the distribution of mutant effect decays remains an open question.

In the present paper the weak-mutation approximation was used to compute various quantities. The approximation works well for 4Nu < 0.1, but quantitative discrepancies are found for 4Nu > 0.1, especially when we compute the heterozygosity (see table 1). The rate of nearly neutral mutations at a single locus is currently not known. If all mutations are nearly neutral, 4Nu would be 10 or less for a gene with 1000 nucleotide sites, since the nucleotide diversity is usually less than 1% per synonymous site (see Kreitman 1983; Li and Sadler 1991) and synonymous sites are considered to be neutral (Kimura 1983). However, there surely are very deleterious mutations such as lethal ones. Thus 4Nu might be much smaller than the above estimate. Furthermore, as indicated in Tachida (1991), if a gene is divided into several regions with different α, we need to treat each region as a gene, and in this case the mutation rate would be smaller. Then 4Nu might be of the order of 10⁻¹ or less. Thus our results might be directly applicable to a small region of a gene. Of course, the main reason for using the approximation is to save computation time in order to examine a variety of distributions of mutant effect. Thus, in future studies, we need to reexamine to what extent the conclusions obtained by the approximation hold as the mutation rate becomes larger, by directly simulating gene frequency changes.

In summary, the behaviour of the house-of-cards model in finite populations is complex and depends on the magnitude of selection as measured by α, the shape of the distribution of mutant effect, and time, even though it is essentially a single-locus model.
As indicated in Tachida (1991), a protein to which the house-of-cards model is meant to be applied has many sites, each of which is expected to experience a different selection pressure. One way to deal with this is to study a multisite model in which each site behaves like a house-of-cards model. Furthermore, the behaviour of linked neutral sites must be studied to apply this model to the analysis of protein evolution. These are obvious extensions of the present work, but their properties might be complex considering the complex nature of the single-site house-of-cards model studied here.
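
The counting argument used above against the normal distribution with large selection intensity can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the variable names are hypothetical, and the only figures taken from the text are the 6000 one-step mutant states of a 300-residue protein and the equilibrium value 3.7 × 10⁻¹² of the substitution rate divided by the mutation rate. It also assumes, for simplicity, that all one-step states are equally likely.

```python
# Figures quoted in the Discussion.
n_states = 6000            # one-step mutant states of a 300-residue protein
k_over_u_normal = 3.7e-12  # equilibrium k/u for the normal distribution case

# Assumption: equal mutation rates to every one-step state, so each state
# receives 1/n_states of the total mutation rate.
min_fraction_per_state = 1 / n_states

# The proportion of advantageous or neutral mutations must be below k/u,
# yet a single allelic state already accounts for a much larger fraction.
excess_factor = min_fraction_per_state / k_over_u_normal

print(f"fraction of mutations going to one state: {min_fraction_per_state:.2e}")
print(f"upper bound on advantageous or neutral fraction: {k_over_u_normal:.2e}")
print(f"one state exceeds that bound by a factor of about {excess_factor:.1e}")
```

Under these assumptions not even a single one-step state could be advantageous or neutral, which is the sense in which the situation is described as artificial.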

Acknowledgement

I thank Drs Tomoko Ohta and John Gillespie for helpful comments on the manuscript. This research was partially supported by grants-in-aid from the Ministry of Education, Science and Culture of Japan.

References

Crow J. F. and Kimura M. 1970 An introduction to population genetics theory (New York: Harper and Row)
Ethier S. N. and Kurtz T. G. 1994 Convergence to Fleming-Viot processes in the weak atomic topology. Stoch. Processes Appl. 54: 1-27
Foley P. 1987 Molecular clock rates at loci under stabilizing selection. Proc. Natl. Acad. Sci. USA 84: 7996-8000
Gillespie J. H. 1991 The causes of molecular evolution (New York: Oxford University Press)
Gillespie J. H. 1993 Substitution processes in molecular evolution. I. Uniform and clustered substitutions in a haploid model. Genetics 134: 971-981
Gillespie J. H. 1994a Substitution processes in molecular evolution. II. Exchangeable models from population genetics. Evolution 48: 1101-1113
Gillespie J. H. 1994b Substitution processes in molecular evolution. III. Deleterious alleles. Genetics 138: 943-952
Gillespie J. H. 1995 On Ohta's hypothesis: Most amino acid substitutions are deleterious. J. Mol. Evol. 40: 64-69
Iwasa Y. 1993 Overdispersed molecular evolution in constant environments. J. Theor. Biol. 164: 373-393
Kendall M., Stuart A. and Ord J. K. 1987 Kendall's advanced theory of statistics (London: Charles Griffin & Company) vol. 1, p. 185
Kimura M. 1962 On the probability of fixation of mutant genes in a population. Genetics 47: 713-719
Kimura M. 1968 Evolutionary rate at the molecular level. Nature 217: 624-626
Kimura M. 1969 The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61: 893-903
Kimura M. 1979 A model of effectively neutral mutations in which selective constraint is incorporated. Proc. Natl. Acad. Sci. USA 76: 3440-3444
Kimura M. 1983 The neutral theory of molecular evolution (London: Cambridge University Press)
Kimura M. 1987 Molecular evolutionary clock and the neutral theory. J. Mol. Evol. 26: 24-33
Kimura M. and Crow J. F. 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725-738
Kingman J. F. C. 1978 A simple model for the balance between selection and mutation. J. Appl. Probab. 15: 1-12
Kreitman M. 1983 Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304: 412-417
Lewontin R. C. 1974 The genetic basis of evolutionary change (New York: Columbia University Press)
Li W.-H. and Sadler L. A. 1991 Low nucleotide diversity in man. Genetics 129: 513-523
Ohta T. 1972 Evolutionary rate of cistrons and DNA divergence. J. Mol. Evol. 1: 150-157
Ohta T. 1977 Extension to the neutral mutation random drift hypothesis. In Molecular evolution and polymorphism (ed.) M. Kimura (Mishima: National Institute of Genetics) pp. 148-167
Ohta T. 1992a The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23: 263-286
Ohta T. 1992b Theoretical study of near neutrality. II. Effect of subdivided population structure with local extinction and recolonization. Genetics 130: 917-923
Ohta T. and Tachida H. 1990 Theoretical study of near neutrality. I. Heterozygosity and rate of mutant substitution. Genetics 126: 219-229
Sawyer S. A. and Hartl D. L. 1992 Population genetics of polymorphism and divergence. Genetics 132: 1161-1176
Skibinski D. O. F., Woodwark M. and Ward R. D. 1993 A quantitative test of the neutral theory using pooled allozyme data. Genetics 135: 233-248
Tachida H. 1991 A study on a nearly neutral mutation model in finite populations. Genetics 128: 183-192
Takahata N. 1987 On the overdispersed molecular clock. Genetics 116: 169-179
Takahata N. 1991 Statistical models of the overdispersed molecular clock. Theor. Popul. Biol. 39: 329-344
Ward R. D. and Skibinski D. O. F. 1985 Observed relationships between protein heterozygosity and protein genetic distance and comparisons with neutral expectations. Genet. Res. 45: 315-340
Zeng Z.-B., Tachida H. and Cockerham C. C. 1989 Effects of mutation on selection limits in finite populations with multiple alleles. Genetics 122: 977-984