
GENETIC DRIFT IN A CLINE

MONTGOMERY SLATKIN
Department of Biophysics and Theoretical Biology, The University of Chicago, 920 East 58th Street, Chicago, Illinois 60637

and

TAKEO MARUYAMA
National Institute of Genetics, Yata 1,111, Misima, Sizuoka-ken 411, Japan

Manuscript received November 26, 1974
Revised copy received May 12, 1975

ABSTRACT

A model is developed of genetic drift in a cline maintained by spatially varying selection and local dispersal of individuals. The model is analyzed by an approximation scheme which is valid for weak selection and small migration rates. The results, which are based on numerical iterations of the approximate equations, are that the cline is less steep than predicted by the deterministic theory, but that for weak selection the correlation between random fluctuations in neighboring colonies is approximately the same as in models of migration and drift in the absence of selection.

SPATIAL patterns of gene frequencies in natural populations are determined by the combination of genetic drift, spatially varying natural selection, and gene flow between local populations. A quantitative theory of the interaction of these mechanisms is necessary for interpreting observed spatial patterns and for understanding the possibilities for local genetic differentiation and speciation in the presence of a small amount of gene flow. Most of the previous theoretical work on gene flow has ignored either genetic drift or natural selection.

We present here the results of a simple model of genetic drift in a cline maintained by selection and gene flow and reach some conclusions which do not seem to depend on the details of the model and which have some implications for real populations. We find that genetic drift can significantly alter the structure of a cline maintained by selection and gene flow, but that spatially varying selection makes little difference in the expected correlation in gene frequencies between local populations based on the theory of gene flow and drift alone.

FELSENSTEIN (1975) has considered the same problem, but from a different point of view. To analyze the model, he uses the approximation that the expected gene frequencies at each location are the same as in the absence of random drift. This is quite accurate if the local population size is large, but with this technique it is not possible to determine the effect of drift on expected gene frequencies in the cline. We use a different method, which is accurate only if the local population size is small; so with our model, we can consider a different range of parameters than does FELSENSTEIN.

HASTINGS and ROHLF (1974) consider the effect of genetic drift when there is a linear change in the selection gradient. They consider a different case than ours, in which drift cannot change the expected gene frequencies, and determine the relationship between the expected and actual clines as a function of migration. They show that, for the case they analyze, migration must be relatively large before the actual cline observed would be a good predictor of the underlying selection.

* Supported in part by AEC grant AT(11-1)-2472.

Genetics 81: 209-222 September, 1975.

THE MODEL

For a diploid organism with discrete non-overlapping generations we consider a single locus with two alleles, A and a. We assume that there are n local populations, each of effective size N, arranged in a one-dimensional array. For definiteness in notation, we will always assume that n is odd. Selection acts independently in each population and there is no dominance; we will consider later the effect of different degrees of dominance. The selection coefficient in the ith population is $s_i$. We assume that there is migration only between adjacent populations in the array and that a fraction m/2 of the alleles is exchanged each generation. This is KIMURA'S (1953) “stepping-stone” model.

The assumption of gene flow between adjacent populations only is for computational simplicity and is not restrictive. It is known both for the cases of gene flow and selection and for gene flow and genetic drift that only the variance in the dispersal distance is significant, at least to the first approximation (SLATKIN 1973; MARUYAMA 1972). Therefore, in the present model, more general types of local dispersal would be expected to produce approximately the same results. We ignore long-distance migrants, which could have significant effects on the spatial pattern of gene frequencies, because we are interested here only in the effects of local gene flow. By ignoring long-distance migrants, we are underestimating the importance of gene flow.

For a finite collection of populations, the only equilibrium state is complete fixation of one of the alleles, unless there are mutations. However, differential selection in different locations could retard fixation greatly in much the same way as heterosis (ROBERTSON 1962). Therefore, the structure of the population before fixation occurs is of interest. One way to analyze that structure is to consider a “quasi-equilibrium” state, before fixation occurs, in which there is a relatively permanent cline with some fluctuation due to genetic drift.
FELSENSTEIN (1975) uses this method. Another possibility is to introduce some mechanism which prevents fixation and analyze the true equilibrium of the modified model. This is reasonable if the results are not sensitive to changes in the stabilizing mechanism. We use this approach here and assume that the end populations in the array exchange a fraction m/2 of their genes with reservoirs kept at fixed gene frequencies $p_0$ and $p_1$. This is an appropriate model for a chain of islands between two mainland areas but, as we will show, our results apply to other situations as well.

We denote the frequency of the A allele in the ith population in generation t by $x_i(t)$. If m and the $s_i$ are small, the basic recursion equation for the model is

$$x_i(t+1) = (1-m)\,x_i(t) + \frac{m}{2}\bigl(x_{i-1}(t) + x_{i+1}(t)\bigr) + s_i\,x_i(t)\bigl(1-x_i(t)\bigr) + \xi_i. \qquad (1)$$

With weak selection and low migration rates, this equation applies whether selection occurs before or after dispersal. The first two terms represent the effects of gene flow; the third, natural selection; and the last, genetic drift, modeled by a random variable, $\xi_i$, which has mean 0 and variance $x_i(t)(1-x_i(t))/2N$ (CROW and KIMURA 1970, Chapter 7). In (1), terms of the order of magnitude of $m^2$, $ms$, and $s^2$ are omitted. Throughout, we assume that $m, |s_i| \ll 1$.
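To make recursion (1) concrete, here is a minimal Python sketch of one generation of the stepping-stone model with a step change in selection. It is an illustration under stated assumptions, not the authors' procedure: the drift term is drawn from a Gaussian with the stated mean and variance (the text specifies only those two moments), frequencies are clipped to [0, 1] as an added safeguard, and the names `step_selection`, `next_generation`, `p0` and `p1` are hypothetical.

```python
import random

def step_selection(n, s):
    """Selection coefficients: -s left of the centre, 0 at the centre, +s to the right (n odd)."""
    mid = (n - 1) // 2  # zero-based index of the central population
    return [-s if i < mid else (s if i > mid else 0.0) for i in range(n)]

def next_generation(x, sel, m, N, p0=0.0, p1=1.0, rng=random):
    """Apply recursion (1) once to the list of allele frequencies x."""
    ext = [p0] + list(x) + [p1]  # reservoirs sit beyond the two end populations
    new = []
    for i in range(len(x)):
        xi = ext[i + 1]
        gene_flow = (1 - m) * xi + 0.5 * m * (ext[i] + ext[i + 2])
        selection = sel[i] * xi * (1 - xi)
        sd = (xi * (1 - xi) / (2 * N)) ** 0.5
        drift = rng.gauss(0.0, sd)  # assumed Gaussian; only mean and variance are given
        # Clipping is an assumption of this sketch, not part of equation (1).
        new.append(min(1.0, max(0.0, gene_flow + selection + drift)))
    return new

if __name__ == "__main__":
    rng = random.Random(1)
    sel = step_selection(25, 0.001)
    x = [0.5] * 25
    for _ in range(1000):
        x = next_generation(x, sel, m=0.01, N=100, rng=rng)
    print([round(v, 2) for v in x])
```

A single run of this kind gives one realization of the cline; averaging many runs, or a long time average, would be needed to approximate the expected frequencies discussed next.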

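Taking the expected value of (1) gives

$$\bar{x}_i(t+1) = (1-m)\,\bar{x}_i(t) + \frac{m}{2}\bigl(\bar{x}_{i-1}(t) + \bar{x}_{i+1}(t)\bigr) + s_i\,\overline{x_i(t)\bigl(1-x_i(t)\bigr)} \qquad (2)$$

where the bar indicates the expected value. We will assume that the expected value or ensemble average is the same as the time average, which could, in principle, be measured. The last term of the right-hand side of (2) can be rewritten as

$$s_i\,\bar{x}_i(1-\bar{x}_i) - s_i\,\sigma_i^2 \qquad (3)$$

where $\sigma_i^2 = \overline{(x_i - \bar{x}_i)^2}$, so we have

$$\bar{x}_i(t+1) = (1-m)\,\bar{x}_i(t) + \frac{m}{2}\bigl(\bar{x}_{i-1}(t) + \bar{x}_{i+1}(t)\bigr) + s_i\,\bar{x}_i(t)\bigl(1-\bar{x}_i(t)\bigr) - s_i\,\sigma_i^2. \qquad (4)$$

Equation (4), except for the last term, is the same as found in the absence of genetic drift. Since $\sigma_i^2$ is always positive, the effective selection at each location is weakened and the cline will be less steep than in the absence of genetic drift. FELSENSTEIN (1975) reached the same conclusion using an intuitive argument.

To take the second moment of (1) we define a new variable $y_i(t) = x_i(t) - \bar{x}_i$, the deviation from the expected value of the gene frequencies. The equation for $y_i(t)$ in matrix form is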

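where

$$M = \begin{pmatrix} 1-m & m/2 & 0 & \cdots & 0 \\ m/2 & 1-m & m/2 & \cdots & 0 \\ 0 & m/2 & 1-m & \ddots & \vdots \\ \vdots & & \ddots & \ddots & m/2 \\ 0 & \cdots & 0 & m/2 & 1-m \end{pmatrix}$$

and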

Note that the boundary conditions do not enter directly in (5) but only through $\bar{x}_i$. We multiply (5) by its transpose and take expected values to get the equation for the matrix $A = \overline{yy^T}$

where

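$$D_1 = \frac{1}{2N}\,\mathrm{diag}\bigl(\bar{x}_1(1-\bar{x}_1), \ldots, \bar{x}_n(1-\bar{x}_n)\bigr)$$

and

$$D_2 = \frac{1}{2N}\,\mathrm{diag}\bigl(\sigma_1^2, \ldots, \sigma_n^2\bigr).$$

The diagonal elements of $A$ are the variances, $\sigma_i^2$. In general, the system of equations derived in this way is not closed; (7) involves the higher moments of $y_i$. However, the terms with those higher moments are all multiplied by $s_i$ or $s_i^2$. If selection is weak ($Ns_i \ll 1$), we expect these terms to be small. They include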

$$s_i s_j\,\overline{\bigl[y_i(1-2\bar{x}_i) + (\sigma_i^2 - y_i^2)\bigr]\bigl[y_j(1-2\bar{x}_j) + (\sigma_j^2 - y_j^2)\bigr]} \qquad (8)$$

and

We can place an upper bound on (8) and (9) by using the fact that the moments of $y_i$ are a maximum when one or the other allele is fixed at each location and when adjacent locations are completely correlated. For example, $\sigma_i^2 \le \bar{x}_i(1-\bar{x}_i)$.

$$A = MAM + D_1 - D_2 \qquad (11)$$

While (4) and (11) cannot be solved analytically, they can be easily iterated on a computer to obtain solutions for the $\bar{x}_i$ and $\overline{y_i y_j}$ in any particular case.
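The joint iteration of (4) and (11) can be sketched in a few lines of Python. This is a hedged reading of the equations, not the authors' program: it assumes that $D_1 - D_2$ is the diagonal matrix with entries $(\bar{x}_i(1-\bar{x}_i) - \sigma_i^2)/2N$, which is the expected variance of the drift term in (1), that the reservoirs contribute zero deviation, and that selection follows a step pattern. The function names and the starting values (0.5 everywhere, zero covariances) are illustrative choices.

```python
def migrate(v, m, left, right):
    """Exchange a fraction m/2 with each neighbour; left/right are the boundary values."""
    ext = [left] + list(v) + [right]
    return [(1 - m) * ext[k + 1] + 0.5 * m * (ext[k] + ext[k + 2])
            for k in range(len(v))]

def iterate_moments(n, N, m, s, p0=0.0, p1=1.0, steps=2000):
    """Iterate equations (4) and (11) for a fixed number of steps."""
    c = (n - 1) // 2  # zero-based centre of the array (n odd)
    sel = [-s if i < c else (s if i > c else 0.0) for i in range(n)]
    xbar = [0.5] * n
    A = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        var = [A[i][i] for i in range(n)]
        # Expected value of x(1-x), as in expression (3) without the factor s_i.
        het = [xbar[i] * (1 - xbar[i]) - var[i] for i in range(n)]
        # Equation (4): gene flow plus weakened selection.
        flow = migrate(xbar, m, p0, p1)
        new_x = [flow[i] + sel[i] * het[i] for i in range(n)]
        # Equation (11): M A M, using the tridiagonal structure of M.
        MA_T = [migrate([A[i][j] for i in range(n)], m, 0.0, 0.0) for j in range(n)]
        MA = [[MA_T[j][i] for j in range(n)] for i in range(n)]
        new_A = [migrate(MA[i], m, 0.0, 0.0) for i in range(n)]
        for i in range(n):
            new_A[i][i] += het[i] / (2 * N)  # assumed form of D1 - D2
        xbar, A = new_x, new_A
    return xbar, A

if __name__ == "__main__":
    xbar, A = iterate_moments(n=5, N=100, m=0.01, s=0.001, steps=500)
    print([round(v, 4) for v in xbar])
```

A fixed step count keeps the sketch simple; reaching equilibrium for n=25 takes many more steps than are used in this small example.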

RESULTS

Initially we will assume that there is an abrupt change in selection intensities representing two different environmental conditions, so that $s_i = -s$ for i<(n+1)/2, $s_i = +s$ for i>(n+1)/2, and $s_i = 0$ for i=(n+1)/2 (with n always assumed to be odd). SLATKIN (1973) has shown that a population's response to a step change in selection is an indicator of its response to more general patterns of selection. We will also assume $p_0 = 0$ and $p_1 = 1$.

The iterations of (4) and (11) were carried out by assuming some initial values for $\bar{x}_i$ and $\overline{y_i y_j}$ and computing the right-hand side of (4) and (11) to obtain the next approximation to the variables. This procedure was continued until the values changed by less than $10^{-7}$ in a single step. Different initial values were tried in several cases, but there was no indication of more than one equilibrium solution.

There are four parameters of the system, n, N, s, and m, but, in fact, not all are important for the problems of interest. Firstly, if n is large, then the end populations are far enough from the center of the cline that at the ends one or the other allele is fixed. Any further increase in n would then have no effect on the results. As an example, we let N=100, m=0.01 and s=0.001 and find the solutions for different values of n. The results are shown in Figure 1. For small n, the boundaries are close to the center, so the shape of the cline is determined mainly by the gene flow between the reservoirs at the two ends, with only a small effect from the differential selection (curves a and b). For larger values of n, however, the cline is maintained by differential selection and boundaries are not important (curves c and d). For n=45, there is almost no change from curve d (curve e). Since we are interested in the effects of genetic drift in a cline maintained by selection, we will consider only those cases in which n is large enough that the boundary effects are not important.
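The stopping rule described above (stop when no value changes by more than $10^{-7}$ in a single step) can be written as a small generic helper. This is a sketch of that rule only; the name `iterate_to_equilibrium`, the flat-list state and the `max_steps` guard are assumptions of this example.

```python
def iterate_to_equilibrium(update, state, tol=1e-7, max_steps=1_000_000):
    """Apply `update` until no component changes by `tol` or more in one step.

    `state` is a flat list of numbers; `update` maps such a list to a new one.
    Returns the final state and the number of steps taken.
    """
    for step in range(1, max_steps + 1):
        new_state = update(state)
        change = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if change < tol:
            return state, step
    raise RuntimeError("no convergence within max_steps")

if __name__ == "__main__":
    # Toy contraction with fixed point 2.0, used only to exercise the helper.
    final, steps = iterate_to_equilibrium(lambda v: [0.5 * a + 1.0 for a in v], [0.0])
    print(final, steps)
```

A small change per step does not by itself guarantee closeness to equilibrium when the approach is slow, which is one reason for trying different initial values as the authors did.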
For such cases the results should be the same as for a “quasi-equilibrium” solution as discussed by FELSENSTEIN (1975). In most cases presented below, n=25, with some checked by runs with n=35.

A second simplification in the model is that the only combinations of parameters which are important are Nm and Ns. This can be demonstrated by analyzing the approximate equations, (4) and (11). In this case, equation (4) can easily be shown to depend only on s/m, and (11) can be shown to depend only on Nm and $\bar{x}_i$. Thus Ns and Nm are sufficient parameters in the range for which this approximation scheme is valid. FELSENSTEIN (1975) reached the same conclusion for a different range of parameters.

FIGURE 1. Comparison of results for different locations of the boundaries. Nm=1, Ns=0.1. (a) n=5, (b) n=15, (c) n=25, (d) n=35, (e) n=45. The abscissa is the distance from the center of the cline (i=(n+1)/2).

There are two ways of describing our results: the effect of genetic drift on a cline maintained by selection and gene flow, and the effect of selection on the correlation of gene frequencies in nearby populations as determined by genetic drift and gene flow.

As mentioned earlier, genetic drift will always tend to make a cline less steep than one determined by selection and gene flow alone. The effect of genetic drift can be described by taking the ratio of the maximum slope of the cline (at the midpoint, i=(n+1)/2) in the absence of genetic drift ($\Delta_0$) and the expected maximum slope in the presence of genetic drift ($\Delta = \bar{x}_{i+1} - \bar{x}_i$ for i=(n+1)/2). This ratio is shown in Figure 2 for different values of the parameters. The absolute value of the slope is smaller for smaller values of m, but the change due to genetic drift is larger.

The slope in the absence of drift can be estimated by using the formula in SLATKIN (1973) for the slope in the case of a more general migration pattern. The maximum slope is approximately $\sqrt{s/3}/l$, where $l$ is the root-mean-square of the migration distance, in this case $\sqrt{m}$ measured in units of the intercolony distance. (The formula in SLATKIN (1973, p. 739) is incorrect by a factor of $\sqrt{2}$.) Thus, the maximum slope, $\Delta_0$, is approximately $\sqrt{s/3m}$. From the graph we can see that there is a value of Ns for which $\Delta_0/\Delta$ is a maximum and that the maximum occurs for weaker selection when m is smaller.

Another possible effect of genetic drift is the shift of the center of the cline away from the point i=(n+1)/2.
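As a rough check on the slope formula $\sqrt{s/3m}$ quoted above, the following Python sketch compares it with the midpoint slope obtained by iterating equation (4) with the variance term dropped, that is, with no drift. The function names, the starting values of 0.5 and the fixed step count are assumptions of this illustration; `selection_from_slope` simply inverts the approximation and ignores drift.

```python
import math

def predicted_max_slope(s, m):
    """Continuous-space approximation sqrt(s / 3m) for the maximum slope."""
    return math.sqrt(s / (3.0 * m))

def selection_from_slope(slope, m):
    """Invert the approximation: s = 3 m slope^2 (no allowance for drift)."""
    return 3.0 * m * slope ** 2

def deterministic_midpoint_slope(n, m, s, steps=20000, p0=0.0, p1=1.0):
    """Iterate (4) without the variance term and return x[c+1] - x[c] at the centre c."""
    c = (n - 1) // 2
    sel = [-s if i < c else (s if i > c else 0.0) for i in range(n)]
    x = [0.5] * n
    for _ in range(steps):
        ext = [p0] + x + [p1]  # fixed reservoirs at both ends
        x = [(1 - m) * ext[i + 1] + 0.5 * m * (ext[i] + ext[i + 2])
             + sel[i] * ext[i + 1] * (1 - ext[i + 1]) for i in range(n)]
    return x[c + 1] - x[c]

if __name__ == "__main__":
    print(predicted_max_slope(0.001, 0.01))
    print(deterministic_midpoint_slope(25, 0.01, 0.001))
```

The difference between adjacent colonies is expected to fall somewhat below the continuous-space value because the cline curves over one intercolony distance.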
While we cannot determine the distribution of locations of the center of the cline, we can get a rough estimate of the range of possible locations by computing the largest value of i (which we will call $i^*$) for which $\bar{x}_i + \sigma_i < 0.5$. If we imagine a strip of width $\sigma_i$ about the expected location of the cline, $\bar{x}_i$, then $i^*$ determined in this way is the point at which that strip crosses the line $\bar{x}_i = 0.5$. The range of values of the center is approximately $i^*$ to $n - i^*$. Figure 3 shows $(n+1)/2 - i^*$ (which is an estimate of the average displacement of the center of the cline) plotted for the different values of the parameters. The displacement is relatively insensitive to changes in m because a decrease in m leads to a larger slope, but that is compensated for by an increase in the variance. As expected, stronger selection leads to less variability in the center of the cline.

FIGURE 2. Ratio of the maximum slopes in the absence and presence of genetic drift. In all cases n=25. (Abscissa: Ns.)

We can also consider our results from the point of view of how much change there is in the variance and correlation with distance in gene frequencies. Figure 4 shows the maximum variance ($\sigma_m^2$) (at i=(n+1)/2) in different cases. Also indicated on the figure is the maximum variance for the same model in the absence of selection, which will be discussed elsewhere (SLATKIN, manuscript in preparation). In that case the maximum variance is approximately the same whether the reservoirs at the two ends are at 0 and 1 or both at 0.5. The figure shows that the variance is determined primarily by the migration and is approximately the same as in the non-selected case. This is not surprising, since we have assumed that there is no interaction of selection and drift other than through the expected gene frequencies. That was the basis for our approximation technique. FELSENSTEIN (1975) found that, for larger values of s, there was some dependence of the maximum variance on s.

FIGURE 4. Maximum variance as a function of Ns. From top to bottom Nm=0.5, 1.0, 5.0 with n=25.

Another measure of the effect of selection is the change in the correlation with distance. We can measure this by computing $\overline{y_i y_j}$ as a function of j for different values of i. We would expect $\overline{y_i y_j}$ to be a decreasing function of the distance between i and j. KIMURA and WEISS (1964) show that the decrease of correlation with distance in the absence of selection is approximately exponential. We found the same pattern in the presence of selection as well. This is also in agreement with FELSENSTEIN'S results. If we define $r = \overline{y_i y_{i+1}}/\sqrt{\sigma_i^2\sigma_{i+1}^2}$ for i=(n+1)/2, then we can compare the approximate value assuming an exponential decrease of $\overline{y_i y_j}$, $r^{|j-i|}\sigma_i^2$, with the actual value from our computations. The results for several cases are shown in Table 1. In all cases, r was relatively large (>0.8), so selection does not reduce the correlation of the random fluctuations in gene frequencies even though a steep cline is produced. Thus we can characterize the correlation between adjacent populations, at least in the center of the cline, by a single parameter r. In Figure 5 we show $\log r$ for different cases.

In deriving the above results, we have made several simplifying assumptions and we can now consider the consequences of relaxing some of them.
First, if there is complete dominance of one of the alleles, say A, then the selection term in equation (1) would change to

$$s_i\,x_i(1-x_i)^2. \qquad (11)$$

When we take the expected value of this term to obtain the equation for $\bar{x}_i$, we get

$$s_i\,\bar{x}_i(1-\bar{x}_i)^2 + s_i(-2+3\bar{x}_i)\,\sigma_i^2 + s_i\,\mu_{3i}, \qquad (12)$$

where $\mu_{3i} = \overline{(x_i-\bar{x}_i)^3}$.
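The expansion in (12) follows from writing $x(1-x)^2 = x - 2x^2 + x^3$ and expressing the second and third moments through the variance and the third central moment. A short numerical check in Python, using an arbitrary three-point distribution chosen only for illustration (the function names are likewise hypothetical):

```python
def central_moments(values, probs):
    """Mean, variance and third central moment of a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    mu3 = sum(p * (v - mean) ** 3 for v, p in zip(values, probs))
    return mean, var, mu3

def expected_dominance_term(values, probs):
    """Direct expectation of x (1 - x)^2."""
    return sum(p * v * (1 - v) ** 2 for v, p in zip(values, probs))

def expansion(mean, var, mu3):
    """Expression (12) without the common factor s_i."""
    return mean * (1 - mean) ** 2 + (-2 + 3 * mean) * var + mu3

if __name__ == "__main__":
    values, probs = [0.1, 0.4, 0.9], [0.2, 0.5, 0.3]
    direct = expected_dominance_term(values, probs)
    via_moments = expansion(*central_moments(values, probs))
    print(direct, via_moments)  # the two numbers agree up to rounding error
```

The identity is exact for any distribution of gene frequencies, so the approximation enters only when the third-moment term is neglected or bounded.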

TABLE 1

Computed values of $\overline{y_i y_j}$ for i=(n+1)/2 using the approximate numerical technique (ACT) compared with the values estimated using $r^{|j-i|}\sigma_i^2$ (EST), where $r = \overline{y_{i+1} y_i}/\sigma_i\sigma_{i+1}$. Case 1: Ns=.01, Nm=.5 (r=.9173); Case 2: Ns=.1, Nm=1 (r=.8978); Case 3: Ns=.5, Nm=.5 (r=.8825) (n=25). The estimated and actual values are the same for j=1 by assumption.

       Case 1            Case 2            Case 3
j      EST     ACT       EST     ACT       EST     ACT
1      .1968   .1968     .1595   .1595     .0673   .0673
2      .1805   .1792     .1432   .1421     .0594   .0591
3      .1655   .1618     .1287   .1257     .0524   .0516
4      .1519   .1447     .1154   .1101     .0463   .0447
5      .1393   .1277     .1036   .0955     .0408   .0383
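The EST columns of Table 1 can be regenerated from the first row and the quoted value of r, assuming, as the caption states, that each further step multiplies the covariance by r. The helper name below is hypothetical and the inputs are the printed, rounded table values, so agreement is only to about the last printed digit.

```python
def exponential_estimates(cov_adjacent, r, max_distance):
    """Covariances at distances 1..max_distance, decaying by a factor r per step."""
    return [cov_adjacent * r ** (k - 1) for k in range(1, max_distance + 1)]

if __name__ == "__main__":
    # Case 1 of Table 1: first-row value .1968 and r = .9173.
    for k, value in enumerate(exponential_estimates(0.1968, 0.9173, 5), start=1):
        print(k, round(value, 4))
```

Comparing these values with the ACT columns shows how the covariances from the iteration fall off slightly faster than a pure exponential at larger distances.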

FIGURE 5. The correlation between fluctuations in adjacent populations. (Abscissa: Ns.)

If the variations in gene frequencies are symmetric about the expected values, the third central moment, $\mu_{3i}$, would be 0 and, in any case,

$$\mu_{3i} \le \bar{x}_i(1-\bar{x}_i)\,|1-2\bar{x}_i|. \qquad (13)$$

The coefficient of $\sigma_i^2$ in (12) varies from $-2s_i$ (when $\bar{x}_i=0$) to $+s_i$ (when $\bar{x}_i=1$); but in the middle of the cline, $1/3 < \bar{x}_i < 2/3$, where genetic drift has the most effect on the expected gene frequencies, the coefficient is negative although smaller than in the case of no dominance. This is clearly more complex, but the overall effect of genetic drift on the expected allele frequencies may not be very different.

We have restricted the migration to being only between adjacent colonies. If there is local migration but to more than adjacent colonies, then the results of SLATKIN (1973) and MARUYAMA (1972) indicate that only the mean square migration distance is significant as long as the higher moments of the migration function are not too large. Further analysis would be required to show that this is the case in this model as well. The primary effect of the restriction on migration is the upper bound imposed on migration distance. However, we have shown that genetic drift is most important in a cline when m is small and the upper bound is not important. When there is a wide dispersal of migrants, the model should be replaced by a continuous model, in which case the approximation techniques used here could not be applied.

We can also consider different gradients in the selection intensity. In the cases treated so far, $s_i$ has changed from $-s$ to $+s$ in two steps. We analyze a similar case in which the change occurs in six steps. Thus for n=25, $s_i = -s$ for i<10, $+s$ for i>16, and $-s_i = (1-(i-10)/3)s$ in the intermediate region. For Nm=1.0 we compare the results for this selection model with those obtained previously. As expected, there was no significant difference either in the correlation between adjacent populations (r) or in the maximum variance ($\sigma_m^2$).
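The six-step selection gradient described above can be written out explicitly. The sketch below uses 1-based population indices as in the text and takes the intermediate rule $-s_i = (1-(i-10)/3)s$ to mean a linear change from $-s$ at i=10 to $+s$ at i=16; the function name and its default arguments are illustrative.

```python
def gradient_selection(s, n=25, start=10, end=16):
    """Selection coefficients s_1..s_n changing linearly from -s to +s between start and end."""
    centre = (start + end) / 2
    half_width = (end - start) / 2
    coeffs = []
    for i in range(1, n + 1):  # 1-based population index, as in the text
        if i < start:
            coeffs.append(-s)
        elif i > end:
            coeffs.append(s)
        else:
            coeffs.append((i - centre) / half_width * s)
    return coeffs

if __name__ == "__main__":
    print([round(c, 3) for c in gradient_selection(1.0)])
```

With the defaults the coefficient is zero at i=13, the center of the array, so the pattern remains antisymmetric about the midpoint as in the step case.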
Also as expected, the effect of more gradual change in selection intensity is determined by the “characteristic length” of the system, defined by SLATKIN (1973) to be the standard deviation of the dispersal distance divided by $\sqrt{s}$. The three cases we ran corresponded to characteristic lengths of 10.0, 3.2, and 1.4 (computed from $\sqrt{m/s}$). In the first case, the characteristic length is larger than the region of change in selection intensities and from SLATKIN'S (1973) analysis we would expect that there would be little difference from the previous case. In the other cases, when the characteristic length is shorter than the region of change, we would expect larger differences. The results are shown in Table 2.

We did not consider directly the question either of the uniqueness or stability of the solutions found. However, with the boundary conditions we have imposed, there seem to be no other possible solutions to the basic equations. We also did not analyze in any detail the rates of approach to the equilibrium solution but, since we used an iterative solution to the basic equations, we did obtain some information about the time-dependent properties of the system.

In our computations, there were two time scales for the approach to equilibrium. Firstly, when every location was polymorphic initially, selection quickly set up a cline determined by s, m and the initial variances in gene frequencies ($\sigma_i^2$). This happened roughly with the time scale of change due to selection in a single population (on the order of 1/s), and for all the cases we ran this stage took a few hundred generations. Secondly, the variances ($\sigma_i^2$) and covariances ($\overline{y_i y_j}$) approached their equilibrium values in the time scales associated with genetic drift in subdivided populations. In the cases we ran, this process always took much longer, on the order of ten to forty thousand generations. During this stage there were only small changes in the expected allele frequencies.
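The characteristic lengths quoted above follow directly from $\sqrt{m/s} = \sqrt{Nm/Ns}$. A brief Python sketch for the three cases with Nm=1.0, compared with the six-step zone of changing selection; the function name and the `zone_width` variable are assumptions of this example.

```python
import math

def characteristic_length(Nm, Ns):
    """sqrt(m / s), written in terms of the scaled parameters Nm and Ns."""
    return math.sqrt(Nm / Ns)

if __name__ == "__main__":
    zone_width = 6  # steps over which selection changes in the second model
    for Ns in (0.01, 0.1, 0.5):
        length = characteristic_length(1.0, Ns)
        relation = "longer" if length > zone_width else "shorter"
        print(f"Ns={Ns}: length {length:.1f} is {relation} than the zone")
```

Only the first case has a characteristic length exceeding the zone width, which is where the text expects the gradual and abrupt selection patterns to give similar results.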
We did not carry out a more thorough investigation of this problem because there is no reason to assume that the terms that we ignored in equation (7) are unimportant during the approach to equilibrium even if they are for the equilibrium itself. Clearly, a more complete study of this problem is necessary.

Finally, we compared some of our predictions against the results of some simulations of the system carried out by FELSENSTEIN (1975). FELSENSTEIN'S haploid model is equivalent to ours, with the exception that the diploid population size used here must be doubled to be comparable to the haploid population size. Although the parameter values were not exactly the same, the trend of the differences between the two cases is clear enough. One case is shown in Table 3. We used the average slope of the middle six locations (8-13) in the simulation results because there was too much variability in the maximum slope for a comparison to be made. The maximum variance shown in the simulations is the average of the two central populations (10 and 11).

The simulation results confirm our prediction that the maximum slope is reduced, but the comparison of the two sets of results shows that we are consistently overestimating the effects of drift. The slope is not reduced as much as we predict and the maximum variance is smaller as well. Thus our method can be used to provide an upper limit to the effect of drift but is probably not quantitatively correct except for very small s ($Ns \ll 1$).

TABLE 2

Comparison of two selection cases: (1) width of zone 2 steps, (2) width of zone 6 steps. Nm=1.0

                     Ns=.01            Ns=.1             Ns=.5
                     (1)     (2)       (1)     (2)       (1)     (2)
$\Delta$             .0449   .0438     .0885   .0777     .1842   .1413
$\Delta_0$           .0605   .0556     .1582   .1286     .2913   .2196
$\Delta_0/\Delta$    1.34    1.27      1.79    1.66      1.58    1.55
$i^*$                5       5         7       9         10      10
$\sigma_m^2$         .188    .188      .178    .179      .154    .161
r                    .903    .903      .884    .886      .845    .851

TABLE 3

Comparison of simulation results of FELSENSTEIN (1975) and approximate results obtained in this paper

Simulation
Nm      $\bar{\Delta}$   $\sigma_m^2$
.4      .1437            .1596
.8      .1287            .1293
1.6     .1071            .1084

Approximate
Nm      $\Delta$   $\Delta_0$   $\sigma_m^2$
.5      .0947      .2100        .205
1.0     .0885      .1582        .178
5.0     .0660      .0784        .1084

$\bar{\Delta}$ = average slope at middle of the cline; $\sigma_m^2$ = average maximum variance. In all cases Ns = .1, where N is the diploid population size.

DISCUSSION

The results from the model are quite simple, although not without interest. The main effects of genetic drift on a cline maintained by selection and gene flow are a reduction in the expected slope of the cline and some variation in its location relative to the environmental changes. Because of the high correlation in random fluctuations in gene frequencies in adjacent populations, the cline would not be expected to be greatly distorted in shape, even when the effective local population size is quite small.

For more general spatial patterns of selection, we have to argue by analogy with the deterministic case analyzed by SLATKIN (1973). Genetic drift decreases the expected maximum slope of the cline and therefore increases the minimum size of environmental patches to which a population can respond. A population's ability to respond to local conditions is, then, reduced by the presence of genetic drift, although the exact extent of the reduction would have to be determined from a more accurate theory.

In terms of measurements of gene frequencies in natural populations, if a cline is observed the selection necessary to maintain the cline can be estimated from the maximum slope. The variance in the maximum slope is relatively small, since the correlation between frequencies in adjacent populations is large in most cases (see Equation 20b in FELSENSTEIN 1975). When large differences in gene frequencies are found between nearby populations (e.g., SELANDER 1970), it cannot be assumed that genetic drift could augment local selection pressures to produce those differences. On the contrary, we have shown that genetic drift tends to reduce the effectiveness of selection in producing local differences.

Local variations in natural selection have been proposed as a mechanism of speciation in the presence of gene flow. Several models of “sympatric speciation” have been proposed (e.g., PIMENTEL, SMITH and SOANS 1967; and MAYNARD SMITH 1966). When the effects of genetic drift are considered the possibilities of sympatric speciation are reduced. While local differences in selection can adapt populations to local conditions, genetic drift dilutes that effect. If the association of particular with particular is necessary for the development of which would lead to and speciation, then genetic drift would make that process more difficult. DICKENSON and ANTONOVICS (1973) reached similar conclusions from LEVENE'S (1953) “multiple niche” model with genetic drift.

This work was done while the first author (M.S.) was the guest of the National Institute of Genetics in Misima, Japan. He gratefully acknowledges the hospitality of the Institute and of DR. K. MORIWAKI, Director, and DR. M. KIMURA, Head of the Population Genetics Department. DRS. J. F. CROW and T. NAGYLAKI made many helpful comments on the manuscript. DR. J. FELSENSTEIN was very generous in providing results from his simulation study and in delaying publication of his paper until the final version of this paper was completed.

LITERATURE CITED

CROW, J. F. and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Harper and Row, New York.
DICKENSON, H. and J. ANTONOVICS, 1973 The effects of environmental heterogeneity on the genetics of finite populations. Genetics 73: 713-735.
FELSENSTEIN, J., 1975 Genetic drift in clines which are maintained by migration and natural selection. Genetics (this issue).
HASTINGS, A. and F. J. ROHLF, 1974 Gene flow: effect in stochastic models of differentiation. Am. Naturalist 108: 701-705.
KIMURA, M., 1953 “Stepping-stone” model of population process leading to quasi-fixation of alleles due to random fluctuations in selection value. Ann. Rep. Nat. Inst. of Genetics, Misima, Japan, 62-66.
KIMURA, M. and G. H. WEISS, 1964 The stepping stone model of population structure and the decrease in genetic correlation with distance. Genetics 49: 561-576.
LEVENE, H., 1953 Genetic equilibrium when more than one ecological niche is available. Am. Naturalist 87: 131-133.
MARUYAMA, T., 1971 The rate of decrease of heterozygosity in a population occupying a circular or linear habitat. Genetics 67: 437-474.
MARUYAMA, T., 1972 The rate of decay of genetic variability in a geographically structured population. Mathematical Biosciences 14: 325-335.
MAYNARD SMITH, J., 1966 Sympatric speciation. Am. Naturalist 100: 637-650.
PIMENTEL, D., G. J. C. SMITH and J. SOANS, 1967 A population model of sympatric speciation. Am. Naturalist 101: 493-504.
ROBERTSON, A., 1962 Selection for heterozygotes in small populations. Genetics 47: 1291-1300.
SELANDER, R. K., 1970 Behavior and genetic variations in natural populations. Am. Zool. 10: 53-66.
SLATKIN, M., 1973 Gene flow and selection in a cline. Genetics 75: 733-756.

Corresponding editor: J. FELSENSTEIN