<<

LINKAGE DISEQUILIBRIUM IN SUBDIVIDED POPULATIONS’

MASATOSHI NE1 AND WEN-HSIUNG LI Center for Demographic and , University of Texas, Houston, Texas 77025, and Department of Medical Genetics, University of Wisconsin, Madison, Wisconsin 53706 Manuscript received October 12, 1972

ABSTRACT

The linkage disequilibrium in a subdivided populaton is shown to be equal to the sum of the average linkage disequilibrium for all subpopulations and the covariance between gene frequencies of the loci concerned. Thus, in a subdivided population the linkage disequilibrium may not be 0 even if the link- age disequilibrium in each subpopulation is 0. If a population is divided into two subpopulations between which migration occurs, the asymptotic rate of approach to linkage equilibrium is equal to either r or 2(m, + m,) - (m, + m,)2, whichever is smaller, where r is the recombination value and m, and m2 are the proportions of immigrants in subpopulations 1 and 2, respectively. Thus, if migration rate is high compared with recombination value, the change of linkage disequilibrium in subdivided populations is similar to that of a single random mating population. On the other hand, if migration rate is low, the approach to lnkage equilibrium may be retarded in subdivided populations. If isolated populations begin to exchange genes by migration, linkage disequi- librium may increase temporarily even for neutral loci. If overdominant selec- tion operates and the equilibrium gene frequencies are different in the two subpopulations, a permanent linkage disequilibrium may be produced without in each subpopulation.

INa single random mating population, linkage disequilibrium may be produced by two factors, namely, epistatic interaction in between the loci con- cerned and genetic random drift due to finite population size. The first factor has been studied by WRIGHT(1952), KIMURA (1956), LEWONTINand KOJIMA (1960) and others, while the second factor by HILLand ROBERTSON( 1968), SVED (1968) , and OHTA and KIMURA(1968a,b). However, most natural populations are not a single random mating unit but subdivided into local populations. In the Iollowing we shall show that subdivision of a population may also create linkage disequilibrium. In the case of two subpopulations, this problem was recently studied by SINNOCKand SING (1972). Although the statistical description of linkage disequilibrium in subdivided populations is simple and holds true for any pattern of migration and selection, it does not give any information on the dynamics of linkage disequilibrium in populations. In order to have a rough idea about the effect of migration on linkage disequilibrium we shall study a simple mathematical model. It will be shown that under certain circumstances migration creates linkage disequilibrium with-

Parr of this work was done while the authors were affiliated with Brown University, Providence, A. I. Supported by Public Health Service grants GM-17719 and GM-20293 and National Sclence Foundation grant GB-21224 and GP-20868.

Genetics 75: 213-219 September, 19i3 214 M. NE1 AND W.-H. LI out epistatic selection. Our method will be deterministic rather than stochastic: a general stochastic treatment seems to be quite complicated.

Linkage Disequilibrium in a Subdivided Population Consider a population of size N which is divided into k subpopulations. Let N, k be the size of the i-th subpopulation with N, = N. We consider two loci A a=1z and B at which pairs of A.a and B:b are segregating respectively. Let the frequencies of the four types of gametes AB, Ab, aB and ab in the i-th subpopu- lation be P,, Q1, R, and S, (P,f Q,+ R,f S, = 1 ) respectively. Then the linkage disequilibrium in the i-th population is given by

D, = PJ, - Q,R, (1) The average value of linkage disequilibrium for all subpopulations is

k b = E(D) = I: WLD,, (2) r=l where wa= N,/N is the proportion of the i-th subpopulation. In terms of gamete frequencies, D may be expressed as D E(PS - QR) =FS - QR i- COV(P,S)- COV(Q,R) , (3) where Cov denotes the covariance. The frequencies of alleles A and B are given by pA =P -I- Q and pB = P + R, respectively. Thus, the covariance between pA and pBis cov (pA,pB) = cov (PfQ, P+R) COV(P,1-Q-S) f COV(Q,P+R) = -COV (P,S) + COV(Q,R) . (4) Therefore, if we denote the linkage disequilibrium in the total population by D,i.e., DT = pg- QR, then we obtain the following simple formula: DT = fi f cov (pA,pB). (5) Namely, the linkage disequilibrium in the total population is the sum of the average linkage disequilibrium for subpopulations and the covariance of gene frequencies of the two segregating loci. If the number of subpopulations is 2, then formula (5) reduces to

DT= WDi f WzD?f WiWz PA^ - PA^) (PB~- PB,) 7 in which pA, and pB are the frequencies of A and allele B in the i-th sub- population, respectively. This is identical with SINNOCKand SING’S (1972) formula, though they inadvertently put a negative sign in front of the last term. Note also that formula (5) can be easily extended to the case of multiple alleles if linkage disequilibrium is defined as D,,= PALR,- pa pB,, in which PA’ and PB/ LINKAGE DISEQUILIBRIUM 215 are the frequencies of the i-th allele at the A locus and the j-th allele at the B locus, respectively, and PaiBiis the frequency of the gamete type AiBj. Formula (5) indicates that if Cov(pA,ps) # 0, Dr may be 0 even if D is not 0, or conversely DT may not be 0 even if D is 0.

Migration Between Two Populations In a single random mating population with no selection, the linkage disequi- librium is reduced in every generation at a rate of r, where r is the recombination value between the two loci under consideration. Let us now study how this property is affected if the population is not a single random mating unit. For simplicity, we shall consider the simplest case where migration occurs between two populations. Let N, and N, be the sizes of populations 1 and 2, respectively. We assume that N, and Nzare fairly large and remain constant for all genera- tions. If the two populations exchange M individuals per generation, the pro- portions of migration in populations 1 and 2 are given by m, = M/N, and m, = M/N,, respectively. Although mathematically m, and m, can take any value between 0 and 1, we confine our discussion to the case of (m,f m2)/25 0.5, since in nature the mean of m, and m, are unlikely to be larger than 0.5. For example, m, = m, = 0.5 corresponds to the case of random mating in the whole population. Let P1,Q1,R1,and S, be the frequencies of the four gamete types AB, Ab, aB, and ab in a generation in population 1, respectively. We assume that migration occurs after mating. Then, after migration they become (1 - ml)P1 f mlP2, (1 - m,) Q1 + mlQ2, (1 - m,) R, + mlR2 and (1 - ml)S1 + mlSz, respectively, where P,, Q,, R,, and S, are the gamete frequencies in population 2. The gamete frequencies in the next generation, after meiosis, are given by Pi = (1 - m,) (P, - rDll) + ml(Pz - rDZz) Q:= (1 - m,) (Q1+ rDll) + ml(Qz+ rD2d (6) R; = (1 - m,) (R, + rDI1)+ m, (R, + rDzz) S’,= (1 - m,) (S, - rOl1) f m,(S, - rDzz) where Dll = PIS, - Q,R,, D,,= P,S2 - Q,R,, and r is the recombination value between the A and B loci. Similar recurrence equations may be obtained also for the gamete frequencies in population 2. If random mating occurs in each pop- lation, the linkage disequilibria for populations 1 and 2 in the next generation are given by D;, = P

D;,= (1 - m,) (1 - m, - r)Dl, + m,(l - m1)D1,3- m,(ml - r)Dzz D:,= [ (1 - m,) (2mz- r) - m,r]Dll + (1 - m, - mz + 2m1m2)D,, + [(l - m,) (2m1- r> - mlrla, D’,,=m, (m,- I) D,, + m, (1 - m,) D,, + (1 - m,) (1 - m, - r)D,, 216 M. NE1 AND W.-H. LI where

In matrix notation the above equation may be written as D(t -I- 1) = MD(t) (7) where D (t) is the column vector of D,,(t) , D,,(t) , and D,, (t), in which D,j (t) denotes the value of Dij in the t-th generation, and (1 -ml)(l--ml-r) m,(l - m,) m, (m,- r) M = (1 - m,) (2m, - r) - m,r 1 - m, - m, + 2qm, (1 - m,) (2m, - r) - mlr iIm, (m, - r) m,(l- m2) (1 - m,) (1 - m, - r) 1 The eigenvalues for M are A, = 1 - I, h, = (1 - ml - m,)’, and h3 = (1 - I) (1 - m, - m,), and the corresponding eigenvectors are

VI= I]v,= E] v3= [mzrj in which a = m,[ (m,+ 2mz - m1mz - m: - 1)I + ml(2 - 3mi - 3m2) + mi (mi+ m,) ‘1 b = [ (I - mlr - 2m, + 2m1m,+ m:)a + m, (I - ml>c]/[ml (1 - ml)1 c =m2[(em, + m, - m,ma- m:- 1)r + m,(2 - 3ml - 3m2) + m,(m1+ m,)’]. Using the standard method of matrix algebra, it can be shown that the general formulas for D,,(t) , D,,(1) and D,,(t) are D,,(t) = llh; + a1,h; + mlI,h~ Dlz(t) = 21,Xt, + bI,ht, + (ml - mz)13Xt (8) D,,(t) = llhi i-c1,h: -I- ml13hi where I, = C [(c - b)Dll(0) + aD,,(O) - aD,,(O) - ml “z { (2c - b)D,,(O) + (a- C)D,,(O) - (2a - b)D,,(O))l I, = c [Dll(O> - DlZ(0) +D22(0)1 I, = c [ (2c - b)D,,(O) + (a - C)D12(0) - (ea - b)D22(0)I/(m1+ m,> in which C = (a- b + c)-l. Note that if the initial gamete frequencies are the same for the two populations, I, becomes 0. Formula (8) shows that the asymptotic rate per generation of approach to the equilibrium values of D’s, i.e., D,,= 0 and D,,= 0, is given either by 1 - hl = r or 1 - h, = 2(m, + m,) - (m,+ m,),, whichever is smaller. In a single random mating population the rate of approach to linkage equilibrium is known to be I per generation, that is, D(t) = (1 - I)~D(O).Thus, if 2(m1 + m,)- (ml f m,), > r, the asymptotic rate in subdivided populations is the same as that in a single random mating population. In particular, if m, -I- m, = 1, then hz = LINKAGE DISEQUILIBRIUM 21 7 h3 = 0, and Dll and D,,decrease at the rate of r per generation from the begin- ning. On the other hand, if 2(m, + m,) - (m,+ m,) < r and Dll (0) + D,,(0) - Dl,(0) # 0, then the asymptotic rate is 2(ml + m,) - (m,+ m,),. Thus, the approach to equilibrium is retarded in subdivided populations. This is because if the initial gamete frequencies in the two populations are not the same, migra- tion brings non-equilibrium gametes into one population from the other. In a single random mating population linkage disequilibrium for neutral loci always decreases as generation proceeds, but in subdivided populations it may increase temporarily. Let us examine this problem in some detail. For simplicity, we assume that DIl(O) = D,,(0) = 0, Dl2(0)# 0, and m, = m, = m. In this case (8) reduces to

where h1 = 1 -rand h, = (1 -2m),. If h1 =A,, i.e. r = 4m(l - m),the above formulae degenerate. By using 1'Hospital's rule, however, we obtain

D,,(t) = m(1 - m)D,z(O)t(l - r)t-l (10) Dl,(t) =D12(0){(1-r) +2m(I -m)t}(l -r)t-l Formulae (9) and (10) indicate that the absolute values of Dll (t) and D2,(t) are 0 for t = 0 and first increase as t increases except for m = 0.5 and m = 0, and after reaching a maximum, they decline and eventually become 0. Thus, if two isolated populations start to have gene exchange by migration, the absolute value of linkage disequilibrium in a population may temporarily increase as generation proceeds even for neutral loci. The generation (t,) at which the maximum or minimum linkage disequilibrium is attained is given by

lnh, h, t, = In -/ln - , lnh, h, t,= - l/ln(l - r), hl = A, For example, if r is 0.02, t, is 34.6 for m = 0.01, 12.3 for m = 0.05, and 7.3 for m = 0.1. Therefore, linkage disequilibrium may increase for a considerable number of generations. In the present model no selection occurs, so that Dll and D2, eventually become 0. However, if overdominant selection operates without epistasis but with different equilibrium gene frequencies in the two populations, migration may produce non-zero equilibrium values of D,, and D,,,as will be discussed later. 218 M. NE1 AND W.-H. LI

DISCUSSION In a computer simulation study FRANKLINand LEWONTIN(1970) have sug- gested that if closely linked overdominant loci interact multiplicatively with respect to fitness, strong linkage disequilibria may be developed. A similar sug- gestion has been made by WILLS,CRENSHAW and VITALE (1970) in a study of truncation selection, though the prevalence of truncation selection in nature is questionable (CROW1970; NEI 1971). However, a survey of linkage disequilibria among enzyme loci in Drosophila, conducted by MUKAI,METTLER and CHIGUSA (1971), has revealed no significant disequilibria. These authors have taken this as a suggestion for neutrality of enzymic genes. Caution, however, should be made in this type of study, since natural popula- tions are often subdivided into local mating groups. (MUKAI,METTLER and CHIGUSA[ 1971] apparently sampled their Drosophila genomes from a large randomly mating population.) As we have seen, if the covariance between gene frequencies is not 0, the total linkage disequilibrium in a subdivided population may be 0 even if the disequilibria in subpopulations are not 0, or conversely, it may not be 0 even if the disequilibria in all subpopulations are 0. In natural populations there is often a cline of gene frequency due to or due to the spreading of a new mutant gene from one place to the other. If two loci show such a cline of gene frequency, DTin these loci would deviate from 0 even if there is no epistasis so that Di = 0. As an example, let us consider the frequencies of the P and Duffy blood group genes (PI and F," alleles, respectively) in 37 villages of the Yanomama Indians, which were studied by GERSHOWITZet al. (1972). Although there are no clear clines of gene frequencies in these villages, the P, and F; gene frequencies are correlated with each other. Thus, we obtain Cov(P,, F;) = -0.0062. In the present case information on DTand I, is not available. However, if we assume D = 0, then D, = -0.0062. The maximum possible linkage disequilibrium with the gene frequencies in this population is 0.165 (cf. LEWONTIN1964). The ratio of -0.0062 to 0.165 is therefore -0.0375. The absolute value of this ratio is com- parable to those obtained and found to be statistically significant by SINNOCK and SING (1972) for the same loci in a Michigan population. It is worth noting that the correlation between the gene frequencies for two loci may be written as

I(pA, PB) = (DT - 0)/ vp~vp, (11) where Vp, and Vp, are the variances of gene frequencies of pA and pB, respec- tively. Thus, even if DT= 0, r(pA,pB) may not be 0. In our study of the effect of migration on linkage disequilibrium we have assumed no selection, so that linkage disequilibrium eventually disappears. If there is epistatic selection, this is no longer true, and linkage disequilibrium may persist indefinitely. Selection complicates the mathematical model considerably, but it seems that the essential features of the effect of migration remain the same, as long as genotype fitnesses are the same for different populations. LINKAGE DISEQUILIBRIUM 219 However, if selection is different in different populations, a further compli- cation may occur. Suppose that there is overdominant selection for two loci with- out epistasis in two different populations but the equilibrium gene frequencies for the two populations are different. In the absence of migration there will be no linkage disequilibrium at steady state in each population. If, however, migra- tion occurs between the two populations, non-equilibrium gametes are introduced from one population to the other in each generation, so that permanent linkage disequilibria may be produced without epistasis. We believe that this is a new feature of linkage disequilibrium. We thank DR. JAMESCROW and a referee of this paper for their valuable suggestions on the first draft.

LITERATURE CITED

CROW,J. F., 1970 Genetic loads and the cost of natural selection. pp. 128-177. In: Mathematical Topics in Population Genetics. Edited by K. KOJIMA.Springer Verlag, Berlin. FRANKLIN,I. and R. C. LEWONTIN,1970 Is the gene the unit of selection? Genetics 65: 707- 734. GERSHOWITZ,H., M. LAYRISSE,Z. LAYRISSE,J. V. NEEL,N. CHAGNONand M. AYRES,1972 The genetic structure of a tribal population, the Yanomama Indians. 11. Eleven blood-group systems and the AJ3H-Le secretor traits. Ann. Hum. Genet. 35: 261-269. HILL,W. C. and A. ROBERTSON,1968 Linkage disequilibrium in finite populations. Theor. Appl. Genetics (Der Zuchter) 38: 226-231. KIMURA,M., 1956 A model of genetic system which leads to closer linkage by natural selection. 10: 278-287. LEWONTIN,R. C., 1964 The interaction of selection and linkage. I. General consideration of heterotic models. Genet. 49: 49-67. LEWONTIN,R. C. and K. KOJIMA,1960 The evolutionary dynamics of complex polymorphisms. Evolution 14: 458-472. MUKAI,T., L. E. METTLERand S. I. CHIGUSA,1971 Linkage disequilibrium in a local popula- tion of Drosophila melanogaster. Proc. Nat. Acad. Sci. U.S. 68: 1065-1069. NEI, M., 1971 Fertility excess necessary for gene substitution in regulated populations. Genetics 68: 169-184. OHTA,T. and M. KIMURA,1969a Linkage disequilibrium due to random . Genet. Res. 13: 47-55. -, 1969b Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genetics 63: 229-238. SINNOCK,P. and C. F. SING, 1972 Analysis of multilocus genetic systems in Tecumseh, Michi- gan. 11. Consideration of the correlation between nonalleles in gametes. Amer. J. Human Genet. 24: 393-415. SVED,J., 1968 The stability of linked systems with a . Genetics 59: 543-563. WILLS,C., J. CRENSHAWand J. VITALE,1970 A computer model allowing maintenance of large amounts of genetic variability in mendelian populations. I. Assumptions and results for large populations. Genetics fi4: 107-123. WRIGHT,S., 1952 The genetics of quantitative variability. pp. 5-41. In: Quantitative Znheri- tance. Edited by E. C. R. REEVEand C. H. WADDINcroN. Her Majesty’s Stationery Office, London. Corresponding Editor: R. LEWONTIN