THEORETICAL POPULATION GENETICS OF REPEATED GENES FORMING A MULTIGENE FAMILY *

TOMOKO OHTA

National Institute of Genetics, Mishima, 411 Japan

Manuscript received September 23, 1977
Revised copy received December 19, 1977

ABSTRACT

The evolution of repeated genes forming a multigene family in a finite population is studied with special reference to the probability of gene identity, i.e., the probability that two gene units chosen from the gene family are identical. This quantity is called clonality and is defined as the sum of squares of the frequencies of gene lineages in the family. The multigene family is undergoing continuous unequal somatic crossing over, ordinary interchromosomal crossing over, mutation and random frequency drift. Two measures of clonality are used: clonality within one chromosome and that between two different chromosomes. The equilibrium properties of the means, the variances and the covariance of the two measures of clonality are investigated by using the diffusion equation method under the assumption of a constant number of gene units in the multigene family. Some models of natural selection based on clonality are considered. The possible significance of the variance and covariance of clonality among the chromosomes for the adaptive differentiation of gene families, such as those producing antibodies, is discussed.

In recent years, much attention has been paid to the problem of evolution of repeated genes. Although how repeated genes emerged and what role they play in evolution are far from being elucidated, their universal existence in the genomes of higher organisms is well demonstrated (see Proc. Cold Spring Harbor Symp. 1974). HOOD, CAMPBELL and ELGIN (1975) have recently classified repeated genes into three classes: simple-sequence multigene families (highly repeated sequences), multiplicational multigene families (genes for ribosomal RNA, transfer RNA and histone) and informational multigene families (genes for immunoglobulins). They reviewed and discussed the possible evolutionary mechanism and significance of these classes of repeated gene families, and they particularly emphasized the possible importance of the last class for organizing enormous amounts of genetic information to cope with the complexity of higher organisms.

There are two remarkable characteristics of multigene families (HOOD, CAMPBELL and ELGIN 1975; TARTOF 1975). They exhibit contraction-expansion of gene number in a family, as well as coincidental evolution during the course of evolution. The latter represents the process in which parallel or simultaneous fixation occurs in many members of the gene units in the same family of a species. The model of unequal crossing over provides one of the most satisfactory explanations for this process (SMITH 1974; BLACK and GIBSON 1974). Here it is assumed that unequal crossing over is continually occurring in a germ line, which is responsible for the horizontal expansion or contraction of mutant gene number within a chromosome. To put it in another way, unequal crossing over increases the homogeneity among the members of gene units in a family through duplication and deletion, similar to random fixation of mutant alleles by random drift.

I have investigated the population consequences of a multigene family by incorporating the above knowledge (OHTA 1977, 1978). In other words, I studied the evolutionary mechanism of repeated genes at the population level. Particularly in view of their possible relationship with "polygenes" (WRIGHT 1977, p. 374), it is desirable to investigate the evolutionary mechanism of repeated genes by taking into account the above recent findings.

In my study, I started from a simple situation in which the number of gene units in a family is kept constant. Then the probability of gene identity was studied (OHTA 1977, 1978). This is the quantity that is used to represent the level of homogeneity among members of the family. Following SMITH (1974), I shall call this probability "clonality." It is defined as the sum of squares of the frequencies of gene lineages within the multigene family. This quantity is equivalent to homozygosity in population genetics. In the present paper, further analyses on the clonality of repeated genes will be reported. In particular, the derivation of the variance and covariance of clonality among chromosomes in a finite population will be presented. The possible significance of such quantities for the adaptive differentiation of repeated genes will be discussed.

* Contribution No. 1182 from the National Institute of Genetics, Mishima, 411, Japan.

Genetics 88: 845-861 April, 1978.

BASIC THEORY

Let us assume a random-mating population with effective size $N_e$. A multigene family is located on a particular chromosome and continuously undergoes unequal somatic crossing over, ordinary interchromosomal crossing over, mutation and random genetic drift. Let $n$ be the number of gene units in the family and let it be constant throughout. Let $v$ be the mutation rate per gene unit per generation, so that there are $nv$ mutations per family per generation. We assume here that whenever a mutation appears, it represents a new type of gene; i.e., the so-called infinite-allele model of KIMURA and CROW (1964) is used.

After unequal crossing over, the duplication-deletion of gene units in a family occurs. One cycle of unequal crossing over is defined as two successive unequal crossovers of duplication and deletion of one gene unit (OHTA 1976; PERELSON and BELL 1977). I assume that the continuous process of unequal crossing over in unit time may be equated to an effective number of cycles of unequal crossovers. This assumption seems to be satisfied under fairly general situations in which the number of duplicated or deleted gene units is not restricted to one, but is some kind of random variable (OHTA 1978).

Let $\kappa$ be the effective number of cycles of unequal crossovers per family per generation. Further, let $\beta$ be the rate of interchromosomal crossover per family per generation. It is assumed that interchromosomal crossing over is not accompanied by the duplication or deletion of gene units.

The clonality of a gene family may be defined at two different levels: the clonality of a gene family on one chromosome and that between two homologous families on different chromosomes randomly chosen from the population. Let $x_{i,k}$ be the frequency of the $k$-th gene lineage in the $i$-th gene family type. Here a gene lineage means a group of genetically identical genes, and a family type likewise means a genetically identical gene family. Then the two measures of clonality may be written as follows:

$$c_i = \sum_k x_{i,k}^2 \quad \text{and} \quad c_{ij} = \sum_k x_{i,k}\, x_{j,k} \qquad (1)$$

where $c_i$ is the clonality of the $i$-th family type, $c_{ij}$ is the clonality between the $i$-th and the $j$-th family types and summation is over all $k$.

In the following, I shall present the equilibrium properties of clonality of the family under the pressure of mutation, random sampling drift and crossing over. In my previous report (OHTA 1978), the mean value of clonality was investigated. In the present paper, the variance and the covariance of clonality are studied by using the diffusion-equation method.

(i) Unequal crossing over

It was shown that the process of unequal crossing over is a stochastic one analogous to that of random sampling of gametes in finite populations and that one cycle of unequal crossing over is equivalent to $n^2/4N_e$ generations of random gene frequency drift, where $N_e$ is the effective population size (OHTA 1976; PERELSON and BELL 1977). Therefore, the theory of the diffusion model for the moments of gene frequencies in finite populations may be applied. OHTA and KIMURA (1971) have shown that the following equation holds for the rate of change of the moments of allelic frequencies:

$$\frac{d}{dt} E\{f(x_1, x_2, \ldots, x_n)\} = E[L\{f(x_1, x_2, \ldots, x_n)\}] \qquad (2)$$

where $E$ denotes the expectation, $x_i$ is the frequency of the $i$th allele in the population, $f$ is a continuous function of the $x_i$ values and $L$ is the differential operator of the Kolmogorov backward equation in the following form:

$$L = \frac{1}{2}\sum_i V_{\delta x_i}\frac{\partial^2}{\partial x_i^2} + \sum_{i<j} W_{\delta x_i \delta x_j}\frac{\partial^2}{\partial x_i\,\partial x_j} + \sum_i M_{\delta x_i}\frac{\partial}{\partial x_i} \qquad (3)$$

in which $M$, $V$ and $W$ designate, respectively, the mean, the variance and the covariance in the rate of change of the $x_i$ values that appear as subscripts. Equation (2) is equivalent to the one used by LI and NEI (1975) for getting the variance of heterozygosity and genetic distance.

In our case, the mean and the variance of change of $x_{i,k}$ by one cycle of unequal crossing over are given in Table 1 of OHTA (1976). Let $\delta x_{i,k}$ be the amount of change of $x_{i,k}$ per $\kappa$ cycles of unequal crossing over (per one generation). Then, the covariance of $\delta x_{i,k}$ and $\delta x_{i,l}$ may be similarly obtained. The mean, the variance and the covariance of the $\delta x_{i,k}$ values become, respectively, after $\kappa$ cycles of unequal crossing over

$$M_{\delta x_{i,k}} = 0, \quad V_{\delta x_{i,k}} = \alpha\, x_{i,k}(1 - x_{i,k}) \quad \text{and} \quad W_{\delta x_{i,k}\delta x_{i,l}} = -\alpha\, x_{i,k}\, x_{i,l} \qquad (4)$$

in which $\alpha = 2\kappa/n^2$. By using formulas (2) and (4), the change of moments of $x_{i,k}$ in one generation is calculated. The details of the calculation are not given here; only the resulting equations will be presented later in matrix notation.
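The two clonality measures of equation (1) and the coefficient $\alpha$ of formula (4) can be illustrated with a minimal Python sketch. It assumes that a gene family is represented as a list of lineage labels, one per gene unit; the function names are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def lineage_frequencies(family):
    """Return {lineage label: frequency} for a family given as a list of labels."""
    n = len(family)
    return {lineage: count / n for lineage, count in Counter(family).items()}


def clonality_within(family):
    """c_i of equation (1): sum of squared lineage frequencies in one family."""
    return sum(f * f for f in lineage_frequencies(family).values())


def clonality_between(family_i, family_j):
    """c_ij of equation (1): sum over lineages of the product of frequencies."""
    x_i = lineage_frequencies(family_i)
    x_j = lineage_frequencies(family_j)
    return sum(f * x_j.get(lineage, 0.0) for lineage, f in x_i.items())


def alpha(kappa, n):
    """alpha = 2*kappa / n**2, the drift-like coefficient of formula (4)."""
    return 2.0 * kappa / n ** 2
```

For a family `["a", "a", "b", "c"]` the lineage frequencies are 1/2, 1/4 and 1/4, so the within-family clonality is 0.375. With one cycle per generation and ten gene units, `alpha(1, 10)` is 0.02, the value used in the simulations reported later in the paper.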

(ii) Random sampling of gametes

The above formula (2) can again be applied for evaluating the effect of random drift. This time, the dynamics of the frequencies of the various types of a gene family within the population must be analyzed. Let $p_i$ be the frequency of the $i$th type of the gene family in the population. Then the mean clonality of a chromosome ($C_0$) and that between two chromosomes ($C_I$) in the population become, respectively,

$$C_0 = \sum_i c_i p_i \quad \text{and} \quad C_I = \sum_i \sum_j c_{ij}\, p_i p_j \qquad (5)$$

where summation is over all $i$ and $j$. In applying equation (2) for generating the moments of clonality, the following standard formulas in population genetics for the mean, the variance and the covariance of the change of the $p_i$ values in one generation are used (e.g., CROW and KIMURA 1970, p. 330):

$$M_{\delta p_i} = 0, \quad V_{\delta p_i} = \frac{p_i(1 - p_i)}{2N_e} \quad \text{and} \quad W_{\delta p_i \delta p_j} = -\frac{p_i p_j}{2N_e} \qquad (6)$$

Note that family types are treated as alleles of a single locus in using formula (6). Based on formulas (2) and (6), the change of moments of clonality per one generation by random drift is evaluated.

(iii) Interchromosomal crossing over

It is assumed that the pairing of the $n$ gene units of the family is always exact at meiosis, i.e., there is neither duplication nor deletion of gene units at interchromosomal crossing over. It is further assumed that the gene lineages are arranged randomly along the chromosome, so that the clonality does not depend on the distance between the units on the chromosome. This condition is not generally satisfied; however, when the allowed latitude of deletion or duplication at unequal crossing over is 10-15% of the total number of gene units in the family, random arrangements of gene lineages are expected (OHTA 1978).

Now consider the situation where one crossover takes place between the $i$th and $j$th family types. The crossover point divides the gene family into two parts, $q$ and $1 - q$, as shown in Figure 1. Then, after the crossover, $c_i$, $c_j$ and $c_{ij}$ change into $c_i'$, $c_j'$ and $c_{ij}'$ according to the following formulas:

$$c_i' = q^2 c_i + (1-q)^2 c_j + 2q(1-q)\, c_{ij}$$

$$c_j' = (1-q)^2 c_i + q^2 c_j + 2q(1-q)\, c_{ij} \qquad (7)$$

and

$$c_{ij}' = q(1-q)(c_i + c_j) + \{q^2 + (1-q)^2\}\, c_{ij}$$

under the assumption of the random arrangement of gene lineages. If $q$ is uniformly distributed between 0 and 1, $E(q) = 1/2$, $E(q^2) = 1/3$, and so forth. Hence the expected change of the moments of the $c_i$ values may be obtained.

However, one needs special caution in calculating the expected change of moments by using formula (7). For $c_{ij}$, there are $2N_e(2N_e - 1)/2$ pairs of chromosomes in the population, whereas, for $c_i$ and $c_j$, there are only $2N_e$ chromosomes that contribute to the mean value. Therefore the change in $c_{ij}$ contributes to changing the moments only $1/(2N_e - 1)$ as much as in the case of $c_i$ and $c_j$. Thus the change of $c_{ij}$ by formula (7) is multiplied by $1/(2N_e - 1) \approx 1/(2N_e)$ for getting the expected change of $C_I$. Later, the equations for deriving the moments of clonality will be given by assuming a constant rate of interchromosomal crossing over per family per generation.

(iv) Mutation

Since all mutations are assumed to be unique, the clonality is reduced by all new mutations. Let $v$ be the mutation rate per gene unit per generation. Then the clonality after mutation (primed letters) is

$$C_0' = (1 - v)^2 C_0$$

$$C_I' = (1 - v)^2 C_I$$

and

$$(C_0^2)' = (1 - v)^4 C_0^2$$

and so forth.

FIGURE 1.-Diagram illustrating the model of interchromosomal crossing over.
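The crossover update of formula (7) and the mutation factor can be written as a short Python sketch. It assumes formula (7) as read here, with the crossover point $q$ expressed as a fraction of the family length; the function names are hypothetical. The expected values use $E(q^2) = E\{(1-q)^2\} = 1/3$ and $E\{q(1-q)\} = 1/6$ for uniformly distributed $q$.

```python
def crossover_update(c_i, c_j, c_ij, q):
    """Apply formula (7) for a single crossover at fractional position q."""
    r = 1.0 - q
    new_c_i = q * q * c_i + r * r * c_j + 2.0 * q * r * c_ij
    new_c_j = r * r * c_i + q * q * c_j + 2.0 * q * r * c_ij
    new_c_ij = q * r * (c_i + c_j) + (q * q + r * r) * c_ij
    return new_c_i, new_c_j, new_c_ij


def expected_crossover_update(c_i, c_j, c_ij):
    """Average of formula (7) over q uniform on (0, 1)."""
    expected_within = (c_i + c_j + c_ij) / 3.0      # same for both products
    expected_between = (c_i + c_j) / 6.0 + 2.0 * c_ij / 3.0
    return expected_within, expected_within, expected_between


def after_mutation(clonality, v, genes_compared=2):
    """Clonality survives mutation only if none of the compared genes mutates."""
    return (1.0 - v) ** genes_compared * clonality
```

One check on the algebra is that $c_i + c_j + 2c_{ij}$ is left unchanged by `crossover_update` for every $q$, which is what one expects when the two chromosomes merely exchange segments.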

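RESULTS

In order to generate the variance and covariance of clonality, let us define the following moments.

Second moments:

$$C_0 = E\Big\{\sum_i \sum_k x_{i,k}^2\, p_i\Big\} = E\Big\{\sum_i c_i p_i\Big\} \quad \text{and} \quad C_I = E\Big\{\sum_i \sum_j c_{ij}\, p_i p_j\Big\}$$

Third moments:

$$D_0 = E\Big\{\sum_i \sum_k x_{i,k}^3\, p_i\Big\}, \quad D_I = E\Big\{\sum_i \sum_j \sum_k x_{i,k}^2 x_{j,k}\, p_i p_j\Big\} \quad \text{and} \quad D_{II} = E\Big\{\sum_i \sum_j \sum_l \sum_k x_{i,k} x_{j,k} x_{l,k}\, p_i p_j p_l\Big\}$$

Fourth moments:

$$E_0 = E\Big\{\sum_i c_i^2\, p_i\Big\}, \quad E_I = E\Big\{\sum_i \sum_j c_i c_{ij}\, p_i p_j\Big\} \quad \text{and} \quad E_{II} = E\Big\{\sum_i \sum_j c_{ij}^2\, p_i p_j\Big\}$$

and

$$F_0 = E(C_0^2) = E\Big\{\Big(\sum_i c_i p_i\Big)^2\Big\}$$

$$F_I = E(C_0 C_I) = E\Big\{\Big(\sum_i c_i p_i\Big)\Big(\sum_i \sum_j c_{ij}\, p_i p_j\Big)\Big\} \qquad (12)$$

$$F_{II} = E(C_I^2) = E\Big\{\Big(\sum_i \sum_j c_{ij}\, p_i p_j\Big)^2\Big\}$$

$$F_{III} = E\Big\{\sum_i \sum_j \sum_l c_{ij} c_{il}\, p_i p_j p_l\Big\}$$

The variances and the covariance of the two measures of clonality are then

$$\sigma_0^2 = E_0 - F_0, \quad \sigma_1 = E_I - F_I \quad \text{and} \quad \sigma_2^2 = E_{II} - F_{II} \qquad (13)$$

In the following analyses, I assume that the parameters $\alpha$, $\beta$, $1/N_e$ and $v$ are much smaller than unity and that their products are negligibly small.

The equations for obtaining the second moments ($C_0$ and $C_I$) are given in OHTA (1978). Let $M$, $S$, $K$ and $R$ be the coefficients of the equations for deriving the moments from one generation to the next corresponding to mutation, sampling, unequal somatic crossing over and interchromosomal crossing over, respectively, in matrix notation. Then the vector $C = [C_0, C_I]^T$ may be obtained from one generation to the next by

$$C' = G_C C + A_C \qquad (14)$$

in which $G_C$ is the product of $M_C$, $S_C$, $K_C$ and $R_C$, and $A_C = [\alpha, 0]^T$, with

$$M_C = (1 - v)^2, \quad S_C = \begin{bmatrix} 1 & 0 \\ \frac{1}{2N_e} & 1 - \frac{1}{2N_e} \end{bmatrix}, \quad K_C = \begin{bmatrix} 1 - \alpha & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad R_C = \begin{bmatrix} 1 - \frac{\beta}{3} & \frac{\beta}{3} \\ \frac{\beta}{6N_e} & 1 - \frac{\beta}{6N_e} \end{bmatrix}$$

The order of multiplying the above matrices for getting $G_C$ should follow the sequence of the corresponding events of the model. The equilibrium value of $C$ may be obtained by solving equation (14) with $C' = C$. The exact solution is very complicated. However, when all parameters $1/N_e$, $\alpha$, $v$ and $\beta$ are much smaller than 1, the elements of $C$ can be shown to be as follows (OHTA 1978):

$$\hat{C}_0 = \frac{\alpha}{\alpha + 2v + \dfrac{\beta}{3}\,\dfrac{4N_e v}{1 + 4N_e v}} \quad \text{and} \quad \hat{C}_I = \frac{\hat{C}_0}{1 + 4N_e v} \qquad (16)$$

Now let $D = [D_0, D_I, D_{II}]^T$. Then the resulting equation for deriving $D$ becomes

$$D' = G_D D + A_D \qquad (17)$$

where $G_D$ is the product of the following matrices, $M_D$, $S_D$, $K_D$ and $R_D$:

$$M_D = (1 - v)^3, \quad S_D = \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{2N_e} & 1 - \frac{1}{2N_e} & 0 \\ 0 & \frac{3}{2N_e} & 1 - \frac{3}{2N_e} \end{bmatrix}, \quad K_D = \begin{bmatrix} 1 - 3\alpha & 0 & 0 \\ 0 & 1 - \alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$R_D$ = [3 x 3 matrix; entries not recoverable from the source]

and $A_D = [3\alpha C_0, \alpha C_I, 0]^T$.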
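As a numerical illustration of the second-moment recursion (14) and the approximate equilibrium (16), the following Python sketch iterates the four events in the order used later in the paper (unequal crossing over, interchromosomal crossing over, mutation, sampling). The per-event updates are an assumption: they follow the matrices $K_C$, $R_C$, $M_C$ and $S_C$ as read here, and the function names are hypothetical.

```python
def next_generation(c0, c1, alpha, beta, v, ne):
    """One generation of the (C_0, C_I) recursion under the assumed updates."""
    # Unequal crossing over raises within-chromosome clonality toward 1.
    c0 = (1.0 - alpha) * c0 + alpha
    # Interchromosomal crossing over mixes the two measures (formula (7) averaged over q).
    c0, c1 = (c0 + (beta / 3.0) * (c1 - c0),
              c1 + (beta / (6.0 * ne)) * (c0 - c1))
    # Mutation: both compared gene units must be unmutated.
    survive = (1.0 - v) ** 2
    c0, c1 = survive * c0, survive * c1
    # Sampling: two drawn chromosomes are copies of one parent with probability 1/(2Ne).
    c1 = c1 + (c0 - c1) / (2.0 * ne)
    return c0, c1


def iterate_to_equilibrium(alpha, beta, v, ne, generations=50000):
    """Iterate from a fully homogeneous start until the moments stop changing."""
    c0, c1 = 1.0, 1.0
    for _ in range(generations):
        c0, c1 = next_generation(c0, c1, alpha, beta, v, ne)
    return c0, c1


def approximate_equilibrium(alpha, beta, v, ne):
    """Closed-form approximation corresponding to formula (16)."""
    t = 4.0 * ne * v
    c0 = alpha / (alpha + 2.0 * v + (beta / 3.0) * t / (1.0 + t))
    return c0, c0 / (1.0 + t)
```

With $2N_e = 10^3$, $v = 10^{-4}$, $\alpha = 0.004$ and $\beta = 0.001$, `approximate_equilibrium(0.004, 0.001, 1e-4, 500)` gives values close to the 0.940 and 0.780 listed for that parameter set in Table 1; the iterated values agree with the approximation to within a few parts in a thousand.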

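We now proceed to obtain the fourth moments. Let $E = [E_0, E_I, E_{II}]^T$. Then $E$ in the next generation is

$$E' = G_E E + A_E \qquad (20)$$

where

$$A_E = \begin{bmatrix} 4\alpha D_0 + 2\alpha C_0 \\ 2\alpha D_I + \alpha C_I \\ 2\alpha D_I \end{bmatrix}$$

and $G_E$ is again the product of the following matrices:

$$M_E = (1 - v)^4, \quad S_E = \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{2N_e} & 1 - \frac{1}{2N_e} & 0 \\ \frac{1}{2N_e} & 0 & 1 - \frac{1}{2N_e} \end{bmatrix}, \quad K_E = \begin{bmatrix} 1 - 6\alpha & 0 & 0 \\ 0 & 1 - 3\alpha & 0 \\ 0 & 0 & 1 - 2\alpha \end{bmatrix}$$

and

$R_E$ = [3 x 3 matrix; entries not recoverable from the source]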

Next, let $F = [F_0, F_I, F_{II}, F_{III}]^T$. For the change of $F$ due to crossing over and mutation, the equation may be obtained by using the expected change of the second moments, $C_0$ and $C_I$. However, the effect of sampling on $F$ is more complicated and the direct calculation based on equation (12) is necessary. Let $RK_F$ be the coefficient of the equations for obtaining $F$ corresponding to unequal and interchromosomal crossovers. Then it can be shown that the resulting equations become, in matrix notation,

$$M_F = (1 - v)^4, \quad S_F = \begin{bmatrix} 1 - \frac{1}{2N_e} & 0 & 0 & 0 \\ \frac{1}{2N_e} & 1 - \frac{3}{2N_e} & 0 & 0 \\ 0 & \frac{2}{2N_e} & 1 - \frac{6}{2N_e} & \frac{2}{N_e} \\ 0 & 0 & 0 & 1 - \frac{3}{2N_e} \end{bmatrix}$$

and

$RK_F$ = [4 x 4 matrix; entries not recoverable from the source]

where it is assumed that the unequal crossover occurs first, followed by the interchromosomal crossover, in deriving $RK_F$. Then $F$ is generated by the following equation:

$$F' = G_F F + A_F \qquad (23)$$

where $G_F$ is again the product of $M_F$, $S_F$ and $RK_F$, and $A_F$ is a four-element vector whose entries involve $C_0$, $C_I$, $E_0$, $E_I$ and $E_{II}$ [entries not recoverable from the source].

Since we assume that $\beta$ and $1/N_e$ are much smaller than unity and that their product is negligible, $F_{III}$ becomes independent of the other components of $F$. At equilibrium $F_{III}' = F_{III}$, and, therefore, we have

[Equation (24), the equilibrium value of $F_{III}$, is not recoverable from the source.]

By using the above equations, the variance and the covariance of clonality may be obtained. At equilibrium among mutation, random drift and crossing over, the moments do not change; therefore, $D_0' = D_0$, $E_0' = E_0$, $E_I' = E_I$, and so forth. In the next section, some numerical examples of the mean and variance of clonality at equilibrium will be given.

The above theory may be checked by evaluating some special cases. I have obtained the variance of clonality ($\sigma_0^2$) at equilibrium when there is no interchromosomal crossing over ($\beta = 0$) and when the other parameters $v$, $\alpha$ and $1/N_e \ll 1$. In such cases,

[Equation (25), giving $\hat{\sigma}_0^2 = \hat{E}_0 - \hat{F}_0$ as a function of $\theta$ and $N_e v$, is not recoverable from the source.]

where $\theta = 2v/\alpha$. When $N_e \to \infty$, this formula reduces to

$$\hat{\sigma}_0^2 = \frac{2\theta}{(1+\theta)^2 (2+\theta)(3+\theta)} \qquad (26)$$

which is exactly the same as STEWART'S formula for the variance of homozygosity of finite populations (STEWART 1976). From equation (25), it can be seen that, because of the finite population size, $\sigma_0^2$ is somewhat smaller than unequal crossing over and mutation alone would allow. Note that STEWART'S formula gives the variance of homozygosity at equilibrium under mutation and random sampling drift, which is equivalent to the maximum variance when mutation and unequal crossing over balance in our case. In fact, $\hat{E}_0 - \hat{C}_0^2$ reduces to formula (26), irrespective of the population size, when $\beta = 0$. In addition, the variance of the mean clonality of the finite population may be obtained by $\hat{F}_0 - \hat{C}_0^2$.
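A small Python sketch of the limiting variance may help when comparing with the tabulated values. It assumes the usual form of STEWART'S variance of homozygosity, $2\theta/\{(1+\theta)^2(2+\theta)(3+\theta)\}$, with $\theta = 2v/\alpha$ taking the place of the conventional drift parameter; the function names are hypothetical.

```python
def theta_from_rates(v, alpha):
    """theta = 2v/alpha: mutation relative to unequal crossing over."""
    return 2.0 * v / alpha


def stewart_variance(theta):
    """Variance of homozygosity (here, of clonality) in the infinite-population limit."""
    return 2.0 * theta / ((1.0 + theta) ** 2 * (2.0 + theta) * (3.0 + theta))


def mean_clonality_limit(theta):
    """Corresponding mean, 1/(1 + theta), i.e. alpha/(alpha + 2v)."""
    return 1.0 / (1.0 + theta)
```

For $v = 10^{-4}$ and $\alpha = 0.004$, `theta_from_rates` gives 0.05 and `stewart_variance(0.05)` is about 0.0145, slightly above the value 0.013 listed for $\beta = 0$ in Table 1, as expected for a finite population.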

NUMERICAL EXAMPLES AND SIMULATION STUDIES

In this section, numerical examples are given for some interesting cases. The sequence of events in each generation is: unequal crossing over → interchromosomal crossing over → mutation → random sampling of gametes. The exact results are slightly different if the order of the events differs. When all parameters, $v$, $\alpha$, $\beta$ and $1/N_e$, are much smaller than unity and their products are negligibly small, the results become independent of the sequence of events and depend only on the ratios of the four parameters, such as $v/\alpha$, $N_e v$ and $N_e \beta$. Therefore, the results can be readily extended to actual situations in which $N_e$ may be extremely large with the other parameters extremely small.

The results are obtained by computer solutions of equations (14), (17), (20) and (23) at equilibrium. Several examples are given in Table 1. The parameters are $2N_e = 10^3$, $n = 10$ and $v = 10^{-4}$, with $\alpha$ and $\beta$ as given in the table. Therefore, $2N_e v = 0.1$, $2v/\alpha\,(= \theta) = 0.02 - 0.1$ and $2N_e \beta = 0 - 10$. From the table, it can be seen that the covariance is generally positive and that $\sigma_2^2$ is larger than $\sigma_0^2$. In fact, many $\sigma_2^2$ values in the table are larger, relative to the mean value ($C_I$), than the values that formula (26) predicts. Note that formula (26) gives $\sigma_0^2$ in equilibrium between unequal crossing over and mutation without random drift and interchromosomal crossing over.

TABLE 1

Numerical examples of the means, the variances and the covariance of clonality (parameters are $2N_e = 10^3$ and $v = 10^{-4}$)

    alpha    beta      C_0      C_I      sigma_0^2   sigma_1   sigma_2^2
    0.001    0.001     0.797    0.663    0.033       0.020     0.063
    0.002    0.001     0.887    0.738    0.027       0.017     0.083
    0.003    0.001     0.922    0.767    0.021       0.014     0.091
    0.004    0.001     0.940    0.780    0.018       0.012     0.097
    0.005    0.001     0.952    0.789    0.015       0.010     0.099
    0.004    0.0       0.953    0.791    0.013       0.008     0.098
    0.004    0.0001    0.951    0.790    0.014       0.008     0.097
    0.004    0.0005    0.946    0.785    0.015       0.010     0.097
    0.004    0.005     0.894    0.742    0.032       0.024     0.092
    0.004    0.01      0.842    0.699    0.042       0.034     0.086

In order to check the theory developed here, simulation studies were carried out. The simulated population consists of 50 chromosomes ($N_e = 25$) and each chromosome contains 10 linearly arranged numbers (gene units). One generation of the experiment consists of unequal crossing over, interchromosomal crossing over, mutation and sampling, in this order. Each experiment started from a population that was stored in the computer.

One cycle of unequal crossing over was done per family per generation; therefore $\alpha = 2/n^2 = 0.02$. First, a uniform random number selected a chromosome. Then, two random numbers determined two gene units for duplication and deletion, and one cycle of unequal crossing over was completed. This process was repeated $2N_e$ times. Four levels of interchromosomal crossover were tried: 0, 1, 2 and 3 interchromosomal crossovers per generation; therefore $\beta = 0$, 1/25, 2/25 and 3/25. At interchromosomal crossing over, two chromosomes were randomly chosen, and then a random number determined the point of exchange. Sampling was done again by using uniform random numbers. There were two mutations in each generation; therefore $v = 2/(2N_e n) = 0.004$. By mutation, a randomly chosen gene unit (number) of a randomly chosen chromosome was changed into a new, not pre-existing number.

All experiments started from the same original population, which was stored in the computer at the 250th generation, starting from a homogeneous population ($C_0 = C_I = 0$) with parameters $\beta = 0$, $\alpha = 0.02$, $v = 0.004$ and $2N_e = 50$. Each experiment was continued for 300 generations, and the means and the variances of clonality were obtained. In each case, two replications were done. The results are given in Table 2, together with the expected values numerically calculated. The observed values are the averages of the 51st-300th generations, so that they represent equilibrium values.

TABLE 2

Comparison of the observed and the expected values of the means, the variances and the covariance of clonality (parameters are $N_e = 25$, $\alpha = 0.02$, $n = 10$ and $v = 0.004$)

    beta    source             C_0      C_I      sigma_0^2   sigma_1   sigma_2^2
    0       observed, rep 1    0.797    0.628    0.031       0.015     0.062
    0       observed, rep 2    0.809    0.670    0.037       0.023     0.067
    0       expected           0.719    0.502    0.036       0.011     0.070
    1/25    observed, rep 1    0.686    0.438    0.039       0.015     0.071
    1/25    observed, rep 2    0.563    0.313    0.034       0.011     0.049
    1/25    expected           0.634    0.443    0.044       0.019     0.058
    2/25    observed, rep 1    0.625    0.367    0.041       0.011     0.061
    2/25    observed, rep 2    0.820    0.737    0.033       0.021     0.044
    2/25    expected           0.567    0.396    0.045       0.023     0.049
    3/25    observed, rep 1    0.574    0.389    0.033       0.013     0.036
    3/25    observed, rep 2    0.719    0.601    0.045       0.027     0.052
    3/25    expected           0.513    0.359    0.043       0.025     0.043

From the table, it can be seen that the agreement between the observed and the expected values is satisfactory, although $C_0$ and $C_I$ seem to be slightly underestimated. This is probably attributable to the approximation procedure used for deriving the moments.
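The simulation scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original program: the starting population is taken to be fully homogeneous rather than the stored 250th-generation population, the duplicated unit is placed next to its source, and all function and parameter names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def clonality(family_a, family_b):
    """Sum over lineages of the product of lineage frequencies in the two families."""
    counts_a, counts_b = Counter(family_a), Counter(family_b)
    shared = sum(counts_a[k] * counts_b[k] for k in counts_a)
    return shared / (len(family_a) * len(family_b))


def unequal_crossover_cycle(chrom, rng):
    """One cycle: delete one random unit and duplicate another; length is unchanged."""
    n = len(chrom)
    dup, gone = rng.randrange(n), rng.randrange(n)
    unit = chrom[dup]
    del chrom[gone]
    chrom.insert(dup, unit)        # the copy ends up adjacent to its source unit


def simulate(n_chrom=50, n_units=10, crossovers=1, mutations=2,
             generations=300, burn_in=50, seed=1):
    """Return time averages of the within- and between-chromosome clonality."""
    rng = random.Random(seed)
    pop = [[0] * n_units for _ in range(n_chrom)]   # assumed homogeneous start
    next_label, c0_sum, ci_sum, recorded = 1, 0.0, 0.0, 0
    for gen in range(1, generations + 1):
        for _ in range(n_chrom):                     # one cycle per family on average
            unequal_crossover_cycle(rng.choice(pop), rng)
        for _ in range(crossovers):                  # exact-pairing exchange of tails
            a, b = rng.sample(range(n_chrom), 2)
            cut = rng.randrange(1, n_units)
            pop[a][cut:], pop[b][cut:] = pop[b][cut:], pop[a][cut:]
        for _ in range(mutations):                   # every mutation is a new lineage
            rng.choice(pop)[rng.randrange(n_units)] = next_label
            next_label += 1
        pop = [list(rng.choice(pop)) for _ in range(n_chrom)]   # sampling of gametes
        if gen > burn_in:
            c0_sum += sum(clonality(c, c) for c in pop) / n_chrom
            pairs = list(combinations(pop, 2))
            ci_sum += sum(clonality(a, b) for a, b in pairs) / len(pairs)
            recorded += 1
    return c0_sum / recorded, ci_sum / recorded
```

Calling `simulate(crossovers=0)` corresponds to the $\beta = 0$ row of Table 2, and `crossovers=1, 2, 3` to $\beta = 1/25$, 2/25 and 3/25. Individual runs fluctuate considerably with only 50 chromosomes, so several seeds would be needed before comparing averages with the expected values.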

DISCUSSION

The evolution of repeated genes forming a multigene family was investigated from the standpoint of population genetics by using a simple model assuming unequal crossing over between sister chromatids (intrachromosomal crossing over). There are two forces working at different levels that lead the gene family toward homogeneity: random sampling of gametes due to finite population size and duplication-deletion of gene units due to unequal crossing over. The former works at the level of the population and the latter at the level of the multigene family itself. Mutation works in the opposite direction to create genetic diversity, and therefore there will be an equilibrium among these forces. Interchromosomal crossing over also contributes to increasing gene diversity within the family, to some extent.

Homogeneity of genes in a family is measured by the probability of gene identity (clonality). This is the probability that two randomly chosen gene units within a family in one chromosome belong to the same lineage. I shall denote by $C_0$ the mean clonality in one chromosome and by $C_I$ that of the homologous gene family between two different chromosomes. In other words, $C_I$ is the probability that two genes in homologous families, each taken from one of the two homologous chromosomes, are identical by descent. Then, it can be seen from formula (16) that, at equilibrium, $C_I$ may be obtained by multiplying $C_0$ by the factor $1/(1 + 4N_e v)$. This factor is independent of the crossover parameters, both intrachromosomal, $\alpha$, and interchromosomal, $\beta$. It is exactly the same as the expected probability of gene identity between two individuals at a locus in which mutation and random sampling drift balance each other (KIMURA and CROW 1964). Of course, intrachromosomal unequal crossing over and ordinary interchromosomal crossing over control the magnitude of $C_0$ and $C_I$, and only their ratio becomes independent of the crossovers.

The result may be applicable to the following observation. WELLAUER et al. (1976) investigated in detail the length heterogeneity of the spacer region of ribosomal RNA genes in Xenopus laevis and found that the heterogeneity is greater among genes taken from different individuals than among those taken from one individual, implying $C_0 > C_I$, where $C_0$ and $C_I$ represent the two measures of clonality of ribosomal RNA genes with respect to the length of the spacer. In this case, the occurrence of the length heterogeneity is attributed to unequal crossing over within the spacer region. Hence, the rate of occurrence of a new type of spacer with respect to its length should be fairly high. Note that this rate corresponds to the mutation rate, $v$, in our model.

The present study also clarifies how much variation of clonality exists among chromosomes in a finite population when mutations are selectively neutral. The application of the results awaits future research, such as that by WELLAUER et al. (1976) mentioned above.

Another application of the results is to predict the theoretical consequence of natural selection based on clonality (OHTA 1977, 1978). The variance and covariance of clonality are crucial for predicting the effectiveness of natural selection if it is based on clonality, as shown below. Probably the simplest model of selection based on clonality is the following, in which selection operates to increase the genetic diversity of the gene family. We assume that natural selection acts in such a way that the fitness of the family with clonality $c_i$ is

$$w_i = 1 - s c_i \qquad (27)$$

where $s$ is a selection coefficient, which we assume to be positive. Then we can investigate how the mean values of clonality ($C_0$ and $C_I$) change by selection. Since $C_0$ is expressed by $C_0 = \sum_i c_i p_i$, the amount of change of $C_0$ due to selection may be obtained from the frequency change of $p_i$ in the population. The change of $p_i$ by selection in one generation is

$$\Delta p_i = p_i' - p_i = \frac{1 - s c_i}{1 - s C_0}\, p_i - p_i$$

where $p_i'$ is the frequency of the $i$th type of a gene family after selection, and $\bar{w}$ is the mean fitness of the population, which is equal to $1 - sC_0$. Therefore, the change of $C_0$ by selection becomes

$$\Delta C_0 = -\frac{s\,\sigma_0^2}{\bar{w}} \qquad (28)$$

Similarly, the change of $C_I$ is

$$\Delta C_I = -\frac{2s\,\sigma_1}{\bar{w}} \qquad (29)$$
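The relation between the change of mean clonality under selection and the variance and covariance of clonality can be checked numerically with a small Python sketch. It assumes a single population described by type frequencies `p`, within-type clonalities `c` and a symmetric matrix `c_between` of between-type clonalities whose diagonal equals `c`; these names, and the use of within-population variance and covariance in place of their expectations, are assumptions made for the illustration.

```python
def clonality_summary(p, c, c_between):
    """Return C_0, C_I, the variance of c_i and the covariance of c_i and c_ij."""
    idx = range(len(p))
    mean_within = sum(c[i] * p[i] for i in idx)
    mean_between = sum(c_between[i][j] * p[i] * p[j] for i in idx for j in idx)
    variance = sum(c[i] ** 2 * p[i] for i in idx) - mean_within ** 2
    covariance = (sum(c[i] * c_between[i][j] * p[i] * p[j] for i in idx for j in idx)
                  - mean_within * mean_between)
    return mean_within, mean_between, variance, covariance


def select(p, fitness):
    """One generation of selection on family types; returns new frequencies and mean fitness."""
    mean_fitness = sum(w * q for w, q in zip(fitness, p))
    return [w * q / mean_fitness for w, q in zip(fitness, p)], mean_fitness


def change_by_selection(p, c, c_between, fitness):
    """Exact changes of C_0 and C_I after one round of selection."""
    before = clonality_summary(p, c, c_between)
    new_p, mean_fitness = select(p, fitness)
    after = clonality_summary(new_p, c, c_between)
    return after[0] - before[0], after[1] - before[1], mean_fitness
```

With fitness `1 - s*c[i]` the exact change of $C_0$ equals $-s$ times the variance divided by the mean fitness, and the change of $C_I$ equals $-2s$ times the covariance divided by the mean fitness up to terms of order $s^2$. With fitness `1 - s*(1 - c[i])` the signs are reversed.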

The above model of selection takes into account only the additive effect of the clonality of individual chromosomes, but it would be more desirable if we could incorporate dominance. For the purpose of computing the genetic variation of the gene family of a zygote, the clonality between the two different chromosomes should also be considered, since in general $c_{ij} \neq (c_i + c_j)/2$, where $c_{ij}$ is the clonality of the gene family between the $i$th and the $j$th type of chromosomes. The fitness of the zygote of the $i$th and the $j$th type of chromosome may then be defined as follows:

[Equation (30), the fitness of the zygote as a function of $c_i$, $c_j$ and $c_{ij}$, is not recoverable from the source.]

In this model, the clonality is decreased through selection on both $c_i$ and $c_{ij}$. It can be shown that the changes per generation of $C_0$ and $C_I$ by selection become as follows:

[Equations (31) and (32), giving $\Delta C_0$ and $\Delta C_I$ under this model, are not recoverable from the source.]

Thus the variance and covariance of $c_i$ and $c_{ij}$ are crucial for determining the effect of selection. The above relationship [formulas (28), (29), (31) and (32)] is analogous to the fundamental theorem of natural selection by FISHER (1930), since the rate of change of the mean fitness is directly proportional to $\Delta C_0$ and $\Delta C_I$. However, in this theorem, the classical single-locus model is treated.

Despite the simplicity of the above models, the exact derivation of moments seems to be difficult when selection is involved. By selection, the variance and covariance of clonality may be decreased. However, the amount of reduction is not large, as my previous simulation studies indicated (OHTA 1978). To clarify the amount of reduction is an important subject for future research.

A still more interesting type of selection is negative selection. In this case, the genetic diversity of a multigene family is selected against. This would apply to some classes of repeated genes with very uniform members, such as histone and ribosomal RNA genes. A simple model for fitness as a function of clonality is

$$w_i = 1 - s(1 - c_i) \qquad (33)$$

Then the changes of $C_0$ and $C_I$ due to selection become

$$\Delta C_0 = \frac{s\,\sigma_0^2}{\bar{w}} \qquad (34)$$

and

$$\Delta C_I = \frac{2s\,\sigma_1}{\bar{w}} \qquad (35)$$

Thus, selection may be quite effective for increasing the clonality (or decreasing the genetic diversity) of the gene family. In other words, natural selection may act efficiently to eliminate deleterious mutations from the gene family.

In the case of an informational multigene family such as the genes for immunoglobulins, both positive and negative selection may be operating on the family simultaneously. Deleterious mutations that somehow disturb the basic immunoglobulin activity should be selected against, whereas favorable mutations at the hypervariable sites of the variable chain that are responsible for the diversity of antibody combining ability (see CAPRA and EDMUNDSON 1977 for review) should be selected for. An unsettled problem of antibody diversity (germ line theory vs. somatic mutation theory) makes the interpretation very complicated. However, the present theory may be applicable even if only relatively few genes exist for a particular type of immunoglobulin (such as subgroups of $\kappa$ or $\lambda$ type) in a genome, as long as they form a gene family and are undergoing unequal crossing over. In addition, we can predict, based on the theory, that the gene pool of the family of a particular type of immunoglobulin gene may become much more diverse in a large population, such as human or mouse, even if only a few gene copies are present in one genome. This is because $C_I$ may get much smaller than $C_0$ with positive selection in large populations.

Another interesting problem is how a family of repeated genes differentiates from a common ancestral family. The genetic differentiation may again be measured by the probability of gene identity between two genes chosen from the two homologous gene families, each taken from different populations descended from a common ancestral population. I shall designate it as $c_{IIij}$, in which subscripts $i$ and $j$ represent the $i$th type of the family in the first population and the $j$th type in the second population, respectively.
Then the mean value ($C_{II}$) decreases at the rate $2v$ each generation through mutation, just as $C_0$ and $C_I$ do:

$$\Delta C_{II} = -2v\, C_{II} \qquad (36)$$

However, neither crossing over nor sampling of gametes would change the mean value, but selection will affect the rate of differentiation. Let us consider the previous model where $w_i = 1 - s c_i$. The change of $C_{II}$ due to selection in one generation is now a function of the covariance ($\sigma_3$) of $c_i$ and $c_{IIij}$ in the following form:

$$\Delta C_{II} = -\frac{2s\,\sigma_3}{\bar{w}} \qquad (37)$$

where the same selection is assumed to be working in both populations being compared. Now, by comparing formulas (36) and (37), it can be seen that the rate of gene differentiation may be greatly accelerated by selection. This is because the covariance of $c_i$ and $c_{IIij}$ is equal to that of $c_i$ and $c_{ij}$ at the time of splitting of the two populations, and hence it can be fairly large, as the results of the previous section suggest. Negative selection, of course, greatly decelerates the rate of differentiation. Thus, the present model appears to be suitable for investigating the evolution of the immunoglobulin gene family, in which rapid differentiation has occurred only in the hypervariable regions (mostly antibody combining sites), while the basic immunoglobulin molecule has been kept unaltered.

The adequacy of the present models awaits future investigation, but these models help treat the evolution of repeated genes at the level of populations. Further investigation along this line would be highly desirable.

I thank MOTOO KIMURA for his stimulating discussions and helpful suggestions throughout the course of this work. I also thank A. S. PERELSON and B. S. WEIR for their valuable comments on the manuscript.

LITERATURE CITED

BLACK, J. A. and D. GIBSON, 1974 Neutral evolution and immunoglobulin diversity. Nature 250: 327-328.

CAPRA, J. D. and A. B. EDMUNDSON, 1977 The antibody combining site. Sci. American 236(1): 50-59.

CROW, J. F. and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

COLD SPRING HARBOR LABORATORY, 1973 Proc. Cold Spring Harbor Symp. Quant. Biol. 38.

FISHER, R. A., 1930 The Genetical Theory of Natural Selection. Clarendon Press, Oxford.

HOOD, L., J. H. CAMPBELL and S. C. R. ELGIN, 1975 The organization, expression, and evolution of antibody genes and other multigene families. Ann. Rev. Genet. 9: 305-353.

KIMURA, M. and J. F. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725-738.

LI, W-H. and M. NEI, 1975 Drift variances of heterozygosity and genetic distance in transient states. Genet. Res. 25: 229-248.

OHTA, T., 1976 A simple model for treating the evolution of multigene families. Nature 263: 74-76.

OHTA, T., 1977 Genetic variation in multigene families. Nature 267: 515-517.

OHTA, T., 1978 Theoretical study on genetic variation in multigene families. Genet. Res. (In press).

OHTA, T. and M. KIMURA, 1971 Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68: 571-580.

PERELSON, A. S. and G. I. BELL, 1977 Mathematical models for the evolution of multigene families by unequal crossing over. Nature 265: 304-310.

SMITH, G. P., 1974 Unequal crossover and the evolution of multigene families. Cold Spring Harbor Symp. Quant. Biol. 38: 507-513.

STEWART, F. M., 1976 Variability in the amount of heterozygosity maintained by neutral mutations. Theor. Pop. Biol. 9: 188-201.

TARTOF, K. D., 1975 Redundant genes. Ann. Rev. Genet. 9: 355-385.

WELLAUER, P. K., R. H. REEDER, I. B. DAWID and D. D. BROWN, 1976 The arrangement of length heterogeneity in repeating units of amplified and chromosomal ribosomal DNA from Xenopus laevis. J. Mol. Biol. 105: 487-505.

WRIGHT, S., 1977 Evolution and the Genetics of Populations. Vol. 3. Experimental Results and Evolutionary Deductions. The Univ. of Chicago Press, Chicago and London.

Corresponding editor: B. S. WEIR