Copyright © 1988 by the Genetics Society of America

Evolution by Gene Duplication and Compensatory Advantageous Mutations

Tomoko Ohta

National Institute of Genetics, Mishima 411, Japan

Manuscript received April 27, 1988
Revised copy accepted July 14, 1988

ABSTRACT

Relaxation of selective constraint is thought to play an important role for evolution by gene duplication, in connection with compensatory advantageous mutant substitutions. Models were investigated by incorporating gene duplication by unequal crossing over, selection, mutation and random genetic drift into Monte Carlo simulations. Compensatory advantageous mutations were introduced, and simulations were carried out with and without relaxation, when genes are redundant on chromosomes. Relaxation was introduced by assuming that deleterious mutants have no effect on fitness, so long as one or more genes free of such mutations remain in the array. Compensatory mutations are characterized by the intermediate deleterious step of their substitutions, and therefore relaxation by gene redundancy is important. Through extensive Monte Carlo simulations, it was found that compensatory mutant substitutions require relaxation in addition to gene duplication, when mutant effects are large. However, when mutant effects are small, such that the product of selection coefficient and population size is around unity, evolution by compensatory mutation is enhanced by gene duplication even without relaxation.

It has been customary to suppose that new genes evolve if mutations accumulate while selective constraints are relaxed by gene duplication (OHNO 1970; KIMURA 1983). This statement is rather vague and not quantitative. I have attempted to construct population genetic models for evolution by gene duplication (OHTA 1987a,b, 1988a,b). In these models, mutations are assumed to be definitely detrimental or beneficial, or completely neutral, and interaction among unequal crossing over, random drift and natural selection was investigated. Relaxation of selective constraints of redundant genes was not satisfactorily examined in these studies. Although exact understanding of relaxation is difficult, it is highly desirable to know how duplicated genes are tested by natural selection.

A noteworthy fact concerning the above discussion is that multigene families which were established a long time ago apparently do not enjoy relaxation of selective constraints, as can be seen for immunoglobulin genes and others (OHTA 1980; GOJOBORI and NEI 1984). Relaxation may be observed by acceleration of amino acid substitutions in evolution (GOODMAN 1976; LI 1985), and seems to be limited to the short period at duplication.

Another related observation is the pattern of molecular evolution where amino acid or nucleotide substitutions are often clustered, indicating slightly deleterious mutant substitutions compensated by others (OHTA 1973). This is understandable from our knowledge of higher-order structure of proteins or nucleic acids (WATSON et al. 1987). Thus it is reasonable to incorporate compensatory advantageous mutations into the model when the evolution of new genes is studied. By gene redundancy, the first step of slight deterioration may be accelerated, and therefore gene duplication may provide good opportunities for evolution by compensatory advantageous mutations. In this report simulation results are presented that show the importance of interaction between gene duplication and compensatory mutant substitutions, and the relationship of such acceleration to the relaxation of selective constraint will be discussed.

MODEL AND METHOD OF SIMULATIONS

The model is similar to a previous one (OHTA 1987a, 1988a), except for mutation that is compensatory. Each generation of Monte Carlo simulations consists of mutation, unequal crossing over, random sampling and selection. As before, let $\gamma$ be the rate of unequal crossing over per gene copy per generation. A diploid model is adopted here, so that unequal crossing over is interchromosomal, and the rightmost gene of a gene array pairs with the next-rightmost gene of another array at meiosis (OHTA 1988a). Note that this is nonhomologous pairing when there is a single locus on the chromosome. No lethal mutation is assumed, but with constant rate, $v$, per generation, a deleterious mutation is assumed to occur at one of the ten sites of a gene. Mutants are marked by minus integers, and the integer characterizing a gene decreases by one at each mutational occurrence of an experiment. Thus, the integer of a gene is an integer-valued indicator variable.
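The marking scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original simulation program: it reads "decreases by one at each mutational occurrence of an experiment" as a single experiment-wide counter, represents an unmutated site by 0, and lets a repeated hit on the same site overwrite the earlier mark. The names `MarkCounter`, `new_gene` and `mutate` are hypothetical.

```python
import random

SITES_PER_GENE = 10  # "one of the ten sites of a gene"


class MarkCounter:
    """Hands out the next minus-integer mark (-1, -2, ...) for one experiment.

    Assumption: marks are unique across the whole experiment.
    """

    def __init__(self):
        self.last = 0

    def next_mark(self):
        self.last -= 1
        return self.last


def new_gene():
    # Assumption: 0 denotes a site that has never mutated.
    return [0] * SITES_PER_GENE


def mutate(gene, counter, rng):
    """Place the next deleterious mark at a randomly chosen site; return the site."""
    site = rng.randrange(SITES_PER_GENE)
    gene[site] = counter.next_mark()  # assumption: a repeat hit overwrites
    return site


counter = MarkCounter()
rng = random.Random(1)
gene = new_gene()
marks = [gene[mutate(gene, counter, rng)] for _ in range(3)]
print(marks)  # the three marks, in order of occurrence
```

Under this reading, every fifth mark is a multiple of -5 and every tenth a multiple of 10, so the mark alone decides whether a mutation can trigger compensation.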

Genetics 120: 841-847 (November, 1988)

Compensation between two mutants takes place either within a gene or between genes of an array. It occurs as follows.

Within-gene compensation: When a new mutant is marked by a multiple of -5, all integers of the gene with this mutant are made positive if this gene already has a mutant marked by a multiple of -5.

Between-gene compensation: When a new mutant is marked by a multiple of -5, all integers of the gene with the mutant are made positive if another gene of the array already has a mutant marked by a multiple of -5. Note that the allelic state of only the one gene with the new mutant changes. Furthermore, for simplicity, this positive allelic state is assumed to remain even when the two compensating genes are separated by recombination. However, such cases were very rare in the present simulations. At any rate, integers of the gene are made positive when this gene or another gene of the array has already accumulated deleterious mutations.

In this report, as in my previous ones, the term "allele" is used to designate the mutational state of genes at redundant loci. Genes may thus acquire positive allelic states. They are assumed to obey the previous positive selection (OHTA 1987a, 1988a), i.e., if the number of different beneficial alleles, marked by a multiple of +10, of a diploid individual is more than the population average, such an individual enjoys a selective advantage according to the fitness function,

$$
w_{+,i} =
\begin{cases}
1 & \text{for } k_i \ge \bar{k} \\
\exp\{-s_+(\bar{k} - k_i)\} & \text{for } k_i < \bar{k}
\end{cases}
\qquad (1)
$$

where the subscript, $i$, denotes the $i$th individual, $k_i$ is the number of alleles in the $i$th individual, $\bar{k}$ is the population average, and $s_+$ is a positive selection coefficient. In the present model, the chance of being beneficial (multiple of 10) is one-half that of compensation (multiple of 5). When a gene with positive integers accumulates negative integers again, it is assumed that the gene loses its selective advantage of positive integers, and negative selection dominates.

Two models of negative selection for deleterious mutations are introduced. In one model, deleterious mutations become neutral so long as there remains at least one gene free of such mutations in the array. This is similar to my previous model (OHTA 1987a, 1988a), and in a sense is maximum relaxation by gene redundancy. Once every gene of the array accumulates one or more negative integers, selection works in terms of the total number of mutants in the array,

$$
w_{-,i} = \exp(-s_- m_i / l_i) \qquad (2)
$$

where $m_i$ is the total number of mutant sites in the $i$th array, $l_i$ is the copy number of the array, and $s_-$ is the selection coefficient. Note that, in my previous model, an array for which every gene contains one or more deleterious mutations was assumed lethal. The positive and negative selection are assumed to be multiplicative.

In the other model, no relaxation is incorporated, so that Equation 2 holds whether or not there remains a gene free of deleterious mutation. The two models may be compared to examine the effect of relaxation by gene redundancy. The simulated population is made of $2N$ gametes, and unequal crossing over, mutation, sampling and selection were carried out as before (OHTA 1988a,b). Mutation rate per gene copy per generation is $v$, and positive and negative selection formulas (1) and (2) are combined to determine the survival of a sampled individual. Each Monte Carlo experiment was continued for $100N$ generations, and 15 or 100 replications were done for each set of parameter values. As in my previous study, the products, such as $2Nv$ (= 0.1) and $2N\gamma$ (= 0 to 0.2), are chosen to be realistic.

RESULTS

The experiments fall into four groups: I, within-gene compensation and relaxation; II, within-gene compensation and no relaxation; III, between-gene compensation and relaxation; IV, between-gene compensation and no relaxation. For each group, two levels of negative selection intensity were examined, $2Ns_- = 10$ and 2. The former represents the cases where selection is strong enough to prevent random fixation of mutants if there is no unequal crossing over, and the latter represents the cases in which slightly deleterious mutant substitution may occur even under the single-locus model. As a very rough estimate, the chance of spreading of mutants during $100N$ generations becomes, by using the formula of KIMURA (1962),

$$
100 \times 50 \times 0.1 \times \frac{e^{0.04} - 1}{e^{4} - 1} = 0.38
$$

when $2Ns_- = 2$, without unequal crossing over. Here $100 \times 50$ is the number of generations ($N = 50$), 0.1 is $2Nv$, and the last factor is the fixation probability of a single mutant with $s_- = 0.02$. In order to find out the effect of gene duplication, the rate of unequal crossing over is varied between $2N\gamma = 0$ and 0.2.

To illustrate the general properties of the simulation results, some examples on the number of different beneficial alleles in a diploid individual at the $100N$th generation are shown in Figure 1 for the case of within-gene compensation. The abscissa represents the unequal crossing-over rate in terms of $2N\gamma$, and the ordinate represents the number of alleles as the average of 100 replicates. Solid and broken lines represent with and without relaxation respectively. The vertical bar is one standard error, and figures beside lines are the values of $2Ns_-$. As can be seen from the figures, unequal crossing over is effective for increasing the number of beneficial alleles through gene duplication, but the effect is more pronounced when the constraint is relaxed. It is also noted that a difference between strong and weak selection is observed for the cases with relaxation, but not for the cases without relaxation.

These effects of unequal crossing over are also found by examining other quantities such as gene divergence and copy number per genome. Positive gene divergence is the fraction of sites with positive integers averaged over all redundant loci. Figure 2 presents results of positive divergence as the average of 100 replications. Again, the abscissa is the unequal crossing-over rate ($2N\gamma$), and the ordinate is the positive divergence. Standard error is not quite meaningful, because the copy number is different among runs, and is not given here. The same tendencies can be observed on the effects of relaxation and selection intensity as in Figure 1. Gene divergence clearly increases when the unequal crossing-over rate becomes higher. When the constraint is not relaxed, little acceleration is observed for strong selection. When selection is weak, acceleration is seen even without relaxation.

FIGURE 1. Effect of unequal crossing over on the number of different beneficial alleles per genome at the $100N$th generation of the simulated population for the case of within-gene compensation (average of 100 replications). Solid and broken lines represent with and without relaxation of selective constraint. Figures beside lines are the values of $2Ns_-$. Vertical bar is one standard error. Other parameters are $2Ns_+ = 10$, and $2Nv = 0.1$ with $2N = 100$.

FIGURE 2. Effect of unequal crossing over on the positive gene divergence at the $100N$th generation of the simulated population for the case of within-gene compensation (average of 100 replications). Solid and broken lines, figures beside lines, and other parameters are same as in Figure 1.

Tables 1 and 2 present results for the copy number, the number of different beneficial alleles, the number of genes with deleterious mutations, and positive and negative gene divergence for more cases but with a smaller number of repetitions. Except divergence, the results are given as the average ± SD of 15 replications. As in previous models, the SD is as large as the mean, showing that the chance effect is very important (OHTA 1987a, 1988a,b). Effects of unequal crossing over and relaxation are found not only by the number of beneficial alleles and divergence, but also by other measures.

As seen above, relaxation has a large effect on evolution by gene duplication when compensatory advantageous mutations occur. There is another point that is relevant for discussion of the evolution of duplicated genes. Construction of phylogenetic trees is a very popular analysis of molecular evolutionists, but branch lengths are much influenced by relaxation. The two phylogenetic trees in Figure 3 show the influence. The left tree contains a branch that is free of mutation, but the right one has no such branch. The former is typical for the cases with relaxation, and the latter for the cases without relaxation. Note that the negative selection is weaker for the right tree than for the left one, and mutants accumulated in the right tree even without relaxation. By considering that the molecular evolutionary rate is roughly constant as many data show, the right tree seems to be more realistic than the left one. Thus, the maximum relaxation modeled here may not be realistic.

Under the present set of parameter values, compensation within a gene and that between duplicated genes are not very different for the evolution of gene families, but the latter is slightly more efficient than the former. This is because the chance of having compensatory mutations is increased by gene redundancy. From the viewpoint of various molecular interactions, within-gene compensation would be more realistic.

At any rate, it is clear from the present simulations that gene redundancy is needed for evolution by compensatory mutations. When selection is strong, relaxation of a selective constraint becomes a prerequisite, but when it is mild, relaxation is not necessarily required. Acceleration of amino acid substitutions following gene duplication, observed by GOODMAN (1976) and LI (1985), may reflect relaxation. Also, mildly deleterious mutant substitutions may be accelerated by gene duplication even without relaxation, if

TABLE 1

Properties of gene families at the $100N$th generation in the simulated populations for the case of within-gene compensation

$2Ns_-$  $2Ns_+$  $2N\gamma$  Copy no.        No. of different    No. of genes with       Divergence
                                              beneficial alleles  deleterious mutations   Positive  Negative

With relaxation
2        0        0.0         1.00            0.76 ± 0.12         0.24 ± 0.12             0.01      0.01
                  0.05        1.62 ± 0.75     1.08 ± 0.21         0.61 ± 0.63             0.13      0.02
                  0.1         3.81 ± 2.87     1.39 ± 0.86         2.27 ± 2.26             0.04      0.11
                  0.2         8.02 ± 0.52     1.31 ± 0.70         6.10 ± 5.41             0.04      0.20
         10       0.05        2.00 ± 1.07     1.21 ± 0.39         0.81 ± 0.87             0.08      0.04
                  0.1         4.35 ± 5.31     1.59 ± 1.19         2.80 ± 4.26             0.03      0.12
                  0.2         5.07 ± 4.61     2.05 ± 1.83         2.91 ± 2.72             0.06      0.08
         20       0.05        2.66 ± 2.04     1.33 ± 0.98         1.40 ± 1.75             0.05      0.06
                  0.1         3.33 ± 2.45     1.71 ± 0.83         1.58 ± 1.74             0.08      0.09
                  0.2         7.54 ± 5.29     3.14 ± 2.28         3.74 ± 3.22             0.12      0.10
10       0        0.0         1.00            0.99 ± 0.01         0.01 ± 0.01             0.00      0.00
                  0.05        2.01 ± 1.64     1.12 ± 0.45         0.87 ± 1.54             0.02      0.05
                  0.1         3.47 ± 2.11     1.14 ± 0.39         2.20 ± 1.92             0.05      0.13
                  0.2         6.74 ± 4.96     1.12 ± 0.30         5.43 ± 4.85             0.04      0.21
         10       0.05        1.76 ± 1.37     1.16 ± 0.59         0.61 ± 0.95             0.02      0.04
                  0.1         6.20 ± 3.95     2.07 ± 0.98         3.38 ± 3.64             0.07      0.13
                  0.2         6.11 ± 4.82     1.71 ± 1.21         3.64 ± 3.54             0.06      0.09
         20       0.05        2.29 ± 1.71     1.39 ± 0.84         0.95 ± 1.30             0.02      0.04
                  0.1         6.00 ± 4.93     2.31 ± 1.72         3.50 ± 3.43             0.06      0.10
                  0.2         7.80 ± 5.99     2.79 ± 2.05         4.52 ± 4.25             0.06      0.12

Without relaxation
2        0        0.0         1.00            0.64 ± 0.46         0.36 ± 0.46             0.01      0.01
                  0.05        1.52 ± 0.81     0.73 ± 0.45         0.68 ± 0.87             0.03      0.03
                  0.1         1.99 ± 0.84     0.53 ± 0.52         1.40 ± 0.78             0.02      0.08
                  0.2         4.10 ± 4.03     0.53 ± 0.52         3.20 ± 4.19             0.01      0.09
         10       0.05        1.95 ± 0.75     1.26 ± 0.67         0.93 ± 1.10             0.03      0.04
                  0.1         2.28 ± 1.67     1.13 ± 0.50         0.93 ± 1.04             0.02      0.04
                  0.2         5.13 ± 4.37     1.75 ± 1.18         2.42 ± 2.68             0.07      0.05
         20       0.05        2.17 ± 1.56     1.46 ± 1.18         0.91 ± 0.87             0.03      0.04
                  0.1         1.75 ± 1.20     1.20 ± 0.40         0.66 ± 0.71             0.03      0.03
                  0.2         4.32 ± 5.00     1.60 ± 1.40         2.32 ± 2.86             0.02      0.05
10       0        0.0         1.00            0.98 ± 0.07         0.02 ± 0.07             0.00      0.00
                  0.05        1.82 ± 1.03     1.00                0.16 ± 0.35             0.00      0.00
                  0.1         1.81 ± 0.79     1.00                0.06 ± 0.11             0.00      0.00
                  0.2         2.09 ± 2.25     1.00                0.30 ± 1.01             0.00      0.00
         10       0.05        1.62 ± 0.81     1.00                0.10 ± 0.26             0.00      0.00
                  0.1         2.63 ± 1.72     1.07 ± 0.25         0.39 ± 0.91             0.00      0.00
                  0.2         4.07 ± 4.55     1.52 ± 1.51         0.82 ± 1.82             0.01      0.01
         20       0.05        1.65 ± 1.20     1.00                0.17 ± 0.41             0.00      0.00
                  0.1         2.47 ± 1.12     1.13 ± 0.34         0.24 ± 0.35             0.01      0.00
                  0.2         2.58 ± 1.90     1.07 ± 0.25         0.44 ± 0.83             0.01      0.00

Figures are the average ± SD for 15 replications. Only the average is given for divergence because the SD is not very meaningful when copy number and so on are very different among the runs. For all simulations, $2Nv = 0.1$ with $2N = 100$.

the selection coefficient is averaged over redundant genes as modeled here.

DISCUSSION

For folding of proteins or nucleic acids, amino acid or base sequences play important roles. If an amino acid at the critical site of a protein is replaced by another amino acid with different properties, the folding may be disturbed, and the protein function may be seriously impaired. This is one of the major causes of genetic defects (VOGEL and MOTULSKY 1979). In such cases, it is sometimes noted that the protein function is recovered by another replacement of an amino acid at another site of the protein (WATSON et al. 1987, pp. 228-229). For example, if glycine at position 210 of tryptophan synthetase A is replaced by glutamic acid, the protein loses enzymatic function, but when an additional replacement at position 174 occurs, function is restored (YANOFSKY, HORN and

TABLE 2

Properties of gene families at the $100N$th generation in the simulated populations for the case of between-gene compensation

$2Ns_-$  $2Ns_+$  $2N\gamma$  Copy no.        No. of different    No. of genes with       Divergence
                                              beneficial alleles  deleterious mutations   Positive  Negative

With relaxation
2        0        0.0         1.00            0.53 ± 0.49         0.47 ± 0.49             0.00      0.03
                  0.05        2.51 ± 0.94     1.00 ± 0.01         1.53 ± 0.94             0.03      0.20
                  0.1         3.36 ± 1.85     1.09 ± 0.24         2.66 ± 1.52             0.02      0.16
                  0.2         5.09 ± 4.09     1.25 ± 0.40         3.41 ± 3.75             0.08      0.13
         10       0.05        2.25 ± 1.07     1.31 ± 0.44         1.23 ± 0.98             0.03      0.14
                  0.1         4.24 ± 3.26     1.85 ± 1.53         2.51 ± 2.06             0.04      0.13
                  0.2         7.77 ± 6.68     2.84 ± 2.31         4.63 ± 4.37             0.10      0.13
         20       0.05        2.14 ± 2.07     1.47 ± 1.51         0.81 ± 0.74             0.04      0.05
                  0.1         3.36 ± 3.40     2.27 ± 3.04         1.38 ± 1.42             0.05      0.06
                  0.2         9.76 ± 10.05    4.79 ± 5.35         4.38 ± 4.01             0.10      0.10
10       0        0.0         1.00            0.98 ± 0.07         0.02 ± 0.07             0.00      0.00
                  0.05        2.74 ± 1.96     1.16 ± 0.46         1.47 ± 1.67             0.04      0.13
                  0.1         2.36 ± 0.94     1.00                1.09 ± 0.71             0.00      0.07
                  0.2         7.06 ± 3.98     1.49 ± 0.66         4.89 ± 3.17             0.09      0.22
         10       0.05        2.54 ± 1.60     1.44 ± 0.85         0.91 ± 1.04             0.03      0.07
                  0.1         3.94 ± 2.56     1.83 ± 1.25         2.01 ± 1.62             0.07      0.15
                  0.2         7.64 ± 7.13     2.70 ± 2.84         4.25 ± 4.10             0.07      0.08
         20       0.05        2.93 ± 2.00     1.94 ± 1.31         0.89 ± 1.13             0.06      0.04
                  0.1         5.71 ± 5.40     3.31 ± 3.75         2.17 ± 1.85             0.07      0.06
                  0.2         14.55 ± 12.37   8.95 ± 9.11         4.00 ± 3.56             0.11      0.04

Without relaxation
2        0        0.0         1.00            0.53 ± 0.50         0.47 ± 0.50             0.00      0.03
                  0.05        2.06 ± 1.31     0.87 ± 0.52         0.83 ± 0.76             0.03      0.04
                  0.1         2.36 ± 1.90     0.53 ± 0.52         1.42 ± 1.19             0.00      0.08
                  0.2         3.32 ± 1.98     0.60 ± 0.51         2.37 ± 2.20             0.01      0.09
         10       0.05        1.87 ± 1.14     0.93 ± 0.95         0.78 ± 0.97             0.01      0.04
                  0.1         3.13 ± 2.41     1.30 ± 0.51         2.01 ± 2.09             0.03      0.07
                  0.2         5.71 ± 7.33     2.33 ± 3.53         2.90 ± 3.05             0.02      0.06
         20       0.05        2.65 ± 2.73     1.68 ± 2.13         1.05 ± 1.25             0.04      0.05
                  0.1         2.61 ± 2.88     1.93 ± 1.78         1.08 ± 1.09             0.03      0.04
                  0.2         8.14 ± 8.59     4.42 ± 5.54         3.34 ± 3.44             0.07      0.05
10       0        0.0         1.00            0.98 ± 0.07         0.02 ± 0.07             0.00      0.00
                  0.05        1.74 ± 1.02     1.00                0.11 ± 0.27             0.00      0.00
                  0.1         1.76 ± 0.91     1.00                0.11 ± 0.30             0.00      0.00
                  0.2         2.72 ± 1.73     1.00                0.21 ± 0.37             0.00      0.00
         10       0.05        1.88 ± 1.04     1.00                0.11 ± 0.26             0.00      0.00
                  0.1         1.69 ± 0.93     1.00                0.11 ± 0.28             0.00      0.00
                  0.2         3.30 ± 4.08     1.66 ± 2.46         0.45 ± 1.04             0.01      0.01
         20       0.05        1.95 ± 1.50     1.13 ± 0.50         0.11 ± 0.28             0.01      0.00
                  0.1         1.50 ± 0.68     1.00                0.03 ± 0.07             0.00      0.00
                  0.2         4.43 ± 8.83     2.43 ± 5.09         0.96 ± 2.97             0.01      0.01

Figures are the average ± SD of 15 replications. Only the average is given for divergence because the SD is not very meaningful when copy number and so on are very different among the runs. For all simulations, $2Nv = 0.1$ with $2N = 100$.
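The "with relaxation" and "without relaxation" blocks of Tables 1 and 2 differ only in how negative selection treats an array that still carries a mutation-free gene. The following Python sketch of Equations 1 and 2 illustrates that difference. It is a minimal illustration, not the original program, and rests on several assumptions: an array is a list of genes, each gene a list of integer site marks with negative values for deleterious mutants; a gene counts as "free of such mutations" when it has no negative mark; and fitness is evaluated per array, since the text does not spell out how the two arrays of a diploid are combined. The function names are hypothetical.

```python
import math


def positive_fitness(k_i, k_bar, s_plus):
    """Equation 1: no penalty at or above the population mean number of
    different beneficial alleles, exponential penalty below it."""
    if k_i >= k_bar:
        return 1.0
    return math.exp(-s_plus * (k_bar - k_i))


def negative_fitness(array, s_minus, relaxation):
    """Equation 2, with optional relaxation by gene redundancy.

    Assumption: negative marks are deleterious; zero or positive marks are not.
    """
    has_clean_gene = any(all(mark >= 0 for mark in gene) for gene in array)
    if relaxation and has_clean_gene:
        return 1.0  # deleterious mutants are neutral while a clean gene remains
    m_i = sum(1 for gene in array for mark in gene if mark < 0)  # mutant sites
    l_i = len(array)                                             # copy number
    return math.exp(-s_minus * m_i / l_i)


def combined_fitness(k_i, k_bar, array, s_plus, s_minus, relaxation):
    """Positive and negative selection are multiplicative."""
    return (positive_fitness(k_i, k_bar, s_plus)
            * negative_fitness(array, s_minus, relaxation))


# Two-copy array: one clean gene, one gene with a single deleterious mutant.
array = [[0] * 10, [-1] + [0] * 9]
s_minus = 0.02  # corresponds to 2Ns- = 2 when 2N = 100
w_relaxed = negative_fitness(array, s_minus, relaxation=True)
w_strict = negative_fitness(array, s_minus, relaxation=False)
```

In this example the relaxed model assigns the array full fitness, while the model without relaxation already penalizes the single mutant, averaged over the two copies.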

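The rough estimate of 0.38 quoted in the Results for the single-locus case can be checked numerically. The sketch below assumes the standard form of the KIMURA (1962) fixation probability for a new mutant with selective disadvantage $s$ at initial frequency $1/(2N)$, with the effective size taken equal to $N$; the variable names are illustrative only.

```python
import math


def fixation_probability(s, N):
    """Fixation probability of a single new mutant with disadvantage s > 0
    (Kimura 1962, initial frequency 1/(2N), effective size assumed equal to N)."""
    return (math.exp(2 * s) - 1) / (math.exp(4 * N * s) - 1)


N = 50                 # 2N = 100 gametes
two_N_v = 0.1          # new mutants per generation in the population
s_minus = 2 / (2 * N)  # 2Ns- = 2
generations = 100 * N

expected = generations * two_N_v * fixation_probability(s_minus, N)
print(round(expected, 2))  # 0.38
```

This is comparable in magnitude to the 2Nγ = 0 rows for 2Ns- = 2 in Tables 1 and 2, where the number of genes with deleterious mutations lies between 0.24 and 0.47.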
THORPE 1964). Restoration reflects interaction among amino acids, and in the above example, the recovered state is close to the original one, i.e., compensatory neutral mutations. Ribosomal RNA genes are other examples which show compensatory nucleotide substitutions (e.g., BRIMACOMBE 1984). KIMURA (1985) formulated the process of compensatory neutral evolution by means of diffusion equations. He found that tight linkage between the interacting sites enhances the rate of compensatory substitutions, because the two mutations are not separated.

Compensatory evolution proceeds through an intermediate deleterious state, and is related to the slightly deleterious mutation theory of molecular evolution (OHTA 1973, 1976, 1987c). This theory states that the rate of molecular evolution is determined largely by the mutational pressure of very slightly deleterious mutations, and is based on our understanding of the higher order structure of proteins and nucleic acids. If proteins and nucleic acids have evolved a long time ago, their structures are well organized, and any random mutations would disturb such an organization. When the effect is very small, random drift and mutational pressure dominate over selection. Here compensatory mutant substitutions prevent deterioration of genes.

FIGURE 3. Phylogenetic trees of duplicated genes on a sampled chromosome from simulated populations at the $100N$th generation. The left tree is for the case with relaxation, and the right one for the case without relaxation. Parameters are $2Ns_+ = 2Ns_- = 10$, $2Nv = 0.1$ and $2N\gamma = 0.05$ for the left one, and $2Ns_- = 2$, $2Ns_+ = 10$, $2Nv = 0.1$ and $2N\gamma = 0.1$ for the right one, both within-gene compensation. Figures beside the tree are mutant marks without minus sign. Beneficial ones are circled.

When the effect is not mild such that $Ns_- \gg 1$, the probability of fixation of mutants becomes very small. Even under the most favorable condition of complete linkage, mutation rate in terms of $Nv$ must be fairly high in order to have compensatory mutant substitutions within a reasonably short period of time (KIMURA 1985). For example, the fixation time of compensatory mutations is about $5N$ generations, when $4Ns_- = 10$ to 20 and $4Nv = 1$ under complete linkage. In our simulations, the corresponding parameter, $4Nv$, for compensatory neutral or advantageous mutation is 0.2/5 = 0.04, i.e., 1/25 of KIMURA's value. When this parameter is 1/25, the fixation time would become $25^2$ times larger from KIMURA's result for $s \gg v$, and one would expect $625 \times (5N) = 3125N$ generations for compensatory neutral mutants to be fixed in our simulations. This value is much longer than the length of time of our simulations. Furthermore, KIMURA's result shows that the fixation time increases further by bringing $4Ns_-$ larger than 20. Thus, there is increasing likelihood of establishing compensatory mutations from the case of single locus gene through that of duplication without relaxation to that of duplication with relaxation.

The evolution is much dependent on whether or not gene duplication is involved. In view of commonly observed redundant genes in eukaryote genomes, this point has an important bearing for evolutionary theory. Here not only compensatory neutral but also compensatory advantageous mutant substitutions need attention. There may be cases where new gene function is acquired only through an intermediate deleterious state. It would be expected that the greater the difference between the new and the old gene functions, the larger is the intermediate deleterious effect. If there exist only single copy genes, evolution of such new function would never occur. However, when the selective constraint is relaxed by gene redundancy, evolution of compensatory advantageous mutations may be much enhanced.

There are several interesting examples that apparently have evolved through a deleterious intermediate with redundant genes. The hemoglobin α-chain of the opossum is unusual in that the invariant histidine (His) at amino acid position 58 is replaced by glutamine (Gln). Also it is characterized by rapid evolution, suggesting compensatory substitutions to this unusual replacement (STENZEL 1974). In man, substitution of this His to another amino acid is known as methemoglobin and causes chronic cyanosis (GERALD and EFRON 1961). Without relaxation of the constraint allowed in redundant copies, the His → Gln substitution in the opossum hemoglobin would have been impossible.

Another example is seen in plant ferredoxins. Coupled substitutions are found between duplicated genes of horsetail ferredoxin, and the amino acid positions of the substitutions are close to each other in the higher order structure, suggesting compensatory function (TSUKIHARA et al. 1982). A final example is the unusual codon usage of Mycoplasma. In this organism, an ordinary stop codon, UGA, is read as tryptophan (Trp) (YAMAO et al. 1985). By considering GC → AT mutational pressure characteristic for this species, JUKES (1985) proposed the following steps for this evolutionary change. First, the pressure to replace G by A could lead to replacements of UGA stop codons by UAA stop codons. Second, a duplication of the gene for tRNA$^{Trp}$(CCA) occurred. Third, a mutation of one of them to tRNA$^{Trp}$(UCA) took place. Note that tRNA$^{Trp}$(CCA) reads UGG, and tRNA$^{Trp}$(UCA), UGA because of the RNA code system. Furthermore, tRNA$^{Trp}$(UCA) can decode the universal Trp codon UGG by wobble pairing. Now, if all the UGA stop codons had been replaced by UAA, the emergence of tRNA$^{Trp}$(UCA) would have been acceptable. The fourth step was the replacement of Trp UGG codons by UGA through mutational pressure. The above picture of evolutionary steps has been strengthened by the finding that the two tRNA$^{Trp}$ genes are tandemly arranged in a strain of Mycoplasma (MUTO, YAMAO and OSAWA 1987). In the above steps, gene duplication with subsequent substitution would have been beneficial for tolerating strong GC → AT pressure, since the tRNA$^{Trp}$ gene usually exists as single copy. Thus gene duplication played a crucial role in an evolutionary change that is ordinarily unacceptable.

In the usual process of evolution by gene duplication, the number of genes increases by acquiring new functions. The above examples seem to be slightly different from such a pattern. The number of genes has remained almost unchanged and gene duplication was used in a restricted period in which relaxation was needed. If a kind of positive selection as considered in the previous section works for a long period of time, the gene number increases, and the present model would be appropriate for such cases of evolution by gene duplication. Of course, we do not know how often compensatory advantageous mutations occur. My previous model, in which detrimental, neutral and beneficial mutations were taken into account without compensation, may be more realistic in some cases.

The intermediate deleterious step would be unavoidable when new genes are created by "exon shuffling," as supposed to have occurred in many supergene families (GILBERT 1985; BALTSCHEFFSKY, JORNVALL and RIGLER 1986). Thus, evolution by exon shuffling is thought to occur by using dispensable genes that are often redundant. A more elaborate modeling is needed for quantitative understanding of such processes.

I thank B. S. Weir, J. B. Walsh and an anonymous referee for their many valuable suggestions and comments. This work is supported by a grant-in-aid from the Ministry of Education. Contribution no. 1768 from the National Institute of Genetics, Mishima 411, Japan.

LITERATURE CITED

BALTSCHEFFSKY, H., H. JORNVALL and R. RIGLER (Editors), 1986 Molecular Evolution of Life. Cambridge University Press, Cambridge.
BRIMACOMBE, R., 1984 Conservation of structure in ribosomal RNA. Trends Biochem. Sci. 9: 273-277.
GERALD, P. S., and M. L. EFRON, 1961 Chemical studies of several varieties of Hb M. Proc. Natl. Acad. Sci. USA 47: 1758-1767.
GILBERT, W., 1985 Genes-in-pieces revisited. Science 228: 823-824.
GOJOBORI, T., and M. NEI, 1984 Concerted evolution of the immunoglobulin VH gene family. Mol. Biol. Evol. 1: 195-212.
GOODMAN, M., 1976 Protein sequences in phylogeny. pp. 141-159. In: Molecular Evolution, Edited by F. J. AYALA. Sinauer Associates, Sunderland.
JUKES, T. H., 1985 A change in the genetic code in Mycoplasma capricolum. J. Mol. Evol. 22: 361-362.
KIMURA, M., 1962 On the probability of fixation of mutant genes in a population. Genetics 47: 713-719.
KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
KIMURA, M., 1985 Diffusion models in population genetics with special reference to fixation time of molecular mutants under mutational pressure. pp. 19-39. In: Population Genetics and Molecular Evolution, Edited by T. OHTA and K. AOKI. Japan Science Society Press, Tokyo.
LI, W.-H., 1985 Accelerated evolution following gene duplication and its implication for the neutralist-selectionist controversy. pp. 333-352. In: Population Genetics and Molecular Evolution, Edited by T. OHTA and K. AOKI. Japan Science Society Press, Tokyo.
MUTO, A., F. YAMAO and S. OSAWA, 1987 The genome of Mycoplasma capricolum. Prog. Nucleic Acid Res. Mol. Biol. 34: 29-58.
OHNO, S., 1970 Evolution by Gene Duplication. Springer-Verlag, New York.
OHTA, T., 1973 Slightly deleterious mutant substitutions in evolution. Nature 246: 96-98.
OHTA, T., 1976 Role of very slightly deleterious mutations in molecular evolution and polymorphism. Theor. Popul. Biol. 10: 254-275.
OHTA, T., 1980 Evolution and Variation of Multigene Families (Lecture Notes in Biomathematics, Vol. 37). Springer-Verlag, New York.
OHTA, T., 1987a Simulating evolution by gene duplication. Genetics 115: 207-213.
OHTA, T., 1987b A model of evolution for accumulating genetic information. J. Theor. Biol. 124: 199-211.
OHTA, T., 1987c Very slightly deleterious mutations and the molecular clock. J. Mol. Evol. 26: 1-6.
OHTA, T., 1988a Further simulation studies on evolution by gene duplication. Evolution 42: 375-386.
OHTA, T., 1988b Time for acquiring a new gene by duplication. Proc. Natl. Acad. Sci. USA 85: 3509-3512.
STENZEL, P., 1974 Opossum Hb chain sequence and neutral mutation theory. Nature 252: 62-63.
TSUKIHARA, T., M. KOBAYASHI, M. NAKAMURA, Y. KATSUBE, K. FUKUYAMA, T. HASE, K. WADA and H. MATSUBARA, 1982 Structure-function relationship of [2Fe-2S] ferredoxins and design of a model molecule. Biosystems 15: 243-257.
VOGEL, F., and A. G. MOTULSKY, 1979 Human Genetics. Springer-Verlag, New York.
WATSON, J. D., N. H. HOPKINS, J. W. ROBERTS, J. A. STEITZ and A. M. WEINER, 1987 Molecular Biology of the Gene, Ed. 4. Benjamin, Menlo Park, Calif.
YAMAO, F., A. MUTO, Y. KAWAUCHI, M. IWAMI, S. IWAGAMI, Y. AZUMI and S. OSAWA, 1985 UGA is read as tryptophan in Mycoplasma capricolum. Proc. Natl. Acad. Sci. USA 82: 2306-2309.
YANOFSKY, C., V. HORN and D. THORPE, 1964 Protein structure relationships revealed by mutational analysis. Science 146: 1593-1594.

Communicating editor: B. S. WEIR