Copyright 0 1995 by the Genetics Society of America

How Often Do Duplicated Evolve New Functions?

J. Bruce Walsh

Department of Ecology and Evolutionay Biology, The University of Arizona, Tucson, Arizona 85721 Manuscript received July 3, 1994 Accepted for publication September 1, 1994

ABSTRACT A recently duplicated can either fix a null (becoming a ) or fix an (advanta- geous) allele givinga slightly different function, startingit on the road to evolvinga new function. Here we examine the relative probabilities of these two events under a simple model. Null are assumed to be neutral; linkage effects are ignored, as are unequal crossing over and gene conversion. These assumptions likely make our results underestimates for the probability that an advantageous allele is fixed first. When new advantageous are additive with selection coefficient s and the ratio of advantageous to null mutations is p, the probability an advantageous allele is fixed first is ([1 - e-'] / [ps] + l)", where S = 4N9-with Ne the effective population size. The probability that a duplicate becomes a pseudogene, as opposed to evolving a new gene function, is high unless pS %- 1. However, even if advantageous mutations arevery rare relative to null mutations, for sufficiently large populations pS%- 1 and new gene function, rather than pseudogene formation,is the expected fate of most duplicated genes.

ENE duplication is a ubiquitous featureof genome and theevolution of newgene function. Therehas been G evolution and has historically been viewed as the considerable theoretical work on the silencing (fixing predominant mechanism for theevolution of new gene a null allele) of a duplicate copy. HALDANE (1933) was functions (HALDANE 1932; BRIDGES1935; OHNO1970), among the first to suggest that would eventu- although exonshuffling (GILBERT1978) and alternative ally inactive duplicate genes in recently formed poly- splicing (SMITHet al. 1989) have recently become ap- ploids by generatingnull alleles. Hemade the im- preciated as related and important mechanisms for us- portant observation thatrandom inactivation of inga single ancestral sequence to generate two (or duplicated copies can change the linkage relationships more) functionally divergent products. While func- among active genes. Thus, even if a duplication does tional divergence of duplicated copies can occur, one not lead to a new functionally divergent gene, it can still copy can also suffer the less glamorous fate of fixing a have important evolutionary consequences by changing null allele and becoming a pseudogene.Given that both gene order. and functionally diverged duplicated FISHER(1935) pointed out that Haldane's argument genes are common features of eukaryotes, our concern for mutational silencing is not strictly correct in an in- here is the relative frequencies of these different fates. finite population. Because at least some genotypes car- In particular,how often does a duplicatedlocus become rying null alleles are expected to be selected against, a pseudogene (by fixing a null mutation) as opposed thepopulation approaches a mutation-selection bal- to fixing an adaptive mutation and hencestarting down ance where functional alleles segregate at both loci. the road to a new gene function. We assume that copy Usually the double recessive fitness model is assumed, number does not change and nullthat alleles behave as wherein if A/Brepresent functionalalleles and a/b null if they are neutral. Thissimplified model yields analytic alleles, only aabb individuals are at a selective disadvan- solutions that provide useful benchmarksfor future tage. Fisher's ideas were examined in more detail by considerations of complicating factors such as linkage, CHRISTIANSENand FRYDENBERG(1977) and KIMURA and alternative fitness models for null alleles, and changes KING (1979).For the doublerecessive model they found in copy number due to unequal crossing over. a line of equilibrium exists wherein both loci segregate functional and null alleles when both loci havethe same null mutation rates. When mutation rates differ across PREVIOUS THEORETICAL WORK ONTHE FATE OFDUPLICATED GENES loci, the locus with the higher mutation rate becomes fixed for null alleles while the other locus segregates Before considering our model, it is useful to review null alleles at a low frequency. previous theoretical work on pseudogeneformation When null mutation rates are the same at both loci, genetic drift can move genotype frequencies along the Author e-maiZ:[email protected] line of equilibrium, eventually fixing null alleles at one

Genetics 139: 421-428 (Januav, 1995) 422 J. B. Walsh locus. Hence, finite population size plays a critical role tively driven. A problem with this model is the fixation in the fixation of null alleles and NEI (1970) and NEI of the initial duplication, given it would be selected and ROYCHOUDBURY(1973) examined the fixation pro- against under this fitness scheme. cess when reversible mutation (null mutations can re- This issue of the initial fixation of the duplicate locus vert to functional copies) occurs. Under the more bio- is ignored in all the above models, which were largely logicallyrealistic assumption that null mutations are concerned with the fate of duplicate loci in recently irreversible, a duplicate locus is fixed for a null allele polyploid animals. The duplicate copy of a single locus in a finite population with probability one andthe issue is generally assumed to be fixedsimply by drift, al- becomes the expected time for this to occur. Starting though CLARK(1994) shows that there may be a slight with BAILEYet al. (1978), several workers (KIMURA and selective advantage for fixing a new duplication because KING 1979; TAKAHATAand MARUYAMA 1979; Lr 1980; it potentially masks the accumulation of null alleles. MARUYAMAand TAKAHATA 1981;TAKAHATA 1982; WAT- While there has been considerable modeling of the TERSON 1983) examined the time to silence a duplicate silencing of a duplicate locus, there has been far less locus, generally assuming unlinked loci. The general theoretical work on the actual evolution of new func- finding is that thetime for gene silencing increases with tion. SPOFFORD(1969) examined conditions under effective population size N,. TAKAHATAand MARUYAMA which segregating overdominant alleles can become (1979) show that the expected time for silencing at an fixed at alternative duplicate loci in an infinite popula- unlinked duplicate locus (assuming the double reces- tion, an analysis only concerned with the evolution of sive fitness model) is only slightly longer than the ex- existing alleles. A very different model was considered pected silencing time under the assumption that all by OHTA (1987, 1988a-c, 1989;BASTEN andOHTA null alleles are neutral unless NJ (where r is the selec- 1992), whose simulation studies start with a single mo- tion against double-recessives) is extremely large. WAT- nomorphic locus and examine the accumulation of TERSON, using diffusion approximation arguments, both null mutations and new adaptive alleles in a gene found that the expected silencing time for unlinked family whose copynumber is changing due to unequal loci under the doublerecessive model is approximately crossingover. Her model assumes finite population size Ne[log( ~NJ)- IC, ( 2Nep)]where p is the null allele muta- and incorporates a somewhat unusual fitness function, tion rate and IC, the digamma function (ABRAMOWITZ wherein the fitness of a gamete containing ki different and STEGUN1970, Sec. 6.3). Hence when null allele (functional) alleles is fitnesses are given by the double recessive model, to a reasonable first approximation the fixation of null al- w=(1 for ki 2 X leles at one locus can be treated as neutral unless 2Ner 1 - s(R - ki) for kj < R is extremely large. This is not the case under other fitness models. TAKAHATAand MARWAMA (1979) and where is the average number of functional alleles in a LI (1980) examined a partial fitness model random gamete. Ohta’s simulations incorporate several that assumes genotypes with three null alleles (Aabband key evolutionary forces (unequal crossingover, drift, auBb) are also selected against. The expected time to and mutation to both null alleles and new adaptive fix a null allele under this model is greatly increased alleles) not considered by SPOFFORD.OHTA found that relative to the neutral case. Finally, there are linkage thegene family accumulates new functional alleles effects to consider. ~LJYAMAand TAKAHATA(1981) more rapidly as the rate of unequal crossing over in- show that linkage can be quite important, especially creases, with accumulation being more rapid under in- when the population size is very large and linkage very terchromosomal (between homologues) versus intra- tight. In effect, linkage makes null alleles behave more chromosomal (between identical sister strands) cross- neutrally, speeding up the time to fixation compared ing over. with the unlinked case, although this increase in speed is small unless population size is large. A MODEL FOR PSEUDOGENEFIXATION ALLENDORF (1979) examined a very different model VS. EVOLUTION OF NEW FUNCTION of gene silencing. The above models assume drift and mutational pressure are countered to some extent by Starting with a population fixed for two (functional) selection against certain null allele genotypes. ALLEND- duplicate genes, we are interestedin whether the dupli- ORF, however, assumed a model in which selection also cate copy eventually fixesa null mutation (becoming a acts against individuals with extra copies of functional pseudogene) or else fixes an advantageous mutation alleles, so that the genotypes AABb, AuBB and AABB and hence is provided some protection (via selection) alsohave reduced fitnesses. Under this model rapid from fixing future null alleles. The understanding here silencing occurs and (unlike theabove models) the rate is that these advantageous alleles provide a slightly dif- of silencing increases with effective population size, re- ferent function or selection regime relative to alleles flecting the fact that silencing is to some extent selec- at the other locus. We make a number of simplifjmg Gene Duplication assumptions to obtain an analytic solution for the rela- Phase I: Initial Advantageous Fixation tive probabilities of these two events. Two potentially important complications ignored are changes in gene both copies functional copv number due to unequal crossing over and the -c effects of linkage. We assume the hvo copies are un- linked andhence unequal crossingover generating copy number variation is not likely a concern. We fur- ther assume that null mutations areirreversible-once FB a null allele is fixed, no advantageous mutations can pseudogene advantageous fixation arise and this particular duplication event no longer has the possibility of contributing a new gene function. Gene conversion offers an option for activating pseu- Phase 11: Continuing Differentiation dogenes but this potential complication is ignored. Fi- advantageous fixation nally, null alleles are assumed to behave as if they are H tB neutral, which is not an unreasonable approximation given the work reviewed above. When these assump- tions are violated, our model likely undmsfimnlPs the probability of new gene formation. If gene conversion occurs, some pseudogenes can eventually lead to func- I-A -D ’ ‘ pseudogene further divergence tional genes. Likewise, if some null allele genotypes are under selection, their probability of fixation is de- FIGURI:.1.-(Top) Initially our interest is whether a popula- tion fixed for a duplication (state C) first fixes a null (state creased relative to our neutrality assumption, increasing A) or advantageous allele (state B). Our concern is the final the probability than an advantageous mutation is fixed state of the population, so the actual time to fixation of a first. successful mut;ltion is ignored and instead we focus on the Because our interest is whether a duplicated gene per generation rates at which successful mutations (destined first fixes a null or advantageous allele, we formulate to become fixed) arise. These are denoted by n for null alleles and h for advantageous alleles. (Bottom) Following the fixa- this as a simple Markov chainwhere the population tion of the first advantageous mutation, there is still a chance starts in state C = population fixed for two identical that the duplicate locuswill fix a null allele. We examine this (and functional) copies andhas two absorbing states: A by considering whether the diverging locus next fixes a sec- = population fixed for pseudogene and€3 = population ond advantageous mutation (state D) or a null allele (state fixed for advantageous allele (Figure 1). This model is A). The rates for these state transitions are denoted by d and n (potentially different from rate n in the first model), only concerned with the final states and ignores the respectively. appearance of alleles that are eventually lost. Further we ignore mutations that are neitheradvantageous nor inactivate the gene. Hence, a transitionoccurs from geous to null mutations to be much less than one, per- state C when a new null or advantageous mutation arises haps by many orders of magnitude. Null mutations at and becomes fixed. For now, we assume that each new the duplicatedcopy are assumed neutral while advanta- mutation is either lost or fixed before the next occurs geous mutations are assumed to be additive withfit- (a correction for this is developed later). Lettingn and nesses 1 : 1 + s : 1 + 2s. Standard diffusion theory b denote the expected per generation rates at which (KIMURA 1957) gives these fixation probabilities as successful (destined to befixed) null and advantageous mutations arise, the probability that the duplicatelocus 2N-U(1/2N) is fixed for an advantageous allele (ends up in state B 1 for null mutations rather than state A) is = S/(1 - P-“) foradvantageous mutations (2) -I b where S = 4N,s 211% = -= (1 + n + I) ;) Here N,.I denotes theeffective population size. Thus the These per-generation rates aregiven by the expected rates of appearance of successful null and advantageous number of mutations arising each generationtimes the mutations are probability of fixation of each, (2Nu)-U(1/2N), where ZL is the mutation rate and U(1/2N) is the probability of fixation of a new mutationstarting at initial fre- quency 1/2N. Letp and pp be theper-copy, per-genera- tion mutation rates of null and advantageous alleles, Applying Equation 1, the probability uHthat the pop respectively. Typically we expect the ratiop of advanta- dation fixes anadvantageous allele at the duplicate 424 J. B. Walsh

ePS) is replaced by its expected value and pp is interpre- ted as the mutation rate to any advantageous allele. A common assumption about the distribution of fit- ness effects is that most mutations have small fitness effects, with mutations with larger effects becoming in- creasingly infrequent, which suggests that an exponen- tial distribution of (advantageous) fitness effects is not unreasonable. Under this assumption, if the mean value of selection coefficients is given by 5 the expected value of S/ (1 - e-’) is c k

.01 .1 1 10 100 1000 S where S = 4NeSand FIGURE2.-Plot of Equation 4, the probability that a dupli- cated gene fixes an advantageous (rather than a null) allele as a function of the ratio p of advantageous to null mutation rates and S = 4N2 where s is the selective advantage of advan- tageous mutations. is the trigamma function (ABRAMOWITZ and STECUN 1970, Sec. 6.4). Using the asymptotic approximations locus (and hence starts towards the evolution of a new function) rather than becoming a pseudogene is

As expected, uBincreases with both the scaled strength the rate of fixation under an exponential distribution of selection S andthe advantageous mutation rate of advantageous mutations is ratio p. Figure 2 plots this probability for various values of S = 4N2 and p. When S is either very large (so that 1 - e-~‘ = 1) or very small (1 - e-” = S) these approxima- tions reduce Equation 4 to giving the probability of a new gene function as

“‘2pL2{ Sp for Sp 1 for S 9 1 1 + SP I - (p~)” for sp 8 1 (5) These are reasonable in that when S < I, the advanta- Hence provided S = 4NeE[s] is sufficiently large or geous mutations are effectively neutral and the chance small, Sreplaces S in Equation 5. that the duplicated gene fixes an advantageous muta- A large population correction: The above analysis as- tion first is given by the ratio of mutation rates because sumes that once a mutation destined to become fixed both classes of alleles have the same probability of fixa- enters the population, it is not affected by mutations tion. Conversely,even when advantageous mutations that subsequently arise. New mutations have two distinct are strongly selected (S * 1), if they are sufficiently influences on fixation of the original mutation. First, if rare (Sp 4 1) the probability that the duplicate locus the mutation rate is sufficiently high relative to popula- is inactivated is very high. Only if p % S’is the probabil- tion size, even though all alleles in the population may ity that the duplicate copy fixes an advantageous allele descend from the original mutation, they may differ in first high. state due to new mutations. Hence, even though all Results for a distribution of selection advanta- alleles coalesce back to theoriginal mutation, the popu- geous: The aboveanalysis assumed all advantageous lation may never be fixed for the original allelic type mutations have the same selection coefficient s. If muta- as before fixation of that type additional mutations oc- tions instead have s drawn from a distribution of selec- cur in the descents of that allele. This is not a concern tion coefficients, Equation 4 applies provided S/(1 - to us as we simply lump alleles by functional differences, Gene Duplication 425 for example a nullallele with a second mutation is still regarded as a null allele. The second way new mutations can influencethe probability of fixation is, however,a significant concern 7r = exp(-ppSg) to us. New mutations can arise that alter the fitness of the original allele. For example, if the original mutation To compute note this is just theconditional expecta- is a null allele (assumed neutral), but an advantageous nT, tion mutation arises, the dynamics of fixation are altered as the null allele is no longer neutral. In a sufficiently small population this is not a problem as fixation usually E(2N(1 - yJ I Allele fixed) dt (13) occurs before the next mutation appears.In a very large population, however, the expected fixation time is long where yt is the frequency of the null allele at time t, so and new mutations affecting the probability of fixation that 1 - yt is the frequency of functional alleles. This of the original mutation can arise. If the mutation des- expectation is a standard problem in conditional diffu- tined to become fixed is advantageous, the appearance sions and can be solved using Green’s functions (MARU- of new null mutations shouldnot pose a problem as we YAMA 1977). The Green’s function for a neutral allele treat these as neutral and hence their appearance will conditioned on that allele going to fixation (in the ab not alter the fixation dynamics of the positively selected sence of mutation to other alleles), from Equation5.18 mutation. However, if the mutation destinedto become of MARUYAMA (1977) is fixed (in the absence of new mutations influencing its fitness) is a null allele, in a large population new advan- 4N, for x < y; tageous mutations can appear as the neutral null muta- +(X? y) = y 1-x (14) tion is headed towards fixation and thepossibility exists 4Ne -- for x > y, that one of these advantageous mutation will become 1-y x fixed in place of the null mutation. Hence, our analysis where x is the starting frequency of the allele going to underestimates the probability that a duplicated gene fixation. Hence the expectedtotal number of copies of fixes an advantageous mutation as some of the paths functional alleles that appear during the fixation of a counted as fixing null mutations in fact fix an advanta- null allele given that allele starts at frequency x is geous mutation. We can correct for this by noting that if 7r is the probability that no advantageous alleles are fixed on a path assumed fixed for a null allele (in the absence of‘ mutations influencing its fitness), then the corrected rates are

a< = UT, 6, = b + (1 - 7r)a (10) giving the corrected probability of fixation as In our case, n7’used in Equation 12b assumes we start with a single copy, giving bc UB, = __ - 1 - T(1 - US) (1la) a, + b,

= UB + (1 - ~)(l- US) (llb) which (as expected) exceeds uB. Substituting into Equation 12b gives We compute 7r as follows. Advantageous mutations can only arise from functionalalleles. Thus if nTdenotes 7t = exp(-77pS) the total number of functional alleles present along the entire sample path leading to fixation, the expected where number of advantageous mutationsappearing on a path of a nullallele destined forfixation (in the absence of mutations) is n+p. Because each new advantageous is N,/N times the expected number of new null muta- mutation has probability of fixation U(1/2N), theprob- tions appearing each generation (when all alleliccopies ability that none of these mutations is fixed is arefunctional). If 7pS > 1, most paths previously T = [I - U(I/~N)]”T~’”exp[-ppn,U(1/2N)] (12a) counted as fixing null alleles actually fix advantageous alleles. Taking the large population correction into ac- Since our concern(with large population size) is advan- count, Equation 11 becomes 426 J. B. Walsh

2t. The arguments used above give the rate at which successful null mutations arise as

and the rate for successful advantageous mutations as

where S = 4Ng and T = 4N,t. Setting a = t/s, the probability that the duplicate gene acquires a second advantageous mutation rather than becoming inacti- vated is S FIGURE3.-The effect of applying the large population correction (Equation 18a) when p = 0.001. The large popula- tion correction factor is 17 = 2Nep, where p is the null allele mutation rate. 17 = 0 corresponds to Equation 5, while the When t = s, four other curves plot the probability of fixation for 17 = 1, 10, 100 and 1000.

e-qP" Hence, if S P 1 selection provides a considerable barrier II 1 -___ for S P 1 (18b) 1 + ps to the fixation of a null allele before the subsequent fixation of a second advantageous allele. In particular, As Figure 3 demonstrates, unless 7 9 1 the large popula- note from Equation 5 that the condition for a high tion correction isvery minor. However, if 7 P 1, so probability of fixing the first advantageous allele is that that on average many mutations arise each generation, Sp 9 1, while the condition for fixing the second,given Equation (5) greatly underestimates the probability the first fixation, is ~?pP 1,which is much less stringent. that an advantageous allele is fixed first. Continuing differentiation followingthe first advan- DISCUSSION tageous fixation: The evolution of two functional diver- gent loci followinga geneduplicate is likely a multistep A recently duplicated locus can experience a variety process. Obviously,a key step is the fixation of an advan- of fates. On one hand, mayit fix an advantageous allele tageous allele conveying some functional difference giving it a slightly different, and selectable, function (however slight) on the duplicate copy. However, the from its original copy. This initial fixation provides sub- risk still existsthat this emerging genemay subsequently stantial protection against future fixation of null muta- fix a null allele. It is not unreasonable to assume that tions, allowing additional mutations to accumulate that the fixation of the first advantageous allele provides a refine functional differentiation. Alternatively, a dupli- selective barrier that greatly hinders thefixation of null cate locus can instead first fix a null allele, becoming mutations. We examine just how much of a barrier by a pseudogene. In such cases the original sequence iden- considering the probability of fixing second advanta- tity of that locus is rapidly eroded by the accumulation geous mutation rather than null allele. As above, a sim- of mutations (since essentially all subsequent changes ple three-state Markov model is used with A being the in this sequence are expected to be neutral) masking inactive state, B the starting state (the population is any indication of the original duplication event. While fixed for one advantageous mutation) and D that state pseudogenes are apparent evolutionary deadends, they of being fixed for two advantageous mutations (Figure nevertheless are very useful to evolutionary biologists by 1). Let p and pp be the mutation rates to null and giving a muchless biasedpicture of genome mutational advantageous mutations, respectively. Becausethe pop- events than provided by sequences under selection. ulation starts out fixed for an advantageous allele, null To examine the relative probability of these two very mutations are now deleterious (with selection coeffi- different outcomes, we examined (and then refined) a cient s), while we assume newly arising advantageous simple model of this process. Under our simplest model mutations are additivewith fitnesses 1 : 1 + t : 1 + (which ignores linkage, gene conversion and unequal Gene Duplication Gene 427

crossing over as well as assuming null mutations are tion rates of duplicate genesin Xenopus, concluded that neutral and that each new mutation is either fixed or selection likely occurs on genotypes carrying fewer than lost before the next occurs) theprobability that anewly four null alleles, implying that the probability of fixing duplicate locus fixes an advantageous allele before fix- an advantageous allele rather than anull allele is much ing a null allele is higher than the expression given by our simple model. A second assumption of our simple model is that (yS+ the two duplicate copies are unlinked. While linked duplicate genes appearto be fairly common in animals, in plants duplicate genes appear to be mostly unlinked where S = 4N2, four times the product of the effective (MCGRATHet al. 1993; THOMASet al. 1993). Extrapolat- population size and theselective advantage of an advan- ing from the work on linkage and gene silencing, the tageous allele, and p is the ratio of the advantageous effect of linkage is to make null alleles more neutral so to null allele mutation rates. Analysis of this expression that our model may actually be more accurate under shows that unless pS 9 1, a duplicate locus is much tight linkage than loose linkage. more likely to become a pseudogene than evolve a new There are, however, situations in which our model gene function. overestimates the probability of fixing an advantageous Even when Sp < 1, the probability that a duplicate allele first. The first case is positive selection for gene locus starts on the road to the evolution of a new func- silencing (ALLENDOW 1979). Thisis unlikely in that the tion by fixing an advantageous allele can be greater conditions favoring positive selection to fix a null allele than one might expect given p. For example, suppose are also likely conditions that select against the initial s = 0.01, N, = 5000, and p = 5 X lop5so that only one fixation of the original gene duplication. The second advantageous mutation appears (on average) for each set of casesis where there is selection against functional 20,000 mutations that silence the duplicated copy. In divergence. For example,suppose the original locus this case the probability of fixing an advantageous muta- was controlled by a limiting amount of transacting DNA tion rather than silencing is 1%. While this is not ex- binding factors, such as enhancers. Following duplica- tremely high, it ismany orders of magnitude larger tion there is competition for these factors among the than p. This occurs because even through advantageous two copies (e.g.,ENVER et al. 1990). Insuch cases, selec- mutations arevery rare, they have a much higher proba- tion would likelyact against mutations altering thefunc- bility of fixation than a neutral allele. Thus while many tion of either copy as this reduces the concentration of more null mutations appear, the chance that any one trans-acting factors available to copies of the gene with is fixed is very small [1/(2N)]. Conversely, whileadvan- the original function. However, a mutation that inacti- tageous mutations are very rare, each such mutation vates (or even reduces) the enhancer binding on the has a reasonable probability of fixation (2s when S 9 one the duplicate copies is effectively neutral. 1).As population size increases, the probability of fixa- Finally, we need to stress that our results assume that tion for any null allele decreases while the fixation prob- the molecular events generating the duplication leaves ability for the advantageous mutation approaches 2s, both copies completely functional. The duplication increasing the probability that an advantageous allele event itself may create anew function by generating an is fixed first. In our example, as A', increases to 50,000 internal partial duplication or chimeric sequence in and 500,000, this probability increases to 9 and SO%, the newly duplicated gene (e.g., SMITHIESet al. 1962). respectively. Alternatively, the duplication event may inactivate one As we noted above, the assumptions used in this sim- of the copies, such as duplication by processed pseu- ple model are likely conservative in that when violated dogenes (.g., WasH 1985). they result in a larger probability that an advantageous OHTA(1987,1988a-c, 1988; BASTENand OHTA1992) allele is fixed before a null allele. An especiallyim- has done extensive computer simulations on the accu- portant assumption is the neutrality of null allele geno- mulation of both null and new functional alleles in a types. Results reviewed at the beginning of this paper multigene family. Ohta's simulations are in many as- suggests any errorintroduced by this assumption is pects more realistic than the simple analytic model con- likely small when only double null homozygotes (indi- sidered here in that she started with a single copy and viduals with four null alleles) are selected against. How- allowed copy number variation to be generated via un- ever, if individuals carrying three (or fewer) null alleles equal crossing over. We regard our analysis as verycom- are also selected against, the time to gene silencing is plementary to these simulation results, whichwere considerably longer than under neutrality and hence largely concerned with a gene family under positive the probability of fixing such null mutationsis consider- selection for diversification, such as is expected to occur ably less than we have assumed, greatly increasing the with immune system genes. Our analysis is more inter- probability that an advantageous allele is fixed first. ested in the fate of very small gene families by asking HUGHES andHUGHES (1993), examining the substitu- how often an unlinked duplication of a single gene is 428 J. B. Walsh likely to lead to new gene function. Nevertheless, there CIARK,A. G., 1994 Invasion and maintenance of a gene duplication. Proc. Natl. Acad. Sci. USA 91: 2950-2954. are unified features of the two approaches. In particu- ENVER,T., N. RAICH, A. J. EBENS,T. PAPAY~NOPOULOU,F. COSTAN- lar, OHTAshowed that the quality TINI et al., 1990 Developmental regulation of human fetal-to- adult globin gene switching in transgenic mice. Nature 344: 309- 313. FISHER,R. A., 1935 The sheltering of lethals. Am. Nat. 69: 446-455. GILBERT, w., 1978 why genes in pieces? Nature 271: 501. HALDANE,J. B. S., 1932 The Causes ofEvolution. Longmans Green, was critical in determining the relative frequencies of London. null and functional alleles in her evolving gene families. HALDANE,J. B. S., 1933 The part played by recurrent mutation in evolution. Am. Nat. 67: 5-19. When R is small most sequences in the gene family are HUGHES,M. K., and A. L. HUGHES, 1993 Evolution of duplicate null alleles, while when R is large most sequences are genes in a tetraploid animal, Xenopus lamis. Mol. Biol. Evol. 10: functional. The U in Equation 23 refers to the fixation 1360-1369. KIMURA, M., 1957 Some problems of stochastic processes in genet- probabilities of new mutations, u denotes the mutation ics. Ann. Math. Stat. 28: 882-901. rate and the subscripts + and - denote advantageous KIMURA, M., and J. L. KING, 1979 Fixation of a deleterious allele at one of two “duplicate” loci by mutation pressure and random and null alleles,respectively. From Equation 1, the drift. Proc. Natl. Acad. Sci. USA 76: 2858-2861. probability that an advantageous allele is fixed first in LJ, W.-H., 1980 Rate of gene silencing at duplicate loci: a theoretical our simple model can be expressed as study and interpretation of data from tetraploid fishes. Genetics 95: 237-258. MARUYAMA,T., 1977 StochasticProblems in Population Gnrtirs. UA UI R (R for R< 1 Springer-Verlag, Berlin. MARWAMA,T., and N. TAKAHATA,1981 Numerical studies of the for R S 1 frequency trajectories in the process of fixation of null genes at duplicate loci. Heredity 46: 49-57. MCGRATH,J. M., M. M. JANCSO and E. PICHERSKY, 1993 Duplicate sequences with a similarity to expressed genes in the genome of Arabidqpsis thaliana. Theor. Appl. Genet. 86: 880-888. 1 NEI, M., 1970 Accumulation of nonfunctional genes on sheltered chromosomes. Am. Nat. 104 311-322. NEI,M., and A. K ROYCHOUDHURY,1973 Probability of fixation of Hence, the conditions for functional alleles to domi- nonfunctional genes at duplicate loci. Am. Nat. 107: 362-372. nate over null alleles inOhta’s simulation are essentially OHNO,S., 1970 Evolution by heDuplication. Springer, Berlin. OHTA,T., 1987 Simulating evolution by gene duplication. Genetics identical to our conditions for new advantageous alleles 115: 207-213. predominating over null alleles. Given the Ohta’s simu- OHTA,T., 1988a Further simulation studies on evolution by gene lations allowed for unequal crossing over and started duplication. Evolution 42: 375-386. OHI‘A,T., 1988b Timefor acquiringa new gene by duplication. with a single copy (and hence considered many of the Proc. Natl. Acad. Sci. USA 85: 3509-3512. unique evolutionary features of multigene families), OHTA,T., l988c Evolution by gene duplication and compensatory advantageous mutations. Genetics 120: 841-847. while our simple model ignores most of these features OHTA,T., 1989 Time for spreading of compensatory mutations un- suggests that our model captures robust features of the der gene duplication. Genetics 123: 579-584. evolution of new gene function. SMITH,C. W. J., J. G. PATTON and B. NADAI.-GINARD,1989 Alterna- tive splicing in the controlof gene expression. Annu. Rev. Genet. ANDY CLARKand two anonymous reviewers provided useful com- 23 527-577. ments. This work was supported by the National Science Foundation. SMITHIES,O., G. E. CONNEIL andG. H. DIXON,196‘2 Chromosomal rearrangements and the evolution of haptogolin genes. Nature 196: 232-236. SPOFFORD,J. B., 1969 Heterosis and the evolution of duplications. LITERATURE CITED Am. Nat. 103: 407-432. TAKAHATA,N., 1982 The disappearance of duplicate gene expres- ~RAMOWITZ,M., and I. A. STEGUN,1970 Handbook ojMathematica1 sion, pp. 169-190 in MolecularEvolution, Polymorphism and Functions w’thFormulas, Graphs, and Mathematical Tables. National the Ne-utral Theq, edited by M. KIMuRA,Japan Scientific Societies Bureau of Standards, Washington, DC. Press, Tokyo. AI.I.ENDORF, W., 1979 Rapid loss of duplicate gene expression by F. TAKAHATA,N., and T. MARUYAMA,1979 Polymophism and loss of natural selection. Heredity 43 247-258. duplicate gene expression: a theoretical study with application BAILEY,G. S., R. T. M. POULTERand P. A. STOCKWEI.I.,1978 Gene to tetraploid fish. Proc. Natl. Acad. Sci. USA 76: 4521-4525. duplication in tetraploid fish: model for gene silencing at un- THOMAS,B. R., V. S. FORDS,E. PICHERSKYand L. D. GOTTLIFX,1993 linked duplicated loci. Proc. Natl. Acad. Sci. USA 75: 5575-5579. Molecular characterization of duplicate cystosolic phosphoglw BASTEN,C. J., and T. OHTA, 1992 Simulation study of a multigene cose isomerase genes in Clarkia and comparison to the single family, with special reference to the evolution of compensatory gene in Arabidopsis. Genetics 135: 895-905. advantageous mutations. Genetics 132 247-252. WAI.SH,J. B., 1985 How many processed pseudogenes are accumu- BRIDGES,C. B., 1935 Salivary gland chromosome maps. J. Heredity lated in a gene family? Genetics 110: 345-364. 26: 60-64. WATTERSON,G. A,, 1983 On the time for gene silencing at duplicate CI-IrusTImsEN,F. B., and 0. FRYDENBERG,1977 Selection-mutation loci. Genetics 105: 745-766. balance for two nonallelic recessives producing an inferior dou- ble homozygote. Am. J. Hum. Genet. 29 195-207. Communicating editor: A. G. CIMK