Downloaded by guest on October 3, 2021

The evolutionary scaling of cellular traits imposed by the drift barrier

Michael Lynch^{a,1}

^a Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287

Contributed by Michael Lynch, March 12, 2020 (sent for review January 10, 2020; reviewed by Reinhard Burger and Alan M. Moses)

Owing to internal homeostatic mechanisms, cellular traits may experience long periods of stable selective pressures, during which the stochastic forces of drift and mutation conspire to generate variation. However, even in the face of invariant selection, the drift barrier defined by the genetic effective population size, which is negatively associated with organism size, can have a substantial influence on the location and dispersion of the long-term steady-state distribution of mean phenotypes. In addition, for multilocus traits, the multiplicity of alternative, functionally equivalent states can draw mean phenotypes away from selective optima, even in the absence of mutation bias. Using a framework for traits with an additive genetic basis, it is shown that 1) optimal phenotypic states may be only rarely achieved; 2) gradients of mean phenotypes with respect to organism size (i.e., allometric relationships) are likely to be molded by differences in the power of random genetic drift across the tree of life; and 3) for any particular set of population-genetic conditions, significant variation in mean phenotypes may exist among lineages exposed to identical selection pressures. These results provide a potentially useful framework for understanding numerous aspects of cellular diversification and illustrate the risks of interpreting such variation in a purely adaptive framework.

cellular evolution | evolutionary theory | mutation bias | optimal phenotype | random genetic drift

Significance

Owing to internal homeostatic mechanisms, cellular traits may experience long periods of stable selective pressures. Nonetheless, drift and mutation still conspire to generate significant variation in mean phenotypes among phylogenetic lineages. Provided there are classes of mutations with sufficiently small effects, variation in genetic effective population sizes will result in gradients of mean phenotypes with respect to organism size across the tree of life, even in the face of constant selection. Mutation is an important determinant of such patterns, even in the absence of directional bias. Thus, a substantial amount of variation in cellular features may be a simple consequence of lineage-specific differences in the power of drift rather than a reflection of adaptive divergence.

Author contributions: M.L. designed research, performed research, analyzed data, and wrote the paper.
Reviewers: R.B., University of Vienna; and A.M.M., University of Toronto.
The author declares no competing interest.
Published under the PNAS license.
^1 Email: [email protected].
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2000446117/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.2000446117

The field of theoretical population genetics typically considers the evolutionary behavior of alleles in a generic way, usually assigning general mutation rates and selection coefficients to alternative alleles. This platform has been enormously successful at elucidating the combined roles of mutation, selection, and drift in evolutionary theory and in the dynamics of single loci and quantitative traits (1, 2). Nonetheless, there remains a disconnect between the underlying biology of specific traits and the constraints that exist on key population-genetic parameters in different molecular, cellular, or phylogenetic contexts. Often, a continuous distribution of mutational effects is assumed such that any phenotype is viewed as being accessible, but with no explicit connections with actual biological features.

The details appear to matter in a number of contexts. For example, many aspects of cell biology are effectively manifest in a digital rather than an analog manner. Transcription-factor binding sites (typically no more than 12 bp in length) can be defined in terms of the number of matches to the preferred motifs of their cognate transcription factors (3, 4). MicroRNA-binding site interactions and protein–protein interfaces can be viewed in a similar manner (5). Posttranslational modifications such as phosphorylation and ubiquitinylation (involved in many signal-transduction and protein-disposal pathways) can be described in terms of a small number of modified sites per protein (generally fewer than 20 and 6, respectively) (6). The numbers of carbon atoms and of double bonds in the fatty-acid tails of lipids influence membrane width and fluidity (7), and some have argued that specific amino acids or nucleotides may be selected for on the basis of numbers of carbon, nitrogen, or sulfur atoms (8, 9). Many other examples could be given.

The restriction of simple molecular traits to discontinuous values may have unique evolutionary consequences. For example, the optimum energy of molecular binding or transfer for a particular trait may be unattainable unless it coincides with an integer multiple of the underlying granularity. If this is not the case, the two allelic states straddling the optimum may have nearly the same fitness, resulting in an essentially neutral process of molecular evolution combined with a permanent state of suboptimal fitness. In addition, if certain suboptimal allelic states are more accessible by mutation, this can compete with the ability of selection to promote higher-fitness states. As discussed below, this is virtually always the case, even in the absence of mutation bias.

The theory developed below has two central goals. First, given the common view that all phenotypic variation can be explained in terms of natural selection, it is desirable to consider the degree to which phylogenetic lineages are free to diverge under a regime of invariant selection pressure. Here, results will be given on the extent to which mean phenotypes are expected to be subject to scaling relationships (e.g., with organism size and/or population size) associated with the power of random genetic drift (the drift barrier) and on the degree to which populations exposed to identical forces of mutation, selection, and drift are free to wander across mean-phenotype space. Second, because the primary forces of evolution, most notably the mutation rate and random genetic drift, are nonindependent across the tree of life (10), consideration will be given to how such covariation influences patterns of variation.

General Theory

We start with a simple model with $L$ equivalent sites (factors), each with two alternative allelic states, + and −, contributing

positively and negatively to the trait (Fig. 1). Under this model, for all but the two most extreme genotypes, a multiplicity of alternative functionally equivalent classes exists with respect to the number of positive alleles, $m$, defined by the binomial coefficients. As an example, for the case of $L = 4$, there are five genotypic classes ($m$ = 0, 1, 2, 3, and 4), with multiplicities 1, 4, 6, 4, and 1, respectively (Fig. 1). With equivalent fitness for all members (haplotypes) within a particular class, this variation in multiplicity of states plays an important role in determining the long-term evolutionary distribution of alternative classes. The site-specific per-generation mutation rates from the − to the + state, and vice versa, are defined as $u_{01}$ and $u_{10}$, respectively.

Unless stated otherwise, a haploid, nonrecombining population will be assumed. The absolute population size consists of $N$ individuals, so that a de novo mutation has initial frequency $1/N$. However, the effective population size, $N_e$, which is generally no greater than $N$ and often considerably smaller, governs the magnitude of random genetic drift and, as discussed below, is defined by the remaining features of the population-genetic environment (including the mutation bias and the strength of selection).

This type of biallelic model has been widely exploited in theoretical studies of the genetic structure of quantitative (multilocus) traits (2), including cell-biological features (4, 5). Most quantitative-genetic theory on such systems has focused on the equilibrium within-population variation under selection–mutation balance in infinite-sized populations. However, in finite populations, the stochasticity of mutation and drift ensures that genotype distributions are not fixed. Instead, the mean genotype is expected to wander over time within limits dictated by the strength of selection and the structure of the underlying genetic system. The goal here is to determine the long-term probability distribution of genotypic means defined by the joint forces of selection, mutation, and random genetic drift. Justification of this quasi–steady-state view derives from the fact that many internal cellular traits have functions (and cytoplasmic environments) that likely remain relatively stable for millions of years (even in the face of a changing external environment).

[Figure 1: chain of genotypic classes $m$ = 0, 1, 2, 3, 4 (x-axis: Genotypic Values). Arrows between adjacent classes are labeled with mutation rate × fixation probability: upward (− → +) $4u_{01} f_{01}$, $3u_{01} f_{12}$, $2u_{01} f_{23}$, $u_{01} f_{34}$; downward (+ → −) $u_{10} f_{10}$, $2u_{10} f_{21}$, $3u_{10} f_{32}$, $4u_{10} f_{43}$.]

Fig. 1. Schematic for the transition rates between adjacent classes under the sequential-fixation model for the case of $L = 4$ sites. This schematic readily generalizes to any value of $L$. $u_{01}$ and $u_{10}$ are the mutation rates from − to + allelic states, and vice versa, and $f_{ij}$ is the probability of fixation of a newly arisen mutation to allele $j$ from a background of $i$, defined as SI Appendix, Eq. S5. The number of + alleles in a class is denoted by $m$, and except for the two extreme classes ($m = 0$ and $m = 4$), there are multiple equivalent genotypic states within each class of genotypic values.

A Single Biallelic Locus. As a point of departure, the case of a single locus with reversible mutation will highlight several key points and provide a useful backdrop for the evaluation of more complex traits. Allele A (denoted as state 1 below) is taken to have a fractional selective advantage $s$ over allele a (denoted as state 0). For this limiting case, previous work provides an exact solution for the equilibrium mean frequencies of alleles (SI Appendix). However, because the final expression is a bit cumbersome mathematically, it is common in population genetics to use one of two approximations.

One extreme view is that drift is weak enough relative to mutation and selection that the population can be treated as effectively infinite in size, yielding SI Appendix, Eqs. S3a and S3b. A second approach assumes a situation in which, relative to the power of drift, mutation rates are low enough and the strength of selection high enough that polymorphisms are almost never present, with fixation of alternative monomorphic states being the norm, and the long-term mean frequency of the beneficial allele being

$$\tilde{p}_1 \simeq \frac{\beta e^{S}}{1 + \beta e^{S}}, \qquad [1]$$

where $S = 2N_e s$ is the strength of selection scaled to drift, and $\beta = u_{01}/u_{10}$ is the ratio of mutation rates. Under this weak-mutation/strong-selection scenario, referred to below as the sequential-fixation model, the equilibrium distribution depends only on the product of the two ratios of rates, not on the absolute values of their components. Although the concern here is with the long-term probabilities of alternative states at a particular locus, Eq. 1 was derived previously to describe the genome-wide expected frequencies of codons for particular amino acids (11, 12). Note that under neutrality ($s = 0$), all three approaches yield identical results, with the mean frequency of the A allele being the neutral expectation $\eta = u_{01}/(u_{01} + u_{10})$.

Although the deterministic and sequential models have very different implications for the standing levels of variation within populations, often in comparative biology, just a single genome will be sampled per taxon, rendering the matter of within-population variation moot. Thus, it is desirable to know the domains in which these two approximations (if either) provide the most reasonable approximation of actual long-term mean allele frequencies. Comparison with the general analytical solution, SI Appendix, Eq. S2, allows several general conclusions. First, the sequential model begins to substantially overestimate the frequency of the beneficial allele once $N_e u_{01}$ exceeds 0.01 (Fig. 2). These deviations are particularly large when outside the realm of effective neutrality, i.e., for $N_e s > 0.1$. Second, for weak to moderately strong selection ($0.1 < N_e s < 1.0$), a critical point near $N_e u_{01} = 0.1$ separates domains in which the sequential vs. deterministic model fits the data more closely. At this crossover point, both of the simple models overestimate the actual mean frequency of the beneficial allele, often substantially so. Third, the predictions of the deterministic model converge on the results of the general model once the number of beneficial mutations ($N_e u_{01}$) entering the population exceeds 1.0 per generation. As the population-wide mutation rates ($N_e u_{01}$ and $N_e u_{10}$) increase beyond this point, mutation pressure begins to overwhelm both selection and drift, and the asymptotic frequency of the beneficial allele converges on the neutral expectation, $\eta$.

Two other points are illustrated in Fig. 2. First, in the presence of mutation bias, there can exist a wide range of population sizes in which the most common allelic state does not match the optimal state; e.g., $\tilde{p}_1 < 0.5$. Second, situations may be common in which there is ample phylogenetic variation in allelic state despite the constancy of selection. For example, in the domain of the sequential model, when the directional forces of mutation and selection balance such that $\beta e^{S} = 1$, the mean frequencies of the alternative allelic states will be equal to 0.5. In the absence of knowledge on mutation bias, such cases could easily be misinterpreted as implying neutral genotypes, when in fact the population is under persistent selection in the opposite direction of mutation bias.
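As a numerical illustration of Eq. 1, the following minimal sketch evaluates the sequential-fixation frequency of the beneficial allele and checks the two special cases mentioned in the text (neutrality, and the balance point $\beta e^{S} = 1$). The function names and the parameter values in the demonstration are hypothetical choices made for this example only; they are not taken from the article or its SI Appendix.

```python
import math


def sequential_fixation_frequency(Ne, s, u01, u10):
    """Eq. 1: long-term mean frequency of the beneficial allele.

    Computes beta*exp(S) / (1 + beta*exp(S)) with S = 2*Ne*s and
    beta = u01/u10, written as a logistic function of z = S + ln(beta)
    so that very large |S| does not overflow.
    """
    z = 2.0 * Ne * s + math.log(u01 / u10)
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def neutral_expectation(u01, u10):
    """eta = u01 / (u01 + u10), the expected frequency when s = 0."""
    return u01 / (u01 + u10)


if __name__ == "__main__":
    # Illustrative (assumed) values: beneficial mutation rate is 10% of
    # the deleterious rate, i.e., beta = 0.1.
    u10 = 1e-9
    u01 = 0.1 * u10
    Ne = 1e6
    for Ne_s in (0.0, 0.1, 1.0, 3.0):
        p1 = sequential_fixation_frequency(Ne, Ne_s / Ne, u01, u10)
        print(f"Ne*s = {Ne_s:3.1f}  ->  p1 = {p1:.4f}")
```

Only the ratio $\beta$ and the scaled selection strength $S$ enter the result, which is the sense in which the sequential-fixation expectation is independent of the absolute mutation rates.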

[Figure 2: y-axis, Mean Frequency of Beneficial Allele; x-axis, Population Mutation Rate, $N_e u_{01}$ (log scale, $10^{-5}$ to $10^{0}$); curves for $N_e s$ = 0.0, 0.1, 1.0, and 3.0.]

Fig. 2. Average beneficial allele frequency (long-term average) for a single-locus model, under the general model (SI Appendix, Eq. S2, solid lines), the infinite population size model (SI Appendix, Eqs. S3a and S3b, short-dashed lines), and the sequential-fixation model (Eq. 1, long-dashed lines). Results from Wright–Fisher simulations are given as solid points for the case in which the actual and effective population sizes are equal ($N = N_e$). The mutation rate from beneficial to deleterious alleles ($u_{10}$) is three times the reciprocal rate (as would be approximately the case at a nucleotide site with a single optimal nucleotide). $N_e s$ is constant for each curve, so that the selection coefficient scales inversely with the population size. The infinite population size results use the selection coefficient inferred by this constant product.

As a central goal is to determine the relationship between the expected genotype distribution and the effective population size ($N_e$), it is desirable to perform analyses with realistic parameter values for $N_e$ and the mutation rate. Across the tree of life, $N_e$ generally falls in the range of $10^4$ to $10^9$, and the mutation rate per nucleotide site scales negatively with the ∼0.76 power of $N_e$ (10, 13). Thus, the following analyses were performed under the assumption of a deleterious-mutation rate per site (which might be a cluster of adjacent nucleotides) of $u_{10} = 0.00011 N_e^{-0.76}$, which is approximately 10 times the known rate per nucleotide site. As implied above and shown further below, provided the mutation rates are sufficiently low, it turns out that this scaling has no effect on the equilibrium distribution, which depends only on the ratio $\beta = u_{01}/u_{10}$. Unless stated otherwise, the beneficial rate ($u_{01}$) is 10% of the former one.

Two Sites. Proceeding to more complex traits, it becomes necessary to specify a function with respect to fitness. Although the concepts developed hereafter are general with respect to the form of the fitness function, here we will assume a Gaussian (bell-shaped) function relating the genotypic states to fitness such that individuals in genotypic class $m$ have fitness

$$W_m = e^{-(m - \theta)^2/(2\omega^2)}, \qquad [2]$$

where $\theta$ is the optimum phenotypic value, and $\omega$ is a measure of the width of the fitness function (analogous to the SD of a normal distribution). Selection is purely directional if $\theta = 0$ or $L$, and neutrality is approached as $\omega \to \infty$. Although $m$ is confined to integer values, $\theta$ need not be, and if $\theta$ is outside of the $(0, L)$ range, the optimum is unattainable.

With two sites, there are three possible genotypic classes, $m$ = 0, 1, 2, with the phenotypically equivalent +− and −+ states being lumped into the $m = 1$ class. For reasons described in Moderately Complex Traits, the long-term equilibrium frequencies are given by

$$\tilde{p}_m = C \cdot \binom{L}{m} \beta^{m} e^{2N_e s_m}, \qquad [3]$$

where the normalization constant $C$ is equal to the reciprocal of the sum of the terms to the right of $C$ for $m$ = 0, 1, 2. The binomial term takes on values of 1, 2, 1 for $m$ = 0, 1, and 2, and the selection coefficients, $s_m$, are measures of the class-specific deviations of fitness from some reference genotype (e.g., the class with the highest fitness, or class 0); the specific reference does not matter, as it cancels out through the normalization constant. The mean phenotype

$$\mu_m = \sum_{m=0}^{2} m \cdot \tilde{p}_m \qquad [4]$$

reduces to $2\eta$ in the case of neutrality. (Throughout, $\mu$ is used to denote a mean, in this case of $m$, the number of + alleles, and a tilde is used to indicate an equilibrium frequency.)

This expansion to a second site introduces complexities not encountered with the one-site model, all of which can be understood by reference to Eq. 3. First, for the case of $\theta = 2$ (pure directional selection), there is a progressive succession of the prevailing genotype classes with increasing $N_e$ (Fig. 3). When $N_e$ is sufficiently low to impose effective neutrality, class 0 predominates owing to the mutation bias toward − alleles. With increasing $N_e$, selection becomes more effective at promoting class 1, but there remains effective mutation pressure against class 2. Finally, with very large $N_e$, selection becomes efficient enough to drive class 2 to near fixation, thereby decreasing the incidence of class 1. These results show that, in the face of a constant pattern and strength of selection, the genotypic mean can exhibit a considerable gradient with $N_e$, owing entirely to changes in the power of drift, and also that appreciable intermediate incidences of all three genotypic classes can be expected over time in arbitrary lineages.

Second, for the case in which the optimum is straddled by the class 1 and 2 genotypes, $\theta = 1.5$, the genotypic mean ($\mu_m$) never reaches the optimum, even at very large $N_e$, and instead remains much closer to 1 (Fig. 3). This bias results because although class 1 and 2 genotypes have equivalent fitness, mutation pressure weights the frequency of class 1 by a factor of $2\beta$ (the 2 being the multiplicity of this class), but that of class 2 by the smaller factor of $\beta^2$.

Moderately Complex Traits. We now turn to the general case of $L$ sites, starting with the sequential-fixation view that mutations are rare enough that evolution proceeds in an essentially stepwise manner, with incremental changes being restricted to adjacent classes. The absolute flux rates between adjacent genotypic classes are then equal to the products of the expressions on the arrows in Fig. 1, where the numerical coefficients depend on the numbers of + and − sites within each class. Because the absolute population size $N$ influences all mutational flux rates in the same way, it is omitted as a prefactor, but both $N$ and $N_e$ have influence via the fixation probabilities, defined as SI Appendix, Eq. S5.

This linear sequential model has a relatively simple equilibrium solution. Provided there are nonzero connections between all adjacent states, the steady-state frequencies are proportional to the products of all of the coefficients pointing upwardly and downwardly toward the state of interest.
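The two-site behavior described above can be checked numerically. The following is a minimal sketch of Eqs. 2 through 4, written for arbitrary $L$; it is an illustration under stated assumptions rather than the code used for the article. In particular, the function names are hypothetical, and the selection coefficients are taken as $s_m = W_m/W_{\max} - 1$ (deviation relative to the fittest attainable class), which is one of several equivalent reference choices because the reference cancels in the normalization.

```python
import math


def log_fitness(m, theta, omega):
    """Natural log of the Gaussian fitness function, Eq. 2."""
    return -((m - theta) ** 2) / (2.0 * omega ** 2)


def class_frequencies(L, Ne, beta, theta, omega):
    """Equilibrium class frequencies from Eq. 3 (sequential-fixation model).

    Assumption for this sketch: s_m is measured relative to the fittest
    attainable class, s_m = W_m / W_max - 1.
    """
    log_w = [log_fitness(m, theta, omega) for m in range(L + 1)]
    log_w_max = max(log_w)
    # Work on the log scale so that large Ne does not overflow.
    log_terms = []
    for m in range(L + 1):
        s_m = math.expm1(log_w[m] - log_w_max)
        log_terms.append(
            math.log(math.comb(L, m)) + m * math.log(beta) + 2.0 * Ne * s_m
        )
    top = max(log_terms)
    weights = [math.exp(x - top) for x in log_terms]
    total = sum(weights)  # 1/C, up to the common factor exp(top)
    return [w / total for w in weights]


def mean_class(freqs):
    """Eq. 4: mean number of + alleles."""
    return sum(m * p for m, p in enumerate(freqs))


if __name__ == "__main__":
    beta, omega = 0.1, 5000.0  # values quoted in the text and Fig. 3 legend
    for theta in (1.5, 2.0):
        for Ne in (1e3, 1e6, 1e9, 1e12):
            freqs = class_frequencies(2, Ne, beta, theta, omega)
            print(theta, Ne, [round(p, 4) for p in freqs],
                  round(mean_class(freqs), 4))
```

With $\theta = 1.5$ and very large $N_e$, class 0 is eliminated and classes 1 and 2 are weighted $2\beta : \beta^2$, so the mean approaches $(2\beta + 2\beta^2)/(2\beta + \beta^2) = 22/21 \approx 1.05$ for $\beta = 0.1$, well short of the optimum.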

Fig. 3. The response of the long-term mean genotypic state (number of + alleles, Upper) and the underlying mean class frequencies (Lower) over a gradient of effective population sizes for a two-locus, two-allele model. The mutation rate to beneficial alleles is 10% of that to deleterious alleles. Results are given for three phenotypic optima, with the width of the fitness function $\omega$ = 5,000. In Lower panel, for each color-coded optimum, the mean frequencies of the three genotypic classes (0, 1, 2) are given as solid, dotted, and dashed lines, and for each $N_e$ sum to 1.0. Data points in Upper panel were obtained by computer simulations of a Wright–Fisher model, showing the near perfect agreement with the sequential-model approximation.

Prior results (4) lead to Eq. 3, which shows that the equilibrium frequencies of the alternative genotypes are functions of three factors: 1) the multiplicity of configurations, as defined by the binomial coefficients; 2) the ratio of mutation rates; and 3) the strength of selection scaled by the power of random genetic drift. All other things being equal, the within-class multiplicity magnifies the likelihood of residing in such a state. However, for $\beta \neq 1$, the net effect of the multiplicity is further modified by the mutation bias term, $\beta^{m}$.

Under neutrality (denoted by $n$), $s_m = 0$ for all $m$, and Eq. 3 simplifies to a binomial distribution

$$\tilde{p}_{n,m} = \binom{L}{m} \eta^{m} (1 - \eta)^{L-m}, \qquad [5]$$

with $\eta = u_{01}/(u_{01} + u_{10})$ being the expected frequency of + alleles at each site. In this case, the long-term mean and variance of the trait are $\mu_N = L\eta$ and $\sigma_N^2 = L\eta(1 - \eta)$, respectively. Comparison of Eqs. 3 and 5 shows that the steady-state probabilities of genotypes under selection are simple transformations of the neutral expectations, with each class being weighted by the exponential function of the scaled strength of selection $e^{S_m}$, with $S_m = 2N_e s_m$,

$$\tilde{p}_m = C \cdot \tilde{p}_{n,m} \cdot e^{S_m}. \qquad [6]$$

Provided $\eta$ is not overly close to 0.0 or 1.0, the neutral distribution is approximately normal, and if $0 \ll \theta \ll L$, the steady-state distribution of mean phenotypes with selection will also approach normality, with mean

$$\mu_m \simeq \frac{(\theta/\sigma_S^2) + (L\eta/\sigma_N^2)}{(1/\sigma_S^2) + (1/\sigma_N^2)} \qquad [7a]$$

and variance

$$\sigma_m^2 \simeq \frac{1}{(1/\sigma_S^2) + (1/\sigma_N^2)}, \qquad [7b]$$

where $\sigma_S^2 = \omega^2/(2N_e)$ (14). Under these conditions, the grand mean is the average of the expectations under mutation and selection alone, with each component weighted by the inverse of the variance under the relevant conditions. As $\sigma_S^2 \to \infty$, which implies a flatter fitness function, the mean and variance converge on the expectations for a purely neutral process, $L\eta$ and $\sigma_N^2$. As $\sigma_N^2 \to \infty$, which implies a flattened distribution under mutation and drift alone, the mean and variance converge on the expectations for a purely selection-driven process, $\theta$ and $\sigma_S^2$. The pivot point separating these two domains is $N_e = \omega^2/[2L\eta(1 - \eta)]$, showing that an $x$-fold increase in the width of the fitness function shifts the critical point to an $x^2$-fold larger $N_e$.

Eq. 6 predicts the steady-state distribution of the most recently fixed state, and one must acknowledge that some small amount of polymorphism exists in the time intervals between fixation. Thus, it is desirable to know how closely the preceding expressions represent the true sampling probabilities when polymorphism is allowed for. To evaluate this issue, extensive Wright–Fisher simulations were carried out, with recursive episodes of selection, mutation, and drift, for long enough periods to achieve highly precise estimates of the distributions of means to compare with the analytical expectations for $\tilde{p}_m$ (SI Appendix). Because violations of the sequential model will be a function of the absolute number of mutations entering the population per generation, it is essential to perform such evaluations with realistic parameter estimates for $N_e$ and the mutation rate, and the scaling relationship between the mutation rate and $N_e$ noted above was adhered to. The consistency of the simulation results with the analytical approximations, even with $L$ as high as 50 (Fig. 4, Left), justifies the use of Eq. 6 up to this level of granularity.

This expansion to larger numbers of sites again shows significant scaling of mean genotypic values with $N_e$ for a wide range of conditions, upholding the conclusion that situations exist in which the optimal phenotype is rarely achieved even in very large populations (Fig. 4, Left). Moreover, for fixed selection and mutation functions, the direction of scaling of the mean genotype with $N_e$ depends on the complexity of the underlying trait ($L$). If the genotypic mean in the absence of selection ($L\eta$) exceeds the genotypic optimum, there will be negative scaling of $\mu_m$ with $N_e$, regardless of the direction of mutation bias. In addition, there will always be an intermediate level of $L$, such that the mean under neutrality fortuitously equals the selective optimum, i.e., $L\eta = \theta$, at which point there is no response of the mean phenotype to $N_e$. Finally, for particular population-genetic environments, multiple genotypic classes often have appreciable expected frequencies, and depending on the direction of mutation bias, these can be both above and below the optimum (Fig. 4, Right).
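To make the relationship between Eq. 6 and the normal approximation of Eqs. 7a and 7b concrete, the sketch below computes the mean of the discrete distribution in Eq. 6 and compares it with Eq. 7a at the pivot point $N_e = \omega^2/[2L\eta(1-\eta)]$. This is a hedged illustration only: the function names are hypothetical, the parameter values ($L = 50$, $\beta = 0.1$, $\theta = 7$, $\omega$ = 5,000) are borrowed from the figure legends for convenience, and $s_m$ is assumed to be $W_m - 1$ with $W_m$ from the Gaussian fitness function (Eq. 2).

```python
import math


def exact_mean(L, Ne, beta, theta, omega):
    """Mean of Eq. 6: neutral binomial weights (Eq. 5) times exp(2*Ne*s_m).

    Assumption for this sketch: s_m = W_m - 1, with W_m the Gaussian
    fitness function of Eq. 2.
    """
    eta = beta / (1.0 + beta)
    log_terms = []
    for m in range(L + 1):
        log_neutral = (math.log(math.comb(L, m))
                       + m * math.log(eta)
                       + (L - m) * math.log(1.0 - eta))
        s_m = math.expm1(-((m - theta) ** 2) / (2.0 * omega ** 2))
        log_terms.append(log_neutral + 2.0 * Ne * s_m)
    top = max(log_terms)
    weights = [math.exp(x - top) for x in log_terms]
    return sum(m * w for m, w in enumerate(weights)) / sum(weights)


def normal_approximation(L, Ne, beta, theta, omega):
    """Eqs. 7a and 7b: inverse-variance-weighted mean and variance."""
    eta = beta / (1.0 + beta)
    var_s = omega ** 2 / (2.0 * Ne)      # sigma_S^2
    var_n = L * eta * (1.0 - eta)        # sigma_N^2
    precision = 1.0 / var_s + 1.0 / var_n
    mean = (theta / var_s + L * eta / var_n) / precision
    return mean, 1.0 / precision


def pivot_ne(L, beta, omega):
    """Ne at which sigma_S^2 equals sigma_N^2."""
    eta = beta / (1.0 + beta)
    return omega ** 2 / (2.0 * L * eta * (1.0 - eta))


if __name__ == "__main__":
    L, beta, theta, omega = 50, 0.1, 7.0, 5000.0
    for factor in (0.01, 0.1, 1.0, 10.0, 100.0):
        Ne = factor * pivot_ne(L, beta, omega)
        approx_mean, approx_var = normal_approximation(L, Ne, beta, theta, omega)
        print(f"Ne = {Ne:10.3e}  exact mean = "
              f"{exact_mean(L, Ne, beta, theta, omega):.3f}  "
              f"Eq. 7a = {approx_mean:.3f}  Eq. 7b = {approx_var:.3f}")
```

At the pivot point the two variances are equal, so Eq. 7a returns the simple average of $\theta$ and $L\eta$; doubling $\omega$ moves the pivot to a fourfold larger $N_e$, as stated in the text.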

Fig. 4. (Left) Response of the mean genotypic value to change in the effective population size, for the case in which the fitness function has width $\omega$ = 5,000, and the ratio of mutation rates is $u_{01}/u_{10} = 0.1$. (Upper Left) The effects of increasing the optimum phenotype ($\theta$), for the case in which there are $L = 5$ sites. (Lower Left) The effects of increasing the number of sites from $L = 5$ to 50, for the case in which the optimum phenotype is $\theta = 2$. In both panels, the color-coded plotted points were obtained by computer simulations of a Wright–Fisher model, showing the near perfect agreement with the sequential-model approximation. (Right) Equilibrium genotype distributions (number of + alleles) for situations in which $L$ = 10 (Top), 20 (Middle), and 50 (Bottom), with optimum genotypic value $\theta = 7.0$ and width of the fitness function $\omega$ = 5,000. Results from Eq. 6 are given for four effective population sizes.

Large Numbers of Loci. When large numbers of loci contribute to a trait, the total number of mutations entering the population per generation will often exceed 0.01, violating the conditions necessary for the sequential model to closely approximate the steady-state distribution. One such example is the cellular growth rate, which likely depends in some way on essentially every nucleotide in the genome, e.g., the simple bioenergetic cost of a nucleotide pair (15). As a first-order generalization, one expects natural selection to relentlessly promote an organism's ability to convert energy and other limiting resources into biomass, and persistent directional selection may apply to many other traits such as catalytic rates and error minimization. To analyze the evolution of traits under such selective conditions, an exponential fitness function with independent mutational effects was evaluated,

$$W_m = (1 - s)^{(L - m)}, \qquad [8]$$

where $m$ is the number of beneficial alleles in the genome.

For random genetic drift to impose a significant barrier to the evolution of such a trait, there must be a substantial pool of deleterious mutations with effects small enough that they can drift to fixation in species with small $N_e$, and this process will be further facilitated if mutations are biased in the negative direction. Numerous lines of evidence are consistent with both conditions. First, studies of serially bottlenecked mutation-accumulation (MA) lines across diverse species consistently reveal a slow per-generation decline in growth rate and other fitness traits, in accordance with a strong bias of mutations toward deleterious effects (16, 17). Statistical inferences based on the distribution of MA-line performance imply highly skewed distributions of fitness effects, with the modes for both deleterious and beneficial mutations being indistinguishable from zero and the bulk of the distributions having absolute effects <1% (18–20). Second, indirect inferences derived from the site-frequency spectra of segregating alleles commonly suggest that 10 to 40% of mutations in diverse organisms have deleterious effects smaller than $10^{-5}$, with the mode of this pool again being near (if not at) 0.0 (21–25). There are theoretical reasons for expecting this (26), and as these studies focus only on nonsynonymous sites in protein-coding genes, the true distribution can be expected to be even more skewed toward near-zero values. Third, considerations of the bioenergetic costs of small nucleotide insertions, which typically comprise ∼10% of de novo mutations (27), imply fractional reductions in fitness typically far below $10^{-4}$ (15). Thus, the existence of large pools of mutations with deleterious effects small enough to allow fixation in some lineages but large enough to ensure removal by selection in others is not in doubt.

These observations allow some simple qualitative statements about the limits to selection. Sites with effects ($s$) smaller than $1/N_e$ will be highly susceptible to the vagaries of genetic drift and have expected deleterious-allele frequencies near $u_{10}/(u_{10} + u_{01})$. This quantity depends only on the mutation-rate ratio ($\beta = u_{01}/u_{10}$), not on the absolute mutation rate. In contrast, when $N_e \gg (1/s)$, selection overwhelms drift, and the expected frequency of deleterious alleles is approximately $u_{10}/(u_{10} + s)$, which is near zero when the strength of selection greatly exceeds that of mutation. The central issue is then the degree to which specific classes of deleterious alleles move from the domain of accumulation by mutation pressure to the domain of effective purging by selection as the population size increases. A complication that emerges with large $L$ is that the effective population size ($N_e$) is often orders of magnitude below the actual

population size ($N$) owing to selective interference between simultaneously segregating mutations, which greatly elevates the level of $N$ required for the effective purging of mildly deleterious alleles.

Although the ultimate goal is to derive analytical expressions for the distribution of mean phenotypes in this large-$L$ domain, the system is sufficiently complex that it is necessary to perform Wright–Fisher simulations to determine the degree of validity of any equations that emerge. For these analyses, for each $L$, it was assumed initially that $s = 1/L$, which is roughly consistent with the empirical observation that the frequency of mutation types is a decreasing function of the selective effects. Each set of simulations involved a single fixed value of $L$: $L = 10^4$, $s = 10^{-4}$; $L = 10^5$, $s = 10^{-5}$; and $L = 10^6$, $s = 10^{-6}$. Example outcomes of the grand means are given in Fig. 5, along with the limiting expectations for single sites (from SI Appendix, Eq. S2, which assumes no background interference from […]). These results highlight the expectations noted above: at sufficiently small $N$, the mean frequency of + alleles converges on the neutral expectation, whereas at large $N$, it converges on frequencies close to 1.0. Notably, however, the gradient in the mean frequency of + alleles is much shallower in the case of multiple linked loci than in the ideal single-locus case, particularly when $s$ is small.

The key remaining challenge is to obtain an approximate analytical expression for the equilibrium behavior in Fig. 5. As outlined in SI Appendix, despite some intriguing relationships, a solution from first principles has not been obtained. Nonetheless, the general scaling behavior in Fig. 5 suggests a formula of the form

$$\tilde{\mu}_m \simeq \frac{u_{01} L}{u_{01} + u_{10}\, f(s, N)}, \qquad [9]$$

where $f(s, N)$ is a function of $s$ and $N$. For Eq. 9 to yield appropriate behavior, $f(s, N)$ must asymptotically approach 0 and 1 in the limits of large and small $N$, respectively. A simple expression that allows for such behavior, $f(s, N) = e^{-2Ns}$, is attractive because this yields the known result for the sequential model in which selection operates on a set of single sites without selective interference (Eq. 1), and Charlesworth (ref. 28, equation A13) suggested this as a solution. However, whereas Eq. 1 yields correct results in the limits of $sN \ll 1$ and $\gg 1$, the predictions are too high at intermediate $sN$. The likely reason for the poor fit in this region of parameter space is that the genetic effective population size ($N_e$) can be substantially lower than $N$, owing to parallel selection operating on multiple polymorphic sites.

[Fig. 5 plot: three panels, $u_{01} = 0.01u_{10}$, $u_{01} = 0.1u_{10}$, and $u_{01} = u_{10}$; x axis, Effective Population Size ($N_e$); y axis, Relative Growth Rate; series $L = 10^4$, $s = 10^{-4}$; $L = 10^5$, $s = 10^{-5}$; $L = 10^6$, $s = 10^{-6}$.]

Fig. 5. The equilibrium mean frequency of + alleles (i.e., $\tilde{\mu}_m$ relative to the maximum possible value, $L$) as a function of the absolute population size. Results are given for three different levels of mutation bias for three numbers of loci ($L$) with equivalent additive effects on the growth rate and multiplicative effects on fitness. Absolute mutation rates are defined by the empirical scaling relationship noted in the text. Analytical results (solid lines) are given for the case of free recombination, and simulation results (dashed lines and data points) for the case of complete linkage. Throughout, $s = 1/L$, the inverse of the number of factors contributing to the trait.

To gain insight into this matter, Eq. 1 can be rearranged to yield the effective population size necessary to yield the mean frequency of + alleles, $\tilde{p}_1 = \tilde{\mu}_m / L$, observed with computer simulations,

$$N_e = \frac{1}{2s} \ln\left[\frac{\tilde{p}_1}{\beta\,(1 - \tilde{p}_1)}\right]. \qquad [10]$$

As can be seen in Fig. 6, there appear to be three central determinants of $N_e$ using this formulation. First, provided the strength of selection is at least $0.1/N$, $N_e/N$ scales negatively with the ∼3/4 power of $Ns$, reflecting the increased involvement of selective interference. Second, $N_e/N$ increases weakly as the incidence of beneficial mutations declines, scaling negatively with the ∼1/4 power of $\beta$, presumably reflecting the reduced competition among simultaneously segregating beneficial mutations. Third, there is a further depression in $N_e/N$ with increasing $L$, presumably reflecting more background variation in fitness resulting from elevated numbers of simultaneously segregating polymorphisms.
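The relationship between Eqs. 9 and 10 can be checked numerically. The following is a minimal sketch, not code from the paper: it evaluates Eq. 9 under the trial choice $f(s, N) = e^{-2Ns}$ and then applies Eq. 10 to recover the population size from the resulting allele frequency. The function names and the absolute mutation rates are hypothetical, chosen only for illustration, and $\beta$ is taken to be $u_{01}/u_{10}$, which is the definition under which Eq. 10 follows algebraically from Eq. 9.

```python
import math


def mean_plus_alleles(L, u01, u10, s, N):
    """Eq. 9 with the trial function f(s, N) = exp(-2 N s).

    Returns the expected number of + alleles across L sites.
    """
    f = math.exp(-2.0 * N * s)
    return u01 * L / (u01 + u10 * f)


def implied_effective_size(p_plus, beta, s):
    """Eq. 10: the effective size that reproduces a mean + allele frequency.

    beta is assumed to be u01 / u10.
    """
    return math.log(p_plus / (beta * (1.0 - p_plus))) / (2.0 * s)


# Hypothetical parameter values for illustration only.
L = 10**4
s = 1.0 / L          # s = 1/L, as in the simulations described in the text
u10 = 1e-9           # assumed deleterious mutation rate per site
u01 = 0.1 * u10      # one of the mutation-bias levels shown in Fig. 5
beta = u01 / u10

for N in (1e3, 1e4, 1e5):
    p = mean_plus_alleles(L, u01, u10, s, N) / L
    # Without interference, Eq. 10 returns the N that was put in.
    print(N, p, implied_effective_size(p, beta, s))
```

In this interference-free setting the round trip returns the input $N$ by construction. Applying `implied_effective_size` to frequencies observed in simulations of linked sites is what would reveal the depression of $N_e$ below $N$ discussed in the text.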

6 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.2000446117 Lynch Downloaded by guest on October 3, 2021 Downloaded by guest on October 3, 2021 Lynch the effects. in equal least with at mutations interference, of selective context of effects be collective must the Eq. ture there involved, although selec- terms Thus, individual particularly second-order S2). small Fig. additional are Appendix, with that (SI loci and effects of values tive space numbers parameter simulated large of with of accurate range 10% full within the generally for are for that occupied substituting sites predictions by of alleles fraction favorable mean by the for expectation equilibrium from alleles. derives beneficial genotypes segregating beneficial simultaneously promoting in interference the increases 100 of factor decreasing sites of number being parameter key the then Eqs. Together, Eq. on text, based the behavior expected in the map lines solid the whereas and so io infiac.Tekysae eeto aaee in parameter selection 11 scaled key The Eq. significance. and minor deterministically of nearly is behaves population for the except param- case of 6), range (Fig. full space the for eter observations with consistent is which single a in included be can three relationships. all scaling expression, inspection, three by these However, encapsulating expression cal modifying and size) population each at varying (although 6. Fig. ihEq. With ertal,i a o enpsil odrv nanalyti- an derive to possible been not has it Regrettably, by L 1, = S 2s N S onsaeetmtsfo h vrg eairi simulations, in behavior average the from estimates are Points 1/s. ersini h fetv ouainsz safnto of function a as size population effective the in Depression clsol lwywith slowly only scales n provided and , e 2 = β ' svre ykeigtedltrosmtto rate mutation deleterious the keeping by varied is S 11 [2.5 + 1 N nyb atrof factor a by only e s 11 nhn,i hnbcmspsil oepesthe express to possible becomes then it hand, in L a hnb prxmtdb utpyn Eq. 
multiplying by approximated be then can , sas olna,wt 0-odices in increase 100-fold a with nonlinear, also is and S log ' 12 10 S Ns 2.5 ( (Ns yafactor a by L) hwta provided that show > log −1 5(Ns hsrdcsfrhrto further reduces this 1, /β 10 − Ns Ns ( N ) L) ln 1/4 /β ial,decreasing Finally, '6. > n h uainbias mutation the and − β h feto nraigthe increasing of effect The . ) hwn httebl of bulk the that showing '3, ] 100 ln 2.5 1/4 · β N 1/4 and e β · nEq. in . (Ns Ns 11 s > ) snttosmall, too not is per ocap- to appears 3/4 10 u hsyields This 1. 01 −5 . , snoted As 11. u nwhich in , 10 constant β β N with , Ns, e [12] [11] ya by /N β L , vr h fet flnaebcm nraigysbtnil With substantial. increasingly become linkage how- of lengths, effects block the increasing ever, With smaller. moderately only are (Eq. model, expectations muta- fitness the site exponential For the 7. and Fig. mean in herein, given employed are rates (L) tion lengths segment of range of range the in fall commonly 10 may of lengths range block the bining in falling of generally factor species a by length for linkage-block allowing uni- and every many asexually, most recombination reproduce in However, primarily frequency species populations. allele cellular unicellular expected sexual the obligately of description Eq. time experi- that sonable coalescence of suggesting the probability alleles, in high individual event a for recombination have one will bp least 2 at as encing small as span a of 13), range the in span monly a within of events length recombination order of of the number expected on time the is tion mean population The a within arm. 2N the alleles chromosome on of per generally coalescence crossover is to there one as of genome chromosomes, overall order of the numbers on and depend numbers size exact The species. lular (c bases to adjacent between rate into nation genome of timescale the the on subdividing nonrecombining coalescent. 
by the effectively mat- simultaneously are treated behav- the sites that be multiple blocks overall address with can to the system way recombining segregating that a simple demonstrated of a ior They by provides qualitatively. introduced (30) ter argument al. extremely an et be introduc- Good but would the simulations computationally, here, the employed demanding into nonrecombin- recombination sizes be of population to tion the assumed Given been have ing. modeled being ments Recombination. of convenient Effects mathematically neutrality, the effective of pheno- of model. mean principles limitations infinitesimal a the the advance under illustrating to fit- expected selection total as directional of type, of subdivision ability loci, 5 diminishes puz- progressively the Fig. of contributions the fine in all numbers increasingly results has into finite despite ness the with notably possible contrast, that In more is show miniscule. and selection being directional genome) effects (bounded allelic all loci that the that genetic feature of of fact zling size numbers the the finite with mathemat- by by inconsistent its encoded is are Despite model traits loci. that this individual such change elegance, are at no loci, ical effectively frequencies traits of with allele evolve small number of traits in infinitesimally of infinite values phenotypes with an mean alleles the genotypic over by distributed the influenced effects (2, be model, genetics to this quantitative assumed in Under used 29). widely half-Gaussian model a “infinitesimal” of case the in different, given for is are function expression relationships fitness derived scaling similarly exact a the although Finally, of use Eqs. 
in of indicated range wide scalings bias a mutational for the results on with tion, only depend rate to mutation appears the mean of scaling lar olwn hs og udlns eut fsmltoswt a with simulations of results guidelines, rough these Following recombi- average the that know we (31), survey prior a From the to contrast interesting an provide also results These particu- a on relied analyses preceding the although Notably, 5 10 e pi uhorganisms. such in bp hpod)to (haploids) −5 + s o nclua uaytsand eukaryotes unicellular for 1 = leefeunisaevr ls otoefrtesingle- the for those to close very are frequencies allele L b /L is ntepeeigaaye ( analyses preceding the in 2c o 4N L b e N x 10 dpod)gnrtos ooe hsdura- this over so generations, (diploids) e when 1) eeain ol xadteaverage the expand would generations 7 to oti on,tecrmsmlseg- chromosomal the point, this To 11 IAppendix. SI to 4c and 10 o L 8 L b L o nclua uayts(10, eukaryotes unicellular for 12 N niaeta h approximate the that indicate ≤ e x hs because Thus, . r o euirt fthe of peculiarity a not are n hs for those and 10, With . N 10 0 si h ag of range the in is ) NSLts Articles Latest PNAS h vle genotypic evolved the , −9 10 1 i.S3). Fig. Appendix, SI ih ev sarea- a as serve might 4 to N to 10 e 10 nmulticellular in −7 6 nonrecom- , o multicel- for N β naddi- In . e L | scom- is 10 100 = f10 of 7 10 4 −7 to

[Fig. 7 plot: three panels, $u_{01}/u_{10} = 1.0$, $0.1$, and $0.01$; x axis, Population Size ($10^4$ to $10^8$); y axis, Mean Allele Frequency (0.0 to 1.0); curves labeled by Number of Sites ($1$, $10$, $10^2$, $10^3$, $10^4$, $10^5$).]

Fig. 7. Expected frequencies of beneficial alleles as a function of $N_e$ and length of the chromosomal segment (in number of sites), for the case in which $s = 10^{-5}$. Dots are results from computer simulations, whereas the dashed line is the single-site expectation given by SI Appendix, Eq. S2.

$L = 1$, a transition from the neutral expectation to near fixation occurs over a 10-fold range of $N$, whereas with $L = 10^5$ this transition unfolds over a four orders-of-magnitude range of $N_e$. This is consistent with the gradual three orders-of-magnitude reduction in $N_e/N$ with increasing $N$ noted in Fig. 6. The key point here is that because linkage-block lengths are expected to decline with increasing $N_e$, in actual comparative analyses of different species, the scaling of the drift barrier with $N_e$ is expected to be a […] of the kinds of functions shown in Fig. 7.

Discussion

Although much of population genetics is focused on standing levels of variation and the impact on short-term responses to selection (2), comparative biology is generally concerned with the divergence of mean phenotypes among phylogenetic lineages. There is, therefore, a need for theory focused more on such long timescales (32, 33). The approach taken here starts with the assumption that the forces of mutation, drift, and selection remain relatively constant over long periods of time (tens to hundreds of millions of years), a feasible scenario for many intracellular traits. Even if the steady-state assumption is not fulfilled in every respect, the results still provide insight into the relative likelihoods of lineages wandering into alternative phenotypic states under a given set of population-genetic conditions (34), and the suggested approach can be readily modified to allow for additional factors such as fluctuating selection or alternative mutation functions. This framework of jointly accounting for the forces of selection, mutation, and drift is similar in spirit to the steady-state distributions of allele frequencies at single diallelic loci developed by Wright (35) and provides a conceptual contrast to the common approach among empiricists of simply assuming that the strength of selection is so overwhelming that observed mean phenotypes must precisely reflect optima dictated by the forces of selection.

The results here illustrate the riskiness of such an optimization approach. Mutation can cause mean phenotypes to deviate from the optimum in substantial and often unexpected ways that are not simply a function of the magnitude of mutation bias. Rather, when alternative, functionally equivalent underlying genotypes exist for a trait, the multiplicity of certain intermediate combinations can result in a mutational pull of the mean phenotype away from the optimum. This effect becomes especially significant when the phenotypic optimum is far from the expected mean under mutation alone and even more so if the level of multiplicity for the optimum is small relative to other phenotypic states. Such effects can extend to populations with the largest known effective sizes, and, as noted previously (14), cases may even exist in which the selection gradient is sufficiently strong that the equilibrium mean-phenotype distribution can have two peaks, one driven by selection and the other by mutation.

With an increased understanding of the rates and molecular spectra of mutation, the approximate bounds on $N_e$, and the relationships between these two (10, 13), for cell biological features with well-understood genetic bases, this framework now provides a foundation for exploring the consequences of alternative selection functions for the scaling and diversification of cellular features with lineage-specific estimates of $N_e$. The challenge for future empirical work will be to obtain estimates of the key functional parameters, which minimally reduce to: 1) $L$, the number of sites (loci) encoding the trait; 2) the ratio of forward to reverse mutation rates, $u_{01}/u_{10}$; 3) $N_e$, the effective population

8 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.2000446117 Lynch Downloaded by guest on October 3, 2021 Downloaded by guest on October 3, 2021 Lynch in species among vary to large to free a appears selection Nonetheless, phosphosites 36–38). purifying of (6, under fraction status phosphory- are phosphosite many Comparative their threonines that retain charge. and simply indicate local serines may adequate and lated feature protein an yeasts critical a in of the of studies acquisition and surface the the regions, be Phosphory- on disordered level. clustered in which activity typically or proteins, modifying are on in sites role residues lation a acid plays amino commonly particular of rylation in freedom of degrees increased system. the overall selection) the constant to of owing face rapidly the increases (in As states divergent divergence. of complete probability of chance 1/9 a muta- the with with of site consistent nucleotide scaling per negative examples rate the striking tion is most hypothesis an the opti- drift-barrier the is of this to there One relative accessibility whether state. mutational on mum of deficit depending a to negative, or respect or excess with positive means man- then be genotypic multiplicative drift) can of a of gradients power selection, in Expected the distribution of ner. by baseline absence (scaled the the function transforming in selection genotypes the dictates of latter with The distribution loci). of null gran- number the the the by (i.e., influenced system also the of are ularity but bias, mutation and selection with scale means parameter. one just is there function, fitness two 2) just are 1) there selection, parameters: function. 
stabilizing fitness-function fitness Gaussian a on to focus values a genotypic With of mapping the 4) and size; etn hs w ttsbc otecascnann two containing states class nonoverlapping the to with back but states a two to these have leading verting lineages factors, both in different mutations involve will this + that chance a 50% state experiences initial species with descendant trait exam- species for four-factor Consider, a isolated class. ple, functional constitutions same above, the genetic in illustrated underlying residing despite nonoverlapping those to like diverge can systems in dom consequences are constraints. relationships raises biophysical such of that This ecol- assume physiological studies. simply and that evolutionary comparative ogy as as large such extreme fields in for as expected challenges least be at growth can exponents as 0.20 with (such scalings selection that power-law suggest directional 6 rate), persistent Fig. under in illustrated traits par- results for the for of mutations on kinds the with depending traits, associated vary ticular effects will fitness results of exact distribution the Although (10). life .W aiz .C lrc,T cnoh .Nehm .Eas feto chain of Effect Evans, E. Marli P. Surdin-Kerjan, Needham, Y. Baudouin-Cornu, D. P. McIntosh, 8. T. Olbrich, phosphorylation protein C. of K. Turnover Rawicz, Moses, M. W. A. Zarin, 7. T. Freschi, L. Landry, R. C. proteins. of 6. states multimeric the of diversification Evolutionary along Lynch, interactions M. intermolecular 5. of meandering Evolutionary Hagner, K. Lynch, M. 4. L M. Willmann, S. Berg, J. Lynch, 3. M. Walsh, B. J. 2. Charlesworth, D. Charlesworth, B. 1. −− − soeeapeo uhaseai,cnie h phospho- the consider scenario, a such of example one As genotypic which by magnitude and direction the Notably, ial,i swrhntn htbcueo h ere ffree- of degrees the of because that noting worth is it Finally, rti tmccomposition. atomic bilayers. protein lipid of elasticity on (2000). 
unsaturation and length selection. stabilizing under evolving U.S.A. Sci. Acad. barrier. drift the sites. 2018). Press, University 2010). ω h it ftefins ucin n iha exponential an with and function; fitness the of width the , M vl Biol. Evol. BMC and − 22–22 (2013). E2821–E2828 110, rc al cd c.U.S.A. Sci. Acad. Natl. Proc. + −− 2(2004). 42 4, N e vlto n eeto fQatttv Traits Quantitative of Selection and Evolution r o ipyfntoso h atr of pattern the of functions simply not are ofiuain.Snl needn reverse independent Single configurations. si,Aatv vlto ftasrpinfco binding factor transcription of evolution Adaptive assig, ¨ Science lmnso vltoayGenetics Evolutionary of Elements rn.Genet. Front. 9–0 (2001). 297–300 293, θ + h hntpcotmm and optimum, phenotypic the , N − − 3–3 (2014). E30–E38 112, + r,D hms oeua vlto of evolution Molecular Thomas, D. ere, e 2/9 ´ bevdars h reof tree the across observed − → + 4 (2014). 245 5, on rbblt fcon- of probability joint + + and L rniin hr sa is there transition, eoe agr the larger, becomes fec ftwo of each If −−. − ipy.J. Biophys. + + implying −, W .Freeman, H. (W. 328–339 79, rc Natl. Proc. + (Oxford sites N e 6 .Lynch M. . a 16. of costs bioenergetic The Marinov, features. K. G. biological Lynch, M. cell 15. of diversification Phylogenetic Lynch, M. 14. Long H. 13. usage. codon synonymous of theory selection-mutation-drift The Bulmer, M. 12. 1 .H i oeso erynurlmttoswt atclrimplications particular with mutations neutral nearly of Models Li, H. W. 11. Lynch M. 10. em fsau n oain(94) o xml,only example, For (39–42). location all and of status of terms n S wr C-586.ItakG osa,T asn .Ji,and Jain, K. Hansen, T. Bolstad, R35-GM122566-01, G. comments. Award thank helpful NIH for I Trickovic Office, MCB-1518060. B. Research Award Army NSF W911NF-14- US and and the W911NF-09-1-0444 from Awards 1-0411 Initiative Research University than ACKNOWLEDGMENTS. recombination of variable with role combination the in effects. 
of particularly mutational analysis herein, provided direct that more a for depression the reducing of thereby may mutations, alleles simulta- segregating of among neously discrimination effects selective fitness of the efficiency the On in the in increase. increase and variance drift selection greater to subject hand, to are other subject that are mutations populations phe- that of largest mean the populations fractions the smallest the of the as constant, response in change mean the population-size dis- the of to vari- a keeping gradient notype the the while for increasing prolong effects, hand, allow fixed-effects should one mutational to the the of to theory On ance contrast the herein. in taken generalize effects, approach to mutational of need tribution a inappropri- is Second, be differences. might there adaptive the underlying taxa as reflect to among taxa, assumed of means ately issue numbers of significant small potentially variance on a stochastic based is studies This comparative desirable. for are variance) mean among-population expres- the (or formal can variance of so there temporal mean, phenotype, the grand for mean the sions the around of drift expressions considerable value preceding be direc- expected the and the size although on population First, focus with ways bias. the scales mutation of barrier understanding tional drift fuller the a which achieve to in done dis- be the to remain for account fully between that specific tinction phenotypes the mean of of distributions turnover free- considerable of trait be degrees quantitative can enough of phosphosites. there with sort that but a (6, dom as selection, fashion operate individ- stabilizing neutral on to under residues appears effectively phosphorylated affected proteins of an the number ual in the of is, many wander That of 46). 
to protein’s locations appropri- free specific a an the residues of for with degree selection charge, the stabilizing obser- ate these under whereby together, is scenario Taken phosphorylation a (43–45). and suggest versa Asp vice vations phosphomimetic and from sites evolve Glu as often serve phosphosites counterparts; negatively, and phosphorylatable i.e., charged their Asp species, for naturally addition, replacements related are potential In which moderately differ. residues, two may Glu in (dating status phosphorylation present lineage their is yeast residue entire latable the across back conserved been have .C cusi .Kmr .J le,Sgaue fntoe iiaini h elemental the in limitation nitrogen of Signatures Elser, J. J. Kumar, S. Acquisti, C. 9. sd rmtene o omldrvto fteexpected the of derivation formal a for need the from Aside 59–59 (2015). 15690–15695 112, (2018). Evol. Ecol. Nature (1991). 897–907 129, (1987). o o-admuaeo yoyoscodons. synonymous of usage non-random for Genet. apparatus. metabolic the (2009). in 2605–2610 involved proteins the of composition N e ∼700 acaoye cerevisiae Saccharomyces eaieto relative 0–1 (2016). 704–714 17, vltoaydtriat fgnm-iencetd composition. nucleotide genome-wide of determinants Evolutionary al., et eei rf,slcin n vlto ftemtto rate. mutation the of evolution and selection, drift, Genetic al., et pnaeu eeeiu mutation. deleterious Spontaneous al., et ilo ) n vnwe h aephosphory- same the when even and y), million 3–4 (2017). 237–240 2, N N e hsrsac a upre yteMultidisciplinary the by supported was research This and ial,a oe bv,teei need a is there above, noted as Finally, . N o h aeo large of case the for hshrlto ie perto appear sites phosphorylation NSLts Articles Latest PNAS Evolution .Ml Evol. Mol. J. rc al cd c.U.S.A. Sci. Acad. Natl. Proc. 4–6 (1999). 645–663 53, eea things several L, rc il Sci. Biol. Proc. Elife 337–345 24, | e34820 7, a.Rev. Nat. Genetics f10 of 9 ∼5% 276,

17. C. F. Baer, M. M. Miyamoto, D. R. Denver, Mutation rate variation in multicellular eukaryotes: Causes and consequences. Nat. Rev. Genet. 8, 619–631 (2007).
18. P. D. Keightley, The distribution of mutation effects on viability in Drosophila melanogaster. Genetics 138, 1315–1322 (1994).
19. L. Robert et al., Mutation dynamics and fitness effects followed in single cells. Science 359, 1283–1286 (2018).
20. K. B. Böndel et al., Inferring the distribution of fitness effects of spontaneous mutations in Chlamydomonas reinhardtii. PLoS Biol. 17, e3000192 (2019).
21. P. D. Keightley, A. Eyre-Walker, Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177, 2251–2261 (2007).
22. T. Bataillon, S. F. Bailey, Effects of new mutations on fitness: Insights from models and data. Ann. N. Y. Acad. Sci. 1320, 76–92 (2014).
23. C. D. Huber, B. Y. Kim, C. D. Marsden, K. E. Lohmueller, Determining the factors driving selective effects of new nonsynonymous mutations. Proc. Natl. Acad. Sci. U.S.A. 114, 4465–4470 (2015).
24. B. Y. Kim, C. D. Huber, K. E. Lohmueller, Inference of the distribution of selection coefficients for new nonsynonymous mutations using large samples. Genetics 206, 345–361 (2017).
25. M. Lynch, M. Ackerman, K. Spitze, Z. Ye, T. Maruki, Population genomics of Daphnia pulex. Genetics 206, 315–332 (2017).
26. D. P. Rice, B. H. Good, M. M. Desai, The evolutionarily stable distribution of fitness effects. Genetics 200, 321–329 (2015).
27. W. Sung et al., Evolution of the insertion-deletion mutation rate across the tree of life. G3 6, 2583–2591 (2016).
28. B. Charlesworth, Stabilizing selection, purifying selection, and mutational bias in finite populations. Genetics 194, 955–971 (2013).
29. M. Bulmer, The effect of selection on genetic variability. Am. Nat. 105, 201–211 (1971).
30. B. H. Good, A. M. Walczak, R. A. Neher, M. M. Desai, Genetic diversity in the interference selection limit. PLoS Genet. 10, e1004222 (2014).
31. M. Lynch, L.-M. Bobay, F. Catania, J.-F. Gout, M. Rho, The repatterning of eukaryotic genomes by random genetic drift. Annu. Rev. Genom. Hum. Genet. 12, 347–366 (2011).
32. R. Lande, Natural selection and random genetic drift in phenotypic evolution. Evolution 30, 314–334 (1976).
33. R. Bürger, R. Lande, On the distribution of the mean and variance of a quantitative trait under mutation-selection-drift balance. Genetics 138, 901–912 (1994).
34. D. M. McCandlish, Long-term evolution on complex fitness landscapes when mutation is weak. Heredity 121, 449–465 (2018).
35. S. Wright, Evolution and the Genetics of Populations. The Theory of Gene Frequencies (University of Chicago Press, 1969), vol. 2.
36. A. N. Nguyen Ba, A. M. Moses, Evolution of characterized phosphorylation sites in budding yeast. Mol. Biol. Evol. 27, 2027–2037 (2010).
37. V. E. Gray, S. Kumar, Rampant purifying selection conserves positions with posttranslational modifications in human proteins. Mol. Biol. Evol. 28, 1565–1568 (2011).
38. E. D. Levy, S. W. Michnick, C. R. Landry, Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 2594–2606 (2012).
39. A. M. Moses, M. E. Liku, J. J. Li, R. Durbin, Regulatory evolution in proteins by turnover and lineage-specific changes of cyclin-dependent kinase consensus sites. Proc. Natl. Acad. Sci. U.S.A. 104, 17713–17718 (2007).
40. L. J. Holt et al., Global analysis of Cdk1 substrate phosphorylation sites provides insights into evolution. Science 325, 1682–1686 (2009).
41. L. Freschi, M. Osseni, C. R. Landry, Functional divergence and evolutionary turnover in mammalian phosphoproteomes. PLoS Genet. 10, e1004062 (2014).
42. R. A. Studer et al., Evolution of protein phosphorylation across 18 fungal species. Science 354, 229–232 (2016).
43. Y. Z. Kurmangaliyev, A. Goland, M. S. Gelfand, Evolutionary patterns of phosphorylated serines. Biol. Direct 6, 8 (2011).
44. S. M. Pearlman, Z. Serber, J. E. Ferrell, Jr., A mechanism for the evolution of phosphorylation sites. Cell 147, 934–946 (2011).
45. G. Diss, L. Freschi, C. R. Landry, Where do phosphosites come from and where do they go after gene duplication? Int. J. Evol. Biol. 2012, 843167 (2012).
46. G. E. Lienhard, Non-functional phosphorylations? Trends Biochem. Sci. 33, 351–352 (2008).
