Fundamental Properties of the of Mutational

Lee Altenberg ∗

Abstract proposed it as a consequence of stabilizing selection, and Conrad (cf. [6, 7]) who proposed it could result Evolution on neutral networks of genotypes from higher-order epistatic mutations which smooth the has been found in models to concentrate on adaptive landscape at other sites and consequently en- genotypes with high mutational robustness, to hance . Another mechanism proposed is that a degree determined by the topology of the natural selection for adaptations robust to environmen- network. Here analysis is generalized beyond tal variation entails robustness to mutation as a generic neutral networks to arbitrary selection and side-effect [8]. parent-offspring transmission. In this larger An altogether different mechanism for the evolution realm, geometric features determine mutational of mutational robustness was proposed by Bornberg- robustness: the alignment of fitness with the Bauer and Chan [9] and van Nimwegen et al. [10, 11], orthogonalized eigenvectors of the mutation which is that evolution along neutral networks—sets matrix weighted by their eigenvalues. “House of mutationally connected genotypes with equivalent of cards” mutation is found to preclude the evo- fitnesses—would “concentrate at highly connected parts lution of mutational robustness. Genetic load of the network, resulting in phenotypes that are rel- is shown to increase with increasing mutation atively robust against mutations”[11], and “Indepen- in arbitrary single and multiple locus fitness dent of functional fitness, topology per se can lead landscapes. The rate of decrease in population to concentration of evolutionary population at some fitness can never grow as mutation rates get sequences.”[9]. higher, showing that “error catastrophes” for Numerous citations of these papers repeat the finding genotype frequencies never cause precipitous that “populations will evolve toward highly connected losses of population fitness. The “inclusive regions of the genome space” (e.g. [12], [13]). However, inheritance” approach taken here naturally ex- neither the original papers nor subsequent studies (to tends these results to a new concept of dispersal my knowledge) provide analytic proofs of this observa- robustness. tion. With few exceptions [14] progress has been limited in identifying exactly which properties of neutral net- Keywords: genetic load — spectral gap — lethal works determine their evolved mutational robustness. mutagenesis — epigenetic mutation — dispersal load As is shown by the following example, more is going on than simply a “tendency to evolve toward highly con- Based on theoretical considerations, Kimura [1] pre- nected parts of the network”. arXiv:1508.07866v1 [q-bio.PE] 31 Aug 2015 dicted that the majority of evolutionary changes in the Figure 1 compares the equilibrium population distri- genome in mammals should consist of neutral muta- bution for two neutral networks that have 63 equally tions. Since this time, research has been focused on fit genotypes, and one inviable genotype off the net- understanding the extent and nature of neutral genetic work. The only difference between the networks is their variation in organisms. One approach is to attempt mutational topology. One network is the set of all 64 to derive them from molecular first principles [2, 3]. nucleotide triplets, while the other is the set of copy- The idea that neutrality may not merely be a derived number variants, from 1 to 64 copies. Two properties of consequence of molecular interactions, but actually an the population equilibrium distributions on the neutral evolved property shaped by evolutionary dynamics, had networks stand out: first, the equilibrium frequency of a its first manifestation in Fisher’s theory for the evolu- genotype is determined not solely by how many neutral tion of dominance [4], followed by Waddington [5] who neighbors it has, but also by its position within the net- ∗The Konrad Lorenz Institute for Evolution and Cognition work. In the copy-number network, 62 of the genotypes Research, Martinstrasse 12, Klosterneuburg, A3400 Austria, are identical in having only neutral neighbors. Yet the [email protected] stationary distribution increases 20-fold over these 62

1 Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 2

EQUILIBRIUM DISTRIBUTION treatment has the potential to apply widely to diverse STEPWISE COPY-NUMBER MUTATION NETWORK mechanisms of “inclusive inheritance” [21]. 0.025 ··· With arbitrary fitnesses, one can no longer character- MUTATION NETWORK OF A ize mutation neighborhoods simply by the fraction of 0.020 NUCLEOTIDE TRIPLET LOAD = 0.0138 mutations that are neutral, since now the distribution 0.015 of fitness effects of mutation (DFE) includes advanta-

0.010 geous or deleterious mutations, as well as neutral and lethal. The more general statistic is the expected fitness 0.005 • LOAD = 0.0003 of offspring. Averaged over the population, one obtains an aggregate population mutational robustness — the 10 20 30 40 50 60 degree to which the population maintains its fitness in Neutral genotypes Lethal genotype the face of mutation pressure. A complete absence of mutational robustness occurs when the genetic load is the , corresponding to the Haldane-Muller Figure 1: Equilibrium distributions and genetic loads principle [22, 23]. Complete mutational robustness, on for two neutral networks having 63 neutral genotypes the other hand, would mean that the population suffers and one lethal genotype (circled), under two mutation no genetic load as the mutation rate increases. A given topologies: Mutation of a nucleotide triplet, and step- adaptive landscape will fall somewhere between these wise copy-number mutation. two extrema. The main results found here are that the population mutational robustness at a mutation-selection balance genotypes as they get more mutationally distant from is determined by abstract spectral properties: the align- the lethal genotype [15]. The same phenomenon is seen ment between the fitnesses and the eigenvectors of the in the trinucleotide network but to a lesser degree. Sec- mutation matrix, weighted by their eigenvalues. ondly, dramatic differences are seen between the two This spectral analysis provide a lens through which neutral networks in the size of the genetic loads they to examine models of mutation, dispersal, an inheri- maintain. The genetic load of the trinucleotide network tance generally. Models of mutation and selection are is 44 times the genetic load of the copy-number network. usually constructed without appreciation of how the as- The differences between the evolutionary outcomes sumptions manifest in the eigenvalues and eigenvectors on these two mutational graphs can only be the result of the mutation matrix (or variation operator in the of their different topologies, but what properties of their case of continuous variation). We will see, for exam- topologies? Numerous results from the field of spectral ple, that the widely encountered [24] “house-of-cards” graph theory are applicable to this question; here no re- mutation model [25] is incapable of supporting muta- view is attempted. While graph theory has been widely tional robustness. The results here provide direction used in models of mutation, the results typically impose for analyzing a wide variety of models for their capacity the assumptions that mutations occur only once during to support the evolution of mutational robustness, and replication, mutation is symmetric and occurs at a sin- enable comparisons to be made between different kinds gle rate, and fitness is either zero or a single other value of mutation processes, including nucleotide base muta- (all assumptions in [9, 11]). tion, epigenetic mark mutation, and gene copy number An alternative approach is taken here, which has variation. The comparisons extend to the spectral prop- proven valuable in previous work [16, 17, 18, 19, 20]. erties of dispersal matrices, thereby unifying the results The specific problem—evolution on neutral networks in for genetic robustness and genetic load with those for nucleotide sequence space—is embedded into the larger “dispersal robustness” and dispersal load. space of problems, which includes arbitrary mutation patterns and arbitrary selection values. The generality forces one to seek the appropriate fundamental mathe- 1 The Setting matical properties. It also expands the applicability of any results. Muta- The object of interest is the state of the population tion is treated generally enough to apply to non-genetic when it has converged in frequencies to an equilibrium information transmission, the principal example being under the following asexual, haploid evolutionary dy- organismal location, where the analog of mutation is namics, where the population is large enough to be dispersal. The results here thus automatically apply to treated as infinite and has discrete non-overlapping gen- dispersal load, and in the process define a new concept erations (semelparity). The only event that changes the of dispersal robustness. Moreover, the generality of the genotypes during reproduction is mutation; there is no Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 3

recombination. From (1), equilibrium frequencies must satisfy wbzˆi = Pn n j=1 Mijwjzˆj, or in vector form, 1 X z (t + 1) = M (µ) w z (t), (1) i w(t) ij j j > j=1 M(µ) D zˆ = (w zˆ) zˆ = wb zˆ, (4) where where

>   n is the number of possible haploid genotypes, e is the vector of ones, e = 1 1 1 its trans- pose, ··· zi(t) is the frequency of haploid genotype i in the pop- Pn D := diag[w ] is the diagonal matrix of fitness coeffi- ulation at time t, zi(t) 0, i=1 zi(t) = 1, i ≥ cients, wj is the fitness of genotype j, wi 0, ≥ w = De is the vector of fitness coefficients, and Mij(µ) is the probability, when the mutation rate is Pn > > µ [0, 1], that parent of genotype j has offspring wb := i=1 wizˆi = e Dzˆ = w zˆ is the mean fitness of ∈ Pn of genotype i, Mij 0, Mij = 1 j, and the population. ≥ i=1 ∀ Pn Equation (4) shows that the mean fitness w = e>Dzˆ is w(t)= i=1 wizi(t) is the mean fitness of the popu- b lation at time t, its Malthusian rate of increase in an eigenvalue of M(µ)D. size. It is further assumed that M(µ) is irreducible when µ > 0, meaning any genotype j can mutate in some In the case of a multilocus genotype where muta- number of steps to any other genotype i. M(µ)D is irre- tion occurs independently at each of L loci, the mu- ducible as well if no genotype is lethal, so from Perron- tation probabilities may be decomposed as products Frobenius theory we know there is only one possible of the mutation probabilities at each locus, Mij = equilibrium, the strictly positive Perron vector zˆ, and (1) (2) (L) M M M , where iξ, jξ index the alleles at eigenvalue wb is the Perron root and spectral radius of i1j1 i2j2 ··· iLjL locus ξ. M(µ)D, referred to as r(M(µ)D) = wb. Mutation at each locus ξ may be parameterized by If some mutations are lethal, then M(µ)D is re- a mutation rate µ and a mutation distribution, P (ξ) , ducible, and more interesting equilibrium behavior be- iξjξ (ξ) comes possible. The Frobenius normal form of a re- given that mutation occurs: M = (1 µ)δi j + iξjξ ξ ξ ducible matrix ([26, pp. 74-77]) partitions the set (ξ) − µ P , where δii = 1 and δij = 0 when i = j. In of genotypes into blocks, such that the restriction iξjξ 6 matrix form this is of M(µ)D to any block κ produces an irreducible matrix M(µ)[κ]D[κ]. The blocks constitute quasis- M(ξ) = (1 µ)I(ξ) + µP(ξ), (2) − pecies [27]. The block with the largest spectral radius [κ] [κ] (ξ) r(M(µ) D ) among the genotypes present at equi- where each P is an nξ nξ column-stochastic matrix, × (ξ) librium sets the mean fitness of the population. The nξ is the number of possible alleles at locus ξ, and I analysis will therefore focus on such irreducible block is the nξ nξ identity matrix. matrices. The multilocus× mutation matrix can be represented using the Kronecker product, N, as 1.1 Diploidy, frequency-dependent se- L O h i lection, recombination, assortative M(µ) = (1 µ)I(ξ) + µP(ξ) . (3) − mating, and dispersal ξ=1 The mutational robustness of population at equilibrium Matrices of the form (1 µ)I+µP represent the situa- can be analyzed without knowing whether this equi- − tion where only a single transforming event occurs dur- librium is dynamically stable. In the case of haploid ing reproduction and will be referred to as single-event selection on a quasispecies without recombination, the mutation matrices, while those of the form (3) represent system is essentially linear and the polymorphic equilib- the situation where multiple independent transforming rium is always stable. Relaxation of these assumptions events occur during reproduction and will be referred to accommodate greater biological variety may alter the to as multiple-event mutation matrices. In the limit stability and existence conditions of equilibria but does of small µ, multiple-event matrices converge to single- not change the robustness analysis. The results here event matrices, even when they model multiple loci. therefore apply to equilibria in systems with diploid and frequency-dependent selection, assortative mating, and Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 4

also recombination. In this case, letting Tijk be the Definition 1. At a mutation-selection balance, the probability that parents j and k produce offspring i, population mutational robustness is defined to be one gets a a frequency-dependent version of (1): n n wb r(MD) 1 X X Ro(M, D) := = . (8) z (t+1) = z (t) T w X (z), (5) maxi[wi] maxi[wi] i w(t) j ijk jk jk j=1 k=1 n The quantity r(MD)/ maxi[wi] = Ro(M, D) = 1 L 1 X − = M (z) w (z) z (t), (6) is the complement of the classical genetic load, L [22, w(t) ij j j j=1 23]. We would like to know where Ro(M, D) falls within the possible range of values given the mutation rate in where wjk is the fitness of diploid genotype M. The maximum possible value is Ro(M, D) = 1, jk, Xjk(z(t)) is the probability that the mate meaning no loss of fitness from mutation. The minimum of genotype j is genotype k (frequency depen- Pn possible value will be called the Haldane limit, which is dent), wj(z) = k=1 wjkXjk(z), and Mij(z) = L Pn (1 µ) for L loci. k=1 TijkwjkXjk(z)/wj(z). Under random mating, − Xjk(z(t)) = zk(t). Selfing cannot be accommodated Result 2 (Haldane Limits on Mutational Robustness). within the product structure in (5). In models of disper- sal, z(t) typically represents the density of organisms, r(M(µ)D) without normalization by w. Carrying capacities then Ro(M(µ), D) := (1 µ)L. max[wi] ≥ − makes the growth rates wi(z) density dependent. i Because frequency dependence is relevant only to sta- bility, which is not considered, the analysis to follow See also [28, eq. (2.3)], [29, pp. 149–150, eq. (5.31)– applies to both (1) and (6). However, frequency depen- (5.34)]. The proof for this result and those following are dence of both Mij(z) and wj(z) makes them composite provided in the Supporting Information (SI ). formal quantities, rather than biological essential quan- Robustness relative to the Haldane limit can be de- tities as in the case of mutation and haploid selection. fined. Definition 3. Define relative mutational robust- 1.2 Mean Fitness, Genetic Load, and ness as Mutational Robustness r(M(µ)D) (1 µ)L Let us retrace the calculation of population mutational max[wi] − − robustness in the neutral network model of van Nimwe- RRo(M(µ), D) := i [0, 1]. 1 (1 µ)L ∈ gen et al. [11]. Genotypes have only two possible fit- − − nesses, w, and 0. The mutational robustness of a geno- One further quantity of interest is the average fitness type with fitness w is the probability that its offspring mu Pn Pn of mutant offspring, wj := i6=j wiMij/ i6=j Mij. have fitness w. The set of genotypes in the neutral quasispecies is . Then the mutational robustness of Theorem 4. The average mutant offspring fitness in a N P a genotype—it’s neutrality— is νj := i∈N Mij. The population at mutation-selection balance is ˆ population neutrality at equilibrium z is the expecta- n tion of νj censused before reproduction, with parent mu X mu wjzˆj w (M(µ), D) := wj frequencies wzˆi/wb for i . Using (4), since wj = 0 wb ∈ N j=1 orz ˆj = 0 for j / ,  L  ∈ N (1 µ) wi  n n = wb 1 − L Var , X wjzˆj X wzˆj X X − 1 (1 µ) wb zˆi = Mij = Mij , wb = wizˆi = wzˆi. − − w w j=1 b j∈N b i=1 i∈N (ξ) assuming the mutation distributions have Pii =0 for all From these we see that the equilibrium average neutral- loci ξ and alleles i. ity is This identity precisely expresses Fisher’s “deteriora- X wzˆj X wzˆj X w r(MD) νˆ = ν = M = zˆ = b = . tion of the environment” argument [30, p. 42], that at j ij i w w j∈N wb i,j∈N wb i∈N any equilibrium, the average mutant fitness loss must (7) exactly offset the gain in fitness from the fitness vari- ance. The concept of mutational robustness can be ex- tended beyond neutral networks by simply generalizing w to maxi[wi]. Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 5

2 Basic Results µ < 1/2, when the quasispecies has variation in fitness within its irreducible blocks. The core finding of [11] is that when genotype neutrali- ties vary over the neutral network, “the population neu- Theorems 6 and 7 have important implications for the trality is typically larger than the network neutrality. theory of lethal mutagenesis. They show that the pop- . . . Thus, a population will evolve a mutational robust- ulation mean fitness will keep decreasing with greater ness that is larger than if the population were to spread mutation over at least the range 0 < µ < 1/2. uniformly over the neutral network.” While numerical Corollary 8 (Sufficiency for Lethal Mutagenesis). If examples are explored, no analytical results are pro- r(M(1/2)D) < 1 then for some µ < 1/2, the popula- vided that define precisely when it holds. It is claimed tion will go extinct for all µ > µ .∗ in [12, eq. (12)] that when the mutation matrix is sym- ∗ metric, i.e. M = M>, this result can be obtained from The observation in [11] that “Perhaps surpris- Perron-Frobenius theory, but no proof is provided. A ingly, the tendency to evolve toward highly connected proof is here provided (in SI ), but it requires more than parts of the network is independent of evolutionary Perron-Frobenius theory, specifically Rayleigh theory. parameters—such as mutation rate,” is solely due to there being only two fitnesses in the model, 0, and w. Theorem 5 (Neutral Network Robustness). The equi- librium population robustness νˆ = r(MD)/w is always Result 9 (Single-Event Relative Robustness Increases greater than the average robustness over the neutral net- with Mutation Rates). As the mutation rate increases work, with single-event mutation, if a quasispecies has more than one [only one] nonlethal fitness value, then the rel- P P M i∈N j∈N ij ative mutational robustness RRo(M(µ), D) strictly in- E[ν] = Pn P . i=1 j∈N Mij creases [is constant] with mutation rate. if (1) the mutation matrix M is symmetric, (2) fitness For multiple-event mutation, the relative mutational off the network is 0, and (3) not all genotypes on the robustness may increase or decrease in µ, as found in neutral network have the same mutational robustness. initial numerical exploration. When mutation is not symmetric, counterexamples can easily be constructed where the population evolves 3 Results for Reversible Markov to an average robustness that is less than the average robustness over the entire neutral network. One ex- Chain Mutation ample is where mutation is cyclic. Another example Theorem 7 was provable due to the tractability afforded is where asymmetric mutation is symmetrizable (a re- by reversible Markov chains. This same tractability car- versible Markov chain), but mutation is biased toward ries over to the analysis of mutational robustness. For genotypes with few neutral neighbors (in SI ). a reversible chain, M can be represented ([34, p. 33], The next result adopts to quasispecies a theorem of [35], [33]) as: Karlin, which has been available for over 30 years, and was independently proven in [31]. 1/2 > −1/2 M = Dπ KΛK Dπ Theorem 6 (5.2 of Karlin [32]). The equilibrium ge- where reversibility requires M π = M π , and netic load within any quasispecies under single-event ij j ji j > mutation strictly increases with mutation rate when π = (π1, . . . , πn) = Mπ is the stationary distribution there is any variation in fitness, i.e. for each irreducible for M, i.e. its Perron vector, block of loci κ, r([(1 µ)I[κ] + µP[κ]]D[κ]) strictly de- −[κ] 1/2 creases with µ when D = c I for any c > 0, or when Dπ is the diagonal matrix of the square roots of πi, r(P[κ]) < 1. 6 K is the matrix of orthogonalized eigenvectors of M, The same outcome has been shown for multiple-event KK> = K>K = I, i.e. for each j = 1, . . . , n, Pn 2 Pn mutation when mutation at each locus is reversible, by i=1 Kij = 1, i=1 KijKih = 0 if j = h, which I refer to M being the transition matrix of a 6 reversible Markov chain, which is also equivalent to M [K]j is the j-th column of K, being symmetrizable. 1/2 1/2 Ki1 = πi or [K]1 = Dπ e, Theorem 7 (Corollaries 1, 3 of [33]). The equilibrium h i genetic load of a quasispecies under irreducible multiple- Λ := diag λi is the diagonal matrix of eigenvalues event mutation strictly increases with mutation rate 0 < λ1 > λ2 λn of M. ≥ · · · ≥ Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 6

Applying this representation of M, we find that the corresponds to a smaller lower bound for the population spectral radius of MD is that of a symmetric matrix: mutational robustness, an effect that can appear even before there are multiple fitness peaks. 1/2 > −1/2 r(MD) = r(Dπ KΛK Dπ D) Because Theorem 12 applies to arbitrary fitnesses, it > −1/2 1/2 includes the mutational landscape modifiers described = r(KΛK Dπ DDπ ) in [35], early envisioned in [7]. Note that the range = r(KΛK>D) = r(ΛK>DK) of mutational robustness values possible from different = r(D1/2KΛK>D1/2). arrangements of fitness values depends on the spread of the eigenvalues λ2, . . . , λn. This allows use of the Rayleigh-Ritz variational formula Prior applications of Fourier expansions of landscapes for the spectral radius ([36, pp. 172–173], [37, pp. 176– (e.g. [39], [43]) do not actually examine selection- 180]), to give the first main result. mutation dynamics. Rather, only mutation occurs, and Theorem 10 (Expression for the Spectral Radius). fitnesses do not enter into reproduction but are simply When mutation has the transition matrix of a reversible recorded in each generation to produce a time series. 1/2 −1/2 The main object of analysis has been the autocorrela- Markov chain, M = D KΛK>D , then π π tion function of this time series when it is stationary, de- r(MD) = max x>D1/2KΛK>D1/2x rived from the covariance between parent and offspring x>x=1 fitnesses. The next result ties the mutation-only covari- n n 2 X  X  ance to the mutation-selection dynamics. = max λj xi√wiKij . (9) x>x=1 j=1 i=1 Theorem 13 (Relation to Random Mutational Walks). For a stationary mutational random walk with reversible Here we see a fundamental property of the population Markov chain transition matrix M, and stationary dis- mutational robustness: tribution π, let Covπ(WP ,WO) be the π-weighted co- Corollary 11. Holding K and w, fixed, r(MD) is non- variance between parent and offspring fitnesses. Then decreasing in each eigenvalue of the mutation matrix, at a mutation-selection balance, wb zˆ = MDzˆ, λi, i = 2, . . . , n.  n  " # X 1 r(MD)  wjπj 1 + Covπ[WP ,WO] . ≥ > 2 3.1 Bounds Derived from the Muta- j=1 (w π) tional Eigenvectors The expression states that the mean fitness of the Weinberger [38] introduced the idea of representing fit- population is increased above the π-weighted average ness landscapes using the eigenvectors of the matrix fitness of the genotype space by the covariance between representing the mutation structure, and this “Fourier the normalized parent and offspring fitnesses in the mu- expansion” approach has been elaborated by numerous tational random walk. further studies (cf. [39, 40]). The second main result is Corollary 14. When the eigenvalues of M are positive, now presented. then Covπ[WP ,WO] 0. ≥ Theorem 12. (Lower Bound from the Fourier Representation of Fitness) Represent the fitnesses 3.2 Bounds Derived from the Muta- Pn wi as wi = j=1 Kijaj or w = Ka, where ai is the tional Eigenvalues Fourier coefficient for eigenvector [K]i. Then Now we consider bounds on r(MD) that derive from the n eigenvalues of M for any fitness landscape, to conclude 1 X 2 r(MD) n λjaj . the main results. ≥ P w j=1 j j=1 Theorem 15 (Upper Bound from the Spectral Gap). The Fourier coefficients aj measure the ‘alignment’ Assuming 0 < µ < 1/2, then 2 between w and each column [K]j, and aj is its “ampli- n tude” [41]. From the theory of discrete nodal domains X wiπi r(MD) ≤ [42], we know that as j increases from 1 to n, the eigen- i=1 vectors [K]j exhibit more sign changes between their n n ! X X mutationally connected domains, which can be consid- wiπi + λ2(M) max[wi] wiπi , ≤ i − ered the ‘frequency’ of the eigenvector, and contribute i=1 i=1 to ruggedness in the fitness landscape. Greater rugged- (ξ) where λ2(M) = 1 µ+µ max λ2(P ). ness of the landscape (weight on the aj with larger j) − ξ=1,...,L Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 7

Pn Theorem 15 when applied to a neutral network at Under biological conditions the term i=1 πiwi is biological mutation rates (which give λi > 0 for all i) near zero, since it reflects the average fitness of geno- yields: types under the stationary distribution of mutation — i.e. a genome made out of random nucleotides. In this Corollary 16 (Spectral Gap Bound for Neutral Net- case, Ro(M(µ)D) 1 µ, which means that house- works). Let the neutral network be referred to as = ≈ − N of-cards mutation is incapable of supporting any mu- i : wi = w > 0 . Then { } tational robustness. Note that HoC mutation at in- X r(MD) X X dividual loci does not create HoC mutation for the πi = Ro(M, D) πi + λ2 πi. ≤ w ≤ entire genome. But also note that HoC mutation at i∈N i∈N i/∈N each locus combined with multiplicative non-epistasis (10) NL (ξ) (D = ξ=1 D ) decomposes into multiple HoC sys- Here we see the explanation for the different muta- tems. Therefore, with multi-locus HoC mutation, multi- tional loads in Fig. 1. For the trinucleotide network, plicative epistasis is required for population mutational robustness to rise above the Haldane limit. λ2 = 5/9, while for the copy-number variation network, λ2 = 0.999. Eq. (10) gives lower bounds on load as (1 λ2)/64. Comparison of the bounds with the actual loads− give 0.007 < 0.0138, 0.00002 < 0.0003, respec- 4 Discussion tively, for the two networks. Priority here has been given to presenting the new Next we shall see that the convexity of r(M(µ)D) in mathematical results on these decades old problems µ seen for single-event mutation in Result 9 extends to rather than to their biological interpretations or to par- multiple-event reversible mutation. ticular illustrations. But a few words are in order. The Theorem 17 (Convexity of the Spectral Radius in µ). chief contribution is to show how the elevation of a In the case where L loci mutate independently at rate µ, population’s mean fitness above the Haldane limit at a each in a reversible Markov chain, then for 0 < µ < 1/2, mutation-selection depends on underlying spectral rela- tionships between the mutation structure and the array r(M(µ)D) = of fitnesses. The results are obtained in almost com-  L  plete generality, encompassing arbitrary fitnesses and O > max x>D1/2 K(ξ)[(1 µ)I(ξ)+µΛ(ξ)]K(ξ) D1/2x for some results, arbitrary mutation structures, and in x>x=1 − ξ=1 others, reversible multilocus mutation. The results di- rectly extend to a novel concept of “dispersal robust- is convex in µ. ness”. Theorem 17 has an important implication: “error Infinite population theory involving the spectral ra- catastrophes” for individual genotypes as the mutation dius r(MD) and Perron vector zˆ > 0 was originally rate increases never correspond to a precipitous decline developed in one or two locus theory, or dispersal mod- in population mean fitness. The convexity of r(M(µ)D) els, where the population size could easily exceed the in µ means that no such declines are ever possible, that number of genotypes or patches, and large populations in contrast, r(M(µ)D) steadily declines at rates that were well approximated by infinite population models. never increase as µ increases. These observation, noted In quasispecies theory, however, results were aimed to by [35, 44, 45] based on specific fitness landscapes, are encompass the entire genome, entailing genotype spaces seen to hold for all possible fitness landscapes under orders of magnitude larger than any possible popula- reversible mutation. tion, rendering questionable the biological relevance of zˆ and of r(MD). 3.3 House-of-Cards Mutation Their relevance may possibly be retained in several circumstances: (1) when a small number of loci con- The “house-of-cards” (HoC) mutation model intro- tribute to a multiplicative fitness component, (2) when duced by [25] is a single-event mutation model where transient dynamics produce a meta-stable mutation- > Pij = πi for all j, which makes P = πe a rank-one selection balance [46], (3) when the dynamics can be matrix, and therefore λi = 0 for i 2. approximated by a coarse-graining to give a “pheno- ≥ Result 18 (house-of-cards Mutation). Let M(µ) = (1 typic quasispecies” [47], and (3) in a finite population µ) + µP, where P = πe>. Then − multitype branching process. In the latter, noted in [35], the expected number of offspring i produced from n t X one parent j defines a matrix MD, and (MD) z(0) is r(M(µ)D) (1 µ) max[wi] + µ πiwi. ≤ − i the trajectory of expected numbers from an initial pop- i=1 Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 8 ulation, z(0). The long-term expected growth rate of gen, Reinhard B¨urger, Joachim Hermisson, and Nick the population is then r(MD). Barton for their insightful comments. I gratefully ac- Each of these conditions merits further investigation. knowledge support from The KLI Institute, Austria, With these considerations as caveats, let us review the and the Mathematical Biosciences Institute at Ohio results here. State University, USA, through National Science Foun- The first main result (Theorem 15) is that the spec- dation Award #DMS 0931642. tral gap of the mutation matrix sets an upper bound on how much above than the Haldane limit the equi- librium population fitness can be. The spectral gap measures how rapidly mixing the mutation operator is. Theorem Theorem:Main:Eigenvalue thus shows that the more rapidly mixing mutation is, the small the muta- tional robustness at a mutation-selection balance. In Kingman’s well-known house-of-cards model, λ2 = 0 so the spectral gap is nearly maximal, so almost no muta- tional robustness is possible. Theorem 15 provides an impetus to compare different kinds of mutation models, as well as dispersal models, for their spectral gaps and therefore how mutational robustness is limited. The second main result (Theorem 12) is that the alignment of the fitnesses with the orthogonalized eigen- vectors of the mutation matrix, weighted by the eigen- values, sets a lower bound on the population mutational robustness. This bound is low when the landscape is rugged and fitnesses align with the higher-frequency eigenvectors, and high when the landscape is smooth and fitnesses align with the low-frequency eigenvectors. Several additional results generalize results that have been found for special cases of fitness landscapes to ar- bitrary landscapes: Theorems 6 and 7 show that the ag- gregate population mutational robustness can only de- cline with increasing mutation rate. Theorem 17 shows that this decrease happens at an at ever lessening rate as the mutation rate increases. So “error catastrophes” for genotypes can never cause the population fitness to plummet [44]. But high enough mutation rates will in- escapably drive the population to the fitness of a ran- dom genotype sequence, so for some rate less than this, lethal mutagenesis is assured. The issue of whether lethal mutagenesis can be evaded by the evolution of mutational robustness [45] is thus resolved: with high enough mutation rates, it cannot. These applications are to be seen as only initial il- lustrations of the potential usefulness of the analysis presented here. Other population dynamical processes may similarly be clarified by viewing them through this lens of their eigenvalues an eigenvectors.

Acknowledgements

Insight for this paper occurred while hearing Ludwig Geroldinger’s dissertation defense on stepping stone and island migration models [48].I thank Erik van Nimwe- Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 9

Supporting Information Proof. The average mutant offspring offspring fitness mu from parent j, wj , averaged over the population, cen- sused after selection, before reproduction, is Proofs of the Results n n Pn w M mu X mu wjzˆj X i6=j i ij wjzˆj For convenience, the results are restated along with w (M(µ), D) = wj = Pn wb i6=j Mij wb their proofs. j=1 j=1 n Pn X wiMij wjMjj wjzˆj Result 2 (Haldane Limits on Mutational Robustness). = i=1 − . 1 Mjj w j=1 − b r(M(µ)D) Recall that the multiple-event mutation rates are Ro(M(µ)D) := (1 µ)L. maxi[wi] ≥ − L L Y (ξ) Y (ξ) Proof. At equilibrium, we examinez ˆ where w = Mjj = M = [(1 µ) + µP ]. 1 1 jξjξ − jξjξ n ξ=1 ξ=1 maxi=1 wi. Then, n Assuming P (ξ) = 0 for all ξ, j, then M = (1 µ)L. X jξjξ jj wbzˆ1 = M11w1zˆ1 + M1jwjzˆj Substitution gives − j=2 n X L which implies wiMij wj(1 µ) n − − X i=1 wjzˆj n wmu(M(µ), D) = X 1 (1 µ)L w (wb M11w1)ˆz1 = M1jwjzˆj 0 j=1 b − ≥ − − j=2 n n 2 X wjzˆj X wj zˆj L wiMij (1 µ) hence [16, p. 62] − − i,j=1 wb j=1 wb wb = L . Ro(M(µ), D) = M11, 1 (1 µ) w1 ≥ − − Since with equality if and only if for all j = 2, . . . , n either M = 0, w = 0, orz ˆ = 0. Further, M (µ) = n 2 n 2   1j j j 11 X wj zˆj X wj zˆj wj (ξ)   QL L = wb 2 = wb 1 + Var , ξ=1(1 µ+µP11 ), so M11(µ) (1 µ) , with equality j=1 wb j=1 w wb − (ξ) ≥ − b if and only if P11 = 0 for all loci ξ. See also [28, eq. (2.3)], [29, pp. 149–150, eq. (5.31)– n X wj  L (5.34)]. wizˆi wb 1 + Var (1 µ) − wb − wmu(M(µ), D) = i=1 Theorem 4. Letting the average fitness of mutant off- 1 (1 µ)L − − spring produced by parent j be  L  (1 µ) wj  = wb wb − Var Pn w M − 1 (1 µ)L w mu i6=j i ij − − b wj = n ,  L  P (1 µ) wi i6=j Mij   = wb 1 − L Var . − 1 (1 µ) wb the average mutant offspring fitness in a population at − − mutation-selection balance is therefore Theorem 5 (Neutral Network Robustness). The equi- librium population robustness νˆ = r(MD)/w is always n mu X mu wjzˆj greater than the average robustness over the neutral net- w (M(µ), D) := wj work, j=1 wb L P P  (1 µ) w  Mij = w 1 Var i  , i∈N j∈N b − L E[ν] = Pn P . − 1 (1 µ) w Mij − − b i=1 j∈N (ξ) if assuming the mutation distributions have Pii =0 for all loci ξ and alleles i. 1. the mutation matrix M is symmetric, 2. fitness off the network is 0, and Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 10

3. not all genotypes on the neutral network have the the irreducible blocks, κ, from the Frobenius normal same mutational robustness. form of the reducible matrix M(µ)D. The irreducible restriction r([M(µ)D][κ]) has a unique equilibrium for Proof. Repeated use will be made of the identity P Pn each µ, and Corollaries 1 and 3 apply, meaning that if fi = fiwi/w for arbitrary values fi. We i∈N i=1 there is any variation in fitness among wi : i κ , then may write the network’s average robustness as r([M(µ)D][κ]) decreases strictly in µ.{ The quasispecies∈ } P P Pn Pn may comprise more than one block, but its mean fitness, Mij 1 wi Mijwj [ν] = i∈N j∈N = i=1 j=1 max r([M(µ)D][κ]) , will therefore also be decreasing E Pn P M w Pn Pn M w κ i=1 j∈N ij i=1 j=1 ij j in µ. { } 1 e>DMDe = . Result 9 (Single-Event Relative Robustness Increases w e>De with Mutation Rates). As the mutation rate increases We have r(DMD) = w r(MD) since (DMD)t = with single-event mutation, if a quasispecies has more wtD(MD)t = wt(DM)tD. than one [only one] nonlethal fitness value, then the rel- 2 1/2 Since wi w, 0 , D = wD, so √wD = D, hence ative mutational robustness RRo(M(µ), D) strictly in- ∈ { } creases [is constant] with mutation rate. 1 e>DMDe 1 ( e>D)DMD(De) [ν] = = E w e>De w2 ( e>D)(De) Proof. We examine the restriction of M(µ)D to an irre- ducible block M(µ)[κ]D[κ]. In the neutral network case 1 x>DMDx max (11) where wi = w for all i κ, then ≤ w2 x6=0 x>x ∈ 1 r(MD) r(M(µ)[κ]D[κ]) = w r([(1 µ)I[κ] + µP[κ])] = 2 r(DMD) = = Ro(M, D). − w w = w[(1 µ) + µr(P[κ])] = w[1 + µ(r(P[κ]) 1)], − − Equality holds if and only if e>D is a left eigenvector of DMD [49, Theorem 4.2.2 (Rayleigh)], in which case so with single-event mutation, L = 1, and [κ] [κ] > > > r(M(µ) D )/w 1 + µ λe D = (e D)DMD = we DMD, RRo(M(µ)[κ], D[κ]) = − 1 1 + µ − and so e>D is also an eigenvector of MD with eigen- 1 + µ(r(P[κ]) 1) 1 + µ = − − value λ/w. This is equivalent to the condition that all µ genotypes on the network have the same neutrality, i.e. [κ] P Pn wi = r(P ) < 1. for all j , c = Mij = Mij, or ∈ N i∈N i=1 w which shows the relative robustness is a constant for all > 1 > c e D = 2 e DMD µ. w [κ] [κ] For brevity write f(µ) := r(M(µ) D )/ maxi[wi] with λ = c/w. Therefore, if there are any differences in and the neutralities on the network, the inequality (11) is f(µ) 1 + µ strict. g(µ) := RRo(M(µ)[κ], D[κ]) = − µ Theorem 7 (Corollaries 1, 3 of [33]). The equilibrium f(µ) 1 = 1 + − . genetic load of a quasispecies under irreducible multiple- µ event mutation strictly increases with mutation rate 0 < µ < 1/2, when the quasispecies has variation in fitness Then within its irreducible blocks. f(µ1) 1 f(µ2) 1 g(µ1) g(µ2) = − − . Proof. Corollaries 1, 3 of [33] show that − µ1 − µ2

L Let µ = µ > 0 and µ = hµ, 0 < h < 1. O 1 2 r(M(µ)D) = r [(1 µ)I(ξ) + µP(ξ)]D (12) − hf(µ) h (f(hµ) 1) ξ=1 g(µ) g(hµ) = − − − . (13) − µh strictly decreases in µ (0, 1/2) when D = c I for any c > 0, and when there∈ is only one equilibrium6 for each In the case where there are more than one nonlethal µ. In the case where there are lethal genotypes, there fitness in the irreducible block, D[κ] is nonscalar (D[κ] = 6 may be more than one equilibrium, corresponding to c I[κ] for any c R). In [19] it is shown, by applying ∈ multiple quasispecies. In that case, one must focus on “dual convexity” to theorems Cohen [50] and Friedland Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 11

[51], that r([(1 µ)I[κ] + µP[κ]]D[κ]) is strictly convex Applying this representation of M, we find that the in µ when D[κ] −is nonscalar. Convexity means that for spectral radius of MD is that of a symmetric matrix: all 0 < p < 1, and α1, α2, 1/2 > −1/2 r(MD) = r(Dπ KΛK Dπ D) (1 p)f(α1) + pf(α2) > f((1 p)α1 + pα2). (14) > −1/2 1/2 > − − = r(KΛK Dπ DDπ ) = r(KΛK D) > 1/2 > 1/2 Let α1 = 0 and α2 = µ. We know that = r(ΛK DK) = r(D KΛK D ).

[κ] r(D ) maxi∈κ wi Note the remarkable symmetry here: r(MD) = f(0) = = 1. > > maxi=1,...,n wi maxi=1,...,n wi ≤ r(KΛK D) = r(ΛK DK), which sandwiches one of two diagonal matrices, Λ and D, between the change- So 14 becomes of-basis matrices K and K>. From this symmetric form, the theorem results imme- 1 p + pf(µ) (1 p)f(0) + pf(µ) > f(pµ). − ≥ − diately from application of the Rayleigh quotient char- acterization of the spectral radius of any symmetric Now, let p = h. Using f(0) 1 and (13) we get ≤ real matrix S ([36, pp. 172–173], [37, pp. 176–180]), > > 0 < (1 h)f(0) + hf(µ) f(hµ) r(S) = maxx6=0(x Sx/x x). − − 1 h + hf(µ) f(hµ) = µh(g(µ) g(hµ)). Remark 1. No assumptions on the irreducibility ≤ − − − of M or MD enter here. In order for r(MD) to Therefore g(µ) = RRo(M(µ)[κ], D[κ]) strictly increases be the mean fitness of the population when MD is in 0 < µ < 1 for any quasispecies κ. reducible, we must impose the additional assumption that the population contain a quasispecies κ such that Theorem 10 (Expression for the Spectral Radius). r(M[κ]D[κ]) = r(MD). When mutation has the transition matrix of a reversible Remark 2. The approach taken here may have util- 1/2 > −1/2 Markov chain, M = Dπ KΛK Dπ , then ity for graph theory. One obtains bounds on the spectral radius of a graph by embedding it into a larger graph r(MD) = max x>D1/2KΛK>D1/2x x>x=1 whose eigenvectors and eigenvalues are known, and pro- n n 2 jecting those eigenvectors onto the graph. X  X  = max λj xi√wiKij . (15) x>x=1 j=1 i=1 Theorem 12. (Lower Bound from the Fourier Representation of Fitness)Represent the fitnesses wi Pn Proof. The transition matrix of an irreducible reversible as wi = j=1 Kijaj or w = Ka, where ai is the Fourier Markov chain can be represented ([34, p. 33] [33]) as coefficient for eigenvector [K]i. Then

M = D1/2KΛK>D−1/2 n π π 1 X 2 r(MD) Pn λjaj . (16) ≥ j=1 wj where reversibility requires Mijπj = Mjiπi, and j=1

> π = (π1, . . . , πn) = Mπ is the stationary distribution Proof. The square-root terms √wi in (15) can be “di- for M, i.e. its Perron vector, gested” by employing xi = c√wi, where the constraint > > 1/2 x x = 1 implies c = 1/√e De. Then Dπ is the diagonal matrix of the square roots of πi, n n !2 K is the matrix of orthogonalized eigenvectors of M, X X > > r(MD) = max λj xi√wiKij KK = K K = I, i.e. for each j = 1, . . . , n, x>x=1 Pn 2 Pn j=1 i=1 Kij = 1, KijKih = 0 if j = h, i=1 i=1 6 n n !2 2 X X [K] is the j-th column of K, c λj √wi√wiKij j ≥ j=1 i=1 1/2 1/2 Ki1 = πi or [K]1 = Dπ e, n n !2 2 X X h i = c λj wiKij Λ := diag λi is the diagonal matrix of eigenvalues j=1 i=1 λ1 > λ2 λn 1 of M, these inequalities 1 ≥ · · · ≥ ≥ − = e>DKΛK>De. (17) coming from Perron-Frobenius theory. e>De Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 12

Substitution with the Fourier representation w be represented in terms of the Fourier representation of De = Ka into (17) gives (16): ≡ fitnesses, w = Ka, as

n 1 > > 1 X r(MD) e DKΛK De Cov [W ,W ] = λ a2. ≥ e>De e/n P O n i i i=2 1 > > > = Pn a K KΛK Ka wj j=1 Proof. When M is symmetric, π = e/n,[K]1 = e/√n, n > > 1 1 X and e K = √n e1 , and using the Fourier representation = a>Λa = λ a2. Pn w Pn w j j w = Ka, one gets j=1 j j=1 j j=1 n X > > > Several lemmas are needed for the proofs to follow. wj = e w = e Ka = e1 a√n = a1√n. The first is a generalization of the result for graphs in j=1 [39]. Substituting the above into (19), one obtains: Lemma 13 (Parent-Offspring Fitness Covariance). For > 1/2 > 1/2 > 2 a stationary mutational random walk with reversible Cove/n[WP ,WO] = w (D KΛK D )w (w π) π π − Markov chain transition matrix M, and stationary dis- = w>(KΛK>)w/√n2 (w>e)2/n2 tribution π, the π-weighted covariance between parent − = a>K>KΛK>Ka/n (a>K>e)2/n2 and offspring fitnesses (WP , WO respectively) is − > > 2 2 = a Λa/n (a e1√n) /n n n !2 − X X 1/2 n ! n Covπ[WP ,WO] = λj wiπi Kij . (18) 1 X 2 2 1 X 2 = λiai a1 = λiai . j=2 i=1 n − n i=1 i=2

Proof. Theorem 15 (Relation to Random Mutational Walks). For a stationary mutational random walk with reversible Cov [W ,W ] π P O Markov chain transition matrix M, and stationary dis- n n  n  n  X X X X tribution π, let Covπ(WP ,WO) be the π-weighted co- = πjwj wiMij πjwj πjwiMij − variance between parent and offspring fitnesses. Then j=1 i=1 j=1 i,j=1 at a mutation-selection balance, wbzˆ = MDzˆ, = w>MDπ (w>π)(w>Mπ) − > 1/2 > −1/2 > >  n  " # = w (Dπ KΛK Dπ )Dπ w πw π (19) X 1 − r(MD) wjπj 1 + Covπ[WP ,WO] . > 1/2 > 1/2 > >   > 2 = e DD KΛK D De e DDπee DDπe ≥ (w π) π π − j=1 = e>DD1/2(KΛK>)D1/2De π π Proof. Using (19), e>DD1/2(D1/2ee>D1/2)D1/2De − π π π π > 1/2 > 1/2 > 1/2 1/2 x>(D1/2KΛK>D1/2)x = e DDπ (KΛK Dπ ee Dπ )Dπ De r(MD) = max − x6=0 x>x = e>DD1/2(KΛK> [K] [K ]>)D1/2De π 1 1 π > 1/2 1/2 > 1/2 1/2 − e (DπD) (D KΛK D )(DπD) e > 1/2 h i > 1/2 > = e DDπ (K diag 0, λ2, . . . , λn K )Dπ De ≥ e (DπD)e 2 > 1/2 > 1/2 n n ! e DDπ KΛK Dπ De X X 1/2 = = λj wiπi Kij . (20) w>π j=2 i=1 1 = [Cov [W ,W ] + (w>π)2] w>π π P O  1  = (w>π) 1 + Cov [W ,W ] . The expression (18) simplifies when M is symmetric, (w>π)2 π P O to give the following, originally derived in [39, 43]. Corollary 16 (Condition for Positive Parent/Offspring Lemma 14 (Parent-Offspring Fitness Covariance Un- Fitness Covariance). When the eigenvalues of M are der Symmetric Mutation [39]). For a stationary muta- positive, then Covπ[WP ,WO] 0. tional random walk with a doubly stochastic transition ≥ matrix M = M> = KΛK>, and stationary distribution Proof. When λi > 0 for all i, then all terms in the sum π = e/n, the parent-offspring covariance in fitness can (18) are nonnegative. Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 13

 n  Lemma 17. Since K is orthogonal, its columns form > 1/2 1/2 2 X > 1/2 2 Pn = max (1 λ2)(x D π ) +λ2 (x D [K]j) a basis to represent x = Kc, or xi = Kijcj. The x>x=1 − j=1 j=1 > > > > > constraint x x = 1 = c K Kc = c c implies c c = 1. > 1/2 1/2 2 (1 λ2) max (x D π ) + Then ≤ − x>x=1 n n n n X X X X > 1/2 2 x K = ( K c )K λ2 max (x D [K]j) i ij ih h ij x>x=1 i=1 i=1 h=1 j=1 n n n n n X X X X X > 2 = ch KihKij = chδhj = cj. (1 λ2) wiπi + λ2 max[wi] max (x [K]j) ≤ − i x>x=1 h=1 i=1 h=1 i=1 j=1 n n Lemma 18. X X 2 = (1 λ2) wiπi + λ2 max[wi] max cj − i c>c=1 i=1 j=1 > > 1/2 > > 1/2 max x y = (y y) , min x y = (y y) . (21) n x>x=1 x>x=1 − X = (1 λ2) wiπi + λ2 max[wi]. > − i > x y i=1 Proof. Critical points of maxx>x=1 x y = maxx6=0 x>x satisfy Pn The lower bound, r(MD) i=1 wiπi, was proven in [32, Corollary F.2.]. Here,≥ a separate proof is pro- ∂ x>y y x>y i vided. > = > xi > 2 = 0, ∂xi x x x x − (x x) r(MD) which, for x>x = 1, are solved by y = x (x>y), hence i i  n n  > > 2 p > > > 1/2 1/2 2 X X 2 y y = (x y) so y y = x y is the maximum, and = max (x D π ) + λj( xi√wiKij) p > | | x>x=1 y>y = x y the minimum. j=2 i=1 − −| |  n n  Theorem 15 (Upper Bound from the Spectral Gap). > 1/2 1/2 2 X X 2 max (x D π ) + λn ( xi√wiKij) ≥ x>x=1 Assuming 0 < µ < 1/2, then j=2 i=1 n n   > 1/2 1/2 2 X X max (x D π ) = wiπi. wiπi r(MD) ≥ x>x=1 ≤ i=1 i=1 n n ! X X wiπi + λ2(M) max[wi] wiπi , ≤ i − i=1 i=1 Theorem 17 (Convexity of the Spectral Radius in µ). (ξ) In the case where L loci mutate independently at rate µ, where λ2(M) = 1 µ+µ max λ2(P ). − ξ=1,...,L each in a reversible Markov chain, then for 0 < µ < 1/2, r(M(µ)D) = Proof. The condition 0 < µ < 1/2 assures that all the eigenvalues of M(µ)(ξ) are positive, and thus all  L  > 1/2 O (ξ) (ξ) (ξ) (ξ)> 1/2 λi(M(µ)) > 0. A series of inequalities are obtained, the max x D K [(1 µ)I +µΛ ]K D x x>x=1 − last steps using the representation x = Kc from Lemma ξ=1 17, and Lemma 18. is convex in µ. r(MD) = Proof. For the situation where L loci mutate indepen-  n  dently at rate µ, and locus ξ has mutation distribution > 1/2 1/2 2 X > 1/2 2 max (x D π ) + λj(x D [K]j) (ξ) x>x=1 matrix P , the mutation matrix for the entire genome j=2 is  n  > 1/2 1/2 2 X > 1/2 2 L L max (x D π ) + λ2 (x D [K]j) O O ≤ x>x=1 M(µ)= [(1 µ)I(ξ)+µP(ξ)] = [I(ξ)+µ(P(ξ) I(ξ))]. j=2 − −  ξ=1 ξ=1 > 1/2 1/2 2 = max (1 λ2)(x D π ) + x>x=1 − For the case where mutation at each locus forms a re-  n  versible Markov chain, > 1/2 1/2 2 X > 1/2 2 λ2 (x D π ) + (x D [K]j) (ξ) > j=2 P(ξ) = D1/2K(ξ)Λ K(ξ) D−1/2. πξ πξ Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 14

Then by the Rayleigh theorem, since for irreducible P(κ) with real eigenvalues, Perron- Frobenius theory gives 1 λ(κ) 1. jκ r(M(µ)D) = (22) It takes only one positive≥ term≥ − in the sum to make  L  d2 > 1/2 O (ξ) (ξ) (ξ) (ξ)> 1/2 max x D K [(1 µ)I +µΛ ]K D x. 2 βj(µ) positive. The only way to get all the terms x>x=1 − dµ ξ=1 to be zero is for jκ = 1 or jγ = 1 for every distinct pair QL of loci κ, γ 1,...,L , which requires that index j The total number of genotypes is G := nξ, where { } ⊂ { } | | ξ=1 have only one jξ = 1. For all the j with at least two nξ is the number of alleles at the ξ-th locus. As a way 6 loci where jγ = 1 and jκ = 1, the term βj(µ) is strictly to index the genotypes for summation we can use lexi- convex in µ.6 For the other6 j where j = 1 except for cographical order, ξ at most one locus, βj(µ) is a straight line. From this convexity we know that for any j 1,..., G , for L : 1, . . . , n1 1, . . . , nL 1,..., G , { } × · · · × { } → { | |} h (0, 1), ∈ { | |} ∈ where we use the notation j = L (j1, j2, . . . , jL) QL −1 ∈ (1 h)βj(µ1) + hβj(µ2) βj((1 h)µ1 + hµ2). 1,..., ξ=1 nξ , and jξ := [L (j)]ξ. − ≥ − { Now (22) can} be expressed as Now we return to the full expression, r(M(µ)D) |G| |G|  L  X > 1/2 2 (ξ) r(M(µ)D) = max βj(µ)(x D [K]j) . X Y > 1/2 2 x>x=1 = max (1 µ+µλj ) (x D [K]j) , j=1 x>x=1 − ξ j=1 ξ=1 For a given x, each term (x>D1/2[K] )2 is nonnegative, NL (ξ) j where [K]j is the j-th column of K = ξ=1 K . De- so fine |G|   L L X > 1/2 2 Y  (ξ) Y h (ξ) i (1 h)βj(µ1) + hβj(µ2) (x D [K]j) βj(µ) = 1 µ + µλ = 1 µ(1 λ ) . − − jξ − − jξ j=1 ξ=1 ξ=1 |G| X > 1/2 2 First we will see that each βj(µ) is convex in µ. The = (1 h) βj(µ1)(x D [K]j) − case L = 1 is subsumed in Result 9. We assume L 2. j=1 ≥ The first derivative is |G| X > 1/2 2 L + h βj(µ2)(x D [K]j) d X (κ) Y (ξ) βj(µ) = (1 λ ) [1 µ(1 λ )]. j=1 dµ − − jκ − − jξ κ=1 ξ6=κ |G|   X > 1/2 2 βj((1 h)µ1 + hµ2) (x D [K]j) . The second derivative is ≥ − j=1 d2 βj(µ) Hence, for any µ1, µ2 (0, 1/2), and h (0, 1), dµ2 ∈ ∈ L X (κ) X (γ) Y (ξ) (1 h) r(M(µ1)D) + h r(M(µ2)D) = (1 λ ) (1 λ ) [1 µ(1 λ )] − − − jκ − − jγ − − jξ |G| κ=1 γ6=κ ξ6=κ,γ X > 1/2 2 = (1 h) max βj(µ1)(x D [K]j) L − x>x=1 X X (κ) (γ) Y (ξ) j=1 = (1 λj )(1 λj ) [1 µ(1 λj )], − κ − γ − − ξ |G| κ=1 γ6=κ ξ6=κ,γ X > 1/2 2 + h max βj(µ2)(x D [K]j) (23) x>x=1 j=1 Q (ξ) |G| where for L = 2 there is no term ξ6=κ,γ [1 µ(1 λj )]. − − ξ X > 1/2 2 (κ) (γ) max βj((1 h)µ1 + hµ2)(x D [K]j) All of the terms (1 λj ), (1 λj ) are nonnegative. ≥ x>x=1 − κ γ j=1 (ξ−) − To have 1 µ(1 λj ) > 0 for all jξ, the upper bound ξ = r(M((1 h)µ1 + hµ2)D). on µ is − − − Therefore, r(M(µ)D) is convex in µ. With additional ∗ 1 1 µ := min = 1/2, (24) care, the conditions for strict convexity can be elicited. j (ξ) (ξ) ξ 1 λ 1 minj λ ≥ − jξ − ξ jξ Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 15

Result 18 (House of Cards Mutation). Let M(µ) = robustness. Let genotype 1 (111) be favored by muta- (1 µ)I + µP, where P = πe>. Then tion bias, where the transition matrix of the reversible − Markov chain is n X   r(M(µ)D) (1 µ) max[wi] + µ πiwi. 0 3/4 3/4 0 3/4 0 0 0 ≤ − i i=1  1/3 0 0 1/3 0 1/3 0 0     1/3 0 0 1/3 0 0 1/3 0  Proof.    0 1/8 1/8 0 0 0 0 1/3  M =   . >  1/3 0 0 0 0 1/3 1/3 0  r(M(µ)D) = r([(1 µ)D + µπe D)   −  0 1/8 0 0 1/8 0 0 1/3  = r([(1 µ)D + µD1/2D1/2ee>D1/2D1/2)   − π π  0 0 1/8 0 1/8 0 0 1/3  > 1/2 1/2 > 1/2 1/2 = max x [(1 µ)D + µD Dπ ee Dπ D ]x 0 0 0 1/3 0 1/3 1/3 0 x>x=1 − " n # Genotypes 2, 3, 5 (101, 110, 011) are its neighbors. X 2 > 1/2 1/2 2 = max (1 µ) wixi + µ(x D Dπ e) Genotypes 2, 3, 4 (101, 110, 100) will be off the neutral x>x=1 − i=1 network, so the fitnesses are  n n !2 X 2 X > = max (1 µ) wixi + µ xi√πiwi  e D = (1, 0, 0, 0, 1, 1, 1, 1). x>x=1 − i=1 i=1 Then the equilibrium population neutrality is less n n !2 X 2 X than the average neutrality of the neutral network: (1 µ) max wixi + µ max xi√πiwi ≤ − x>x=1 x>x=1 i=1 i=1 e>DMDe n r(MD) = 0.6517 < = 0.6667. X e>De = (1 µ) max[wi] + µ πiwi, − i i=1 the last term coming from Lemma 18.

Counterexamples to Increased Mutational Robustness

For neutral networks with non-symmetric mutation ma- trices, it is no longer true that population must con- centrate on genotypes with above average mutational robustness. Two counterexamples are provided.

Counterexample 1: Cyclic Mutation Consider a space of three genotypes, two of which form a neutral network, with cyclic mutation

0 1 0 1 0 0 M = (1 µ)I + µ 0 0 1 , D = 0 1 0 . − 1 0 0 0 0 0

The neutral network genotypes have average offspring fitness e>DMDe/ e>De = 1 µ/2 > r(MD) = 1 µ. − − Counterexample 2: Biased Reversible Mutation Consider a mutation matrix of three loci with two- alleles, under single-event mutation. Suppose that mu- tation is biased towards a genotype with low mutational Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 16

References [19] Altenberg L (2012) Resolvent positive linear operators exhibit the reduction phenomenon. Proceedings of the [1] Kimura M (1968) Evolutionary rate at the molecular National Academy of Sciences U.S.A. 109:3705–3710. level. Nature 217:624–626. [20] Altenberg L (2012) The evolution of dispersal in ran- [2] King JL, Jukes TH (1969) Non-darwinian evolution. dom environments and the principle of partial control. Science 164:788–798. Ecological Monographs 82:297–333. [3] Hietpas RT, Jensen JD, Bolon DN (2011) Experimental [21] Danchin E,´ Wagner RH (2010) Inclusive heritability: illumination of a fitness landscape. Proceedings of the combining genetic and non-genetic information to study National Academy of Sciences 108:7896–7901. animal behavior and culture. Oikos 119:210–218. [4] Fisher RA (1928) The possible modification of the re- [22] Haldane J (1937) The effect of variation of fitness. sponse of the wild type of recurrent mutations. Amer- American Naturalist 71:337–349. ican Naturalist 62:115–126. [5] Waddington CH (1942) Canalization of development [23] Muller H (1950) Our load of mutations. American and the inheritance of acquired characters. Nature Journal of Human Genetics 2:111–176. 150:563–565. [24] Hermisson J, Hansen TF, Wagner GP (2003) Epis- [6] Conrad M (1974) in Physics and mathematics of the tasis in polygenic traits and the evolution of genetic nervous system (Springer), pp 82–107. architecture under stabilizing selection. The American Naturalist 161:708–734. [7] Conrad M (1979) Mutation-absorption model of the enzyme. Bulletin of Mathematical Biology 41:387–405. [25] Kingman JFC (1977) On the properties of bilinear models for the balance between genetic mutation and [8] Ancel LW, Fontana W (2000) Plasticity, evolvability selection. Mathematical Proceedings of the Cambridge and modularity in RNA. Journal of Experimental Zool- Philosophical Society 81:443–453. ogy (Molecular and Developmental Evolution) 288:242– 283. [26] Gantmacher FR (1959) The Theory of Matrices [9] Bornberg-Bauer E, Chan HS (1999) Modeling evolu- (Chelsea Publishing Company, New York) Vol. 2. tionary landscapes: mutational stability, topology, and [27] Eigen M, Schuster P (1977) The hypercycle: A prin- superfunnels in sequence space. Proceedings of the Na- ciple of natural self-organization. Naturwissenschaften tional Academy of Sciences 96:10689–10694. 64:541–565. [10] van Nimwegen E (1999) Ph.D. thesis (Universiteit [28] B¨urger R, Hofbauer J (1994) Mutation load Utrecht, Amsterdam). and mutation-selection-balance in quantitative genetic [11] Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral traits. Journal of Mathematical Biology 32:193–218. evolution of mutational robustness. Proceedings of the [29] B¨urgerR (2000) The Mathematical Theory of Selection, National Academy of Sciences U.S.A. 96:9716–9720. Recombination, and Mutation (John Wiley & Sons, [12] Aguirre J, Buld´uJM, Manrubia SC (2009) Evolution- LTD, Chichester). ary dynamics on networks of selectively neutral geno- [30] Fisher RA (1930) The Genetical Theory of Natural Se- types: effects of topology and sequence stability. Physi- lection (Clarendon Press, Oxford). cal Review E: Stat Nonlin Soft Matter Phys 80:066112. [31] Kirkland S, Li CK, Schreiber SJ (2006) On the evolu- [13] Kun A,´ et al. (2015) The dynamics of the RNA tion of dispersal in patchy landscapes. SIAM Journal world: insights and challenges. Annals of the New York on Applied Mathematics 66:1366–1382. Academy of Sciences. [32] Karlin S (1982) in , eds Hecht MK, [14] Reeves T, Farr R, Blundell J, Gallagher A, Fink T Wallace B, Prance GT (Plenum Publishing Corpora- (2015) Eigenvalues of neutral networks: interpolating tion, New York) Vol. 14, pp 61–204. between hypercubes. arXiv preprint arXiv:1504.03065. [15] Altenberg L (2005) Evolvability suppression to stabilize [33] Altenberg L (2011) An evolutionary reduction prin- far-sighted adaptations. Artificial Life 11:427–444. ciple for mutation rates at multiple loci. Bulletin of Mathematical Biology 73:1227–1270. [16] Altenberg L (1984) A Generalization of Theory on the Evolution of Modifier Genes (ProQuest) [34] Keilson J (1979) Markov Chain Models: Rarity and http://search.proquest.com/docview/303425436/ Exponentiality (Springer-Verlag, New York). abstractProQuest document ID: 303425436. [35] Hermisson J, Redner O, Wagner H, Baake E (2002) [17] Altenberg L, Feldman MW (1987) Selection, general- Mutation–selection balance: Ancestry, load, and maxi- ized transmission, and the evolution of modifier genes. mum principle. Theoretical Population Biology 62:9–46. I. The reduction principle. Genetics 117:559–572. [36] Wilkinson JH (1965) The Algebraic Eigenvalue Problem [18] Altenberg L (2010) Proof of the Feldman-Karlin conjec- (Clarendon Press, Oxford). ture on the maximum number of equilibria in an evolu- [37] Horn RA, Johnson CR (1985) Matrix Analysis (Cam- tionary system. Theoretical Population Biology 77:263– bridge University Press, Cambridge). 269. Fundamental Properties of the Evolution of Mutational Robustness— Lee Altenberg 17

[38] Weinberger ED (1991) Fourier and Taylor series on fitness landscapes. Biological Cybernetics 65:321–330. [39] Stadler PF (1996) Landscapes and their correlation functions. Journal of Mathematical chemistry 20:1–45. [40] Klemm K, Stadler PF (2014) in Theory and Principled Methods for the Design of Metaheuristics (Springer), pp 41–61. [41] Hordijk W, Stadler PF (1998) Amplitude spectra of fitness landscapes. Advances in Complex Systems 1:39– 66. [42] Davies EB, Gladwell GML, Leydold J, Stadler PF (2001) Discrete nodal domain theorems. Linear Algebra and its Applications 336:51–60. [43] Reidys CM, Stadler PF (2001) Neutrality in fitness landscapes. Applied Mathematics and Computation 117:321–350. [44] Bull JJ, Meyers LA, Lachmann M (2005) Quasispecies made simple. PLoS Computational Biology 1. [45] Tejero H, Montero F, Nu˜noJ (2015) Theories of lethal mutagenesis: From error catastrophe to lethal defec- tion. Current Topics in Microbiology and Immunology pp 1–19. [46] van Nimwegen EJ, Crutchfield JP, Huynen M (1999) Metastable evolutionary dynamics: Crossing fitness barriers or escaping via neutral paths? Bulletin of Mathematical Biology 62:799–848. [47] Van Nimwegen E, Crutchfield JP (2001) Optimizing epochal evolutionary search: Population-size depen- dent theory. Machine Learning 45:77–114. [48] Geroldinger L, B¨urgerR (2015) Clines in quantitative traits: The role of migration patterns and selection sce- narios. Theoretical Population Biology 99:43–66. [49] Horn RA, Johnson CR (2013) Matrix Analysis (Cam- bridge University Press, Cambridge), 2 edition. [50] Cohen JE (1981) Convexity of the dominant eigenvalue of an essentially nonnegative matrix. Proceedings of the American Mathematical Society 81:657–658. [51] Friedland S (1981) Convex spectral functions. Linear and Multilinear Algebra 9:299–316.