<<

Genet. Res. Camb.(1997), 69, pp. 127–136. With 2 figures. Printed in the United Kingdom # 1997 Cambridge University Press 127

Model dependency of error thresholds: the role of fitness functions and contrasts between the finite and infinite sites models

THOMAS WIEHE Department of IntegratiŠe Biology, UniŠersity of California, Berkeley, CA 94720, USA (ReceiŠed 31 January 1996 and in reŠised form 26 August 1996 and 8 NoŠember 1996)

Summary Based on a deterministic –selection model the concept of error thresholds is critically examined. It has often been argued that genetic information – for instance, an advantageous allele – can be selectively maintained in a population only if the is below a certain limit, the error threshold, which is inversely related to the genome size. Here, I will show that such an inverse relationship strongly depends on the fitness model. To produce the error threshold, as given by Eigen (1971), requires that the fitness model is an extreme form of diminishing . The error threshold, in a strict sense, vanishes as epistasis changes from diminishing to synergistic. In the latter case even the usual definition of error thresholds becomes ambiguous. Initially, a finite sites model has been used to describe error thresholds. However, they can also be defined within the framework of the infinite sites model. I study both models in parallel and compare their properties as far as error thresholds are concerned. It is concluded that error thresholds possibly play a much less important role in molecular evolution than has often been assumed in the past.

rate there is a maximal evolutionarily permissible 1. Introduction genome size if genetic information (in the form of an A discussion about the evolutionary role of what was advantageous allele) is to be maintained in the later called error threshold was initiated in the early population. As a corollary, a given sequence length 1970s. Underlying this discussion is a model of defines a maximal permissible mutation rate, the error biochemical replicator systems, devised by Eigen threshold. Based on these relations, it was argued (1971) for a description of prebiotic evolutionary (Eigen & Schuster, 1979) that evolution at the early processes. A somewhat simplified version of this stages of life was confronted with a catch-22 situation model is equivalent to the deterministic population (no enzymes without a large genome and no large genetical mutation–selection equation (see Crow & genome without enzymes, which enhance the rep- Kimura, 1970, and references therein). The concept of lication accuracy), also named ‘information crisis’. error thresholds translates to the question of how In this article I revisit the deterministic mutation– large the mutation rate can be in order not to selection model and investigate the dependency of completely eliminate an advantageous allele from the error thresholds on assumptions such as the underlying population, when mutation and selection are balanced. fitness function or the finite (cf. Wright, 1949) versus For a finite sites model the answer may – among other infinite sites model (Kimura, 1969). Both turn out to quantities – depend on the genome size, measured by be crucial for the existence and magnitude of error the number of sites. Error thresholds are usually thresholds. associated with an inŠerse relationship between per Based on an infinite sites model and using discrete nucleotide mutation probability (p) and genome size time difference equations Wagner & Krall (1993) (ν). The formula given by Eigen (1971)is reported a condition under which no error threshold exists. However, their model is somewhat more log σ ν l ,(1)restrictive than the one used here. It allows only for max p single-step and considers merely fitness where σ is a so-called superiority parameter which functions which decrease monotonically as the number depends on the fitness model. The interpretation of of mutations increases. For the continuous time this equation is the following. For a given mutation model, I found results which are analogous to those of

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 T. Wiehe 128

the former authors. Further, I treated several quency of the wild-type becomes zero. Alternatively, generalizations of fitness functions for the finite and overall population statistics may be used, such as the infinite sites models in parallel. In particular, explicit average distance E of an allele from the wild-type (in threshold formulae are derived. Since one of the terms of number of mutations) or the index of important characteristics of error thresholds is the dispersion D, both defined below. limit they set on genome size (Nee & Maynard Smith, Mutation–selection balance had been an extensively 1990), an essential point is missed if the study of error discussed topic long before it attracted the attention thresholds and their dependence on fitness functions is of biochemically motivated research. Kimura & restricted to the infinite sites model. Maruyama (1966) investigated the effect of epistasis Here, I concentrate on the haploid version of the on the mutation load. Their interest has been in coupled (Hadeler, 1981) mutation–selection model. It quantifying the genetic load for different models of has been used previously in this context (e.g. Wiehe et selection and contrasting sexual and asexual rep- al., 1995). It describes the dynamics of a wild-type lication. They found that diminishing epistasis pro- allele and its variants, derived by mutation, for an duces a much higher mutational load than synergistic infinitely large population. In this model alleles are – epistasis, if reproduction is diploid. However, they did for simplicity – binary nucleotide sequences with a not directly compare the magnitudes of mutation common length ν %_, but which may differ by point rates which are permissible for the different fitness mutations. Furthermore, it is assumed that reverse models so as not to stall natural selection. This is done mutations can be neglected, and that alleles which below and it is found that the kind of epistasis also differ from a particular type, called wild-type in the strongly influences the (haploid) error threshold. following, by the same number of point mutations have identical fitness. A fitness model which is often adopted in discussions The model of error thresholds (Swetina & Schuster, 1982; Nowak The fundamental mutation–selection equation, in the & Schuster, 1989; Wiehe et al., 1995) is the so-called form of coupled mutation and selection terms, is single-peaked fitness landscape: only the wild-type is distinguished by some fitness advantage. Any other ν yc k l  yi Ši mkikyk Š` ,0%k%Š,ν%_. (2) type, differing in as little as a single mutated site, is at i=! a disadvantage, expressed in its fitness 1ks. Such a two-class model was found to be adequate, for Here, yk denotes the frequency of the class of alleles instance, to describe evolution of the coliphage Qβ which differ from the wild-type (class 0) by exactly k (Domingo et al., 1978). Other examples discussed point mutations. Šk are their associated fitness values. ν (Eigen & Biebricher, 1988) in this context are the The mean fitness of the population is Š` l i=o yi Ši. highly error-prone replication of viruses (Ortin et al., mki are entries of a mutation matrix M. They describe 1980; Martinez-Salas et al., 1985) or the serial transfer transitions from type i to type k. For the finite and the experiments of in Šitro replication of RNA (Spiegel- infinite sites models the mutation rates have to be man et al., 1965). However, concerns about the defined separately. In the former case, single sites biological adequacy of a single-peaked model and its mutate with probability p; in the latter, p has to be more general applicability, for instance to evolution substituted by a genome mutation rate, denoted λ. of higher organisms, have been raised (Maynard The probabilities mij are from a binomial distribution Smith, 1983; Charlesworth, 1990; A. S. Kondrashov, in the former case (see (3)). They are replaced by personal communication). Poisson mutation rates mV ij in the second case. To first- Although some consideration has been given by order accuracy, p and λ are related b λ l νp. M has Eigen & Biebricher (1988) to the ‘fitness topography size (νj1)i(νj1) and entries (cf. Higgs, 1994) of sequence space’, the dependency of error thresholds 1 νkj i−j −i on this topography has not been investigated. In p (1kp)ν ,ifi&j m 2 ikj (3) particular, one is left with the impression that any ij l 3

fitness topography with a ‘superior master’ allele 4 0, if i ! j. (Eigen & Biebricher, 1988, p. 222) might produce an The Poisson approximation of Mp of M has entries error threshold which is inversely related to genome i−j λ 1 −λ size. e (i−j)!,ifi&j 2 It therefore still appears worthwhile to ask how mij l 3 (4) 0, if i ! j. general error thresholds are and, if possible, to 4 quantify them in terms of the usual parameters. Matrices M and Mp are asymptotically equivalent (i.e. Despite numerous studies, there is no unambiguous or entries of M converge to the respective entries of Mp ) generally accepted definition of the term ‘error as ν becomes large and p small. When working threshold’. I will adhere to two characterizations without reverse mutation (see mij in (3)), this follows which are commonly used. One is to determine the immediately from the approximation of a binomial mutation parameter such that the equilibrium fre- distribution by a Poisson distribution. However, with

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 Error thresholds and fitness functions 129

little more effort it can be shown that it is still true yi((1ks)jsy!); an easy representation in closed form even if this condition is dropped (proof not shown). is not available. However, analytical expressions for Therefore, if ν is large, model assumptions about y` ! and Ez (p) can be derived (see Appendix). For both reverse mutation do not have a bearing on the fitness functions the condition existence or magnitude of error thresholds nor is it the y` ! l 0 suppression of back mutations which distinguishes models of Muller’s ratchet (Muller, 1964; Felsenstein, is equivalent to 1974) from those of error thresholds, as was previously y` i l 0, i ! ν and y` l 1 (ν finite) suggested (Nowak & Schuster, 1989). ν Further general definitions are the aŠerage distance y` i l 0, i & 0(νinfinite), ν from the wild-type, given by E(p) l i=! iyi(p), and indicating that the equilibrium is unique in these cases the Šariance in distance from the wild-type V(p) l (note that the theorem which states uniqueness and ν # i=!(ikE(p)) yi(p) (respectively for λ instead of p). global stability of the equilibrium distribution for the Derived from these, the index of dispersion D(p) l haploid mutation–selection equation (Moran, 1976) V(p)\E(p) is used to measure the concentration of the does not a priori hold if reverse mutation is suppressed, population around the wild-type allele. as in the model above). The error threshold may therefore either be identified as the smallest p l pmax 3. Results (λ l λmax) which yields y` ! l 0 (property of a single type) or, equivalently, via a property of the entire (i) Single-peaked and multiplicatiŠe fitness functions population, namely F The single-peaked function, SP, is defined by two Ez (pmax) l ν, fitness levels: one for the wild-type, Š! l 1, and one Ez (λmax) l_, (5) for the mutants Ši l 1ks, i & 1, s " 0. Under the multiplicative function, FM, the fitness values are Ši l for finite and infinite ν, respectively. The analytical i (1ks) , i & 0. This fitness function is often associated expressions are given in Table 1. The threshold for (Haigh, 1978; Stephan et al., 1993) with the process finite ν and the single-peaked function FSP is to a first- known as Muller’s ratchet: the accumulation of order approximation pmax $ s\ν (valid if QsQ ' 1). This deleterious mutations, together with random loss of coincides with the formula (1), given by Eigen (1971), rare alleles, leads to a ratchet-like decrease of mean as his superiority parameter σ equals 1\(1ks) for the fitness of a population. Analytical expressions for the single-peaked fitness function. For FM, a threshold equilibrium frequencies y` i and the average distance, can be found if ν is finite. Interestingly, it does not Ez (p), can readily be obtained for FM (cf. Kimura & depend on ν. Furthermore, for infinite ν, both criteria Maruyama, 1966, for the infinite sites case). For the fail to detect any threshold at all. Obviously, under finite sites model, it is a straightforward calculation to multplicativity no limiting relationship exists between i show that y` i (see Table 1) satisfy j=! yj Šj mij l yi Š` . genome size and mutation rate. The index of dispersion For FSP the frequencies y` i, i & 1 may be ob- (depicted in Fig. 1) captures the qualitative difference i tained recursively from the relation j=! yj Šj mij l between the two fitness models and its bearing on the

10

4 8

FSP 6

2 4 Index of dispersion Index of dispersion Index FSP

2

FM FM 0 0 0·00 0·05 0·10 0·15 0·0 1·0 2·0 3·0 p λ

Fig. 1. Left: Comparison of single-peaked (FSP) and multiplicative (FM) fitness functions for finite ν. Plot of the index of dispersion V(p)\E(p). Parameters are s l 0.1 (FM), s l 0n95 (FSP), ν l 30. Right: Index of dispersion V(λ)\E(λ) for infinite ν. Parameters are s l 0n1 (FM), s l 0n95 (FSP). The error threshold is located where the dispersion function becomes singular (case FSP). Obviously, mutation has no influence on dispersion in case FM. Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 T. Wiehe 130

Table 1. Analytical expressions for equilibrium frequencies y` i, aŠerage distance from wild type Ez , index of dispersion Dz and error threshold for fitness functions FSP and FM

Model FSP FM ν finite y` i(p) (1kp)νk(1ks) ν (p\s)i(1kp\s)ν−i s i z −" E(p) νp(1kp)ν νp " (1kp(ν− k(1ks) s Dz (p) (see Appendix) p 1k  s ("/ν) pmax 1k(1ks) s

ν infinite ( /s)i −(λ/s) λi y` i(λ) e−λk(1ks) e ! s Ez (λ) λ λ 1k(1ks)eλ s z D(λ) 1k(1kλ)(1ks)eλ 1 1k(1ks)eλ λmax klog (1ks) _

In case FSP entries y` i(p) and y` i(λ) refer only to i l 0.

error threshold. Under the multiplicative model the The analogue for the infinite sites model is obtained index of dispersion remains bounded. Under the when one multiplies the latter equation by ν and takes single-peaked model, the index of dispersion increases the limit. The result is around pmax (λmax) dramatically and exhibits a λ λ (s, t) log (1 t). (8) singularity, if ν is infinite. The singularity reflects the max l max lk k sudden loss of localization of the allele distribution. As expected, the limiting threshold is independent of the multiplicative part described by the decay terms i (ii) A more general fitness function (1ks) . To test the assumption that both threshold criteria An obvious generalization is to combine functions FSP can be used equivalently, I compared the analytical and FM. The superposition, called F, sheds some more formula (7) with a numerical evaluation of the ODE light on the dependencies of the error threshold. Let system for various parameter choices, and found that i both criteria, saturation of E and vanishing of the Ši l t(1ks) j(1kt), (0 % s, t % 1). (6) wild-type, yield identical error threshold values (results Letting t l 1 yields the multiplicative and s l 1 the not shown). single-peaked fitness functions. s l 0ortl0 produce In the finite sites case either parameter t or a neutral model. For fitness functions with parameters parameter s predominantly characterizes the error on the boundaries of the s–t unit square, error threshold. This depends on whether 1kt & (1ks)ν or thresholds can be detected using either one of the two ν 1kt ! (1ks) , i.e. the structure of the component Šν, criteria. Assuming this is true also for parameters in the minimum of the fitness function, is the decisive the interior, an analytical threshold formula can be factor. Furthermore, the negative correlation between obtained as follows. At equilibrium, the wild-type error threshold and genome size is weak if the slope of frequency satisfies the fitness landscape around its ‘adaptive peak’ is moderate (s small). k (1kp)ν l  yk Šk l t  yk(1ks) j(1kt) If ν 4_, and in agreement with (8), only parameter k k t plays a role. The error threshold does not depend on l t(1ks)νj(1kt) whether the decay in fitness around the adaptive peak is smooth or abrupt (s small versus s large). The error since the distribution will be concentrated in class ν threshold may not exist at all (t l 1). This observation, once the threshold is surpassed (saturation of E(p)). as shown in the next section, can be generalized to Thus, for the finite sites model one has non-Fisherian fitness functions, i.e. to cases without ν "/ν pmax l pmax(s, t) l 1k(t(1ks) j(1kt)) . (7) an unique adaptive peak.

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 Error thresholds and fitness functions 131

Thus, an upper bound for the error threshold for any (iii) Strictly positiŠe fitness and truncation selection ∆ fitness function of type F δ is given by Since for large ν the quantity y! may not be robust δ enough to detect thresholds numerically, in this section λmax %klog .(16) the population statistics E(λ) and V(λ) are used to ∆ identify possible error thresholds. The following two To calculate the index of dispersion more must be ∆ examples represent possible generic cases. Let F δ be known about the individual fitness assignments. The defined by general expression is # # 0 ! δ % Ši % ∆ !_ (0 % i %_) (9) λjΣk k yk Šk\Š` k(Σk kyk Šk\Š` ) Dz (λ) l . and FνV by λjΣk kyk Šk\Š` 1 arbitrary for i % νV On the other hand, for F any finite λ produces E(λ) Š l 2 (10) νV i 3 !_. No error threshold exists. This can easily be seen 4 l 0 for i " νV . by studying the worst case scenario Ši l 0 for all i 

νV . Then Š` l ŠνV yνV and from (11) follows for k & νV F ∆ contains F as a special case. F is a model of δ SP νV k−νV − λ truncation selection: carrying more than νV mutations yk l e λ .(17) is lethal for an individual. From the equilibrium (kkνV )! condition The latter is independent of the particular choice for k ŠνV . Since yk l 0 for k ! νV , by summing over k one  yi Ši mV ki l yk Š` ,(k&0) (11) derives i=!

follows Ez (λ) l  kyk l λjνV .(18) k k−i k − λ Š` E(λ) l Š`  kyk l   iyi Ši e λ Ez (λ) is therefore limited from above by an affine linear k k i=! (kki)! function and cannot exhibit a singularity for finite λ. k k−i − λ The variance in this case is Vz (λ) l λ. Therefore, the j  (kki) yi Ši e λ .(12) k i=! (kki)! index of dispersion in the worst case scenario has an upper bound of 1, supporting the claim that there is The two series on the right side in (12) are each no error threshold. These results agree with the products of series. The first one simplifies to findings by Wagner & Krall (1993). For a slightly k − λ different model, these authors proved that error e λ  kyk Šk n  l  kyk Šk, k k k! k thresholds may be produced by monotonically de- creasing fitness functions which are bounded from the second one to below by a strictly positive value. As shown above, k monotony is not even necessary; what matters is only −λ λ e  yk Šk n  k l λŠ` , the ratio of lowest and highest fitness values. k k k! Analogous arguments apply to the finite sites case. which together gives ∆ An upper bound to the error threshold for F δ is given by Š` E(λ) l λŠ` j kyk Šk.(13) "/ k δ ν pmax % 1k .(19) ∆ ∆ For F δ , the last series is trivially bounded from below by For truncation selection a new property is encountered: the two threshold criteria may yield  kyk Šk & δE(λ). differing results. This fact is dealt with more k thoroughly in the following section. This yields −λ (iv) Epistasis z λŠ` λδe E(λ) & & −λ .(14) Š` kδ ∆e kδ Error thresholds depend strongly on the amount and The last inequality can be seen as follows. Let kV be (for kind of epistasis. Let the epistatic fitness function FE(α) be defined by a fixed λ) the lowest index such that ykV  0. Then ykV −λ −λ −λ iα ŠkV mV kV kV l ykV Š` . Thus, Š` l e ŠkV & e δ and Š` % e ∆, Ši l (1ks) , (20) independent of kV . Clearly, the last expression shows a with α a positive parameter. One distinguishes singularity for (a) diminishing epistasis (0 ! α ! 1), δ 1 λ lklog .(15) (b) multiplicativity or absence of epistasis (α l ), ∆ (c) synergistic epistasis (α " 1).

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 T. Wiehe 132

FM and FSP are again special cases of the epistatic 1 functions: FSP is an extreme case of diminishing 0·8 epistasis and recovered when α 4 0;FM corresponds to p max the case when α l 1. The extreme form of synergistic 0·6 epistasis is truncation selection (α l _). Unfortu- p nately, for general α, no analytical solution for the 0·4 max stationary frequencies y` i is known. However, ana- lytical threshold formulae can still be derived. The 0·2 two threshold conditions ‘loss of wild-type’ and ‘saturation of E(p)’ are in general not equivalent. The 0 0·25 0·5 0·75 1 1·25 1·5 1·75 2 conditions coincide only if α % 1. There is a simple α heuristic explanation for this. The fraction Ši\Ši−" is # p pmax 1ks[iαk(ik1)α]jO(s ). For diminishing epistasis the Fig. 2. Error thresholds max (see (23) and (25)) and (see (22)) versus epistasis parameter α. A ‘bifurcation’ of expression in brackets tends to 0 as i increases. That thresholds occurs at α l 1 (absence of epistasis). $ means additional mutations cause a smaller decline Parameters are ν l 10 and s l 0n5. of relative fitness than previous mutations. Thus, maximal mutation pressure is needed to remove the −" # wild-type from the population. As soon as this is satisfy (according to (2)) y"(1ks)(1kp)ν ly"(1ks). −" accomplished the stationary distribution is concen- Thus, y" l (1kp)ν . Inserting this into the equation −" trated in the most distant mutation class ν and E(p) for p yields (1kp)ν l (1kp)ν (1ks), or p l s. For has saturated. For synergistic epistasis (α " 1) the fitness functions of type FE(α) the relation α" ! α# situation is reversed. Each additional mutation causes implies y! % y! (as functions of p). Now, by (α") (α#) a larger decline of relative fitness. Thus, to remove continuity, it is concluded that class i from the population requires stronger mutation 0 y (p s) y (p s) y (p s) 0. pressure than is needed for removal of class ik1.In l !(α=") l % !(α) l % !(α=_) l l particular, the wild-type is lost long before E(p) This means y! (p l s) l 0 for all 1 % α %_. saturates. Therefore, one has to distinguish between (α) In agreement with the results before, (22) and (23) pmax (loss of the wild-type) and the higher mutation max coincide for α l 1 (the two threshold criteria are probability p (saturation of E(p)). Below are equivalent for F ). analytical formulae for both. M To treat the case of diminishing epistasis note that To derive pmax let p be sufficiently large such that the two detection criteria are equivalent (since F the second-to-last mutation class is just lost by SP and F both have this property, a similar continuity mutation from a stationary population (any other M argument can be invoked to prove the property for 0 class, but the last, is already lost for smaller p). The ! α ! 1). When the threshold is surpassed then Š` l following three algebraic equations have to be satisfied α (1ks)ν , since the distribution is then concentrated in in this case: mutation class ν. This, together with the condition for the wild-type frequency to satisfy y Š m y Š` , yν−" Šν−" pjyν Šν l yν(yν−" Šν−"jyν Šν), ! ! !! l ! implies yν−" Šν−"(1kp) l yν−"(yν−" Šν−"jyν Šν), α (1kp)ν l (1ks)ν . (24) yν−"jyν l 1. Solving for p leads to The solution is −" να pmax l 1k(1ks) ,(α%1). (25) (ν−")α να ((1ks) k(1ks) )(1kyν−") p l ( −")α ,(21)These results have been validated numerically (results (1ks) ν not shown). Results for the extrapolation to ν l_ are given in Table 2. which, for yν−" l 0, yields Most important to note about these formulae is max α−( −")α p l 1k(1ks)ν ν ,(α&1). (22) that the error threshold is uniquely defined (in the sense used here) and inversely related to sequence On the other hand, the wild-type is already lost if length ν, only if α ! 1. A ‘bifurcation’ occurs for α l 1 and two threshold ‘branches’ split off (see Fig. 2). pmax l s,(α&1). (23) One of them is independent of the sequence length, the Eq. (23) holds for any α & 1 and independently of ν. other is even positively correlated with ν. This is in To justify this, note that it holds for α l 1 (see above). sharp contrast to the negative correlation of ν and p Letting α go to _ one obtains fitnesses Š! l 1, Š" l observed for FSP and which originally stimulated the 1ks and Ši l 0(i0, 1). At equilibrium y! satisfies debate about the information crisis. Furthermore, max (1kp)ν l y!jy"(1ks). Putting y! l 0 the latter note that the error threshold (p ) is an increasing "/ implies p l 1k(y"(1ks)) ν. Furthermore, y" has to function of the parameter (α) of epistasis. The lower

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 Error thresholds and fitness functions 133

Table 2. Comparison of thresholds for different fitness functions

Error threshold

Fitness function ν finite ν infinite

"/ν FSP (single-peaked) 1k(1ks) klog (1ks) FM (multiplicative) s −" _ να FE(α) (epistatic) 0 ! α % 1: 1k(1ks) _ p α " 1: max:s _ α " α (pmax: 1k(1ks)ν −(ν− ) ∆ "/ν F δ (arbitrary, bounded above and below) δ δ % 1k %klog ∆ ∆ a FνV (truncation selection ) pmax: s _ (pmax: 1 " F (superimposed) 1k(t(1ks)νj(1kt)) /ν klog (1kt)

∆ For definition of the fitness functions see the text. Function F δ is not unambiguously defined. In this case, only an upper bound to a threshold can be given. Note that thresholds in the cases of finite ν refer to p (nucleotide mutation probability), whereas in the case of infinite ν they refer to λ (genome mutation rate). a For finite ν truncation at νV l 1, for infinite ν the truncation point is arbitrary.

bound is given by the threshold of the single-peaked speculative. Insight into the shape of fitness functions fitness function (α l 0), the upper bound by the one associated with short sequences has been gained by for truncation selection (α l_); in this latter case one examining cases with a simple relationship between max has p l 1. genotype and phenotype. For instance, for tRNAs the The effect of synergistic epistasis on Muller’s ratchet phenotype may be associated with the molecular has been studied by Kondrashov (1994). He showed secondary structure, and folding properties of the that synergistic epistasis may effectively halt Muller’s primary sequence into its secondary structure are ratchet. This parallels the effect on the threshold. viewed as a principal determinant for the performance Synergism can – for any mutation rate – protect the (fitness) (Fontana & Schuster, 1987; Schuster et al., fittest allele from . 1994). Such models, reflecting sequence–structure Summarizing, the results of this section show that relations (Forst et al., 1995), show a threshold phenomenon, closely related to that observed under a (a) error thresholds do not generally exist (de- single-peaked fitness function. These results, however, pendency on the fitness function), rely mainly on theoretical models and computer (b) error thresholds are ambiguous (dependency on simulations, and remain to be confirmed experi- the definition in terms of a population property mentally. There seems to be little biological reason to or that of an individual allele), assume that a single-peaked fitness function provides (c) error thresholds need not be negatively corre- a generally applicable model underlying the evolution- lated with the size of the genome (epistatic ary dynamics across a wide spectrum of genes and effects). species. Whether multiplicative or epistatic models are closer to reality may surely be questioned as well. The latter have initially been studied with emphasis on 4. Discussion theoretical aspects. The view that synergistic epistasis Error thresholds have originally been described for a must be more common in nature than diminishing finite sites model, termed the quasispecies model epistasis has been prevalent since the 1960s (Kimura (Eigen & Schuster, 1979). They have been interpreted & Maruyama, 1966, and references therein). The as the minimal required replication accuracy to ensure discussion of the evolution of sex has often been that heredity of self-replicating molecules at the early linked to that of the operation of Muller’s ratchet in stages of life does not break down. It has been natural populations. models used in this suggested that the replication process of viruses and context are the multiplicative and epistatic ones (e.g. phages is adequately described within the quasispecies Felsenstein, 1974; Kondrashov, 1988; Charlesworth, concept and that these organisms replicate under 1990). Experimental evidence for the operation of conditions close to an error threshold (e.g. Domingo Muller’s ratchet in populations of various species has & Holland, 1988; Eigen & Biebricher, 1988). However, been compiled (Bell, 1988; Chao, 1990; Duarte et al., even if one or a few types of the viral quasispecies may 1992; Clarke et al., 1993). A recent study by Lynch be distinguished by a fitness advantage due to better (1996) detects Muller’s ratchet and a gradual loss of adaptation to the host environment, a full charac- fitness in mitochondrial tRNAs. For many nuclear terization of the fitness function remains largely eukaryotic genes synergistic epistasis appears to be a

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 T. Wiehe 134

more appropriate model than a single-peaked fitness out that, depending on the kind of epistasis, re- function. For instance, some genetical disorders are combination can alter the error threshold. Based on associated with an excess in the number of trimer simulations, they observe that recombination together repeats in certain genes (e.g. the number of CTG with synergistic or diminishing epistasis increases or repeats involved in penetrance of myotonic dystrophy: decreases permissible mutation rates and genome Harley et al., 1992, 1993). A suitable fitness function, sizes. They suggested that the presence of recombina- drawn over the number of trimer repeats, may be of tion might allow viruses to have larger genomes and the synergistic epistatic or truncation selection type. yet avoid the pitfalls of the information crisis. More generally, truncation selection is believed to Another problem with error thresholds arises from play an important role in the evolutionary dynamics the lack of a generally accepted definition. Two of repetitive sequences in eukaryotes (for a review see approaches have been taken in the past to identify Charlesworth et al., 1994). error thresholds. One is via a property of the entire With the conceptual dichotomy between models of population, the other is via a property of an individual error thresholds and Muller’s ratchet (cf. the title of allele. The above analysis shows that the two Wagner & Krall, 1993) which currently exists, it definitions need not be congruent (i.e. produce the appears natural to concentrate the analysis on two same threshold value). Rather, the presence of epistasis special fitness functions, which are representative of may make any definition in these terms obsolete. the two models: the single-peaked (FSP) and the The dynamics of the haploid model, treated here, is multiplicative function (FM). I investigated several equivalent to that of a diploid model as long as extensions; first, the superposition (function F)of dominance defects are absent (i.e. fitness of the both. The single-peaked, multiplicative and the neutral heterozygotes is intermediate between that of the models arise as special cases of F. Furthermore, homozygotes). Qualitative new features, however, special attention has been paid to truncation selection emerge if dominance plays a role. Its impact on error and a general function with the only restriction that it thresholds has been treated for the diploid analogue is bounded (a special case of which, again, is FSP). In of the single-peaked fitness function (Wiehe et al., agreement with the finding by Wagner & Krall (1993), 1995) and, recently, for a more general fitness model they harbour the minimum requirements to distinguish as well (Baake & Wiehe, 1996). fitness functions which are error threshold free from In a strict sense, the above deterministic analysis is

those which are not. Finally, the set of functions FE(α) valid for an infinitely large population only. Stochastic provides the possibility for a unified treatment of both versions – accounting for random drift – have been models within the concept of epistasis. Two generic studied as well (Nowak & Schuster, 1989; Stephan et cases emerge: synergistic and diminishing epistasis. al., 1993; Wiehe et al., 1995). The qualitative features For the finite sites case, I showed that error thresholds of the deterministic model are recovered. The presence exist in a strict sense (inverse relationship with genome of random drift and – in its wake – random loss of size, uniqueness) only if epistasis is diminishing (α ! rare alleles may only emphasize that it is an imminent 1). Any form of synergistic epistasis implies absence of ratchet mechanism, possibly not an information crisis, a (strict) error threshold. For the infinite sites model, evolution has primarily to cope with. the existence of error thresholds reduces even to a After all, in the light of the different fitness models non-generic special case (α l 0) (see Table 2). discussed above, it appears that the importance of Despite the study by Wagner & Krall (1993) and error thresholds as a limiting factor to molecular their proof that certain fitness functions do not evolution has been greatly overrated. produce error thresholds, the perception that the latter ubiquitously and independently of the shape of Appendix fitness functions set a limit to the evolutionary potential of a species continues to persist (Schuster, For function FSP and the mutation matrix as in (3) hold 1995, pp. 45f). Obviously, there is still a need to raise (1kp)νk(1ks) more caution about such a viewpoint. Even recent y` ! l max 0, , textbook accounts of error thresholds (Maynard  s  Smith & Szathma! ry, 1995) seem to overlook the fact and that the superiority parameter σ (see (1)) need not be −" a constant for general fitness functions (but may itself νp(1kp)ν Ez (p) l −" . depend on the mutation rate), and therefore the error (1kp)ν k(1ks) threshold need not be a general phenomenon.  Furthermore, other forces, recombination for in- P . The first part follows immediately from the stance, have a high impact on shaping the genetic equation for the wild-type material as it is passed on from generation to y! Š!(1kp)ν l y! Š` generation. Boerlijst et al.(1996) studied recombina- and the fact that for the single-peaked fitness function tion in a viral quasi-species and its influence on the error threshold. Nee & Maynard Smith (1990) point Š` l 1ksjsy!.

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 Error thresholds and fitness functions 135

As long as y!  0 one has Š` l (1kp)ν. To calculate Domingo, E. & Holland, J. J. (1988). High error rates, Ez (p), one multiplies the equilibrium equation on both population equilibrium, and evolution of RNA rep- sides by k, then sums over k to obtain lication systems. In RNA Genetics: Variability of RNA Genomes (ed. E. Domingo, J. J. Holland & P. Ahlquist), ν k vol. III, p. 3–36. Boca Raton: CRC Press. νki k−i −k   kyi(1ks) p (1kp)ν Domingo, E., Sabo, D., Taniguchi, T. & Weissmann, C. k=! i=! kki (1978). Nucleotide sequence heterogeneity of an RNA

ν phage population. Cell 13, 735–744. jsy! νp l E(p) n (1kp) . Duarte, E., Clarke, D., Moya, A., Domingo, E. & Holland, J. (1992). Rapid fitness losses in mammalian RNA virus On changing the order of summation and readjusting clones due to Muller’s ratchet. Proceedings of the National summation indices the latter equation becomes Academy of Sciences, USA 89,6015–6019. Eigen, M. (1971). Selforganization of matter and the ν ν−i νki k −i−k evolution of biological macromolecules. Naturwissen- (1ks)  yi  k p (1kp)ν i=! k=!  k  schaften 58, 465–523. Eigen, M. & Biebricher, C. K. (1988). Sequence space and ν ν−i νki k −i−k quasispecies distribution. In RNA Genetics: Variability of j iyi  p (1kp)ν jsy! νp RNA Genomes (ed. E. Domingo, J. J. Holland & k i=! k=!    P. Ahlquist), vol. III, p. 211–245. Boca Raton: CRC ν ν Press. l (1ks)  yi(νki) pj iyi j((1kp)νk(1ks)) νp Eigen, M. & Schuster, P. (1979). The Hypercycle. Berlin: i=! i=!  Springer. ν Felsenstein, J. (1974). The evolutionary advantage of l (1ks)(νpjE(p)n(1kp))j((1kp) k(1ks)) νp recombination. Genetics 78, 737–756. l E(p) n (1kp)ν. Fontana, W. & Schuster, P. (1987). A computer model of evolutionary optimization. Biophysical Chemistry 26, The last equation may be solved for E(p). * 123–147. In a similar manner one can find an analytic Forst, C. V., Reidys, C. & Weber, J. (1995). Evolutionary Dynamics and Optimization. In Lecture Notes in Artificial Vz p Vz p Ez p expression for the variance ( ). The ratio ( )\ ( ), Intelligence. Vol. 929: AdŠances in Artificial Life (ed. F. the index of dispersion, is Moran, A. Moreno, J. Marelo & P. Chacon) Berlin: z Springer. D(p) l Hadeler, K. P. (1981). Stable polymorphisms in a selection " model with mutation. SIAM Journal of Applied Math- (1 s (1 p)ν)((1 p)ν− (1 s)(1 νp)) k k k k k k k ematics 41, 1–7. ν−" ν−" , (1ksk(1kp) )((1kp) k(1ks)(1kp)) Haigh, J. (1978). The accumulation of deleterious genes in a population: Muller’s ratchet. Theoretical Population p ! pmax. Biology 14,251–267. Harley, H. G., Rundle, S. A., Reardon, W., Myring, J., Partial support of this work by a grant from the German et al 1 Academic Exchange Service (DAAD) E. Boake, is gratefully Crow, S., Brook, J. D., Harper, P. S., .(992). Lancet acknowledged. I would like to thank A. Kondrashov, Unstable DNA sequence in myotonic dystrophy. J. Mountain, R. Nielsen and an anonymous reviewer for 339, 1125–1128. constructive and helpful comments. Harley, H. G., Rundle, S. A., MacMillan, J. C., Myring, J., Brook, J. D., Crow, S., Reardon, W., Fenton, I., Shaw, D. J. & Harper, P. S. (1993). Size of the unstable CTG repeat sequence in relation to phenotype and parental References transmission in myotonic dystrophy. American Journal of Human Genetics 52, 1164–1174. Baake, E. & Wiehe, T. (1997). Bifurcations in diploid Higgs, P. G. (1994). Error thresholds and stationary mutant models on sequence space. Journal of Mathematical distributions in multi-locus diploid genetics models. Biology, 35,321–343. Genetical Research (Cambridge) 63, 63–78. Bell, G. (1988). Sex and Death in Protozoa: The History of Kimura, M. (1969). The number of heterozygous nucleotide an Obsession. Cambridge: Cambridge University Press. sites maintained in a finite population due to steady flux Boerlijst, M. C., Bonhoeffer, S. & Nowak, M. A. (1996). of mutations. Genetics 61, 893–903. Viral Quasi-Species and Recombination, Proceedings of Kimura, M. & Maruyama, T. (1966). The mutational load the Royal Society of London, Series B 263, 1577–1584. with epistatic gene interactions in fitness. Genetics 54, Chao, L. (1990). Fitness of RNA virus decreased by 1337–1351. Muller’s ratchet. Nature 348, 454–455. Kondrashov, A. S. (1988). Deleterious mutations and the Charlesworth, B. (1990). Mutation–selection balance and evolution of sexual reproduction. Nature 336, 435–440. the evolutionary advantage of sex and recombination. Kondrashov, A. S. (1994). Muller’s ratchet under epistatic Genetical Research (Cambridge) 55, 199–221. selection. Genetics 136, 1469–1473. Charlesworth, B., Sniegowski, P. & Stephan, W. (1994). The Lynch, M. (1996). Mutation accumulation in transfer evolutionary dynamics of repetitive DNA in eukaryotes. RNAs: molecular evidence for Muller’s ratchet in Nature 371,215–220. mitochondrial genomes. Molecular Biology and EŠolution Clarke, D., Duarte, E., Moya, A., Elena, S., Domingo, E. & 13, 209–220. Holland, J. (1993). Genetic bottlenecks and population Martinez-Salas, E., Ortin, J. & Domingo, E. (1985). passages cause profound fitness differences in RNA Sequence of the viral replicase gene from foot and mouth viruses. Journal of Virology 67, 222–228. disease virus C"-Santa Pau (C-S8). Gene 35, 55–61. Crow, J. F. & Kimura, M. (1970). An Introduction to Maynard Smith, J. (1983). Models of evolution. Proceedings Theory. New York: Harper & Row. of the Royal Society of London, Series B 219,315–325.

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619 T. Wiehe 136

Maynard Smith, J. & Szathma! ry, E. (1995). The Major in RNA secondary structures. Proceedings of the Royal Transitions in EŠolution. Oxford: W. H. Freeman. Society of London, Series B 255, 279–284. Moran, P. A. P. (1976). Global stability of genetic systems Spiegelman, S., Haruna, I., Holland, I. B., Beaudreau, G. & governed by mutation and selection. Mathematical Pro- Mills, D. R. (1965). The synthesis of a self-propagating ceedings of the Cambridge Philosophical Society 80, and infectious nucleic acid with a purified enzyme. 331–336. Proceedings of the National Academy of Sciences, USA 54, Muller, H. J. (1964). The relation of recombination to 919–927. mutational advance. Mutational Research 1, 2–9. Stephan, W., Chao, L. & Smale, J. G. (1993). The advance Nee, S. & Maynard Smith, J. (1990). The evolutionary of Muller’s ratchet in a haploid asexual population: biology of molecular parasites. Parasitology 100 (Suppl.), approximate solutions based on diffusion theory. S5–S18. Genetical Research (Cambridge) 61, 225–231. Nowak, M. & Schuster, P. (1989). Error thresholds of Swetina, J. & Schuster, P. (1982). Self-replication with replication in finite populations: mutation frequencies errors: a model for polynucleotide replication. Biophysical and the onset of Muller’s ratchet. Journal of Theoretical Chemistry 16, 329–345. Biology 137, 375–395. Wagner, G. P. & Krall, P. (1993). What is the difference Ortin, J., Na! jera, R., Lopez, C., Davila, M. & Domingo, E. between models of error thresholds and Muller’s ratchet? (1980). Genetic variability of Hong Kong (H3N2) Journal of Mathematical Biology 32, 33–44. influenza viruses: spontaneous mutations and their Wiehe, T., Baake, E. & Schuster, P. (1995). Error location in the viral genome. Gene 11,319–331. propagation in reproduction of diploid organisms: a case Schuster, P. (1995). Extended molecular evolutionary study on single peaked landscapes. Journal of Theoretical biology: artificial life bridging the gap between chemistry Biology 177, 1–15. and biology. In Artificial Life: An OŠerŠiew (ed. Wright, S. (1949). Adaptation and selection. In Genetics, C. G. Langton). Cambridge, Mass.: MIT Press. Paleontology and EŠolution (ed. G. G. Simpson, Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. G. L. Jepson & E. Mayr), pp. 365–389. Princeton: (1994). From sequences to shapes and back: a case study Princeton University Press.

Downloaded from https://www.cambridge.org/core. IP address: 170.106.33.22, on 02 Oct 2021 at 02:25:45, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672397002619