Exact Solution of the Eigen Model with General Fitness Functions and Degradation Rates
David B. Saakian*† and Chin-Kun Hu*‡

*Institute of Physics, Academia Sinica, Nankang, Taipei 11529, Taiwan; and †Yerevan Physics Institute, Alikhanian Brothers Street 2, Yerevan 375036, Armenia

Edited by H. Eugene Stanley, Boston University, Boston, MA, and approved January 18, 2006 (received for review June 13, 2005)

BIOPHYSICS

We present an exact solution of Eigen's quasispecies model with a general degradation rate and fitness functions, including a square root decrease of fitness with increasing Hamming distance from the wild type. The found behavior of the model with a degradation rate is analogous to a viral quasispecies under attack by the immune system of the host. Our exact solutions also revise the known results of neutral networks in quasispecies theory. To explain the existence of mutants with large Hamming distances from the wild type, we propose three different modifications of the Eigen model: mutation landscape, multiple adjacent mutations, and frequency-dependent fitness in which the steady-state solution shows a multicenter behavior.

quasispecies | virus evolution | error threshold

Conflict of interest statement: No conflicts declared.
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: AS, antiselective; FM, ferromagnetic; HD, Hamming distance; PM, paramagnetic.
‡To whom correspondence should be sent at the * address. E-mail: [email protected]. edu.tw.
© 2006 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0504924103
PNAS | March 28, 2006 | vol. 103 | no. 13 | 4935–4939

Molecular models of biological evolution have attracted much attention in recent decades (1–15). Among them, Eigen's concept of quasispecies plays a fundamental role (1, 2). It describes the evolution of a population consisting of a wild type accompanied by a large number of mutant types in sequence space by a large system of ordinary differential equations. The Eigen model has been found to describe quite well the evolution of viral populations (3) and has deeply changed our view of the process of evolution: adaptation does not wait for better adapted mutants to arise but starts with the selection of the better adapted mutants and then explores by mutation the surrounding sequence space for even better mutants. When the mutation rate surpasses an error threshold, the population gets genetically unstable, and it could be shown that indeed virus populations can be driven to extinction when the error rate is artificially raised beyond the error threshold.

To describe the population precisely, we should know the fitness value of each type and the mutation rates to go from one type to another. The experimental efforts to do so are immense. During the last three decades, the model has been investigated numerically as well as analytically for a simple fitness function. Although this sort of data reduction does allow a view on a large population, the fitness functions chosen are too simplistic to explain realistic cases such as a population of RNA virus. In this work, we solve the system of differential equations exactly, assuming uniform degradation rates and fitness functions including a square root decrease of fitness with increasing Hamming distance (HD) from the wild type. Our exact solutions also revise the known theoretical results of neutral networks in quasispecies theory (2). To explain biological systems more realistically (16), we propose three different modifications of the Eigen model: mutation landscape, multiple adjacent mutations, and frequency-dependent fitness in which the steady-state solution shows a multicenter behavior.

Model

Several excellent reviews (2–5) emphasize the merits of the quasispecies model for the interpretation of virological studies. Let us give a brief description of the quasispecies model as we use it in this work: a sequence type of length $N$ is specified by a sequence of $N$ spin values $s_k = \pm 1$, $1 \le k \le N$ (1, 2). In reality the spins can take four values corresponding to the natural nucleotide types, but a two-value spin model already catches the essential features and can be studied more easily. Two-value spin models also have been used to study long-range correlations in DNA sequences (17) and DNA unzipping (18, 19), and valuable results have been obtained. A generalization of our results for the four-value spin case is presented in the Supporting Text, which is published as supporting information on the PNAS website. However, such results include more cumbersome formulas, and from now on we will only consider the two-value spin model. Let $s_j = +1$ represent purines (R) and $s_j = -1$ pyrimidines (Y). Type $i$ is then specified by $S_i \equiv (s_1^{(i)}, s_2^{(i)}, \ldots, s_N^{(i)})$.

The model describes replicating molecules under control of variation and natural selection with Eq. 1, to be defined below. The rate coefficients of replication and mutation are assumed to be independent of the concentration of the types. The model describes the exponential growth phase of virus evolution, in which there are enough nutrients and low virus concentration. The multistep cross-catalytic reactions are replaced by an autocatalytic one. Here the evolution picture is rather simple, compared with the linear growth phase in the case of strong saturation effects.

Selection is on a genotype level: fitness is a function of $S_i$. The variation is assumed to be produced only by point mutations. Eigen made a deterministic approach with kinetic rate equations that requires an infinite population, whereas classical population genetics uses probabilistic equations.

We denote the probability for the appearance of $S_i$ at time $t$ by $p_i \equiv p_{S_i}(t)$ and define fitness $r_i$ of $S_i$ as the average number of offspring produced per unit time and degradation rate $D_i$ of $S_i$ as an inverse mean longevity. The chosen $r_i$ and $D_i$ are functions in genome sequence space $S_i$, i.e., $r_i = f(S_i)$ and $D_i = D(S_i)$.

The mutation matrix element $Q_{ij}$ is the probability that an offspring produced by state $j$ changes to state $i$, and the evolution is given by the set of equations for $2^N$ probabilities $p_i$ (2, 6)

$$\frac{dp_i}{dt} = \sum_{j=1}^{2^N} \left[ Q_{ij} r_j - \delta_{ij} D_j \right] p_j - p_i \left( \sum_{j=1}^{2^N} (r_j - D_j)\, p_j \right). \tag{1}$$

Here $p_i$ satisfies $\sum_{i=1}^{2^N} p_i = 1$ and $Q_{ij} = q^{N-d(i,j)} (1-q)^{d(i,j)}$ with $q$ being the mean nucleotide incorporation fidelity, and $d(i,j) \equiv \left(N - \sum_{l=1}^{N} s_l^{(i)} s_l^{(j)}\right)/2$ being the HD between $S_i$ and $S_j$ (1, 2); $d(i,j)$ represents the total number of different spin values in $S_i$ and $S_j$.

In ref. 1 the concept of an error threshold is introduced, and the error threshold has been quantified by a formula. For the calculation of the error threshold, the selection values and mutation rates of all types would be required, which is still not feasible. For data reduction, several fitness functions (landscapes) have been considered (2). The simplest one has a single peak (2) in a flat landscape. Without loss of generality we set the peak to $S_1 \equiv (1, 1, \ldots, 1)$, i.e., the state with all spins up, and have

$$f(S_1) = A > 1, \quad \text{and} \quad f(S_i) = 1 \ \text{for} \ S_i \ne S_1, \tag{2}$$

for which Eigen's error threshold formula (equation II-45 in ref. 1) gives an exact result

$$A e^{-\gamma} > 1, \tag{3}$$

for successful selection (p. 180 in ref. 2). The parameter $\gamma = N(1-q)$ describes the mutation efficiency. When the inequality is satisfied, a mutant distribution is built around the peak configuration in the steady state. Otherwise the distribution is flat in the infinite genome length limit, i.e., no sequence is preferred [error catastrophe (1, 2)].

We consider general fitness function $f(S_i)$ and degradation $D(S_i)$, which depend on the HD (the number of nucleotide differences between two sequences) from the peak configuration $S_1$. In statistical physics this case corresponds to mean-field-like interaction, which is exactly solvable (20). We can write $f(S_i)$ and $D(S_i)$ as (6)

… obtained. Eigen introduced $A e^{-\gamma}$ as a "selective value" of the peak (equation II-31 in ref. 1). In case of several isolated peaks the system chooses the one with the maximal selective value.

Case 2: Single Peak Fitness Function with Degradation.

Consider nonzero degradation $d_0(k) = c + \alpha k$, where $\alpha > 0$ is the degradation parameter, and the same fitness function as in case 1. Physically, $d_0(k)$ should be always positive for any value of $k$ and we take $c > \alpha$. Because of symmetry of Eigen's equations under the transformation $D_i \to D_i + C$ with $C$ being a constant, we can simply choose $d_0(k) = \alpha k$, which will be used in the following discussion. Now we have the PM phase with $k = 0$ and $\ln Z/\beta = 1 \equiv W_{PM}$, the FM phase with $k = 1$ and

$$\frac{\ln Z}{\beta} = A e^{-\gamma} - \alpha \equiv W_{FM}(k). \tag{7}$$

Besides these phases, we also have the antiselective (AS) phase, which can be located by finding the maximum of

$$\frac{\ln Z}{\beta} = \left[ e^{-\gamma \left(1 - \sqrt{1-k^2}\right)} - \alpha k \right] \equiv W_{AS}(k), \tag{8}$$

in the interval $-1 \le k \le 1$. To find the maximum we should take the value of Eq.