The contribution of statistical physics to

Harold P. de Vladar (a,*), Nicholas H. Barton (a,b)

(a) I.S.T. Austria, Klosterneuburg A-3400, Austria
(b) Institute of Evolutionary Biology, Edinburgh, United Kingdom

Abstract

Evolutionary biology shares many concepts with statistical physics: both deal with populations, whether of molecules or organisms, and both seek to simplify in very many dimensions. Often, methodologies have undergone parallel and independent development, as with stochastic methods in population genetics. We discuss aspects of population genetics that have embraced methods from physics: amongst others, non-equilibrium statistical mechanics, travelling waves, and Monte-Carlo methods have been used to study polygenic evolution, rates of , and range expansions. These applications indicate that evolutionary biology can further benefit from interactions with other areas of statistical physics, for example, by following the distribution of paths taken by a population through time.

Keywords: Population Genetics, Statistical Thermodynamics, Evolutionary dynamics, Haldane’s Principle, Selection, Drift, Diffusion Equation, Fitness Flux, Entropy, Information, Travelling Waves, Monte Carlo.

Parallel foundations of evolution and statistical physics

In the late 19th century, Boltzmann established the theoretical foundations of statistical mechanics, in which the behaviour of ensembles of particles explains large-scale phenomena [1]. For example, the position and velocity of the particles in a gas can fluctuate between very many states (termed micro-states), but what we observe are averages over all the configurations that give the same macroscopic state (temperature and pressure, say) [2]. A similar averaging over equivalent micro-states is made in both population and quantitative genetics: we average over individual gene combinations to describe a population by its allele frequencies, and we can further average over all the allele frequencies that are consistent with a given mean and variance of a quantitative trait. In this sense, physicists and evolutionary biologists both model populations (a gas or a gene pool) rather than precise types (individual particles or genotypes). This “statistical” description in terms of a few variables, the macro-states, summarizes the many possible configurations of the micro-states (degrees of freedom), which cannot be accurately measured or described. Furthermore, the macro-states are then sufficient to predict other properties without reference to the micro-states. For example, thermodynamics describes macroscopic properties without referring to individual particles; similarly, quantitative genetics does not refer to allele frequencies to predict the trait mean in the next generation.

Hence, evolutionary biology and statistical physics often use similar theoretical methodologies, although studying very different phenomena. We argue that there are close analogies between evolutionary genetics and statistical physics. Physical techniques had an early influence on molecular biology (Appendix A). But more specifically, non-equilibrium methods are based on the same theory of stochastic processes that is used in population genetics. Thus, some physical theories promise further developments that can deepen our understanding of evolution in two ways: either by applying common mathematical techniques (e.g. diffusion equations, see Appendix B), or by developing precise analogies that incorporate new concepts (e.g. ensemble averaging, information, or entropy). These techniques are being applied in different aspects of evolutionary biology. This article focuses mainly on those that we consider most promising for population genetics. We aim to introduce the reader to these methods by reviewing representative examples in the literature.

∗Corresponding author. Email addresses: [email protected] (Harold P. de Vladar), [email protected] (Nicholas H. Barton)
Preprint submitted to Trends in Ecology and Evolution, May 22, 2018. arXiv:1104.2854v1 [q-bio.PE] 14 Apr 2011

The cost of selection, entropy and information

It is extraordinary that the selection of random has created complex organisms that appear exquisitely designed to fit their environment. Selection can be seen as taking information from the environment, and coding it into the DNA sequence [3]: thus, the gene pool contains information about those specific sequences that confer high fitness. This idea can be quantified using the concepts of entropy and information [4]. Entropy is a measure of the number of different states in which a population is likely to be found: thus, selection of one specific genotype, or genotype frequency, corresponds to minimal entropy. The question of how the genotype of an individual, or a population, depends on the selection that it has experienced can be quantified by an entropy that measures how strongly selection has clustered the population around a specific genotype. Haldane [5] showed that the number of selective deaths needed to fix an allele is independent of the selection pressure, and Kimura [6] pointed out that Haldane’s “cost of natural selection” is exactly the information gained by fixing a specific allele. This relation applies very generally to asexual populations [7] but fails with recombination (see below). The theory of quasispecies (a model of mutation-selection balance) emphasizes that the reproductive rate (selection) limits the amount of information that can be maintained in the face of random mutations [8]. However, this constraint can be relaxed with recombination and [9]. Analogies with statistical physics help us to understand how selection accumulates information. We first consider infinite populations (evolving deterministically) and then the more general case where random drift in finite populations drives evolution.

Deterministic evolution and the role of recombination

The information content of a single large population that is evolving deterministically is measured by the entropy, defined as $S = \sum_x p_x \log(p_x)$, where $p_x$ is the frequency of allele or genotype $x$ [5, 8]. This entropy reflects the information accumulated and maintained by evolution, and is closely related to Shannon’s information [3-4]. Remarkably, the replicator dynamics can be obtained by maximizing a different measure, Fisher information: $F = \sum_x p_x \left(\frac{d}{dt}\log(p_x)\right)^2$, which measures divergence between two distributions [10, 11]. Fisher’s information is the “acceleration” of the entropy, i.e. $d^2 S/dt^2 = F$, where we interpret $\frac{d}{dt}\log(p_x)$ as the information contained about selection when we observe how the frequency of $x$ has changed. For example, for a beneficial allele under selection, this would be proportional to the selective value $s$. Normally, we predict the change of the frequency distribution when we know $s$. Fisher’s information takes the parent and the offspring distributions as given, and measures the effect of selection from the difference between these two [11].
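The two measures defined above can be evaluated directly for a small example. The following is a minimal sketch, not taken from the cited works: it assumes continuous-time replicator dynamics with selection only (no mutation or recombination terms), so that $\frac{d}{dt}\log(p_x)$ equals the fitness of type $x$ minus the mean fitness. The function names and the numerical values of `s`, `p` and `w` are hypothetical. Under these assumptions $F$ reduces to the variance in fitness, which for two alleles is $s^2 p q$.

```python
import math

def information_content(p):
    """S = sum_x p_x log(p_x), with the sign convention used in the text."""
    return sum(px * math.log(px) for px in p if px > 0)

def replicator_rates(p, w):
    """d/dt log(p_x) = w_x - mean fitness (assumed: selection only, continuous time)."""
    mean_w = sum(px * wx for px, wx in zip(p, w))
    return [wx - mean_w for wx in w]

def fisher_information(p, w):
    """F = sum_x p_x (d/dt log p_x)^2 evaluated under the replicator dynamics."""
    rates = replicator_rates(p, w)
    return sum(px * r * r for px, r in zip(p, rates))

# Hypothetical example: two alleles, the first with selective advantage s.
s = 0.05
p = [0.2, 0.8]          # allele frequencies (assumed values)
w = [s, 0.0]            # Malthusian fitnesses (assumed values)

F = fisher_information(p, w)
print("log-frequency rates:", replicator_rates(p, w))   # each proportional to s
print("Fisher information :", F)
print("s^2 * p * q        :", s ** 2 * p[0] * p[1])
```

Each rate in the first printed line scales linearly with `s`, which is the sense in which observing the change in frequency carries information about selection.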

Sexual vs. asexual reproduction. It has long been understood that, when combined with truncation selection, sexual reproduction is much more efficient than asexual in fixing beneficial genotypes [12, 13, Ch. 2]. In the former case, the maximum information increases as $n^{1/2}$ ($n$ being the number of loci), while in the latter, it increases by only one unit per generation [3, 14]. The maximum number of loci that can be maintained despite the randomizing effect of mutation is $1/\mu$ for asexuals [14], whilst for sexuals (with free recombination) it can be as high as $1/\mu^2$, where $\mu$ is the mutation rate at each locus [14, 15]. (More loci would produce more mutants, and hence decrease the amount of information.) According to Haldane’s principle [16] every deleterious mutation must be eliminated by a failure to reproduce (a “selective death”). Therefore, the mutation load is independent of selection strength, and is half as great if selection eliminates two copies in a recessive homozygote at the same time. In haploids, redundancy leads to a similar gain in efficiency [15, 17].
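Haldane’s principle, as stated above, can be checked numerically in the simplest case. This sketch is an illustration under assumptions that are not in the original text: a single haploid locus, one-way mutation at rate `mu` from the wild type to a deleterious allele with fitness `1 - s`, and selection acting before mutation in each generation. The function name and parameter values are hypothetical. The load is measured as the reduction in mean fitness, `s * q`, at equilibrium.

```python
def equilibrium_load(mu, s, generations=20000):
    """Iterate a haploid mutation-selection recursion and return the load s*q."""
    q = 0.0  # frequency of the deleterious allele
    for _ in range(generations):
        q_sel = q * (1.0 - s) / (1.0 - s * q)   # selection against the mutant
        q = q_sel + mu * (1.0 - q_sel)          # one-way mutation to the mutant
    return s * q                                # reduction in mean fitness

mu = 1e-4  # assumed mutation rate
for s in (0.01, 0.1, 0.5):
    print(f"s = {s:4.2f}  load = {equilibrium_load(mu, s):.6e}")
```

In this model the equilibrium frequency is $q = \mu/s$, so the load $s q$ equals $\mu$ whatever the strength of selection: weaker selection simply lets the deleterious allele reach a higher frequency.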

Stochastic evolution: the diffusion of allele frequencies

The diffusion approximation shows how the distribution of allele frequencies at many loci changes through time in finite populations. In this case, selection, mutation and migration are modelled as deterministic factors, and drift introduces random fluctuations to populations within an ensemble (Appendix B). (Other treatments are possible, where mutations are regarded as carrying random changes to individuals within a single population.) In other fields, constant diffusion coefficients have been widely used, leading to simple Gaussian solutions (Appendix B). However, Gaussian solutions are not appropriate for population genetics, because allele frequencies range between zero and one, and sometimes cluster near fixation, in a bimodal distribution. After its introduction by Fisher [18], Kolmogorov applied the more general diffusion method to the neutral island model [19], which Wright [20] had already solved by different means. Kimura relied on the diffusion approximation to model the evolution of finite populations [21], and for his neutral theory of molecular evolution [22].

The diffusion approximation, central to both population genetics and statistical physics, provides a way to model many factors in a mathematically tractable way. Crucially, it approximates a wide variety of more detailed models. Mathematically, it is equivalent to the coalescent process that describes the evolution of samples from a population, and to path ensemble methods that describe the distribution of population histories (see below). In physics, diffusion equations describe non-equilibrium processes and are hard to relate to quantities like temperature, entropy, or free energy, which are well-defined only in thermodynamic equilibrium through the Boltzmann distribution. Wright showed that selection, mutation and drift give an explicit distribution, proportional to $\bar{W}^{2N}$, where $\bar{W}$ is the mean fitness of a population of size $N$ [20]. This is closely analogous to the Boltzmann distribution $\sim e^{-E/kT}$ [1, 2], with $\log(\bar{W})$ corresponding to (negative) energy, $-E$, and $1/2N$ to the temperature, $kT$ (Appendix B). This result was the basis for Wright’s metaphor of an adaptive landscape: a surface of mean fitness laid over the multidimensional space of allele frequencies [23] (Appendix B and Fig. 1).

Jumps between adaptive peaks. When the stationary distribution is clustered around alternative peaks in the adaptive landscape, the rate at which random drift causes shifts between these states is approximated by a general formula that is proportional to the probability of being at the saddle point (adaptive valley) that separates them, and to the leading eigenvalue that describes the instability at that point [24, 25]. Wright [25] worked out transition rates for chromosome rearrangements, ideas rigorously formulated later using diffusions [26]. Rouhani and Barton [24, 27] found the rate of peak shifts in a spatially structured population, borrowing from an identical analysis of transitions between alternative vacuum states.
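To make the analogy with the Boltzmann distribution concrete, the stationary density can be tabulated for a single locus with two adaptive peaks. This is a minimal sketch under assumptions that go beyond the text: a diploid locus with heterozygote fitness `1 - s` (so $\bar{W} = 1 - 2spq$), symmetric mutation at rate `mu`, and the standard neutral base density $(pq)^{4N\mu - 1}$ multiplying $\bar{W}^{2N}$. All parameter values and function names are hypothetical.

```python
import math

def wright_log_density(p, N, mu, s):
    """Unnormalized log density: (4N*mu - 1) log(pq) + 2N log(mean fitness)."""
    q = 1.0 - p
    mean_w = 1.0 - 2.0 * s * p * q          # heterozygotes have fitness 1 - s
    return (4 * N * mu - 1) * math.log(p * q) + 2 * N * math.log(mean_w)

def wright_distribution(N, mu, s, n_grid=999):
    """Return a grid of allele frequencies and the normalized stationary density."""
    dp = 1.0 / (n_grid + 1)
    grid = [(i + 1) * dp for i in range(n_grid)]
    logs = [wright_log_density(p, N, mu, s) for p in grid]
    top = max(logs)                          # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    z = sum(weights) * dp                    # normalizing constant Z
    return grid, [w / z for w in weights]

grid, density = wright_distribution(N=100, mu=0.01, s=0.1)
peak = max(density)
saddle = density[len(density) // 2]          # p = 0.5, the adaptive valley
print("density at the saddle relative to the peaks:", saddle / peak)
```

The term `2 * N * math.log(mean_w)` plays the role of $-E/kT$: increasing `N` lowers the effective temperature and makes the population less likely to be found in the valley, which is the quantity that enters the rate of peak shifts.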

Traveling waves. The distribution of a quantitative trait, or of fitness itself, can be seen as a traveling wave that moves at a steady rate as the population adapts, either in actual or in phenotypic space. Most analyses have been of asexuals, which increase their fitness by accumulation of favourable mutations, or decline under Muller’s ratchet [28, 29, 30]. Beneficial mutations increase in frequency independently at the wave front, where frequencies are low and subject to drift, but the rest of the wave follows deterministically [31]. The wave thus moves at a velocity that is proportional to the mutation rate and depends logarithmically on the population size because of strong random drift at the leading edge [28]. This approach has been extended to low rates of recombination [32, 33]. With sexual reproduction, random drift has much less effect, and the population adapts much more quickly [30, 34]. However, when there is a very high rate of substitution and recombination, Hill-Robertson interference limits adaptation rate [35].

Spatial evolution and range expansions. Fisher introduced a simple non-linear diffusion equation describing the spread of a beneficial mutation through space [36]. Though motivated by an evolutionary problem, this model raised interest among physicists and mathematicians, establishing a sub-discipline that studies the Fisher-KPP model (named for Kolmogorov, Piskounov and Petrovski, co-discoverers of the model [37]). Travelling waves explain the decreased genetic diversity that arises from hitchhiking at the leading edge [38, 39]. This approach also provides a practical way to measure selection coefficients [40], and perhaps, a means to distinguish fixation due to selective sweeps from simple drift [41, 42].
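The wave of advance described above can be reproduced with a few lines of code. This is a rough sketch and not the method of any cited work: it assumes the usual form of Fisher’s equation, $\partial u/\partial t = D\,\partial^2 u/\partial x^2 + s\,u(1-u)$, integrates it with an explicit finite-difference scheme, and estimates the speed of the front from the point where the allele frequency crosses 0.5. The grid spacing, time step and parameter values are hypothetical choices. For a sharp initial front the speed is known to approach $2\sqrt{Ds}$, slowly and from below.

```python
import math

def front_position(u, dx):
    """Locate the point where u falls through 0.5, by linear interpolation."""
    for i in range(1, len(u)):
        if u[i] < 0.5 <= u[i - 1]:
            frac = (u[i - 1] - 0.5) / (u[i - 1] - u[i])
            return (i - 1 + frac) * dx
    raise ValueError("front not found inside the domain")

def fisher_kpp_front_speed(D=1.0, s=1.0, dx=0.5, dt=0.05, n_cells=400, t_end=60.0):
    """Explicit Euler integration of u_t = D u_xx + s u (1 - u); returns front speed."""
    u = [1.0 if i * dx < 10.0 else 0.0 for i in range(n_cells)]  # allele fixed on the left
    n_steps = int(round(t_end / dt))
    half = n_steps // 2
    x_half = None
    for step in range(n_steps):
        new = u[:]
        for i in range(n_cells):
            left = u[i - 1] if i > 0 else u[i]              # reflecting boundaries
            right = u[i + 1] if i < n_cells - 1 else u[i]
            diffusion = D * (left - 2.0 * u[i] + right) / dx ** 2
            new[i] = u[i] + dt * (diffusion + s * u[i] * (1.0 - u[i]))
        u = new
        if step + 1 == half:
            x_half = front_position(u, dx)                  # position at t_end / 2
    return (front_position(u, dx) - x_half) / (t_end - half * dt)

speed = fisher_kpp_front_speed()
print("measured front speed :", speed)
print("2 * sqrt(D * s)      :", 2.0 * math.sqrt(1.0 * 1.0))
```

The measured value depends on the discretization and on how long the wave has been running, so it should be read as an approximation to the asymptotic speed rather than a precise estimate.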

Statistical mechanics and the quantitative genetics of finite populations

Although the diffusion equation provides an exact description of evolution, the joint distribution at many loci is hard to grasp. Statistical mechanics simplifies the problem by following just a few variables that summarize all the allele frequencies (or in physics, the particles’ states). These map the fitness landscapes for allele frequency onto a simpler one for quantitative traits [43], which are analogous to macroscopic quantities in statistical physics (Appendix B, Fig. 2) [44].

Maximization of entropy. This reduction in dimensionality requires a way to account for the degrees of freedom lost in averaging over the underlying genetic states. This can be achieved by applying the principle of entropy maximization: we assume that the unknown micro-states follow a distribution that maximizes their entropy, $S$, given the values of macroscopic quantities [45]. Entropy can be defined in several ways. The definition appropriate here is analogous to the above, but extends to the case when $p$ (the vector of allele frequencies at each locus) is the random variable: $S = -\int \psi \log[\psi/\varphi]\,dp$. This defines the dispersion of the distribution of allele frequencies, $\psi$, relative to a base distribution, $\varphi$: it is maximized when the distribution is selectively neutral ($\psi = \varphi$) and decreases as the distribution becomes more tightly clustered around states that are a priori improbable [46, 47, 48]. If $S$ is maximized whilst constraining the expectations of some macroscopic variables, $\langle A_i \rangle = \int A_i \psi\,dp$, we obtain a distribution of allele frequencies $\psi = Z^{-1} \varphi \exp[2N \sum_i \alpha_i A_i]$ [46], where $Z$ normalizes the distribution and $N$ is the population size. Remarkably, this distribution corresponds exactly to the stationary solution of the diffusion equation (Appendix B), when the $A$’s are chosen according to the particular mode of selection (quantitative traits, genetic variance, etc.) and heterozygosity, and are conjugated with the $\alpha$’s, which are the selection coefficients, mutation rates, etc. [46, 47, 48].

This analogy between statistical mechanics and evolution of a finite population has yielded several results, of which we will mention a few. The dynamics of polygenic evolution can be approximated by a quasi-equilibrium assumption, that is, that the transient distribution of allele frequencies behaves as if the entropy is maximized at all times, given the current values of macroscopic variables. In this way, the change through time of quantitative characters, including their genetic variance, can be computed for populations affected by mutation, selection and drift, for an arbitrary number of loci [46, 49]. In physics, macroscopic systems often change far more slowly than the microscopic fluctuations, justifying this approximation. In biology, we do not have such a stark separation. Nevertheless, the approximation is remarkably accurate even when the environment changes abruptly [46, 47, 49]; traveling waves may provide an explanation [31].

Adaptive landscapes and detailed balance. Wright’s formula for the stationary distribution [20] requires detailed balance [50]. Population geneticists have shown that detailed balance is generally violated when there are more than two alleles at a locus [20], when recombination or migration are comparable with the strength of selection, or under frequency-dependent selection [51]. Without detailed balance, the dynamics cannot be represented by an adaptive landscape, and can be mathematically intractable (though see [52]). Phylogenetic analysis reveals deviations from detailed balance, for example when genomic GC content changes over time [53]. So, we need methods for analyzing populations that are in a stationary state that violates detailed balance, or that are not at a statistical equilibrium at all.

Path ensembles. An alternative method that holds without detailed balance is the path ensemble [24, 54]. Instead of describing the distribution of allele frequencies at any single time, we follow the distribution of paths of allele frequencies between two states at different time-points (Fig. 3). The probability of any path can be written down in a simple form, and the chance of a transition from one state to another obtained (in principle) by integrating over paths (Appendix B). The trajectories are weighted with respect to an optimal one, through three terms: Fisher’s information, the variance in fitness, and the fitness flux, $\phi$ (Appendix C) [54]. The latter measures the net amount of adaptation given a population’s history. It is defined as $\phi = s\,\frac{dp}{dt}$, where $s$ is the selective coefficient of the beneficial allele; $\phi$ is the increase in mean fitness that is expected from changes in allele frequency, but without allowing for changes in selection. The fitness flux is distinct from the change in mean fitness, which in general is not well-defined when selection changes through time. Fitness flux includes changes in allele frequencies due to all evolutionary processes, and to the extent that these interfere with selection, can be negative. In considering the history of a population, the path ensemble methods give an understanding of the adaptation and evolution of complex traits that accounts for historical contingencies, an advantage over models that only consider a population’s state at a given time.

Evolutionary biology and Monte-Carlo methods

Monte Carlo methods are now widely used in statistical inference. When many variables are involved it is not feasible to explore the whole space of possible states (e.g. all possible phylogenetic trees amongst multiple species). A group working on nuclear weapon development at Los Alamos introduced a simple but widely used algorithm [55]. One simply makes a random change to the microscopic variables, accepting it if it increases some measure, L (for example, mean fitness). Changes that decrease L to L∗ are accepted with probability L∗/L. This ensures that the microscopic variables follow a distribution proportional to L multiplied by the stationary distribution that the random changes alone would produce (Fig. 1). This Metropolis algorithm has been developed in a statistical context [56], and applied to generate likelihood surfaces for statistical inference [57, 58, 59]. Intriguingly, this algorithm uses a simple form of selection to generate a distribution equal to the product of a neutral base distribution and the measure L, just as selection and random drift lead to Wright’s distribution under the diffusion approximation (see Appendix B). Both rely on detailed balance, but a path ensemble approach allows extension to more general cases [24].
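The acceptance rule described above is short enough to write out in full. The following is a minimal sketch under assumptions chosen purely for illustration: the states are five positions on a ring, the random change is a step to either neighbour with equal probability (so that the random changes alone give a uniform distribution), and the measure `L` takes hypothetical values. The function name is also an assumption.

```python
import random

def metropolis_chain(L, n_steps, seed=0):
    """Run the Metropolis rule on a ring of states; return visit frequencies."""
    rng = random.Random(seed)
    n = len(L)
    state = 0
    counts = [0] * n
    for _ in range(n_steps):
        proposal = (state + rng.choice((-1, 1))) % n   # symmetric random change
        # Always accept if L increases; otherwise accept with probability L*/L.
        if rng.random() < L[proposal] / L[state]:
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]

L = [1.0, 2.0, 3.0, 4.0, 5.0]                         # hypothetical measure
freqs = metropolis_chain(L, n_steps=200000, seed=1)
target = [value / sum(L) for value in L]
for state, (f, t) in enumerate(zip(freqs, target)):
    print(f"state {state}: visited {f:.3f}, L / sum(L) = {t:.3f}")
```

Because the proposal is symmetric, the base distribution is uniform and the visit frequencies approach `L / sum(L)`; with a non-uniform base distribution the same rule would yield the product of that base distribution and `L`.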

Obstacles to overcome

Toy models and method-oriented analyses. Over the last decade, physicists have shown strong interest in evolution. For example, in the last five years, over 2000 publications on evolution appeared in physics journals (chiefly Physical Review journals, Physica A, and PNAS). Unfortunately, most of these works pay little attention to the fundamental biology, because the motivation is often the specific methods rather than the biological questions. Consequently, many of these contributions remain unconnected to the rest of evolutionary theory; for the most part, there is very little communication between the disciplines. Two examples follow.

In the Bak-Sneppen model [60], populations evolve by removing the least fit individual together with two unrelated neighbours, and replacing them by three new individuals with random fitness. A “critical value” is reached, but with repeated periods where the fitness distribution spreads, and then re-organizes to the critical value. The Bak-Sneppen model attempted to explain the distributions of extinction episodes [61], and patterns of experimental evolution [62], but had little impact in biology because it lacks any mechanistic basis. Notably, only 13 of 700 citations of the Bak-Sneppen model [60] were by non-physicists. Second, in the Penna bit-string model of ageing [63], the position in the genome of an allele dictates the age at which its detrimental effect is expressed. A threshold for the total number of such deleterious mutations is set arbitrarily, and the population evolves under mutation and competition. Senescence arises because selection is less effective in late life, a phenomenon already well understood from Hamilton’s general analysis [64]. Here, out of roughly 230 citations to ref. [63], only 5 did not include physicists. These two approaches, and others like them, are not taken seriously since they rest on “toy models” that are not connected with biological reality.

Two problems that restrict communication between disciplines. First, the language and nomenclature employed by physicists are often not consistent with basic concepts in genetics: they employ terms such as energy, spin glass, magnetization, Ising chain, etc. where they should use mean fitness, polygenes, directional selection, or polygenic trait [65, 66, 47, 67]. Standard population genetics notation is largely ignored, making even the most basic equations appear unfamiliar. To take a central example, the diffusion equation includes deterministic and stochastic “forces”. In evolution, the stochastic part models genetic drift. However, the term “drift” is used in physics to refer to the deterministic part! Different nomenclatures make it difficult for physicists to address important biological questions, and for biologists to understand the questions posed by physicists. This is amplified when new ideas are introduced. For example, in an explanation of the advantages of sex, the idea of mixability was introduced [68]: i.e. sex favours alleles that are fit across different genetic backgrounds. A recently proposed measure of “mixability” [68] is identical to Fisher’s analysis of variance, which was devised precisely as a measure of epistasis [69]. Take another example: a statistical mechanics approach was used to find the distributions of contributions made by individual ancestors to future generations. This defined the statistical “weight” of each individual’s contribution in a lineage [70], which, in biological terms, is just the reproductive value of an individual; again, a concept introduced by Fisher [71].

Second, known results are often rediscovered due to the lack of a common language. For example, the original result that free fitness increases in evolution was illustrated with several examples from population and quantitative genetics, and was interpreted in terms of selection and drift [48]. Yet, the same principle was twice rediscovered by physicists decades later but with more restricted scope [50]. Another example is the NK model, in which the fitness landscape can be “tuned” by altering the degree of epistasis for fitness; it was used to show that recombination is an evolvable trait [72]. Yet, the theoretical analysis of the evolution of sex and recombination has been a thriving field since the 1970’s [73]. No doubt population geneticists have re-derived results well known in physics (e.g. Wright’s calculation of rates of shift between adaptive peaks), but these are not usually published as new physics, and are typically studied for their biological implications.

Nevertheless, physicists have also had a serious commitment to subjects meaningful to evolution. Significant works include those discussed in this article, clonal interference in asexuals [29, 32, 33, 38], an application of percolation theory to [74], extending Haldane’s principle to a multilocus trait with partial dominance, epistasis and sexual reproduction [65, 66], and ecological explanations of replicator dynamics [75, 76]. All these are aimed directly at a biological audience, published in appropriate journals. Generally, physicists often have a sharp intuition about their models, which greatly helps in finding solutions.

Statistical physics is based on universal physical laws. In contrast, biological concepts are relative, plastic, or even arbitrary (e.g. mean fitness, traits). Hence the analogies with statistical-mechanical models are limited, depending on the nature of epistasis, physical linkage of the genes, unpredictable fluctuating selection, etc. Moreover, there are different ways in which precise analogies can be drawn, limiting their scope: some factors act deterministically (e.g. selection) and others stochastically (mutations or drift).

Conclusions

Many of the fundamental processes of both population genetics and statistical physics are described by diffusion. In evolution, it provides a common framework for features such as the change in allele frequencies [20, 77], genealogies [78], and spatial dispersal [36]. All these, and others, can benefit from methods of non-equilibrium statistical mechanics, which is a major and active field in physics.

The concept of a path ensemble is especially useful, shifting the paradigm from tracking frequencies at each point in time, to considering selection over the whole history of the alleles [78]. This can be applied to both deterministic [79] and stochastic evolution [24]. In turn, long-standing questions about the efficiency of natural selection in building complex phenotypes [3, 5, 6], and evolution under fluctuating selection, can be re-addressed.

Of course, we can ask whether the mathematical paraphernalia that we advocate is of any practical use. Although we should not take mathematical models too literally, they are useful both for generating hypotheses about evolution, and for making sense of ecological and genetic data. Most notably, the neutral theory provides the conceptual framework for analyses of sequence data [22], and quantitative genetics predicts the effects of selection on complex traits [80]. Ideas from statistical mechanics may help by providing new ways to describe the evolution of complex traits, and by suggesting constraints on the efficacy of selection. A clearer understanding of concepts such as fitness flux and entropy suggests new ways to think about the evolution of quantitative traits. To understand adaptation, we need to contemplate not only the current state of populations, but also their history. This is of course an old idea, but the rationale that we review suggests new ways to understand the process of adaptation in a historical and quantitative way.

Acknowledgments. We would like to thank J.P. Bollback, R. Cipriani, J. Hermisson, J. Polechova, and D. Weissman for their comments and observations. This research was funded by the ERC-2009-AdG Grant for project 250152 SELECTIONINFORMATION.

Glossary

Boltzmann distribution A probability measure of the microscopic states of a physical system that is composed of classical (i.e. not quantum) particles in thermodynamic equilibrium. This distribution has a density proportional to the factor $\exp(-E/kT)$, where $E$ is the energy of a state, $k$ is Boltzmann’s constant, and $T$ is the absolute temperature.

Detailed balance An equilibrium where the probability flux of the transitions between any two states is equal in either direction. In population genetics this implies that the numbers of adaptive and deleterious substitutions have to be equal on average.

Entropy A measure of the number of possible configurations of a system. The classical measure of entropy is due to Boltzmann: $S = k \log \Omega$, where $\Omega$ is the number (or density) of microscopic states (e.g. allele frequencies) that a system can realize for a given macroscopic state (mean fitness, a quantitative variable, etc.) and $k$ is Boltzmann’s constant. Relative entropy is defined as $S = -\int \psi \log(\psi/\varphi)\,dp$, where the $p$ are the microscopic states, and the integral goes over all possible realizations; $\psi$ is the distribution of micro-states, and $\varphi$ is a base or reference distribution (satisfying ϕ = 2NVδp). However, when $\varphi = \text{const.}$ we have Shannon’s entropy, which is the form used in statistical physics. Entropy is also equivalent to the log-likelihood of $\varphi$ (the proposed distribution), and $\psi$ is the sampling probability of the actual distribution.

Fisher’s information A measure of how much an infinitesimal change in an unknown parameter $\theta$ affects the likelihood $\psi$ of an observed data set, $p$. Fisher’s information is defined as $F = \int \psi(p; \theta) \left(\frac{\partial}{\partial\theta} \log[\psi(p; \theta)]\right)^2 dp$. When the parameter $\theta$ is time, Fisher’s information describes the amount of information gained through selection.

Fitness flux A measure of adaptation defined as $\phi(t) = s(p, t)\,dp/dt$, where $s$ is the selection coefficient (fitness gradient) and $p$ is the allelic frequency. Geometrically, it is the strength of fitness change (since $s$ is the gradient of fitness, $W$), along the direction of evolution (given by $dp/dt$). The cumulative fitness flux, $\Phi = \int \phi\,dt$, is a measure of the total amount of adaptation through the history of a population.

Free fitness The expected gain in log-mean fitness after selection; named by analogy with the free energy of a physical system, that is, the amount of work that can be done in a thermodynamic system. Free fitness ($I$) emerges naturally when computing the gain in entropy $S$ after an allele or a trait has undergone selection [48], and has an expression equivalent to that of free energy, i.e. $I = \langle \log(\bar{W}) \rangle - S/2N$ (in physics $\langle \log(\bar{W}) \rangle$ should be replaced by $\langle E \rangle$, and $2N$ by $1/kT$; see entry for Boltzmann distribution).

Hill-Robertson interference Interference in the selective sweep of an allele, due to the selective effects at other linked loci. Hill-Robertson interference implies that in the presence of recombination, genotypes with multiple mutations arise more easily by recombining existing single mutations than by multiple mutation events.

Path ensemble A formalism of non-equilibrium statistical mechanics and quantum mechanics where the description of the system emphasizes not the states of a population of entities, but rather the distribution of possible stochastic paths that such a population can follow.

Quasispecies Population of replicators (typically asexual) with a high genotypic variability maintained by elevated mutation rates.

Replicator dynamics Dynamical equations that describe the change in time of the frequency $p$ of the different types (in particular genotypes). They have the general form $dp/dt = p\,\Delta W + T$, where $\Delta W$ is the difference between the fitness of the type and the mean fitness, and $T$ represents the “transmission” terms, which may involve mutation, migration, recombination, etc.

Selective death Failure to survive or reproduce due to differences in genotype.

Stationary distribution A probability distribution that does not change in time. This is found from the diffusion equation by setting $\partial\psi/\partial t = 0$, and solving the resulting differential equation that is independent of time. A stationary solution might not exist (e.g. if selection is changing in time in particular ways), and if it exists, it might require detailed balance.

Statistical mechanics A mathematical framework explaining the macroscopic properties of a system in terms of the dynamics of the microscopic variables. At equilibrium, it leads to the classical concepts of entropy, free energy, and temperature, for example. Out of equilibrium, these quantities cannot be defined formally, and current research focuses on finding probabilistic measures that apply in general, but are still based on the microscopic dynamics. Based principally on the properties of stochastic processes (e.g. the diffusion equations, or path ensembles), these measures can be applied to the distribution of allele frequencies (e.g. Fisher’s information and fitness flux).

Traveling waves Solutions to non-linear differential equations characterized by functions that are of stable shape, and move at a certain velocity either in physical space, or in genetic space. (Traveling waves are also known as solitons in the physics and mathematics literature.)

Truncation selection Scheme where individuals that have traits outside a prescribed range are eliminated. This type of selection is popular in artificial selection.
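Truncation selection is simple enough to sketch directly. In the snippet below, the trait values, the threshold and the function name are hypothetical. The difference between the mean of the survivors and the mean of the whole sample is what quantitative geneticists call the selection differential.

```python
def truncation_select(traits, lower=None, upper=None):
    """Keep individuals whose trait lies within [lower, upper]; eliminate the rest."""
    return [z for z in traits
            if (lower is None or z >= lower) and (upper is None or z <= upper)]


# Hypothetical trait measurements and a lower truncation point of 12.0.
traits = [9.1, 10.4, 11.8, 12.5, 13.0, 14.2]
selected = truncation_select(traits, lower=12.0)

mean_before = sum(traits) / len(traits)
mean_after = sum(selected) / len(selected)
selection_differential = mean_after - mean_before
print(selected, selection_differential)
```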

Appendix A. Evolution and the material basis of heredity

Early in the last century, Fisher embarked on the mathematical formalization of the Mendelian principles of heredity, following the earlier development of biometrics by Galton, Pearson and Weldon, all with the aim of quantifying evolution by natural selection. Fisher used the diffusion approximation (see Appendix B) to describe the evolution of allele frequencies [18]. In 1929, he introduced the Fundamental Theorem of Natural Selection [71]; comparing it to the second law of thermodynamics, the increase of entropy, he intended the theorem to be an exact result, a “biological law” [81, 82]. Although his comparison with the second law is flawed, it shows how the quantitative approach to heredity was influenced by statistical thermodynamics (indeed, Fisher had studied with J. H. Jeans, a physicist). That mechanistic basis of evolution, population genetics, was formulated without knowledge of the physical nature of the Mendelian genes, which was still unknown in the 1930s: the structure of DNA was not established until 1953. In the following decades, Delbrück, formerly an astrophysicist, started a collaboration employing ionizing radiation on Drosophila to understand the physical nature of the genes, as a working system to try to identify fundamental physical laws that would account for living and non-living matter [83]. Later, the ingenious Luria-Delbrück experiment proved the basis of Darwinian evolution. Luria and Delbrück performed a statistical comparison between the number of bacteria developing resistance to bacterial viruses and its expected distribution, which was derived from a mathematical analysis, an unusual quantitative approach for the biologists of the time [84]. Soon after, the quantum theorist Schrödinger published What is life? [85], posing fundamental biological questions in physicists’ language, partly based on Delbrück’s discoveries. This book gave strong motivation to the first molecular biologists (among them Perutz, Wilkins, Crick and Watson) to find how DNA transmitted the heritable information to future generations [86]. Molecular biology was influenced in large part by the use of physical techniques such as X-ray crystallography to determine biological structures. Evolution, however, while resting on that material basis of DNA, is not explained by it. Indeed, the population genetic framework that we use today was developed prior to the discovery of the structure of DNA, and was not changed by the establishment of molecular biology, since it rests only on Mendel’s laws. However, the theoretical methods that are common to statistical physics and to evolutionary biology give a deeper understanding of the evolutionary consequences of heredity.

Appendix B. The Diffusion Equation

The diffusion equation originated in Bachelier’s models of fluctuations in share prices in 1900, and was rediscovered 73 years later as the Black-Scholes formula, disastrously popular amongst economists. Diffusion theory in economics is equivalent to the theory of Brownian motion, devised by Einstein to explain random molecular collisions, and soon after extended in physical applications [87]. Fisher [18] compared Mendelian genetics to “the theory of gases” and introduced the diffusion methods for the allele frequencies. Kolmogorov [88] gave a more formal approach to selection and drift. Kimura [89] later extended this formalism to non-equilibrium cases. For population genetics, the diffusion equation is a rather convenient representation of the evolution of finite populations where genetic drift is present. We could choose to model the change in allele frequencies directly, in what is known as a Wright-Fisher process. But under genetic drift allele frequencies evolve stochastically, making the outcomes of evolution unpredictable. The diffusion equation gauges these outcomes in a probabilistic way, describing the distribution of allele frequencies at each time. (A third way to describe an evolving population is to use the whole history as a random variable instead of the allele frequencies at a given time, in what is known as a path ensemble; see Appendix C.) In short, the diffusion equation is a partial differential equation describing the change in time of the probability density ψ of the allele frequencies p, namely

$$\frac{\partial \psi}{\partial t} = -\frac{\partial}{\partial p}\left(M_{\delta p}\,\psi\right) + \frac{1}{2}\frac{\partial^2}{\partial p^2}\left(V_{\delta p}\,\psi\right),$$

where $M_{\delta p}$ are the deterministic factors, due to selection, mutation, migration, etc., and $V_{\delta p}$ is the variance of the fluctuations by drift, usually of the form $p(1-p)/2N$. Making the left-hand side equal to zero leads to the stationary solution derived by Wright by other means [16]:

$$\psi = C\,\bar{W}^{2N}\,[p(1-p)]^{4N\mu-1}.$$
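One way to check the stationary solution above is to confirm numerically that the probability flux J = Mψ − ½ ∂(Vψ)/∂p vanishes everywhere, which is the detailed-balance condition. The sketch below rests on several assumptions that are not stated in this appendix: mean fitness W̄ = exp(sp), a deterministic term M = ½ p(1−p) d ln W̄/dp + µ(1−2p) (Wright’s formula plus symmetric mutation), and V = p(1−p)/2N. Conventions for s differ by a factor of two between formulations, so the factor ½ should be read as one possible choice. All parameter values are hypothetical.

```python
import math

N, mu, s = 100, 0.01, 0.02   # hypothetical population size, mutation rate, selection


def mean_fitness(p):
    return math.exp(s * p)                 # assumed form of mean fitness


def psi(p):
    """Unnormalised stationary density W^(2N) * [p(1-p)]^(4*N*mu - 1)."""
    return mean_fitness(p) ** (2 * N) * (p * (1 - p)) ** (4 * N * mu - 1)


def M(p):
    # Assumed deterministic change: selection (Wright's formula, d ln W / dp = s)
    # plus symmetric mutation at rate mu.
    return 0.5 * p * (1 - p) * s + mu * (1 - 2 * p)


def V(p):
    return p * (1 - p) / (2 * N)           # variance of the fluctuations by drift


def flux(p, h=1e-5):
    """Probability flux J = M*psi - (1/2) d(V*psi)/dp, using central differences."""
    d_vpsi = (V(p + h) * psi(p + h) - V(p - h) * psi(p - h)) / (2 * h)
    return M(p) * psi(p) - 0.5 * d_vpsi


grid = [i / 100 for i in range(5, 96)]
scale = max(abs(M(p)) * psi(p) for p in grid)
max_rel_flux = max(abs(flux(p)) for p in grid) / scale
print(max_rel_flux)   # close to zero if psi is stationary for these M and V
```

The normalisation constant C drops out of the check, since the flux is linear in ψ.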

For details of the derivation see [89]. In particular, the term $\bar{W}$ defines the “fitness landscape”, which can be thought of as a surface in the space of allele frequencies (Fig. 1), or in the quantitative variables (Fig. 2). The diffusion equation, the coalescent process, and the path ensemble all describe the same process and are mathematically equivalent. Each has different advantages and limitations: whereas a stochastic differential equation, the diffusion equation and the path ensemble do not require detailed balance, the stationary distribution above does. Yet this solution is exact, quite general and relatively simple.
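The direct Wright-Fisher description mentioned in this appendix can be sketched in a few lines. The simulation below covers only the neutral case with symmetric mutation and 2N gene copies. It draws replicate populations and compares the variance of the allele frequency with 1/(4(8Nµ+1)), the variance of the Beta(4Nµ, 4Nµ) density implied by the neutral stationary solution ∝ [p(1−p)]^{4Nµ−1}. The parameter values, seed and function names are assumptions chosen for illustration.

```python
import random
import statistics


def wright_fisher_generation(p, n_copies, mu, rng):
    """One generation: symmetric mutation, then binomial sampling of n_copies genes."""
    p_mut = p * (1 - mu) + (1 - p) * mu
    count = sum(rng.random() < p_mut for _ in range(n_copies))
    return count / n_copies


def simulate_final_frequencies(N=20, mu=0.025, generations=100, replicates=300, seed=1):
    """Allele frequency of each replicate population after `generations` generations."""
    rng = random.Random(seed)
    finals = []
    for _ in range(replicates):
        p = 0.5
        for _ in range(generations):
            p = wright_fisher_generation(p, 2 * N, mu, rng)
        finals.append(p)
    return finals


N, mu = 20, 0.025                                # hypothetical values, 4*N*mu = 2
freqs = simulate_final_frequencies(N=N, mu=mu)
simulated_var = statistics.pvariance(freqs)
diffusion_var = 1 / (4 * (8 * N * mu + 1))       # variance of Beta(4*N*mu, 4*N*mu)
print(simulated_var, diffusion_var)
```

Each run yields one unpredictable trajectory. Only the distribution over replicates is reproducible, and that distribution is the object the diffusion equation describes.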

Appendix C. Path ensembles and fitness flux

A path ensemble considers all the possible histories of a population between two fixed states $p_0$ and $p_T$ at times 0 and T. In this description, each history is the variable being described. The probability of a particular trajectory ρ(t) is proportional to the factor

$$\exp\left[-N \int \left(\frac{dp}{dt} - M_{\delta p}\right)^2 \frac{dt}{p(1-p)}\right],$$

where the allele frequencies p are evaluated at each point of the history ρ(t). Here, $M_{\delta p}$ is the same factor as in the diffusion equation, which suggests the connection between the two methods. The path integral can be understood as a sum if the history is sampled at discrete times, $\rho = \{\rho_0, \rho_1, \ldots, \rho_T\}$. Notice that because the integral is non-negative, if it achieves a minimum for a given history, then that history has the highest probability. To understand the meaning of the integral we may expand the squared binomial inside the integral into three terms, $F - 2\phi + \nu$, and consider the case of selection, $M_{\delta p} = p(1-p)s$: F is Fisher’s information, $\phi = \frac{M_{\delta p}}{p(1-p)}\frac{dp}{dt} = s\,\frac{dp}{dt}$ is the fitness flux, and ν is the additive genetic variance in fitness. Thus the histories occur as a compromise between minimizing Fisher’s information and genetic variance (both regarded as measures of the speed of adaptation) and maximizing the fitness flux. Fitness flux is a measure of adaptation of beneficial alleles [54]; the cumulative flux $\Phi = \int \phi\, dt$ of a population history is the equivalent measure to the fitness of a population (if we think of successive substitutions, it is the total of all the selection coefficients associated with each substitution). The expectation of cumulative fitness flux is necessarily at least as large as the reduction in entropy between the initial and final equilibrium states (which can be understood as the information gained by the population) [49]: $2N\langle\Phi\rangle \geq -\Delta S$.
That is, it takes a certain amount of selection (measured precisely by the fitness flux) to move the allele frequency distribution away from its neutral state (as measured by the decrease in entropy). This result is quite generally valid, and is not restricted to, say, constant selection. Moreover, if selection changes slowly so that the distribution stays close to the stationary state, then $2N\langle\Phi\rangle = -\Delta S$; such changes are termed “reversible”. For example, assume that the allele frequencies initially follow a neutral distribution (mutation-drift balance). Suddenly, directional selection is applied so that $\bar{W} = \exp(sp)$, and loci move toward a new distribution under selection and drift. The fitness flux is then substantially greater than the decrease in entropy (Fig. 3). If, on the other hand, selection were increased very slowly, eventually to reach the same strength, the net fitness flux would necessarily be much smaller, and equal to the decrease in entropy (lower curve in Fig. 3). The fitness flux method is surprisingly general. However, its relation with quantities that might actually constrain the extent of selection remains to be clarified. In particular, the additive variance in fitness is proportional to $s^2 p(1-p)$; we see that the additive genetic variance in fitness is just twice the fitness flux, when that includes only the change in allele frequency due to selection, $\Delta_s p = sp(1-p)$. Further understanding could emerge from relating the decrease in entropy due to selection to the additive genetic variance in fitness [90]. Last, it is relevant that fitness flux presents an extension of Fisher’s Fundamental Theorem of Natural Selection [71]: it considers not only the change due to selection, but also the effects of drift, and unlike Fisher’s theorem, the fitness flux theorem holds also for weak selection ($Ns \sim 1$) [54].
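The path weight and the cumulative fitness flux can be illustrated with a discretised history. The sketch below assumes directional selection with M = s p(1−p) and evaluates N Σ (Δp/Δt − M)² Δt / (p(1−p)) for two histories sharing the same endpoints: the deterministic logistic trajectory and a straight line. It also computes Φ = Σ s Δp, which for constant s depends only on the endpoints. The parameter values, the midpoint discretisation and the function names are illustrative assumptions.

```python
import math

N, s, T, steps = 100, 0.1, 50.0, 5000   # hypothetical parameters
dt = T / steps


def logistic_path(p0):
    """Deterministic solution of dp/dt = s p (1 - p), sampled at discrete times."""
    return [1 / (1 + (1 - p0) / p0 * math.exp(-s * k * dt)) for k in range(steps + 1)]


def straight_path(p0, pT):
    """Linear interpolation between the same two endpoints."""
    return [p0 + (pT - p0) * k / steps for k in range(steps + 1)]


def action(path):
    """Discretised N * integral of (dp/dt - M)^2 / (p(1-p)) dt, with M = s p (1-p)."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        p = 0.5 * (a + b)                  # midpoint allele frequency
        velocity = (b - a) / dt            # finite-difference dp/dt
        m = s * p * (1 - p)                # deterministic term under selection
        total += (velocity - m) ** 2 / (p * (1 - p)) * dt
    return N * total


def cumulative_flux(path):
    """Phi = integral of s (dp/dt) dt, which telescopes to s * (p_T - p_0)."""
    return sum(s * (b - a) for a, b in zip(path, path[1:]))


det = logistic_path(0.1)
line = straight_path(det[0], det[-1])
print(action(det), action(line), cumulative_flux(det))
```

Under these assumptions the deterministic history has an action close to zero, so it carries the largest weight exp(−action), while the straight line is penalised. Both histories accumulate the same fitness flux because their endpoints coincide.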

References

[1] L. Boltzmann, Vorlesungen über Gastheorie, J.A. Barth, 1896.
[2] H. B. Callen, Thermodynamics and an introduction to thermostatistics, John Wiley & Sons, 1985.
[3] J. Maynard-Smith, The concept of information in biology, Phil. Sci. 67 (2000) 177–194.
[4] C. E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948) 379–423.
[5] J. B. S. Haldane, The cost of natural selection, J. Genet. 55 (1957) 511–524.
[6] M. Kimura, Natural selection as the process of accumulating genetic information in adaptive evolution, Genet. Res. 2 (1) (1961) 127–140.
[7] R. P. Worden, A speed limit for evolution, J. Theor. Biol. 176 (1995) 137–152.
[8] M. Eigen, P. Schuster, The hypercycle: A principle of natural self-organization. Part A: Emergence of the hypercycle, Naturwissenschaften 64 (11) (1977) 541–565.
[9] M. Kimura, T. Maruyama, Mutational load with epistatic gene interactions in fitness, Genetics 54 (6) (1966) 1337–&.
[10] B. Frieden, A. Plastino, B. Soffer, Population genetics from an information perspective, J. Theor. Biol. 208 (1) (2001) 49–64.
[11] S. Frank, Natural selection maximizes Fisher information, J. Evol. Biol. 22 (2) (2009) 231.
[12] J. Crow, M. Kimura, The theory of genetic loads, Proc. 11th Intern. Congr. Genet. 3 (1964) 495–506.
[13] W. J. Ewens, Mathematical Population Genetics, Springer-Verlag, Berlin, Germany, 1979.
[14] D. J. C. MacKay, Information theory, inference, and learning algorithms, Cambridge University Press, 2003.
[15] J. Peck, D. Waxman, Is life impossible? Information, sex and the origin of complex organism, Evolution 64 (11) (2010) 3300–3309.
[16] J. B. S. Haldane, The effect of variation in fitness, Am. Nat. 72 (1937) 337–349.
[17] C. J. C. H. Watkins, The channel capacity of evolution: ultimate limits on the amount of information maintainable in the genome, Proc. 3rd Intern. Conf. Bioinf. Genome Reg. Struct. 2 (2002) 58–60.
[18] R. A. Fisher, On the dominance ratio, Proc. Roy. Soc. Edinb. 42 (1922) 321–341.
[19] A. N. Kolmogorov, Deviations from Hardy’s formula in partial isolation, C.R. Acad. Sci. U.R.S.S. 8 (1935) 129–132.
[20] S. Wright, Evolution in Mendelian populations, Genetics 16 (1931) 97–159.
[21] M. Kimura, Stochastic processes and distribution of gene frequencies under natural selection, Cold Spring Harb. Symp. Quant. Biol. 20 (1955) 33–53.
[22] M. Kimura, The neutral theory of molecular evolution, Cambridge Univ. Press, Cambridge, UK, 1985.
[23] S. Wright, Surfaces of selective value revisited, Am. Nat. 131 (1) (1988) 115–123.
[24] N. H. Barton, S. Rouhani, The frequency of shifts between alternative equilibria, J. Theor. Biol. 125 (4) (1987) 397–418.
[25] S. Wright, On the probability of fixation of reciprocal translocations, Am. Nat. 75 (1941) 513–522.
[26] R. Lande, The fixation of chromosomal rearrangements in a subdivided population with local extinction and colonization, Heredity 54 (1985) 323–332.
[27] S. Rouhani, N. H. Barton, Speciation and the shifting balance in a continuous population, Theor. Popul. Biol. 31 (1987) 465–492.
[28] C. O. Wilke, The speed of adaptation in large asexual populations, Genetics 167 (2004) 2045–2053.
[29] I. M. Rouzine, É. Brunet, C. Wilke, The traveling-wave approach to asexual evolution: Muller’s ratchet and speed of adaptation, Theor. Popul. Biol. 73 (2007) 24–46.
[30] R. Bürger, Evolution of genetic variability and the advantage of sex and recombination in changing environments, Genetics 153 (1999) 1055–1069.
[31] O. Hallatschek, The noisy edge of traveling waves, P. Natl. Acad. Sci. USA 108 (5) (2011) 1783–1787.
[32] I. M. Rouzine, J. M. Coffin, Multi-site adaptation in the presence of infrequent recombination, Theor. Popul. Biol. 77 (467-481).
[33] R. A. Neher, B. I. Shraiman, D. S. Fisher, Rate of adaptation in large sexual populations, Genetics 181 (2010) 467–481.
[34] J. R. Peck, D. Waxman, Sex and adaptation in a changing environment, Genetics 153 (1999) 1041–1053.
[35] N. H. Barton, Why sex and recombination?, Cold Spring Harb. Symp. Quant. Biol. 74 (2009) 158–170.
[36] R. Fisher, The wave of advance of advantageous genes, Ann. Eugen. 7 (1937) 355–369.
[37] A. N. Kolmogorov, I. Petrovskii, N. Piscounov, A study of the diffusion equation with increase in the amount of substance, and its application to a biological problem, in: V. M. Tikhomirov (Ed.), Selected Works of A. N. Kolmogorov I, Kluwer, 1991, pp. 248–270.
[38] O. Hallatschek, D. R. Nelson, Gene surfing in expanding populations, Theor. Popul. Biol. 73 (2008) 158–170.
[39] P. Ralph, G. Coop, Parallel adaptation: one or many waves of advance of an advantageous allele?, Genetics 186 (2010) 647–668.
[40] G. J. Bauer, J. S. McCaskill, H. Otten, Traveling waves of in vitro evolving RNA, P. Natl. Acad. Sci. USA 20 (1989) 7937–7941.
[41] K. S. Korolev, M. Avlund, O. Hallatschek, D. R. Nelson, Genetic demixing and evolution in linear stepping stone models, Rev. Mod. Phys. 82 (2010) 1691–1718.
[42] O. Hallatschek, D. R. Nelson, Life at the front of an expanding population, Evolution 64 (2010) 193–206.
[43] R. Lande, Natural selection and random genetic drift in phenotypic evolution, Evolution 30 (1976) 314–334.
[44] M. Rattray, J. Shapiro, Cumulant dynamics of a population under multiplicative selection, mutation, and drift, Theor. Popul. Biol. 60 (2001) 17–32.
[45] A. Prügel-Bennett, Modelling evolving populations, J. Theor. Biol. 185 (1) (1997) 81–95.
[46] N. H. Barton, H. P. de Vladar, Statistical mechanics and the evolution of polygenic quantitative traits, Genetics 181 (3) (2009) 997–1011.
[47] N. H. Barton, J. Coe, On the application of statistical physics to evolutionary biology, J. Theor. Biol. 259 (2009) 317–324.
[48] Y. Iwasa, Free fitness that always increases in evolution, J. Theor. Biol. 135 (1988) 265–281.
[49] H. P. de Vladar, N. H. Barton, The statistical mechanics of a polygenic character under stabilizing selection, mutation and drift, J. Roy. Soc. Interface 8 (58) (2011) 720–739.
[50] G. Sella, A. Hirsh, The application of statistical physics to evolutionary biology, P. Natl. Acad. Sci. USA 102 (27) (2005) 9541–9546.
[51] C. Taylor, Y. Iwasa, M. A. Nowak, A symmetry of fixation times in evolutionary dynamics, J. Theor. Biol. 243 (2) (2006) 245–251.
[52] P. Ao, Emerging of stochastic dynamical equalities and steady state thermodynamics from Darwinian dynamics, Commun. Theor. Phys. 49 (5) (2008) 1073–1090.
[53] N. Galtier, G. Piganeau, D. Mouchiroud, L. Duret, GC-content evolution in mammalian genomes: The biased gene conversion hypothesis, Genetics 159 (2) (2001) 907–911.
[54] V. Mustonen, M. Lässig, Fitness flux and ubiquity of adaptive evolution, P. Natl. Acad. Sci. USA 107 (9) (2010) 4248–4253.
[55] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21 (6) (1953) 1087.
[56] W. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57 (1) (1970) 97.
[57] J. Szymura, N. H. Barton, Genetic analysis of a hybrid zone between the fire-bellied toads, Bombina bombina and Bombina variegata, near Cracow in southern Poland, Evolution 40 (1986) 1141–1159.
[58] C. J. Geyer, E. A. Thompson, Constrained Monte Carlo maximum likelihood for dependent data (with discussion), J. Roy. Statist. Soc. B 54 (1992) 657–699.
[59] M. A. Beaumont, Approximate Bayesian computation in evolution and ecology, Ann. Rev. Ecol. Evol. Syst. 41 (2010) 379–406.
[60] P. Bak, K. Sneppen, Punctuated equilibrium and criticality in a simple model of evolution, Phys. Rev. Lett. 71 (1993) 4083–4086.
[61] K. Sneppen, P. Bak, H. Flyvbjerg, M. Jensen, Evolution as a self-organized critical phenomenon, P. Natl. Acad. Sci. USA 92 (11) (1995) 5209–5213.
[62] S. F. Elena, R. Sanjuán, RNA viruses as complex adaptive systems, Biosystems 81 (2005) 31–41.
[63] T. J. P. Penna, A bit-string model for biological aging, J. Stat. Phys. 78 (1995) 1629–1633.
[64] W. Hamilton, Moulding of senescence by natural selection, J. Theor. Biol. 12 (1966) 12–45.
[65] E. Baake, M. Baake, H. Wagner, Ising quantum chain is equivalent to a model of biological evolution, Phys. Rev. Lett. 78 (1997) 559–562.
[66] E. Baake, H. Wagner, Mutation-selection models solved exactly with methods of statistical mechanics, Genet. Res. 78 (1) (2001) 93–117.
[67] J. Hermisson, O. Redner, H. Wagner, E. Baake, Mutation-selection balance: ancestry, load and maximum principle, Theor. Popul. Biol. 62 (2002) 9–46.
[68] A. Livnat, C. Papadimitriou, N. Pippenger, M. W. Feldman, Sex, mixability, and modularity, P. Natl. Acad. Sci. USA 107 (4) (2010) 1452–1457.
[69] R. A. Fisher, The correlation between relatives on the supposition of Mendelian inheritance, Trans. Roy. Soc. Edinb. 52 (1918) 399–433.
[70] B. Derrida, S. Manrubia, D. Zanette, Distribution of repetitions of ancestors in genealogical trees, Physica A 281 (2000) 1–16.
[71] R. A. Fisher, The genetical theory of natural selection, 1st Edition, Clarendon, Oxford, UK, 1930.
[72] S. A. Kauffman, The origins of order, Oxford University Press, 1993.
[73] J. Maynard-Smith, The evolution of sex, Cambridge University Press, 1978.
[74] S. Gavrilets, Evolution and speciation on holey adaptive landscapes, Trends Ecol. Evol. 12 (8) (1997) 307–312.
[75] L. Demetrius, Statistical mechanics and population biology, J. Stat. Phys. 30 (1983) 709–753.
[76] L. Demetrius, M. Ziehe, Darwinian fitness, Theor. Popul. Biol. 72 (3) (2007) 323–345.
[77] S. Wright, The distribution of gene frequencies in populations, P. Natl. Acad. Sci. USA 23 (1937) 307–320.
[78] N. H. Barton, A. Etheridge, The effect of selection on genealogies, Genetics 166 (2) (2004) 1115.
[79] S. Leibler, E. Kussell, Individual histories and selection in heterogeneous populations, P. Natl. Acad. Sci. USA 107 (29) (2010) 13183–13188.
[80] M. Lynch, B. Walsh, Genetics and Analysis of Quantitative Traits, Sinauer Associates, Sunderland, USA, 1998.
[81] G. R. Price, Fisher’s fundamental theorem made clear, Ann. Hum. Genet. 36 (1972) 129–140.
[82] W. J. Ewens, An interpretation and proof of the fundamental theorem of natural selection, Theor. Popul. Biol. 36 (1989) 167–180.
[83] N. Timoféeff-Ressovsky, K. Zimmer, M. Delbrück, Über die Natur der Genmutation und der Genstruktur, Nachr. Ges. Wiss. Göttingen 1 (1935) 189–245.
[84] S. Luria, M. Delbrück, Mutations of bacteria from virus sensitivity to virus resistance, Genetics 28 (6) (1943) 491–511.
[85] E. Schrödinger, What is life?, Cambridge University Press, 1944.
[86] H. F. Judson, The eighth day of creation: makers of a revolution in biology, Plainview, 1998.
[87] M. Davis, A. Etheridge, Louis Bachelier’s Theory of Speculation: the origins of modern finance, Princeton Univ. Press, 2006.
[88] A. N. Kolmogorov, On the analytical methods in probability calculations, Math. Ann. 104 (1931) 415–458.
[89] M. Kimura, Diffusion models in population genetics, J. Appl. Probab. 1 (2) (1964) 177–232.
[90] N. Barton, L. Partridge, Limits to natural selection, Bioessays 22 (12) (2000) 1075–1084.

[Figure 1 plot. Vertical axis: ψ(p); horizontal axis: allele frequency p, from 0 to 1. Curves: Distribution of Allele Frequencies, Fitness Landscape, Neutral Base.]

Figure 1: In the solution to the diffusion equation, the effects of fitness (blue) combine with neutral factors (green) to give the distribution of allele frequencies (red). The Metropolis-Hastings algorithm has an analogous structure: the acceptance weights (blue) and the random fluctuations (green) combine to give the distribution that is being estimated (red).
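The analogy in the caption can be made concrete with an independence Metropolis-Hastings sampler. Proposals are drawn from the neutral base [p(1−p)]^{4Nµ−1}, which is a Beta(4Nµ, 4Nµ) density, and the acceptance weights are W̄^{2N}. The sketch assumes W̄ = exp(sp). The parameter values, the seed and the function name are illustrative and not taken from the text.

```python
import math
import random


def sample_stationary(N=50, mu=0.01, s=0.02, n_samples=20000, seed=0):
    """Independence Metropolis-Hastings targeting W^(2N) * [p(1-p)]^(4*N*mu - 1)."""
    rng = random.Random(seed)
    a = 4 * N * mu                          # Beta shape parameter of the neutral base

    def log_weight(p):
        return 2 * N * s * p                # log of W^(2N), assuming W = exp(s p)

    p = rng.betavariate(a, a)
    samples = []
    for _ in range(n_samples):
        proposal = rng.betavariate(a, a)    # random fluctuation from the neutral base
        # The proposal density cancels the neutral factor of the target, so only
        # the fitness weights remain in the acceptance ratio.
        accept_prob = math.exp(min(0.0, log_weight(proposal) - log_weight(p)))
        if rng.random() < accept_prob:
            p = proposal
        samples.append(p)
    return samples


samples = sample_stationary()
mean_p = sum(samples) / len(samples)
print(mean_p)   # shifted above the neutral mean of 0.5 by the fitness weights
```

Other proposal schemes, such as a random walk in p, would also work. The independence sampler is chosen here only because it separates the two ingredients named in the caption.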

[Figure 2 plot. Allele-frequency space with genotype labels 0001, 0010, 0011, 0111 and 1011; fitness surface with axes trait mean z, genetic variance ν, and log[W].]

Figure 2: Mapping the genetic fitness landscape to a quantitative-trait fitness landscape. Left: different combinations of allele frequencies lie in a hyperspace (shown only for a projection of 4 loci), where the axes represent the frequency of each allele. In this plot each point represents a population. The dense cloud of points towards the centre is an optimal peak, set at 0011. The other clouds are at sub-optimal adaptive peaks one mutation away from the optimum. However, each genotype determines a trait, and the population is mapped to a space of trait means, z, and genetic variance, ν. Thus, mean fitness, trait mean, and genetic variance, although related by the allele frequencies, generate a fitness landscape in quantitative variables (yellow surface, the height indicating log-mean fitness). The number of variables (degrees of freedom) is collapsed from a hyperspace of an arbitrary number of allele frequencies, one at each locus, to two quantitative variables: trait mean and genetic variance.
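A small sketch can illustrate this collapse of degrees of freedom. It assumes a purely additive diploid trait in which locus i contributes ±γ_i per allele, so that z = Σ γ_i (2p_i − 1) and ν = 2 Σ γ_i² p_i (1 − p_i). These formulas, the effect sizes and the allele frequencies are assumptions for illustration and are not given in the caption.

```python
def trait_mean_and_variance(freqs, effects):
    """Map allele frequencies to (trait mean z, genetic variance nu), additive model."""
    z = sum(g * (2 * p - 1) for p, g in zip(freqs, effects))
    nu = sum(2 * g * g * p * (1 - p) for p, g in zip(freqs, effects))
    return z, nu


effects = [1.0, 1.0, 1.0, 1.0]      # hypothetical equal effects at four loci
pop_a = [0.9, 0.1, 0.8, 0.2]        # two different points in allele-frequency space
pop_b = [0.1, 0.9, 0.2, 0.8]

z_a, nu_a = trait_mean_and_variance(pop_a, effects)
z_b, nu_b = trait_mean_and_variance(pop_b, effects)
print((z_a, nu_a), (z_b, nu_b))
```

Under these assumptions the two populations differ at every locus yet map to the same pair (z, ν), which is the sense in which many micro-states share one macro-state.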

[Figure 3 plot. Top panel: allele frequency p (0 to 1) against time. Lower panel: 2NΦ and −ΔS against Time (0 to 1).]

Figure 3: The top panel shows the distribution of allele frequencies through time (shown as contour levels). Initially, populations follow the neutral distribution (left axis; Nµ = 0.7). Directional selection Ns = 2.5 is then applied, and populations settle to a new distribution (right axis). Any actual realization (red curve) fluctuates stochastically around an optimal one (illustrated with the green curve). The white dashed line is the deterministic solution, shown as a reference. The lower panel shows the fitness flux (upper curve, green) and the decrease in entropy (lower curve, red). When selection changes abruptly, as here, fitness flux is substantially greater than the decrease in entropy. However, if selection were to change slowly, the two would be equal throughout.
