arXiv:1912.03395v2 [q-bio.PE] 19 Feb 2020
Information-geometric optimization with natural selection

Jakub Otwinowski and Colin H. LaMont
Max Planck Institute for Dynamics and Self-Organization

Evolutionary algorithms, inspired by natural evolution, aim to optimize difficult objective functions without computing derivatives. Here we detail the relationship between population genetics and evolutionary optimization and formulate a new evolutionary algorithm. Optimization of a continuous objective function is analogous to searching for high fitness phenotypes on a fitness landscape. We summarize how natural selection moves a population along the non-euclidean gradient that is induced by the population on the fitness landscape (the natural gradient). Under normal approximations common in quantitative genetics, we show how selection is related to Newton’s method in optimization. We find that intermediate selection is most informative of the fitness landscape. We describe the generation of new phenotypes and introduce an operator that recombines the whole population to generate variants that preserve normal statistics. Finally, we introduce a proof-of-principle algorithm that combines natural selection, our recombination operator, and an adaptive method to increase selection. Our algorithm is similar to covariance matrix adaptation and natural evolutionary strategies in optimization, and has similar performance. The algorithm is extremely simple in implementation with no matrix inversion or factorization, does not require storing a covariance matrix, and may form the basis of more general model-based optimization algorithms with natural gradient updates.

INTRODUCTION

Finding the optimal parameters of a high dimensional function is a common problem in many fields. We seek protein conformations with minimal free energy in biophysics, the genotypes with maximal fitness in evolution, the parameters of maximum likelihood in statistical inference, and optimal design parameters in countless engineering problems. Often derivatives of the objective function are not available or are too costly, and derivative-free algorithms must be applied.

Evolutionary optimization algorithms (EA) use a population of candidate solutions to generate new candidate solutions for the objective “fitness” function. In particular, genetic algorithms (GA) and evolution strategies (ES) are two classes of EAs most directly inspired by the Wright-Fisher and Moran models from population genetics [1, 2]. GAs are initialized with some population of genotypes, representing candidate solutions, and use some form of stochastic reproduction, incorporating a bias known as selection. Among the different selection schemes, fitness-proportionate selection is equivalent to natural selection in population genetics.

Stochasticity of reproduction may be helpful in overcoming local optima, and noise in fitness. In population genetics, stochasticity of reproduction is known as genetic drift, and has the important effect of scaling the strength of selection inversely with the magnitude of stochasticity [2]. Stochasticity also causes the loss of many genotypes, even if they have high fitness. For example in the strong selection weak mutation regime of the Moran model, the probability that a single genotype will sweep a population (fixation) is proportional to its selective advantage, and there is a 99% chance a genotype with a one percent fitness advantage will go extinct [2].

Some form of deterministic selection in optimization is desirable to not waste computational effort, and without stochasticity a population based algorithm will still be robust to noise in fitness since a population effectively integrates information over some region of the fitness landscape. Some ES and GAs use deterministic rank based selection which removes individuals below some threshold. However, such truncation selection is very coarse, and does not affect proportionally the genotypes that survive.

Many population based algorithms, including ES and estimation of distribution algorithms (EDA), are based on drawing a population of candidate solutions from a parameterized distribution $P(\theta)$ and iteratively updating the parameters $\theta$ [3]. The basic approach is to move the parameters in the direction of the gradient of the mean fitness: $\nabla_\theta F$. To account for the uncertainty of the parameters many algorithms move the parameters in the direction of the natural gradient [4], $g^{-1} \nabla_\theta F$, where $g^{-1}$ is the inverse of the Fisher information, which can be estimated from the population. The popular covariance matrix adaptation ES algorithm (CMA-ES) [5, 6] and related natural evolution strategies (NES) [7–10] parameterize a population as a normal distribution, and use samples from the distribution to update the mean and covariance with a natural gradient descent step. More generally, natural gradients describe ascent of the fitness landscape in terms of information geometry, and their use characterizes a wide class of information-geometric algorithms [11]. These algorithms differ from GAs and population genetics, in that there are no selection or mutation operators applied directly to individuals in a population.

Here, we point out that the natural gradient used in information-geometric optimization also appears in natural selection. Under normally distributed phenotypes, as is done in multivariate quantitative genetics, we show how selection is related to Newton’s method in optimization. Then, we describe how intermediate levels of selection are best for optimization, and how mutation and recombination generate new variants without having to explicitly sample from the distribution. Finally, we develop a proof of principle quantitative genetic algorithm (QGA) which combines selection, recombination, and a form of adaptive selection tuning that shrinks the population towards an optimum. In contrast to GAs, QGA has deterministic selection, and compared to CMA-ES and NES, it is much simpler and does not store a covariance matrix.

NATURAL SELECTION GRADIENT

We begin by considering a population of infinite size, but with a finite number of unique phenotypes. Each unique variant $i$ has a continuous multivariate phenotype $x_i$ (a vector), with frequency $p_i$ and growth rate, or fitness $f(x_i)$, independent of time and frequencies. In the context of optimization, phenotypes are candidate solutions, and fitness is the objective function to be maximized. Classical replicator dynamics, leaving out mutation and genetic drift, describe the change in frequencies as

$$\frac{dp_i}{dt} = p_i \left( f(x_i) - F \right), \tag{1}$$

with mean fitness $F = \sum_i p_i f(x_i)$. In stochastic descriptions, these dynamics describe the expected changes due to selection.

In the absence of other processes, frequencies can be integrated over time resulting in

$$p_i(t) = \frac{1}{Z_t}\, p_i(0)\, e^{t f(x_i)}, \tag{2}$$

with normalization $Z_t$ ensuring the probabilities sum to one. At long times, the phenotype distribution will concentrate on high fitness phenotypes until the highest fitness phenotype reaches a frequency of unity. The change in mean fitness equals the fitness variance (Fisher’s theorem), and higher moments evolve as well. As an example we show 100 variants in a quadratic fitness landscape and how their frequencies change over time (Fig. 1).

Figure 1. A) An example of 100 variants in a 2D phenotype space on a quadratic fitness landscape (blue contours). B) Frequencies evolve over time according to eq. 2, with t = 0.2 (red), t = 1 (green) and t = 5 (blue).

Remarkably, replicator dynamics can be rewritten in terms of information geometry [12]. Frequencies can be considered as the parameters of a discrete categorical distribution, and selection moves them in the direction of the covariant derivative [13, 14], (also known as the natural gradient [4]),

$$\frac{dp}{dt} = g^{-1} \nabla_p F,$$

where $p$ is the vector of (linearly independent) frequencies, $\nabla_p F$ is the vector of partial derivatives, and $g^{-1}$ is the inverse of the Fisher information metric of the categorical distribution, which defines distances on the curved manifold of probability distributions (see appendix A). Selection changes the frequencies in the direction of steepest ascent in non-euclidean coordinates defined by the geometry of the manifold of the distribution.

The natural gradient is independent of parameterization, and therefore, if the distribution over $x$ can be approximated by another distribution, selection will change those parameters in the direction of their natural gradient. This can be demonstrated by projecting onto a normal phenotype distribution, as is assumed in classic multivariate quantitative genetics. The population mean $\mu$ and population covariance matrix $\Sigma$ parameterize the distribution, and selection changes the mean as ([15, 16], appendix A)

$$\frac{d\mu}{dt} = \Sigma \nabla_\mu F, \tag{3}$$

where $\Sigma^{-1}$ is the associated Fisher information metric. Similarly, the covariance follows a natural gradient with a more complex metric (appendix A). If phenotype covariance reaches zero, then the population is monomorphic and there is no selection. However, an alternative population genetics model in the strong selection weak
distribution of frequencies.
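The relations stated in this section (eq. 1, eq. 2, the natural-gradient form of selection, and Fisher's theorem) can be checked numerically on a small example. The following is a minimal sketch and not the authors' code: the random phenotypes, the particular quadratic fitness f(x) = -|x|^2/2, the uniform initial frequencies, and all function names are assumptions chosen for illustration. For the natural-gradient check it uses the standard Fisher metric of a categorical distribution over its n-1 independent frequencies, g_ij = delta_ij / p_i + 1 / p_n, and verifies the equivalent statement g (dp/dt) = grad_p F rather than inverting g.

```python
import math
import random

def mean(p, values):
    """Population average of `values` under frequencies `p`."""
    return sum(pi * vi for pi, vi in zip(p, values))

def replicator_rhs(p, f):
    """Right-hand side of eq. 1: dp_i/dt = p_i (f_i - F)."""
    F = mean(p, f)
    return [pi * (fi - F) for pi, fi in zip(p, f)]

def frequencies(p0, f, t):
    """Closed-form solution, eq. 2: p_i(t) = p_i(0) exp(t f_i) / Z_t."""
    f_max = max(f)  # shifting f by a constant only rescales Z_t
    w = [pi * math.exp(t * (fi - f_max)) for pi, fi in zip(p0, f)]
    Z = sum(w)
    return [wi / Z for wi in w]

def metric_times(p, v):
    """Apply the categorical Fisher metric g to a tangent vector v.

    Only the first n-1 (independent) components are used:
    (g v)_i = v_i / p_i + (sum_{j<n} v_j) / p_n.
    """
    shared = sum(v[:-1]) / p[-1]
    return [vi / pi + shared for pi, vi in zip(p[:-1], v[:-1])]

random.seed(0)
# Hypothetical setup loosely mirroring Fig. 1: 100 variants, quadratic fitness.
x = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
f = [-0.5 * (a * a + b * b) for a, b in x]
p0 = [1.0 / len(f)] * len(f)

t, h = 1.0, 1e-4
p = frequencies(p0, f, t)
rhs = replicator_rhs(p, f)

# (a) eq. 2 solves eq. 1: central finite difference of p(t) against eq. 1.
p_plus, p_minus = frequencies(p0, f, t + h), frequencies(p0, f, t - h)
err_replicator = max(abs((a - b) / (2 * h) - r)
                     for a, b, r in zip(p_plus, p_minus, rhs))

# (b) natural gradient: g applied to dp/dt gives the plain gradient
#     dF/dp_i = f_i - f_n, which is equivalent to dp/dt = g^{-1} grad_p F.
grad = [fi - f[-1] for fi in f[:-1]]
err_natural = max(abs(a - b) for a, b in zip(metric_times(p, rhs), grad))

# (c) Fisher's theorem: dF/dt equals the variance of fitness.
dF_dt = (mean(p_plus, f) - mean(p_minus, f)) / (2 * h)
F = mean(p, f)
var_f = mean(p, [(fi - F) ** 2 for fi in f])
err_fisher = abs(dF_dt - var_f)

print(err_replicator, err_natural, err_fisher)
```

All three printed errors should be small, limited only by the finite-difference step `h` and floating-point round-off. Checking g (dp/dt) against the plain gradient avoids a linear solve, which keeps the sketch dependency-free.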
The exponential of entropy $K_t = e^{S_t}$ defines an effective number of variants, such
mutation regime can search a fitness landscape with limited diversity, with the mutation covariance matrix serv-