Mathematical Analysis of Evolutionary Algorithms for Optimization

Mathematical Analysis of Evolutionary Algorithms for Optimization Heinz Muhlenbein¨ Thilo Mahnig GMD – Schloss Birlinghoven GMD – Schloss Birlinghoven 53754 Sankt Augustin, Germany 53754 Sankt Augustin, Germany [email protected] [email protected] Abstract Simulating evolution as seen in nature has been identified as one of the key computing paradigms for the new decade. Today evolutionary algorithms have been successfully used in a number of applications. These include discrete and continuous optimization problems, synthesis of neural networks, synthesis of computer programs from examples (also called genetic programming) and even evolvable hardware. But in all application areas problems have been encountered where evolutionary algorithms performed badly. In this survey we concentrate on the analysis of evolutionary algorithms for optimization. We present a mathematical theory based on probability distributions. It gives the reasons why evolutionary algorithms can solve many difficult multi-modal functions and why they fail on seemingly simple ones. The theory also leads to new sophisticated algorithms for which convergence is shown. 1 Introduction In Section 5.2 ÍÅ is extended to the Factorized Distri- bution Algorithm . We prove convergence of the al- We first introduce the most popular algorithm, the simple ge- gorithm to the global optima if Boltzmann selection is used. netic algorithm. This algorithm has many degrees of free- The theory of factorization connects with the theory of dom, especially in the recombination scheme used. We show graphical models and Bayesian networks. We derive a new that all genetic algorithms behave very similar, if recombina- adaptive Boltzmann selection schedule SDS using ideas from tion is done without selection a sufficient number of times be- the science of breeding. fore the next selection step. We correct the classical schema In Section 6.1 we use results from the theory of Bayesian analysis of genetic algorithm. We show why the usual schema networks for the Learning Factorized Distribution Algorithm theorem folklore is mathematically wrong. We approximate Ä , which learns a factorization from the data. We make genetic algorithms by a conceptual algorithm. This algo- a preliminary comparison between the efficiency of rithm we call the Univariate Marginal Distribution Algorithm and Ä . ÍÅ, which is analyzed in Section 3. We compute the dif- In Section 7 we describe the system dynamics approach to ference equation for the univariate marginal distributions un- optimization. The difference equations obtained for ÍÅ der the assumption of proportionate selection. This equation are iterated until convergence. Thus the continuous opti- has been proposed in populations genetics by Sewall Wright mization problem is mathematically solved without using a as early as 1937 [Wri70]. This is an independent confirma- population of points at all. We present numerical results tion of our claim that ÍÅ approximates any genetic algo- for three different system dynamics equations. They con- rithm. Using Wright’s equation we show that ÍÅ solves sists of Wright’s equation, the diversified replicator equation a continuous optimization problem. The function to be opti- and a modified version of Wright’s equation which converges mized is given by the average fitness of the population. faster. Proportionate selection is far too weak for optimization. In the final section we classify the different evolutionary This has been recognized very early in breeding of livestock. computation methods presented. The classification criterion Artificial selection as done by breeders is a much better model is whether a microscopic or a macroscopic model is used for for optimization than natural selection modelled by propor- selection and/or recombination. tionate selection. Unfortunately an exact mathematical analysis of efficient artificial selection schemes seems impossible. 2 Analysis of the Simple Genetic Algorithm Therefore breeders have developed an approximate theory, using the concepts of regression of offspring to parent, her- In this section we investigate the standard genetic algorithm, itability and response to selection. This theory is discussed also called the Simple Genetic Algorithm (SGA). The algo- in Section 4. At the end of the section numerical results are rithm is described by Holland [Hol92] and Goldberg [Gol89]. shown which show the strength and the weakness of ÍÅ It consists of as a numerical optimization method. ¯ ÍÅ optimizes very efficient some difficult optimiza- fitness proportionate selection tion problems, but it fails on some simple problems. For these ¯ recombination/crossover problems higher order marginal distributions are necessary which capture the nonlinear dependency between variables. ¯ mutation Here we will analyze selection and recombination only. Mu- 2.2 Proportionate Selection tation is considered to be a background operator. It can Proportionate selection changes the probabilities according to be analyzed by known techniques from stochastics [MSV94, Müh97]. ´Üµ ´ÜØ ·½µ Ô´ÜØµ There have been many claims concerning the optimization Ô (2.3) ´Øµ power of Ë. Most of them are based on a rather qualita- tive application of the schema theorem. We will show the Lemma 2.1. For proportionate selection the response is shortcomings of this approach. Our analysis is based on tech- given by ´Øµ niques used in population genetics. The analysis reveals that Î ´Øµ Ê (2.4) Ë ´Øµ an exact mathematical analysis of is possible for small problems only. For a binary problem of size Ò the exact anal- Proof: Ò We have ysis needs the computation of ¾ equations. But we propose ¾ Î ´Øµ ´Üµ an approximation often used in population genetics. The ap- ´Øµ ´Øµ Ô´Ü Øµ Ê (2.5) proximation assumes that the gene frequencies are in linkage ´Øµ ´Øµ equilibrium. The main result is that any genetic algorithm can Ü be approximated by an algorithm using Ò parameters only, the univariate marginal gene frequencies. With proportionate selection the average fitness never de- 2.1 Definitions creases. This is true for every rational selection scheme. Ü ´Ü Ü µ Ò Let ½ denote a binary vector. For notational 2.3 Recombination Ü ¾ simplicity we restrict the discussion to binary variables ¼ ½ . We use the following conventions. Capital letters For the analysis of recombination we introduce a special dis- Ü denote variables, small letters assignments. tribution. ¼ Ê Definition 2.1. Let a function be given. We consider the optimization problem Definition 2.4. Robbins’ proportions are given by the distribution Ò Ü Ö ÑÜ ´Üµ ÓÔØ (2.1) Ô ´Ü Øµ ´Ü Øµ (2.6) ½ ´Üµ Ë We will use as the fitness function for the .We A population in Robbins’ proportions is also called to be in will investigate two widely used recombination/crossover linkage equilibrium. schemes. Geiringer [Gei44] has shown that all reasonable recombination schemes lead to the same limit distribution. Ý Definition 2.2. Let two strings Ü and be given. In one- point crossover the string Þ is created by randomly choosing Theorem 2.1 (Geiringer). Recombination does not change ¼ Ð Ò Þ Ü Ð a crossover point and setting for and Ô ´Ü Ø ·½µ the univariate marginal frequencies, i.e. Þ Ý Ð Þ for .Inuniform crossover is randomly chosen Ô ´Ü Øµ . The limit distribution of any complete recombina- Ü Ý with equal probability from . ´Üµ tion scheme is Robbins’ proportions . ´ÜØµ Ü Definition 2.3. Let Ô denote the probability of Complete recombination means that for each subset Ë of Ø Ô ´Ü Øµ in the population at generation . Then ½Ò È , the probability of an exchange of genes by re- ´ÜØµ Ô defines a univariate marginal distribution. Ü Ü combination is greater than zero. Convergence to the limit distribution is very fast. We have to mention an important Ô ´Ü µ We often write if just one generation is discussed. In this notation the average fitness of the population and the vari- fact. In a finite population linkage equilibrium cannot be ex- ance is given by actly achieved. We take the uniform distribution as example. Ò ´Üµ¾ Here linkage equilibrium is given by Ô . This value Æ can only be obtained if the size of the population is sub- ´Øµ Ô´ÜØµ ´Üµ Ò Æ ½¼¼¼ stantial larger than ¾ ! For a population of the Ü ¡ ¾ Ë É minimum deviation ÑÒ from Robbins’ proportions is Î ´Øµ Ô´ÜØµ ´Üµ ´Øµ already achieved after four generations, then Ë É slowly in- Ü creases due to stochastic fluctuations by genetic drift. Ul- ´Øµ The response to selection Ê is defined by timately the population will consist of one genotype only. Genetic drift has been analyzed by Asoh and & Mühlenbein ´Øµ ´Ø ·½µ ´Øµ Ê (2.2) [AM94b]. It will not be considered here. Ü 2.4 Selection and Recombination strings where the gene at locus i is fixed to . The univari- Ô´Ü Øµ ate marginal frequency is obviously identical to the ´Øµ We have shown that the average never decreases after se- À ´Ü µ frequency of schema . The fitness of the schema at lection and that any complete recombination scheme moves generation Ø is given by the genetic population to Robbins’ proportions. Now the question arises: What happens if recombination is applied af- ½ Ô´ÜØµ ´Üµ ´À ´Ü µØµ ter selection. The answer is very difficult. The problem still (2.9) Ô ´Ü Øµ Ü Ü puzzles populations genetics [Nag92]. Formally the difference equations can be easily written. From Theorem 2.2 we obtain: Ê Ê Let a recombination distribution be given. ÜÝ Þ denotes Þ Ü the probability that Ý and produce after recombination. Corollary 2.1 (First-order schema theorem). For a genetic Then algorithm with proportionate selection using any complete × × Ô´ÜØ ·½µ Ê Ô ´Ý µÔ ´Þµ ÜÝ Þ (2.7) recombination the frequency of first-order schemata changes ÝÞ according to × ´Üµ Ü Ô denotes the probability of string after selection. Ò Ò ´À ´Ü µØµ Ê ¾ £ ¾ For Ò loci the recombination distribution consists of Ô ´Ü Ø ·½µ Ô ´Ü Øµ (2.10) ´Øµ parameters.

Mathematical Analysis of Evolutionary Algorithms for Optimization

Mathematical Analysis of the Accordion Grating Illusion: a Differential Geometry Approach to Introduce the 3D Aperture Problem

Iasinstitute for Advanced Study

A Mathematical Analysis of Student-Generated Sorting Algorithms Audrey Nasar

Arxiv:1912.01885V1 [Math.DG]

The Orthosymplectic Lie Supergroup in Harmonic Analysis Kevin Coulembier

On P-Adic Mathematical Physics B

A. Ya. Khintchine's Work in Probability Theory

Analytic Trends in Mathematical Physics

Geometric Analysis

Mathematical Analysis, Second Edition

Bernstein-Gelfand-Gelfand Resolutions for Linear Superalgebras

Cs6402 Design and Analysis of Algorithms