CMA-ES and Advanced Adaptation Mechanisms Youhei Akimoto, Nikolaus Hansen

To cite this version:

Youhei Akimoto, Nikolaus Hansen. CMA-ES and Advanced Adaptation Mechanisms. GECCO '18 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion, Jul 2018, Kyoto, Japan. hal-01959479

HAL Id: hal-01959479 https://hal.inria.fr/hal-01959479 Submitted on 18 Dec 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

CMA-ES and Advanced Adaptation Mechanisms

Youhei Akimoto1 & Nikolaus Hansen2 1. University of Tsukuba, Japan 2. Inria, Research Centre Saclay, France

[email protected] [email protected]

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

GECCO '18 Companion, July 15–19, 2018, Kyoto, Japan © 2018 Copyright is held by the owner/author(s). ACM ISBN 978-1-4503-5764-7/18/07. https://doi.org/10.1145/3205651.3207854

Tutorial: Evolution Strategies and CMA-ES (Covariance Matrix Adaptation)

Anne Auger & Nikolaus Hansen

Inria, Project team TAO, Research Centre Saclay – Île-de-France, University Paris-Sud, LRI (UMR 8623), Bat. 660, 91405 Orsay Cedex, France. We are happy to answer questions at any time. https://www.lri.fr/~hansen/gecco2014-CMA-ES-tutorial.pdf

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). GECCO '14, Jul 12-16 2014, Vancouver, BC, Canada ACM 978-1-4503-2881-4/14/07. http://dx.doi.org/10.1145/2598394.2605347

Topics 1. What makes the problem difficult to solve?

2. How does the CMA-ES work?

• Normal Distribution, Rank-Based Recombination • Step-Size Adaptation • Covariance Matrix Adaptation

3. What can/should the users do for the CMA-ES to work effectively on their problem?

• Choice of problem formulation and encoding (not covered) • Choice of initial solution and initial step-size • Restarts, Increasing Population Size • Restricted Covariance Matrix

Problem Statement Black Box Optimization and Its Difficulties
Problem Statement: Continuous Domain Search/Optimization
Task: minimize an objective function (fitness function, loss function) in continuous domain
f : X ⊆ ℝⁿ → ℝ, x ↦ f(x)
Black Box scenario (direct search scenario): x → [black box] → f(x)

• gradients are not available or not useful
• problem domain specific knowledge is used only within the black box, e.g. within an appropriate encoding
Search costs: number of function evaluations

Problem Statement Black Box Optimization and Its Difficulties
Problem Statement: Continuous Domain Search/Optimization
Goal
• fast convergence to the global optimum . . . or to a robust solution x
• solution x with small function value f(x) with least search cost
there are two conflicting objectives

Typical Examples

• shape optimization (e.g. using CFD): curve fitting, airfoils
• model calibration: biological, physical
• parameter calibration: controller, plants, images

Problems
• exhaustive search is infeasible
• naive random search takes too long
• deterministic search is not successful / takes too long
Approach: stochastic search, Evolutionary Algorithms

Problem Statement Black Box Optimization and Its Difficulties
What Makes a Function Difficult to Solve? Why stochastic search?

non-linear, non-quadratic, non-convex
  on linear and quadratic functions much better search policies are available
ruggedness
  non-smooth, discontinuous, multimodal, and/or noisy function
  [figure: cut through a rugged 1-D fitness landscape]
dimensionality (size of search space)
  (considerably) larger than three
non-separability
  dependencies between the objective variables
ill-conditioning
  [figure: squeezed level sets with gradient direction vs. Newton direction]
non-smooth level sets

Problem Statement Black Box Optimization and Its Difficulties
Ruggedness: non-smooth, discontinuous, multimodal, and/or noisy

[figure: fitness over a 1-D cut from a 5-D example; (easily) solvable with evolution strategies]

Problem Statement Non-Separable Problems
Separable Problems
Definition (Separable Problem): A function f is separable if
arg min_{(x_1,...,x_n)} f(x_1,...,x_n) = ( arg min_{x_1} f(x_1,...), ..., arg min_{x_n} f(...,x_n) )
⇒ it follows that f can be optimized in a sequence of n independent 1-D optimization processes

Example: Additively decomposable functions
f(x_1,...,x_n) = Σ_{i=1}^{n} f_i(x_i)
[figure: level sets of the (separable) Rastrigin function]

Problem Statement Non-Separable Problems
Non-Separable Problems
Building a non-separable problem from a separable one (1,2)
Rotating the coordinate system
f : x ↦ f(x) separable
f : x ↦ f(Rx) non-separable
R rotation matrix
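A minimal sketch (not from the tutorial) of this construction: a separable benchmark is made non-separable by composing it with an orthogonal matrix R; the function name and rotation seed are illustrative choices.

import numpy as np

def rastrigin(x):                       # separable: a sum of 1-D terms
    return 10 * len(x) + float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))

def random_rotation(n, seed=0):
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal matrix
    return q

R = random_rotation(5)
rotated_rastrigin = lambda x: rastrigin(R @ np.asarray(x))   # non-separable f(Rx)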

[figure: level sets of the separable function (left) and of the rotated, non-separable function (right)]

1 Hansen, Ostermeier, Gawelczyk (1995). On the adaptation of arbitrary normal mutation distributions in evolution strategies: The generating set adaptation. Sixth ICGA, pp. 57-64, Morgan Kaufmann
2 Salomon (1996). "Reevaluating Genetic Algorithm Performance under Coordinate Rotation of Benchmark Functions; A survey of some theoretical and practical aspects of genetic algorithms." BioSystems, 39(3):263-278

Problem Statement Ill-Conditioned Problems
Ill-Conditioned Problems
Curvature of level sets
Consider the convex-quadratic function
f(x) = ½ (x − x*)ᵀ H (x − x*) = ½ Σ_i h_{i,i} (x_i − x_i*)² + ½ Σ_{i≠j} h_{i,j} (x_i − x_i*)(x_j − x_j*)
H is the Hessian matrix of f and symmetric positive definite

gradient direction: −∇f(x)ᵀ, Newton direction: −H⁻¹∇f(x)ᵀ

Ill-conditioning means squeezed level sets (high curvature). The condition number equals nine here. Condition numbers up to 10¹⁰ are not unusual in real-world problems.

If H ≈ I (small condition number of H), first-order information (e.g. the gradient) is sufficient. Otherwise second-order information (estimation of H⁻¹) is necessary.

Non-smooth level sets (sharp ridges)
Similar difficulty, but worse than ill-conditioning

[figure: level sets of the 1-norm, a scaled 1-norm, and the 1/2-norm]

Problem Statement Ill-Conditioned Problems
What Makes a Function Difficult to Solve? . . . and what can be done

The Problem → Possible Approaches
Dimensionality → exploiting the problem structure: separability, locality/neighborhood, encoding
Ill-conditioning → second-order approach: changes the neighborhood metric
Ruggedness → non-local policy, large sampling width (step-size): as large as possible while preserving a reasonable convergence speed; population-based method, stochastic, non-elitistic: recombination operator serves as repair mechanism; restarts ...metaphors

Topics 1. What makes the problem difficult to solve?

2. How does the CMA-ES work?

• Normal Distribution, Rank-Based Recombination • Step-Size Adaptation • Covariance Matrix Adaptation

3. What can/should the users do for the CMA-ES to work effectively on their problem?

• Choice of problem formulation and encoding (not covered) • Choice of initial solution and initial step-size • Restarts, Increasing Population Size • Restricted Covariance Matrix 14 Evolution Strategies (ES) A Search Template Stochastic Search

A black box search template to minimize f : ℝⁿ → ℝ
Initialize distribution parameters θ, set population size λ ∈ ℕ
While not terminate
1. Sample distribution P(x | θ) → x_1, ..., x_λ ∈ ℝⁿ
2. Evaluate x_1, ..., x_λ on f
3. Update parameters θ ← F_θ(θ, x_1, ..., x_λ, f(x_1), ..., f(x_λ))

Everything depends on the definition of P and F_θ
deterministic algorithms are covered as well

In many Evolutionary Algorithms the distribution P is implicitly defined via operators on a population, in particular, selection, recombination and mutation
Natural template for (incremental) Estimation of Distribution Algorithms
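A minimal sketch (not from the slides) of this black box search template, instantiated with a deliberately trivial choice of P and F_θ: an isotropic Gaussian whose mean jumps to the best sampled point. All names below are illustrative.

import numpy as np

def stochastic_search(f, theta, lam=20, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        # 1. sample lambda candidates from P(x | theta)
        x = theta["m"] + theta["sigma"] * rng.standard_normal((lam, len(theta["m"])))
        # 2. evaluate them on f
        f_vals = [f(xi) for xi in x]
        # 3. update theta (here: move the mean to the best candidate)
        theta["m"] = x[int(np.argmin(f_vals))]
    return theta

theta = stochastic_search(lambda x: float(np.sum(x**2)),
                          {"m": np.full(5, 3.0), "sigma": 0.3})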

Evolution Strategies (ES) A Search Template
The CMA-ES

Input: m ∈ ℝⁿ; σ ∈ ℝ₊; λ ∈ ℕ, λ ≥ 2, usually λ ≥ 5, default λ = 4 + ⌊3 ln n⌋
Initialize: C = I, p_c = 0, p_σ = 0
Set: c_c ≈ 4/n, c_σ ≈ 4/n, c_1 ≈ 2/n², c_µ ≈ µ_w/n², c_1 + c_µ ≤ 1, d_σ ≈ 1 + √(µ_w/n), and w_{i=1...λ} decreasing in i such that µ_w := 1 / Σ_{i=1}^{µ} w_i² ≈ 0.3 λ

While not terminate
  x_i = m + σ y_i, where y_i ~ N_i(0, C), for i = 1, ..., λ   (sampling)
  m ← Σ_{i=1}^{µ} w_i x_{i:λ} = m + σ y_w, where y_w = Σ_{i=1}^{µ} w_i y_{i:λ}   (update mean)
  p_c ← (1 − c_c) p_c + 1_{‖p_σ‖ < 1.5√n} √(1 − (1 − c_c)²) √µ_w y_w   (cumulation for C)
  p_σ ← (1 − c_σ) p_σ + √(1 − (1 − c_σ)²) √µ_w C^{−1/2} y_w   (path for σ)
  C ← (1 − c_1 − c_µ) C + c_1 p_c p_cᵀ + c_µ Σ_{i=1}^{µ} w_i y_{i:λ} y_{i:λ}ᵀ   (update C)
  σ ← σ × exp( (c_σ/d_σ) (‖p_σ‖ / E‖N(0, I)‖ − 1) )   (update of σ)

Not covered on this slide: termination, restarts, useful output, search boundaries and encoding; corrections for positive definiteness guaranty, p_c variance loss, c_σ and d_σ for large λ
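A minimal NumPy sketch (not the authors' code) of one run of the loop above, with the stall indicator and all practical corrections omitted and only approximate default parameters; function and variable names are illustrative.

import numpy as np

def cma_es(f, m, sigma, n_iter=500, seed=1):
    n = len(m)
    lam = 4 + int(3 * np.log(n))                     # population size
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                     # positive, decreasing weights
    mu_w = 1.0 / np.sum(w**2)
    c_sigma, d_sigma = 4.0 / n, 1.0 + np.sqrt(mu_w / n)
    c_c, c_1 = 4.0 / n, 2.0 / n**2
    c_mu = min(mu_w / n**2, 1.0 - c_1)
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n))           # approximation of E||N(0,I)||
    C, p_sigma, p_c = np.eye(n), np.zeros(n), np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        y = rng.multivariate_normal(np.zeros(n), C, size=lam)   # sampling
        x = m + sigma * y
        idx = np.argsort([f(xi) for xi in x])[:mu]              # rank-based selection
        y_w = w @ y[idx]
        m = m + sigma * y_w                                     # update mean
        D, B = np.linalg.eigh(C)                                # C^{-1/2} via eigendecomposition
        D = np.maximum(D, 1e-20)
        C_inv_sqrt = B @ np.diag(D**-0.5) @ B.T
        p_sigma = (1 - c_sigma) * p_sigma + np.sqrt(1 - (1 - c_sigma)**2) \
                  * np.sqrt(mu_w) * (C_inv_sqrt @ y_w)          # path for sigma
        p_c = (1 - c_c) * p_c + np.sqrt(1 - (1 - c_c)**2) * np.sqrt(mu_w) * y_w
        C = (1 - c_1 - c_mu) * C + c_1 * np.outer(p_c, p_c) \
            + c_mu * np.einsum('i,ij,ik->jk', w, y[idx], y[idx])  # rank-one + rank-mu
        sigma *= np.exp(c_sigma / d_sigma * (np.linalg.norm(p_sigma) / chi_n - 1))
    return m

For example, cma_es(lambda x: float(np.sum(x**2)), m=np.ones(10), sigma=0.5) drives the mean toward the optimum of the sphere function.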

Evolution Strategies (ES) A Search Template
Evolution Strategies

New search points are sampled normally distributed
x_i ~ m + σ N_i(0, C) for i = 1, ..., λ
as perturbations of m, where x_i, m ∈ ℝⁿ, σ ∈ ℝ₊, C ∈ ℝⁿˣⁿ
where
the mean vector m ∈ ℝⁿ represents the favorite solution
the so-called step-size σ ∈ ℝ₊ controls the step length
the covariance matrix C ∈ ℝⁿˣⁿ determines the shape of the distribution ellipsoid
here, all new points are sampled with the same parameters

The question remains how to update m, C, and σ.

24 Anne Auger & Nikolaus Hansen CMA-ES July, 2014 17 / 81 Evolution Strategies (ES) The Normal Distribution Why Normal Distributions?

1 widely observed in nature, for example as phenotypic traits
2 only stable distribution with finite variance
  stable means that the sum of normal variates is again normal:
  N(x, A) + N(y, B) ~ N(x + y, A + B)
  helpful in design and analysis of algorithms, related to the central limit theorem
3 most convenient way to generate isotropic search points
  the isotropic distribution does not favor any direction, rotational invariant
4 maximum entropy distribution with finite variance
  the least possible assumptions on f in the distribution shape

25 Anne Auger & Nikolaus Hansen CMA-ES July, 2014 18 / 81 Evolution Strategies (ES) The Normal Distribution Normal Distribution

Standard Normal Distribution
[figure: probability density of the 1-D standard normal distribution]

2-D Normal Distribution
[figure: probability density of a 2-D normal distribution]

26 Anne Auger & Nikolaus Hansen CMA-ES July, 2014 19 / 81 Evolution Strategies (ES) The Normal Distribution The Multi-Variate (n-Dimensional) Normal Distribution

Any multi-variate normal distribution N(m, C) is uniquely determined by its mean value m ∈ ℝⁿ and its symmetric positive definite n × n covariance matrix C.

The mean value m
  determines the displacement (translation)
  is the value with the largest density (modal value)
  the distribution is symmetric about the distribution mean
  [figure: density of a 2-D normal distribution]

The covariance matrix C determines the shape
geometrical interpretation: any covariance matrix can be uniquely identified with the iso-density ellipsoid {x ∈ ℝⁿ | (x − m)ᵀ C⁻¹ (x − m) = 1}

27 Anne Auger & Nikolaus Hansen CMA-ES July, 2014 20 / 81 Evolution Strategies (ES) The Normal Distribution

. . . any covariance matrix can be uniquely identified with the iso-density ellipsoid {x ∈ ℝⁿ | (x − m)ᵀ C⁻¹ (x − m) = 1}

Lines of Equal Density
N(m, σ²I) ~ m + σ N(0, I): one degree of freedom (σ), components are independent standard normally distributed
N(m, σ²D²) ~ m + σ D N(0, I): n degrees of freedom, components are independent, scaled
N(m, C) ~ m + C^{1/2} N(0, I): (n² + n)/2 degrees of freedom, components are correlated

where I is the identity matrix (isotropic case) and D is a diagonal matrix (reasonable for separable problems) and A × N(0, I) ~ N(0, A Aᵀ) holds for all A.
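A minimal sketch (not from the slides) of the three sampling cases above, using the relation A × N(0, I) ~ N(0, A Aᵀ); the specific matrices D and C are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 3, np.zeros(3), 0.5
D = np.diag([0.5, 1.0, 2.0])                       # axis-parallel scaling
C = np.array([[4.0, 1.5, 0.0],
              [1.5, 2.0, 0.3],
              [0.0, 0.3, 1.0]])                    # full covariance matrix

z = rng.standard_normal(n)
x_iso  = m + sigma * z                             # sample of N(m, sigma^2 I)
x_diag = m + sigma * (D @ z)                       # sample of N(m, sigma^2 D^2)
eigvals, B = np.linalg.eigh(C)                     # C = B diag(d_i^2) B^T
C_sqrt = B @ np.diag(np.sqrt(eigvals)) @ B.T
x_full = m + C_sqrt @ z                            # sample of N(m, C)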

28 Anne Auger & Nikolaus Hansen CMA-ES July, 2014 21 / 81 Evolution Strategies (ES) The Normal Distribution Multivariate Normal Distribution and Eigenvalues

For any positive definite symmetric C,
C = d_1² b_1 b_1ᵀ + ... + d_N² b_N b_Nᵀ
d_i: square root of the i-th eigenvalue of C
b_i: eigenvector of C, corresponding to d_i

The multivariate normal distribution N(m, C) satisfies
N(m, C) ~ m + N(0, d_1²) b_1 + ... + N(0, d_N²) b_N
[figure: iso-density ellipse with principal axes d_1·b_1 and d_2·b_2]

Evolution Strategies (ES) The Normal Distribution
The (µ/µ, λ)-ES
Non-elitist selection and intermediate (weighted) recombination
Given the i-th solution point x_i = m + σ N_i(0, C) = m + σ y_i
Let x_{i:λ} be the i-th ranked solution point, such that f(x_{1:λ}) ≤ ... ≤ f(x_{λ:λ}).
The new mean reads
m ← Σ_{i=1}^{µ} w_i x_{i:λ} = m + σ Σ_{i=1}^{µ} w_i y_{i:λ} =: m + σ y_w
where

w_1 ≥ ... ≥ w_µ > 0, Σ_{i=1}^{µ} w_i = 1, 1 / Σ_{i=1}^{µ} w_i² =: µ_w ≈ λ/4

The best µ points are selected from the new solutions (non-elitistic) and weighted intermediate recombination is applied.

Evolution Strategies (ES) Invariance
Invariance Under Monotonically Increasing Functions
Rank-based algorithms
Update of all parameters uses only the ranks
f(x_{1:λ}) ≤ f(x_{2:λ}) ≤ ... ≤ f(x_{λ:λ})
⇔ g(f(x_{1:λ})) ≤ g(f(x_{2:λ})) ≤ ... ≤ g(f(x_{λ:λ})) for all strictly monotonically increasing g
g preserves ranks³
³ Whitley 1989. The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best, ICGA
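A minimal sketch (not from the slides) of this invariance: any strictly increasing g leaves the ranking, and therefore the behavior of a rank-based algorithm, unchanged. The example values and the choice of g are arbitrary.

import numpy as np

f_values = np.array([3.2, -1.0, 0.7, 12.5, 0.0])
g = lambda v: np.exp(v) + 5.0            # strictly monotonically increasing

ranks_f = np.argsort(f_values)           # ranking based on f
ranks_g = np.argsort(g(f_values))        # ranking based on g(f)
assert np.array_equal(ranks_f, ranks_g)  # identical ranks, hence identical behavior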

Evolution Strategies (ES) Invariance
Basic Invariance in Search Space
translation invariance is true for most optimization algorithms

f(x) ↔ f(x − a)

Identical behavior on f and f_a:
f : x ↦ f(x), x^(t=0) = x_0
f_a : x ↦ f(x − a), x^(t=0) = x_0 + a
No difference can be observed w.r.t. the argument of f

Evolution Strategies (ES) Summary
Summary
On the 20-D Sphere Function f(x) = Σ_{i=1}^{N} [x]_i²: an ES without adaptation can't approach the optimum ⇒ adaptation required

Step-Size Control Evolution Strategies
Recalling
New search points are sampled normally distributed

x_i ~ m + σ N_i(0, C) for i = 1, ..., λ
as perturbations of m, where x_i, m ∈ ℝⁿ, σ ∈ ℝ₊, C ∈ ℝⁿˣⁿ
where
the mean vector m ∈ ℝⁿ represents the favorite solution, and m ← Σ_{i=1}^{µ} w_i x_{i:λ}
the so-called step-size σ ∈ ℝ₊ controls the step length
the covariance matrix C ∈ ℝⁿˣⁿ determines the shape of the distribution ellipsoid

The remaining question is how to update σ and C.

Step-Size Control Why Step-Size Control
Methods for Step-Size Control
1/5-th success rule^{a,b}, often applied with "+"-selection
  increase the step-size if more than 20% of the new solutions are successful, decrease otherwise
σ-self-adaptation^{c}, applied with ","-selection
  mutation is applied to the step-size and the better, according to the objective function value, is selected

simplified "global" self-adaptation

path length control^{d} (Cumulative Step-size Adaptation, CSA)^{e}
  self-adaptation derandomized and non-localized

a Rechenberg 1973, Evolutionsstrategie, Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog
b Schumer and Steiglitz 1968. Adaptive step size random search. IEEE TAC
c Schwefel 1981, Numerical Optimization of Computer Models, Wiley
d Hansen & Ostermeier 2001, Completely Derandomized Self-Adaptation in Evolution Strategies, Evol. Comput. 9(2)
e Ostermeier et al 1994, Step-size adaptation based on non-local use of selection information, PPSN IV

Step-Size Control Path Length Control (CSA)
Path Length Control (CSA)
The Concept of Cumulative Step-Size Adaptation
x_i = m + σ y_i
m ← m + σ y_w

Measure the length of the evolution path, the pathway of the mean vector m in the generation sequence.

[figure: two evolution paths; a short, zig-zagging path → decrease σ; a long, aligned path → increase σ]

loosely speaking, steps are
  perpendicular under random selection (in expectation)
  perpendicular in the desired situation (to be most efficient)

Step-Size Control Path Length Control (CSA)
Path Length Control (CSA): The Equations

Initialize m ∈ ℝⁿ, σ ∈ ℝ₊, evolution path p_σ = 0,
set c_σ ≈ 4/n, d_σ ≈ 1.

m ← m + σ y_w where y_w = Σ_{i=1}^{µ} w_i y_{i:λ}   (update mean)
p_σ ← (1 − c_σ) p_σ + √(1 − (1 − c_σ)²) √µ_w y_w
  (the factor √(1 − (1 − c_σ)²) accounts for 1 − c_σ; √µ_w accounts for the w_i)
σ ← σ × exp( (c_σ/d_σ) (‖p_σ‖ / E‖N(0, I)‖ − 1) )   (update step-size)
  (the factor is > 1 ⇔ ‖p_σ‖ is greater than its expectation)
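A minimal sketch (not the authors' code) of cumulative step-size adaptation for an isotropic (µ/µ_w, λ)-ES, following the equations above; the parameter values are illustrative defaults.

import numpy as np

def csa_es(f, m, sigma, lam=10, n_iter=300, seed=2):
    n, mu = len(m), lam // 2
    w = np.ones(mu) / mu                      # equal recombination weights
    mu_w = 1.0 / np.sum(w**2)
    c_sigma, d_sigma = 4.0 / n, 1.0
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n))    # approximation of E||N(0, I)||
    p_sigma = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        y = rng.standard_normal((lam, n))
        x = m + sigma * y
        idx = np.argsort([f(xi) for xi in x])[:mu]   # select the best mu
        y_w = w @ y[idx]
        m = m + sigma * y_w                          # update mean
        p_sigma = (1 - c_sigma) * p_sigma \
                  + np.sqrt(1 - (1 - c_sigma)**2) * np.sqrt(mu_w) * y_w
        sigma *= np.exp(c_sigma / d_sigma
                        * (np.linalg.norm(p_sigma) / chi_n - 1))
    return m, sigma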

Step-Size Control Path Length Control (CSA)
(5/5, 10)-CSA-ES, default parameters
[figure: ‖x − x*‖, σ, and f(x) = Σ_{i=1}^{n} x_i² over function evaluations; initial m ∈ [−0.2, 0.8]^n for n = 30]

Step-Size Control Alternatives to CSA
Two-Point Step-Size Adaptation (TPA)

Sample a pair of symmetric points along the previous mean shift

x_{1,2} = m^(g) ± σ^(g) ‖N(0, I)‖ (m^(g) − m^(g−1)) / ‖m^(g) − m^(g−1)‖_C, where ‖x‖_C := √(xᵀ C⁻¹ x)

Compare the ranking of x_1 and x_2 among the current population

s^(g+1) = (1 − c_s) s^(g) + c_s (rank(x_2) − rank(x_1)) / (λ − 1)
which is > 0 if the previous step still produces a promising solution

Update the step-size
σ^(g+1) = σ^(g) exp( s^(g+1) / d_σ )

[Hansen, 2008] Hansen, N. (2008). CMA-ES with two-point step-size adaptation. Research Report RR-6527, Inria. inria-00276854v5.
[Hansen et al., 2014] Hansen, N., Atamna, A., and Auger, A. (2014). How to assess step-size adaptation mechanisms in randomised search. In Parallel Problem Solving from Nature–PPSN XIII, pages 60–69. Springer.

Step-Size Control Alternatives to CSA
On Sphere with Low Effective Dimension

On a function with low effective dimension
f(x) = Σ_{i=1}^{M} [x]_i², x ∈ ℝ^N, M ≤ N
N − M variables do not affect the function value
[figure: f(x) vs. function evaluations for CSA (left) and TPA (right), with (N, M) = (10, 10), (100, 100), (100, 10)]

40 Step-Size Control Alternatives to CSA Alternatives: Success-Based Step-Size Control comparing the fitness distributions of current and previous iterations

Generalizations of the 1/5th-success-rule for non-elitist and multi-recombinant ES
Median Success Rule [Ait Elhara et al., 2013]
Population Success Rule [Loshchilov, 2014]
controls a success probability
An advantage over CSA and TPA: cheap computation. It depends only on the ranking of the candidate solutions;
cf. CSA and TPA require a computation of C^{-1/2} x and C^{-1} x, respectively.

[Ait Elhara et al., 2013] Ait Elhara, O., Auger, A., and Hansen, N. (2013). A median success rule for non-elitist evolution strategies: Study of feasibility. In Proc. of the GECCO, pages 415–422.
[Loshchilov, 2014] Loshchilov, I. (2014). A computationally efficient limited memory CMA-ES for large scale optimization. In Proc. of the GECCO, pages 397–404.

Step-Size Control Summary
Step-Size Control: Summary
Why Step-Size Control? to achieve linear convergence at near-optimal rate

Cumulative Step-Size Adaptation
  efficient and robust for N ≳ λ
  inefficient on functions with (many) ineffective axes

Alternative Step-Size Adaptation Mechanisms Two-Point Step-Size Adaptation Median Success Rule, Population Success Rule

the effective adaptation of the overall population diversity seems yet to pose open questions, in particular with recombination or without entire control over the realized distribution^a

a Hansen et al. How to Assess Step-Size Adaptation Mechanisms in Randomised Search. PPSN 2014

Step-Size Control Summary
Step-Size Control: Summary
On the 20-D TwoAxes function f(x) = Σ_{i=1}^{N/2} [Rx]_i² + a² Σ_{i=N/2+1}^{N} [Rx]_i², R orthogonal:
the convergence speed of the CSA-ES becomes lower as the function becomes ill-conditioned (a² becomes greater) ⇒ covariance matrix adaptation required

Covariance Matrix Adaptation (CMA) Evolution Strategies
Recalling

New search points are sampled normally distributed

x_i ~ m + σ N_i(0, C) for i = 1, ..., λ
as perturbations of m, where x_i, m ∈ ℝⁿ, σ ∈ ℝ₊, C ∈ ℝⁿˣⁿ
where
the mean vector m ∈ ℝⁿ represents the favorite solution
the so-called step-size σ ∈ ℝ₊ controls the step length
the covariance matrix C ∈ ℝⁿˣⁿ determines the shape of the distribution ellipsoid

The remaining question is how to update C.

Covariance Matrix Adaptation (CMA) Covariance Matrix Rank-One Update
Covariance Matrix Adaptation: Rank-One Update
m ← m + σ y_w, y_w = Σ_{i=1}^{µ} w_i y_{i:λ}, y_i ~ N_i(0, C)

[figure sequence: (1) initial distribution, C = I; (2) y_w, movement of the population mean m (disregarding σ); (3) mixture of distribution C and step y_w, C ← 0.8 × C + 0.2 × y_w y_wᵀ; (4) new distribution (disregarding σ)]

new distribution, C ← 0.8 × C + 0.2 × y_w y_wᵀ
the ruling principle: the adaptation increases the likelihood of successful steps, y_w, to appear again
another viewpoint: the adaptation follows a natural gradient approximation of the expected fitness

Covariance Matrix Adaptation (CMA) Covariance Matrix Rank-One Update
Covariance Matrix Adaptation: Rank-One Update
Initialize m ∈ ℝⁿ, and C = I, set σ = 1, learning rate c_cov ≈ 2/n²
While not terminate

x_i = m + σ y_i, y_i ~ N_i(0, C)
m ← m + σ y_w where y_w = Σ_{i=1}^{µ} w_i y_{i:λ}
C ← (1 − c_cov) C + c_cov µ_w y_w y_wᵀ   (rank-one), where µ_w = 1 / Σ_{i=1}^{µ} w_i² ≥ 1

The rank-one update has been found independently in several domains^{6,7,8,9}

6 Kjellström & Taxén 1981. Stochastic Optimization in System Design, IEEE TCS
7 Hansen & Ostermeier 1996. Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation, ICEC
8 Ljung 1999. System Identification: Theory for the User
9 Haario et al 2001. An adaptive Metropolis algorithm, JSTOR
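A minimal sketch (not the authors' code) of the rank-one covariance matrix update C ← (1 − c_cov) C + c_cov µ_w y_w y_wᵀ; all values below are illustrative.

import numpy as np

def rank_one_update(C, y_w, mu_w, c_cov):
    """Increase the variance of C along the selected mean step y_w."""
    return (1 - c_cov) * C + c_cov * mu_w * np.outer(y_w, y_w)

n = 5
C = np.eye(n)
y_w = np.array([1.0, 0.5, 0.0, 0.0, 0.0])   # weighted mean of the selected steps
C = rank_one_update(C, y_w, mu_w=3.0, c_cov=2.0 / n**2)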

C ← (1 − c_cov) C + c_cov µ_w y_w y_wᵀ

The covariance matrix adaptation
learns all pairwise dependencies between variables
  off-diagonal entries in the covariance matrix reflect the dependencies
conducts a principal component analysis (PCA) of steps y_w, sequentially in time and space
  eigenvectors of the covariance matrix C are the principal components / the principal axes of the mutation ellipsoid
learns a new rotated problem representation
  components are independent (only) in the new representation
learns a new (Mahalanobis) metric
  variable metric method
  approximates the inverse Hessian on quadratic functions, i.e. a transformation into the sphere function
for µ = 1: conducts a natural gradient ascent on the distribution N
entirely independent of the given coordinate system

...cumulation, rank-µ 55 Anne Auger & Nikolaus Hansen CMA-ES July, 2014 47 / 81 Covariance Matrix Adaptation (CMA) Covariance Matrix Rank-One Update Invariance Under Rigid Search Space Transformation

[figure: f-level sets in dimension 2 for f = h_Rast (separable Rastrigin) and f = h]

for example, invariance under search space rotation (separable → non-separable) ...

Covariance Matrix Adaptation (CMA) Covariance Matrix Rank-One Update
Invariance Under Rigid Search Space Transformation
[figure: f-level sets in dimension 2 for f = h_Rast ∘ R and f = h ∘ R (rotated)]

for example, invariance under search space rotation (separable → non-separable) ...

Covariance Matrix Adaptation (CMA) Cumulation: the Evolution Path
Cumulation: The Evolution Path
Evolution Path
Conceptually, the evolution path is the search path the strategy takes over a number of generation steps. It can be expressed as a sum of consecutive steps of the mean m.

An exponentially weighted sum of steps y_w is used,
p_c ∝ Σ_{i=0}^{g} (1 − c_c)^{g−i} y_w^(i)   (exponentially fading weights)

The recursive construction of the evolution path (cumulation):
p_c ← (1 − c_c) p_c + √(1 − (1 − c_c)²) √µ_w y_w
  (decay factor; normalization factor; the input is y_w = (m − m_old)/σ)

where µ_w = 1 / Σ w_i², c_c ≪ 1. History information is accumulated in the evolution path.

"Cumulation" is a widely used technique and is also known as
  exponential smoothing in time series and forecasting
  exponentially weighted moving average
  iterate averaging
  momentum in the back-propagation algorithm for ANNs
  ...
"Cumulation" conducts a low-pass filtering, but there is more to it...

...why?

Covariance Matrix Adaptation (CMA) Cumulation: the Evolution Path
Cumulation
Utilizing the Evolution Path
We used y_w y_wᵀ for updating C. Because y_w y_wᵀ = (−y_w)(−y_w)ᵀ, the sign of y_w is lost.

The sign information (signifying correlation between steps) is (re-)introduced by using the evolution path.

p_c ← (1 − c_c) p_c + √(1 − (1 − c_c)²) √µ_w y_w   (decay factor; normalization factor)
C ← (1 − c_cov) C + c_cov p_c p_cᵀ   (rank-one)

where µ_w = 1 / Σ w_i², c_cov ≪ c_c ≪ 1 such that 1/c_c is the "backward time horizon".
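A minimal sketch (not the authors' code) of the path-based rank-one update above: the evolution path p_c accumulates correlated steps, and the covariance matrix is updated along p_c. Parameter values are illustrative.

import numpy as np

def cumulation_rank_one(C, p_c, y_w, mu_w, c_c, c_cov):
    """One iteration of path cumulation and the path-based rank-one update."""
    p_c = (1 - c_c) * p_c + np.sqrt(1 - (1 - c_c)**2) * np.sqrt(mu_w) * y_w
    C = (1 - c_cov) * C + c_cov * np.outer(p_c, p_c)
    return C, p_c

n = 10
C, p_c = np.eye(n), np.zeros(n)
for _ in range(5):                        # repeated, correlated steps along e_1
    y_w = np.array([1.0] + [0.0] * (n - 1))
    C, p_c = cumulation_rank_one(C, p_c, y_w, mu_w=3.0,
                                 c_c=4.0 / n, c_cov=2.0 / n**2)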


Covariance Matrix Adaptation (CMA) Cumulation: the Evolution Path
Using an evolution path for the rank-one update of the covariance matrix reduces the number of function evaluations to adapt to a straight ridge from about O(n²) to O(n).^a
a Hansen & Auger 2013. Principled design of continuous stochastic search: From theory to practice.

[figure: number of f-evaluations divided by dimension on the cigar function f(x) = x_1² + 10⁶ Σ_{i=2}^{n} x_i², plotted over the dimension, for c_c = 1 (no cumulation), c_c = 1/√n, and c_c = 1/n]

The overall model has n² parameters, but important parts of the model can be learned in time of order n.

Covariance Matrix Adaptation (CMA) Covariance Matrix Rank-µ Update
Rank-µ Update
x_i = m + σ y_i, y_i ~ N_i(0, C),
m ← m + σ y_w, y_w = Σ_{i=1}^{µ} w_i y_{i:λ}

The rank-µ update extends the update rule for large population sizes, using µ > 1 vectors to update C at each generation step.
The weighted empirical covariance matrix
C_µ = Σ_{i=1}^{µ} w_i y_{i:λ} y_{i:λ}ᵀ
computes a weighted mean of the outer products of the best µ steps and has rank min(µ, n) with probability one. With µ = λ the weights can be negative.^{10}
The rank-µ update then reads
C ← (1 − c_cov) C + c_cov C_µ
where c_cov ≈ µ_w/n² and c_cov ≤ 1.
10 Jastrebski and Arnold (2006). Improving evolution strategies through active covariance matrix adaptation. CEC.
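A minimal sketch (not the authors' code) of the rank-µ update C ← (1 − c_cov) C + c_cov Σ_i w_i y_i y_iᵀ; the arrays below are illustrative.

import numpy as np

def rank_mu_update(C, y_selected, w, c_cov):
    """y_selected: (mu, n) array of the best mu steps, w: (mu,) weights."""
    C_mu = np.einsum('i,ij,ik->jk', w, y_selected, y_selected)  # sum of weighted outer products
    return (1 - c_cov) * C + c_cov * C_mu

n, mu = 5, 3
rng = np.random.default_rng(0)
C = rank_mu_update(np.eye(n), rng.standard_normal((mu, n)),
                   w=np.array([0.5, 0.3, 0.2]), c_cov=0.1)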

[figure: sampling of λ = 150 solutions with C = I and σ = 1; calculating C_µ with µ = 50, w_1 = ... = w_µ = 1/µ, and c_cov = 1; new distribution]
x_i = m + σ y_i, y_i ~ N(0, C)
C_µ = (1/µ) Σ y_{i:λ} y_{i:λ}ᵀ
m_new ← m + (1/µ) Σ σ y_{i:λ}
C ← (1 − 1) × C + 1 × C_µ

66 Anne Auger & Nikolaus Hansen CMA-ES July, 2014 54 / 81 Covariance Matrix Adaptation (CMA) Covariance Matrix Rank-µ Update

Rank-µ CMA versus Estimation of Multivariate Normal Algorithm EMNA_global^{11}

rank-µ CMA conducts a PCA of steps
x_i = m_old + σ y_i, y_i ~ N(0, C)
C ← (1/µ) Σ (x_{i:λ} − m_old)(x_{i:λ} − m_old)ᵀ
m_new = m_old + (1/µ) Σ σ y_{i:λ}

EMNA_global conducts a PCA of points
x_i = m_old + σ y_i, y_i ~ N(0, C)
C ← (1/µ) Σ (x_{i:λ} − m_new)(x_{i:λ} − m_new)ᵀ
m_new = m_old + (1/µ) Σ σ y_{i:λ}

[figure: sampling of λ = 150 solutions (dots); calculating C from µ = 50 solutions; new distribution]

m_new is the minimizer for the variances when calculating C

11 Hansen, N. (2006). The CMA Evolution Strategy: A Comparing Review. In J.A. Lozano, P. Larrañaga, I. Inza and E. Bengoetxea (Eds.). Towards a new evolutionary computation. Advances in estimation of distribution algorithms. pp. 75-102

Covariance Matrix Adaptation (CMA) Covariance Matrix Rank-µ Update

The rank-µ update
  increases the possible learning rate in large populations, roughly from 2/n² to µ_w/n²
  can reduce the number of necessary generations roughly from O(n²) to O(n)^{12}, given µ_w ∝ λ ∝ n
Therefore the rank-µ update is the primary mechanism whenever a large population size is used, say λ ≥ 3n + 10.
The rank-one update
  uses the evolution path and reduces the number of necessary function evaluations to learn straight ridges from O(n²) to O(n).
Rank-one update and rank-µ update can be combined.

12 Hansen, Müller, and Koumoutsakos 2003. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1), pp. 1-18

[figures: best f-value, mean coordinates, step-size, and principal axis lengths over function evaluations for the rank-one update, the rank-µ update, and the hybrid (combined) update on f_TwoAxes(x) = Σ_{i=1}^{5} x_i² + 10⁶ Σ_{i=6}^{N} x_i², with λ = 10 (default for N = 10) and with λ = 50]

Covariance Matrix Adaptation (CMA) Active CMA

Different Types of Ill-Conditioning (α: Axes Ratio = 10)

Cigar Type: 1 long axis, n−1 short axes: f(x) = x_1² + α · Σ_{i=2}^{n} x_i²
Discus Type: 1 short axis, n−1 long axes: f(x) = α · x_1² + Σ_{i=2}^{n} x_i²

73 Covariance Matrix Adaptation (CMA) Active CMA Active Update utilize negative weights [Jastrebski and Arnold, 2006]

Active Update (rewriting)

C ← C + c_1 p_c p_cᵀ + c_µ Σ_{i=1}^{⌊λ/2⌋} w_i y_{i:λ} y_{i:λ}ᵀ − c_µ Σ_{i=⌊λ/2⌋+1}^{λ} |w_i| y_{i:λ} y_{i:λ}ᵀ
(increasing the variances in promising directions; decreasing the variances in unpromising directions)

increases the variance in the directions of p_c and of promising steps y_{i:λ} (i ≤ ⌊λ/2⌋)
decreases the variance in the directions of unpromising steps y_{i:λ} (i ≥ ⌊λ/2⌋ + 1)

keeps the variance in the subspace orthogonal to the above

[Jastrebski and Arnold, 2006] Jastrebski, G. and Arnold, D. V. (2006). Improving Evolution Strategies through Active Covariance Matrix Adaptation. In 2006 IEEE Congress on Evolutionary Computation, pages 9719–9726. 74 Covariance Matrix Adaptation (CMA) Active CMA On 10D Discus Function

10-D Discus Function (axis ratio: α = 10³)

f(x) = α² · x_1² + Σ_{i=2}^{n} x_i²

[figure: f(x) and eig(C) over function evaluations for the positive update (left) and the positive & negative (active) update (right)]

Positive: wait for the smallest eig(C) decreasing Active: decrease the smallest eig(C) actively

75 Covariance Matrix Adaptation (CMA) Active CMA Summary Active Covariance Matrix Adaptation + Cumulation

C ← (1 − c_1 − c_µ + c_µ⁻) C + c_1 p_c p_cᵀ + c_µ Σ_{i=1}^{⌊λ/2⌋} w_i y_{i:λ} y_{i:λ}ᵀ − c_µ⁻ Σ_{i=⌊λ/2⌋+1}^{λ} |w_i| y_{i:λ} y_{i:λ}ᵀ

w_i < 0 (for i ≥ ⌊λ/2⌋ + 1): negative weight assigned to y_{i:λ}, with Σ_i |w_i| = 1.
c_µ⁻ > 0: learning rate for the active update
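A minimal sketch (not the authors' code) of the active update above: positive weights for the best half of the steps, negative weights for the worst half. Names and weight vectors are illustrative; the caller is assumed to pass weight vectors of matching length.

import numpy as np

def active_update(C, p_c, y_sorted, w_pos, w_neg, c_1, c_mu, c_mu_neg):
    """y_sorted: (lambda, n) steps sorted by fitness, best first."""
    half = len(y_sorted) // 2
    best, worst = y_sorted[:half], y_sorted[-half:]
    C_pos = np.einsum('i,ij,ik->jk', w_pos, best, best)    # promising directions
    C_neg = np.einsum('i,ij,ik->jk', w_neg, worst, worst)  # unpromising directions
    return ((1 - c_1 - c_mu + c_mu_neg) * C
            + c_1 * np.outer(p_c, p_c)
            + c_mu * C_pos
            - c_mu_neg * C_neg)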

These components complement each other: cumulation excels at learning a long axis, but is inefficient for a large λ; the rank-µ update is efficient for a large λ; the active update is effective for learning short axes.

An important yet solvable issue of active update

The positive definiteness of C will be violated if c_µ⁻ is not small enough.

The positive definiteness can be guaranteed w.p.1 by controlling c_µ⁻ w_i.

CMA-ES Summary

Summary of Equations: The Covariance Matrix Adaptation Evolution Strategy

Input: m ∈ ℝⁿ; σ ∈ ℝ₊; λ ∈ ℕ, λ ≥ 2, usually λ ≥ 5, default λ = 4 + ⌊3 ln n⌋
Initialize: C = I, p_c = 0, p_σ = 0
Set: c_c ≈ 4/n, c_σ ≈ 4/n, c_1 ≈ 2/n², c_µ ≈ µ_w/n², c_1 + c_µ ≤ 1, d_σ ≈ 1 + √(µ_w/n), and w_{i=1...λ} such that µ_w = 1 / Σ_{i=1}^{µ} w_i² ≈ 0.3 λ

While not terminate
  x_i = m + σ y_i, y_i ~ N_i(0, C), for i = 1, ..., λ   (sampling)
  m ← Σ_{i=1}^{µ} w_i x_{i:λ} = m + σ y_w, where y_w = Σ_{i=1}^{µ} w_i y_{i:λ}   (update mean)
  p_c ← (1 − c_c) p_c + 1_{‖p_σ‖ < 1.5√n} √(1 − (1 − c_c)²) √µ_w y_w   (cumulation for C)
  p_σ ← (1 − c_σ) p_σ + √(1 − (1 − c_σ)²) √µ_w C^{−1/2} y_w   (path for σ)
  C ← (1 − c_1 − c_µ) C + c_1 p_c p_cᵀ + c_µ Σ_{i=1}^{µ} w_i y_{i:λ} y_{i:λ}ᵀ   (update C)
  σ ← σ × exp( (c_σ/d_σ) (‖p_σ‖ / E‖N(0, I)‖ − 1) )   (update of σ)

Not covered: termination, restarts, useful output, search boundaries and encoding; corrections for positive definiteness guaranty, p_c variance loss, c_σ and d_σ for large λ

Topics 1. What makes the problem difficult to solve?

2. How does the CMA-ES work?

• Normal Distribution, Rank-Based Recombination • Step-Size Adaptation • Covariance Matrix Adaptation

3. What can/should the users do for the CMA-ES to work effectively on their problem?

• Choice of problem formulation and encoding (not covered) • Choice of initial solution and initial step-size • Restarts, Increasing Population Size • Restricted Covariance Matrix 78 What can/should the users do? Strategy Parameters and Initialization Default Parameter Values CMA-ES + (B)IPOP Restart Strategy = Quasi-Parameter Free Optimizer

The following parameters were identified in carefully chosen experimental set ups.

related to selection and recombination
  λ: offspring number, new solutions sampled, population size
  µ: parent number, solutions involved in mean update
  w_i: recombination weights
related to C-update
  c_c: decay rate for the evolution path, cumulation factor
  c_1: learning rate for the rank-one update of C
  c_µ: learning rate for the rank-µ update of C
related to σ-update
  c_σ: decay rate of the evolution path
  d_σ: damping for the σ-change

The default values depend only on the dimension. They do in the first place not depend on the objective function.

79 What can/should the users do? Strategy Parameters and Initialization Parameters to be set depending on the problem Initialization and termination conditions

The following should be set or implemented depending on the problem.

related to the initial search distribution
  m^(0): initial mean vector
  σ^(0) (or σ^(0) √(C_{i,i}^(0))): initial (coordinate-wise) standard deviation
related to stopping conditions
  max. func. evals., max. iterations, function value tolerance, min. axis length, stagnation

Practical Hints: start with an initial guess m^(0) and a relatively small step-size σ^(0) to locally improve the current guess; then increase the step-size, e.g., by a factor of 10, to globally search for a better

solution.

What can/should the users do? Strategy Parameters and Initialization
Python CMA-ES Implementation https://github.com/CMA-ES/pycma

81 What can/should the users do? Strategy Parameters and Initialization Python CMA-ES Demo https://github.com/CMA-ES/pycma Optimizing 10D Rosenbrock Function
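A plausible one-liner version of such a demo (the starting point and step-size here are assumptions, not necessarily the values used in the live demo):

```python
import cma

# Optimize the 10-D Rosenbrock function from (0.1, ..., 0.1)
# with initial step-size 0.3.
res = cma.fmin(cma.ff.rosen, 10 * [0.1], 0.3)
print('best f-value:', res[1])
# cma.plot() displays the logged evolution of f, sigma, axis ratio, etc.
```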

From a practical perspective: given an unknown optimisation problem, the first thing I tend to do is try to improve a given (initial) solution using a small initial sigma. Then I (can) increase sigma successively (by a factor of 10 or more, depending on what I have seen in the initial evolution of sigma previously) and see whether I find the same or better (or different) solutions.
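Rendered as a sketch in pycma, this practice could look as follows; the factor 10, the number of trials, and the test objective are placeholders, not part of the original advice:

```python
import cma

def f(x):                     # stand-in for the unknown objective
    return cma.ff.rosen(x)

x0 = 10 * [0.1]               # the given initial guess
sigma0, best = 0.01, None     # small sigma first: local improvement of x0
for trial in range(4):
    res = cma.fmin(f, x0, sigma0, {'verbose': -9})  # quiet run
    if best is None or res[1] < best[1]:
        best = res
    sigma0 *= 10              # then search increasingly globally
print('best f-value found:', best[1])
```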

82 What can/should the users do? Strategy Parameters and Initialization Python CMA-ES Demo https://github.com/CMA-ES/pycma Optimizing 10D Rosenbrock Function

83 What can/should the users do? Multimodality Multimodality Two approaches for multimodal functions: Try again with • a larger population size • a smaller initial step-size (and random initial mean vector)

84 What can/should the users do? Multimodality Multimodality Approaches for multimodal functions: Try again with • the final solution as initial solution (non-elitist) and a small step-size • a larger population size • a different initial mean vector (and a smaller initial step-size) A restart with a large population size helps if the objective function has a strong global structure • functions such as Schaffer, Rastrigin, BBOB functions 15–19 • loosely: a unimodal global structure + deterministic noise

85 What can/should the users do? Multimodality Multimodality Hansen and Kern. Evaluating the CMA Evolution Strategy on Multimodal Test Functions, PPSN 2004.

(Figures: Rastrigin function and Griewank function)

86 What can/should the users do? Multimodality Multimodality Approaches for multimodal functions: Try again with • the final solution as initial solution (non-elitist) and a small step-size • a larger population size • a different initial mean vector (and a smaller initial step-size) A restart with a small initial step-size helps if the objective function has a weak global structure • functions such as Schwefel, Bi-Sphere, BBOB functions 20–24

• a large population size has a negative effect 87 What can/should the users do? Restart Strategy Restart Strategy A restart strategy makes the CMA-ES parameter free

IPOP: restart with increasing population size • start with the default population size • double the population size after each trial (parameter sweep) • may be considered the gold standard for automated restarts

BIPOP: IPOP regime + local search regime • IPOP regime: restart with increasing population size • Local search regime: restart with a smaller step-size and a smaller population size than in the IPOP regime
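In pycma, both regimes are available through cma.fmin's restart arguments, as far as I recall its interface; a hedged sketch (cma.ff.rastrigin is assumed to be part of pycma's test-function collection):

```python
import cma

# IPOP-style automated restarts: `restarts` re-runs the optimization,
# multiplying the population size by `incpopsize` after each run.
res_ipop = cma.fmin(cma.ff.rastrigin, 10 * [3], 2,
                    restarts=6, incpopsize=2)

# BIPOP: additionally interleaves a local-search regime with a smaller
# step-size and a smaller population size.
res_bipop = cma.fmin(cma.ff.rastrigin, 10 * [3], 2,
                     restarts=6, bipop=True)
```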

88 Topics 1. What makes the problem difficult to solve?

2. How does the CMA-ES work?

• Normal Distribution, Rank-Based Recombination • Step-Size Adaptation • Covariance Matrix Adaptation

3. What can/should the users do for the CMA-ES to work effectively on their problem?

• Choice of problem formulation and encoding (not covered) • Choice of initial solution and initial step-size • Restarts, Increasing Population Size • Restricted Covariance Matrix 89 What can/should the users do? Restricted Covariance Matrix Motivation of the Restricted Covariance Matrix

Bottlenecks of the CMA-ES on high-dimensional problems
1. O(N²) time and space
• to store and update C ∈ R^(N×N)
• to compute the eigendecomposition of C
2. O(1/N²) learning rates for the C-update
• c_µ ≈ µ_w / N²
• c_1 ≈ 2 / N²

Exploit prior knowledge on the problem structure such as separability

⇛ decrease the degrees of freedom of the covariance matrix for • smaller time and space complexity • higher learning rates that potentially accelerate the adaptation

90 What can/should the users do? Restricted Covariance Matrix Variants with Restricted Covariance Matrix

CMA-ES Variants with Restricted Covariance Matrices
• Sep-CMA [Ros and Hansen, 2008]: C = D, D diagonal
• VD-CMA [Akimoto et al., 2014]: C = D(I + v v^T)D, D diagonal, v ∈ R^N
• LM-CMA [Loshchilov, 2014]: C = I + Σ_{i=1}^{k} v_i v_i^T, v_i ∈ R^N
• VkD-CMA [Akimoto and Hansen, 2016]: C = D(I + Σ_{i=1}^{k} v_i v_i^T)D, v_i ∈ R^N
(a small numerical sketch of these factored forms follows the references below)

[Ros and Hansen, 2008] Ros, R. and Hansen, N. (2008). A simple modification in CMA-ES achieving linear time and space complexity. In Parallel Problem Solving from Nature - PPSN X, pages 296–305. Springer.
[Akimoto et al., 2014] Akimoto, Y., Auger, A., and Hansen, N. (2014). Comparison-based natural gradient optimization in high dimension. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 373–380, Vancouver, BC, Canada.
[Loshchilov, 2014] Loshchilov, I. (2014). A computationally efficient limited memory CMA-ES for large scale optimization. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 397–404.
[Akimoto and Hansen, 2016] Akimoto, Y. and Hansen, N. (2016). Projection-based restricted covariance matrix adaptation for high dimension. In Genetic and Evolutionary Computation Conference, GECCO 2016, Denver, Colorado, USA, July 20–24, 2016. ACM.
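To illustrate why these factored forms are attractive, here is a small numpy sketch (not taken from any of the cited papers) that multiplies a vector by C = D(I + Σ v_i v_i^T)D without ever forming the N × N matrix, i.e. in O(Nk) memory and time:

```python
import numpy as np

N, k = 1000, 2
rng = np.random.default_rng(0)
d = rng.uniform(0.5, 2.0, N)                 # diagonal of D
V = rng.standard_normal((k, N)) / N**0.5     # the k direction vectors v_i (rows)

def mult_C(x):
    """y = C x with C = D (I + sum_i v_i v_i^T) D, kept in factored form."""
    y = d * x                    # D x
    y = y + V.T @ (V @ y)        # (I + sum_i v_i v_i^T) D x
    return d * y                 # D (I + ...) D x

x = rng.standard_normal(N)
# Reference check: the explicit C (O(N^2) memory) agrees with the factored product.
C = np.diag(d) @ (np.eye(N) + V.T @ V) @ np.diag(d)
assert np.allclose(C @ x, mult_C(x))
```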

91 What can/should the users do? Restricted Covariance Matrix Separable CMA (Sep-CMA)

CMA:     C^(t+1) = C^(t) + c_1 ( p_c^(t) (p_c^(t))^T - C^(t) ) + c_µ Σ_{i=1}^{µ} w_i ( (x_i - m^(t))(x_i - m^(t))^T - C^(t) )

Sep-CMA: [C^(t+1)]_{k,k} = [C^(t)]_{k,k} + c_1 ( [p_c^(t)]_k² - [C^(t)]_{k,k} ) + c_µ Σ_{i=1}^{µ} w_i ( [x_i - m^(t)]_k² - [C^(t)]_{k,k} )

with a learning rate chosen (N + 2)/3 times greater than in CMA
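A direct numpy transcription of the coordinate-wise (Sep-CMA) update above; the weights, learning rates, and evolution path are assumed to be given, and the σ-normalization is omitted as in the displayed formula:

```python
import numpy as np

def sep_cma_C_update(C_diag, p_c, X, m, weights, c1, cmu):
    """Diagonal (Sep-CMA) covariance update: only the N diagonal entries
    [C]_{k,k} are adapted, in O(mu * N) time.
    C_diag: (N,)  current diagonal of C
    p_c:    (N,)  evolution path
    X:      (mu, N) the mu selected solutions x_1, ..., x_mu
    m:      (N,)  current mean m^(t)
    """
    rank_one = p_c**2                                        # [p_c]_k^2
    rank_mu = np.sum(weights[:, None] * (X - m)**2, axis=0)  # sum_i w_i [x_i - m]_k^2
    return C_diag + c1 * (rank_one - C_diag) + cmu * (rank_mu - C_diag)
```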

92 What can/should the users do? Restricted Covariance Matrix Demo: On 100D Separable Ellipsoid Function

(Plots: Separable-CMA vs. CMA on the 100-D separable ellipsoid)

• CMA needed 10 times more function evaluations and more CPU time • However, Sep-CMA will not be able to solve the rotated ellipsoid function as efficiently as it solves the separable ellipsoid
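In pycma, a diagonal-only run similar to Sep-CMA can be requested via the 'CMA_diagonal' option (as I understand it: an integer limits the diagonal phase to that many iterations, True keeps the matrix diagonal throughout); a sketch:

```python
import cma

# 100-D separable ellipsoid with a diagonal covariance matrix throughout.
es = cma.CMAEvolutionStrategy(100 * [1], 1, {'CMA_diagonal': True,
                                             'ftarget': 1e-8})
es.optimize(cma.ff.elli)   # cma.ff.elli: axis-parallel (separable) ellipsoid
print(es.result.evaluations, es.result.fbest)
# Dropping 'CMA_diagonal' runs the full-matrix CMA for comparison.
```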

93 Summary and Final Remarks

Summary and Final Remarks

94 Summary and Final Remarks Main Characteristics of (CMA) Evolution Strategies

1 Multivariate normal distribution to generate new search points follows the maximum entropy principle

2 Rank-based selection implies invariance: same performance on g(f(x)) for any increasing g; more invariance properties are featured

3 Step-size control facilitates fast (log-linear) convergence and possibly linear scaling with the dimension; in CMA-ES it is based on an evolution path (a non-local trajectory)

4 Covariance matrix adaptation (CMA) increases the likelihood of previously successful steps and can improve performance by orders of magnitude; the update follows the natural gradient; C ∝ H⁻¹ adapts a variable metric ⟺ a new (rotated) problem representation ⟹ f: x ↦ g(x^T H x) reduces to x ↦ x^T x

95 Summary and Final Remarks Limitations of CMA Evolution Strategies

internal CPU-time: roughly 10⁻⁸ n² seconds per function evaluation on a 2 GHz PC (1 000 000 f-evaluations in 100-D take 100 seconds of internal CPU-time); tweaks are available, e.g., variants with a restricted covariance matrix such as Sep-CMA. Better methods are presumably available in case of

• partly separable problems
• specific problems, for example with cheap gradients → specific methods
• small dimension (n ≪ 10) → for example Nelder-Mead
• small running times (number of f-evaluations < 100n) → model-based methods

96 Thank you

Source code for CMA-ES in C, C++, Java, Matlab, Octave, Python, R, and Scilab, as well as practical hints for problem formulation, variable encoding, and parameter setting, are available (or linked to) at http://cma.gforge.inria.fr/cmaes_sourcecode_page.html

97 Comparing Experiments Comparison during BBOB at GECCO 2010 24 functions and 20+ algorithms in 20-D

98