A CMA-ES with Multiplicative Updates

Oswin Krause                              Tobias Glasmachers
Department of Computer Science            Institut für Neuroinformatik
University of Copenhagen                  Ruhr-Universität Bochum
Copenhagen, Denmark                       Bochum, Germany
[email protected]                         [email protected]

ABSTRACT

Covariance matrix adaptation (CMA) mechanisms are core building blocks of modern evolution strategies. Despite sharing a common principle, the exact implementation of CMA varies considerably between different algorithms. In this paper, we investigate the benefits of an exponential parametrization of the covariance matrix in the CMA-ES. This technique was first proposed for the xNES algorithm. It results in a multiplicative update formula for the covariance matrix. We show that the exponential parameterization and the multiplicative update are compatible with all mechanisms of CMA-ES. The resulting algorithm, xCMA-ES, performs at least on par with plain CMA-ES. Its advantages show in particular with updates that actively decrease the sampling variance in specific directions, i.e., for active constraint handling.

Categories and Subject Descriptors

[Continuous Optimization]

General Terms

Algorithms

Keywords

evolution strategies, covariance matrix adaptation, CMA-ES, multiplicative update, exponential coordinates

1. INTRODUCTION

Evolution Strategies (ES) are randomized direct search algorithms suitable for solving black box problems in the continuous domain, i.e., minimization problems f : R^d → R defined on a d-dimensional real vector space. Most of these algorithms generate a number of normally distributed offspring in each generation. The efficiency of this scheme, at least for unimodal problems, crucially depends on online adaptation of the parameters of the Gaussian search distribution N(m, σ²C), namely the global step size σ and the covariance matrix C. Adaptation of the step size enables linear convergence on scale invariant functions [4], while covariance matrix adaptation (CMA) [10] renders the asymptotic convergence rate independent of the condition number of the Hessian in the optimum of a twice continuously differentiable function.

The most prominent algorithm implementing the above principles is CMA-ES [10, 8, 12]. Nowadays there exists a plethora of variants and extensions of the basic algorithm. A generic principle for the online update of parameters (including the search distribution) is to maximize the expected progress. This goal can be approximated by adapting the search distribution so that the probability of the perturbations that generated successful offspring in the past is increased. This is likely to foster the generation of better points in the near future.¹

¹ This statement holds only under (mild) assumptions on the regularity of the fitness landscape, which remain implicit, but may be violated, e.g., in the presence of constraints.

The application of the above principle to CMA means to change the covariance matrix towards the maximum likelihood estimator of successful steps. To this end let N(m, C) denote the search distribution, and let x_1, ..., x_µ denote successful offspring. The maximum likelihood estimator of the covariance matrix generating the step δ_i = x_i − m is the rank-one matrix δ_i δ_i^T. All CMA updates of CMA-ES are of the generic form C ← (1 − c) · C + c · δδ^T, which is a realization of this technique keeping an exponentially fading record of previous successful steps δ. CMA-ES variants differ in which step vectors enter the covariance matrix update. Early variants were based on cumulation of directions in an evolution path vector p_c and a single rank-one update of C per generation [10]. Later versions added a rank-µ update based on immediate information from the current population [8].
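To make the generic update concrete, the following minimal NumPy sketch (an illustration added here, not code from the paper; the learning rate c = 0.1 and the example steps are arbitrary) maintains such an exponentially fading record of successful steps:

```python
import numpy as np

def fading_rank_one_update(C, delta, c=0.1):
    """Generic CMA step: blend the old covariance with the rank-one
    maximum likelihood estimator of the successful step delta."""
    return (1.0 - c) * C + c * np.outer(delta, delta)

# exponentially fading record over a sequence of successful steps
C = np.eye(3)
for delta in [np.array([1.0, 0.0, 0.0]), np.array([0.5, 0.5, 0.0])]:
    C = fading_rank_one_update(C, delta)
```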
A different perspective on CMA techniques is provided within the framework of information geometric optimization (IGO) [14], in particular by the natural evolution strategy (NES) approach [17, 16]. It turns out that the rank-µ update equation can be derived from stochastic natural gradient descent on a stochastically relaxed problem over the statistical manifold of search distributions. This more general perspective opens up new possibilities for CMA mechanisms, e.g., reparameterizing the covariance matrix in exponential form as done in the xNES algorithm [7]. This results in an update equation with the following properties: a) the update is multiplicative, in contrast to the standard additive update, b) it is possible to leave the variance in directions orthogonal to all observed steps unchanged, and c) even when performing "active" (negative) updates the covariance matrix is guaranteed to remain positive definite.

In this paper we incorporate the exponential parameterization of the covariance matrix into CMA-ES. We derive all mechanisms found in the standard CMA-ES algorithm in this framework, demonstrating the compatibility of cumulative step size adaptation and evolution paths (two features missing in xNES) with exponential coordinates. The new algorithm is called xCMA-ES. Its performance on standard (unimodal) benchmarks coincides with that of CMA-ES; in addition, however, it benefits from neat properties of the exponential parameters, which show up prominently when performing active CMA updates with negative weights, e.g., for constraint handling.

In the next section we recap CMA-ES and xNES. Based thereon we present our new xCMA-ES algorithm. In section 3 its performance is evaluated empirically on standard benchmarks. We demonstrate the superiority for special tasks involving active CMA updates.

2. ALGORITHMS

In this section we provide the necessary background for our new algorithm before introducing the xCMA-ES. We cover the well-known CMA-ES algorithm as well as xNES, both with a focus on the components required for our own contribution. In the following algorithms, a few implementation details are not shown, e.g., the decomposition of C required for sampling as well as for the computation of the inverse matrix square root C^(−1/2).

2.1 CMA-ES

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [10, 12] is the most prominent evolution strategy in existence. It comes in many variants, e.g., with extensions for handling of fitness noise [9] and multi-modality [5]. Here we describe what can be considered a baseline version, featuring non-elitist (µ, λ) selection, cumulative step size control (CSA), and two different types of covariance matrix updates, namely a rank-one update based on an evolution path, and the so-called rank-µ update based on the survivors of environmental truncation selection.

The state of CMA-ES is given by the parameters m ∈ R^d, σ > 0, and C ∈ R^(d×d) of its multivariate normal search distribution N(m, σ²C), as well as by the two evolution paths p_s, p_c ∈ R^d. Pseudo-code of a basic CMA-ES is provided in Algorithm 1. The algorithm has a number of tuning constants, e.g., the sizes of parent and offspring population µ and λ, the various learning rates, and the rank-based weights w_1, ..., w_µ. For robust default settings of the different parameters we refer to [12].

Algorithm 1: CMA-ES
  Input: m, σ
  C ← I
  while stopping condition not met do
      // sample and evaluate offspring
      for i ∈ {1, ..., λ} do
          x_i ← N(m, σ²C)
      end
      sort {x_i} with respect to f(x_i)
      // internal update (paths and parameters)
      m' ← Σ_{i=1}^µ w_i x_i
      p_s ← (1 − c_s) · p_s + sqrt(c_s(2 − c_s)µ_eff) · C^(−1/2)(m' − m)/σ
      p_c ← (1 − c_c) · p_c + sqrt(c_c(2 − c_c)µ_eff) · (m' − m)/σ
      C ← (1 − c_1 − c_µ) · C + c_1 · p_c p_c^T + c_µ · Σ_{i=1}^µ w_i ((x_i − m)/σ)((x_i − m)/σ)^T
      σ ← σ · exp(c_s/D_σ · (‖p_s‖/χ_d − 1))
      m ← m'
  end

In each generation, CMA-ES executes the following steps (an illustrative code sketch of one generation is given after the list):

1. Sample offspring x_1, ..., x_λ ∼ N(m, σ²C). This step is realized by sampling standard normally distributed vectors z_1, ..., z_λ ∈ R^d, which are then transformed via x_i ← m + σA z_i, where A is a factor of the covariance matrix fulfilling AA^T = C. It can be computed via a Cholesky decomposition of C; however, the usual method is an eigen decomposition, since this operation is needed anyway later on.

2. Evaluate the offspring's fitness values f(x_1), ..., f(x_λ). This function call is often considered a black box, and it is assumed that its computational cost is substantial, usually exceeding the internal computational complexity of the algorithm.

3. Sort the offspring by fitness. Post-condition: f(x_1) ≤ f(x_2) ≤ ··· ≤ f(x_λ).

4. Perform environmental selection: keep {x_1, ..., x_µ}, discard {x_{µ+1}, ..., x_λ}, usually the worse half.

5. Update the evolution path for cumulative step size adaptation: the path p_s is an exponentially fading record of steps of the mean vector, back-transformed by multiplication with the inverse matrix square root C^(−1/2) into a coordinate system where the sampling distribution is a standard normal distribution. This path is supposed to follow a standard normal distribution.

6. Update the evolution path for covariance matrix adaptation: the path p_c is an exponentially fading record of steps of the mean vector divided (corrected) by the step size. This path models the movement direction of the distribution center over time.

7. Update the covariance matrix: a new matrix is obtained by additive blending of the old matrix with a rank-one matrix formed by the path p_c and a rank-µ matrix formed by the successful steps of the current population.

8. Update the step size: the step size is changed if the norm of p_s indicates a systematic deviation from the standard normal distribution. Note (for later reference) that this update has a multiplicative form, involving the exponential function.

9. Update the mean: discard the old mean and center the distribution on a weighted mean of the new parent population.
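The following NumPy sketch condenses steps 1 to 9 into a single generation. It is an illustration of Algorithm 1, not the authors' reference implementation; the offspring number follows the default of table 1, and the expected norm of a standard normal vector is approximated by the usual series expansion.

```python
import numpy as np

def cma_es_generation(f, m, sigma, C, p_s, p_c, w, c_s, c_c, c1, c_mu, d_sigma):
    """One generation of the baseline CMA-ES (steps 1-9), illustrative only.
    w are the mu positive weights (assumed to sum to one)."""
    d, mu = len(m), len(w)
    lam = 4 + int(3 * np.log(d))                       # default offspring number
    mu_eff = 1.0 / np.sum(w ** 2)                      # variance-effective selection mass
    chi_d = np.sqrt(d) * (1 - 1 / (4 * d) + 1 / (21 * d ** 2))  # approx. E||N(0,I)||
    # step 1: sample offspring x_i = m + sigma * A z_i with A A^T = C
    evals, evecs = np.linalg.eigh(C)
    A = evecs @ np.diag(np.sqrt(evals))
    Z = np.random.randn(lam, d)
    X = m + sigma * Z @ A.T
    # steps 2-4: evaluate, sort, keep the mu best offspring
    order = np.argsort([f(x) for x in X])
    X_sel = X[order[:mu]]
    m_new = w @ X_sel                                  # step 9: weighted mean of the parents
    # step 5: cumulative step size path in the isotropic coordinate system
    C_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    p_s = (1 - c_s) * p_s + np.sqrt(c_s * (2 - c_s) * mu_eff) * C_inv_sqrt @ (m_new - m) / sigma
    # step 6: covariance evolution path
    p_c = (1 - c_c) * p_c + np.sqrt(c_c * (2 - c_c) * mu_eff) * (m_new - m) / sigma
    # step 7: additive rank-one plus rank-mu covariance update
    Y = (X_sel - m) / sigma
    C = (1 - c1 - c_mu) * C + c1 * np.outer(p_c, p_c) + c_mu * (Y.T * w) @ Y
    # step 8: multiplicative step size update
    sigma *= np.exp(c_s / d_sigma * (np.linalg.norm(p_s) / chi_d - 1))
    return m_new, sigma, C, p_s, p_c
```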
2.2 xNES

The exponential natural evolution strategy (xNES) algorithm [7] is a prominent member of the family of natural evolution strategies (NES) [17, 16]. While exhibiting all properties of an evolution strategy, it is derived as a stochastic natural gradient descent method on the statistical manifold of search distributions, an approach that is best understood as an instance of information geometric optimization [14]. NES was found to have close relations to CMA-ES [1, 7].

The NES family of algorithms is derived as follows. Let {P_θ | θ ∈ Θ} denote a family of search distributions with parameters θ, where Θ is a differentiable manifold. The most prominent example is the family of multivariate Gaussian distributions P_θ = N(m, C) with parameters θ = (m, C) and density p_θ(x). The algorithm aims to solve the optimization problem

    min_θ F(θ) = E_{x∼P_θ}[f(x)],

which is lifted from points x to distributions P_θ. The gradient ∇_θ F(θ) = ∫ f(x) ∇_θ log(p_θ(x)) p_θ(x) dx (given that the integral converges in an open set around θ) is intractable in the black box model; however, it can be approximated by the Monte Carlo estimator

    G(θ) = (1/λ) Σ_{i=1}^λ f(x_i) ∇_θ log(p_θ(x_i))

with samples x_i ∼ P_θ. These samples correspond to the offspring population of an ES. The update θ ← θ − γ · G(θ) (with learning rate γ > 0) amounts to minimization of F with stochastic gradient descent (SGD).
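For the Gaussian family the required log-density gradients are available in closed form. As an illustration (a standard identity, not spelled out in the paper), the component of the estimator belonging to the mean reads

    ∇_m log p_(m,C)(x) = C^(−1)(x − m),    so    G_m(θ) = (1/λ) Σ_{i=1}^λ f(x_i) C^(−1)(x_i − m).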
This update is unsatisfactory in the context of NES since it depends on the chosen parameterization θ ↦ P_θ (see [14], and refer to [16, 6] for further shortcomings). In the case of optimizing P_θ, the question which direction to follow has a canonical answer. This is because {P_θ | θ ∈ Θ} is a statistical manifold (a manifold the points of which are distributions) with an intrinsic Riemannian information geometry induced by the KL-divergence [17, 16, 14]. The gradient w.r.t. the intrinsic geometry, pulled back to the parameter space Θ, is known as the natural gradient, usually denoted by ∇̃_θ F(θ). It is obtained from the plain gradient as

    ∇̃_θ F(θ) = I(θ)^(−1) · ∇_θ F(θ),

since the Fisher information matrix I(θ) is the metric tensor describing the intrinsic geometry. It is estimated in a straightforward manner by I(θ)^(−1) G(θ). The update

    θ ← θ − γ · I(θ)^(−1) G(θ)

is known as stochastic natural gradient descent (SNGD). Several different implementations of this general scheme have been developed with a focus on computational aspects [15]. It is common to replace "raw" fitness values f(x_i) with rank-based utility values u_i, which turn out to correspond exactly to the weights w_i of CMA-ES.

The xNES algorithm supersedes earlier developments with two novel techniques. The first is to perform the NES update in a local parameterization θ ↦ P_θ for which the Fisher matrix is the identity matrix, which saves its computation or estimation and in particular its (numerical) inversion. The second is a parameterization of the positive definite covariance matrix involving the matrix exponential, which allows for an unconstrained representation of the covariance matrix.

Covariance matrices are symmetric and positive definite d × d matrices. Symmetric matrices form the d(d+1)/2 dimensional vector space S_d. The requirement of positive definiteness adds a non-linear constraint. Let P_d denote the open sub-manifold of positive definite symmetric matrices. Then the parameter space takes the form (m, C) = θ ∈ Θ = R^d × P_d.

Note that this space is not closed under subtraction, and also not under addition of terms from S_d (e.g., (natural) gradients). When performing an additive update of the covariance matrix of the form C ← C − γ · ∆, e.g., an SGD step, a large enough step γ · ∆ can result in a violation of the positivity constraint.²

² This problem does not appear with standard CMA-ES updates, where positive semi-definite matrices are added to a positive definite matrix, the result of which is always positive definite.

Possible workarounds are a line search for a feasible step length or more elaborate constraint handling techniques. A conceptually easier and more elegant solution is to parameterize the manifold P_d with a vector space and to perform SGD on this new parameter space. This is exactly the role of the matrix exponential exp : S_d → P_d, which is a diffeomorphism (a bijective, smooth mapping with smooth inverse).

The exponential map for matrices is in general defined in terms of the power series expansion exp(M) = Σ_{n=0}^∞ (1/n!) M^n, mapping general d × d matrices to the general linear group of invertible d × d matrices. For symmetric matrices it can be understood in terms of a spectral transformation. Let M = UDU^T with U orthogonal and D diagonal denote the eigen decomposition of M; then it is easy to see from the power series formula that exp(M) = U exp(D) U^T. The exponential of a diagonal matrix is simply the diagonal matrix consisting of the scalar exponentials of the diagonal entries. Hence the matrix exponential corresponds to exponentiation of the eigenvalues, mapping general to positive eigenvalues and thus S_d to P_d.
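The spectral form of the matrix exponential translates directly into code. The following NumPy sketch (an illustration, not part of the original paper) exponentiates a symmetric, indefinite matrix and verifies that the result is positive definite:

```python
import numpy as np

def sym_expm(M):
    """exp(M) for symmetric M via the eigen decomposition M = U D U^T."""
    D, U = np.linalg.eigh(M)
    return U @ np.diag(np.exp(D)) @ U.T

M = np.array([[0.0, 0.5], [0.5, -1.0]])   # symmetric, indefinite
E = sym_expm(M)
print(np.linalg.eigvalsh(E))               # all eigenvalues of exp(M) are positive
```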

−1 ∇˜ θF (θ) = I(θ) · ∇θF (θ) , since the Fisher information matrix I(θ) is the metric ten- sor describing the intrinsic geometry. It is estimated in a 2 −1 This problem does not appear with standard CMA-ES up- straightforward manner by I(θ) G(θ). The update dates where positive semi-definite matrices are added to a −1 positive definite matrix, the result of which is always positive θ → θ − γ ·I(θ) G(θ) definite. 0 0 Pλ distribution parameters (m ,A ) are represented as Proof. The proof has two parts. In the case that i=1 ui = 0, this reduces to G = Pλ u z zT . We first take a look at   1  i=1 i i i (δ, M) 7→ m0,A0 = m + Aδ, A exp M . the case thatu ¯ = Pλ u 6= 0. Then we have 2 i=1 i λ ! The coordinates are chosen so that the Fisher matrix is the X T exp(G) = exp −u¯I + uizizi identity: I(0, 0) = I. Hence the coordinates are orthonormal i=1 w.r.t. the intrinsic geometry. Plain and natural gradient λ ! coincide. X T = exp(−u¯I) exp uizizi Despite the seemingly complicated derivation of xNES in- i=1 volving stochastic natural gradient on a statistical manifold λ ! and a non-linear coordinate system based on the matrix ex- X T = exp(−u¯) exp uizizi ponential, its update equations are surprisingly simple. The i=1 complete pseudo-code is given in algorithm 2. Pλ T In this implementation the covariance matrix factor A is The second step holds since −u¯I commutes with i=1 uizizi . Pλ T represented as A = σB, where the transformation matrix B We can thus assume w.l.o.g. that G = i=1 uizizi , which fulfills det(B) = 1. In the chosen exponential coordinates includes the caseu ¯ = 0. Because of rank(G) ≤ λ we can find T d×λ the corresponds to the trace (see computation an eigen decomposition of G = QDQ with Q ∈ R and λ×λ 2 3 3 of Gσ and GB in the algorithm). The parameters m, σ, and D ∈ R in O(λ d + λ ) time . With this decomposition B can be updated with independent SNGD steps, poten- the matrix exponential can be rewritten as tially with different learning rates.  T  T The parameters of the xNES algorithm are the sample exp(G) = I − QQ + Q exp(D)Q (population) size λ, the learning rates η , η , and η , as m σ B = I + Q (exp(D) − I) QT . well as the rank-based weights u1, . . . , uλ. Population size and weights essentially follow the settings of CMA-ES, with The first equality holds because exp(G) maps all 0-eigenvalues the deviation to subtract the mean from the weights. This of G to 1, which leads to the first term. Insertion of the re- leads to the weights ui = wi − 1/λ, resulting in negative sult into the term of interest yields weights for individuals that simply don’t enter the CMA- T h T i T ES due to truncation selection. The mean learning rate has A exp(G)A = A I + Q (exp(D) − I) Q A the canonical value ηm = 1, which results in the exact same T T mean update as in CMA-ES [1, 7], while the other learning = AA + AQ (exp(D) − I) (AQ) rates were empirically tuned, see [7]. = C + AQ (exp(D) − I) (AQ)T Note that due to the use of the matrix exponential in d×λ 2 xNES the updates of σ and B have exactly the same form. We can compute K = AQ ∈ R in O(λd ) and C + T 2 In contrast to CMA-ES, the covariance matrix update of K (exp(D) − I) K in O(λd ) time. xNES is multiplicative in nature. 
The parameters of the xNES algorithm are the sample (population) size λ, the learning rates η_m, η_σ, and η_B, as well as the rank-based weights u_1, ..., u_λ. Population size and weights essentially follow the settings of CMA-ES, with the deviation that the mean is subtracted from the weights. This leads to the weights u_i = w_i − 1/λ, resulting in negative weights for individuals that simply do not enter the CMA-ES update due to truncation selection. The mean learning rate has the canonical value η_m = 1, which results in the exact same mean update as in CMA-ES [1, 7], while the other learning rates were tuned empirically, see [7].

Note that due to the use of the matrix exponential in xNES the updates of σ and B have exactly the same form. In contrast to CMA-ES, the covariance matrix update of xNES is multiplicative in nature. We argue that conceptually this is a desirable property, since σ (scale) and B (shape) describe complementary properties of the covariance matrix C = σ²BB^T, and they enter the sampling process in a similar way, namely by left-multiplication with the standard normally distributed random vectors z_i. In fact, the exponential parameterization seems to be canonical since it allows for a clear separation of the scale component σ and the shape component B of the search distribution as linear sub-spaces of Θ, see also [7].

2.3 Efficient Multiplicative Update

While the multiplicative update rule of xNES guarantees positive definiteness of the covariance matrix, the matrix exponential in itself is usually a computationally expensive operation. In the following we show that the update can be implemented efficiently with time complexity O(d²λ), which coincides with the complexity of the additive update of the CMA-ES. Thus the computational difference between the updates is merely a constant factor.

Lemma 1. Consider a matrix G = Σ_{i=1}^λ u_i(z_i z_i^T − I) built from λ < d vectors z_i ∈ R^d, weights u_i ∈ R, and a positive definite symmetric matrix C ∈ R^(d×d) for which a decomposition C = AA^T is available. Then the term

    A exp(G) A^T

can be computed with time complexity O(λd² + λ²d + λ³).

Proof. The proof has two parts. In the case Σ_{i=1}^λ u_i = 0, G reduces to G = Σ_{i=1}^λ u_i z_i z_i^T. We first take a look at the case ū = Σ_{i=1}^λ u_i ≠ 0. Then we have

    exp(G) = exp(−ūI + Σ_{i=1}^λ u_i z_i z_i^T)
           = exp(−ūI) exp(Σ_{i=1}^λ u_i z_i z_i^T)
           = exp(−ū) exp(Σ_{i=1}^λ u_i z_i z_i^T).

The second step holds since −ūI commutes with Σ_{i=1}^λ u_i z_i z_i^T. We can thus assume w.l.o.g. that G = Σ_{i=1}^λ u_i z_i z_i^T, which includes the case ū = 0. Because of rank(G) ≤ λ we can find an eigen decomposition G = QDQ^T with Q ∈ R^(d×λ) and D ∈ R^(λ×λ) in O(λ²d + λ³) time.³ With this decomposition the matrix exponential can be rewritten as

    exp(G) = (I − QQ^T) + Q exp(D) Q^T = I + Q (exp(D) − I) Q^T.

The first equality holds because exp(G) maps all 0-eigenvalues of G to 1, which leads to the first term. Insertion of this result into the term of interest yields

    A exp(G) A^T = A [I + Q (exp(D) − I) Q^T] A^T
                 = AA^T + AQ (exp(D) − I) (AQ)^T
                 = C + AQ (exp(D) − I) (AQ)^T.

We can compute K = AQ ∈ R^(d×λ) in O(λd²) and C + K (exp(D) − I) K^T in O(λd²) time.

³ This can be achieved by first applying a QR-decomposition to the matrix Z = (z_1, ..., z_λ), which yields a λ × d matrix B with BB^T = I and BGB^T = K ∈ R^(λ×λ) in O(λd²). An eigenvalue decomposition K = VDV^T can then be performed in O(λ³), and Q = B^T V.

The lemma implies that if a decomposition C = AA^T is available, then the update C ← A exp(G) A^T, or C ← C + K (exp(D) − I) K^T in the notation of the proof, can be seen as a rank-λ update to C. It can be computed significantly faster than a full eigen decomposition of C. Typically, as λ = O(log(d)), the runtime costs are dominated by the O(λd²) operations of the matrix multiplications, which leads to the same runtime complexity as the additive matrix update. If A is a Cholesky factor then it can also be updated efficiently in O(d²λ) operations, without requiring to store or compute C first [13]. If A has been computed through an eigenvalue decomposition then there is currently no fast algorithm known to update the eigenvalue decomposition, and recomputing it from C has time complexity Θ(d³), dominating the overhead of the exponential update.
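The constructive proof can be followed line by line in code. The NumPy sketch below is an illustration of Lemma 1 (assuming λ < d) and not the authors' implementation; it computes A exp(G) A^T without ever forming a d × d matrix exponential.

```python
import numpy as np

def multiplicative_update(C, A, Z, u):
    """Compute A exp(G) A^T for G = sum_i u_i (z_i z_i^T - I) in O(lambda d^2).
    Z holds the vectors z_i as rows (lambda x d); A is any factor with A A^T = C."""
    u_bar = np.sum(u)                 # exp(G) = exp(-u_bar) * exp(sum_i u_i z_i z_i^T)
    # low-rank eigen decomposition of G0 = sum_i u_i z_i z_i^T via QR + small eigenproblem
    Q0, R = np.linalg.qr(Z.T)         # reduced QR: Q0 is d x lambda with orthonormal columns
    K_small = (R * u) @ R.T           # lambda x lambda matrix Q0^T G0 Q0
    D, V = np.linalg.eigh(K_small)
    Q = Q0 @ V                        # G0 = Q diag(D) Q^T with orthonormal columns Q
    K = A @ Q                         # d x lambda
    return np.exp(-u_bar) * (C + K @ np.diag(np.exp(D) - 1.0) @ K.T)
```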
2.4 xCMA-ES

In this section we show that exponential coordinates for the covariance matrix are compatible with all mechanisms of CMA-ES. Consequently, we incorporate this technique into the CMA-ES algorithm, resulting in a new method called exponential CMA-ES, or xCMA-ES for short.

The xCMA-ES algorithm features all techniques found in CMA-ES, but in addition incorporates the multiplicative covariance matrix update of the xNES algorithm. This means that xCMA-ES is equipped with two evolution paths, one for cumulative step size control, and one for the rank-one covariance matrix update. Notably, xCMA-ES comes with explicit step size control, while in xNES the step size is updated with the same mechanism as the shape component of the covariance matrix, with the only difference that the learning rates can be decoupled. For xCMA-ES we follow the procedure of CMA-ES and do not decouple these parameters explicitly (i.e., the scale of the covariance matrix is allowed to drift).

The beauty of this construction is that all mechanisms of standard CMA-ES are naturally compatible with the exponential parameterization of the covariance matrix. In particular, the updates of the evolution paths and the step size do not require any changes.

The stochastic natural gradient component Σ_{i=1}^λ u_i · (z_i z_i^T − I) of xNES deserves particular attention. This matrix is, up to scaling, the quantity entering the matrix exponential. It consists of a weighted sum of outer products of steps in the "natural" coordinate system of the standard normally distributed samples z_i, minus their expected value, which (for standard normal samples) is the identity matrix. Note that due to Σ_i u_i = 0 the weighted identity matrices cancel each other out. Also note that in first order Taylor approximation around the origin the matrix exponential reduces to exp(M) ≈ I + M, resulting in the exact rank-µ update of CMA-ES [1, 7, 14].

Hence it is natural to incorporate the rank-one update of CMA-ES into the multiplicative update by adding the term c_1 · (pp^T − I) to the term entering the matrix exponential, where p = C^(−1/2) p_c is the evolution path transformed back to the coordinate system of standard normally distributed samples.

A disadvantage of the mean-free weights u_i in the xNES algorithm is that the computed steps G_δ are only mean-free with respect to the estimated mean ẑ = (1/λ) Σ_{i=1}^λ z_i, as

    G_δ = Σ_{i=1}^λ u_i · z_i = Σ_{i=1}^λ (w_i − 1/λ) · z_i = Σ_{i=1}^λ w_i · (z_i − ẑ),

thus adding additional noise to the update. By replacing ẑ with the true (zero) mean, we achieve a better estimate, namely the original CMA-ES update.⁴

⁴ With both update choices, µ_eff is the same and computed from the w_i.

These changes applied to CMA-ES define the basic xCMA-ES algorithm. Its pseudo-code is found in Algorithm 3.

Algorithm 3: xCMA-ES
  Input: m, σ
  C ← I
  while stopping condition not met do
      // sample and evaluate offspring
      for i ∈ {1, ..., λ} do
          z_i ← N(0, I)
          x_i ← m + σ · C^(1/2) z_i
      end
      sort {(z_i, x_i)} with respect to f(x_i)
      // internal update (paths and parameters)
      m' ← Σ_{i=1}^λ (u_i + 1/λ) x_i
      p_s ← (1 − c_s) · p_s + sqrt(c_s(2 − c_s)µ_eff) · C^(−1/2)(m' − m)/σ
      p_c ← (1 − c_c) · p_c + sqrt(c_c(2 − c_c)µ_eff) · (m' − m)/σ
      Z ← c_1 · [C^(−1/2) p_c p_c^T C^(−1/2) − I] + c_µ · Σ_{i=1}^λ u_i z_i z_i^T
      C ← C^(1/2) exp(Z) C^(1/2)
      σ ← σ · exp(c_s/D_σ · (‖p_s‖/χ_d − 1))
      m ← m'
  end
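For illustration, the covariance step of Algorithm 3 in isolation might look as follows. This is a dense O(d³) sketch written for clarity (not the authors' code); in practice the O(λd²) scheme of section 2.3 would be used, and the learning rates are assumed to be supplied by the caller.

```python
import numpy as np
from scipy.linalg import expm, sqrtm

def xcma_covariance_update(C, p_c, Z, u, c1, c_mu):
    """Multiplicative rank-one plus rank-mu update: C <- C^{1/2} exp(G) C^{1/2}.
    Z holds the standard normal samples z_i as rows; u are the mean-free weights."""
    d = C.shape[0]
    C_sqrt = np.real(sqrtm(C))
    C_inv_sqrt = np.linalg.inv(C_sqrt)
    p = C_inv_sqrt @ p_c                       # evolution path in the isotropic coordinates
    G = c1 * (np.outer(p, p) - np.eye(d)) \
        + c_mu * (Z.T * u) @ Z                 # the matrix called Z in Algorithm 3
    return C_sqrt @ expm(G) @ C_sqrt
```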
As noted above, when using identical weights and learning rates, in first order approximation the multiplicative update in xCMA-ES coincides with the additive update in standard CMA-ES. Both updates rely on a rank-(λ + 1) matrix⁵ formed by the outer products of the sampling steps and the evolution path p_c. An interesting conceptual difference between the additive and the multiplicative update is that the additive update shrinks the variance in all directions orthogonal to the λ + 1 update vectors by a factor of 1 − c_1 − c_µ, while the multiplicative update leaves these variance components untouched. In our opinion the latter better reflects the fact that no information was observed about these directions. Also, by means of subtraction of the identity (the expected value) from the positive semi-definite weighted sum of outer products, it is natural for the multiplicative update to (multiplicatively) grow or shrink the variance in a specific direction. In contrast, the additive update can only (additively) grow the variance in a specific direction, while shrinking works only through global shrinking and subsequent growing of all other directions, which can be slow.

⁵ In standard CMA-ES the rank is even reduced to µ + 1, since truncation selection effectively sets the weights of discarded offspring to zero.

This problem was mitigated to some extent with so-called active covariance matrix updates [2, 11]. In an active update step the algorithm subtracts an outer product from the covariance matrix, an operation that must carefully ensure the positive definiteness of the result, e.g., by means of line search or finely tuned learning rates. In contrast, the multiplicative update operates on an unconstrained problem and hence can never result in an invalid configuration.

2.5 Constraint Handling

A powerful constraint handling mechanism for the (1+1)-CMA-ES was introduced in [3]. In principle, the same mechanism can be applied to the non-elitist, population-based standard formulation of CMA-ES. The simple yet highly efficient approach amounts to performing active CMA updates for steps resulting in infeasible offspring, effectively reducing the variance in the direction of the constraint normal, while suspending step size adaptation.

The corresponding mechanism for xCMA-ES is essentially the same, namely to perform a standard update with negative weight, but of course without a need for constraining the step size. There is a subtle difference to the elitist setting considered in [3]: with non-elitist selection it is in principle possible for infeasible offspring to enter the new parent population, namely if more than λ − µ offspring happen to be infeasible; this can indeed be observed in experiments. Then the algorithm can get caught in a random walk through the infeasible region. To avoid this effect we propose to make the weights adaptive to the number of infeasible offspring as follows. We first compute the standard weights w_i of CMA-ES. Infeasible offspring are treated as worst, generally obtaining low weights. For the active CMA update we subtract a constant from the weights of infeasible offspring, which is set to (0.4/λ) Σ_i w_i. Then we normalize the absolute values of the weights to one (by dividing all weights by Σ_i |w_i|), compute µ_eff based on these weights, and then make them mean-free by subtracting (1/λ) Σ_i w_i. Finally, the parameters c_s, D_σ, c_1, c_c, c_µ are recomputed based on the new weights.

Finally, we ensure that the mean m stays in the feasible region by finding the minimal k ∈ N_0 such that

    m + γ^k (m' − m)                                                (1)

is feasible, where we set γ = 2/3.
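The two constraint handling ingredients described above can be sketched as follows. This reflects one reading of the procedure and is not the authors' code; the feasibility test is assumed to be provided by the problem, and the recomputation of µ_eff and the strategy constants from the adapted weights is omitted.

```python
import numpy as np

def adapt_weights(w, infeasible):
    """Adapt the CMA-ES weights to infeasible offspring (section 2.5 recipe).
    w: standard weights of length lambda; infeasible: boolean mask over offspring.
    mu_eff and the learning rates are recomputed from the result (not shown)."""
    lam = len(w)
    w = np.asarray(w, dtype=float).copy()
    w[infeasible] -= 0.4 / lam * np.sum(w)   # push infeasible samples towards negative weights
    w /= np.sum(np.abs(w))                   # normalize absolute values to one
    w -= np.sum(w) / lam                     # make the weights mean-free
    return w

def feasible_mean(m, m_new, is_feasible, gamma=2.0 / 3.0):
    """Backtrack the mean step by powers of gamma until it is feasible (equation (1)).
    The old mean m is assumed to lie in the interior of the feasible region."""
    k = 0
    while not is_feasible(m + gamma ** k * (m_new - m)):
        k += 1
    return m + gamma ** k * (m_new - m)
```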
3. EXPERIMENTS

In this section we perform an empirical evaluation of xCMA-ES and compare it to CMA-ES. This comparison highlights the effect of the exponential covariance matrix parameters and the resulting multiplicative covariance matrix update. In particular, we aim to answer the following questions:

1. Does the exponential update impact black box performance?

2. How do active covariance updates perform in exponential form?

In order to answer the first question we perform experiments on the same standard benchmark sets as used for the xNES algorithm [7]. Due to the lack of a fair baseline for comparison we cannot provide a good answer to the second question. However, we demonstrate that active updates work qualitatively as desired in a constrained optimization setting.

As the CMA-ES is tuned on a large number of benchmark functions, ranking stability over speed, it is not hard to find parameters which beat it on a smaller set of functions. Thus, tuning the parameters of xCMA-ES on a small set of functions would lead to an unfair benchmark. To avoid this situation we avoid tuning altogether and instead resort to the default parameters of CMA-ES, which are given in table 1. This choice is justified by the similarity of the update rules, especially in the settings of small dimensionality with the conservative learning rates of CMA-ES.

Table 1: Constants used for the CMA-ES and xCMA-ES algorithms.

    constant | value
    λ, µ     | 4 + ⌊3 log(d)⌋,  λ/2
    c_s      | (µ_eff + 2) / (d + µ_eff + 5)
    c_c      | (4 + µ_eff/d) / (d + 4 + 2µ_eff/d)
    c_1      | 2 / ((d + 1.3)² + µ_eff)
    c_µ      | min{1 − c_1, 2(µ_eff − 2 + 1/µ_eff) / ((d + 2)² + 2µ_eff/2)}
    D_σ      | 1 + c_s + 2 max{0, sqrt((µ_eff − 1)/(d + 1)) − 1}
    w_i      | max{0, log(λ/2 − 1) − log(i − 1)} / Σ_{j=1}^λ max{0, log(λ/2 − 1) − log(j − 1)}

3.1 Black Box Performance

To assess the black box performance of xCMA-ES, we compare it to CMA-ES on the set of benchmark functions used in [7]. We run both algorithms for 100 trials on each function for each dimensionality d ∈ {4, 8, 16, 32, 64} until a target value f_stop is reached. We chose σ = 1/√d as initial step size for all functions. We report the median of the required function evaluations over the successful trials for both algorithms. A trial is considered successful if the algorithm converges to the right optimum (which is only an issue on Rosenbrock). The function definitions and the values of f_stop are given in table 2. The results of the experiments are given in Figure 1.

Table 2: The formulas for the benchmark functions. The default value α = 10⁻⁶ was used in all experiments.

    name        | f(x)                                               | f_stop
    SharpRidge  | −x_1 + 100 sqrt(Σ_{i=2}^d x_i²)                    | −1000
    ParabRidge  | −x_1 + 100 Σ_{i=2}^d x_i²                          | −1000
    Rosenbrock  | Σ_{i=1}^{d−1} 100(x_{i+1} − x_i²)² + (1 − x_i)²    | 10⁻¹⁴
    Sphere      | Σ_{i=1}^d x_i²                                     | 10⁻¹⁴
    Cigar       | α x_1² + Σ_{i=2}^d x_i²                            | 10⁻¹⁴
    Discus      | x_1² + α Σ_{i=2}^d x_i²                            | 10⁻¹⁴
    Ellipsoid   | Σ_{i=1}^d α^((i−1)/(d−1)) x_i²                     | 10⁻¹⁴
    Schwefel    | Σ_{i=1}^d (Σ_{j=1}^i x_j)²                         | 10⁻¹⁴
    DiffPowers  | Σ_{i=1}^d |x_i|^(2 + 10(i−1)/(d−1))                | 10⁻¹⁴

The results show that both algorithms perform equally well on all functions, with nearly no differences between the algorithms and a minimal (maybe insignificant and surely irrelevant) edge for xCMA-ES.

Figure 1: Median number of black box function evaluations over problem dimension for CMA-ES and xCMA-ES on nine standard benchmark problems: (a) Sphere, (b) Cigar, (c) Discus, (d) Ellipsoid, (e) Schwefel, (f) Rosenbrock, (g) DiffPowers, (h) SharpRidge, (i) ParabRidge. Due to low spread, the 25% and 75% quantile indicators are nearly invisible.

3.2 Constraint Handling

Due to the lack of a non-elitist CMA-ES variant with a constraint handling mechanism based on (active) covariance matrix updates, we cannot provide a strong baseline algorithm for comparison with xCMA-ES. To obtain a reasonable baseline we constrain the mean of CMA-ES to never leave the feasible region, using the same mechanism as proposed for xCMA-ES (see equation (1)). The standard selection operator is already capable of handling constraints to some extent when treating infeasible offspring as worse than feasible ones. The only differences between the baseline CMA-ES and xCMA-ES are the additive update vs. the multiplicative update with active constraint handling.

As a benchmark problem we use the constrained sphere function proposed in [3]:

    min_x ‖x‖² − m    s.t.  x_i ≥ 1,  i = 1, ..., m

The optimum x* has components x_i* = 1 for i ≤ m and x_i* = 0 for i > m. The difficulty of this problem is twofold: In the vicinity of the optimum only a fraction of 2^(−m) of the space is feasible; hence the problem gets harder with a growing number of constraints. At the same time, while approaching the optimum, the gradient of the objective function in the unconstrained components vanishes. To counteract this trend the algorithm must reduce the variance of its sampling distribution in the subspace spanned by all constraint normals, a task for which active CMA updates are supposed to be helpful. We perform 100 runs of both algorithms in d = 16 and d = 32 dimensions with m ∈ {2, 4, ..., d/2} constraints.
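The benchmark is easy to state in code. The following sketch (an illustration of the problem definition above, where m denotes the number of constraints) provides the objective and the feasibility test used by both algorithms:

```python
import numpy as np

def constrained_sphere(x, m):
    """f(x) = ||x||^2 - m, subject to x_i >= 1 for i = 1, ..., m; f(x*) = 0."""
    return float(np.dot(x, x)) - m

def is_feasible(x, m):
    """All of the first m coordinates must be at least one."""
    return bool(np.all(x[:m] >= 1.0))
```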
The results are given in figure 2. A plot of an example run with d = 32 and m = 4 is given in figure 3, showing the function values of the best point in the sampled population as well as the current step size.


It turns out that xCMA-ES is always faster than the baseline CMA-ES. Moreover, CMA-ES became unstable with increasing m, and in the case d = 32 and m = 16 it fails to converge in all 100 runs. The plot in figure 3 shows this instability already for m = 4 constraints.

Figure 2: Number of iterations (generations) necessary to reach the target objective value of 10⁻¹² over the number m of constraints for the constrained sphere problem in (a) d = 16 and (b) d = 32 dimensions. The value for CMA-ES at d = 32 and m = 16 is missing because the algorithm failed in all 100 runs.

Figure 3: Plot of a single run of the CMA-ES and xCMA-ES algorithms on the constrained sphere function with d = 32 and m = 4. Plotted are function value and step length over generations.

4. CONCLUSION

We have incorporated a new type of covariance matrix update into the CMA-ES algorithm. This multiplicative update stems from the xNES algorithm. It guarantees positive definiteness of the covariance matrix even with negative update weights. The resulting algorithm features all mechanisms of CMA-ES and is hence called xCMA-ES. We showed that its performance on standard benchmarks is nearly indistinguishable from that of the CMA-ES. We demonstrated that, despite the application of the matrix exponential, it is possible to implement the multiplicative update with a time complexity of only O(d²λ).

We further investigated an extension of the algorithm for constrained optimization problems and showed that by a simple use of negative weights xCMA-ES outperforms the CMA-ES. This demonstrates the benefits of the multiplicative update.

Acknowledgements

Oswin Krause acknowledges support from the Danish National Advanced Technology Foundation through the project "Personalized breast cancer screening".

xCMA-ES xCMA-ES pages 1769–1776, 2005. 5 5 10 10 [6] H.-G. Beyer. Convergence Analysis of Evolutionary Algorithms that are Based on the Paradigm of Information Geometry. Evolutionary Computation, 104 104 22(4):679–709, 2014. [7] T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, and 103 103 J. Schmidhuber. Exponential natural evolution 2 4 8 2 4 8 16 strategies. In Proceedings of the Genetic and (a) d = 16 (b) d = 32 Evolutionary Computation Conference (GECCO), 2010. Figure 2: Number of iterations (generations) necessary to [8] N. Hansen, S. D. Muller,¨ and P. Koumoutsakos. reach the target objective value of 10−12 over the number m Reducing the time complexity of the derandomized of constraints for the constrained sphere problems in d = 16 evolution strategy with covariance matrix adaptation and d = 32 dimensions. The value for CMA-ES at d = 32 (CMA-ES). Evolutionary Computation, 11(1):1–18, and m = 16 is missing because the algorithm failed in all 2003. 100 runs. [9] N. Hansen, S. P. N. Niederberger, L. Guzzella, and P. Koumoutsakos. A Method for Handling Uncertainty in Evolutionary Optimization with an Application to 102 f(x)-CMA-ES Feedback Control of Combustion. IEEE Transactions σ-CMA-ES on Evolutionary Computation, 13(1):180–197, 2009. f(x)-xCMA-ES σ-xCMA-ES [10] N. Hansen and A. Ostermeier. Completely 10−2 derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001. [11] G. A. Jastrebski and D. V. Arnold. Improving evolution strategies through active covariance matrix 10−6 adaptation. In Evolutionary Computation, 2006. CEC 2006. IEEE Congress on, pages 2814–2821. IEEE, 2006. −10 10 [12] S. Kern, S. D. Muller,¨ N. Hansen, D. Buche,¨ J. Ocenasek, and P. Koumoutsakos. Learning

probability distributions in continuous evolutionary 10−14 algorithms–a comparative review. Natural Computing, 0 50.000 100. 000 150. 000 200. 000 3(1):77–112, 2004. Iterations [13] O. Krause and C. Igel. A More Efficient Rank-one Covariance Matrix Update for Evolution Strategies. In Figure 3: Plot of a single run of the CMA-ES and xCMA-ES J. He, T. Jansen, G. Ochoa, and C. Zarges, editors, algorithms on the constrained Sphere function with d = 32 Foundations of Genetic Algorithms (FOGA). ACM and m = 4. Plotted are function value and step length over Press, 2015. accepted for publication. generations. [14] Y. Ollivier, L. Arnold, A. Auger, and N. Hansen. Information-geometric optimization algorithms: a unifying picture via invariance principles. arXiv 5. REFERENCES preprint arXiv:1106.3708v3, 2014. [1] Y. Akimoto, Y. Nagata, I. Ono, and S. Kobayashi. [15] Y. Sun, D. Wierstra, T. Schaul, and J. Schmidhuber. Bidirectional Relation between CMA Evolution Stochastic Search using the Natural Gradient. In Strategies and Natural Evolution Strategies. In International Conference on Machine Learning Parallel Problem Solving from Nature (PPSN), 2010. (ICML), 2009. [2] D. V. Arnold and N. Hansen. Active covariance [16] D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, matrix adaptation for the (1+1)-CMA-ES. In J. Peters, and J. Schmidhuber. Natural evolution Proceedings of the 12th annual conference on Genetic strategies. Journal of Machine Learning Research, and Evolutionary Computation (GECCO), pages 15:949–980, 2014. 385–392. ACM, 2010. [17] D. Wierstra, T. Schaul, J. Peters, and [3] D. V. Arnold and N. Hansen. A (1+1)-CMA-ES for J. Schmidhuber. Natural Evolution Strategies. In constrained optimisation. In Proceedings of the Proceedings of the Congress on Evolutionary Genetic and Evolutionary Computation Conference Computation (CEC). IEEE Press, 2008. (GECCO 2012), pages 297–304. ACM, 2012. [4] A. Auger. Convergence results for the (1, λ)-SA-ES using the theory of ϕ-irreducible Markov chains. Theoretical Computer Science, 334(1–3):35–69, 2005. [5] A. Auger and N. Hansen. A restart CMA evolution