Simplify Your CMA-ES

Total Pages: 16

File Type: PDF, Size: 1020 KB

ACCEPTED FOR: IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

Simplify Your Covariance Matrix Adaptation Evolution Strategy
Hans-Georg Beyer and Bernhard Sendhoff, Senior Member, IEEE

Abstract—The standard Covariance Matrix Adaptation Evolution Strategy (CMA-ES) comprises two evolution paths, one for the learning of the mutation strength and one for the rank-1 update of the covariance matrix. In this paper it is shown that one can approximately transform this algorithm in such a manner that one of the evolution paths and the covariance matrix itself disappear. That is, the covariance update and the covariance matrix square root operations are no longer needed in this novel so-called Matrix Adaptation (MA) ES. The MA-ES performs nearly as well as the original CMA-ES. This is shown by empirical investigations considering the evolution dynamics and the empirical expected runtime on a set of standard test functions. Furthermore, it is shown that the MA-ES can be used as search engine in a Bi-Population (BiPop) ES. The resulting BiPop-MA-ES is benchmarked using the BBOB COCO framework and compared with the performance of the CMA-ES-v3.61 production code. It is shown that this new BiPop-MA-ES – while algorithmically simpler – performs nearly equally well as the CMA-ES-v3.61 code.

Index Terms—Matrix Adaptation Evolution Strategies, Black Box Optimization Benchmarking.

H.-G. Beyer is with the Research Center Process and Product Engineering at the Vorarlberg University of Applied Sciences, Dornbirn, Austria. Email: [email protected]
B. Sendhoff is with the Honda Research Institute Europe GmbH, Offenbach/Main, Germany. Email: [email protected]

I. INTRODUCTION

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) has received considerable attention as an algorithm for unconstrained real-parameter optimization, i.e. for solving the problem

    ŷ = argopt_{y ∈ R^N} f(y),    (1)

since its publication in its fully developed form including the rank-µ update in [1]. While there are proposals regarding modifications of the original CMA-ES, such as the (1+1)-CMA-ES [2] and its latest extension in [19], the active CMA-ES [3], or the CMSA-ES [4], the design presented in [1] has only slightly changed over the years, and state-of-the-art presentations such as [5] still rely on that basic design. Taking the advent of the so-called Natural Evolution Strategies (NES) into account, e.g. [6], one sees that even a seemingly principled design paradigm such as the information gain constrained evolution on statistical manifolds did not cause a substantial change in the CMA-ES design. Actually, in order to get NES competitive, those designers had to borrow from and rely on the ideas and principles empirically found by the CMA-ES designers. Yet, it is somehow remarkable that the basic framework of the CMA-ES has not undergone a deeper scrutiny to call the algorithm's peculiarities, such as the covariance matrix adaptation, the evolution path, and the cumulative step-size adaptation (CSA), into question. Notably, in [4] a first attempt has been made to replace the CSA mutation strength control by the classical mutative self-adaptation. While getting rid of any kind of evolution path statistics and thus yielding a much simpler strategy¹, the resulting rank-µ update strategy, the so-called Covariance Matrix Self-Adaptation ES (CMSA-ES), does not fully reach the performance of the original CMA-ES in the case of small population sizes. Furthermore, some of the reported performance advantages in [4] were due to a wrongly implemented stalling of the covariance matrix update in the CMA-ES implementation used.

Meanwhile, our theoretical understanding of the evolution dynamics taking place in the CMSA- and CMA-ES has advanced. Basically, two reasons for the superior CMA-ES performance can be identified in the case of small population sizes:

• Concentrate evolution along a predicted direction. Using the evolution path information for the covariance matrix update (this rank-1 update was originally used in the first CMA-ES version [7] as the only update) can be regarded as some kind of time series prediction of the evolution of the parent in the R^N search space. That is, the evolution path carries the information in which direction the search steps were most successful. This directional information may be regarded as the most promising one and can be used in the CMA-ES to shrink the evolution of the covariance matrix.² As a result, optimization in landscape topologies with a predominant search direction (such as the Cigar test function or the Rosenbrock function) can benefit from the path cumulation information.

• Increased mutation strength. As has been shown by analyzing the dynamics of the CSA on the ellipsoid model in [10], the path-length based mutation strength control yields larger steady state mutation strengths, up to a factor of µ (parental population size), compared to ordinary self-adaptive mutation strength control (in the case of non-noisy objective functions). Due to the larger mutation strengths realized, the CSA-based approach can better take advantage of the genetic repair effect provided by the intermediate (or weighted) recombination.

The performance advantage vanishes, however, when the population size is chosen sufficiently large. However, considering the CMA-ES as a general purpose algorithm, the strategy should cope with both small and large population sizes. Therefore, it seems that both the rank-1 and the rank-µ update are needed. This still calls into question whether it is necessary to have two evolution paths and an update of the covariance matrix at all.

In this paper we will show that one can drop one of the evolution paths and remove the covariance matrix totally, thus removing a "p" and the "C" from the CMA-ES without significantly worsening the performance of the resulting "Matrix Adaptation" (MA)-ES. As a byproduct of the analysis, which is necessary to derive the MA-ES from the CMA-ES, one can draw a clearer picture of the CMA-ES, which is often regarded as a rather complicated evolutionary algorithm.

This work is organized as follows. First, the CMA-ES algorithm is presented in a manner that allows for a simpler analysis of the pseudocode lines. As the next step, the p-evolution path (realized by line C13, Fig. 1, see Section II for details) will be shown to be similar to the s-evolution path (realized by line C12, Fig. 1). Both paths can be approximately transformed (asymptotically exact for search space dimensionality N → ∞) into each other by a linear transformation where the transformation matrix is just the matrix square root M of the covariance matrix C. After removing the p-path from the CMA-ES, the update of the C matrix will be replaced by transforming the covariance learning into a direct learning of the M matrix. Thus, one no longer needs the covariance matrix or operations such as Cholesky decomposition or spectral decomposition to calculate the matrix square root. This simplifies the resulting ES considerably, both from the viewpoint of the algorithmic "complexity" and of the numerical algebra operations needed. Furthermore, the resulting M-update allows for a new interpretation of the CMA-ES working principles. However, since the derivations rely on assumptions that are only asymptotically exact for search space dimensionality N → ∞, numerical experiments are provided to show that the novel MA-ES performs nearly equally well as the original CMA-ES. Additionally, the CMA-ES and MA-ES will be used as "search engines" in the […]

¹ By simplicity we mean a reduced complexity of code and especially a decreased number of strategy specific parameters to be fixed by the algorithm designer. Furthermore, the remaining strategy specific parameters have been determined rather by first principles than by empirical parameter tuning studies.
² Shrinking is a well-known technique for covariance matrix estimation in order to control the undesired growth of the covariance matrix [8]. Considering the covariance matrix update in the CMA-ES from the perspective of shrinking has been done first by Meyer-Nieberg and Kropat [9].

(µ/µ_w, λ)-CMA-ES

Initialize y^(0), σ^(0), g := 0, p^(0) := 0, s^(0) := 0, C^(0) := I    (C1)
Repeat    (C2)
    M^(g) := √(C^(g))    (C3)
    For l := 1 To λ    (C4)
        z̃_l^(g) := N_l(0, I)    (C5)
        d̃_l^(g) := M^(g) z̃_l^(g)    (C6)
        ỹ_l^(g) := y^(g) + σ^(g) d̃_l^(g)    (C7)
        f̃_l^(g) := f(ỹ_l^(g))    (C8)
    End    (C9)
    SortOffspringPopulation    (C10)
    y^(g+1) := y^(g) + σ^(g) ⟨d̃^(g)⟩_w    (C11)
    s^(g+1) := (1 − c_s) s^(g) + √(µ_eff c_s (2 − c_s)) ⟨z̃^(g)⟩_w    (C12)
    p^(g+1) := (1 − c_p) p^(g) + √(µ_eff c_p (2 − c_p)) ⟨d̃^(g)⟩_w    (C13)
    C^(g+1) := (1 − c_1 − c_w) C^(g) + c_1 p^(g+1) (p^(g+1))^T + c_w ⟨d̃^(g) (d̃^(g))^T⟩_w    (C14)
    σ^(g+1) := σ^(g) exp[(c_s / d_σ)(‖s^(g+1)‖ / E[‖N(0, I)‖] − 1)]    (C15)
    g := g + 1    (C16)
Until (termination condition(s) fulfilled)    (C17)

Fig. 1. Pseudocode of the CMA-ES with rank > 1 weighted covariance matrix update.

[…] performed in Fig. 1 and show that those lines are mathematically equivalent to the original CMA-ES [1]. In order to have a clear notion of generational changes, a generation counter (g) is used even though it is not necessarily needed for the functioning in real CMA-ES implementations.

A CMA-ES generation cycle (also referred to as a generation) is performed within the repeat-until loop (C2–C17).
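For readers who prefer code to pseudocode, the following is a minimal numpy sketch of one generation cycle following lines (C3)–(C16) of Fig. 1. It assumes minimization and equal recombination weights, and it uses illustrative learning-rate settings (c_s, c_p, c_1, c_w, d_sigma) rather than the recommended strategy parameters of [1] or of this paper.

```python
import numpy as np

def cma_es_generation(y, sigma, s, p, C, f, lam, mu, rng):
    """One CMA-ES generation following lines (C3)-(C16) of Fig. 1 (minimization).

    Minimal sketch: equal recombination weights and illustrative learning
    rates are used instead of the recommended strategy-parameter settings.
    """
    N = y.size
    w = np.full(mu, 1.0 / mu)                  # equal weights (assumption)
    mu_eff = 1.0 / np.sum(w ** 2)
    c_s = c_p = 4.0 / (N + 4.0)                # illustrative learning rates
    c_1 = 2.0 / ((N + 1.3) ** 2 + mu_eff)
    c_w = min(1.0 - c_1, 2.0 * mu_eff / ((N + 2.0) ** 2 + mu_eff))
    d_sigma = 1.0 + c_s
    chi_N = np.sqrt(N) * (1 - 1 / (4 * N) + 1 / (21 * N ** 2))  # E[||N(0,I)||]

    # (C3) matrix square root of C via spectral decomposition
    eigvals, B = np.linalg.eigh(C)
    M = B @ np.diag(np.sqrt(np.maximum(eigvals, 0.0))) @ B.T

    # (C4)-(C9) sample and evaluate lambda offspring
    Z = rng.standard_normal((lam, N))          # z-vectors
    D = Z @ M.T                                # d = M z
    Y = y + sigma * D
    fvals = np.array([f(yi) for yi in Y])

    # (C10)-(C11) select the mu best offspring and recombine
    idx = np.argsort(fvals)[:mu]
    z_w = w @ Z[idx]                           # <z>_w
    d_w = w @ D[idx]                           # <d>_w
    y_new = y + sigma * d_w

    # (C12)-(C13) the two evolution paths
    s_new = (1 - c_s) * s + np.sqrt(mu_eff * c_s * (2 - c_s)) * z_w
    p_new = (1 - c_p) * p + np.sqrt(mu_eff * c_p * (2 - c_p)) * d_w

    # (C14) covariance update: rank-1 term plus weighted rank-mu term
    rank_mu = sum(wi * np.outer(di, di) for wi, di in zip(w, D[idx]))
    C_new = (1 - c_1 - c_w) * C + c_1 * np.outer(p_new, p_new) + c_w * rank_mu

    # (C15) cumulative step-size adaptation
    sigma_new = sigma * np.exp(c_s / d_sigma * (np.linalg.norm(s_new) / chi_N - 1))
    return y_new, sigma_new, s_new, p_new, C_new
```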
Recommended publications
  • Conjugate Gradient Method (Part 4): Pre-Conditioning; Nonlinear Conjugate Gradient Method
    18-660: Numerical Methods for Engineering Design and Optimization. Xin Li, Department of ECE, Carnegie Mellon University, Pittsburgh, PA 15213.

    Overview: Conjugate Gradient Method (Part 4): pre-conditioning; nonlinear conjugate gradient method.

    Conjugate Gradient Method
    Step 1: start from an initial guess X^(0), and set k = 0.
    Step 2: calculate D^(0) = R^(0) = B − A X^(0).
    Step 3: update the solution X^(k+1) = X^(k) + µ^(k) D^(k), where µ^(k) = (D^(k)T R^(k)) / (D^(k)T A D^(k)).
    Step 4: calculate the residual R^(k+1) = R^(k) − µ^(k) A D^(k).
    Step 5: determine the search direction D^(k+1) = R^(k+1) + β_(k+1,k) D^(k), where β_(k+1,k) = (R^(k+1)T R^(k+1)) / (D^(k)T R^(k)).
    Step 6: set k = k + 1 and go to Step 3.

    Convergence rate: ‖X^(k+1) − X‖ ≤ [(κ(A) − 1) / (κ(A) + 1)]^k · ‖X^(0) − X‖. The conjugate gradient method has slow convergence if κ(A) is large, i.e., AX = B is ill-conditioned. In this case, we want to improve the convergence rate by pre-conditioning.

    Pre-Conditioning. Key idea: convert AX = B to another equivalent equation ÃX̃ = B̃ and solve ÃX̃ = B̃ by the conjugate gradient method. Important constraints when constructing ÃX̃ = B̃: Ã is symmetric and positive definite (so that we can solve it by the conjugate gradient method), and Ã has a small condition number (so that we can achieve fast convergence).

    Pre-Conditioning: starting from AX = B, multiply by L^(−1) to obtain L^(−1)A X = L^(−1)B, and rewrite it as (L^(−1) A L^(−T)) (L^T X) = L^(−1) B, i.e. Ã X̃ = B̃. Here L^(−1) A L^(−T) is symmetric and positive definite if A is symmetric and positive definite: (L^(−1) A L^(−T))^T = L^(−1) A L^(−T), and X^T L^(−1) A L^(−T) X = (L^(−T) X)^T A (L^(−T) X) > 0.
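A minimal Python transcription of Steps 1–6 above (plain conjugate gradient, without pre-conditioning); the function name and tolerance handling are illustrative additions. For the pre-conditioned variant one would run the same loop on Ã = L^(−1)AL^(−T) and B̃ = L^(−1)B and recover X = L^(−T)X̃.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient following Steps 1-6 of the slides."""
    x = np.zeros(b.size) if x0 is None else x0.copy()
    r = b - A @ x                          # Step 2: initial residual
    d = r.copy()                           # Step 2: initial search direction
    for _ in range(max_iter):
        Ad = A @ d
        mu = (d @ r) / (d @ Ad)            # Step 3: step length
        x = x + mu * d                     # Step 3: update solution
        r_new = r - mu * Ad                # Step 4: update residual
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (d @ r)   # Step 5: coefficient as on the slide
        d = r_new + beta * d               # Step 5: new search direction
        r = r_new                          # Step 6: next iteration
    return x
```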
  • Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space
    Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space
    Matthew C. Fontaine (Viterbi School of Engineering, University of Southern California, Los Angeles, CA, [email protected]); Julian Togelius (Tandon School of Engineering, New York University, New York City, NY, [email protected]); Stefanos Nikolaidis (Viterbi School of Engineering, University of Southern California, Los Angeles, CA, [email protected]); Amy K. Hoover (Ying Wu College of Computing, New Jersey Institute of Technology, Newark, NJ, [email protected])

    ABSTRACT. We focus on the challenge of finding a diverse collection of quality solutions on complex continuous domains. While quality diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are designed to generate a diverse range of solutions, these algorithms require a large number of evaluations for exploration of continuous spaces. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments based on standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall […]

    Figure 1: Comparing Hearthstone Archives. Sample archives for both MAP-Elites and CMA-ME from the Hearthstone experiment. Our new method, CMA-ME, both fills more cells in behavior space and finds higher quality policies to play Hearthstone than MAP-Elites. Each grid cell is an elite (high performing policy) and the intensity value represents the win rate across 200 games against difficult opponents.
  • AI, Robots, and Swarms: Issues, Questions, and Recommended Studies
    AI, Robots, and Swarms Issues, Questions, and Recommended Studies Andrew Ilachinski January 2017 Approved for Public Release; Distribution Unlimited. This document contains the best opinion of CNA at the time of issue. It does not necessarily represent the opinion of the sponsor. Distribution Approved for Public Release; Distribution Unlimited. Specific authority: N00014-11-D-0323. Copies of this document can be obtained through the Defense Technical Information Center at www.dtic.mil or contact CNA Document Control and Distribution Section at 703-824-2123. Photography Credits: http://www.darpa.mil/DDM_Gallery/Small_Gremlins_Web.jpg; http://4810-presscdn-0-38.pagely.netdna-cdn.com/wp-content/uploads/2015/01/ Robotics.jpg; http://i.kinja-img.com/gawker-edia/image/upload/18kxb5jw3e01ujpg.jpg Approved by: January 2017 Dr. David A. Broyles Special Activities and Innovation Operations Evaluation Group Copyright © 2017 CNA Abstract The military is on the cusp of a major technological revolution, in which warfare is conducted by unmanned and increasingly autonomous weapon systems. However, unlike the last “sea change,” during the Cold War, when advanced technologies were developed primarily by the Department of Defense (DoD), the key technology enablers today are being developed mostly in the commercial world. This study looks at the state-of-the-art of AI, machine-learning, and robot technologies, and their potential future military implications for autonomous (and semi-autonomous) weapon systems. While no one can predict how AI will evolve or predict its impact on the development of military autonomous systems, it is possible to anticipate many of the conceptual, technical, and operational challenges that DoD will face as it increasingly turns to AI-based technologies.
  • Hybrid Algorithms for Efficient Cholesky Decomposition and Matrix Inverse Using Multicore CPUs with GPU Accelerators
    Hybrid Algorithms for Efficient Cholesky Decomposition and Matrix Inverse using Multicore CPUs with GPU Accelerators
    Gary Macindoe
    A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of UCL. 2013

    I, Gary Macindoe, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

    Abstract
    The use of linear algebra routines is fundamental to many areas of computational science, yet their implementation in software still forms the main computational bottleneck in many widely used algorithms. In machine learning and computational statistics, for example, the use of Gaussian distributions is ubiquitous, and routines for calculating the Cholesky decomposition, matrix inverse and matrix determinant must often be called many thousands of times for common algorithms, such as Markov chain Monte Carlo. These linear algebra routines consume most of the total computational time of a wide range of statistical methods, and any improvements in this area will therefore greatly increase the overall efficiency of algorithms used in many scientific application areas. The importance of linear algebra algorithms is clear from the substantial effort that has been invested over the last 25 years in producing low-level software libraries such as LAPACK, which generally optimise these linear algebra routines by breaking up a large problem into smaller problems that may be computed independently. The performance of such libraries is however strongly dependent on the specific hardware available. LAPACK was originally developed for single core processors with a memory hierarchy, whereas modern day computers often consist of mixed architectures, with large numbers of parallel cores and graphics processing units (GPU) being used alongside traditional CPUs.
  • Long Term Memory Assistance for Evolutionary Algorithms
    mathematics
    Article
    Long Term Memory Assistance for Evolutionary Algorithms
    Matej Črepinšek 1,*, Shih-Hsi Liu 2, Marjan Mernik 1 and Miha Ravber 1
    1 Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia; [email protected] (M.M.); [email protected] (M.R.)
    2 Department of Computer Science, California State University Fresno, Fresno, CA 93740, USA; [email protected]
    * Correspondence: [email protected]
    Received: 7 September 2019; Accepted: 12 November 2019; Published: 18 November 2019

    Abstract: Short term memory that records the current population has been an inherent component of Evolutionary Algorithms (EAs). As hardware technologies advance currently, inexpensive memory with massive capacities could become a performance boost to EAs. This paper introduces a Long Term Memory Assistance (LTMA) that records the entire search history of an evolutionary process. With LTMA, individuals already visited (i.e., duplicate solutions) do not need to be re-evaluated, and thus, resources originally designated to fitness evaluations could be reallocated to continue search space exploration or exploitation. Three sets of experiments were conducted to prove the superiority of LTMA. In the first experiment, it was shown that LTMA recorded at least 50% more duplicate individuals than a short term memory. In the second experiment, ABC and jDElscop were applied to the CEC-2015 benchmark functions. By avoiding fitness re-evaluation, LTMA improved execution time of the most time consuming problems F03 and F05 between 7% and 28% and 7% and 16%, respectively. In the third experiment, a hard real-world problem for determining soil models' parameters, LTMA improved execution time between 26% and 69%.
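The core idea of avoiding re-evaluation of duplicate solutions can be sketched as a memoising wrapper around the fitness function; this is only an illustration of the principle, not the LTMA data structures used in the paper.

```python
def make_memoized_fitness(f):
    """Wrap a fitness function with a long-term memory of already evaluated
    solutions so that duplicates are not re-evaluated (illustrative sketch)."""
    archive = {}

    def evaluate(x):
        key = tuple(x)            # hashable representation of the candidate
        if key not in archive:    # only unseen candidates cost a real evaluation
            archive[key] = f(x)
        return archive[key]

    return evaluate

# Usage: fitness = make_memoized_fitness(expensive_objective)
```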
  • Matrix Inversion Using Cholesky Decomposition
    Matrix Inversion Using Cholesky Decomposition
    Aravindh Krishnamoorthy, Deepak Menon
    ST-Ericsson India Private Limited, Bangalore
    [email protected], [email protected]

    Abstract—In this paper we present a method for matrix inversion based on Cholesky decomposition with reduced number of operations by avoiding computation of intermediate results; further, we use fixed point simulations to compare the numerical accuracy of the method.

    Keywords—matrix, inversion, Cholesky, LDL.

    I. INTRODUCTION
    Matrix inversion techniques based on Cholesky decomposition and the related LDL decomposition are efficient techniques widely used for inversion of positive-definite/symmetric matrices across multiple fields. Existing matrix inversion algorithms based on Cholesky decomposition use either equation solving [3] or triangular matrix operations [4], with the most efficient implementation requiring … operations. In this paper we propose an inversion algorithm which reduces the number of operations by 16-17% compared to the existing algorithms by avoiding computation of some known intermediate results. In section 2 of this paper we review the Cholesky and LDL decomposition techniques, and discuss solutions to linear systems based on them.

    Upper triangular elements, i.e.: … (4). Note that since older values of a_ii aren't required for computing newer elements, they may be overwritten by the value of r_ii; hence, the algorithm may be performed in-place using the same memory for matrices A and R. Cholesky decomposition is of order … and requires … operations. Matrix inversion based on Cholesky decomposition is numerically stable for well conditioned matrices. If …, with …, is the linear system with … variables, and … satisfies the requirement for Cholesky decomposition, we can rewrite the linear system as … (5). By letting …, we have … (6).
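For reference, the standard (non-optimized) way to invert a symmetric positive-definite matrix through its Cholesky factor with off-the-shelf routines looks as follows; this is not the reduced-operation algorithm proposed in the paper.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def spd_inverse(A):
    """Invert an SPD matrix by solving A X = I with its Cholesky factor."""
    c, low = cho_factor(A)                 # Cholesky factorization of A
    return cho_solve((c, low), np.eye(A.shape[0]))

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
print(np.allclose(spd_inverse(A) @ A, np.eye(2)))   # True
```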
  • AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization
    AMS526: Numerical Analysis I (Numerical Linear Algebra)
    Lecture 3: Positive-Definite Systems; Cholesky Factorization
    Xiangmin Jiao, Stony Brook University

    Symmetric Positive-Definite Matrices
    A symmetric matrix A ∈ R^(n×n) is symmetric positive definite (SPD) if x^T A x > 0 for x ∈ R^n \ {0}. A Hermitian matrix A ∈ C^(n×n) is Hermitian positive definite (HPD) if x* A x > 0 for x ∈ C^n \ {0}. SPD matrices have positive real eigenvalues and orthogonal eigenvectors. Note: most textbooks only talk about SPD or HPD matrices, but a positive-definite matrix does not need to be symmetric or Hermitian! A real matrix A is positive definite iff A + A^T is SPD. If x^T A x ≥ 0 for x ∈ R^n \ {0}, then A is said to be positive semidefinite.

    Properties of Symmetric Positive-Definite Matrices
    An SPD matrix often arises as the Hessian matrix of some convex functional, e.g., in least squares problems and partial differential equations. If A is SPD, then A is nonsingular. Let X be any n × m matrix with full rank and n ≥ m. Then X^T X is symmetric positive definite, and X X^T is symmetric positive semidefinite. If A is n × n SPD and X ∈ R^(n×m) has full rank and n ≥ m, then X^T A X is SPD. Any principal submatrix (picking some rows and corresponding columns) of A is SPD, and a_ii > 0.

    Cholesky Factorization
    If A is symmetric positive definite, then there is a factorization of A, A = R^T R, where R is upper triangular and all its diagonal entries are positive. Note: the textbook calls this "Cholesky …
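A quick numerical illustration of the X^T X and X X^T properties listed above; the matrix sizes and the random seed are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))            # full column rank with probability 1

A = X.T @ X                                # should be SPD
print(np.allclose(A, A.T))                 # symmetric
print(np.all(np.linalg.eigvalsh(A) > 0))   # strictly positive eigenvalues

B = X @ X.T                                # 5x5 but only rank 3: semidefinite
print(np.all(np.linalg.eigvalsh(B) > -1e-12))  # nonnegative eigenvalues (up to round-off)
```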
  • 1. Positive Definite Matrices: a matrix A is positive definite if x^T A x > 0 for all nonzero x
    CME 302: NUMERICAL LINEAR ALGEBRA, FALL 2005/06, LECTURE 8
    GENE H. GOLUB

    1. Positive Definite Matrices
    A matrix A is positive definite if x^T A x > 0 for all nonzero x. A positive definite matrix has real and positive eigenvalues, and its leading principal submatrices all have positive determinants. From the definition, it is easy to see that all diagonal elements are positive. To solve the system Ax = b where A is positive definite, we can compute the Cholesky decomposition A = F^T F where F is upper triangular. This decomposition exists if and only if A is symmetric and positive definite. In fact, attempting to compute the Cholesky decomposition of A is an efficient method for checking whether A is symmetric positive definite. It is important to distinguish the Cholesky decomposition from the square root factorization. A square root of a matrix A is defined as a matrix S such that S^2 = SS = A. Note that the matrix F in A = F^T F is not the square root of A, since it does not hold that F^2 = A unless A is a diagonal matrix. The square root of a symmetric positive definite A can be computed by using the fact that A has an eigendecomposition A = UΛU^T, where Λ is a diagonal matrix whose diagonal elements are the positive eigenvalues of A and U is an orthogonal matrix whose columns are the eigenvectors of A. It follows that A = UΛU^T = (UΛ^(1/2)U^T)(UΛ^(1/2)U^T) = SS, and so S = UΛ^(1/2)U^T is a square root of A.

    2. The Cholesky Decomposition
    The Cholesky decomposition can be computed directly from the matrix equation A = F^T F.
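The distinction between the Cholesky factor and the matrix square root can be checked numerically; the following sketch constructs an SPD matrix for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)              # symmetric positive definite by construction

# Cholesky decomposition A = F^T F (F upper triangular); numpy returns the lower factor L = F^T
L = np.linalg.cholesky(A)
F = L.T
print(np.allclose(F.T @ F, A))           # True
print(np.allclose(F @ F, A))             # False in general: F is not a square root of A

# Square root via the eigendecomposition A = U diag(lam) U^T, S = U diag(sqrt(lam)) U^T
lam, U = np.linalg.eigh(A)
S = U @ np.diag(np.sqrt(lam)) @ U.T
print(np.allclose(S @ S, A))             # True: S is a square root of A
```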
  • Cholesky Decomposition 1 Cholesky Decomposition
    Cholesky decomposition 1 Cholesky decomposition In linear algebra, the Cholesky decomposition or Cholesky triangle is a decomposition of a symmetric, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. It was discovered by André-Louis Cholesky for real matrices and is an example of a square root of a matrix. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.[1] Statement If A has real entries and is symmetric (or more generally, is Hermitian) and positive definite, then A can be decomposed as where L is a lower triangular matrix with strictly positive diagonal entries, and L* denotes the conjugate transpose of L. This is the Cholesky decomposition. The Cholesky decomposition is unique: given a Hermitian, positive-definite matrix A, there is only one lower triangular matrix L with strictly positive diagonal entries such that A = LL*. The converse holds trivially: if A can be written as LL* for some invertible L, lower triangular or otherwise, then A is Hermitian and positive definite. The requirement that L have strictly positive diagonal entries can be dropped to extend the factorization to the positive-semidefinite case. The statement then reads: a square matrix A has a Cholesky decomposition if and only if A is Hermitian and positive semi-definite. Cholesky factorizations for positive-semidefinite matrices are not unique in general. In the special case that A is a symmetric positive-definite matrix with real entries, L can be assumed to have real entries as well.
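A small numerical illustration of the statement A = LL* for a Hermitian positive-definite matrix; the example matrix is constructed ad hoc for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = B @ B.conj().T + 3 * np.eye(3)       # Hermitian and positive definite by construction

L = np.linalg.cholesky(A)                # lower triangular Cholesky factor
print(np.allclose(A, L @ L.conj().T))    # True: A = L L*
print(np.allclose(np.diag(L).imag, 0))   # True: diagonal entries are real (and positive)
```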
  • The CMA Evolution Strategy: a Tutorial
    The CMA Evolution Strategy: A Tutorial
    Nikolaus Hansen
    June 28, 2011

    Contents
      Nomenclature  3
      0 Preliminaries  4
        0.1 Eigendecomposition of a Positive Definite Matrix  5
        0.2 The Multivariate Normal Distribution  6
        0.3 Randomized Black Box Optimization  7
        0.4 Hessian and Covariance Matrices  8
      1 Basic Equation: Sampling  8
      2 Selection and Recombination: Moving the Mean  9
      3 Adapting the Covariance Matrix  10
        3.1 Estimating the Covariance Matrix From Scratch  10
        3.2 Rank-µ-Update  12
        3.3 Rank-One-Update  14
          3.3.1 A Different Viewpoint  14
          3.3.2 Cumulation: Utilizing the Evolution Path  16
        3.4 Combining Rank-µ-Update and Cumulation  17
      4 Step-Size Control  17
      5 Discussion  21
      A Algorithm Summary: The CMA-ES  25
      B Implementational Concerns  27
        B.1 Multivariate normal distribution  28
        B.2 Strategy internal numerical effort  28
        B.3 Termination criteria  28
        B.4 Flat fitness  29
        B.5 Boundaries and Constraints  29
      C MATLAB Source Code  31
      D Reformulation of Learning Parameter ccov  33

    Nomenclature
    We adopt the usual vector notation, where bold letters, v, are column vectors, capital bold letters, A, are matrices, and a transpose is denoted by v^T. A list of used abbreviations and symbols is given in alphabetical order.

    Abbreviations: CMA, Covariance Matrix Adaptation; EMNA, Estimation of Multivariate Normal Algorithm; ES, Evolution Strategy; (µ/µ_{I,W}, λ)-ES, Evolution Strategy with µ parents, with recombination of all µ parents, either Intermediate or Weighted, and λ offspring; RHS, Right Hand Side.

    Greek symbols: λ ≥ 2, population size, sample size, number of offspring, see (5).
  • A Covariance Matrix Self-Adaptation Evolution Strategy for Optimization Under Linear Constraints Patrick Spettel, Hans-Georg Beyer, and Michael Hellwig
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. XX, NO. X, MONTH XXXX

    A Covariance Matrix Self-Adaptation Evolution Strategy for Optimization under Linear Constraints
    Patrick Spettel, Hans-Georg Beyer, and Michael Hellwig

    Abstract—This paper addresses the development of a covariance matrix self-adaptation evolution strategy (CMSA-ES) for solving optimization problems with linear constraints. The proposed algorithm is referred to as Linear Constraint CMSA-ES (lcCMSA-ES). It uses a specially built mutation operator together with repair by projection to satisfy the constraints. The lcCMSA-ES evolves itself on a linear manifold defined by the constraints. The objective function is only evaluated at feasible search points (interior point method). This is a property often required in application domains such as simulation optimization and finite element methods. The algorithm is tested on a variety of different test problems revealing considerable results.

    Index Terms—Constrained Optimization, Covariance Matrix Self-Adaptation Evolution Strategy, Black-Box Optimization Benchmarking, Interior Point Optimization Method

    […] functions cannot be expressed in terms of (exact) mathematical expressions. Moreover, if that information is incomplete or if that information is hidden in a black-box, EAs are a good choice as well. Such methods are commonly referred to as direct search, derivative-free, or zeroth-order methods [15], [16], [17], [18]. In fact, the unconstrained case has been studied well. In addition, there is a wealth of proposals in the field of Evolutionary Computation dealing with constraints in real-parameter optimization, see e.g. [19]. This field is mainly dominated by Particle Swarm Optimization (PSO) algorithms and Differential Evolution (DE) [20], [21], [22]. For the case of constrained discrete optimization, it has been shown that turning constrained optimization problems into multi-objective optimization problems can achieve better performance than […]
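The general idea of repair by projection onto the linear manifold defined by the constraints can be illustrated as follows; this is only a generic orthogonal projection onto {y : Ay = b}, not the specially built mutation and repair operators of the lcCMSA-ES.

```python
import numpy as np

def project_onto_affine(x, A, b):
    """Orthogonally project x onto the affine manifold {y : A y = b}.

    Minimum-norm correction x - A^+ (A x - b), with A^+ the pseudo-inverse;
    an illustrative repair-by-projection step for linear equality constraints.
    """
    correction = np.linalg.pinv(A) @ (A @ x - b)
    return x - correction

A = np.array([[1.0, 1.0, 1.0]])        # single constraint: x1 + x2 + x3 = 1
b = np.array([1.0])
x_infeasible = np.array([0.7, 0.5, 0.2])
x_repaired = project_onto_affine(x_infeasible, A, b)
print(A @ x_repaired)                  # ~[1.0]: the repaired point is feasible
```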
  • Natural Evolution Strategies
    Natural Evolution Strategies
    Daan Wierstra, Tom Schaul, Jan Peters and Juergen Schmidhuber

    Abstract—This paper presents Natural Evolution Strategies (NES), a novel algorithm for performing real-valued 'black box' function optimization: optimizing an unknown objective function where algorithm-selected function measurements constitute the only information accessible to the method. Natural Evolution Strategies search the fitness landscape using a multivariate normal distribution with a self-adapting mutation matrix to generate correlated mutations in promising regions. NES shares this property with Covariance Matrix Adaption (CMA), an Evolution Strategy (ES) which has been shown to perform well on a variety of high-precision optimization tasks. The Natural Evolution Strategies algorithm, however, is simpler, less ad-hoc and more principled. Self-adaptation of the mutation matrix is derived using a Monte Carlo estimate of the natural gradient towards better expected fitness. By following the natural gradient instead of the 'vanilla' gradient, we can ensure efficient update steps while preventing early convergence due to overly greedy updates, resulting in reduced sensitivity […]

    […] be crucial to find the right domain-specific trade-off on issues such as convergence speed, expected quality of the solutions found and the algorithm's sensitivity to local suboptima on the fitness landscape. A variety of algorithms has been developed within this framework, including methods such as Simulated Annealing [5], Simultaneous Perturbation Stochastic Optimization [6], simple Hill Climbing, Particle Swarm Optimization [7] and the class of Evolutionary Algorithms, of which Evolution Strategies (ES) [8], [9], [10] and in particular its Covariance Matrix Adaption (CMA) instantiation [11] are of great interest to us. Evolution Strategies, so named because of their inspiration from natural Darwinian evolution, generally produce consecutive generations of samples.
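The following sketch illustrates a Monte Carlo 'vanilla' search-gradient step for the mean of an isotropic Gaussian search distribution only; NES itself follows the natural gradient and also adapts a full mutation matrix, which is not reproduced here. All names and parameter choices are illustrative.

```python
import numpy as np

def search_gradient_step(f, mu, sigma, n_samples=100, lr=0.1, rng=None):
    """One Monte Carlo search-gradient step on the mean of N(mu, sigma^2 I),
    maximizing the expected fitness E[f]. Simplified sketch (plain gradient)."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((n_samples, mu.size))
    samples = mu + sigma * eps
    fitness = np.array([f(x) for x in samples])
    # grad_mu E[f] ~= mean of f(x) * grad_mu log p(x), with grad_mu log p = (x - mu) / sigma^2
    grad = (fitness[:, None] * (samples - mu)).mean(axis=0) / sigma ** 2
    return mu + lr * grad
```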