ACCEPTED FOR: IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

Simplify Your Covariance Matrix Adaptation Evolution Strategy

Hans-Georg Beyer and Bernhard Sendhoff, Senior Member, IEEE

[H.-G. Beyer is with the Research Center Process and Product Engineering at the Vorarlberg University of Applied Sciences, Dornbirn, Austria, Email: [email protected]. B. Sendhoff is with the Honda Research Institute Europe GmbH, Offenbach/Main, Germany, Email: [email protected].]

Abstract—The standard Covariance Matrix Adaptation Evolution Strategy (CMA-ES) comprises two evolution paths, one for the learning of the mutation strength and one for the rank-1 update of the covariance matrix. In this paper it is shown that one can approximately transform this algorithm in such a manner that one of the evolution paths and the covariance matrix itself disappear. That is, the covariance update and the covariance matrix square root operations are no longer needed in this novel so-called Matrix Adaptation (MA) ES. The MA-ES performs nearly as well as the original CMA-ES. This is shown by empirical investigations considering the evolution dynamics and the empirical expected runtime on a set of standard test functions. Furthermore, it is shown that the MA-ES can be used as search engine in a Bi-Population (BiPop) ES. The resulting BiPop-MA-ES is benchmarked using the BBOB COCO framework and compared with the performance of the CMA-ES-v3.61 production code. It is shown that this new BiPop-MA-ES – while algorithmically simpler – performs nearly equally well as the CMA-ES-v3.61 code.

Index Terms—Matrix Adaptation Evolution Strategies, Black Box Optimization Benchmarking.

I. INTRODUCTION

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) has received considerable attention as an algorithm for unconstrained real-parameter optimization, i.e. solving the problem

    ŷ = arg opt_{y ∈ R^N} f(y),    (1)

since its publication in its fully developed form including rank-µ update in [1]. While there are proposals regarding modifications of the original CMA-ES, such as the (1+1)-CMA-ES [2] and its latest extension in [19], or the active CMA-ES [3], or the CMSA-ES [4], the design presented in [1] has only slightly changed during the years, and state-of-the-art presentations such as [5] still rely on that basic design. Taking the advent of the so-called Natural Evolution Strategies (NES) into account, e.g. [6], one sees that even a seemingly principled design paradigm such as the information gain constrained evolution on statistical manifolds did not cause a substantial change in the CMA-ES design. Actually, in order to get NES competitive, those designers had to borrow from and rely on the ideas/principles empirically found by the CMA-ES designers. Yet, it is somehow remarkable why the basic framework of CMA-ES has not undergone a deeper scrutiny to call the algorithm's peculiarities, such as the covariance matrix adaptation, the evolution path, and the cumulative step-size adaptation (CSA), into question. Notably, in [4] a first attempt has been made to replace the CSA mutation strength control by the classical mutative self-adaptation. While getting rid of any kind of evolution path statistics and thus yielding a much simpler strategy¹, the resulting rank-µ update strategy, the so-called Covariance Matrix Self-Adaptation ES (CMSA-ES), does not fully reach the performance of the original CMA-ES in the case of small population sizes. Furthermore, some of the reported performance advantages in [4] were due to a wrongly implemented stalling of the covariance matrix update in the CMA-ES implementation used.

[Footnote 1: By simplicity we mean a reduced complexity of code and especially a decreased number of strategy specific parameters to be fixed by the algorithm designer. Furthermore, the remaining strategy specific parameters have been determined rather by first principles than by empirical parameter tuning studies.]

Meanwhile, our theoretical understanding of the evolution dynamics taking place in CMSA- and CMA-ES has advanced. Basically, two reasons for the superior CMA-ES performance can be identified in the case of small population sizes:

• Concentrate evolution along a predicted direction. Using the evolution path information for the covariance matrix update (this rank-1 update was originally used in the first CMA-ES version [7] as the only update) can be regarded as some kind of time series prediction of the evolution of the parent in the R^N search space. That is, the evolution path carries the information in which direction the search steps were most successful. This directional information may be regarded as the most promising one and can be used in the CMA-ES to shrink the evolution of the covariance matrix.² As a result, optimization in landscape topologies with a predominant search direction (such as the Cigar test function or the Rosenbrock function) can benefit from the path cumulation information.

[Footnote 2: Shrinking is a well-known technique for covariance matrix estimation in order to control the undesired growth of the covariance matrix [8]. Considering the covariance matrix update in the CMA-ES from the perspective of shrinking has been done first by Meyer-Nieberg and Kropat [9].]

• Increased mutation strength. As has been shown by analyzing the dynamics of the CSA on the ellipsoid model in [10], the path-length based mutation strength control yields larger steady state mutation strengths, up to a factor of µ (parental population size), compared to ordinary self-adaptive mutation strength control (in the case of non-noisy objective functions). Due to the larger mutation strengths realized, the CSA-based approach can better take advantage of the genetic
repair effect provided by the intermediate (or weighted) recombination.

The performance advantage vanishes, however, when the population size is chosen sufficiently large. However, considering the CMA-ES as a general purpose algorithm, the strategy should cope with both small and large population sizes. Therefore, it seems that both the rank-1 and the rank-µ update are needed. This still calls into question whether it is necessary to have two evolution paths and an update of the covariance matrix at all.

In this paper we will show that one can drop one of the evolution paths and remove the covariance matrix totally, thus removing a "p" and the "C" from the CMA-ES without significantly worsening the performance of the resulting "Matrix Adaptation" (MA)-ES. As a byproduct of the analysis, which is necessary to derive the MA-ES from the CMA-ES, one can draw a clearer picture of the CMA-ES, which is often regarded as a rather complicated evolutionary algorithm.

This work is organized as follows. First, the CMA-ES algorithm is presented in a manner that allows for a simpler analysis of the pseudocode lines. As the next step, the p-evolution path (realized by line C13, Fig. 1, see Section II for details) will be shown to be similar to that of the s-evolution path (realized by line C12, Fig. 1). Both paths can be approximately transformed (asymptotically exact for search space dimensionality N → ∞) into each other by a linear transformation where the transformation matrix is just the matrix square root M of the covariance matrix C. After removing the p-path from the CMA-ES, the update of the C matrix will be replaced by transforming the covariance learning into a direct learning of the M matrix. Thus, one no longer needs the covariance matrix and operations such as Cholesky decomposition or spectral decomposition to calculate the matrix square root. This simplifies the resulting ES considerably, both from the viewpoint of the algorithmic "complexity" and the numerical algebra operations needed. Furthermore, the resulting M-update allows for a new interpretation of the CMA-ES working principles. However, since the derivations rely on assumptions that are only asymptotically exact for search space dimensionality N → ∞, numerical experiments are provided to show that the novel MA-ES performs nearly equally well as the original CMA-ES. Additionally, the CMA-ES and MA-ES will be used as "search engines" in the …

(µ/µ_w, λ)-CMA-ES

    Initialize y^(0), σ^(0), g := 0, p^(0) := 0, s^(0) := 0, C^(0) := I        (C1)
    Repeat                                                                     (C2)
        M^(g) := (C^(g))^(1/2)                                                 (C3)
        For l := 1 To λ                                                        (C4)
            z̃_l^(g) := N_l(0, I)                                              (C5)
            d̃_l^(g) := M^(g) z̃_l^(g)                                         (C6)
            ỹ_l^(g) := y^(g) + σ^(g) d̃_l^(g)                                  (C7)
            f̃_l^(g) := f(ỹ_l^(g))                                            (C8)
        End                                                                    (C9)
        SortOffspringPopulation                                                (C10)
        y^(g+1) := y^(g) + σ^(g) ⟨d̃^(g)⟩_w                                    (C11)
        s^(g+1) := (1 − c_s) s^(g) + sqrt(µ_eff c_s (2 − c_s)) ⟨z̃^(g)⟩_w      (C12)
        p^(g+1) := (1 − c_p) p^(g) + sqrt(µ_eff c_p (2 − c_p)) ⟨d̃^(g)⟩_w      (C13)
        C^(g+1) := (1 − c_1 − c_w) C^(g) + c_1 p^(g+1) (p^(g+1))^T
                   + c_w ⟨d̃^(g) (d̃^(g))^T⟩_w                                 (C14)
        σ^(g+1) := σ^(g) exp[(c_s / d_σ) (‖s^(g+1)‖ / E[‖N(0, I)‖] − 1)]       (C15)
        g := g + 1                                                             (C16)
    Until (termination condition(s) fulfilled)                                 (C17)

Fig. 1. Pseudocode of the CMA-ES with rank > 1 weighted covariance matrix update.

… performed in Fig. 1 and show that those lines are mathematically equivalent to the original CMA-ES [1]. In order to have a clear notion of generational changes, a generation counter (g) is used even though it is not necessarily needed for the functioning in real CMA-ES implementations. A CMA-ES generation cycle (also referred to as a generation) is performed within the repeat-until-loop (C2–C17).
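Read as plain update rules, the generation cycle of Fig. 1 translates almost directly into code. The following Python sketch mirrors lines C1–C17 for a minimization task; it is an illustration only. The offspring number λ, the recombination weights w, and the strategy constants c_s, c_p, c_1, c_w, d_σ are commonly used textbook-style choices and are assumptions here (the paper's own settings may differ); the matrix square root of line C3 is obtained via spectral decomposition, one of the options named above.

```python
import numpy as np

def cma_es(f, y0, sigma0, max_gen=300):
    """Minimal (mu/mu_w, lambda)-CMA-ES sketch following Fig. 1 (C1-C17).

    Strategy constants below are illustrative choices, not the exact
    settings of the original publication.
    """
    N = len(y0)
    lam = 4 + int(3 * np.log(N))           # offspring number (assumption)
    mu = lam // 2                          # parental population size
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                           # positive recombination weights
    mu_eff = 1.0 / np.sum(w ** 2)          # variance-effective selection mass
    c_s = (mu_eff + 2) / (N + mu_eff + 5)  # s-path constant (assumption)
    c_p = c_s                              # p-path constant (assumption)
    c_1 = 2.0 / ((N + 1.3) ** 2 + mu_eff)  # rank-1 learning rate
    c_w = min(1 - c_1,
              2 * (mu_eff - 2 + 1 / mu_eff) / ((N + 2) ** 2 + mu_eff))
    d_sigma = 1 + c_s                      # step-size damping
    chi_N = np.sqrt(N) * (1 - 1 / (4 * N) + 1 / (21 * N ** 2))  # E||N(0,I)||

    y, sigma = np.array(y0, dtype=float), float(sigma0)          # (C1)
    p, s, C = np.zeros(N), np.zeros(N), np.eye(N)
    rng = np.random.default_rng(0)

    for g in range(max_gen):                                     # (C2)/(C16)
        # (C3): matrix square root via spectral decomposition of C
        eigvals, B = np.linalg.eigh(C)
        M = B @ np.diag(np.sqrt(np.maximum(eigvals, 0.0))) @ B.T
        z = rng.standard_normal((lam, N))                        # (C5)
        d = z @ M.T                                              # (C6)
        offspring = y + sigma * d                                # (C7)
        fit = np.array([f(yi) for yi in offspring])              # (C8)
        idx = np.argsort(fit)[:mu]            # (C10): best mu, minimization
        z_w = w @ z[idx]                      # weighted recombinant <z>_w
        d_w = w @ d[idx]                      # weighted recombinant <d>_w
        y = y + sigma * d_w                                      # (C11)
        s = (1 - c_s) * s + np.sqrt(mu_eff * c_s * (2 - c_s)) * z_w  # (C12)
        p = (1 - c_p) * p + np.sqrt(mu_eff * c_p * (2 - c_p)) * d_w  # (C13)
        C = ((1 - c_1 - c_w) * C + c_1 * np.outer(p, p)              # (C14)
             + c_w * (d[idx].T * w) @ d[idx])
        sigma *= np.exp((c_s / d_sigma)
                        * (np.linalg.norm(s) / chi_N - 1))           # (C15)
    return y                                                         # (C17)

# Usage: minimize the sphere function f(y) = sum(y_i^2) in N = 5 dimensions.
y_best = cma_es(lambda y: np.sum(y ** 2), y0=np.ones(5), sigma0=1.0)
```

Selection in line C10 amounts to sorting the offspring by fitness and recombining the best µ with the weights w; the s-path (C12) drives the CSA step-size rule (C15), while the p-path (C13) feeds the rank-1 term of the covariance update (C14).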