Deriving and Improving CMA-ES with Information Geometric Trust Regions
Abbas Abdolmaleki (PARC, Palo Alto Research Center; IEETA, University of Aveiro; LIACC, University of Porto), [email protected]
Bob Price (PARC, Palo Alto Research Center), [email protected]
Nuno Lau (DETI, IEETA, University of Aveiro), [email protected]
Luis Paulo Reis (DSI, University of Minho; LIACC, University of Porto), [email protected]
Gerhard Neumann (CLAS, TU Darmstadt; University of Lincoln), [email protected]

ABSTRACT

CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need of extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance updates that incorporate the current samples as well as the evolution path, and its invariance properties. Its update rules are composed of well-established heuristics, and the theoretical foundations of some of these rules are also well understood. In this paper we fully derive all CMA-ES update rules within the framework of expectation-maximisation-based stochastic search algorithms using information-geometric trust regions. We show that the use of the trust region results in similar updates to CMA-ES for the mean and the covariance matrix, while it allows for the derivation of an improved update rule for the step-size. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES), is fully derived from first-order optimization principles and performs favourably in comparison to the standard CMA-ES algorithm.

KEYWORDS

Stochastic Search, Expectation Maximisation, Trust Regions

ACM Reference format:
Abbas Abdolmaleki, Bob Price, Nuno Lau, Luis Paulo Reis, and Gerhard Neumann. 2017. Deriving and Improving CMA-ES with Information Geometric Trust Regions. In Proceedings of the Genetic and Evolutionary Computation Conference 2017, Berlin, Germany, July 15–19, 2017 (GECCO '17), 8 pages. DOI: 10.475/123 4

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
GECCO '17, Berlin, Germany. © 2017 Copyright held by the owner/author(s). DOI: 10.475/123 4

1 INTRODUCTION

Stochastic search algorithms [10, 19] are black-box optimizers of a fitness function that is either unknown or too complex to be modelled explicitly. These algorithms only use the fitness values and don't require gradients or higher derivatives of the fitness function. Stochastic search algorithms typically maintain a search distribution over individuals or candidate solutions, which is typically a Gaussian distribution. This search distribution is used to generate a population of individuals which are evaluated by their corresponding fitness values. Subsequently, a new search distribution is computed by either computing gradient-based updates [19], expectation-maximisation-based updates [7, 12], evolutionary strategies [10], the cross-entropy method [14] or information-theoretic policy updates [1], such that the individuals with higher fitness will have better selection probability.

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is one of the most popular stochastic search algorithms [10]. It performs well in many tasks and does not require extensive parameter tuning. There are three ingredients for the success of CMA-ES. Firstly, the covariance matrix update efficiently combines the old covariance matrix with the sample covariance. Secondly, the evolution path, which stores the average update direction, is also incorporated in the covariance matrix to shape future update directions. Lastly, the step-size adaptation of CMA-ES scales the distribution efficiently, increasing it when subsequent updates are correlated and decreasing it otherwise. All these update rules are well-established heuristics. The foundations of some of these heuristics are also already theoretically well understood. However, a unique mathematical framework that explains all these update rules is so far still missing.

In contrast, expectation-maximisation-based algorithms [7, 12, 14] (Section 2.2) optimize a clearly defined objective, i.e., the maximization of a lower bound. The maximisation of the lower bound in each iteration is equivalent to a weighted maximum likelihood estimation (MLE) of the distribution. It is well known that ML estimation of the covariance matrix yields a degenerate, over-fitted distribution [2], which typically results in premature convergence of the algorithm. We extend the EM-based framework by incorporating information-geometric trust regions. Using trust regions restricts the change of the search distribution in each update. We implement such trust regions by bounding the Kullback-Leibler (KL) divergence¹ of the search distribution (Section 3). Using these trust regions, we can recover exactly the same form of covariance update as in CMA-ES, and the mean update of CMA-ES is a degenerate case of the trust-region update. Furthermore, we can derive an improved step-size update rule that is based on the same theoretical foundation as the remaining update rules. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES), performs favourably in comparison to CMA-ES in all tested scenarios, which include typical standard benchmark functions and complex policy optimization tasks from robotics.

¹For simplicity we use 'KL' and 'KL divergence' interchangeably.

1.1 Related Work

We will review CMA-ES and expectation-maximisation-based algorithms in the preliminaries section. In this section, we briefly review other existing stochastic search algorithms.

The Natural Evolution Strategy (NES) uses the natural gradient to optimize the expected fitness value under the search distribution [19]. The natural gradient has been shown to outperform the standard gradient in many applications in machine learning [6]. The intuition of the natural gradient is that we want to obtain an update vector for the parameters of the search distribution that optimises the expected fitness while the KL divergence between the new and the current search distribution is bounded. To obtain this update vector, a second-order approximation of the KL, which is equivalent to the Fisher information matrix, is used. The resulting natural gradient is obtained by multiplying the standard gradient with the inverse of the Fisher matrix. Yet, in contrast to our algorithm, algorithms of the NES family do not exactly enforce a desired bound on the KL divergence but use a constant learning rate [19].

The use of trust regions is a common approach in policy search to obtain a stable policy update and to avoid premature convergence [1, 17, 18]. The model-based relative entropy stochastic search algorithm (MORE) [1] uses a local quadratic surrogate to update its Gaussian search distribution. As these surrogates can be inaccurate, MORE doesn't exploit the surrogate greedily but uses a trust region in the form of a KL bound. This trust region defines how much the algorithm can exploit the quadratic model. Moreover, MORE explicitly bounds the entropy loss to avoid premature convergence. This method has been shown to be very competitive when the hyper-parameters are set correctly. Similar trust regions have been used in other policy search methods such as Relative Entropy Policy Search [17]. Similar to these methods, our framework also uses a KL bound to define a trust region. However, in contrast to all these policy search methods, we firstly use this bound in an EM-based framework and secondly we use the reverse KL bound.

2 PRELIMINARIES

Stochastic search algorithms seek to find a distribution over individuals x, denoted π(x;θ), that maximises the expected fitness

    E[f(x)] = ∫ π(x;θ) f(x) dx.    (1)

At each new iteration t+1, N individuals x_i are drawn from the current Gaussian distribution π(x;θ_t), with θ_t = {µ_t, σ_t, Σ_t}. These individuals are evaluated and, together with their fitness values, form a dataset {x_i, f(x_i)}_{i=1,...,N} that is subsequently used to compute a new Gaussian search distribution π(x;θ_{t+1}), with θ_{t+1} = {µ_{t+1}, σ_{t+1}, Σ_{t+1}}, such that better individuals will have higher selection probability.

2.1 Covariance Matrix Adaptation - Evolutionary Strategy

As CMA-ES will be our main baseline for comparisons, in this section we briefly explain the update rules the CMA-ES algorithm uses to obtain a new search distribution.

Fitness Transformation. CMA-ES uses a rank-preserving transformation of fitness values, which makes it invariant under monotone transformations of the fitness function [9]. In particular, CMA-ES gives zero weight to the worse half of the population, and the weight of the j-th best individual is proportional to

    w_j ∝ ln(N/2 + 0.5) − ln(j).    (2)

We will use the same weighting method in our algorithm.

Mean Update Rule. Using the weights, the next distribution mean µ_{t+1} is set to the weighted sum of the individuals, i.e.,

    µ_{t+1} = (1/Z) Σ_{i=1}^{N/2} w_i x_i,    Z = Σ_{i=1}^{N/2} w_i.

Evolution Path. The evolution path records the sum of consecutive update steps, i.e., of µ_{t+1} − µ_t. If consecutive update steps point in the same direction, i.e., they are correlated, their updates sum up. In contrast, if they are decorrelated, the update directions cancel each other out. Using the information from the evolution path leads to significant improvements in terms of convergence speed, as it enables the algorithm to exploit correlations between consecutive steps.

Covariance Matrix Update Rule. The covariance matrix update is computed using the local information about the fitness function
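To make the fitness transformation and mean update concrete, the sketch below implements the rank-based weights of Eq. (2) and the weighted mean over the better half of the population. The function names, the normalisation to unit sum, and the toy fitness in the test are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np

def cmaes_weights(n):
    """Rank-based weights of Eq. (2): zero for the worse half of the
    population, log-decreasing over the better half, normalised to sum to one."""
    m = n // 2
    w = np.log(n / 2 + 0.5) - np.log(np.arange(1, m + 1))
    return w / w.sum()

def mean_update(population, fitness, maximize=True):
    """Weighted mean of the better half: mu_{t+1} = sum_i w_i x_i / Z."""
    order = np.argsort(fitness)
    if maximize:
        order = order[::-1]                 # best (highest fitness) first
    w = cmaes_weights(len(population))
    elite = population[order[:len(w)]]      # keep the better half only
    return w @ elite                        # weights already sum to one
```

For N = 10 this yields five positive weights that decay logarithmically with rank, so the new mean is pulled most strongly towards the top-ranked individuals.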
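The accumulate-or-cancel behaviour of the evolution path can be illustrated with a plain exponentially decayed sum of mean shifts. The helper and the smoothing constant c below are illustrative only; CMA-ES uses specific path coefficients and an additional rescaling that are not reproduced here.

```python
import numpy as np

def accumulate_path(steps, c=0.3):
    """Exponentially decayed sum of consecutive update steps mu_{t+1} - mu_t."""
    p = np.zeros_like(steps[0])
    for delta in steps:
        p = (1.0 - c) * p + c * delta   # correlated steps reinforce p
    return p

# ten steps in the same direction: contributions accumulate
correlated = [np.array([1.0, 0.0])] * 10
# alternating steps: contributions largely cancel out
decorrelated = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])] * 5
```

Running both cases shows a path norm close to one for the correlated steps and close to zero for the alternating ones, which is exactly the signal step-size adaptation exploits.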
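The generic scheme of Section 2 — sample N individuals from a Gaussian, evaluate their fitness, and refit the distribution towards the better individuals — can be sketched as a toy loop. Note this is a simplified weighted maximum-likelihood (EM-style) baseline, not full CMA-ES: it adapts only the mean and sample covariance, with no evolution path, step-size control, or trust region, and the sphere fitness and all constants are illustrative.

```python
import numpy as np

def stochastic_search(fitness, dim, iters=100, pop_size=20, seed=0):
    """Toy Gaussian stochastic search: sample a population, rank it by
    fitness, and refit mean and covariance by weighted maximum likelihood."""
    rng = np.random.default_rng(seed)
    mu, cov = np.zeros(dim), np.eye(dim)
    m = pop_size // 2
    # rank-based weights in the spirit of Eq. (2), normalised to sum to one
    w = np.log(pop_size / 2 + 0.5) - np.log(np.arange(1, m + 1))
    w /= w.sum()
    for _ in range(iters):
        pop = rng.multivariate_normal(mu, cov, size=pop_size)
        f = np.array([fitness(x) for x in pop])
        elite = pop[np.argsort(f)[::-1][:m]]   # better half, best first
        diff = elite - mu                      # deviations from the *old* mean
        mu = mu + w @ diff                     # weighted mean update
        cov = (w[:, None] * diff).T @ diff + 1e-10 * np.eye(dim)
    return mu

# maximise f(x) = -||x - 1||^2; the optimum is the all-ones vector
best = stochastic_search(lambda x: -np.sum((x - 1.0) ** 2), dim=3)
```

Measuring the elite scatter around the old mean keeps the sampling variance tied to the step length, but the pure ML covariance still shrinks greedily — the over-fitting behaviour the paper's trust region is designed to control.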