Gaussian Adaptation As a Unifying Framework for Continuous Black-Box Optimization and Adaptive Monte Carlo Sampling
Zurich Open Repository and Archive, University of Zurich
Year: 2010
Conference or Workshop Item
ZORA URL: https://doi.org/10.5167/uzh-79215
Originally published at: Müller, Christian L.; Sbalzarini, Ivo F. (2010). Gaussian Adaptation as a unifying framework for continuous black-box optimization and adaptive Monte Carlo sampling. In: 2010 IEEE Congress on Evolutionary Computation (CEC), Barcelona, Spain, 18-23 July 2010, 1-8.
DOI: https://doi.org/10.1109/CEC.2010.5586491

Christian L. Müller, Ivo F. Sbalzarini
Institute of Theoretical Computer Science and Swiss Institute of Bioinformatics, ETH Zurich, CH-8092 Zürich, Switzerland

Abstract— We present a unifying framework for continuous optimization and sampling. This framework is based on Gaussian Adaptation (GaA), a search heuristic developed in the late 1960s. It is a maximum-entropy method that shares several features with the (1+1)-variant of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). The algorithm samples single candidate solutions from a multivariate normal distribution and continuously adapts the first and second moments. We present modifications that turn the algorithm into both a robust continuous black-box optimizer and, alternatively, an adaptive Random Walk Monte Carlo sampler. In black-box optimization, sample-point selection is controlled by a monotonically decreasing, fitness-dependent acceptance threshold. We provide general strategy parameter settings, stopping criteria, and restart mechanisms that render GaA quasi parameter-free. We also introduce Metropolis GaA (M-GaA), where sample-point selection is based on the Metropolis acceptance criterion. This turns GaA into a Monte Carlo sampler that is conceptually similar to the seminal Adaptive Proposal (AP) algorithm. We evaluate the performance of Restart GaA on the CEC 2005 benchmark suite. Moreover, we compare the efficacy of M-GaA to that of the Metropolis-Hastings and AP algorithms on selected target distributions.
I. INTRODUCTION

A large class of problems in science and engineering can be formulated as global optimization problems or as sampling problems. Global optimization is concerned with finding a single optimal solution or a set of optimal solutions for a given problem specification. Sampling consists of correctly drawing random samples from a given probability distribution. In many cases, optimization and sampling algorithms have to operate in a black-box scenario, where only zeroth-order information about the objective function or the target probability distribution is available. In black-box optimization, only objective function values can be obtained. Analytical gradients or Hessians are not available, or do not exist. Many practical applications, including parameter estimation in electrical or biological networks, are of this kind.

Indirect (or black-box) sampling is used when the target probability distribution is not explicitly known, or is known only up to a normalizing constant. This is often the case in Bayesian statistics and in statistical physics, where the unknown normalization constant is given by the partition function of the state space.

For both problem classes, Monte Carlo methods have become the prevalent computational paradigm. They rely on iterative random sampling in order to approximate the desired result. A crucial design decision is how the random samples are generated. In continuous spaces, multivariate Gaussian distributions are the standard choice. Several continuous black-box optimization methods, such as Simulated Annealing (SA) in general state spaces [1], Gaussian Adaptation (GaA) [2], and Evolution Strategies (ES), use Gaussian sampling to generate candidate solutions. For indirect sampling, Green and Han [3] were among the first to employ Gaussian distributions. In order to sample from a specific target distribution, their algorithm draws random variates from a Gaussian distribution and evaluates the target distribution at these sample points. A specific acceptance-rejection scheme, proposed by Metropolis et al. [4], guarantees that the process follows the desired target distribution. Methods of this type are generally referred to as Markov Chain Monte Carlo (MCMC) methods.
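The acceptance-rejection scheme of Metropolis et al. [4] admits a compact implementation. The following is a minimal Python sketch of Random Walk Metropolis sampling with a fixed isotropic Gaussian proposal; the target density `target_pdf`, the proposal scale `sigma`, and the chain length are illustrative assumptions for this example, not values taken from the paper.

```python
import numpy as np

def random_walk_metropolis(target_pdf, x0, sigma=0.5, n_samples=10000, seed=0):
    """Sample from an unnormalized target density with the Metropolis scheme [4].

    target_pdf: callable returning the (unnormalized) target density at a point
    x0:         starting point of the chain (NumPy array)
    sigma:      standard deviation of the isotropic Gaussian proposal
    """
    rng = np.random.default_rng(seed)
    x, fx = np.asarray(x0, dtype=float), target_pdf(x0)
    chain = []
    for _ in range(n_samples):
        # Propose a move from an isotropic Gaussian centered at the current state.
        y = x + sigma * rng.standard_normal(x.shape)
        fy = target_pdf(y)
        # Metropolis acceptance: uphill moves are always accepted, downhill
        # moves with probability fy / fx (symmetric proposal assumed).
        if rng.random() < fy / fx:
            x, fx = y, fy
        chain.append(x.copy())
    return np.asarray(chain)

# Example: sample a 2D standard Gaussian known only up to its normalization.
samples = random_walk_metropolis(lambda x: np.exp(-0.5 * np.dot(x, x)),
                                 x0=np.zeros(2))
```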
Both the first ES, Rechenberg's (1+1)-ES, and the standard Random Walk Metropolis sampling algorithm [5] use single samples from an isotropic multivariate Gaussian distribution. More recent algorithms constantly adapt the covariance matrix of the sampling distribution according to previously accepted samples. This includes optimization algorithms such as Hansen's ES with Covariance Matrix Adaptation (CMA-ES) [6] and Kjellström's GaA algorithm [2], [7]. An important conceptual difference between CMA-ES and GaA is the purpose of covariance adaptation: while CMA-ES is designed to increase the likelihood of generating successful mutations, GaA adapts the covariance so as to maximize the entropy of the search distribution under the constraint that acceptable search points are found with a predefined, fixed hitting probability.

Covariance matrix adaptation is also used in indirect sampling. Haario et al. [8] remedied the well-known inefficiency of the Metropolis algorithm on high-dimensional and/or highly distorted target distributions by continuously adapting the Gaussian proposal distribution. They thus introduced the seminal Adaptive Proposal (AP) algorithm [8] based on covariance matrix adaptation. The AP algorithm has been empirically shown to outperform the classical Metropolis algorithm, yet at the expense of sacrificing rigorous convergence proofs for general target distributions.
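To make the proposal-adaptation idea concrete, here is a minimal sketch of an AP-style covariance update in which the Gaussian proposal is re-estimated from a sliding window of recent chain states. The window handling, the regularization term `eps`, and the 2.4/sqrt(d) scaling heuristic are assumptions of this sketch; [8] gives the exact parametrization.

```python
import numpy as np

def ap_proposal_covariance(history, d, scale=None, eps=1e-6):
    """Estimate a Gaussian proposal covariance from recent chain history,
    in the spirit of the Adaptive Proposal (AP) algorithm [8].

    history: (H, d) array of the last H states visited by the chain
    d:       dimension of the state space
    scale:   overall scaling; defaults to the common 2.4/sqrt(d) heuristic
    """
    if scale is None:
        scale = 2.4 / np.sqrt(d)
    # Empirical covariance of the recent samples, regularized so the
    # proposal never degenerates to a singular Gaussian.
    cov = np.cov(history, rowvar=False) + eps * np.eye(d)
    return scale**2 * cov
```

Inside a Metropolis loop, the next candidate would then be proposed as y ~ N(x, C), with C recomputed every few iterations from the stored window of past states.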
Here, we present a unifying formulation for continuous black-box optimization and adaptive Monte Carlo sampling based on GaA. We first revisit the key concepts of GaA and its relation to ES. We suggest general parameter settings, convergence criteria, and a restart mechanism for GaA. The resulting Restart GaA is a quasi parameter-free, off-the-shelf black-box optimizer. We benchmark Restart GaA on the full set of the IEEE CEC 2005 test suite, and we provide guidelines on when to use Restart GaA in practice. We then introduce Metropolis' acceptance-rejection scheme as the selection mechanism in GaA and show that this modification turns GaA into a Metropolis algorithm with adaptive proposal (M-GaA). We highlight the similarities and differences between M-GaA and the AP algorithm, and we assess the performance of M-GaA on benchmark target distributions.

C. L. Müller and I. F. Sbalzarini are with the Institute of Theoretical Computer Science and the Swiss Institute of Bioinformatics, ETH Zurich, CH-8092 Zürich, Switzerland (phone: +41-44-6325512, +41-44-6326344; fax: +41-44-6321562; e-mail: [email protected], [email protected]).

II. GAUSSIAN ADAPTATION

We summarize the key concepts of the canonical GaA algorithm as developed by Kjellström and co-workers. We then propose a standard parametrization, constraint handling, convergence criteria, and a restart strategy, resulting in the Restart GaA algorithm. We further introduce M-GaA as an adaptive sampling algorithm based on GaA.

A. Canonical Gaussian Adaptation

GaA has been developed in the context of analog circuit design. There, one key objective is to find optimal values for certain design parameters x ∈ R^n, e.g.

As Eq. 1 (the entropy of the multivariate Gaussian search distribution) shows, this can be achieved by maximizing the determinant of the covariance matrix. In order to minimize a real-valued objective function f(x), GaA uses a fitness-dependent acceptance threshold c_T that is monotonically lowered until some convergence criteria are met.

1) The GaA algorithm: The GaA algorithm starts by setting the mean m^(0) of a multivariate Gaussian to an initial point x^(0) ∈ A. The covariance matrix C^(g) is decomposed as follows:

C^{(g)} = \left( r^{(g)} Q^{(g)} \right) \left( r^{(g)} Q^{(g)} \right)^T = \left( r^{(g)} \right)^2 Q^{(g)} \left( Q^{(g)} \right)^T ,    (2)

where r^(g) is the scalar step size of the algorithm and Q^(g) is the normalized square root of C^(g). As in CMA-ES, Q^(g) is found by Cholesky or eigendecomposition of the covariance matrix C^(g). The initial Q^(0) is set to the identity matrix I. The point at iteration g + 1 is then sampled as:

x^{(g+1)} = m^{(g)} + r^{(g)} Q^{(g)} \eta^{(g)} ,    (3)

where \eta^{(g)} \sim \mathcal{N}(0, I). The objective function is then evaluated at the position of the new sample, f(x^(g+1)). Only if it fulfills f(x^(g+1)) < c_T^(g) are the following adaptation rules applied. The step size r is increased according to:

r^{(g+1)} = f_e \cdot r^{(g)} ,    (4)
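Putting Eqs. 2-4 together, a single GaA iteration for minimization can be sketched as follows. Only the sampling step (Eq. 3), the threshold test, and the step-size expansion (Eq. 4) are taken from the text above; the value of the expansion factor and the omitted mean, covariance, and threshold updates (introduced later in the paper) are assumptions of this sketch.

```python
import numpy as np

def gaa_step(f, m, r, Q, c_T, f_e=1.1, rng=None):
    """One GaA iteration for minimizing f, following Eqs. 2-4.

    m:   mean of the Gaussian search distribution
    r:   scalar step size
    Q:   normalized square root of the covariance, so C = r^2 * Q @ Q.T (Eq. 2)
    c_T: current fitness acceptance threshold
    f_e: expansion factor (the value 1.1 is an illustrative assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Eq. 3: sample a candidate x = m + r * Q * eta with eta ~ N(0, I).
    eta = rng.standard_normal(m.shape)
    x = m + r * (Q @ eta)
    accepted = f(x) < c_T
    if accepted:
        # Eq. 4: on acceptance, the step size is increased by f_e.
        r = f_e * r
        # The mean, covariance, and threshold updates that complete the
        # algorithm follow here in the full method; they are omitted in
        # this sketch because they are introduced later in the paper.
    return x, r, accepted
```

Q itself can be obtained from C by Cholesky or eigendecomposition; since Q is the normalized square root of C, one common convention is to rescale the factor to unit determinant, e.g. Q = np.linalg.cholesky(C) / np.linalg.det(C) ** (1 / (2 * n)) in dimension n.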