Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space
Matthew C. Fontaine
Viterbi School of Engineering
University of Southern California
Los Angeles, CA
[email protected]

Julian Togelius
Tandon School of Engineering
New York University
New York City, NY
[email protected]

Stefanos Nikolaidis
Viterbi School of Engineering
University of Southern California
Los Angeles, CA
[email protected]

Amy K. Hoover
Ying Wu College of Computing
New Jersey Institute of Technology
Newark, NJ
[email protected]

ABSTRACT

We focus on the challenge of finding a diverse collection of quality solutions on complex continuous domains. While quality diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are designed to generate a diverse range of solutions, these algorithms require a large number of evaluations for exploration of continuous spaces. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments based on standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains.

Figure 1: Comparing Hearthstone Archives. Sample archives for both MAP-Elites and CMA-ME from the Hearthstone experiment. Our new method, CMA-ME, both fills more cells in behavior space and finds higher quality policies to play Hearthstone than MAP-Elites. Each grid cell is an elite (high performing policy), and the intensity value represents the win rate across 200 games against difficult opponents.

KEYWORDS

Quality diversity, illumination algorithms, evolutionary algorithms, Hearthstone, optimization, MAP-Elites

ACM Reference Format:
Matthew C. Fontaine, Julian Togelius, Stefanos Nikolaidis, and Amy K. Hoover. 2020. Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space. In Genetic and Evolutionary Computation Conference (GECCO '20), July 8–12, 2020, Cancún, Mexico. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3377930.3390232

1 INTRODUCTION

We focus on the challenge of finding a diverse collection of quality solutions on complex continuous domains. Consider the example application of generating strategies for a turn-based strategy game (e.g., chess, Go, Hearthstone).
What makes these games appealing to human players is not the presence of an optimal strategy, but the variety of fundamentally different viable strategies. For example, "aggressive" strategies aim to end the game early through high-risk high-reward play, while "controlling" strategies delay the end of the game by postponing the targeting of the game objectives [11]. These example strategies vary in one aspect of their behavior, measured by the number of turns the game lasts.

Finding fundamentally different strategies requires navigating a continuous domain, such as the parameter space (weights) of a neural network that maps states to actions, while simultaneously exploring a diverse range of behaviors. Quality diversity (QD) algorithms, such as Novelty Search with Local Competition (NSLC) [34] and Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) [8], drive a divergent search for multiple good solutions, rather than a convergent search towards a single optimum, through sophisticated archiving and mapping techniques. Solutions are binned in an archive based on their behavior and compete only with others exhibiting similar behaviors. Such stratified competition results in the discovery of potentially sub-optimal solutions called stepping stones, which have been shown in some domains to be critical for escaping local optima [19, 33]. However, these algorithms typically require a large number of evaluations to achieve good results.

Meanwhile, when seeking a single global optimum, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [23, 27] are among the best-performing derivative-free optimizers, with the capability to rapidly navigate continuous spaces. However, the self-adaptation and cumulation techniques driving CMA-ES have yet to successfully power a QD algorithm.

We propose a new hybrid algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME), which rapidly navigates and optimizes a continuous space with CMA-ES seeded by solutions stored in the MAP-Elites archive. The hybrid algorithm employs CMA-ES's ability to efficiently navigate continuous search spaces by maintaining a mixture of normal distributions (candidate solutions) dynamically augmented by objective function feedback. The key insight of CMA-ME is to leverage the selection and adaptation rules of CMA-ES to optimize good solutions, while also efficiently exploring new areas of the search space.

Building on the underlying structure of MAP-Elites, CMA-ME maintains a population of modified CMA-ES instances called emitters, which like CMA-ES operate based on a sampling mean, covariance matrix, and adaptation rules that specify how the covariance matrix and mean are updated for each underlying normal distribution. The population of emitters can therefore be thought of as a Gaussian mixture where each normal distribution focuses on improving a different area of behavior space. This paper explores three types of candidate emitters with different selection and adaptation rules for balancing quality and diversity, called the random direction, improvement, and optimizing emitters. Solutions generated by the emitters are saved in a single unified archive based on their behavior.
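To make the archive mechanics above concrete, the following is a minimal sketch of a MAP-Elites-style grid archive. It is illustrative rather than the paper's implementation: the names (GridArchive, add) and the assumption of a fixed grid over a bounded behavior space are ours.

```python
import numpy as np

class GridArchive:
    """Minimal MAP-Elites-style archive: a static grid over behavior space.

    Each cell keeps only the highest-performing (elite) solution whose
    behavior vector falls inside that cell, so solutions compete only
    with others exhibiting similar behaviors.
    """

    def __init__(self, bins_per_dim, behavior_low, behavior_high):
        self.bins = np.asarray(bins_per_dim)
        self.low = np.asarray(behavior_low, dtype=float)
        self.high = np.asarray(behavior_high, dtype=float)
        self.elites = {}  # grid cell (tuple of ints) -> (solution, fitness)

    def _cell(self, behavior):
        # Map a behavior vector to its discrete grid cell.
        frac = (np.asarray(behavior) - self.low) / (self.high - self.low)
        return tuple(np.clip((frac * self.bins).astype(int), 0, self.bins - 1))

    def add(self, solution, fitness, behavior):
        # A candidate competes only against the elite in its own cell.
        cell = self._cell(behavior)
        incumbent = self.elites.get(cell)
        if incumbent is None or fitness > incumbent[1]:
            self.elites[cell] = (solution, fitness)
            return True  # archive improved: new cell filled or elite replaced
        return False
```

For example, GridArchive(bins_per_dim=[20, 20], behavior_low=[0, 0], behavior_high=[1, 1]) creates a 20x20 grid over a unit-square behavior space, matching the grid-of-elites visualization in Figure 1.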
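The emitter idea can then be sketched on top of that archive. The code below is a deliberately simplified illustration, not CMA-ME itself: real emitters adapt a full covariance matrix and step size using CMA-ES-style selection, adaptation, and restart rules, whereas this sketch uses a fixed isotropic Gaussian, moves each sampling mean toward candidates that improved the archive, and restarts from a random elite otherwise. The evaluate callable is a stand-in for the domain's fitness and behavior evaluation.

```python
def run_simplified_emitters(archive, evaluate, dim, num_emitters=5,
                            batch=10, sigma=0.1, iterations=1000):
    """Toy illustration of CMA-ME's population of emitters.

    Each emitter keeps a sampling mean over the search space; together
    the emitters act like a Gaussian mixture whose components improve
    different areas of behavior space.
    """
    rng = np.random.default_rng(0)
    means = [rng.standard_normal(dim) for _ in range(num_emitters)]
    for _ in range(iterations):
        for e in range(num_emitters):
            improved = []
            for _ in range(batch):
                x = means[e] + sigma * rng.standard_normal(dim)
                fitness, behavior = evaluate(x)
                if archive.add(x, fitness, behavior):
                    improved.append(x)
            if improved:
                # Move toward samples that improved the archive (CMA-ME
                # ranks candidates by how much they improve the archive).
                means[e] = np.mean(improved, axis=0)
            else:
                # Restart from a random elite, loosely mirroring how
                # CMA-ME emitters restart from the archive.
                i = rng.choice(len(archive.elites))
                means[e] = np.asarray(
                    list(archive.elites.values())[i][0], dtype=float)
    return archive
```

With a toy objective such as evaluate = lambda x: (-float(np.sum(x**2)), x[:2]), the loop fills the archive while each emitter climbs within its own region of behavior space.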
2 BACKGROUND

This section outlines previous advancements in quality diversity (QD), including one of the first QD algorithms, MAP-Elites, and background on CMA-ES, to provide context for the CMA-ME algorithm proposed in this paper.

2.1 Quality Diversity (QD)

QD algorithms are often applied in domains where a diversity of good but meaningfully different solutions is valued. For example, QD algorithms can build large repertoires of robot behaviors [7, 9, 10] or a diversity of locomotive gaits to help robots quickly respond to damage [8]. By interacting with AI agents, QD can also produce a diversity of generated video game levels [1, 22, 31].

While traditional evolutionary algorithms speciate based on encoding and fitness, a key feature of the precursor to QD (e.g., Novelty Search (NS) [32]) is speciation through behavioral diversity [32]. Rather than optimizing for performance relative to a fitness function, searching directly for behavioral diversity promotes the discovery of solutions that are sub-optimal relative to the objective function. Called stepping stones, these solutions mitigate premature convergence to local optima. To promote intra-niche competition [40], objectives were reintroduced in the QD algorithms Novelty Search with Local Competition (NSLC) [34] and MAP-Elites [8].

While NSLC and MAP-Elites share many key features necessary for maintaining a diversity of quality solutions, a core difference between the two is whether the archive is dynamically or statically generated. NSLC dynamically creates behavior niches by growing an archive of sufficiently novel solutions, while MAP-Elites (detailed in the next section) maintains a static mapping of behavior. For CMA-ME we choose MAP-Elites as our diversity mechanism to directly compare benefits inherited from CMA-ES. Though we make this design choice for the CMA-ME algorithm solely for comparability, the same principles can be applied to the NSLC archive.

2.2 MAP-Elites

While one core difference between the two early QD algorithms NSLC [34] and MAP-Elites [8] is whether behaviors are dynamically or statically mapped, another is the number of behavioral dimensions among which solutions typically vary. Rather than defining a sin-