Sample Efficient Deep Neuroevolution in Low

Total Page:16

File Type:pdf, Size:1020Kb

Sample Efficient Deep Neuroevolution in Low Under review as a conference paper at ICLR 2019 SAMPLE EFFICIENT DEEP NEUROEVOLUTION IN LOW DIMENSIONAL LATENT SPACE Anonymous authors Paper under double-blind review ABSTRACT Current deep neuroevolution models are usually trained in a large parameter search space for complex learning tasks, e.g. playing video games, which needs billions of samples and thousands of search steps to obtain significant perfor- mance. This raises a question of whether we can make use of sequential data generated during evolution, encode input samples, and evolve in low dimensional parameter space with latent state input in a fast and efficient manner. Here we give an affirmative answer: we train a VAE to encode input samples, then an RNN to model environment dynamics and handle temporal information, and last evolve our low dimensional policy network in latent space. We demonstrate that this ap- proach is surprisingly efficient: our experiments on Atari games show that within 10M frames and 30 evolution steps of training, our algorithm could achieve com- petitive result compared with ES, A3C, and DQN which need billions of frames. 1 INTRODUCTION Training an agent that can handle difficult tasks in complex environment is always challenging for artificial intelligence. Most works for solving such problems are using reinforcement learning algo- rithms, e.g. learning a value function (Watkins & Dayan, 1992), or a policy function (Kakade, 2002), or both (Sutton et al., 2000). Combined with deep neural networks, the so-called deep reinforcement learning achieved great success in playing Atari games (Mnih et al., 2015), operating robots (Yahya et al., 2017), and winning human experts in challenging competitions like Go and Dota (Silver et al., 2016; OpenAI, 2018). Recently, a portion of works are trying to combine deep neural networks with evolution strategies to solve similar problems. This approach, called deep neuroevolution, achieved competitive results compared to deep reinforcement learning on Atari Games and Mujoco tasks as well in much shorter training time due to its outstanding scalability and feasibility for parallelization (Salimans et al., 2017; Such et al., 2017). However, large-scale deep neuroevolution is both data and computation inefficient: it consumes thousands of CPU cores and needs billions of frames and thousands of generations to obtain signif- icant results. In this work, we introduce a novel way to improve its efficiency: we train a variational autoencoder (VAE) to encode samples, and a recurrent neural network to model environment dy- namics and handle temporal information; afterward, we train a two layer policy network with latent state vectors using covariance matrix adaption evolution strategy (CMA-ES). We evaluate our sample efficient deep neuroevolution algorithm, or in short SEDN, on 50 Atari Games from OpenAI Gym (Brockman et al., 2016), experiment shows that our approach surpassed DQN (Mnih et al., 2013), A3C (Mnih et al., 2016), and ES (Salimans et al., 2017) methods in several Atari games within much fewer state frames, shown in Table 1. Our key findings are listed below: • Evolution goes faster with latent state input. SEDN takes less than 30 evolution steps with only 32 children to surpass ES (Salimans et al., 2017) in several games: the latter needs thousands of children and generations. • Training is more data efficient. Compared to ES and asynchronous RL methods which needs billions of frames to train, SEDN takes less than 10 million frames to achieve targets. • Training is more computation efficient. On a workstation of one Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz and one Titan X Pascal GPU, SEDN takes less than 4 hours to finish the evolution, compared to ES which needs a cluster to finish. 1 Under review as a conference paper at ICLR 2019 2 METHOD Many real world decision making problems could be modeled as Markov Decision Process (MDP). Since proposed by Bellman (1957), it has been applied in many disciplines, e.g. robotics, economics and game playing. A Markov decision process is defined by: a set of states S, a set of actions A, transition probabilities defining probability over next state given current state and action P : S × A × S ! [0; 1], a initial state distribution d : S! [0; 1], and a reward function that maps states and actions into a scalar R : S × A ! R. A finite time horizon episode τ of T steps is a sequence of state-action-reward tuples: (s0; a0; r0); (s1; a1; r1);:::; (st; at; rt);::: (sT ; ;; ;) called trajectory, where st 2 S; at 2 A; rt = R(st; at); st+1 ∼ P (st; at); s0 ∼ d0(s) where sT is called terminal state. For a finite time horizon MDP, reinforcement learning (RL) algorithms learn a parameterized policy that defining probability of actions to be taken given current states πθ : S × A ! [0; 1], and PT −1 maximize episodic reward: i=0 ri, or formally: T X maximize E R(st; at) (1) θ i=0 where s0 ∼ d(s);at ∼ πθ(st); st+1 ∼ P (st; at) (2) Above objective function takes the expectation over episodic rewards, which could be approximated by sampling N trajectories from policy π and environment dynamics P and d. We misuse notations such that r(τ) donates episodic reward and p(τ) represents trajectory probability, then we have below reformulation: N X maximize L(θ) = r(τi)p(τi) (3) θ i=1 T −1 T −1 Y X where p(τ) = d(s0) πθ(st; at)p(st; at; st+1); r(τ) = ri (4) t=0 t=0 Vanilla policy gradient (PG) methods optimize L(πθ) by taking its gradient: N 1 X r L ≈ r(τ )r log p (τ ) (5) θ N i θ θ i i=1 N T −1 1 X Xi = r(τ ) r log π (s ; a ) (6) N i θ θ t t i=1 t=0 and updates policy πθ to maximize expected episodic reward. Variants of policy gradient methods achieved great success in recent years, e.g. TRPO (Schulman et al., 2015) and PPO Schulman et al. (2017). Evolution strategies (ES), instead, take another approach to solve above optimization problem with- out any differentiation: it approximates the gradient by sampling, shown in Equation 7. 1 r L ≈ (L(θ + θ^)θ^) (7) θ σ2 Eθ^∼N (0,σ2) Vanilla ES solves MDP problems iteratively shown in Algorithm 1: firstly, it samples Gaussian noise from a normal distribution; secondly, it injects noise into current policy, and evaluates it in environment, obtaining episodic rewards; thirdly, it updates current policy parameters by the product of episode reward and noise injected. ES and PG are interpreted as two faces of Gaussian smoothing on policy: ES is on parameter space smoothing and PG is on action space smoothing (Salimans et al., 2017). 2 Under review as a conference paper at ICLR 2019 Algorithm 1 Evolution Strategies Input: Learning rate η, noise standard derivation σ, initial policy parameter θ 1: for t = 0; 1;::: do ^ 2 2: Sample N parameter noise: θi ∼ N (0; σ I) for i = 1;:::;N 3: Execute perturbated deterministic policies πθi : at = arg maxa πθi (st) to produce trajectory ^ τi, where θi = θ + θi for i = 1;:::;N 2 PN ^ 4: Update policy parameter θ θ + α=(Nσ ) i=1 r(τi)θi 5: end for ES enjoys several advantages over PG. Firstly, it is a black-box optimization method that does not need gradients. In problems where gradients are not available, or state encoding is not feasible, or gradient is blocked by some operations in neural networks, ES exhibits its wider range of appli- cations over PG. Secondly, in low dimensional optimization problems like Mujoco tasks, ES has competitive performance with PG (Salimans et al., 2017), but is less sensitive to hyperparameters (Heidrich-Meisner & Igel, 2008). Thirdly, the exploration-exploitation trade-off needs to be bal- anced in RL here is integrated: policy parameter perturbation ensures different behaviors during evaluation, whereas RL has to use a stochastic policy to explore. Lastly, ES is highly scalable therefore can be trained faster by a group of workers in parallel. However, in n-dimensional search space where n is large, the exploration directions O(2n) increases in orders with n, therefore random sampling for gradient approximation would be very inefficient. For video games like Atari, high dimensional pixel state space induces more parameters of policy to be optimized, hence less efficient for ES. Surprisingly, Salimans et al. (2017) shows that even natural evolution strategies (NES) can beat common RL algorithms in several Atari games. Among the class of evolution strategies, the covariance matrix adaption evolution strategy (Hansen & Oster- meier, 2001) is the most successful one in solving low dimensional optimization problems. Would covariance matrix adaption evolution strategy (CMA-ES) perform better or faster over NES for high dimensional tasks? To answer this question, one problem has to be solved, that the application of CMA-ES on high dimensional space is infeasible since it needs to compute a large covariance ma- trix of size O(n2). Recently advances in deep learning show promises to encode high dimensional spatial-temporal information into low dimensional latent vectors (Oh et al., 2015; Pathak et al., 2018; Ha & Schmidhuber, 2018) by modeling environment dynamics. In this case, we can convert high dimensional RL tasks into low dimensional tasks, where CMA-ES can take its advantage. 2.1 COVARIANCE MATRIX ADAPTION EVOLUTION STRATEGY CMA-ES improves ES in two ways. Firstly, vanilla ES samples noise from a fixed Gaussian distri- bution N (0; σ2I) of only one degree of freedom σ, the step size; whereas CMA-ES, samples noise from N (0; C) where covariance matrix C has n(n + 1)=2 degree of freedom.
Recommended publications
  • AI, Robots, and Swarms: Issues, Questions, and Recommended Studies
    AI, Robots, and Swarms Issues, Questions, and Recommended Studies Andrew Ilachinski January 2017 Approved for Public Release; Distribution Unlimited. This document contains the best opinion of CNA at the time of issue. It does not necessarily represent the opinion of the sponsor. Distribution Approved for Public Release; Distribution Unlimited. Specific authority: N00014-11-D-0323. Copies of this document can be obtained through the Defense Technical Information Center at www.dtic.mil or contact CNA Document Control and Distribution Section at 703-824-2123. Photography Credits: http://www.darpa.mil/DDM_Gallery/Small_Gremlins_Web.jpg; http://4810-presscdn-0-38.pagely.netdna-cdn.com/wp-content/uploads/2015/01/ Robotics.jpg; http://i.kinja-img.com/gawker-edia/image/upload/18kxb5jw3e01ujpg.jpg Approved by: January 2017 Dr. David A. Broyles Special Activities and Innovation Operations Evaluation Group Copyright © 2017 CNA Abstract The military is on the cusp of a major technological revolution, in which warfare is conducted by unmanned and increasingly autonomous weapon systems. However, unlike the last “sea change,” during the Cold War, when advanced technologies were developed primarily by the Department of Defense (DoD), the key technology enablers today are being developed mostly in the commercial world. This study looks at the state-of-the-art of AI, machine-learning, and robot technologies, and their potential future military implications for autonomous (and semi-autonomous) weapon systems. While no one can predict how AI will evolve or predict its impact on the development of military autonomous systems, it is possible to anticipate many of the conceptual, technical, and operational challenges that DoD will face as it increasingly turns to AI-based technologies.
    [Show full text]
  • A Covariance Matrix Self-Adaptation Evolution Strategy for Optimization Under Linear Constraints Patrick Spettel, Hans-Georg Beyer, and Michael Hellwig
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. XX, NO. X, MONTH XXXX 1 A Covariance Matrix Self-Adaptation Evolution Strategy for Optimization under Linear Constraints Patrick Spettel, Hans-Georg Beyer, and Michael Hellwig Abstract—This paper addresses the development of a co- functions cannot be expressed in terms of (exact) mathematical variance matrix self-adaptation evolution strategy (CMSA-ES) expressions. Moreover, if that information is incomplete or if for solving optimization problems with linear constraints. The that information is hidden in a black-box, EAs are a good proposed algorithm is referred to as Linear Constraint CMSA- ES (lcCMSA-ES). It uses a specially built mutation operator choice as well. Such methods are commonly referred to as together with repair by projection to satisfy the constraints. The direct search, derivative-free, or zeroth-order methods [15], lcCMSA-ES evolves itself on a linear manifold defined by the [16], [17], [18]. In fact, the unconstrained case has been constraints. The objective function is only evaluated at feasible studied well. In addition, there is a wealth of proposals in the search points (interior point method). This is a property often field of Evolutionary Computation dealing with constraints in required in application domains such as simulation optimization and finite element methods. The algorithm is tested on a variety real-parameter optimization, see e.g. [19]. This field is mainly of different test problems revealing considerable results. dominated by Particle Swarm Optimization (PSO) algorithms and Differential Evolution (DE) [20], [21], [22]. For the case Index Terms—Constrained Optimization, Covariance Matrix Self-Adaptation Evolution Strategy, Black-Box Optimization of constrained discrete optimization, it has been shown that Benchmarking, Interior Point Optimization Method turning constrained optimization problems into multi-objective optimization problems can achieve better performance than I.
    [Show full text]
  • Natural Evolution Strategies
    Natural Evolution Strategies Daan Wierstra, Tom Schaul, Jan Peters and Juergen Schmidhuber Abstract— This paper presents Natural Evolution Strategies be crucial to find the right domain-specific trade-off on issues (NES), a novel algorithm for performing real-valued ‘black such as convergence speed, expected quality of the solutions box’ function optimization: optimizing an unknown objective found and the algorithm’s sensitivity to local suboptima on function where algorithm-selected function measurements con- stitute the only information accessible to the method. Natu- the fitness landscape. ral Evolution Strategies search the fitness landscape using a A variety of algorithms has been developed within this multivariate normal distribution with a self-adapting mutation framework, including methods such as Simulated Anneal- matrix to generate correlated mutations in promising regions. ing [5], Simultaneous Perturbation Stochastic Optimiza- NES shares this property with Covariance Matrix Adaption tion [6], simple Hill Climbing, Particle Swarm Optimiza- (CMA), an Evolution Strategy (ES) which has been shown to perform well on a variety of high-precision optimization tion [7] and the class of Evolutionary Algorithms, of which tasks. The Natural Evolution Strategies algorithm, however, is Evolution Strategies (ES) [8], [9], [10] and in particular its simpler, less ad-hoc and more principled. Self-adaptation of the Covariance Matrix Adaption (CMA) instantiation [11] are of mutation matrix is derived using a Monte Carlo estimate of the great interest to us. natural gradient towards better expected fitness. By following Evolution Strategies, so named because of their inspira- the natural gradient instead of the ‘vanilla’ gradient, we can ensure efficient update steps while preventing early convergence tion from natural Darwinian evolution, generally produce due to overly greedy updates, resulting in reduced sensitivity consecutive generations of samples.
    [Show full text]
  • A Restart CMA Evolution Strategy with Increasing Population Size
    1769 A Restart CMA Evolution Strategy With Increasing Population Size Anne Auger Nikolaus Hansen CoLab Computational Laboratory, CSE Lab, ETH Zurich, Switzerland ETH Zurich, Switzerland [email protected] [email protected] Abstract- In this paper we introduce a restart-CMA- 2 The restart CMA-ES evolution strategy, where the population size is increased for each restart (IPOP). By increasing the population The (,uw, )-CMA-ES In this paper we use the (,uw, )- size the search characteristic becomes more global af- CMA-ES thoroughly described in [3]. We sum up the gen- ter each restart. The IPOP-CMA-ES is evaluated on the eral principle of the algorithm in the following and refer to test suit of 25 functions designed for the special session [3] for the details. on real-parameter optimization of CEC 2005. Its perfor- For generation g + 1, offspring are sampled indepen- mance is compared to a local restart strategy with con- dently according to a multi-variate normal distribution stant small population size. On unimodal functions the ) fork=1,... performance is similar. On multi-modal functions the -(9+1) A-()(K ( (9))2C(g)) local restart strategy significantly outperforms IPOP in where denotes a normally distributed random 4 test cases whereas IPOP performs significantly better N(mr, C) in 29 out of 60 tested cases. vector with mean m' and covariance matrix C. The , best offspring are recombined into the new mean value (Y)(0+1) = Z wix(-+w), where the positive weights 1 Introduction w E JR sum to one. The equations for updating the re- The Covariance Matrix Adaptation Evolution Strategy maining parameters of the normal distribution are given in (CMA-ES) [5, 7, 4] is an evolution strategy that adapts [3]: Eqs.
    [Show full text]
  • Neuroevolution Strategies for Episodic Reinforcement Learning ∗ Verena Heidrich-Meisner , Christian Igel
    J. Algorithms 64 (2009) 152–168 Contents lists available at ScienceDirect Journal of Algorithms Cognition, Informatics and Logic www.elsevier.com/locate/jalgor Neuroevolution strategies for episodic reinforcement learning ∗ Verena Heidrich-Meisner , Christian Igel Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany article info abstract Article history: Because of their convincing performance, there is a growing interest in using evolutionary Received 30 April 2009 algorithms for reinforcement learning. We propose learning of neural network policies Available online 8 May 2009 by the covariance matrix adaptation evolution strategy (CMA-ES), a randomized variable- metric search algorithm for continuous optimization. We argue that this approach, Keywords: which we refer to as CMA Neuroevolution Strategy (CMA-NeuroES), is ideally suited for Reinforcement learning Evolution strategy reinforcement learning, in particular because it is based on ranking policies (and therefore Covariance matrix adaptation robust against noise), efficiently detects correlations between parameters, and infers a Partially observable Markov decision process search direction from scalar reinforcement signals. We evaluate the CMA-NeuroES on Direct policy search five different (Markovian and non-Markovian) variants of the common pole balancing problem. The results are compared to those described in a recent study covering several RL algorithms, and the CMA-NeuroES shows the overall best performance. © 2009 Elsevier Inc. All rights reserved. 1. Introduction Neuroevolution denotes the application of evolutionary algorithms to the design and learning of neural networks [54,11]. Accordingly, the term Neuroevolution Strategies (NeuroESs) refers to evolution strategies applied to neural networks. Evolution strategies are a major branch of evolutionary algorithms [35,40,8,2,7].
    [Show full text]
  • Evolutionary Computation Evolution Strategies
    Evolutionary computation Lecture Evolutionary Strategies CIS 412 Artificial Intelligence Umass, Dartmouth Evolution strategies Evolution strategies • Another approach to simulating natural • In 1963 two students of the Technical University of Berlin, Ingo Rechenberg and Hans-Paul Schwefel, evolution was proposed in Germany in the were working on the search for the optimal shapes of early 1960s. Unlike genetic algorithms, bodies in a flow. They decided to try random changes in this approach − called an evolution the parameters defining the shape following the example of natural mutation. As a result, the evolution strategy strategy − was designed to solve was born. technical optimization problems. • Evolution strategies were developed as an alternative to the engineer’s intuition. • Unlike GAs, evolution strategies use only a mutation operator. Basic evolution strategy Basic evolution strategy • In its simplest form, termed as a (1+1)- evolution • Step 2 - Randomly select an initial value for each strategy, one parent generates one offspring per parameter from the respective feasible range. The set of generation by applying normally distributed mutation. these parameters will constitute the initial population of The (1+1)-evolution strategy can be implemented as parent parameters: follows: • Step 1 - Choose the number of parameters N to represent the problem, and then determine a feasible range for each parameter: • Step 3 - Calculate the solution associated with the parent Define a standard deviation for each parameter and the parameters: function to be optimized. 1 Basic evolution strategy Basic evolution strategy • Step 4 - Create a new (offspring) parameter by adding a • Step 6 - Compare the solution associated with the normally distributed random variable a with mean zero offspring parameters with the one associated with the and pre-selected deviation δ to each parent parameter: parent parameters.
    [Show full text]
  • A Note on Evolutionary Algorithms and Its Applications
    Bhargava, S. (2013). A Note on Evolutionary Algorithms and Its Applications. Adults Learning Mathematics: An International Journal, 8(1), 31-45 A Note on Evolutionary Algorithms and Its Applications Shifali Bhargava Dept. of Mathematics, B.S.A. College, Mathura (U.P)- India. <[email protected]> Abstract This paper introduces evolutionary algorithms with its applications in multi-objective optimization. Here elitist and non-elitist multiobjective evolutionary algorithms are discussed with their advantages and disadvantages. We also discuss constrained multiobjective evolutionary algorithms and their applications in various areas. Key words: evolutionary algorithms, multi-objective optimization, pareto-optimality, elitist. Introduction The term evolutionary algorithm (EA) stands for a class of stochastic optimization methods that simulate the process of natural evolution. The origins of EAs can be traced back to the late 1950s, and since the 1970’s several evolutionary methodologies have been proposed, mainly genetic algorithms, evolutionary programming, and evolution strategies. All of these approaches operate on a set of candidate solutions. Using strong simplifications, this set is subsequently modified by the two basic principles of evolution: selection and variation. Selection represents the competition for resources among living beings. Some are better than others and more likely to survive and to reproduce their genetic information. In evolutionary algorithms, natural selection is simulated by a stochastic selection process. Each solution is given a chance to reproduce a certain number of times, dependent on their quality. Thereby, quality is assessed by evaluating the individuals and assigning them scalar fitness values. The other principle, variation, imitates natural capability of creating “new” living beings by means of recombination and mutation.
    [Show full text]
  • Integrating Heuristiclab with Compilers and Interpreters for Non-Functional Code Optimization
    Integrating HeuristicLab with Compilers and Interpreters for Non-Functional Code Optimization Daniel Dorfmeister Oliver Krauss Software Competence Center Hagenberg Johannes Kepler University Linz Hagenberg, Austria Linz, Austria [email protected] University of Applied Sciences Upper Austria Hagenberg, Austria [email protected] ABSTRACT 1 INTRODUCTION Modern compilers and interpreters provide code optimizations dur- Genetic Compiler Optimization Environment (GCE) [5, 6] integrates ing compile and run time, simplifying the development process for genetic improvement (GI) [9, 10] with execution environments, i.e., the developer and resulting in optimized software. These optimiza- the Truffle interpreter [23] and Graal compiler [13]. Truffle is a tions are often based on formal proof, or alternatively stochastic language prototyping and interpreter framework that interprets optimizations have recovery paths as backup. The Genetic Compiler abstract syntax trees (ASTs). Truffle already provides implementa- Optimization Environment (GCE) uses a novel approach, which tions of multiple programming languages such as Python, Ruby, utilizes genetic improvement to optimize the run-time performance JavaScript and C. Truffle languages are compiled, optimized and of code with stochastic machine learning techniques. executed via the Graal compiler directly in the JVM. Thus, GCE In this paper, we propose an architecture to integrate GCE, which provides an option to generate, modify and evaluate code directly directly integrates with low-level interpreters and compilers, with at the interpreter and compiler level and enables research in that HeuristicLab, a high-level optimization framework that features a area. wide range of heuristic and evolutionary algorithms, and a graph- HeuristicLab [20–22] is an optimization framework that provides ical user interface to control and monitor the machine learning a multitude of meta-heuristic algorithms, research problems from process.
    [Show full text]
  • Evolution Strategies for Deep Neural Network Models Design
    J. Hlavácovᡠ(Ed.): ITAT 2017 Proceedings, pp. 159–166 CEUR Workshop Proceedings Vol. 1885, ISSN 1613-0073, c 2017 P. Vidnerová, R. Neruda Evolution Strategies for Deep Neural Network Models Design Petra Vidnerová, Roman Neruda Institute of Computer Science, The Czech Academy of Sciences [email protected] Abstract: Deep neural networks have become the state-of- The proposed algorithm is evaluated both on benchmark art methods in many fields of machine learning recently. and real-life data sets. As the benchmark data we use the Still, there is no easy way how to choose a network ar- MNIST data set that is classification of handwritten digits. chitecture which can significantly influence the network The real data set is from the area of sensor networks for performance. air pollution monitoring. The data came from De Vito et This work is a step towards an automatic architecture al [21, 5] and are described in detail in Section 5.1. design. We propose an algorithm for an optimization of a The paper is organized as follows. Section 2 brings an network architecture based on evolution strategies. The al- overview of related work. Section 3 briefly describes the gorithm is inspired by and designed directly for the Keras main ideas of our approach. In Section 4 our algorithm library [3] which is one of the most common implementa- based on evolution strategies is described. Section 5 sum- tions of deep neural networks. marizes the results of our experiments. Finally, Section 6 The proposed algorithm is tested on MNIST data set brings conclusion. and the prediction of air pollution based on sensor mea- surements, and it is comparedto several fixed architectures and support vector regression.
    [Show full text]
  • Evolution Strategies
    Evolution Strategies Nikolaus Hansen, Dirk V. Arnold and Anne Auger February 11, 2015 1 Contents 1 Overview 3 2 Main Principles 4 2.1 (µ/ρ +; λ) Notation for Selection and Recombination.......................5 2.2 Two Algorithm Templates......................................6 2.3 Recombination Operators......................................7 2.4 Mutation Operators.........................................8 3 Parameter Control 9 3.1 The 1/5th Success Rule....................................... 11 3.2 Self-Adaptation........................................... 11 3.3 Derandomized Self-Adaptation................................... 12 3.4 Non-Local Derandomized Step-Size Control (CSA)........................ 12 3.5 Addressing Dependencies Between Variables............................ 14 3.6 Covariance Matrix Adaptation (CMA)............................... 14 3.7 Natural Evolution Strategies.................................... 15 3.8 Further Aspects............................................ 18 4 Theory 19 4.1 Lower Runtime Bounds....................................... 20 4.2 Progress Rates............................................ 21 4.2.1 (1+1)-ES on Sphere Functions............................... 22 4.2.2 (µ/µ, λ)-ES on Sphere Functions.............................. 22 4.2.3 (µ/µ, λ)-ES on Noisy Sphere Functions........................... 24 4.2.4 Cumulative Step-Size Adaptation.............................. 24 4.2.5 Parabolic Ridge Functions.................................. 25 4.2.6 Cigar Functions.......................................
    [Show full text]
  • Simplify Your Cma-Es 3
    ACCEPTED FOR: IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION 1 Simplify Your Covariance Matrix Adaptation Evolution Strategy Hans-Georg Beyer and Bernhard Sendhoff Senior Member, IEEE Abstract—The standard Covariance Matrix Adaptation Evo- call the algorithm’s peculiarities, such as the covariance matrix lution Strategy (CMA-ES) comprises two evolution paths, one adaptation, the evolution path and the cumulative step-size for the learning of the mutation strength and one for the rank- adaptation (CSA) into question. Notably in [4] a first attempt 1 update of the covariance matrix. In this paper it is shown that one can approximately transform this algorithm in such has been made to replace the CSA mutation strength control a manner that one of the evolution paths and the covariance by the classical mutative self-adaptation. While getting rid of matrix itself disappear. That is, the covariance update and the any kind of evolution path statistics and thus yielding a much covariance matrix square root operations are no longer needed simpler strategy1, the resulting rank-µ update strategy, the so- in this novel so-called Matrix Adaptation (MA) ES. The MA- called Covariance Matrix Self-Adaptation CMSA (CMSA-ES) ES performs nearly as well as the original CMA-ES. This is shown by empirical investigations considering the evolution does not fully reach the performance of the original CMA- dynamics and the empirical expected runtime on a set of standard ES in the case of small population sizes. Furthermore, some test functions. Furthermore, it is shown that the MA-ES can of the reported performance advantages in [4] were due to a be used as search engine in a Bi-Population (BiPop) ES.
    [Show full text]
  • Algorithm and Experiment Design with Heuristiclab Instructor Biographies
    19.06.2011 Algorithm and Experiment Design with HeuristicLab An Open Source Optimization Environment for Research and Education S. Wagner, G. Kronberger Heuristic and Evolutionary Algorithms Laboratory (HEAL) School of Informatics, Communications and Media, Campus Hagenberg Upper Austria University of Applied Sciences Instructor Biographies • Stefan Wagner – MSc in computer science (2004) Johannes Kepler University Linz, Austria – PhD in technical sciences (2009) Johannes Kepler University Linz, Austria – Associate professor (2005 – 2009) Upper Austria University of Applied Sciences – Full professor for complex software systems (since 2009) Upper Austria University of Applied Sciences – Co‐founder of the HEAL research group – Project manager and chief architect of HeuristicLab – http://heal.heuristiclab.com/team/wagner • Gabriel Kronberger – MSc in computer science (2005) Johannes Kepler University Linz, Austria – PhD in technical sciences (2010) Johannes Kepler University Linz, Austria – Research assistant (since 2005) Upper Austria University of Applied Sciences – Member of the HEAL research group – Architect of HeuristicLab – http://heal.heuristiclab.com/team/kronberger ICCGI 2011 http://dev.heuristiclab.com 2 1 19.06.2011 Agenda • Objectives of the Tutorial • Introduction • Where to get HeuristicLab? • Plugin Infrastructure • Graphical User Interface • Available Algorithms & Problems • Demonstration • Some Additional Features • Planned Features • Team • Suggested Readings • Bibliography • Questions & Answers ICCGI 2011 http://dev.heuristiclab.com
    [Show full text]