A Natural Evolution Strategy with Asynchronous Strategy Updates
Total Page:16
File Type:pdf, Size:1020Kb
A Natural Evolution Strategy with Asynchronous Strategy Updates Tobias Glasmachers Institut für Neuroinformatik Ruhr-Universität Bochum, Germany [email protected] ABSTRACT since it is a standard assumption that the lion’s share of the We propose a generic method for turning a modern, non- overall computation time is spent on this step. The pos- elitist evolution strategy with fully adaptive covariance ma- sibility to perform individual steps asynchronously enables trix into an asynchronous algorithm. This algorithm can more efficient parallelism, since synchronization points form process the result of an evaluation of the fitness function potential bottlenecks. anytime and update its search strategy, without the need Evolution Strategies (ESs) are special within the wider to synchronize with the rest of the population. The asyn- field of evolutionary computation in that they rely heavily chronous update builds on the recent developments of natu- on active“self”-adaptation of their search or mutation distri- ral evolution strategies and information geometric optimiza- bution. While this technique is a blessing in many respects tion. (it enables linear convergence into twice continuously dif- Our algorithm improves on the usual generational scheme ferentiable optima), it is also a curse, since it can interfere in two respects. Remarkably, the possibility to process fit- with typical difficulties of evolutionary optimization, such ness values immediately results in a speed-up of the sequen- as fitness noise and constraint handling. The same holds tial algorithm. Furthermore, our algorithm is much better for asynchronous processing. In the present study we show suited for parallel processing. It allows to use more proces- how to make fitness evaluation in a modern non-elitist evo- sors than offspring individuals in a meaningful way. lution strategy an asynchronous operation, while preserving meaningful strategy updates of the full covariance structure. Parallelization of evolutionary algorithms (EAs) has been Categories and Subject Descriptors subject to a large number of studies since EAs, being popula- [Evolution Strategies and Evolutionary Programming] tion-based algorithms, are well suited for parallelization. Different types of parallelism have been identified, see e.g. [1]. General Terms Relatively simple master-slave architectures can distribute fitness evaluations within an offspring population to multiple Algorithms processors. More complex systems are based on distributed or otherwise structured populations, like for example island Keywords models [2] with synchronous or asynchronous message pass- Evolution strategies, Speedup technique, Parallelization ing. Recently there have been impressive demonstrations of 1. INTRODUCTION massively parallel evolutionary algorithms using huge pop- ulation sizes of tens of thousands [7]. Such implementation In pure form, most evolutionary algorithms (EAs) oper- rely heavily of the general purpose computing capabilities ate in generation cycles. For many specific variants it is of modern graphics processing units (GPUs). Such mas- easy to weaken this assumption and to allow for anytime sively parallel implementations are most commonly found or asynchronous application of certain operators, such as for genetic algorithms (GAs) and genetic programming (GP) selection of parents, generation of offspring (application of systems, where the search distribution is defined by static variation operators), as well as the evaluation of offspring, operators (without self-adaptation) applied to the current an operation that is typically considered expensive. Other population. operators basically require synchronous operation, such as In general, parallelism and synchronity of different steps survivor selection. Among these, the evaluation of individ- on an algorithm are distinct properties. However, asyn- uals by means of the fitness function is of primary interest, chronicity of an operation is useless in a strictly sequential program, while it can avoid synchronization overheads in a parallel computation. Asynchronicity is not a prerequisite Permission to make digital or hard copies of all or part of this work for for parallelism, but it can increase its efficiency. personal or classroom use is granted without fee provided that copies are In this paper we do not aim for the massive parallelism not made or distributed for profit or commercial advantage and that copies mentioned above. Speed-ups in the order of hundreds are bear this notice and the full citation on the first page. To copy otherwise, to not to be expected for (todays) ESs, simply for their com- republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. paratively small population sizes. Therefore, instead of GPU GECCO’13, July 6-10, 2013, Amsterdam, The Netherlands. hardware, we consider a setting that is better fitted for the Copyright 2013 ACM 978-1-4503-1963-8/13/07 ...$15.00. standard ES: assume the fitness evaluation involves a com- tially become stochastic gradient steps in parameter space, putation intensive simulation that needs a full blown CPU- which are augmented by learning rates. based architecture to run efficiently. We have a few dedi- Our approach modifies a generational ES so that it be- cated compute servers available for the task, with a total comes a fully asynchronous algorithm. The asynchronous of c independent processors, with c in the order of maybe scheme creates an offspring xi as soon as a compute node dozens. Then we can choose any multiple of c for the off- becomes idle. This compute node is then busy for some time spring population size n to obtain a basically linear speed- evaluating the offspring’s fitness f(xi). In general, fitness up, compared to the naive sequential implementation. This values may be returned in an arbitrary order by the com- can be achieved with a simple master-slave architecture. It pute nodes, so that evaluated offspring (xi,f(xi)) become is assumed in particular that the lion’s share of the com- available in an order that is different from the generation of putation time is spent on fitness evaluations, not on com- offspring. In this setting we argue that evaluated individuals munication overhead, strategy updates, sampling of random should not be viewed as members of populations in a discrete numbers, and relatively cheap bookkeeping tasks that are cycle, but rather as a continuous stream of individuals. performed after synchronization by the central master pro- This view offers the opportunity to use all information as cess. Such work has been conducted, e.g., for the highly soon as they become available for updating the search distri- efficient CMA-ES algorithm [5]. bution from which new offspring are generated. Now assume Although we have already made quite a few assumptions that a compute node has just returned the fitness value of at this point, our performance calculation can still turn out an individual. Put in a nutshell, our approach amounts to to be very far off. For example, it may well happen that applying a 1/n fraction of the strategy update of a stan- the ES would actually run at its highest efficiency with a dard generational ES, to account for the higher update fre- smaller population size. While this effect is often small (al- quency, using the n most recently arrived individuals from though considerable for huge populations, see e.g. [5]), an- the stream as the current population. This rule needs minor other one may turn out to be crucial: the runtime of the corrections to account for noise induced by delayed arrival simulation may vary unpredictably (with the search point, of individuals due to varying evaluations times. with the compute node, with factors deeply hidden in indus- Our simple yet efficient scheme can deal with variation in trial simulation software, or with other factors outside our the runtimes of fitness evaluations without wasting compu- control). Then most processors will have to wait for the syn- tation time. Interestingly it also improves sequential search chronization with the slowest-to-evaluate individual in the on a single core. The reason is that the first individual of a population. hypothetical offspring population is readily evaluated before A principled solution to this problem is to break up the the second one needs to be generated. The information con- generation cycle and to turn to asynchronous algorithms. tained in the first fitness value can already guide the search The synchronous generation cycle of many EAs is obviously to better points, even within a single generation. an over-simplification of natural evolution, and there has The remainder of this paper is organized as follows: Nat- been considerable work to overcome this restriction algo- ural evolution strategies in general and the xNES algorithm rithmically [2]. in particular are presented in the context of information ge- The standard solution to the asynchronous update re- ometric optimization. Then we introduce the asynchronous quirement in EAs is steady state selection. The elitist char- update. The new algorithm is benchmarked in a number of acter of selection schemes directly suitable for asynchronicity different sequential and parallel settings against the popula- is problematic in the presence of multi-modality and noise tion-based ES. We close with our conclusions. in fitness values, which are both common characteristics of fitness functions based on simulations. 2. NATURAL EVOLUTION STRATEGIES Furthermore, most existing schemes are designed for