Contextual Covariance Matrix Adaptation Evolutionary Strategies
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Contextual Covariance Matrix Adaptation Evolutionary Strategies

Abbas Abdolmaleki(1,2,3), Bob Price(5), Nuno Lau(1), Luis Paulo Reis(2,3), Gerhard Neumann(4)*
1: IEETA, DETI, University of Aveiro
2: DSI, University of Minho
3: LIACC, University of Porto, Portugal
4: CLAS, TU Darmstadt, Germany
5: PARC, A Xerox Company, USA

Abstract

Many stochastic search algorithms are designed to optimize a fixed objective function to learn a task, i.e., if the objective function changes slightly, for example, due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for different ball locations. Such relearning is undesired as it is highly inefficient, and many applications require a fast adaptation to a new context/situation. Therefore, we investigate contextual stochastic search algorithms that can learn multiple, similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well-known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual tasks. Our new algorithm, called contextual CMA-ES, leverages contextual learning while it preserves all the features of standard CMA-ES such as stability, avoidance of premature convergence, step-size control and a minimal amount of parameter tuning.

1 Introduction

The notion of multi-task learning(1) has been established in the machine learning community for at least the past two decades [Caruana, 1997]. The main motivation for contextual learning is the potential for exploiting relevant information available in related tasks by concurrent learning using a shared representation. Therefore, instead of learning one task at a time, we would like to learn multiple tasks at once and exploit the correlations between related tasks. We use a context vector to characterize a task, which typically changes from one task execution to the next. For example, consider a humanoid soccer robot that needs to pass the ball to its team mates, which are positioned on different locations on the field. Here, the soccer robot should learn to kick the ball to any given target location on the field, which is specified by the context vector. In such cases, learning for every possible context is clearly inefficient or even infeasible. Therefore, our goal is to generalize learned tasks from similar contexts to a new context. To do so, we learn a context-dependent policy for a continuous range of contexts without restarting the learning process.

In this paper, we consider stochastic search algorithms for contextual learning. Stochastic search algorithms [Hansen et al., 2003; Sun et al., 2009; Rückstieß et al., 2008] optimize an objective function in a continuous domain. These algorithms assume that the objective function is a black-box function, i.e., they only use the objective values and do not require gradients or higher-order derivatives of the objective function. Contextual stochastic search algorithms have been investigated in the field of policy search for robotics [Kupcsik et al., 2013; Kober et al., 2010]. However, these policy search algorithms typically suffer from premature convergence and perform unfavourably in comparison to state-of-the-art stochastic search methods [Stulp and Sigaud, 2012] such as the CMA-ES algorithm [Hansen et al., 2003]. The CMA-ES algorithm is considered the state of the art in stochastic optimization. CMA-ES performs favourably in many tasks without the need for extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance updates that incorporate the current samples as well as the evolution path, and its invariance properties. However, CMA-ES lacks the important feature of contextual (multi-task) learning. Note that standard CMA-ES can be used to optimize the parameters of a closed-loop policy that depends on context. However, the dimension of the parameter space would then grow linearly with the number of context dimensions. Moreover, to evaluate a parameter vector, we would need to call the objective function for many different contexts and average the results. Both facts limit the data efficiency, as has been shown by [Ha and Liu, 2016]. Therefore, in this paper, we extend the well-known CMA-ES algorithm to the contextual setting using inspiration from contextual policy search. Our new algorithm, called contextual CMA-ES, generalizes the learned solution to new, unseen contexts during the optimization process while it preserves all the features of standard CMA-ES such as stability, avoidance of premature convergence, step-size control, a minimal amount of parameter tuning and simple implementation. In our derivation of the algorithm, we also provide a new theoretical justification for the covariance matrix update rule of the contextual CMA-ES algorithm that also applies to the non-contextual case and gives new insights into how the covariance update can be motivated. For illustration of the algorithm, we use contextual standard functions and two contextual simulated robotic tasks, namely robot table tennis and a robot kick task. We show that our contextual CMA-ES algorithm performs favourably in comparison to other contextual learning algorithms.

* Contact email: [email protected]
(1) The terms "contextual learning" and "multi-task learning" are used interchangeably throughout this paper.

2 Related Work

In order to generalize a learned parameter vector from one context to other contexts, a standard approach is to optimize the parameters for several target contexts independently. Subsequently, regression methods are used to generalize the optimized contexts to a new, unseen context [Da Silva et al., 2012; Stulp et al., 2013]. Although such approaches have been used successfully, they are time consuming and inefficient in terms of the number of needed training samples, as optimizing for different contexts and generalizing between the optimized parameters for different contexts are two independent processes. Hence, we cannot reuse data points obtained from optimizing a task with context $s$ to improve and accelerate the optimization of a task with another context $s'$. Learning for multiple tasks without restarting the learning process is known as contextual (multi-task) policy search [Kupcsik et al., 2013; Kober et al., 2010; Deisenroth et al., 2014]. In the area of contextual stochastic search algorithms, such a multi-task learning capability was established for information-theoretic policy search algorithms [Peters et al., 2010], such as the Contextual Relative Entropy Policy Search (CREPS) algorithm [Kupcsik et al., 2013]. However, it has been shown that REPS suffers from premature convergence [Abdolmaleki et al., 2016; 2015a], as the update of the covariance matrix, which is based only on the current set of samples, reduces the variance of the search distribution too quickly. In order to alleviate this problem, the authors of [Abdolmaleki et al., 2016; 2015a] suggest combining the sample covariance with the old covariance matrix, similar to the CMA-ES algorithm [Hansen et al., 2003]. However, this method does not take advantage of other features of CMA-ES such as step-size control.

[…] the Contextual Relative Entropy Policy Search (REPS) algorithm [Kupcsik et al., 2013], which will provide insights for the development of the new contextual CMA-ES algorithm.

3.1 Problem Statement

We consider contextual black-box optimization problems that are characterized by an $n_s$-dimensional context vector $s$. The task is to find, for each context vector $s$, an optimal parameter vector $\theta^*_s$ that maximizes an objective function $R(s, \theta)\colon \mathbb{R}^{n_s} \times \mathbb{R}^{n_\theta} \rightarrow \mathbb{R}$. Note that the objective function is also dependent on the given context vector $s$. We want to find an optimal context-dependent policy $m^*(s)$ of the form

$$\theta^*_s = m^*(s) = A^* \varphi(s), \qquad m^*(s)\colon \mathbb{R}^{n_s} \rightarrow \mathbb{R}^{n_\theta},$$

that outputs the optimal parameter vector $\theta^*_s$ for the given context $s$. The vector $\varphi(s)$ is an arbitrary $n_\varphi$-dimensional feature function of the context $s$, and the gain matrix $A$ is an $n_\theta \times n_\varphi$ matrix that models the dependence of the parameters $\theta$ on the context $s$. Throughout this paper, we use $\varphi(s) = [1\ s]$, which results in linear generalization over contexts. However, other feature functions, such as radial basis functions (RBF) for non-linear generalization over contexts [Abdolmaleki et al., 2015b], can also be used. The only accessible information on the objective function $R(s, \theta)$ are the returns $\{R_k\}_{k=1\ldots N}$ of context–parameter samples $\{s_k, \theta_k\}_{k=1\ldots N}$, where $k$ is the index of the sample and $N$ is the number of samples.

Algorithm 1 Contextual Stochastic Search Algorithms
1: given $n_\theta$, $n_s$, $N$, $\varphi(s) = [1\ s]$
2: initialize $A^{t=0} \in \mathbb{R}^{n_\theta \times n_\varphi}$, $\sigma^{t=0} > 0$, $\Sigma^{t=0} = I_{n_\theta \times n_\theta}$, $t \leftarrow 0$
3: repeat
4:   for $k = 1, \ldots, N$ do
5:     Observe $s_k$
6:     $m(s_k) = A^t \varphi(s_k)$
7:     $\theta_k = m(s_k) + \sigma^t \times \mathcal{N}(0, \Sigma^t)$
8:     $R_k = R(s_k, \theta_k)$
9:   end for
10:  $d$ = ComputeWeights($\{s_k, \theta_k, R_k\}_{k=1\ldots N}$)
11:  $A^{t+1}$ = UpdateMean($\{s_k, \theta_k, d_k\}_{k=1\ldots N}$)
12:  $\Sigma^{t+1}$ = UpdateCov($\{s_k, \theta_k, d_k\}_{k=1\ldots N}, \Sigma^t, A^{t+1}, A^t$)
13:  $\sigma^{t+1}$ = UpdateStepSize($\sigma^t, A^{t+1}, A^t$)
14:  $t \leftarrow t + 1$
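The generic loop of Algorithm 1 can be sketched in code. The following is a minimal illustrative sketch, not the contextual CMA-ES update rules derived in this paper: the rank-based weights, the weighted linear regression for the mean, the weighted sample-covariance update and the constant step size are simple placeholders for ComputeWeights, UpdateMean, UpdateCov and UpdateStepSize, and all function and variable names are our own.

```python
import numpy as np

def phi(s):
    """Context features phi(s) = [1, s] (linear generalization over contexts)."""
    return np.concatenate(([1.0], np.atleast_1d(s)))

def compute_weights(returns):
    # Placeholder for ComputeWeights: rank-based weights favouring
    # samples with high return (rank 0 = best sample).
    ranks = np.argsort(np.argsort(-returns))
    w = np.maximum(np.log(len(returns) / 2.0 + 0.5) - np.log(ranks + 1.0), 0.0)
    return w / w.sum()

def update_mean(Phi, Theta, d):
    # Placeholder for UpdateMean: weighted linear regression of the
    # parameters on the context features. Solves (Phi^T D Phi) W = Phi^T D Theta;
    # W has shape (n_phi, n_theta), i.e., the transpose of the gain matrix A.
    D = np.diag(d)
    reg = 1e-9 * np.eye(Phi.shape[1])
    return np.linalg.solve(Phi.T @ D @ Phi + reg, Phi.T @ D @ Theta)

def contextual_search(R, sample_context, n_s, n_theta, N=50, iters=80, sigma=1.0):
    n_phi = n_s + 1
    W = np.zeros((n_phi, n_theta))       # theta = phi(s) @ W, i.e. m(s) = A phi(s)
    Sigma = np.eye(n_theta)
    for _ in range(iters):
        # Lines 4-9: observe contexts, sample parameters, evaluate returns.
        S = [sample_context() for _ in range(N)]
        Phi = np.stack([phi(s) for s in S])
        noise = np.random.multivariate_normal(np.zeros(n_theta), Sigma, size=N)
        Theta = Phi @ W + sigma * noise  # theta_k = m(s_k) + sigma * N(0, Sigma)
        returns = np.array([R(s, th) for s, th in zip(S, Theta)])
        # Lines 10-12: weight the samples and update the search distribution.
        d = compute_weights(returns)
        W = update_mean(Phi, Theta, d)
        resid = Theta - Phi @ W
        # Placeholder for UpdateCov: weighted sample covariance of the residuals
        # (exactly the current-samples-only update that Section 2 notes
        # shrinks the search distribution quickly).
        Sigma = (resid * d[:, None]).T @ resid / sigma**2 + 1e-6 * np.eye(n_theta)
        # Line 13 (UpdateStepSize) is omitted: sigma is kept constant here.
    return lambda s: phi(s) @ W          # learned policy m(s)
```

On a toy contextual sphere function whose optimum is linear in a scalar context, e.g. $\theta^*(s) = (s, -s)$, the learned policy recovers the context-dependent optimum, since the true mapping lies in the span of $\varphi(s) = [1\ s]$.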