Population-Based Evolution Optimizes a Meta-Learning Objective
Kevin Frans
Cross Labs, Cross Compass Ltd., Japan
Massachusetts Institute of Technology, Cambridge, MA, USA

Olaf Witkowski
Cross Labs, Cross Compass Ltd., Japan
Earth-Life Science Institute, Tokyo Institute of Technology, Japan
College of Arts and Sciences, University of Tokyo, Japan

arXiv:2103.06435v1 [cs.NE] 11 Mar 2021

ABSTRACT

Meta-learning models, or models that learn to learn, have been a long-desired target for their ability to quickly solve new tasks. Traditional meta-learning methods can require expensive inner and outer loops; thus there is demand for algorithms that discover strong learners without explicitly searching for them. We draw parallels to the study of evolvable genomes in evolutionary systems – genomes with a strong capacity to adapt – and propose that meta-learning and adaptive evolvability optimize for the same objective: high performance after a set of learning iterations. We argue that population-based evolutionary systems with non-static fitness landscapes naturally bias towards high-evolvability genomes, and therefore optimize for populations with strong learning ability. We demonstrate this claim with a simple evolutionary algorithm, Population-Based Meta Learning (PBML), that consistently discovers genomes which display higher rates of improvement over generations, and can rapidly adapt to solve sparse fitness and robotic control tasks.

Figure 1: Meta-learning and adaptive evolvability both measure parallel objectives: performance after a set of learning iterations. An evolutionary system can be considered as "learning to learn" if its population of genomes consistently improves its evolvability, thus allowing the population to more efficiently adapt to changes in the environment.

1 INTRODUCTION

In recent years, there has been growing interest in the community in meta-learning, or learning to learn. From a meta-learning viewpoint, models are judged not just on their strength, but on their capacity to increase this strength through future learning. Strong meta-learning models are therefore desired for many practical applications, such as sim-to-real robotics [4, 27, 43] or low-data learning [15, 33, 42, 47], for their ability to quickly generalize to new tasks. The inherent challenge in traditional meta-learning lies in its expensive inner loop – if learning is expensive, then learning to learn is orders of magnitude harder. Following this problem, there is a desire for alternative meta-learning methods that can discover fast learners without explicitly searching for them.

In the adjacent field of evolutionary computation, a similar interest has arisen in genomes with high evolvability. Genomes represent individual units in a world, such as the structure or behavior of a creature, which together form a global population. An evolutionary system improves by mutating this population of genomes to form offspring, which are then selected based on their fitness scores. A genome's evolvability is its ability to efficiently carry out this mutation-selection process [29, 41]. Precise definitions of evolvability are debated: some measurements capture the diversity of a genome's offspring [20, 31, 37], while others capture the ability of offspring to adapt to new challenges [12, 22, 28].

From a meta-learning perspective, the ability of offspring to adapt to new challenges – which we refer to as adaptive evolvability – is of particular interest. Consider a shifting environment in which genomes must repeatedly adapt to maintain high fitness levels. A genome with high adaptive evolvability would more easily produce offspring that explore towards high-fitness areas. Thus, adaptive evolvability can be measured as the expected fitness of a genome's descendants after some number of generations.

Here we consider the evolutionary system from an optimization perspective. Evolutionary systems are learning models which iteratively improve through mutation and selection. Thus, the learning ability of a system directly depends on its genomes' ability to efficiently explore via mutation. Take two populations and allow them to adapt for a number of generations; the population with higher learning ability will produce descendants with higher average fitness. The meta-learning objective of an evolutionary system is therefore the same as maximizing the adaptive evolvability of its population – genomes should learn to produce high-fitness descendants through mutation (Figure 1).

The key insight connecting these ideas is that evolvability is known to improve naturally in certain evolutionary systems. In contrast to meta-learning, where meta-parameters are traditionally optimized for in a separate loop [24], evolvability is often a property thought to arise indirectly as an evolutionary process runs [1, 12, 28, 32], a key example being the adaptability of life on Earth [37]. A large body of work has examined what properties lead to increasing evolvability in evolutionary systems [1, 10, 12, 28, 32], and if a population consistently increases in evolvability, it is equivalent to learning to learn.

In this work, we claim that in evolutionary algorithms with a large population and a non-static fitness landscape, high-evolvability genomes will naturally be selected for, creating a viable meta-learning method that does not rely on explicit search trees. The key idea is that survivable genes in population-based evolutionary systems must grant fitness not only in the present, but also to inheritors of the gene. In environments with non-static fitness landscapes, this manifests in the form of learning an efficient mutation function, often with inductive biases that balance exploration and exploitation along various dimensions.

We solidify this claim by considering Population-Based Meta Learning (PBML), an evolutionary algorithm that successfully discovers genomes with a strong capacity to improve themselves over generations. We show that learning rates consistently increase even when a genome's learning behavior is decoupled from its fitness, and that genomes can discover non-trivial mutation behavior to efficiently explore in sparse environments. We show that genomes discovered through population-based evolution consistently outperform competitors on both numeric and robotic control tasks.

2 BACKGROUND

2.1 Meta Learning

The field of meta-learning focuses on discovering models that display a strong capacity for additional learning, hence the description learning to learn [49]. Traditionally, meta-learning consists of a distribution of tasks, along with an inner and outer loop. In the inner loop, a new task is selected, and a model is trained to solve the task. In the outer loop, meta-parameters are trained to increase expected performance in the inner loop. A strong meta-learning model will encode shared information in these meta-parameters, so as to quickly learn to solve a new task. Meta-parameters can take many forms [24], as long as

A common bottleneck in meta-learning is the computational needs of a two-loop process. If the inner loop requires many iterations of training, then the outer loop can become prohibitively expensive. Thus, many meta-learning algorithms focus specifically on the one-shot or few-shot domain to ensure a small inner loop [33, 42, 47]. In many domains, however, it is unrealistic to successfully solve a new task in only a few updates. In addition, meta-parameters that are strong in the short term can degrade in the long term [51]. In this case, other methods must be used to meta-learn over the long term, such as implicit differentiation [39]. The approach we adopt is to combine the inner and outer loops, and optimize parameters and meta-parameters simultaneously, which can significantly reduce computation cost [5, 14, 16, 19]. The key issue here is that models may overfit to their current tasks instead of developing long-term learning abilities. We claim that evolving large populations of genomes reduces this problem, as shown in the experiments below.

2.2 Evolvability

The definition of evolvability has often shifted around, and while there is no strict consensus, measurements of evolvability generally fall into two camps: diversity and adaptation. Diversity-based evolvability focuses on the ability of a genome to produce phenotypically diverse offspring [20, 29, 31, 37]. This measurement, however, often depends on a hand-specified diversity function. In the context of learning to learn, diverse offspring are not always optimal; rather, diversity is only one factor in an exploration-exploitation tradeoff to produce high-fitness offspring [9]. Instead, we consider adaptation-based evolvability [12, 28, 41], which measures the performance of offspring some generations after a change in the fitness landscape.

As evolvability is a highly desired property in evolutionary algorithms, many works aim to explain how evolvability emerges in evolutionary systems. Some works attribute evolvability to the rules of the environment. Periodically varying goals have been shown to induce evolvability in a population of genomes, as genomes with more evolvable representations can adapt more quickly to survive selection [22, 28]. In addition, environments where novelty is rewarded or where competition is spread around niches will naturally favor genomes that produce diverse offspring [32]. Others take a different
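The inner/outer loop structure described in Section 2.1 can be made concrete with a minimal sketch. This is an illustrative toy, not the paper's method: the task family (scalar regression targets), step sizes, and the first-order outer update (in the style of Reptile) are all assumptions chosen for brevity.

```python
import random

# Two-loop meta-learning sketch: the meta-parameter is a shared
# initialization, trained so the inner loop solves new tasks quickly.
INNER_STEPS = 5
INNER_LR = 0.3
OUTER_LR = 0.1

def sample_task():
    # A "task" is a scalar target; the loss is squared error (theta - t)^2.
    return random.uniform(-5, 5)

def inner_loop(theta, target):
    # Inner loop: adapt the model to the sampled task by gradient descent.
    for _ in range(INNER_STEPS):
        grad = 2 * (theta - target)
        theta -= INNER_LR * grad
    return theta

random.seed(0)
meta_theta = 10.0  # meta-parameter: the shared initialization
for _ in range(200):  # outer loop
    task = sample_task()
    adapted = inner_loop(meta_theta, task)
    # Outer update: pull the initialization toward the adapted solution,
    # raising expected post-adaptation performance across tasks.
    meta_theta += OUTER_LR * (adapted - meta_theta)

print(round(meta_theta, 2))
```

Since tasks are drawn symmetrically around zero, the outer loop drives the initialization toward the task distribution's center, from which a handful of inner steps suffice for any new task – the "encode shared information in meta-parameters" idea in miniature.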
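The adaptation-based measurement discussed above – expected fitness of a genome's descendants some generations after the fitness landscape changes – can likewise be sketched with a minimal mutation-selection loop. All names, the fitness function, and every parameter below are illustrative assumptions, not the paper's PBML setup.

```python
import random

# Minimal mutation-selection loop on a shifting fitness landscape.
POP_SIZE = 64
GENOME_DIM = 4
MUTATION_STD = 0.1

def fitness(genome, target):
    # Higher is better: negative squared distance to the current target.
    return -sum((g - t) ** 2 for g, t in zip(genome, target))

def evolve(population, target, generations):
    for _ in range(generations):
        # Mutation: each genome produces one offspring via Gaussian noise.
        offspring = [[g + random.gauss(0, MUTATION_STD) for g in genome]
                     for genome in population]
        # Selection: keep the fittest half of the parents + offspring pool.
        pool = population + offspring
        pool.sort(key=lambda g: fitness(g, target), reverse=True)
        population = pool[:POP_SIZE]
    return population

def adaptive_evolvability(population, new_target, generations=20):
    # Mean fitness of descendants `generations` after the landscape
    # shifts to `new_target` -- the adaptation-based measurement.
    descendants = evolve(population, new_target, generations)
    return sum(fitness(g, new_target) for g in descendants) / len(descendants)

random.seed(0)
population = [[random.uniform(-1, 1) for _ in range(GENOME_DIM)]
              for _ in range(POP_SIZE)]
# Adapt to one target, then measure recovery after the target flips.
population = evolve(population, [1.0] * GENOME_DIM, generations=50)
score = adaptive_evolvability(population, [-1.0] * GENOME_DIM)
print(round(score, 3))
```

Comparing this score across two populations operationalizes the claim in the introduction: the population whose genomes mutate more usefully recovers higher average fitness after the shift, i.e., it has learned to learn.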