Program Synthesis Using Uniform Mutation by Addition and Deletion
Total Page:16
File Type:pdf, Size:1020Kb
Program Synthesis using Uniform Mutation by Addition and Deletion Thomas Helmuth Nicholas Freitag McPhee Lee Spector Hamilton College University of Minnesota, Morris Hampshire College Clinton, New York, USA Morris, Minnesota, USA Amherst, Massachusetts, USA [email protected] [email protected] [email protected] ABSTRACT of populations, each derived from previous, with the aim of nding Most genetic programming systems use mutation and crossover a program that solves a given target problem. Each new program operators to create child programs from selected parent programs. is produced by passing one or more selected parents to genetic Typically, the mutation operator will replace a randomly chosen operators that construct children, incorporating material from the subprogram in the parent with a new, randomly generated subpro- parents while also introducing variation. The traditional approach gram. In systems with linear genomes, a uniform mutation operator produces some children for each population with a mutation opera- can be used that has some probability of replacing any particular tor that uses material only from a single parent, and others with gene with a new, randomly chosen gene. In this paper, we present a recombination or crossover operator that uses material from two a new uniform mutation operator called Uniform Mutation by Ad- parents. The traditional mutation operator replaces a randomly dition and Deletion (UMAD), which rst adds genes with some chosen subprogram of the single parent program with a new, ran- probability before or after every existing gene, and then deletes ran- domly generated subprogram. The traditional crossover operator dom genes from the resulting genome. In UMAD it is not necessary does something similar but obtains the replacement subprogram that the new genes replace old genes, as the additions and deletions from a second parent, rather than generating it randomly. Many can occur in dierent locations. We nd that UMAD, with relatively variants of these methods have been explored in the literature; for high rates of addition and deletion, results in signicant increases example, see the “Mutation Cookbook” section of [14]. in problem-solving performance on a range of program synthesis Here we describe a novel single-parent mutation operator, which benchmark problems. In our experiments, we compare this method we demonstrate in the context of genetic programming on bench- to a variety of alternatives, showing that it equals or outperforms mark software synthesis problems [6]. We show that by using this all of them. We explore this new mutation operator and other well- novel mutation operator, without any form of crossover, we achieve performing high-rate mutation schemes to determine what traits signicant and general improvements in program synthesis, im- are crucial to improved performance. proving the state of the art over all previous known approaches. This new operator, which we call Uniform Mutation by Addition CCS CONCEPTS and Deletion (UMAD), stems from the idea that mutations based on replacement, such as the traditional mutation operator, are overly • Computing methodologies → Genetic programming; constrained. They delete parts of a parent program and add new parts, but the additions are always in the same places as the dele- KEYWORDS tions. With UMAD we dissociate the additions from the deletions, mutation, genetic operators, Push, program synthesis so that we can get one without the other and/or both but in dif- ACM Reference Format: ferent parts of the parent program. Intriguingly, the high levels of Thomas Helmuth, Nicholas Freitag McPhee, and Lee Spector. 2018. Program problem-solving performance that we demonstrate with UMAD— Synthesis using Uniform Mutation by Addition and Deletion. In GECCO ’18: the highest ever demonstrated for the general program synthesis Genetic and Evolutionary Computation Conference, July 15–19, 2018, Kyoto, benchmarks considered here—are achieved without recombination. Japan. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3205455. One explanation for improved performance with dissociated 3205603 additions and deletions, relative to replacements, is that the former can provide more paths through the search space from a parent to a 1 INTRODUCTION descendant than can the latter. For example, if we have a parent ABC Genetic programming (GP) is a problem solving tool that begins and we seek a child ADC, in a context in which the loss of either A or with a population of random programs, and produces a succession C would be fatal, then there is only one path to ADC by replacement: replacing B with D. With dissociated additions and deletions, it Permission to make digital or hard copies of all or part of this work for personal or would still be possible to go directly from ABC to ADC but there classroom use is granted without fee provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the full citation would also be additional paths, including paths through AC, ABDC, on the rst page. Copyrights for components of this work owned by others than the and ADBC. Depending on the details of the program representation author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specic permission and the problem’s tness function, some of these alternative paths and/or a fee. Request permissions from [email protected]. might either be more likely to arise randomly or more likely to be GECCO ’18, July 15–19, 2018, Kyoto, Japan favored, at each step, by selection. © 2018 Copyright held by the owner/author(s). Publication rights licensed to Associa- tion for Computing Machinery. Here we study UMAD in the context of the PushGP genetic ACM ISBN 978-1-4503-5618-3/18/07...$15.00 programming system [15, 16], recent versions of which generate https://doi.org/10.1145/3205455.3205603 GECCO ’18, July 15–19, 2018, Kyoto, Japan Thomas Helmuth, Nicholas Freitag McPhee, and Lee Spector and manipulate programs in the form of linear genomes [8]. It Table 1: Genetic operator parameter settings and the rates at should be directly applicable, however, to any genetic programming which we use them in our benchmark genetic operators. system that uses linear program representations, and the general idea may be applicable to non-linear representations as well. Parameter Value Among our ndings is that UMAD works well with mutation alternation rate 0.01 rates that are considerably higher than those generally used in the alignment deviation 10 eld. For this reason, we also present the results of experiments uniform mutation rate 0.01 using replacement-based mutation with the same high rates; we uniform close mutation rate 0.1 nd that this also works surprisingly well, but not as well as UMAD. Because gene addition and deletion are dissociated in UMAD, Genetic Operator Rates Prob the rates of addition and deletion can be set independently. While alternation 0.2 some settings are size neutral on average, others produce a bias uniform mutation 0.2 toward smaller or larger programs. In the experiments described uniform close mutation 0.1 here, we explore the eects of these settings. alternation followed by uniform mutation 0.5 In the following section we describe the PushGP system in which our experiments are conducted, along with the genetic operators against which we will later compare UMAD. We then describe UMAD itself, our experimental design, and the results of our exper- of being incremented or decremented. These are described in more iments. After discussing results we describe related work and draw detail in [8]. general conclusions, including suggestions for future research. Our experiments refer to a genetic operator setting used in pre- vious papers on the problems studied here [6]; we call this setting 2 PUSHGP the benchmark genetic operators (see Table 1), and we note that PushGP is a GP system that evolves Push programs [15, 16]. Push is experiments have shown that 100% uniform mutation, with a rate a stack-based language that uses a separate stack for each data type, of 0.01, can be almost equally eective [8]. including a stack for program code that remains to be executed. These features facilitate the evolution of programs that involve 3 UNIFORM MUTATION BY ADDITION AND multiple data types and non-trivial control structures, including for DELETION general program synthesis benchmark problems [6]. Uniform Mutation by Addition and Deletion (UMAD) could be Although Push programs can have a nested, hierarchical struc- applied to any variable-length linear genome representation in evo- ture, recent versions of PushGP represent the programs in their lutionary computing. For each gene in a parent’s genome, UMAD populations as linear sequences of genes. These sequences, which considers whether to add a new random gene before or after it are called “Plush genomes” (the “l” is for “linear”), are translated with probability given by the parameter addition rate. Then, in this into Push programs prior to execution [8]. Each gene in a Plush augmented genome, UMAD considers whether to delete each gene genome species an instruction or a constant, and when certain with probability given by the parameter deletion rate. This process instructions (such as conditional or looping instructions) are pro- is formalized in the Algorithm 1. UMAD is uniform in that every cessed for translation to a Push program, one or more code blocks gene in the parent has equal chance of having a new gene added may be opened, resulting in a hierarchical Push program. Genes in before or after it, and also has equal chance of being deleted. During a Plush genome also include epigenetic close markers that specify the addition step of UMAD, we generate new genes using the same where these code blocks end.