Neutrality and Epistasis in Program Space

2018 ACM/IEEE 4th International Genetic Improvement Workshop Neutrality and Epistasis in Program Space Joseph Renzullo Westley Weimer Arizona State University University of Michigan Tempe, Arizona Ann Arbor, Michigan [email protected] [email protected] Melanie Moses Stephanie Forrest∗ University of New Mexico Arizona State University Albuquerque, New Mexico Tempe, Arizona [email protected] [email protected] ABSTRACT 1 INTRODUCTION Neutral networks in biology often contain diverse solutions with Genetic Improvement of software is more likely to succeed when the equal fitness, which can be useful when environments (require- search algorithm is well-matched to the fitness landscape, whether ments) change over time. In this paper, we present a method for for repairing bugs or improving a nonfunctional property of the studying neutral networks in software. In these networks, we find program. The fitness landscape integrates information about the multiple solutions to held-out test cases (latent bugs), suggesting problem to be solved, its representation, and the search operators that neutral software networks also exhibit relevant diversity. We that are used. Neutral spaces are an important component of many also observe instances of positive epistasis between random mu- fitness landscapes, and we hypothesize that they help determine tations, i.e. interactions that collectively increase fitness. Positive the dynamics and ultimate success of genetic improvement. The epistasis is rare as a fraction of the total search space but significant topology of the neutral space, i.e. the network of programs (nodes) as a fraction of the objective space: 9% of the repairs we found that are equivalent under the fitness metric and connected by sin- to look (and 4.63% across all programs analyzed) were produced gle applications of the search operators (edges) is referred to as a by positive interactions between mutations. Further, the majority neutral network. In this work, fitness is defined to be test-suite per- (62.50%) of unique repairs are instances of positive epistasis. formance,1 and two programs are considered to be neutral if they are test-suite equivalent, even if they behave differently on held-out CCS CONCEPTS tests or achieve identical input/output behavior using different im- plementations. We consider the three most widely used mutations • Applied computing → Biological networks; • Software and in genetic improvement: delete, copy, and swap. We characterize its engineering → Software evolution; Search-based software the topology of the neutral space located around C programs, and engineering; we study epistasis, or interactions, between multiple neutral mutations (single-edits) to the same programs. Our results on extant C KEYWORDS programs illustrate how neutral spaces can be studied in software, Software evolution, Network science, Biological networks, Software which we hope can inform design choices in genetic improvement testing and debugging, Automated software engineering (e.g., representations, choice of operators and search strategies) and ultimately, provide insights about the nature of software to inform ACM Reference Format: other types of software engineering tools. Joseph Renzullo, Westley Weimer, Melanie Moses, and Stephanie Forrest. We motivate our study by observing that the impressive results 2018. Neutrality and Epistasis in Program Space. In GI’18: GI’18:IEEE/ACM 4th International Genetic Improvement Workshop , June 2, 2018, Gothenburg, achieved to date using genetic methods for improving software Sweden. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3194810. leverage the power of evolutionary search to only a limited extent. 3194812 For example, most approaches rely primarily on mutation [14– 16, 23–25, 35], use small population sizes2 [14, 23, 24], and search for a small number of generations [14, 23–25]. Further, many of ∗ Also at the Santa Fe Institute the repairs found using these methods can be reduced to a single mutation [18, 24, 25, 35], suggesting that a large part of the successes achieved to date can be reduced to a form of random search. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed Tackling larger and more complex problems will require meth- for profit or commercial advantage and that copies bear this notice and the full citation ods that can scale up effectively. This includes supporting larger on the first page. Copyrights for components of this work owned by others than the populations, longer runs, and the productive use of crossover to author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. 1Test-suite performance is the pattern of passing and failing test cases (input/output GI’18, June 2, 2018, Gothenburg, Sweden pairs). A program passes a test case if, when given a reference input, it generates an © 2018 Copyright held by the owner/author(s). Publication rights licensed to the output that is character-for-character identical to the reference output; otherwise, it Association for Computing Machinery. fails the test case. ACM ISBN 978-1-4503-5753-1/18/06...$15.00 2Some non-functional property improvement strategies, e.g. energy optimization, are https://doi.org/10.1145/3194810.3194812 a notable exception. 1 GI’18, June 2, 2018, Gothenburg, Sweden Joseph Renzullo, Westley Weimer, Melanie Moses, and Stephanie Forrest combine promising partial solutions and advantageous traits which Genetic variants that have the same fitness are referred to as have evolved independently in the population. In addition, we may neutral, and a neutral network is a set of equal-fitness individuals need to consider more carefully where and when mutations are ap- related by single mutations. Neutral networks help a population of plied. For example, in biology, different mutation rates have evolved independent individuals manage the exploration vs. exploitation in different types of organisms, and mutation rates can vary widely trade-off, a problem faced by any population-based search. Previous even within a single chromosome [17] or at different times, as in work argues that the topology of neutral networks is key to a pop- affinity maturation when mutation rates are increased byupto ulation’s ability to find high-fitness innovations (exploration) [34], five orders of magnitude. Baudry’s work [2, 6] suggests that both while preserving already-discovered innovations (exploitation), and targeting specific locations and varying mutation rates could be that neutrality and robustness are key to evolution [12, 13]. In a advantageous in genetic improvement. sense, it is safe for a population to search by spreading out over How should we scale up current methods in genetic improvement a neutral network: each member of the population maintains its to tackle much larger codebases and more complex repairs? What fitness (because each is neutral) while collectively diversifying the is the best way to navigate fitness plateaus (a.k.a. neutral networks), population’s genome. This makes it more likely that small muta- which are common when test cases are used as fitness functions? tions will lead to novelty. What portion of a fitness landscape is made up of this plateau? Neutral mutations indicate a type of robustness in the sense that Does this portion change, along with mutational robustness, as we the mutation is a change that does not affect fitness. The interplay move away from the original program by iterated mutation? Are between robustness and evolution has been studied extensively in there more or better repairs near the original program? Or farther biology [19, 26, 34], producing many theoretical models, e.g., [5], away in program-space? How do edits interact, as they accumulate? and an increasing body of experimental results, e.g., [32]. It has These motivating questions are not answered here, but in this paper been found, e.g. in [1] that large, connected neutral networks are we show how they can be investigated by adapting techniques from prevalent in the regulatory gene binding sites of multiple species. evolutionary biology, and we provide preliminary evidence that To summarize, neutral networks allow evolution to maintain fit these methods are fruitful when applied to example programs. phenotypes (external appearance and behavior) while exploring In Section 2 we discuss a framework for the methods we develop, a large genetic search space. In this paper we study neutral net- which draws on results in the field of evolutionary biology. In works in a computational context. Specifically, we analyze neutral Section 3 we apply some of these results, demonstrate a method networks in example computer programs, relate their structure to for building a neutral network for a computer program, and briefly neutral networks observed in biology, and conjecture that software discuss characteristics of interest. In Section 4 we consider the is evolvable, at least in part, because neutral networks enable sto- problem of interacting mutations (a.k.a. higher order mutants). We chastic search methods to search for useful innovations (e.g., bug find that although rare, positive epistasis comprises a significant repairs) without damaging existing functionality. fraction (9%) of discovered repairs to a latent

Neutrality and Epistasis in Program Space

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support