Symbolic Regression for Knowledge Discovery – Bloat, Overfitting, And

Total Page:16

File Type:pdf, Size:1020Kb

Symbolic Regression for Knowledge Discovery – Bloat, Overfitting, And JOHANNES KEPLER UNIVERSITAT¨ LINZ JKU Technisch-Naturwissenschaftliche Fakult¨at Symbolic Regression for Knowledge Discovery { Bloat, Overfitting, and Variable Interaction Networks DISSERTATION zur Erlangung des akademischen Grades Doktor im Doktoratsstudium der Technischen Wissenschaften Eingereicht von: Dipl.-Ing. Gabriel Kronberger Angefertigt am: Institut f¨urformale Modelle und Verifikation Beurteilung: Priv.-Doz. Dipl.-Ing. Dr. Michael Affenzeller (Betreuung) Assoc.-Prof. Dipl.-Ing. Dr. Ulrich Bodenhofer Linz, 12, 2010 For Gabi Acknowledgments This thesis would not have been possible without the support of a number of people. First and foremost I want to thank my adviser Dr. Michael A®enzeller, who has not only become my mentor but also a close friend in the years since I ¯rst started my forays into the area of evolutionary computation. Furthermore, I would like to express my gratitude to my second adviser Dr. Ulrich Bodenhofer for introducing me to the statistical elements of machine learning and en- couraging me to critically scrutinize modeling results. I am also thankful to Prof. Dr. Witold Jacak, dean of the School of Informatics, Communications and Media in Hagenberg of the Upper Austria University of Applied Sciences, for giving me the opportunity to work at the research center Hagenberg and allowing me to work on this thesis. I thank all members of the research group for heuristic and evolutionary algorithms (HEAL). Andreas Beham, for implementing standard variants of heuristic algorithms in HeuristicLab. Michael Kommenda, for discussing extensions and improvements of symbolic regression for practical applications, some of which are described in this thesis. Dr. Stefan Wagner, for tirelessly improving HeuristicLab, which has been used as the software environment for all experiments presented in this thesis. Dr. Stephan Winkler, for introducing me to genetic programming and system identi¯cation and for proofread- ing a ¯rst draft of this thesis to improve its linguistic quality. Ciprian Zavoianu, for the interesting discussions about aspects of genetic programming. I also wish to extend my thanks to my colleagues who invested a part of their valuable time to improve application-speci¯c sections of this thesis. Christoph Feilmayr, for advice regarding data-based modeling of the blast furnace process and for proofreading and improving a draft of Chapter 8. Leonhard Schickmair, for his input on variable relevance metrics and for proofreading and improving a draft of Chapter 8. Dr. Stefan Fink, for preparing a dataset for economic modeling and for the discussion of economic models produced by genetic programming. Finally, I thank my parents and my family who supported me and my interests in computer science. I thank all my friends for encouraging me to ¯nish this thesis. This work is dedicated to Gabi for her love and support. This thesis mainly reflects research work done within the Josef Ressel-center for heuris- tic optimization \Heureka!" at the Upper Austria University of Applied Sciences, Cam- pus Hagenberg. The center \Heureka!" is supported within the program \Josef Ressel- Centers" by the Austrian Research Promotion Agency (FFG) on behalf of the Austrian Federal Ministry of Economy, Family and Youth (BMWFJ). i Zusammenfassung Mit der wachsenden Datenmenge, die in unterschiedlichsten Anwendungsbereichen ge- sammelt und aufgezeichnet wird, wÄachst auch das Bedurfnis,Ä diese Daten sinnvoll zu nutzen. In der Wissenschaft haben Daten schon seit jeher einen hohen Stellenwert. In den jungerenÄ Jahren gewinnt aber auch das wirtschaftliche Potential von Daten zu- nehmend an Bedeutung. In Kombination mit Verfahren zur Datenanalyse kann dieses Potential ausgeschÄopft werden, sei es im kommerziellen Bereich zur Optimierung von Angeboten, oder im industriellen Bereich zur Optimierung von Ressourceneinsatz oder ProduktqualitÄat basierend auf Prozessdaten. In dieser Arbeit wird ein neuer Ansatz furÄ die Analyse von Daten vorgestellt, der auf symbolischer Regression mit genetischer Programmierung basiert und das Ziel verfolgt, einen GesamtuberblickÄ uberÄ das Zusammenspiel von einzelnen Faktoren eines Systems zu ermÄoglichen. Dabei sollen mÄoglichst alle potentiell interessanten ZusammenhÄange, welche in einem Datensatz erkennbar sind, in Form von kompakten und verstÄandlichen Modellen identi¯ziert werden. Im ersten Teil der vorliegenden Arbeit wird dieser Ansatz der umfassenden symbo- lischen Regression im Detail beschrieben. Wesentliche Themen, die dabei eine Rolle spielen, sind die Vermeidung von Bloat und Uberanpassung,Ä die Vereinfachung von Mo- dellen und die Identi¯kation von relevanten EinflussgrÄo¼en. In diesem Zusammenhang werden unterschiedliche Verfahren zur Vermeidung von Bloat vorgestellt und verglichen. Insbesondere wird der Einfluss von Nachkommenselektion auf Bloat analysiert. DaruberÄ hinaus wird eine neue MÄoglichkeit zur Erkennung von UberanpassungÄ vorgestellt. Im Zu- ge dessen werden darauf basierende Erweiterungen zur Reduktion von UberanpassungÄ vorgestellt und verglichen. Eine wichtige Rolle spielt dabei das Pruning von Modellen, einerseits um UberanpassungÄ zu verhindern und andererseits um komplexe Modelle zu vereinfachen. Ein weiterer wesentlicher Aspekt ist die Analyse und Darstellung der umfangreichen Menge von unterschiedlichen Modellen, die aus dem vorgestellten Ansatz resultiert. In diesem Zusammenhang werden MÄoglichkeiten zur Quanti¯zierung von relevanten Ein- flussgrÄo¼en vorgestellt, die in weiterer Folge verwendet werden kÄonnen, um Interaktionen von Variablen des analysierten Systems zu identi¯zieren. Durch die Visualisierung die- ser Interaktionen entsteht ein GesamtuberblickÄ uberÄ das betrachtete System. Dies wÄare alleine durch die Analyse einzelner Modelle, die sich auf spezielle Aspekte konzentrie- ren, nicht mÄoglich. ZusÄatzlich wird im ersten Teil auch die Prognose von multivariaten Zeitserien mit genetischer Programmierung beschrieben. Im zweiten Teil dieser Arbeit wird gezeigt, wie der vorgestellte Ansatz furÄ die Analyse von realen Systemen eingesetzt werden kann und dadurch neue Einblicke ermÄoglicht werden kÄonnen. Die Daten stammen von einem Hochofen furÄ die Produktion von Stahl und von einem industriellen chemischen Prozess. ZusÄatzlich wird gezeigt wie derselbe Ansatz zur Identi¯kation von makroÄokonomischen ZusammenhÄangen eingesetzt werden kann. Abstract With the growing amount of data that are collected and recorded in various application areas the need to utilize these data is also growing. In science, data have always played an important role; in recent years, however, the economic potential of data has also become increasingly important. In combination with methods for data analysis, data can be utilized to their full potential, whether in the commercial sector to optimize o®ers, or in the industrial sector to optimize resources and product quality based on process data. This work describes a new approach for the analysis of data which is based on sym- bolic regression with genetic programming and aims to generate an overall view of the interactions of various variables of a system. By this means, all potentially interesting relationships, which can be detected in a dataset, should be identi¯ed and represented as compact and understandable models. In the ¯rst part of this work, this approach of comprehensive symbolic regression is described in detail. Important issues that play a role in the process are the prevention of bloat and over-¯tting, the simpli¯cation of models, and the identi¯cation of relevant input variables. In this context, di®erent methods for bloat control and prevention are presented and compared. In particular, the influence of o®spring selection on bloat is analyzed. In addition, a new way to detect over-¯tting is presented. On the basis of this, extensions for the reduction of over-¯tting are presented and compared. Pruning of models is featured prominently, on the one hand to prevent over-¯tting and on the other hand to simplify complex models. An important aspect is the analysis of the vast amount of di®erent models that results from the proposed approach. In this context, di®erent methods to quantify relevant factors are proposed. These methods can be used to identify interactions of variables of the analyzed system. Visualizing such interactions provides a general overview of the system in question which would not be possible by analysis of individual models which are concentrated on selected aspects of the problem. Additionally, the prognosis of multivariate time series with genetic programming is described in the ¯rst part. The second part of this work shows how the described approach can be applied to the analysis of real-world systems, and how the result of this data analysis process can result in the gain of new knowledge about the investigated system. The analyzed data stem from a blast furnace for the production of steel and an industrial chemical process. In addition the same approach is also applied on a data collection storing economic data in order to identify macro-economic interactions. Eidesstattliche ErklÄarung Ich erklÄarean Eides statt, dass ich die vorliegende Dissertation selbststÄandigund ohne fremde Hilfe verfasst, andere als die angegebenen Quellen und Hilfsmittel nicht benutzt bzw. die wÄortlich oder sinngemÄa¼entnommenen Stellen als solche kenntlich gemacht habe. Linz, December 15, 2010 Dipl.-Ing. Gabriel Kronberger vii Contents 1. Introduction 7 1.1. Thesis Statement . 10 1.2. Research Project Background . 11 2.
Recommended publications
  • Constituent Grammatical Evolution
    Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Constituent Grammatical Evolution Loukas Georgiou and William J. Teahan School of Computer Science, Bangor University Bangor, Wales [email protected] and [email protected] Abstract Grammatical Evolution’s unique features compared to other evolutionary algorithms are the degenerate genetic code We present Constituent Grammatical Evolution which facilitates the occurrence of neutral mutations (vari- (CGE), a new evolutionary automatic program- ous genotypes can represent the same phenotype), and the ming algorithm that extends the standard Gram- wrapping of the genotype during the mapping process which matical Evolution algorithm by incorporating the enables the reuse of the same genotype for the production of concepts of constituent genes and conditional be- different phenotypes. haviour-switching. CGE builds from elementary Grammatical Evolution (GE) takes inspiration from na- and more complex building blocks a control pro- ture. It embraces the developmental approach and draws gram which dictates the behaviour of an agent and upon principles that allow an abstract representation of a it is applicable to the class of problems where the program to be evolved. This abstraction enables the follow- subject of search is the behaviour of an agent in a ing: the separation of the search and solution spaces; the given environment. It takes advantage of the pow- evolution of programs in an arbitrary language; the exis- erful Grammatical Evolution feature of using a tence of degenerate genetic code; and the wrapping that BNF grammar definition as a plug-in component to allows the reuse of the genetic material.
    [Show full text]
  • Automated Design of Genetic Programming Classification Algorithms
    Automated Design of Genetic Programming Classification Algorithms Thambo Nyathi Supervisor: Prof. Nelishia Pillay This thesis is submitted in fulfilment of the academic requirements of Doctor of Philosophy in Computer Science School of Mathematics , Statistics and Computer Science University of KwaZulu Natal Pietermaritzburg South Africa December 2018 PREFACE The research contained in this thesis was completed by the candidate while based in the Discipline of Computer Science, School of Mathematics, Statistics and Computer Science of the College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa. The contents of this work have not been submitted in any form to another university and, except where the work of others is acknowledged in the text, the results reported are due to investigations by the candidate. ————————————— Date: 04-12-2018 Signature Professor Nelishia Pillay Declaration PLAGIARISM I, Thambo Nyathi, declare that: i) this dissertation has not been submitted in full or in part for any degree or examination to any other university; ii) this dissertation does not contain other persons’ data, pictures, graphs or other informa- tion, unless specifically acknowledged as being sourced from other persons; iii) this dissertation does not contain other persons’ writing, unless specifically acknowl- edged as being sourced from other researchers. Where other written sources have been quoted, then: a) their words have been re-written but the general information attributed to them has been referenced; b) where their exact words have been used, their writing has been placed inside quotation marks, and referenced; iv) where I have used material for which publications followed, I have indicated in detail my role in the work; v) this thesis is primarily a collection of material, prepared by myself, published as journal articles or presented as a poster and oral presentations at conferences.
    [Show full text]
  • Arxiv:2005.04151V1 [Cs.NE] 25 Apr 2020
    Noname manuscript No. (will be inserted by the editor) Swarm Programming Using Moth-Flame Optimization and Whale Optimization Algorithms Tapas Si Received: date / Accepted: date Abstract Automatic programming (AP) is an important area of Machine Learning (ML) where computer programs are generated automatically. Swarm Programming (SP), a newly emerging research area in AP, automatically gen- erates the computer programs using Swarm Intelligence (SI) algorithms. This paper presents two grammar-based SP methods named as Grammatical Moth- Flame Optimizer (GMFO) and Grammatical Whale Optimizer (GWO). The Moth-Flame Optimizer and Whale Optimization algorithm are used as search engines or learning algorithms in GMFO and GWO respectively. The proposed methods are tested on Santa Fe Ant Trail, quartic symbolic regression, and 3-input multiplexer problems. The results are compared with Grammatical Bee Colony (GBC) and Grammatical Fireworks algorithm (GFWA). The ex- perimental results demonstrate that the proposed SP methods can be used in automatic computer program generation. Keywords Automatic Programming · Swarm Programming · Moth-Flame Optimizer · Whale Optimization Algorithm 1 Introduction Automatic programming [1] is a machine learning technique by which com- puter programs are generated automatically in any arbitrary language. SP [3] is an automatic programming technique which uses SI algorithms as search en- gine or learning algorithms. The grammar-based SP is a type of SP in which Context-free Grammar (CFG) is used to generate computer programs in a target language. Genetic Programming (GP) [2] is a evolutionary algorithm T. Si Department of Computer Science & Engineering arXiv:2005.04151v1 [cs.NE] 25 Apr 2020 Bankura Unnayani Institute of Engineering Bankura-722146, West Bengal, India E-mail: [email protected] 2 Tapas Si in which tree-structured genome is used to represent a computer program and Genetic algorithm (GA) is used as a learning algorithm.
    [Show full text]
  • Grammatical Evolution: a Tutorial Using Gramevol
    Grammatical Evolution: A Tutorial using gramEvol Farzad Noorian, Anthony M. de Silva, Philip H.W. Leong July 18, 2020 1 Introduction Grammatical evolution (GE) is an evolutionary search algorithm, similar to genetic programming (GP). It is typically used to generate programs with syntax defined through a grammar. The original author's website [1] is a good resource for a formal introduction to this technique: • http://www.grammatical-evolution.org/ This document serves as a quick and informal tutorial on GE, with examples implemented using the gramEvol package in R. 2 Grammatical Evolution The goal of using GE is to automatically generate a program that minimises a cost function: 1.A grammar is defined to describe the syntax of the programs. 2.A cost function is defined to assess the quality (the cost or fitness) of a program. 3. An evolutionary algorithm, such as GA, is used to search within the space of all programs definable by the grammar, in order to find the program with the lowest cost. Notice that by a program, we refer to any sequence of instructions that perform a specific task. This ranges from a single expression (e.g., sin(x)), to several statements with function declarations, assignments, and control flow. The rest of this section will describe each component in more details. 2.1 Grammar A grammar is a set of rules that describe the syntax of sentences and expressions in a language. While grammars were originally invented for studying natural languages, they are extensively used in computer science for describing programming languages. 2.1.1 Informal introduction to context-free grammars GE uses a context-free grammar to describe the syntax of programs.
    [Show full text]
  • Applications of Evolutionary Computation 23Rd European Conference, Evoapplications 2020 Held As Part of Evostar 2020 Seville, Spain, April 15–17, 2020 Proceedings
    Lecture Notes in Computer Science 12104 Founding Editors Gerhard Goos Karlsruhe Institute of Technology, Karlsruhe, Germany Juris Hartmanis Cornell University, Ithaca, NY, USA Editorial Board Members Elisa Bertino Purdue University, West Lafayette, IN, USA Wen Gao Peking University, Beijing, China Bernhard Steffen TU Dortmund University, Dortmund, Germany Gerhard Woeginger RWTH Aachen, Aachen, Germany Moti Yung Columbia University, New York, NY, USA More information about this series at http://www.springer.com/series/7407 Pedro A. Castillo • Juan Luis Jiménez Laredo • Francisco Fernández de Vega (Eds.) Applications of Evolutionary Computation 23rd European Conference, EvoApplications 2020 Held as Part of EvoStar 2020 Seville, Spain, April 15–17, 2020 Proceedings 123 Editors Pedro A. Castillo Juan Luis Jiménez Laredo University of Granada Université Le Havre Normandie Granada, Spain Le Havre, France Francisco Fernández de Vega Universidad de Extremadura Mérida, Spain ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-43721-3 ISBN 978-3-030-43722-0 (eBook) https://doi.org/10.1007/978-3-030-43722-0 LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc.
    [Show full text]
  • Extending Grammatical Evolution to Evolve Digital Surfaces with Genr8
    Extending Grammatical Evolution to Evolve Digital Surfaces with Genr8 Martin Hemberg1 and Una-May O'Reilly2 1 Department of Bioengineering, Bagrit Centre, Imperial College London, South Kensington Campus, London SW7 2AZ, UK [email protected] 2 Computer Science and Arti¯cial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA [email protected] Abstract. Genr8 is a surface design tool for architects. It uses a grammar- based generative growth model that produces surfaces with an organic quality. Grammatical Evolution is used to help the designer search the universe of possible surfaces. We describe how we have extended Gram- matical Evolution, in a general manner, in order to handle the grammar used by Genr8. 1 Genr8:The Evolution of Digital Surfaces We have built a software tool for architects named Genr8. Genr8 'grows' dig- ital surfaces within the modeling environment of the CAD tool Maya. One of Genr8's digital surfaces, is shown on the left of Figure 1. Beside it, is a real (i.e. physical) surface that was evolved ¯rst in Genr8 then subsequently fabri- cated and assembled. A surface in Genr8 starts as an axiomatic closed polygon. It grows generatively via simultaneous, multidimensional rewriting. During growth, a surface's de¯nition is also influenced by the simulated physics of a user de¯ned 'environment' that has attractors, repellors and boundaries as its features. This integration of simulated physics and tropic influences produces surfaces with an organic appearance. In order to automate the speci¯cation, exploration and adaptation of Genr8's digital surfaces, an extended version of Grammatical Evolution (GE) [7] has been developed.
    [Show full text]
  • Artificial Neural Networks Generation Using Grammatical Evolution
    Artificial Neural Networks Generation Using Grammatical Evolution Khabat Soltanian1, Fardin Akhlaghian Tab2, Fardin Ahmadi Zar3, Ioannis Tsoulos4 1Department of Software Engineering, University of Kurdistan, Sanandaj, Iran, [email protected] 2Department of Engineering, University of Kurdistan, Sanandaj, Iran, [email protected] 3Department of Engineering, University of Kurdistan, Sanandaj, Iran, [email protected] 4Department of Computer Science, University of Ioannina, Ioannina, Greece, [email protected] Abstract: in this paper an automatic artificial neural network BP). GE is previously used for neural networks generation method is described and evaluated. The proposed construction and training and successfully tested on method generates the architecture of the network by means of multiple classification and regression benchmarks [3]. In grammatical evolution and uses back propagation algorithm for training it. In order to evaluate the performance of the method, [3] the grammatical evolution creates the architecture of a comparison is made against five other methods using a series the neural networks and also suggest a set for the weights of classification benchmarks. In the most cases it shows the of the constructed neural network. Whereas the superiority to the compared methods. In addition to the good grammatical evolution is established for automatic experimental results, the ease of use is another advantage of the programming tasks not the real number optimization. method since it works with no need of experts. Thus, we altered the grammar to generate only the Keywords- artificial neural networks, evolutionary topology of the network and used BP algorithm for computing, grammatical evolution, classification problems. training it. The following section reviews several evolutionary 1.
    [Show full text]
  • A Tool for Estimation of Grammatical Evolution Models
    AutoGE: A Tool for Estimation of Grammatical Evolution Models Muhammad Sarmad Ali a, Meghana Kshirsagar b, Enrique Naredo c and Conor Ryan d Biocomputing and Developmental Systems Lab, University of Limerick, Ireland Keywords: Grammatical Evolution, Symbolic Regression, Production Rule Pruning, Effective Genome Length. Abstract: AutoGE (Automatic Grammatical Evolution), a new tool for the estimation of Grammatical Evolution (GE) parameters, is designed to aid users of GE. The tool comprises a rich suite of algorithms to assist in fine tuning BNF grammar to make it adaptable across a wide range of problems. It primarily facilitates the identification of optimal grammar structures, the choice of function sets to achieve improved or existing fitness at a lower computational overhead over the existing GE setups. This research work discusses and reports initial results with one of the key algorithms in AutoGE, Production Rule Pruning, which employs a simple frequency- based approach for identifying less worthy productions. It captures the relationship between production rules and function sets involved in the problem domain to identify optimal grammar structures. Preliminary studies on a set of fourteen standard Genetic Programming benchmark problems in the symbolic regression domain show that the algorithm removes less useful terminals and production rules resulting in individuals with shorter genome lengths. The results depict that the proposed algorithm identifies the optimal grammar structure for the symbolic regression problem domain to be arity-based grammar. It also establishes that the proposed algorithm results in enhanced fitness for some of the benchmark problems. 1 INTRODUCTION of fitness functions that gives a fitness score to the individuals in the population.
    [Show full text]
  • In This Issue
    Volume 7 SIGEVOLution Issue 4 newsletter of the ACM Special Interest Group on Genetic and Evolutionary Computation in this issue GECCO 2015 Summary GECCO best paper winners Latest contents from ECJ and GPEM CFPs for upcoming conferences Latest vacancies Track: Estimation of Distribution Algorithms (EDA)+ EDITORIAL THEORY Those of you who managed to make it to sunny Winner: Improved Runtime Bounds for the (1+1) Madrid for GECCO this year will already know EA on Random 3-CNF Formulas Based on Fitness- what a fantastic conference it was from every Distance Correlation perspective – great science, great company and Benjamin Doerr, Frank Neumann, Andrew M. great food! For those who missed it, you can catch Sutton (Full text) up with best paper awards and read a summary from this year’s General Chair Anna Esparcia- Track: Evolutionary Combinatorial Optimization Alcazar inside. You will also find the latest tables and Metaheuristics (ECOM) of contents from ECJ and GPEM, as well as calls Winner: Global vs Local Search on Multi-objective for some of the conferences coming up in 2016. NK-landscapes: Contrasting the Impact of Problem Get writing those papers! The next issue of the Features newsletter will provide an insider’s view on Denver Fabio Daolio, Arnaud Liefooghe, Sébastien Verel, which will be the location for GECCO 2016, just in Hernán Aguirre, Kiyoshi Tanaka (Full text) case you need any persuasion to attend. I’d also like to take this opportunity to congratulate Marc Track: Evolutionary Machine Learning -EML Schoenauer on becoming Chair of SIGEVO from July and to extend a warm thank you to Wolfgang Winner: Subspace clustering using evolvable Banzhaf for his hard work over the last four years.
    [Show full text]
  • Grammatical Evolution
    Chapter 2 Grammatical Evolution Genetic Programming (GP) is a population-based search method based upon neo-Darwinian principles of evolution, which manipulates executable struc- tures. It can be used to automatically generate computer code to solve real- world problems in a broad array of application domains [173]. Grammatical Evolution(GE) (see [154, 148, 24, 54, 55, 151, 190, 160, 86, 25, 43, 7, 163]) is a grammar-based form of GP. It marries principles from molecular biology to the representational power of formal grammars. GE’s rich modularity gives a unique flexibility, making it possible to use alterna- tive search strategies, whether evolutionary, or some other heuristic (be it stochastic or deterministic) and to radically change its behaviour by merely changing the grammar supplied. As a grammar is used to describe the struc- tures that are generated by GE, it is trivial to modify the output structures by simply editing the plain text grammar. The explicit grammar allows GE to easily generate solutions in any language (or a useful subset of a language). For example, GE has been used to generate solutions in multiple languages including Lisp, Scheme, C/C++, Java, Prolog, Postscript, and English. The ease with which a user can manipulate the output structures by simply writ- ing or modifying a grammar in a text file provides an attractive flexibility and ease of application not as readily enjoyed with the standard approach to Genetic Programming. The grammar also implicitly provides a mechanism by which type information can be encoded thus overcoming the property of closure, which limits the traditional representation adopted by Genetic Pro- gramming to a single type.
    [Show full text]
  • On the Locality of Grammatical Evolution
    On the Locality of Grammatical Evolution Franz Rothlauf and Marie Oetzel Department of Business Administration and Information Systems University of Mannheim, 68131 Mannheim/Germany [email protected] Abstract. This paper investigates the locality of the genotype- phenotype mapping (representation) used in grammatical evolution (GE). The results show that the representation used in GE has prob- lems with locality as many neighboring genotypes do not correspond to neighboring phenotypes. Experiments with a simple local search strat- egy reveal that the GE representation leads to lower performance for mutation-based search approaches in comparison to standard GP repre- sentations. The results suggest that locality issues should be considered for further development of the representation used in GE. 1 Introduction Grammatical Evolution (GE) [1] is a variant of Genetic Programming (GP) [2] that can evolve complete programs in an arbitrary language using a variable- length binary string. In GE, phenotypic expressions are created from binary genotypes by using a complex representation (genotype-phenotype mapping). The representation selects production rules in a Backus-Naur form grammar and thereby creates a phenotype. GE approaches have been applied to test problems and real-world applications and good performance has been reported [1,3,4]. The locality of a genotype-phenotype mapping describes how well genotypic neighbors correspond to phenotypic neighbors. Previous work has indicated that a high locality of representations is necessary for efficient evolutionary search [5– 9]. Until now locality has mainly been used in the context of standard genetic algorithms to explain performance differences. The purpose of this paper is to investigate the locality of the genotype- phenotype mapping used in GE.
    [Show full text]
  • Structured Grammatical Evolution Applied to Program Synthesis
    Structured Grammatical Evolution Applied to Program Synthesis by Andrew H. Zhang Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology May 2019 © 2019 Andrew H. Zhang. All rights reserved. The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole and in part in any medium now known or hereafter created. Author: _____________________________________________________ Department of Electrical Engineering and Computer Science May 20, 2019 Certified by: ________________________________________________________ Una-May O’Reilly Principal Research Scientist, MIT CSAIL Thesis Supervisor May 20, 2019 Certified by: _________________________________________________ Erik Hemberg Research Scientist, MIT CSAIL Thesis Co-Supervisor May 20, 2019 1 Structured Grammatical Evolution Applied to Program Synthesis By Andrew Zhang Submitted to the Department of Electrical Engineering and Computer Science May 28, 2018 In Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science Abstract Grammatical Evolution (GE) is an evolutionary algorithm that is gaining popularity due to its ability to solve problems where it would be impossible to explore every solution within a realistic time. Structured Grammatical Evolution (SGE) was developed to overcome some of the shortcomings of GE, such as locality issues as well as wrapping around the genotype to 2 complete the phenotype .​ In this paper, we apply SGE to program synthesis, where the ​ computer must generate code to solve algorithmic problems. SGE was improved upon, because 2 the current definition of SGE ​ does not work.
    [Show full text]