6.1 Standard GP

Eingereicht von Bogdan Burlacu Angefertigt am Institute for Formal Models and Verification Erstbeurteiler FH-Prof. PD DI Dr. Michael Aenzeller Zweitbeurteiler a.Univ.-Prof. Dr. Tracing of Evolutionary Josef Küng Search Trajectories in August 2017 Complex Hypothesis Spaces Dissertation zur Erlangung des akademischen Grades Doktor der Technischen Wissenschaen im Doktoratsstudium der Technischen Wissenschaen JOHANNES KEPLER UNIVERSITÄT LINZ Altenbergerstraße 69 4040 Linz, Österreich www.jku.at DVR 0093696 Acknowledgements First and foremost, I would like to thank Prof. Michael Affenzeller for the opportunity to pursue a PhD within the Heuristics and Evolutionary Algorithms Laboratory (HEAL) at the University of Applied Sciences Upper Austria and for the continuous guidance and friendship. The work described in this thesis would not have been possible without the funding provided by the International PhD program in Informatics Hagenberg, offered as a specialization of the computer science PhD program at the Johannes Kepler University Linz (Austria). I would also like to extend my thanks and gratitude to my colleagues from HEAL who supported me during this time with many insights, criticisms, discussions and development advice: Michael Kommenda, Gabriel Kronberger, Stephan Winkler, Andreas Beham, Stefan Vonolfen, and the whole Heuristiclab team. i Abstract Understanding the internal functioning of evolutionary algorithms is an essential require- ment for improving their performance and reliability. Increased computational resources available in current mainstream computers make it possible for new previously infeasible research directions to be explored. Therefore, a comprehensive theoretical analysis of their mechanisms and dynamics using modern tools becomes possible. Recent algorithmic achievements like offspring selection in combination with linear scaling have enabled genetic programming (GP) to achieve high quality results in system identification in less than 50 generations using populations of only several hundred individuals. Therefore, the active gene pool of evolutionary search remains manageable and may be subjected to new theoretical investigations closely related to genetic programming schema theories, building block hypotheses and bloat theories. Genetic algorithms emulate emergent systems in which complex patterns are formed from an initially simple and random pool of elementary structures. In GP, complexity emerges under the influence of stabilizing selection which preserves the useful genetic variation created by recombination and mutation. The mapping between the structures used for solution representation and the ones used for the evaluation of fitness has a major influence on algorithm behavior. Population-wide effects concerning building blocks, genetic diversity and bloat can be conceptually seen as results of the complex interaction between phenotypic operators (selection) and genotypic operators (mutation and recombination). This coupling known as the “variation-selection loop” is the main engine for GP emergent behavior and constitutes the main topic of this research. This thesis aims to provide a unified theoretical framework which can explain GP evolution. To this end, it explores the way in which the genotype-phenotype map, in relation with the evolutionary operators (selection, recombination, mutation) determines algorithmic behaviour. As the title suggests, the main contribution consists of a novel “tracing” framework that makes it possible to determine the exact patterns of building block and gene propagation through the generations and the way smaller elements are gradually assembled into more complex structures by the evolutionary algorithm. ii Zusammenfassung Um die Leistung und Zuverlässigkeit von evolutionären Algorithmen zu verbessern, ist es notwendig deren interne Funktionsweise zu verstehen. Die hohe Rechenleistung aktuel- ler Mainstream-Computer erlaubt neue Forschungsrichtungen zu erkunden, welche früher, aufgrund fehlender Rechenleistung, nicht möglich waren. Für evolutionäre Algorithmen wird es dadurch möglich, deren interne Mechaniken und Dynamiken umfangreich zu analysieren. Aktuelle algorithmische Fortschritte, wie Nachkommensselektion (Offspring Selection) in Kombination mit linearer Skalierung, ermöglicht Genetischer Programmie- rung (GP) hochqualitative Ergebnisse in der Systemidentifikation zu erreichen, in weniger als 50 Generationen bei einer Populationsgröße von nur wenigen hunderten Individuen. Der dadurch überschaubare aktive Genpool der evolutionären Suche ermöglicht neue theoretische Untersuchungen zum GP Schema Theorem, zur Baustein Hypothese und zu Bloat-Theorien. Genetische Algorithmen emulieren emergente Systeme in denen komplexe Muster ge- formt werden, basierend auf einer initialen, zufällig generiertem Menge an elementaren Strukturen. In GP entsteht die Komplexität durch den Einfluss der stabilisierenden Selek- tion, welche nützliche genetische Variation erhält die von Rekombination und Mutation erzeugt werden. Die Zuordnung zwischen Strukturen zur Lösungsrepräsentation (Ge- notyp) und Fitnessevaluierung (Phänotyp) beeinflusst das algorithmische Verhalten stark. Populationsweite Auswirkungen betreffend Bausteine, genetischer Diversität und Bloat entstehen durch das komplexe Zusammenwirken phänotypischen Operatoren (Selektion) und genotypischen Operatoren (Rekombination und Mutation). Dieser Mechanismus, be- kannt als „Variation-Selektion Schleife”, ist die treibende Kraft des emergente Verhalten von GP und bildet das Hauptforschungsthema dieser Arbeit. Diese Arbeit zielt darauf ab, einen einheitlichen, theoretischen Rauem zu schaffen wel- cher Evolution in GP erklären kann. Dafür wird der Einfluss von auf das algorithmische Verhalten untersucht, basierend auf die Zuordnung von Genotyp und Phänotyp, unter Berücksichtigung der evolutionärer Operatoren (Selektion, Rekombination, Mutation). Wie der Titel der Arbeit bereits andeutet, besteht der wichtigste Beitrag aus einem neuartigen System zur Überwachung und Rückverfolgung von Genen über mehrere Generationen hinweg. Dies ermöglicht es das Verhalten von Bausteinen zu erforschen, sowie zu erkunden wie bei evolutionären Algorithmen aus kleinen Elementen nach und nach komplexere Strukturen gebildet werden. iii Declaration Ich erkläre an Eides statt, dass ich die vorliegende Dissertation selbstständig und ohne fremde Hilfe verfasst, andere als die angegebenen Quellen und Hilfsmittel nicht benutzt bzw. die wörtlich oder sinngemäßentnommenen Stellen als solche kenntlich gemacht habe. Die vorliegende Dissertation ist mit dem elektronisch übermittelten Textdokument iden- tisch. Bogdan Burlacu, Linz, 2017 iv Contents 1 Introduction 5 1.1 Thesis Statement ................................ 5 1.1.1 Research Objectives........................... 6 1.1.2 Main Results and Contribution..................... 6 1.1.3 Thesis Structure and Organization................... 7 2 Heuristics and Metaheuristics 9 2.1 Optimization Problems............................. 9 2.1.1 Formal Statement............................ 9 2.1.2 Discrete and Combinatorial Optimization............... 10 2.2 Heuristics .................................... 11 2.2.1 Which Problems are Considered “Hard”? ............... 13 2.2.2 The Subset Sum Problem (Unidimensional Knapsack) . 14 2.2.3 Computational Class NP ........................ 15 2.2.4 The Traveling Salesman Problem.................... 16 2.3 Metaheuristics.................................. 19 2.3.1 Definition ................................ 19 2.3.2 Randomized Algorithms and Probabilistic Methods ......... 20 2.3.3 Metaphors and Metaheuristics..................... 21 2.3.4 Taxonomy of Metaheuristics...................... 22 2.3.5 Trajectory-based Metaheuristics.................... 22 Local Search............................... 22 Simulated Annealing........................... 23 Tabu Search ............................... 23 Scatter Search.............................. 23 2.3.6 Population-based Metaheuristics.................... 23 Particle Swarm Optimization...................... 24 2.3.7 Constructive Metaheuristics ...................... 24 Greedy Randomized Adaptive Search ................. 24 Ant Colony Optimization........................ 24 2.4 The No Free Lunch theorem.......................... 25 2.4.1 Implications for Metaheuristics..................... 25 2.4.2 Summary................................. 29 1 Contents 3 Evolutionary Computation 30 3.1 Introduction................................... 30 3.1.1 Evolution through Natural Selection.................. 30 3.1.2 A Short History of Population Genetics................ 31 3.1.3 The Modern Synthesis.......................... 33 3.1.4 Developmental and Molecular Genetics................ 34 3.2 Evolutionary Search Methods ......................... 35 3.2.1 Evolution Strategies........................... 36 3.2.2 Evolutionary Programming....................... 38 3.2.3 Differential Evolution.......................... 39 3.2.4 Genetic Algorithms........................... 41 Recombination in Genetic Algorithms................. 41 Selection in Genetic Algorithms .................... 43 Holland’s Schema Theorem....................... 43 Criticism of the Schema Theorem................... 45 4 Genetic Programming 47 4.1 Solution Encoding................................ 47 4.1.1 Syntax Tree Encoding.......................... 48 Closure and Sufficiency. Properties of Good Representations . 49 4.1.2 Other Encodings............................. 50 Grammar-based GP..........................

6.1 Standard GP

Population Genetics I

Workshop in Molecular Evolution

Evolutionary Genetics LV 25600-01 | Lecture with Exercises | 4KP

Prediction and Estimation of Effective Population Size

RS7069 18 Pp033 80 Waples.Pdf (921.1Kb)