6.1 Standard GP

Total Page:16

File Type:pdf, Size:1020Kb

6.1 Standard GP Eingereicht von Bogdan Burlacu Angefertigt am Institute for Formal Models and Verification Erstbeurteiler FH-Prof. PD DI Dr. Michael Aenzeller Zweitbeurteiler a.Univ.-Prof. Dr. Tracing of Evolutionary Josef Küng Search Trajectories in August 2017 Complex Hypothesis Spaces Dissertation zur Erlangung des akademischen Grades Doktor der Technischen Wissenschaen im Doktoratsstudium der Technischen Wissenschaen JOHANNES KEPLER UNIVERSITÄT LINZ Altenbergerstraße 69 4040 Linz, Österreich www.jku.at DVR 0093696 Acknowledgements First and foremost, I would like to thank Prof. Michael Affenzeller for the opportunity to pursue a PhD within the Heuristics and Evolutionary Algorithms Laboratory (HEAL) at the University of Applied Sciences Upper Austria and for the continuous guidance and friendship. The work described in this thesis would not have been possible without the funding provided by the International PhD program in Informatics Hagenberg, offered as a specialization of the computer science PhD program at the Johannes Kepler University Linz (Austria). I would also like to extend my thanks and gratitude to my colleagues from HEAL who supported me during this time with many insights, criticisms, discussions and development advice: Michael Kommenda, Gabriel Kronberger, Stephan Winkler, Andreas Beham, Stefan Vonolfen, and the whole Heuristiclab team. i Abstract Understanding the internal functioning of evolutionary algorithms is an essential require- ment for improving their performance and reliability. Increased computational resources available in current mainstream computers make it possible for new previously infeasible research directions to be explored. Therefore, a comprehensive theoretical analysis of their mechanisms and dynamics using modern tools becomes possible. Recent algorithmic achievements like offspring selection in combination with linear scaling have enabled genetic programming (GP) to achieve high quality results in system identification in less than 50 generations using populations of only several hundred individuals. Therefore, the active gene pool of evolutionary search remains manageable and may be subjected to new theoretical investigations closely related to genetic programming schema theories, building block hypotheses and bloat theories. Genetic algorithms emulate emergent systems in which complex patterns are formed from an initially simple and random pool of elementary structures. In GP, complexity emerges under the influence of stabilizing selection which preserves the useful genetic variation created by recombination and mutation. The mapping between the structures used for solution representation and the ones used for the evaluation of fitness has a major influence on algorithm behavior. Population-wide effects concerning building blocks, genetic diversity and bloat can be conceptually seen as results of the complex interaction between phenotypic operators (selection) and genotypic operators (mutation and recombination). This coupling known as the “variation-selection loop” is the main engine for GP emergent behavior and constitutes the main topic of this research. This thesis aims to provide a unified theoretical framework which can explain GP evolution. To this end, it explores the way in which the genotype-phenotype map, in relation with the evolutionary operators (selection, recombination, mutation) determines algorithmic behaviour. As the title suggests, the main contribution consists of a novel “tracing” framework that makes it possible to determine the exact patterns of building block and gene propagation through the generations and the way smaller elements are gradually assembled into more complex structures by the evolutionary algorithm. ii Zusammenfassung Um die Leistung und Zuverlässigkeit von evolutionären Algorithmen zu verbessern, ist es notwendig deren interne Funktionsweise zu verstehen. Die hohe Rechenleistung aktuel- ler Mainstream-Computer erlaubt neue Forschungsrichtungen zu erkunden, welche früher, aufgrund fehlender Rechenleistung, nicht möglich waren. Für evolutionäre Algorithmen wird es dadurch möglich, deren interne Mechaniken und Dynamiken umfangreich zu analysieren. Aktuelle algorithmische Fortschritte, wie Nachkommensselektion (Offspring Selection) in Kombination mit linearer Skalierung, ermöglicht Genetischer Programmie- rung (GP) hochqualitative Ergebnisse in der Systemidentifikation zu erreichen, in weniger als 50 Generationen bei einer Populationsgröße von nur wenigen hunderten Individuen. Der dadurch überschaubare aktive Genpool der evolutionären Suche ermöglicht neue theoretische Untersuchungen zum GP Schema Theorem, zur Baustein Hypothese und zu Bloat-Theorien. Genetische Algorithmen emulieren emergente Systeme in denen komplexe Muster ge- formt werden, basierend auf einer initialen, zufällig generiertem Menge an elementaren Strukturen. In GP entsteht die Komplexität durch den Einfluss der stabilisierenden Selek- tion, welche nützliche genetische Variation erhält die von Rekombination und Mutation erzeugt werden. Die Zuordnung zwischen Strukturen zur Lösungsrepräsentation (Ge- notyp) und Fitnessevaluierung (Phänotyp) beeinflusst das algorithmische Verhalten stark. Populationsweite Auswirkungen betreffend Bausteine, genetischer Diversität und Bloat entstehen durch das komplexe Zusammenwirken phänotypischen Operatoren (Selektion) und genotypischen Operatoren (Rekombination und Mutation). Dieser Mechanismus, be- kannt als „Variation-Selektion Schleife”, ist die treibende Kraft des emergente Verhalten von GP und bildet das Hauptforschungsthema dieser Arbeit. Diese Arbeit zielt darauf ab, einen einheitlichen, theoretischen Rauem zu schaffen wel- cher Evolution in GP erklären kann. Dafür wird der Einfluss von auf das algorithmische Verhalten untersucht, basierend auf die Zuordnung von Genotyp und Phänotyp, unter Berücksichtigung der evolutionärer Operatoren (Selektion, Rekombination, Mutation). Wie der Titel der Arbeit bereits andeutet, besteht der wichtigste Beitrag aus einem neuartigen System zur Überwachung und Rückverfolgung von Genen über mehrere Generationen hinweg. Dies ermöglicht es das Verhalten von Bausteinen zu erforschen, sowie zu erkunden wie bei evolutionären Algorithmen aus kleinen Elementen nach und nach komplexere Strukturen gebildet werden. iii Declaration Ich erkläre an Eides statt, dass ich die vorliegende Dissertation selbstständig und ohne fremde Hilfe verfasst, andere als die angegebenen Quellen und Hilfsmittel nicht benutzt bzw. die wörtlich oder sinngemäßentnommenen Stellen als solche kenntlich gemacht habe. Die vorliegende Dissertation ist mit dem elektronisch übermittelten Textdokument iden- tisch. Bogdan Burlacu, Linz, 2017 iv Contents 1 Introduction 5 1.1 Thesis Statement ................................ 5 1.1.1 Research Objectives........................... 6 1.1.2 Main Results and Contribution..................... 6 1.1.3 Thesis Structure and Organization................... 7 2 Heuristics and Metaheuristics 9 2.1 Optimization Problems............................. 9 2.1.1 Formal Statement............................ 9 2.1.2 Discrete and Combinatorial Optimization............... 10 2.2 Heuristics .................................... 11 2.2.1 Which Problems are Considered “Hard”? ............... 13 2.2.2 The Subset Sum Problem (Unidimensional Knapsack) . 14 2.2.3 Computational Class NP ........................ 15 2.2.4 The Traveling Salesman Problem.................... 16 2.3 Metaheuristics.................................. 19 2.3.1 Definition ................................ 19 2.3.2 Randomized Algorithms and Probabilistic Methods ......... 20 2.3.3 Metaphors and Metaheuristics..................... 21 2.3.4 Taxonomy of Metaheuristics...................... 22 2.3.5 Trajectory-based Metaheuristics.................... 22 Local Search............................... 22 Simulated Annealing........................... 23 Tabu Search ............................... 23 Scatter Search.............................. 23 2.3.6 Population-based Metaheuristics.................... 23 Particle Swarm Optimization...................... 24 2.3.7 Constructive Metaheuristics ...................... 24 Greedy Randomized Adaptive Search ................. 24 Ant Colony Optimization........................ 24 2.4 The No Free Lunch theorem.......................... 25 2.4.1 Implications for Metaheuristics..................... 25 2.4.2 Summary................................. 29 1 Contents 3 Evolutionary Computation 30 3.1 Introduction................................... 30 3.1.1 Evolution through Natural Selection.................. 30 3.1.2 A Short History of Population Genetics................ 31 3.1.3 The Modern Synthesis.......................... 33 3.1.4 Developmental and Molecular Genetics................ 34 3.2 Evolutionary Search Methods ......................... 35 3.2.1 Evolution Strategies........................... 36 3.2.2 Evolutionary Programming....................... 38 3.2.3 Differential Evolution.......................... 39 3.2.4 Genetic Algorithms........................... 41 Recombination in Genetic Algorithms................. 41 Selection in Genetic Algorithms .................... 43 Holland’s Schema Theorem....................... 43 Criticism of the Schema Theorem................... 45 4 Genetic Programming 47 4.1 Solution Encoding................................ 47 4.1.1 Syntax Tree Encoding.......................... 48 Closure and Sufficiency. Properties of Good Representations . 49 4.1.2 Other Encodings............................. 50 Grammar-based GP..........................
Recommended publications
  • Population Genetics I
    21 March, 2016: Population genetics I 529053 Evolutionary Genomics Ari Löytynoja / [email protected] Background reading This book is OK primer to pop.gen. in genomics era - focus on coalescent theory and SNP data - unfortunately it contains lots of typos 529053 Evolutionary Genomics Ari Löytynoja / [email protected] Population genetics Definition - studies distributions & changes of allele frequencies in populations over time - effects considered: - natural selection, genetic drift, mutation and gene flow - recombination, population subdivision and population structure - allows inferring past events as well as predicting future History - fundamental work by Haldane, Wright and Fisher on first half of 20th century - recent development: coalescent theory by Kingman in 1980’s - suitable for SNPs data - computationally highly efficient 529053 Evolutionary Genomics Ari Löytynoja / [email protected] Population genetics basics (1) Allele - one of alternative forms of a gene or same genetic locus - used to be visible gene product (e.g. blond vs. red hair) - now typically SNP (e.g. rs1805007(C) vs. rs1805007(T)) 529053 Evolutionary Genomics Ari Löytynoja / [email protected] Population genetics basics (1) Allele - one of alternative forms of a gene or same genetic locus - used to be visible gene product (e.g. blond vs. red hair) - now typically SNP (e.g. rs1805007(C) vs. rs1805007(T)) 529053 Evolutionary Genomics Ari Löytynoja / [email protected] Population genetics basics (1) Allele - one of alternative forms of a gene or same genetic locus - used to be visible gene product (e.g. blond vs. red hair) - now typically SNP (e.g. rs1805007(C) vs. rs1805007(T)) 529053 Evolutionary Genomics Ari Löytynoja / [email protected] Population genetics basics (1) Allele - one of alternative forms of a gene or same genetic locus - used to be visible gene product (e.g.
    [Show full text]
  • Workshop in Molecular Evolution
    Workshop in Molecular Evolution Jan 7 - 11, 2019 Shanghai Two faces of one process: phylogenetics vs. population genetics Phylogenetics – model of speciation Tine et al., 2014 Population genetics – model of coalescence Population genetics Idealised population (Fisher-Wright population) • Random mating – each copy of the gene found in the new generation is drawn independently at random from all copies of the gene in the old generation • No selection • No migration • No mutation • Large population size, no drifting British biologist and statistician Ronald Fisher • In a series of papers starting in 1918 and culminating in his 1930 book The Genetical Theory of Natural Selection • Fisher showed that the continuous variation measured by the biometricians could be produced by the combined action of many discrete genes, and that natural selection could change allele frequencies in a population, resulting in evolution. British geneticist J.B.S. Haldane • worked out the mathematics of allele frequency change at a single gene locus under a broad range of conditions. • Haldane also applied statistical analysis to real-world examples of natural selection, such as peppered moth evolution and industrial melanism The American biologist Sewall Wright • animal breeding experiments, focused on combinations of interacting genes, and the effects of inbreeding on small, relatively isolated populations that exhibited genetic drift. • In 1932 Wright introduced the concept of an adaptive landscape and argued that genetic drift and inbreeding could drive a small,
    [Show full text]
  • Evolutionary Genetics LV 25600-01 | Lecture with Exercises | 4KP
    Evolutionary Genetics LV 25600-01 | Lecture with exercises | 4KP 1 HS2019 Population Genetics Extinction vortex is caused by a positive feedback loop (Gilpin and Soule, 1986). Gilpin ME, Soulé ME (1986). "Minimum Viable Populations: Processes of Species Extinction". In M. E. Soulé. Conservation Biology: The Science of Scarcity and Diversity. Sinauer, Sunderland, Mass. pp. 19–34. !2 HS19 | UniBas | JCW Population Genetics ▷ Effective Population Size Ne “The number of breeding individuals in an idealised population that would show the same amount of dispersion of allele frequencies under random genetic drift or the same amount of inbreeding as the population under consideration.” Ronald Fisher and Sewall Wright 3 HS19 | UniBas | JCW Population Genetics ▷ Effective Population Size unequal sex-ratios chromosome (organelle) linkage number of progeny (k≠2) Ne variance in offsprings (Vk≠2) fluctuation in population size 4 HS19 | UniBas | JCW Population Genetics ▷ Non-Random Mating The inbreeding coefficient of an individual (F) is the probability that an individual has two alleles at a locus that are identical by descent. It measures the amount of inbreeding by comparing the observed frequency of heterozygotes (Ho) in the population to the frequency expected under random mating - Hardy-Weinberg (He). H F = 1− o H e In a panmixic population the observed (Ho) and the expected frequency of heterozygosity is not significantly different. H o ≈ H e → F = 0 !5 HS19 | UniBas | JCW Population Genetics ▷ Mutation In the irreversible mutation model the allele frequency decreases over time depending on the mutation rate. p : frequency of allele A after t generations t t p0 :starting frequency of allele A pt = p0 (1− µ) µ : muation rate The mutation rate in an ideal the population equals zero and the allele frequency does not change from one generation to the next.
    [Show full text]
  • Prediction and Estimation of Effective Population Size
    Heredity (2016) 117, 193–206 & 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved 0018-067X/16 www.nature.com/hdy REVIEW Prediction and estimation of effective population size J Wang1, E Santiago2 and A Caballero3 Effective population size (Ne) is a key parameter in population genetics. It has important applications in evolutionary biology, conservation genetics and plant and animal breeding, because it measures the rates of genetic drift and inbreeding and affects the efficacy of systematic evolutionary forces, such as mutation, selection and migration. We review the developments in predictive equations and estimation methodologies of effective size. In the prediction part, we focus on the equations for populations with different modes of reproduction, for populations under selection for unlinked or linked loci and for the specific applications to conservation genetics. In the estimation part, we focus on methods developed for estimating the current or recent effective size from molecular marker or sequence data. We discuss some underdeveloped areas in predicting and estimating Ne for future research. Heredity (2016) 117, 193–206; doi:10.1038/hdy.2016.43; published online 29 June 2016 INTRODUCTION which correspond to the so-called inbreeding and variance effective The concept of effective population size, introduced by Sewall Wright sizes, respectively (Crow and Kimura, 1970). (1931, 1933), is central to plant and animal breeding (Falconer and Predictions of the effective population size can also be obtained Mackay, 1996), conservation genetics (Frankham et al.,2010; from the largest nonunit eigenvalue of the transition matrix of a Allendorf et al., 2013) and molecular variation and evolution Markov Chain which describes the dynamics of allele frequencies.
    [Show full text]
  • RS7069 18 Pp033 80 Waples.Pdf (921.1Kb)
    J. CETACEAN RES. MANAGE. 18: 33–80, 2018 33 Guidelines for genetic data analysis ROBIN S. WAPLES1, A. RUS HOELZEL2, OSCAR GAGGIOTTI3, RALPH TIEDEMANN4, PER J. PALSBØLL5, FRANK CIPRIANO6, JENNIFER JACKSON7, JOHN W. BICKHAM8 AND AIMÉE R. LANG9 Contact e-mail: [email protected] ABSTRACT The IWC Scientific Committee recently adopted guidelines for quality control of DNA data. Once data have been collected, the next step is to analyse the data and make inferences that are useful for addressing practical problems in conservation and management of cetaceans. This is a complex exercise, as numerous analyses are possible and users have a wide range of choices of software programs for implementing the analyses. This paper reviews the underlying issues, illustrates application of different types of genetic data analysis to two complex management problems (involving common minke whales and humpback whales), and concludes with a number of recommendations for best practices in the analysis of population genetic data. An extensive Appendix provides a detailed review and critique of most types of analyses that are used with population genetic data for cetaceans. KEYWORDS: ABUNDANCE ESTIMATE; BREEDING GROUNDS; CONSERVATION; DNA FINGERPRINTING; FEEDING GROUNDS; GENETICS; HUMPBACK WHALE; MIGRATION; MINKE WHALE; REPRODUCTION; TAXONOMY INTRODUCTION before the analyses considered here begin, the DNA data Recently, guidelines were adopted for quality control of quality-control guidelines have been consulted and followed DNA data intended for use within the International Whaling to the extent possible, and that any substantial deviations Commission (IWC, 2009; 2015a). Once the data have have been documented and explained. been collected, the next step is to analyse the data and make As discussed in detail later, genetic information can inferences that are useful for addressing practical problems provide insights relevant to many types of problems in the management of cetaceans.
    [Show full text]