A Novel Developmental Genetic Programming Methodology For

ANOVEL DEVELOPMENTAL GENETIC PROGRAMMING METHODOLOGY FOR MATHEMATICAL MODELING AND NEUROEVOLUTION by Stephen Johns B.Sc. Ryerson University, 2007 A thesis presented to Ryerson University in partial fulfillment of the requirements for the degree of Master of Science in the Program of Computer Science Toronto, Canada, 2010 c Stephen Johns 2010 ii Author’s Declaration I hereby declare that I am the sole author of this thesis. I authorize Ryerson University to lend this thesis to other institutions or individuals for the purpose of scholarly research. Signed: I further authorize Ryerson University to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research. Signed: iii iv ANOVEL DEVELOPMENTAL GENETIC PROGRAMMING METHODOLOGY FOR MATHEMATICAL MODELING AND NEUROEVOLUTION Stephen Johns M. Sc. in Computer Science, 2010 Ryerson University, Toronto, Canada Abstract In this work, a novel developmental genetic programming methodology called NEXT (Next Encoding of eXpression Trees) is introduced. NEXT was designed to include the following key properties: a variable-length solution representation with automatic sizing of individuals, an efficient interpretation of solution repre- sentations, a diverse repertoire of search operators, and the ability to be customized to work on multiple problem domains, including mathematical modeling via symbolic regression, and neuroevolution (the evolution of artificial neural networks). The approach was tested using a selection of problems involving symbolic regression of poly- nomials of different degrees, and neuroevolution for logic synthesis and pairwise classification. Our experimental results, compared against those of Gene Expression Programming on the same problem set, demonstrate that NEXT was capable of successfully evolving variable- length solutions to these problems. v vi Acknowledgements Dr. Marcus Vinicius dos Santos for his assistance, guidance, patience, endless reviewing, tolerance of my barrages of e-mails, and an innate ability to come up with acronyms. My parents Christopher and Jill, without the support and encouragement of whom this work would never have been possible. My brother Jason for his support, the motivation to return to school, and the occasional well timed kick in the pants. Dr. Rixi Abrahamsohn for helping me find my way. Dr. Katrin Rohlf for spending countless office hours explaining things I ought to have learned years before, and for sparking my interest in research. Lori Christie for being ”just Lori”. Travis Christie Dunk for adding some colour to my work, and for providing a reminder of the truly important things in life (like apple juice and fast cars). Dave Gerencer for snow trudging, brewing, Texas, and listening to me talk about chromosomes and fitness func- tions for two years. Maria Landau for her endless assistance in all administrative matters, and for the cheerful encouragement. Michelle Walker for understanding. Thanks to Cory Baker, Nigel Browne, Tim Dafoe, Joel Micallef, Rajan Parthasarathy, Jesse Robertson, Michael Robertson, Ryan Shaw, Matt Watson and numerous others for their support, encouragement, LATEXwizardry, CPU time, and distraction. Additional thanks to Elliott Brood, Sam Calagione, and various purveyors of C8H10N4O2-based beverages. vii viii Dedication This thesis is dedicated to memory of my grandfather Roy Wilson, who taught me the value of learning; whatever form it may take. ix x Table of Contents 1 Introduction1 1.1 Approach.....................................2 1.2 Contributions...................................3 1.3 Overview of Thesis................................4 2 Background and Related Work5 2.1 Introduction to Evolutionary Computation....................5 2.1.1 Basis of Evolutionary Computation...................5 2.1.2 Representation, Encoding, and the Genotype-Phenotype Duality....6 2.1.3 Genetic Operators............................ 10 2.2 Evolutionary Algorithms............................. 11 2.2.1 Genetic Algorithm............................ 12 2.2.2 Genetic Programming.......................... 13 2.2.3 Gene Expression Programming..................... 15 2.3 Neuroevolution.................................. 20 2.3.1 Neuroevolution in GEP.......................... 20 2.3.2 NEAT................................... 24 3 Methodology and Implementation 25 3.1 NEXT....................................... 26 3.1.1 The Entities of NEXT.......................... 26 3.1.2 NEXT and Expression Trees....................... 30 3.1.3 Evaluation of NEXT Chromosomes................... 31 3.1.4 Genetic Operators............................ 34 3.1.5 Neuroevolution Operators........................ 39 3.1.6 Creation of the Initial Population..................... 40 3.1.7 Selection Method............................. 40 3.1.8 The Evolutionary Algorithm of NEXT.................. 41 3.2 Experimental Design............................... 42 3.2.1 Runs, Sets of Runs, & Validation.................... 42 3.2.2 Symbolic Regression........................... 43 3.2.3 Neuroevolution for Logic Synthesis................... 46 3.2.4 Neuroevolution for Pairwise Classification............... 47 4 Results and Discussions 51 4.1 Symbolic Regression Results........................... 51 4.1.1 Experiment 1: Simple Symbolic Regression............... 51 4.1.2 Experiment 2: Multi-variable Function................. 54 4.1.3 Experiment 3: Higher Order Polynomial................. 56 4.1.4 Experiment 4: Sequence Induction.................... 59 4.2 Neuroevolution for Logic Synthesis....................... 61 xi 4.2.1 Experiment 5: XOR........................... 61 4.3 Neuroevolution for Pairwise Classification.................... 62 4.3.1 Experiment 6: Fisher Iris Data Set.................... 62 4.3.2 Experiment 7: Wine Data Set...................... 65 5 Conclusion and Future Work 69 5.1 Future Work.................................... 71 Appendix A: Glossary 73 xii List of Tables 3.1 Symbolic regression experiment parameters................... 43 3.2 Experiment 2 data................................. 44 3.3 Experiment 3 data................................. 45 3.4 Experiment 4 data................................. 46 3.5 Truth table for XOR................................ 46 3.6 Experiment 5 parameters............................. 47 3.7 Experiment 6 & 7 parameters........................... 48 3.8 Attributes of the Fisher Iris Data Set....................... 48 3.9 Attributes of the Wine Data Set.......................... 49 4.1 Summary of symbolic regression results..................... 51 4.2 Experiment 6: Classification results....................... 63 4.3 Experiment 6: Chromosome size over 100 runs................. 65 4.4 Experiment 7: Classification results....................... 65 4.5 Experiment 7: Chromosome size over 100 runs................. 67 xiii xiv List of Figures 2.1 A simple genotype................................7 2.2 Phenotype for the genotype shown in Figure 2.1.................7 2.3 An example of search space and solution space.................8 2.4 Different genotypes resulting in identical phenotype...............9 2.5 A small mutation results in three distinct phenotypes..............9 2.6 Recombination.................................. 11 2.7 Mutation...................................... 11 2.8 Basic evolutionary algorithm........................... 12 2.9 Sample GP individual............................... 14 2.10 Before GP crossover............................... 14 2.11 After GP crossover................................ 14 2.12 GEP gene for Equation 2.2............................ 16 2.13 Expression tree for Equation 2.2......................... 17 2.14 A 3-genic GEP chromosome........................... 17 2.15 Recombination in GEP.............................. 18 2.16 Matrix representation for the expression tree for Equation 2.2.......... 19 2.17 Conventional neural network........................... 20 2.18 GEP-NN representation of Figure 2.17...................... 20 2.19 Comparison of non-coding regions in GEP vs GEP-NN............. 21 2.20 Sample GEP-NN gene.............................. 22 2.21 Associated weight and threshold arrays..................... 22 2.22 Phenotypic expression of Figure 2.20...................... 22 2.23 A sample destructive point mutation on a GEP-NN chromosome........ 23 2.24 Phenotype of Figure 2.23 prior to mutation.................... 24 2.25 Phenotype of Figure 2.23 after mutation..................... 24 3.1 NEXT entities................................... 27 3.2 Sample neuroevolution cistron.......................... 28 3.3 Sample symbolic regression cistron....................... 28 3.4 Sample neuroevolution cistron with weights & thresholds............ 29 3.5 Sample symbolic regression gene with 3 cistrons................ 29 3.6 Sample multigenic symbolic regression chromosome with 3 genes....... 30 3.7 A simple NEXT gene............................... 31 3.8 Pseudo-ET for 3.7................................. 31 3.9 Conventional ET for Figure 3.7.......................... 31 3.10 Simple four-cistron gene............................. 32 3.11 Simple four-cistron gene............................. 32 3.12 Equations for each cistron............................ 33 3.13 Right-to-left evaluation of gene, step by step................... 33 3.14 Point mutation on a function........................... 35 3.15 Sub-ETs resulting from this mutation....................... 35 xv 3.16 Point mutation on

Load more