Genetic Programming an Introduction
Total Page:16
File Type:pdf, Size:1020Kb
Genetic Programming An Introduction On the Automatic Evolution of Computer Programs and Its Applications Wolfgang Banzhaf Peter Nordin Robert E. Keller Frank D. Francone dpunkt.verlag - Morgan Kaufmann Publishers, Inc. dpunkt San Francisco, California Verlag fuer digitale Technologie GmbH Heidelberg Copublished by dpunkt.verlag and Morgan Kaufmann Publishers, Inc. Morgan Kaufmann Publishers, Inc. dpunkt.verlag Sponsoring Editor Michael B. Morgan Sponsoring Editor Michael Barabas Production Manager Yonie Overtoil Production Manager Josef Hegele Production Editor Elisabeth Beller Copyeditor Andrew Ross Cover Design Ross Carron Design Cover Photo Chris Howes/Masterfile Cover Illustration Cherie Plumlee Proofreader Robert Fiske Printer Courier Corporation This book has been author-typeset using LaTEX. Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances where Morgan Kaufmann Publishers, Inc. and dpunkt GmbH are aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration. Available in Germany, Austria, and Switzerland from dpunkt —Verlag fur digitale Technologic GmbH Ringstrasse 19 D-69115 Heidelberg Germany Telephone +49/6221/1483-12 Facsimile +49/6221/1483-99 Email [email protected] WWW www.dpunkt.de Available in all other countries from Morgan Kaufmann Publishers, Inc. Editorial and Sales Office 340 Pine Street, Sixth Floor San Francisco, CA 94104-3205 USA Telephone 415/392-2665 Facsimile 415/982-2665 Email [email protected] WWW www.mkp.com Order toll free 800/745-7323 © 1998 Morgan Kaufmann Publishers, Inc. and dpunkt—Verlag fur digitale Technologic GmbH All rights reserved Printed in the United States of America 02 01 543 No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, recording, or otherwise—without the prior written permission of the publisher. Library of Congress Cataloging-in-Publication Data Genetic programming—an introduction : on the automatic evolution of computer programs and its applications / Wolfgang Banzhaf... [et al.]. p. cm. Includes bibliographical references and index. ISBN 1 -55860-510-X 1. Genetic programming (Computer science) I. Banzhaf, Wolfgang, date. QA76.623.G46 1998 006.3'1—dc21 97-51603 CIP MKP ISBN: 1-55860-510-X dpunkt ISBN: 3-920993-58-6 To Pia, Teresa, Benedikt and Judith, who had to sacrifice many evenings and weekends for this book to become possible. Wolfgang Banzhaf To my family and friends, who were there when times got tough. To the lovely Miss Christina Espanto, who makes even the tough times good. Frank D. Francone To those who missed me while I was working on this book. Robert E. Keller To my parents Set and Inga. Peter Nordin Foreword by John R. Koza Genetic programming addresses the problem of automatic program¬ ming, namely, the problem of how to enable a computer to do useful things without instructing it, step by step, on how to do it. Banzhaf, Nordin, Keller, and Francone have performed a remark¬ able double service with this excellent book on genetic programming. First, they have written a book with an up-to-date overview of the automatic creation of computer programs by means of evolution. This effort is especially welcome because of the rapid growth of this field over the past few years (as evidenced by factors such as the more than 800 papers published by some 200 authors since 1992). G Second, they have brought together and presented their own in¬ novative and formidable work on the evolution of linear genomes and machine code in particular. Their work is especially im¬ portant because it can greatly accelerate genetic programming. The rapid growth of the field of genetic programming reflects the growing recognition that, after half a century of research in the fields of artificial intelligence, machine learing, adaptive systems, au¬ tomated logic, expert systems, and neural networks, we may finally have a way to achieve automatic programming. When we use the term automatic programming, we mean a system that 1. produces an entity that runs on a computer (i.e., either a com¬ puter program or something that is easily convertible into a program), 2. solves a broad variety of problems, 3. requires a minimum of user-supplied problem-specific informa¬ tion, 4. in particular, doesn't require the user to prespecify the size and shape of the ultimate solution, 5. implements, in some way, all the familiar and useful program¬ ming constructs (such as memory, iteration, parameterizable subroutines, hierarchically callable subroutines, data structures, and recursion), 6. doesn't require the user to decompose the problem in advance, to identify subgoals, to handcraft operators, or to tailor the system anew for each problem, 7. scales to ever-larger problems, 8. is capable of producing results that are competitive with those produced by human programmers, mathematicians, and spe¬ cialist designers or of producing results that are publishable in their own right or commercially usable, and 9. is well-defined, is replicable, has no hidden steps, and requires no human intervention during the run. w Genetic programming is fundamentally different from other ap¬ proaches to artificial intelligence, machine learning, adaptive systems, automated logic, expert systems, and neural networks in terms of (i) its representation (namely, programs), (ii) the role of knowledge (none), (iii) the role of logic (none), and (iv) its mechanism (gleaned from nature) for getting to a solution within the space of possible solutions. Among these four differences, representation is perhaps the most important distinguishing feature of genetic programming. Computers are programmed with computer programs - and genetic programming creates computer programs. Computer programs offer the flexibility to perform computations on variables of many different types, perform iterations and recur¬ sions, store intermediate results in data structures of various types (indexed memory, matrices, stacks, lists, rings, queues), perform al¬ ternative calculations based on the outcome of complex calculations, perform operations in a hierarchical way, and, most important, em¬ ploy parameterizable, reusable, hierarchically callable subprograms (subroutines) in order to achieve scalability. In attacking the problem of automatic programming, genetic pro¬ gramming does not temporize or compromise with surrogate struc¬ tures such as Horn clauses, prepositional logic, production rules, frames, decision trees, formal grammars, concept sets, conceptual clusters, polynomial coefficients, weight vectors, or binary strings. Significantly, human programmers do not commonly regard any of the above surrogates as being suitable for programming computers. Indeed, we do not see computers being ordinarily programmed in the language of any of them. My view is that if we are really interested in getting computers to solve problems without explicitly programming them, the structures that we need are computer programs. This book will be coming out almost exactly ten years since my first run of genetic programming in October 1987 (solving a pair of linear equations and inducing the Fibonacci sequence). Certainly I could not have anticipated that this field would have grown the way it has when I thought of the idea of genetic programming while flying over Greenland on my return from the 1987 meeting of the International Joint Conference on Artificial Intelligence in Italy. We know from Yogi Berra that predictions are risky, particularly when they involve the future. But, it is a good guess that genetic programming will, in the future, be successfully expanded to greater levels of generality and practicality. In trying to identify future areas for practical application, the presence of some or all of the following characteristics should provide a good indication: 1. areas where the interrelationships among the relevant variables are poorly understood (or where it is suspected that the current understanding may well be wrong), 2. areas where finding the size and shape of the ultimate solution to the problem is a major part of the problem, 3. areas where conventional mathematical analysis does not, or cannot, provide analytic solutions, 4. areas where an approximate solution is acceptable (or is the only result that is ever likely to be obtained), 5. areas where small improvements in performance are routinely measured (or easily measurable) and highly prized, 6. areas where there is a large amount of data, in computer read¬ able form, that requires examination, classification, and inte¬ gration (such as molecular biology for protein and DNA se¬ quences, astronomical data, satellite observation data, financial data, marketing transaction data, or data on the World Wide Web). 1 Genetic Programming as Machine Learning Contents 1.1 Motivation 4 1.2 A Brief History of Machine Learning 7 1.3 Machine Learning as a Process 9 1.4 Major Issues in Machine Learning 12 1.5 Representing the Problem 12 1.5.1 What Is the Problem Representation? 12 1.5.2 Boolean Representations 14 1.5.3 Threshold Representations 17 1.5.4 Case-Based Representations 19 1.5.5 Tree Representations 20 1.5.6 Genetic Representations - 21 1.6 Transforming Solutions with Search Operators 23 1.6.1 Generality/Specificity Operators 23 1.6.2 Gradient Descent Operators 24 1.6.3 Genetic Programming Operators 24 1.7 The Strategy of the Search 25 1.7.1 Blind Search 25 1.7.2 Hill Climbing 26 1.7.3 Beam Search 27 1.8 Learning 28 1.9 Conclusion 29 1 Genetic Programming as Machine Learning Evolution is Nature's ijiistake. Intelligence is its insistence on mak¬ ing the same mistake. S.LEM, GOLEM XIV, 1981 1.1 Motivation Automatic programming will be one of the most important areas of computer science research over the next twenty years. Hardware speed and capability has leapt forward exponentially.( Yet software consistently lags years behind the capabilities of the hardware. The gap appears to be ever increasing.