
From: Proceedings of the Third International Conference on Multistrategy Learning. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Coevolution Learning: Synergistic Evolution of Learning Agents and Problem Representations

Lawrence Hunter

National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, hunter@nlm.nih.gov

Abstract

This paper describes exploratory work inspired by a recent mathematical model of genetic and cultural coevolution. In this work, a simulator implements two independent evolutionary competitions which act simultaneously on a diverse population of learning agents: one competition searches the space of free parameters of the learning agents, and the other searches the space of input representations used to characterize the training data. The simulations interact with each other indirectly, both affecting the results (and hence reproductive success) of agents in the population. This framework simultaneously addresses several open problems in machine learning: selection of representation, integration of multiple heterogeneous learning methods into a single system, and the automated selection of learning bias appropriate for a particular problem.

Introduction

One clear lesson of machine learning research is that problem representation is crucial to the success of all inference methods (see, e.g., (Dietterich, 1989; Rendell & Cho, 1990; Rendell & Ragavan, 1993)). However, it is generally the case that the choice of problem representation is a task done by a human experimenter, rather than by an automated system. Also significant in the generalization performance of machine learning systems is the selection of the inference method's free parameter values (e.g. the number of hidden nodes in an artificial neural network, or the pruning severity in a decision tree induction system), which is also a task generally accomplished by human "learning engineers" rather than by the automated systems themselves.

The effectiveness of input representations and free parameter values are mutually dependent. For example, the appropriate number of hidden nodes for an artificial neural network depends crucially on the number and semantics of the input nodes. This paper describes exploratory work on a method for simultaneously searching the spaces of representations and parameter settings, using as inspiration a recent mathematical model of the coevolution of genetics and culture from anthropology (Durham, 1991). An important additional feature of this work is that it provides a simple, seamless and effective way of synergistically combining multiple inference methods in an integrated and extensible framework. When only a single inference method is used, this framework reduces to a variant on constructive induction. However, with multiple and diverse learning agents, the system is able to generate and exploit synergies between the methods and achieve results that can be superior to any of the individual methods acting alone.

The work presented here fits into a growing body of research on these issues. The importance of bringing learning engineering tasks within the purview of the theory and practice of automated systems themselves was described at length in (Schank, et al., 1986). Some effective computational methods for aspects of this task have been reported recently. (Kohavi & John, 1995) describes an automated method for searching through the space of possible parameter values for C4.5. Genetic algorithms have also been used to search the space of parameter values for artificial neural networks (Yao, 1993). There has also been recent work on selecting appropriate ("relevant") subsets of features from a superset of possible features for input representation in machine learning ((Langley, 1994) is a review of 14 such approaches), as well as a long history of work on constructive induction ((Wnek & Michalski, 1994) includes a review, but see also (Wisniewski & Medin, 1994) for critical analysis of this work). This paper describes an exploratory approach that appears to have promise in addressing these issues in an integrated way.

Background

The idea of coevolution learning is to use a kind of genetic algorithm to search the space of values for the free parameters for a set of learning agents, and to use a system

metaphorically based on recent anthropological theories of culture to search through the space of possible input representations for the learning agents. This section provides some brief background on these ideas.

Evolutionary algorithms are a weak search method related to, although clearly distinguishable from, other weak methods (e.g. beam search). They are based on a metaphor with naturally occurring genetic evolution. In general, evolution has three components: inheritance, or the passage of traits from one individual to another; variation, or the generation of novel traits or combinations of traits; and selection, a competition between individuals based on their traits that affects the probability that an individual will have descendants that inherit from it. There are many ways of building computational models that evolve, including genetic algorithms, genetic programming and evolutionary strategies; see (Angeline, 1993) for an excellent survey and analysis of the approach.

Anthropological theories of culture have been converging for some time on what are now known as "ideational" theories. They hold that culture "consists of shared ideational phenomena (values, beliefs, ideas, and the like) in the minds of human beings. [They refer] to a body or 'pool' of information that is both public (socially shared) and prescriptive (in the sense of actually or potentially guiding behavior)." (Durham, 1991, p. 3). Note that this view is different from one that says culture is some particular set of concrete behavior patterns; it instead suggests that culture is just one of the factors that shapes behavior. An individual's phenotype (here, its behavior) is influenced by its genotype, its individual psychology and its culture. Durham summarizes: "the new consensus in anthropology regards culture as systems of symbolically encoded conceptual phenomena that are socially and historically transmitted within and between populations" (pp. 8-9).

This historically rooted, social transmission of ideational elements can be analyzed as an evolutionary process. The characterization of that evolutionary process (and its relationship to genetic evolution) is the subject of Durham's book, and also the subject of a great deal of other research dating back at least one hundred years (much of which is surveyed by Durham).

There are several significant differences between cultural and genetic evolution. Cultural traits are transmitted differently than genetic ones in various ways. It is possible to transfer cultural traits to other members of your current generation, or even to your cultural "parents." It is also possible for one individual to pass cultural traits to very many more others than he or she could genetically. The selection pressure in the competition among cultural entities is not based on their reproductive fitness as with genetic evolution, but on the decisions of individuals to adopt them, either through preference or imposition. And most importantly for the purposes of this work, cultural evolution involves different sources of variation than genetic evolution. Rather than relying on mutation or sexual recombination of genomes to provide novel genetic variants, culture relies on individuals' own discovery and synthesis as the source of novelty. By providing a computational model in which individuals are able to learn from their experiences and share what they have learned, it becomes possible to simulate a cultural kind of evolution.

A key aspect of any model of cultural evolution is the specification of the smallest unit of information that is transmitted from one agent to another during cultural transmission, christened the "meme." Although there is a great deal of argument over what memes are (ideas, symbols, thoughts, rules, patterns, values, principles, postulates, concepts, essences and premises have all been suggested), and a great deal of theoretical analysis describing how memes compete, are transformed, interact with genes, etc., I am aware of no attempts to operationalize the term so that it would be possible to build computational simulations of populations of memes.

A meme must play several roles in a simulation of cultural inheritance. First, it must be able to have an effect on the behavior of the individuals in the simulation. Second, individuals must be able to create new memes as a result of their experiences (e.g. by innovation, discovery or synthesis). Third, it must be possible for other individuals in the simulation to evaluate memes to determine whether or not they will adopt a particular meme for their own use. In the sections below, I will show how the input representation used by a learning system can be used to meet these requirements.

A Formal Definition of Coevolution Learning

A coevolution learning system functions by evolving a population of learning agents. The population is defined by a classification task T, a set of classified examples of that task expressed in a primitive representation Ep, a fitness function for agents fA, and a set of learning agents A:

    Pcoev = {T, Ep, fA, A}

A fitness function for agents maps a member of the set A to a real number between 0 and 1. For convenience, it is useful to define PL to be the subset of Pcoev where all the agents use learning method L (see below).

Each learning agent Ai is defined by a learning method L, a vector of parameter values v, a fitness function for memes fm, and an ordered set of problem representation transformations R:

    Ai = {L, v, fm, R}

The vector of parameter values may have different lengths for different learning methods. Each member of the set of problem representation transformations Ri is a mapping from an example (ep in Ep) to a value which is a legal element of an input representation for L (e.g. a real number, nominal value, bit, Horn clause, etc.). The individual mappings are called memes. The ordered set of memes (Ri) is called the agent's memome, and the vector of parameter values is called its genome. The fitness function for memes fm is a function that maps a member of the set Ri to a real number between 0 and 1.
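The formal definitions above translate directly into simple data structures. The following is a minimal sketch in Python; the class and field names are my own illustration, not part of the paper:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Sequence

# A meme is one problem-representation transformation: it maps a
# primitive example to a single legal input value for a learner.
Meme = Callable[[Sequence[Any]], Any]

@dataclass
class LearningAgent:
    # Ai = {L, v, fm, R}
    method: str                            # learning method L (e.g. "C4.5", "CG")
    genome: list                           # vector of free-parameter values v
    meme_fitness: Callable[[Meme], float]  # fm: meme -> [0, 1]
    memome: list = field(default_factory=list)  # ordered memes R

    def transform(self, example):
        # A transformed example applies each meme to the primitive example.
        return [meme(example) for meme in self.memome]

@dataclass
class CoevPopulation:
    # Pcoev = {T, Ep, fA, A}
    task: str             # classification task T
    examples: list        # classified examples Ep in the primitive representation
    agent_fitness: Callable  # fA: agent phenotype -> [0, 1]
    agents: list          # the set of learning agents A
```

The identity transformation mentioned below is then just a memome of index-pickers, one per primitive feature, e.g. `agent.memome = [lambda ex, i=i: ex[i] for i in range(n)]`.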

A transformed example ei is the result of sequentially applying each of the problem representation transformations ri in Ri to an example ep in Ep. The application of the set of problem representation transformations to the set of examples in primitive representation results in a set of transformed examples, which is called Ei. When Ei = Ep, the transformation is said to be the identity.

Given a population as defined above, the process of coevolution is defined in the pseudocode in figure 1. The creation of the next generation of agents is a minor variant on traditional genetic algorithms. Instead of having the genome be "bits," it is a vector of parameter values. The crossover, mutation and fitness proportional reproduction functions are all identical with the related operations on bit vectors in genetic algorithms. It would, of course, be possible to transform parameter vectors into bitstring representations themselves, if that were desirable. In addition to using parameter vectors instead of bits, the other difference is that each subpopulation using a particular learning method (the PL's) has its own type of parameter vector, since the free parameters and their legal values vary among learning methods. These parameter vectors may be of different sizes, so crossover can only be applied to members of the same subpopulation.

The main difference between coevolution learning and other evolutionary methods derives from the creation and exchange of memes. The meme creation process takes the output of learning agents that have been applied to a particular problem, and identifies combinations of the input features that the learner determined were relevant in making the desired distinction. The process of parsing output and creating new memes is specific to each learning method. For example, a program that learns rules from examples might create new memes from the left hand sides of each of the induced rules. Or, a program that learned weights in a neural network might create new memes that were the weighted sums of the inputs to each of its hidden nodes (perhaps thresholded to remove marginal contributions). Specific meme generation methods are discussed in more detail in the implementation section, below.
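The two meme-creation styles just described (rule left-hand sides, and thresholded hidden-node weight sums) can be sketched as follows. The rule and weight formats here are invented for illustration; the paper's actual parsers are specific to C4.5rules and UTS output:

```python
def memes_from_rules(rules):
    """One meme per induced rule: its left-hand side, a feature combination.

    `rules` is assumed to be a list of (lhs, rhs) pairs, e.g.
    ("A and not B", "class1")."""
    return [lhs for lhs, _rhs in rules]

def memes_from_hidden_nodes(weights, thresholds):
    """One meme per hidden node: the weighted sum of its inputs, dropping
    any input whose contribution is marginal (smaller in magnitude than
    the node's threshold divided by the number of inputs)."""
    memes = []
    for node_weights, theta in zip(weights, thresholds):
        cutoff = theta / len(node_weights)
        kept = {i: w for i, w in enumerate(node_weights) if abs(w) >= cutoff}
        memes.append(kept)  # meme = sparse linear feature over the inputs
    return memes
```

Each returned meme would then be added to the shared meme pool for other agents to evaluate and adopt.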

Initialize the population with random legal parameter vectors and the identity representation transformation.
* Determine the phenotype of each agent i by applying learning algorithm Li to transformed examples Ei using parameter values vi, with K-way cross-validation. The phenotype is an ordered set of: the output of the learner's cross-validation runs, the cross-validation accuracies, and the learning time.
Determine the fitness of each agent by applying fA to the phenotype of each agent.
Create new memes by parsing the output in each learner's phenotype and extracting important feature combinations. The union of all memes generated by all members of the population is called the meme pool.
Exchange memes. For each agent i, apply its meme fitness function fm to elements of the meme pool. Select the agent's target number of memes (a free parameter) from the meme pool with probability proportional to the fitness of the memes.
Repeat from * meme-transfers-per-generation times.
Determine the phenotype of the agents with their new memes.
Determine the fitness of the agents.
Create the next generation of agents by mutation, crossover and fitness proportional reproduction. Agents whose genomes are mutated keep their memomes intact. For each pair of agents whose genomes are crossed over to create a new pair of agents, the memomes of the parents are arbitrarily assigned to the offspring.
Repeat from * until a member of the population can solve the problem.

Figure 1: Pseudocode of coevolution algorithm
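The generation loop of Figure 1 can be rendered as ordinary code. The sketch below is my own paraphrase; the method-specific steps (running a learner, parsing memes from its output, and the genetic operators) are passed in as stand-in functions rather than implemented:

```python
import random

def coevolve(agents, agent_fitness, run_learner, extract_memes, next_generation,
             meme_transfers_per_generation=1, max_generations=10):
    """Sketch of the Figure 1 loop; the learner-specific steps are injected."""
    for _ in range(max_generations):
        # Phenotype: learner output, cross-validation accuracy, learning time.
        phenotypes = [run_learner(a) for a in agents]
        fitnesses = [agent_fitness(p) for p in phenotypes]

        # Meme pool: the union of feature combinations parsed from all outputs.
        meme_pool = [m for p in phenotypes for m in extract_memes(p)]

        # Meme exchange: each agent draws its target number of memes with
        # probability proportional to its own meme-fitness function fm.
        if meme_pool:
            for _ in range(meme_transfers_per_generation):
                for agent in agents:
                    weights = [agent.meme_fitness(m) for m in meme_pool]
                    agent.memome = random.choices(
                        meme_pool, weights=weights,
                        k=min(agent.target_memes, len(meme_pool)))

        # Stop as soon as some member of the population solves the problem.
        if max(fitnesses) >= 1.0:
            break

        # Mutation, crossover and fitness-proportional reproduction over the
        # parameter vectors, applied within each learning-method subpopulation.
        agents = next_generation(agents, fitnesses)
    return agents
```

Note that `random.choices` implements the fitness-proportional selection of memes directly; crossover restricted to a single subpopulation would live inside `next_generation`.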


An Implementation of a Coevolution Learner

COEV (short for "the beginning of coevolution") is a simple implementation of the coevolution learning framework, written in Carnegie Mellon Common Lisp 17f and the PCL Common Lisp Object System, running on a Silicon Graphics Indigo2 workstation. The current implementation includes the C4.5 decision tree induction system and C4.5rules rule extraction tool (Quinlan, 1991), the LFC++ constructive induction program (Rendell & Ragavan, 1993; Vilalta, 1993), and the conjugate gradient descent trained feedforward neural network (CG) from the UTS neural network simulation package (van Camp, 1994). Each learning method is associated with an object class which defines how to execute it, how to parse the results returned, and what the vector of free parameters is for that type of learner.

Most of the documented parameters for each type of learning program are included in the free parameter vector for that system, along with a specification of either an upper and lower bound for the parameter's values or a list of possible parameter values. For example, the parameter vector for the UTS conjugate gradient descent learner includes the number of hidden nodes in the network, the maximum number of iterations before halting, the output tolerance (specifying how close to 1 or 0 an output has to be to count as true or false), a flag for whether or not to use competitive learning on the output, two nominal parameters specifying the kind of line search and direction finding method to use, and three parameters specifying the maximum number of function evaluations per iteration, the minimum function reduction necessary to continue the search, and the maximum slope ratio to continue the search. These parameters are explained more fully in the documentation available with the source code.

Each learner must also have a method for extracting new features from its output. LFC++, like other constructive induction programs, specifically defines new features as part of its output. For C4.5, the output of the C4.5rules tool was parsed so that each left hand side of each rule was reified into a new feature definition. For the neural network learner, a new feature was created for each hidden node, defined by the weighted sum of the inputs to that node, with any input whose contribution to that sum was less than the threshold of the node divided by the number of inputs removed from the sum. These extracted features are added to the meme pool.

Memes (feature combinations) must be assigned fitnesses by agents. It is possible for each agent to have its own method of determining meme fitness. However, a simplified method is used in COEV. Whenever a program generates a new meme, its fitness is defined to be the average "importance" of the feature combination, weighted by the accuracy of the learners that used the meme. In Durham's terms, the inventor of the new meme "imposes" its belief in its value on others. In addition, receivers of memes have a parameter which determines how much weight they put on memes they generate themselves versus memes generated by others. This mechanism could be straightforwardly generalized to weight the fitnesses of memes generated by different classes of agents differently.

Importance is defined differently for different learning methods. For C4.5 and LFC++, the importance of a feature combination is the ratio of correctly classified examples that triggered the rule containing the feature to the total number of occurrences of the feature. So, for example, if "A and not B" is a feature extracted from a single C4.5 learner that had an 80% cross-validation accuracy, and 9 of the 10 examples the feature appeared in were classified correctly by the rule containing the feature, its importance would be 9/10 and its fitness would be 0.9 * 0.8 = 0.72. For UTS, the importance of a feature is the absolute value of the ratio of the weight from the hidden node that defined the feature to the threshold of the output unit. If there is more than one output unit, it is the maximum ratio for any of the output units. Features that play a significant role in learners that are accurate have higher fitness than features that do not play as much of a role or appear in learners that are less accurate. It is possible for a feature to have relatively low prevalence in the population and yet still be important if it tends to generate correct answers when it is used.

In addition to defining the fitness of features, COEV must define the fitness of learners themselves for the evolution of parameter values. The fitness function for learners was selected to evolve learners that are accurate, robust and fast:

    f(Ai) = C(Ai) - S(Ai)/k - (tAi - mean(tA)) / S(tA)

where C(Ai) is the cross-validation accuracy of the agent Ai, S(Ai) is the standard deviation of that accuracy, tAi is the time it took for that agent to learn, mean(tA) is the mean execution time for that type of agent, S(tA) is the standard deviation of that mean, and k is a constant that trades off the value of accuracy versus that of learning time. Execution time is measured as the number of standard deviations from the mean learning time for that class of learners, so that classes of learners with different training times can coexist in the same population. In the COEV system, k was selected to be 3, and accuracies and standard deviations are measured in percentage points. So, a learner that had a mean accuracy of 80%, a standard deviation of 9%, and took 1 standard deviation less than the mean training time for that class of learner would get a fitness score of 80 - 9/3 - (-1) = 78.
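The two fitness computations described in this section are simple arithmetic. A sketch follows (function names are mine); the exact form of the learner-fitness equation is partly garbled in the source, so the version here is reconstructed to match the worked examples in the text (0.9 * 0.8 = 0.72 for a meme; 80 - 9/3 - (-1) = 78 for a learner with k = 3):

```python
def meme_fitness(correct_with_feature, total_with_feature, learner_accuracy):
    """C4.5/LFC++-style importance (fraction of a feature's occurrences that
    were correctly classified) weighted by the learner's accuracy."""
    importance = correct_with_feature / total_with_feature
    return importance * learner_accuracy

def meme_fitness_uts(hidden_weight, output_thresholds):
    """UTS-style importance: |hidden-to-output weight / output threshold|,
    taking the maximum over output units when there is more than one."""
    return max(abs(hidden_weight / t) for t in output_thresholds)

def learner_fitness(accuracy, accuracy_sd, time_sds_from_mean, k=3):
    """Accuracy (percentage points) minus a robustness penalty (accuracy SD
    scaled by the tradeoff constant k) minus a speed penalty (learning time
    in standard deviations from the mean for that class of learner)."""
    return accuracy - accuracy_sd / k - time_sds_from_mean
```

For example, `meme_fitness(9, 10, 0.8)` gives 0.72 and `learner_fitness(80, 9, -1)` gives 78, matching the two examples in the text.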

Several other aspects of the framework must be specified in order to implement the system. Meme definitions must be stored in a canonical form so that it is possible to detect when two or more generated memes are in effect identical and should have their fitness scores combined. Memes in COEV are represented as boolean combinations of primitives or mathematical formulae over primitives, which can be straightforwardly compared, although the extension to first order predicate calculus would make this a more difficult problem. The number of memes that an agent uses (i.e. the dimensionality of its input representation) is treated as a free parameter of the agent, and allowed to vary from two to twice the number of primitives. In addition, agents are allowed to prefer memes of their own creation to memes created by other learners. A single parameter, ranging over [0,1], specifies the internal meme preference of each agent.

Simulation Results on a Simple Test Problem

The COEV system was applied to an artificial problem designed to be moderately difficult for the machine learning programs included in the system. The problem was to identify whether any three consecutive bits in a nine bit string were on; e.g., 011011011 is false, and 001110011 is true. This problem is analogous both to detecting a win in tic-tac-toe and to solving parity problems, which are known to be difficult for many types of learning systems. There are 512 possible examples, and the problem is simple enough that the machine learning systems used (even the neural network) run reasonably quickly. This is important, since each program is executed a large number of times (see the computational complexity section, below).

This problem was run on a system that used only the LFC++ and CG learners. The population consisted of 10 LFC++ learners and 10 CG learners, and was run for 10 generations. Six-fold cross-validation was used, and there was one meme exchange per generation. This rather modest problem therefore required 2400 learner executions, taking more than four days of R4400 CPU time. (The CPU time was dominated by the 1200 neural network training runs.) Despite the large amount of computation required for this simple problem, it is reasonable to hope that the amount of computation required will grow slowly with more difficult problems (see the computational complexity section, below).

This simulation illustrated three significant points. First, the coevolution learning system was able to consistently improve as it evolved. The improvement was apparent in both the maximum and the average performance in the population, and in both the cross-validation accuracies of the learning agents and in their fitnesses (i.e. accuracy, speed and robustness). Figure 2 shows these results. In ten generations, the average fitness of the population climbed from 51% to 75%, and the standard deviation fell from 21% to 12.5%. Looking at the best individual in the population shows that the maximum fitness climbed from 84% to 94%, and the maximum cross-validation accuracy climbed from 82.3% to 95%. Average execution time fell by more than 25% for both classes of learners (it fell somewhat more sharply, and in much greater absolute magnitude, for the neural network learners).

Figure 3 illustrates the separate contributions of the genetic evolution component and the memetic evolution component, compared to coevolution, in the performance of the best individual learning agent in the population. Consider first the purely genetic evolution of the free parameter vectors. When there is no memetic component, there is no interaction between the different classes of learners, since crossover and other sources of genetic variation apply only to a specific type of learner. Using genetic search to find the best free parameter values for a neural network is known to be slow (Yao, 1993). In this genetic-only neural network simulation, only a small improvement was found in the fifth generation. Similarly, for the genetic evolution of LFC++'s free parameters, a modest difference was found in the sixth generation. Memetic evolution alone (without changing any of the free parameter values) showed a more significant effect. Because both the neural network and the constructive induction program make contributions to memetic evolution, any synergies between them will appear in the memetic-only curve (see the discussion of figure 4, below). However, the steady rise of the coevolution curve compared to the memetic-evolution-only curve suggests that some adjustment of the free parameter values to match the representation changes induced by memetic transfer may have a positive effect. At the end of 10 generations, the maximum accuracy of the coevolved population is more than 5 percentage points higher than the memetic-only population. However, it is worth noting that the memetic-only population equaled that performance figure earlier in the simulation, and then drifted away.

Figure 4 illustrates the synergy that coevolution finds between the CG agent population and the LFC++ agent population. When run with only a single type of learner, the memetic evolution part of COEV becomes a kind of constructive induction program, where the output of the learner is fed back as an input feature in the next iteration. Since LFC++ is already a constructive induction program, it is somewhat surprising to note that LFC++ agents coevolving with each other still manage a small amount of improvement. Perhaps this is due to the fact that the appropriate features for this problem have three conjuncts, and LFC++ under most free parameter values tends to construct new features from pairs of input features. CG alone does not do very well on this problem. Even when coevolved with itself, its most accurate agent improves only from about 60% to a bit over 70% in ten generations. However, when these two learning methods are combined in a single coevolution learner, the results are clearly better than either of the methods used alone. The gap between the best performance of the pair of methods and the best single method tends to be about 5 percentage points in this simulation, and the difference tends to be larger as the population evolves.
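For reference, the target concept of the test problem is trivial to state in code (the helper below is mine, not part of COEV), which makes the two example strings from the text easy to verify:

```python
from itertools import product

def three_consecutive_on(bits: str) -> bool:
    """True iff the nine-bit string contains three consecutive 1s."""
    return "111" in bits

# The full example space: all 512 nine-bit strings.
examples = ["".join(p) for p in product("01", repeat=9)]
```

As stated in the text, `three_consecutive_on("011011011")` is false and `three_consecutive_on("001110011")` is true; the difficulty for the learners lies in the three-way conjunctions over adjacent bits, not in the concept's description length.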


[Figure 2 plot: max fitness, max accuracy, mean fitness (with S.D.), and min fitness versus generation, 0-10]

Figure 2: Fitness of the best, worst, and population average (with S.D.) over 10 generations.

[Figure 3 plot: best-agent accuracy versus generation for LFC++ (genetic only), CG (genetic only), memetic only (LFC++ & CG), and coevolution]

Figure 3: Genetic evolution only, memetic evolution only, and coevolution.


[Figure 4 plot: best-agent accuracy versus generation for CG alone, LFC++ alone, and LFC++ & CG together]

Figure 4: Synergistic interaction of the different types of learners.

Why does Coevolution Appear to Work?

This is intended to be an exploratory paper, describing a novel approach to learning. It does not present any proofs nor any detailed empirical evaluations. However, it is possible to speculate about why the method appears to work, and this speculation may be useful in guiding future research in this area.

Constructive induction systems have not been able to solve problems that are a great deal more difficult than their root form of induction, even though they automatically generate relevant feature combinations to use in representation. One possible explanation is that these features are constructed with the same learning biases as the method that uses those features to build concepts, limiting the value of composing these actions. Since the learning agents in coevolution learning express a variety of biases, it may be that features discovered by one learner can more significantly facilitate learning by another learner which embodies a different bias.

Coevolution learning has a different advantage over most other forms of multistrategy learning. In these other forms, the relationships between different learning methods tend to be predefined by the learning researcher (see the introduction and many of the chapters in (Michalski & Tecuci, 1994)). For example, the KBANN system (Towell, et al., 1990) uses a domain theory to specify a neural network architecture, then trains the net and extracts a revised domain theory. Flexibility in the type of learning used and the order in which learning agents are executed may make possible important synergies that are difficult to capture with a predefined relationship.

My own earlier work was dedicated to building a planner which could select among and combine multiple learning methods based on reasoning about explicit goals for knowledge (Hunter, 1989; Hunter, 1990; Hunter, 1992). However, first principles planning for achieving learning goals is even more difficult than planning for physical goals. Due to the large inherent uncertainty in predicting the outcome of the execution of any particular learning method, it is extremely difficult to reason about the likely outcome of any significant composition of learning actions. I also pursued the use of plan skeletons, templates and strategies, which are the usual response to the difficulties of first principles planning, but these run into the same problems in maintaining flexibility as other predefined approaches to multistrategy learning. Coevolution learning is able to flexibly combine learning methods (even in novel ways) without having to reason ahead of time about what the outcome of that combination will be.

The reason that coevolution appears to work better than just memetic evolution alone may be the two-way linkage between parameter values and representations. Not only does it appear to be the case that which parameter values produce the best outcome depends on the input representation used, but it also appears that the selection of parameter values has a significant effect on what new

Hunter 87 From: Proceedings of the Third International Conference on Multistrategy Learning. Copyright © 1996, AAAI (www.aaai.org). All rights reserved. features are constructed. This seems especially relevant ¯ a function finding system that can generate memes when considering that two of the parameters in COEV that identify mathematicalrelationships betweenfeatures; specify howmany input features each learner will use, and ¯ a feature relevance metric (e.g. (Kira & Rendell, what the propensity of each learner is for taking those 1992) ) that can be added to the memefitness function: features fromexternal sources is. ¯ a relation learning system (e.g. (Quinlan, 1990) one of the ILP learners) that can help the system transcend the limits of boolean feature combinations Computational Complexity. and Each of these desired additions presents challenges to the Parallelization Issues design of the current system that must be addressed. Our third current goal is to add at least somecoarse Oneof the drawbacksto coevolution learning is the large grained parallelism to the system so that it can be run on a amount of computation required. Doing an evolutionary network of workstations. Due to the nature of the search is slow in general, and whenthe evaluation of a computations, we expect nearly speedup nearly linear with single individual can take minutes or hours (as training the number of workstations used, up to the point where neural network can) then the problem is exacerbated. there is one workstationper agent. There are three reasons to be optimistic that this approach can be used to solve large scale problems. First, the fitness function for the population includes a Conclusion time factor, whichtends to increase the speed of learning as the evolution progresses. Average running time of both Coevolution provides a powerful metaphor for the LFC++and CGwas substantially lower at the end of ten design of a machine learning system. 
Average running time of both LFC++ and CG was substantially lower at the end of ten generations than it was at the beginning. The search through the space of parameter values not only finds values that provide good results, but also values that provide them quickly.

Second, as better input representations are found, most learning methods increase their learning speed. The concepts learned from good representations tend to be simple (that is what makes a representation good), and most machine learning methods have a computational cost that depends on the complexity of the learned concept; as the system evolves better representations, the speed of learning should therefore go up.

Finally, coevolution learning is well suited to parallelization. The vast majority of the computation is done by the learning agents, which are entirely independent of each other during learning. The only communication required is at the time of meme and gene exchange, when all agents must register their fitnesses and the fitnesses of their memes. It is also possible to relax even that requirement, so that agents use only a sample of the fitnesses from other agents to determine whom to exchange memes and genes with.
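One way to realize exchange based on only a sample of registered fitnesses is fitness-proportional choice over a random subset. The function below is a sketch under that assumption, not the paper's actual exchange protocol:

```python
import random

def choose_partner(agents, fitness, sample_size=None, rng=random):
    """Pick a meme/gene exchange partner with probability proportional
    to fitness. When sample_size is given, only a random subset of
    agents registers its fitness, approximating the exhaustive case."""
    pool = list(agents) if sample_size is None else rng.sample(list(agents), sample_size)
    weights = [fitness(a) for a in pool]
    return rng.choices(pool, weights=weights, k=1)[0]

rng = random.Random(1)
agents = list(range(1, 11))
partner = choose_partner(agents, fitness=lambda a: float(a), sample_size=4, rng=rng)
```

Because the full-population selection is already probabilistic, drawing from a sufficiently large random sample yields an essentially indistinguishable distribution over partners, and each agent can sample and update on its own schedule.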
Since the selection of memes and genetic partners is probabilistic anyway, sampling will be indistinguishable from exhaustive search if the sample size is large enough. Sampling also makes it possible to update the population asynchronously.

Future directions

There are three research goals for coevolution learning currently being pursued. The first is to empirically demonstrate its effectiveness on realistic problems. In particular, we hope to demonstrate that this approach solves problems that are very difficult for other machine learning or statistical approaches. Solving real-world problems is likely to require additional learning methods, larger agent populations, and longer evolution times.

The second goal is to add several new learning techniques to the system. In particular, we are interested in adding:

• a function finding system that can generate memes that identify mathematical relationships between features;

• a feature relevance metric (e.g., (Kira & Rendell, 1992)) that can be added to the meme fitness function;

• a relation learning system (e.g., (Quinlan, 1990), one of the ILP learners) that can help the system transcend the limits of boolean feature combinations.

Each of these desired additions presents challenges to the design of the current system that must be addressed.

The third goal is to add at least some coarse-grained parallelism to the system so that it can be run on a network of workstations. Given the nature of the computations, we expect speedup nearly linear in the number of workstations used, up to the point where there is one workstation per agent.

Conclusion

Coevolution provides a powerful metaphor for the design of a machine learning system. Models of genetic evolution have already driven significant advances in machine learning. When nature added cultural evolution to genetic evolution, the result was spectacular. The computational model of culture proposed in this paper is tremendously impoverished in comparison to people. Nevertheless, the ability of even this drastically simplified model of coevolution to synergize the abilities of disparate learners appears promising.

References

Angeline, P. J. (1993). Evolutionary Algorithms and Emergent Intelligence. Ph.D. diss., Ohio State University.

Dietterich, T. (1989). Limitations on Inductive Learning. In Proceedings of the Sixth International Workshop on Machine Learning (pp. 125-128). Ithaca, NY: Morgan Kaufmann.

Durham, W. H. (1991). Coevolution: Genes, Culture and Human Diversity. Stanford, CA: Stanford University Press.

Hunter, L. (1989). Knowledge Acquisition Planning: Gaining Expertise Through Experience. Ph.D. diss., Yale University. Available as YALEU/DCS/TR-678.

Hunter, L. (1990). Planning to Learn. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society (pp. 26-34). Boston, MA.

Hunter, L. (1992). Knowledge Acquisition Planning: Using Multiple Sources of Knowledge to Answer Questions in Biomedicine. Mathematical and Computer Modelling, 16(6/7), 79-91.

Kira, K., & Rendell, L. (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. In Proceedings of the National Conference on AI (AAAI-92) (pp. 129-134). San Jose, CA: AAAI Press.

Kohavi, R., & John, G. (1995). Automatic Parameter Selection by Minimizing Estimated Error. In Proceedings

of ECML-95: The Eighth European Conference on Machine Learning.

Langley, P. (1994). Selection of Relevant Features in Machine Learning. In Proceedings of the AAAI Fall Symposium on Relevance. New Orleans, LA: AAAI Press.

Michalski, R. S., & Tecuci, G. (Eds.). (1994). Machine Learning IV: A Multistrategy Approach. San Mateo, CA: Morgan Kaufmann Publishers.

Quinlan, J. R. (1990). Learning Logical Definitions from Relations. Machine Learning, 5(3), 239-266.

Quinlan, J. R. (1991). C4.5. Sydney: Available from the author: [email protected].

Rendell, L., & Cho, H. (1990). Empirical Learning as a Function of Concept Character. Machine Learning, 5(3), 267-298.

Rendell, L., & Ragavan, H. (1993). Improving the Design of Induction Methods by Analyzing Algorithm Functionality and Data-based Concept Complexity. In Proceedings of IJCAI (pp. 952-958). Chambéry, France.

Schank, R., Collins, G., & Hunter, L. (1986). Transcending Inductive Category Formation in Learning. Behavioral and Brain Sciences, 9(4), 639-687.

Towell, G. G., Shavlik, J. W., & Noordewier, M. O. (1990). Refinement of Approximate Domain Theories by Knowledge-Based Artificial Neural Networks. In Proceedings of the Seventh International Conference on Machine Learning.

van Camp, D. (1994). UTS/Xerion. Toronto, Canada: Available by anonymous file transfer from ftp.cs.toronto.edu:/pub/xerion/uts-4.0.tar.Z.

Vilalta, R. (1993). LFC++. Available from the author: [email protected].

Wisniewski, E., & Medin, D. (1994). The Fiction and Nonfiction of Features. In R. Michalski & G. Tecuci (Eds.), Machine Learning IV: A Multistrategy Approach. San Francisco, CA: Morgan Kaufmann.

Wnek, J., & Michalski, R. (1994). Hypothesis-driven constructive induction in AQ17: A method and experiments. Machine Learning, 14, 139-168.

Yao, X. (1993). Evolutionary artificial neural networks. International Journal of Neural Systems, 4(3), 203-221.
