Part E: Evolutionary Computation

47. Learning Classifier Systems

Martin V. Butz

47.1 Background ...... 962
47.1.1 Early Applications ...... 962
47.1.2 The Pitt and Michigan Approach ...... 963
47.1.3 Basic Knowledge Representation ...... 964
47.2 XCS ...... 965
47.2.1 System Overview ...... 965
47.2.2 When and How XCS Works ...... 968
47.2.3 When and How to Apply XCS ...... 968
47.2.4 Parameter Tuning in XCS ...... 969
47.3 XCSF ...... 970
47.4 Data Mining ...... 972
47.5 Behavioral Learning ...... 973
47.5.1 Reward-Based Learning with LCSs ...... 973
47.5.2 Anticipatory Learning Classifier Systems ...... 974
47.5.3 Controlling a Robot Arm with an LCS ...... 975
47.6 Conclusions ...... 977
47.7 Books and Source Code ...... 978
References ...... 979

Learning classifier systems (LCSs) essentially combine fast approximation techniques with evolutionary optimization techniques. Despite their somewhat misleading name, LCSs are not only systems suitable for classification problems, but may rather be viewed as a very general, distributed optimization technique. Essentially, LCSs have very high potential to be applied in any problem domain that is best solved or approximated by means of a distributed set of local approximations, or predictions. The evolutionary component is designed to optimize a partitioning of the problem domain for generating maximally useful predictions within each subspace of the partitioning. The predictions are generated and adapted by the approximation technique. Generally, any form of spatial partitioning and prediction is possible – such as a Gaussian-based partitioning combined with linear approximations, yielding a Gaussian mixture of linear predictions. In fact, such a solution is developed and optimized by XCSF (XCS for function approximation). The LCSs XCS (X classifier system) and the function approximation version XCSF, indeed, are probably the most well-known LCS architectures to date. Their optimization technique is very well balanced with the approximation technique: as long as the approximation technique yields reasonably good solutions and evaluations of these solutions fast, the evolutionary component will pick up on the evaluation signal and optimize the partitioning. This chapter provides historical background on LCSs. Then XCS and XCSF are introduced in detail, providing enough information to be able to implement, understand, and apply these systems. Further LCS architectures are surveyed and their potential for future research and for applications is discussed. The conclusions provide an outlook on the many possible future LCS applications and developments.

Learning classifier systems (LCSs) are machine learning algorithms that combine gradient-based approximation with evolutionary optimization. Due to this flexibility, LCSs have been successfully applied to classification and data mining problems, reinforcement learning (RL) problems, regression problems, cognitive map learning, and even robot control problems.

The main feature of LCSs is their innovative combination of two learning principles: whereas gradient-based approximation adapts local, predictive approximations of target function values, evolutionary optimization structures individual classifiers to enable the formation of effectively distributed and accurate approximations. The two learning methods interact bidirectionally in that the gradient-based approximations yield local fitness estimates of the quality of the generated approximations, which the evolutionary optimization technique uses for optimizing classifier structures. Concurrently, the evolutionary optimization technique is generating new classifier structures, which again need to be evaluated locally by the gradient-based approach in competition with the other, overlapping, interacting classifiers.

Due to this innovative combination of two learning techniques, LCSs are often perceived as being hard to understand. Facet-wise analyses of the individual LCS components and their interactions, however, give both mathematical insights into the systems and an intuitive understanding of the systems in general. Moreover, the currently most common LCS, XCS (note that the X does not really encode any particular acronym according to the system creator Wilson), is comparatively easy to understand, to tune, and to apply. Thus, the core of this chapter focuses on XCS, gives a facet-wise overview of its functionality, details several enhancements, and highlights various successful application domains.

This chapter starts with a historical perspective, providing information on the beginnings of LCSs and establishing some terminology background. We then introduce the XCS classifier system, providing a detailed system overview as well as theoretical and conceptual insights on its performance. Also, tricks and facet-wise tweaks are discussed to tune the system to the problem at hand. Next, XCSF, the XCS counterpart for regression problems, is introduced. Focusing then on the application side, XCS applications to data mining tasks are surveyed and compared with other machine learning techniques. With respect to behavioral learning, we point out the relation of LCSs to reinforcement learning. Moreover, we cover anticipatory learning classifier systems (ALCSs) – which learn predictive schema models of the environment rather than reward prediction maps – and we introduce a modified XCSF version that can effectively learn a redundant forward-inverse kinematics model of a robot arm. A summary and conclusions wrap up the chapter.

47.1 Background

Learning classifier systems (LCSs) were proposed over 30 years ago by Holland [47.1–4], who originally actually called them classifier systems. The first actual LCS implementation is Holland and Reitman's cognitive system [47.4], focusing on problems related to reinforcement learning (RL). In this system, each classifier consisted of a condition part (originally termed a taxon), an action part (addressing effectors and an internal message memory), and a payoff prediction part, and it stored several other parameters – such as the age, the application frequency, and the attenuation of the classifier. Concurrently with the development of temporal difference learning techniques in RL – such as the now well-known state-action-reward-state-action (SARSA) algorithm [47.7] – Holland introduced the bucket brigade algorithm [47.6], distributing a currently encountered reward also to classifiers that were active several time steps ago and that thus indirectly led to gaining the current reward. Thereby, a classifier realized something similar to an eligibility trace in RL, which also distributes reward backwards in time with a discounting mechanism. In addition, Holland's cognitive system applied a genetic algorithm (GA) [47.5] as its second learning mechanism.

In sum, the first actual LCS [47.4] was ahead of its time. It established various reward-related ideas that were implemented later in the reinforcement learning community – and can now partially be regarded as standard RL techniques. However, the combination with GAs yielded a highly interactive and very complex system that was, and still is, hard to analyze. Thus, while proposing a highly innovative learning approach, the applicability of the cognitive system remained limited at the time.

47.1.1 Early Applications

Nonetheless, early applications of LCSs and modified LCS architectures were published in the 1980s. Smith developed a poker decision making system [47.9]. Booker worked on animal-like learning with LCS architectures.

Other early systems addressed automation tasks based on the cognitive systems architecture [47.11]. Wilson proposed and worked on the animat problem with LCS architectures derived from Holland and Reitman's cognitive systems approach [47.12, 13]. Goldberg solved a gas pipeline control task with a simplified version of the cognitive system architecture [47.8, 14]. Despite these successful early applications, a decade passed until a growing research community developed that worked on learning classifier systems.

47.1.2 The Pitt and Michigan Approach

Two fundamentally different LCS approaches were pursued from early on. The Pitt approach was fostered by the work of De Jong et al. [47.10, 15, 16]. On the other hand, the Michigan approach developed in the further years at Michigan under the supervision of John H. Holland [47.11, 14, 17, 18]. Diverse perspectives on the Michigan approach can be found in [47.19].

The essential difference between the two approaches is that in the Pitt approach rule sets are evolved, where each particular rule set constitutes an individual for the GA. In contrast, in the Michigan approach one set of rules is evolved and each rule is an individual for the GA. As a consequence, the Pitt-style LCSs are much closer to general GAs because each individual constitutes an overall problem solution. In the Michigan-style LCSs, on the other hand, each individual only applies in a subspace of the overall problem and only the whole set of rules that evolves constitutes the overall problem solution. Figure 47.1 illustrates this fundamental contrast between the two approaches.

As a consequence of this contrast, Pitt-style LCSs usually apply rather standard GA approaches. The whole population of rule sets is evolved. For fitness evaluation purposes, each set of rules needs to be evaluated in the problem environment addressed, typically independently of the other rule sets. The GA exchanges rules and rule structures within and across the sets of rules, modifying classifier structures by reproducing, mutating, and recombining well-performing classifiers and by deleting ill-performing ones. On the other hand, Michigan-style LCSs need to continuously interact with an environment to sufficiently evaluate all the rules in the rule set – essentially exploring all the environmental subspaces to make sure all rules can develop a sufficiently useful fitness estimate. This continuous interaction and the typical interacting components of Michigan-style LCSs are illustrated in further detail in Fig. 47.2. Due to the continuously developing fitness estimates, often a more steady-state, niched GA is applied online in Michigan-style LCSs. The undertaken updates then depend directly on the current interaction and thus on the current subset of rules relevant in the experienced interaction. The steady-state, niched GA optimizes the internal knowledge base iteratively depending on the incoming learning samples.

Fig. 47.1a,b While the Pitt approach to LCSs evolves a population of sets of rules, in the Michigan approach there is only one set of rules (i. e., the population) that is evolved

Fig. 47.2 LCSs consist of a knowledge base (population of classifiers), a genetic algorithm for rule structure evolution, and a reinforcement learning component for rule evaluation, reward propagation, and decision making. The system interacts with its environment or problem iteratively, learning online

In summary, Pitt-style LCSs evaluate and optimize their rule sets globally based on sets of problem instances. They usually learn offline. The major qualities of Pitt-style LCSs are that they evolve competing global problem solutions in the form of sets of rules; evolutionary rule structure optimization is used, typically evolving small sets of rules (10s of rules). Michigan-style LCSs, on the other hand, are designed to develop one distributed, locally optimized problem solution by combining gradient-based approximation techniques with local, steady-state, niched GAs. In consequence, typically larger, more distributed sets of rules develop, with potentially 1000s of rules. Michigan-style LCSs evaluate and optimize their set of rules online while interacting with the problem, iteratively perceiving problem instances.

The earliest Michigan-style LCS is the already introduced cognitive system CS1 of Holland and Reitman [47.4]. Wilson set a milestone in LCS research by introducing the zeroth level classifier system ZCS [47.20] and the now most prominent XCS classifier system [47.21]. Both systems were explicitly compared to the very well-known Q-learning [47.22] technique, yielding two learning classifier systems with a compact rule-based representation that can learn highly generalized Q-value functions.

A Michigan-style LCS consequently is an interactive, online learning system. It maintains a population of classifiers as its knowledge base. It applies a niched, steady-state genetic algorithm for gradual rule structure evolution, and it applies a gradient-based learning component for rule evaluation – yielding prediction and fitness estimates. Michigan-style LCSs are often applied in RL scenarios in which reward estimates need to be propagated and action decisions are made based on the learned reward prediction estimates. In this case, typically techniques similar to SARSA or Q-learning are applied. Figure 47.2 shows the basic components of a Michigan-style LCS.

In the following, we first give a precise introduction to the XCS classifier system. We then also introduce the real-valued version XCSF for solving regression problems with a Gaussian mixture of linear approximations. After that, we provide spotlights on various current application domains where various types of LCSs, including XCS(F), have produced highly competitive problem solutions when compared to other machine learning techniques and regression algorithms.

47.1.3 Basic Knowledge Representation

Because an exemplary knowledge representation was already discussed for the early CS1 implementation [47.4], we now provide a general sketch of the knowledge representation typically found in Michigan-style LCSs.

The knowledge representation of an LCS consists of a population of classifiers (that is, a finite set of rules). This population of classifiers essentially represents the current knowledge of the LCS about the problem the system is applied to. Each rule – or classifier – usually consists of a condition part and an action part, as well as a prediction and a fitness estimate. The condition part specifies the problem subspace in which the classifier is applicable. When a particular problem instance satisfies the condition part, the classifier is said to match. The action part specifies an action that may be executed, or the classification that may be tested. The prediction value specifies the expected reward, or feedback value, given the specified action was executed under the specified contextual conditions. The fitness estimates the value of this classifier relative to other, competing classifiers. In the early approaches, fitness was often simply equal to the currently established prediction value. In XCS, fitness estimates the accuracy of the prediction.

Michigan-style LCSs usually learn online about a problem, iteratively perceiving or actively generating problem instances. Given a particular problem instance, first, the system forms a match set of those classifiers in the population whose conditions match. Next, the system decides on an action or classification and executes it. Classifiers in the match set that specify the executed action constitute the current action set. After feedback is received, the predictions of the classifiers in the action set are adjusted. From the classifier prediction estimates, a fitness estimate is derived for each classifier. Finally, the steady-state GA is applied to the action set, the match set, or the population as a whole.



47.2 XCS

Wilson introduced the XCS classifier system in 1995 [47.21]. The two main novel features of XCS in comparison to earlier Michigan-style LCSs are its accuracy-based fitness estimation and its niche-based application of the evolutionary component. The introduction of accuracy-based fitness essentially decoupled the classifier fitness estimate from the reward prediction, enforcing that XCS learned complete payoff landscapes rather than only estimates for those subspaces where high reward is encountered. In addition, Wilson related XCS directly to Q-learning [47.21, 22]. Much later, even a relation to Kalman filtering and general regression tasks was made mathematically explicit [47.23, 24]. The niche-based GA reproduction combined with population-wide deletion enabled a much more focused GA-based optimization of classifier structures as well as the generalization of classifier structures based on the sampling distribution [47.25]. In consequence, XCS is an LCS that is designed to evolve not only the best solution to a problem, but all alternative solutions with associated Q-value estimations and variance estimations of the respective Q-value estimates. Due to its GA design and fitness definition, XCS strives to approximate the full Q-table of a problem with a maximally accurate and maximally compact classifier-based representation.

Despite its original strong relation to Q-learning and RL in general, XCS has also been applied successfully to classification problems and regression problems. In the former case, XCS identifies locally relevant features for the generation of maximally accurate classification estimates. In the latter case, XCS optimizes the distribution and structure of local, typically linear estimators for a maximally accurate approximation of the function surface. Thus, despite its original strong relation to RL, XCS is a much more generally applicable learning system that can solve single-step classification or regression problems as well as multi-step RL problems, which are typically defined as Markov decision processes.

47.2.1 System Overview

XCS evolves one population of classifiers. Classifier structures are optimized by means of a steady-state GA. A classifier consists of a condition part C, an action part A, a reward prediction r, a reward prediction error ε, and a fitness estimate f. While the condition and action structures are iteratively optimized by the steady-state GA, the estimates are adjusted using the Widrow–Hoff delta rule [47.26] based on an approximation of the Q-value signal.

While condition and action parts can generally be represented in any way desired [47.25], in this overview we focus on binary problems and a ternary representation of the condition part. Conventionally, the condition part C is coded by C ∈ {0, 1, #}^L, where the # symbol matches both zero and one. Condition C essentially specifies a hypercube within which the classifier matches and can be said to cover a certain volume of the complete problem space. The action part A ∈ 𝒜 defines an action or classification from a provided finite set of possible actions 𝒜. The reward prediction r ∈ ℝ estimates the moving average of the received reward in the recent activations of the classifier. The reward prediction error ε estimates the moving average of the absolute error of the reward prediction. Finally, the fitness f ∈ [0, 1] estimates the moving average of the relative accuracy of the classifier compared to the competing classifiers in the activated match sets (or action sets). The larger the fitness estimate, the larger, on average, the accuracy of a classifier in comparison to all classifiers that encode the same action and whose condition parts define overlapping subspaces.

Each classifier also maintains several additional parameters. The action set size estimate as estimates the moving average of the sizes of the action sets the classifier was part of. It is updated similarly to the reward prediction r. A time stamp ts specifies the last time the classifier was part of a GA competition. An experience counter exp specifies the number of applied parameter updates. The numerosity num specifies the number of (micro-)classifiers this macro-classifier actually represents – mainly for saving computation time.

Learning usually starts with an empty population. The problem faced is sampled iteratively, encountering particular problem instances s ∈ S. The set of all matching classifiers in the classifier population [P] is termed the match set [M]. If some action in 𝒜 is not represented in [M], a covering mechanism is applied. Covering creates classifiers that match s (inserting #-symbols in the new C with a probability P# at each position) and that specify the unrepresented actions. [M] essentially contains all the knowledge of XCS about the current problem instance. Given [M], XCS estimates the payoff for each possible action, forming a prediction array P(A). P(A) computes the fitness-averaged Q-value estimates for each action in the current state

P(A) = ( Σ_{cl ∈ [M], cl.A = A} cl.r · cl.f ) / ( Σ_{cl ∈ [M], cl.A = A} cl.f ) ,   (47.1)

where classifier parameters are addressed using the dot notation. P(A) can be used to decide on the currently most promising action. Any action selection policy may be applied, such as choosing the action with the largest Q-value expectation. Because XCS relies on exploring the complete problem space, however, it is important that all actions are considered sufficiently frequently. Alternatively, also the prediction error estimates may be considered for action selection – choosing, for example, that action with the highest fitness-averaged ε value with the aim of maximizing information gain (see also the more elaborate techniques surveyed recently in the computational intelligence literature [47.27]).

After the choice of an action A, an action set [A] is formed, which contains all classifiers in [M] that specify the chosen action. Moreover, the chosen action is executed, and feedback is received in the form of scalar reward R. The next problem instance may then be perceived. In multi-step problems, the classifiers in [A] are updated according to the estimated Q-value signal R + γ max_{A'} P'(A'), where the prediction array P' is derived from the resulting next match set – the Q-learning update. In classification problems – often also termed single-step problems – the update only considers the immediate reward. Moreover, the steady-state GA may be applied, reproducing two classifiers in [A] and selecting two classifiers in [P] for deletion. Figure 47.3 illustrates the iterative learning process applied in XCS.

Fig. 47.3 The XCS classifier system learns iteratively online. With each iteration it forms a match set given the current problem instance. Next, it chooses an action or classification and applies it. After the perception of feedback, the classifiers in the corresponding action set are updated and the steady-state GA is applied. After that, the next problem iteration proceeds

Rule Evaluation
To evaluate the classifiers, it is crucial to iteratively update their parameter estimates and to derive a respective relative fitness estimate. Parameter updates are applied in action sets. Usually, the prediction error is updated before the prediction and the fitness. Other parameters may be updated in any order. In particular, the reward prediction error ε of each classifier in [A] is updated by

ε ← ε + β(|R − r| − ε) ,   (47.2)

where β ∈ [0, 1] specifies a learning rate, which is typically set to values between 0.05 and 0.2. The higher the value of β, the more the estimate depends on the most recent problem interactions. Next, the reward prediction r of each classifier in [A] is updated by

r ← r + β(R − r) .   (47.3)

Note that in multi-step reinforcement learning problems the immediate reward R in (47.2) and (47.3) is replaced by the estimated Q-value signal R + γ max_{A'} P'(A'). Thus, XCS essentially applies Q-learning updates [47.21, 22], where Q-values are not approximated by a tabular entry but by a collection of rules expressed in the prediction array.
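As a sketch of how (47.1)–(47.3) interact, the following fragment computes a fitness-weighted prediction array over a match set and then applies the Widrow–Hoff updates in an action set for a single-step problem. The classifier fields and numeric values are our own illustrative assumptions, not the chapter's.

```python
# Sketch of the prediction array (47.1) and the error/prediction updates
# (47.2), (47.3) in a single-step problem, where the Q-value signal is
# simply the immediate reward R.

BETA = 0.2  # learning rate beta

def prediction_array(match_set, actions):
    """P(A): fitness-weighted average of the reward predictions per action."""
    P = {}
    for a in actions:
        cls = [cl for cl in match_set if cl['A'] == a]
        f_sum = sum(cl['f'] for cl in cls)
        P[a] = sum(cl['r'] * cl['f'] for cl in cls) / f_sum if f_sum else None
    return P

def update_action_set(action_set, R):
    """Update the error (47.2) before the prediction (47.3), as in the text."""
    for cl in action_set:
        cl['eps'] += BETA * (abs(R - cl['r']) - cl['eps'])
        cl['r'] += BETA * (R - cl['r'])

match_set = [
    {'A': 0, 'r': 1000.0, 'eps': 0.0, 'f': 0.9},
    {'A': 0, 'r':  500.0, 'eps': 0.0, 'f': 0.1},
    {'A': 1, 'r':    0.0, 'eps': 0.0, 'f': 1.0},
]
P = prediction_array(match_set, actions=(0, 1))  # P[0] == 950.0, P[1] == 0.0
update_action_set([cl for cl in match_set if cl['A'] == 0], R=1000.0)
```

After the update, the accurate classifier keeps its prediction of 1000 and an error of 0, while the second classifier moves its prediction from 500 to 600 and its error estimate from 0 to 100, reflecting the moving-average character of the Widrow–Hoff rule.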


To update the fitness estimate of each classifier in [A], a current scaled relative accuracy κ' is determined:

κ = 1 if ε < ε₀ ; κ = α(ε/ε₀)^(−ν) otherwise ,   (47.4)

κ' = (κ · num) / Σ_{cl ∈ [A]} cl.κ · cl.num .   (47.5)

κ essentially measures the current inverse error of a classifier. ε₀ specifies the targeted error below which a classifier is considered maximally accurate. κ' then determines the current relative accuracy with respect to all other classifiers in the current action set [A]. Thus, each classifier in [A] competes for a limited fitness resource, which is distributed relative to the current accuracy estimates κ. Finally, the fitness estimate f is updated given the current κ' by

f ← f + β(κ' − f) .   (47.6)

In effect, fitness reflects the moving average, set-relative accuracy of a classifier. As before, β controls the sensitivity of the fitness estimates to changes in the population.

The action set size estimate as is updated similarly to the reward prediction r, but with respect to the current action set size |[A]|:

as ← as + β(|[A]| − as) ,   (47.7)

resulting in an adaptation to changes of |[A]| in an order similar to the fitness changes. Parameters r, ε, and as are updated using the moyenne adaptive modifiée technique [47.28]. This technique sets parameter values directly to the average of the so far encountered cases until the resulting update is smaller than β (which is the case after 1/β updates). Finally, the experience counter exp is increased by one. If the GA is applied, the time stamps ts of all classifiers in [A] are set to the current iteration time t.

Rule Evolution
XCS applies a steady-state genetic algorithm (GA) for rule evolution. Given a current action set [A], the GA is invoked if the average time since the last GA application (stored in parameter ts) in [A] is larger than the threshold θ_GA. This mechanism is applied to ensure sufficient evaluation of classifiers, as well as to control unbalanced sampling. The higher the threshold θ_GA is, the slower evolution proceeds, but also the less prone XCS is to unbalanced problem sampling [47.29].

The steady-state GA first selects two parental classifiers for reproduction in [A]. While this selection process was done by proportionate selection based on fitness in the original XCS, more recently it was shown that tournament selection can improve the robustness of the system highly significantly [47.30]. Tournament selection in XCS chooses the classifier with the highest fitness from a tournament of classifiers randomly chosen from [A]. The tournament size is usually set relative to the current action set size |[A]|. Two classifiers are selected in two independent tournaments. The selected classifiers are reproduced, generating the offspring. Crossover and mutation are applied to the offspring; the parents stay in the population. Mutation usually changes each condition and action symbol randomly with a certain probability μ. Crossover exchanges condition and action symbols. Often, simple uniform crossover is applied (exchanging each symbol with a probability of 0.5). However, also more sophisticated estimation of distribution algorithms (EDAs) have been applied for more effective building block processing [47.31].

The offspring parameters are initialized by setting the prediction r, ε, f, and as to the parental values. Fitness f is often decreased to 10% of the parental fitness. The experience counter exp and the numerosity num are set to one. The resulting offspring classifiers are finally added to the population. In this case, GA subsumption may be applied [47.32] to stress generalization. GA subsumption searches for another classifier in [A] that may subsume an offspring classifier. This classifier must have a more general condition than the offspring classifier, its error estimate must be below ε₀, and its experience counter must be sufficiently high (exp > θ_sub). If such a classifier is found, the offspring is subsumed, increasing the numerosity of the more general classifier by one and discarding the offspring.

The population of classifiers [P] is maximally of finite size N. When this size is exceeded after offspring insertion, classifiers are deleted from [P]. Fitness-proportionate selection is applied depending on the action set size estimates as. Note that tournament selection is not suitable in this case because a balance in the action set sizes is most desirable. The likelihood of deletion of a classifier is further increased by a factor f̄/f if this classifier is experienced (exp > θ_del) and additionally if its fitness f is below a fraction δ of the average fitness f̄ in the population.
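The accuracy-based fitness computation (47.4)–(47.6) and the tournament selection described above can be sketched as follows. The parameter values (α = 0.1, ε₀ = 10, ν = 5, β = 0.2) are common XCS defaults and an assumption on our part, as are all names in the fragment.

```python
import random

# Sketch of the scaled relative accuracy (47.4), (47.5), the fitness
# update (47.6), and tournament selection; classifiers carry an error
# estimate eps, a numerosity num, and a fitness f.

ALPHA, EPS0, NU, BETA = 0.1, 10.0, 5.0, 0.2

def accuracy(eps):
    """kappa (47.4): maximally accurate below eps0, scaled down steeply above."""
    return 1.0 if eps < EPS0 else ALPHA * (eps / EPS0) ** (-NU)

def update_fitness(action_set):
    """Relative accuracy kappa' (47.5), then Widrow-Hoff fitness update (47.6)."""
    kappa = [accuracy(cl['eps']) * cl['num'] for cl in action_set]
    total = sum(kappa)  # the limited fitness resource competed for in [A]
    for cl, k in zip(action_set, kappa):
        cl['f'] += BETA * (k / total - cl['f'])

def tournament_select(action_set, tau=0.4, rng=None):
    """Highest-fitness classifier from a random tournament of size tau*|[A]|."""
    rng = rng or random
    k = max(1, round(tau * len(action_set)))
    return max(rng.sample(action_set, k), key=lambda cl: cl['f'])

aset = [{'eps': 5.0, 'num': 1, 'f': 0.5}, {'eps': 20.0, 'num': 1, 'f': 0.5}]
update_fitness(aset)  # the accurate classifier gains fitness, the other loses
```

Because fitness is distributed relative to the accuracies in the set, the classifier with ε below ε₀ ends up with the dominant share, which is exactly the niching effect described above.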
However,component the may evolutionary still destroy r tures due tomutation nor crossover mutation may be overly andtreme disruptive. In cases, ex- crossover. where highly Thus,need unstructured subspaces neither to may bedistribution identified algorithms can and helpspaces recombined, to [47. estimation identify these of sub- low mutation rate and uniform crossover sufficesuccessfully. to learn However, clearly mutationto is detect more mandatory accurateThus, classifier a structures good over compromiseoffspring time. is is necessary usually tofully ensure mutated destructed. that In butthe the its mutation binary structure probability domain,1 is is for not consequently example, often set to stance. This is a typical choiceused for in the genetic mutation strength algorithmspected – essentially number setting the ofone. ex- attributes that will be mutated to better suited to beities applied in to problems the where targetclassifier regular- function conditions, can that be is, well-represented subspaces in in which the best suited to bebe applied partitioned in into problem subspaces domains withindictions that which yield can simple accurate pre- values. Moreover, 47.2.3 When and How to Apply From the reflections aboveis it designed to becomes learn clear thea target that population function of of loca a problemsifiers. by This target function may be thein Q-value function problems, or also any other type of function. for vol-  clas- (error A Œ works and on the pop- fitness pres- for deletion. reproductive optimal solu-  generalization P XCS will on average Œ  strives to develop complete problem A Œ Works works, successful rule XCS maximally general ]. schema bound XCS maximally accurate 34 XCS [47. ). The resulting problem solution in  0 learns successfully. This section O " Œ . GA strives to evolve a ]. In consequence, it has been put for- XCS 33 ]. Meanwhile, rules are selected in covering bound, which quantifies the need XCS [47. 
33 that is represented by , more general classifiers will be reproduced on [47.  than the average condition volumes of classifiers While these evolutionary pressures generally de- The two interacting learning components, which are P Œ evolution still relies on sufficientlynals. accurate Thus, fitness rule evaluation sig- needs to haveestimate enough time rule to fitness before expectedleads rule deletion. to This a for a sufficiently large populationlar size initial given condition a volume. particu- problem Moreover, each can particular be assumedin to terms of have subspace a sizeslearning certain that to complexity need take to be place,below separated that the for average is, deviation of for theceive decreasing payoff an signal initial the to fitness error signal per- towards higherconsequence, accuracy. the In subspace size requiresof the classifiers generation with condition volumessize, of consequently yielding maximally a that ulation size to bewith able such to condition cover volumes. thewith Finally, full a better problem certain classifiers condition space volume needthat to is, be able have to reproductive grow, opportunitiescan before be deletion expected, consequently yieldingopportunity bound a maximally accurate classifiers applyingsure a scribe how the below the threshold sifiers that are meanwhile reproduction but they are selected in Since the classifier conditions in points to relevant literature thatout quantifies the intuition. sketched- gradient-based rule evaluation andrule evolutionary-based evolution, are stronglylutionary interactive. point From of view, an severalyield evo- evolutionary particular pressures learning biases.designed Since to reproduction maximize is fitness, 47.2.2 When and How From the descriptionderstand above why it may seem hard to un- provides intuition about when and how representation was previously termed the tion representation solution cover a largerume subspace, i. 
e., theyin have a larger ward that average (when ignoring the fitness pressurement), for the yielding mo- apressure sampling-dependent Evolutionary Computation

Part E

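The GA trigger and the deletion scheme described in the rule evolution paragraph above can be sketched in a few lines of Python. This is an illustrative sketch rather than code from an actual XCS implementation; the classifier fields (ts, fit, as_, exp, num) and the threshold defaults follow the notation used in the text:

```python
import random
from dataclasses import dataclass

@dataclass
class Classifier:
    ts: float     # time stamp of the last GA application in a set containing this rule
    fit: float    # fitness (summed over the numerosity, as usual in XCS)
    as_: float    # action set size estimate
    exp: int      # experience, i.e., number of parameter updates
    num: int = 1  # numerosity

def ga_should_run(action_set, t_now, theta_ga=25.0):
    """Invoke the GA when the numerosity-weighted average time since the
    last GA application in [A] exceeds theta_ga."""
    n = sum(cl.num for cl in action_set)
    avg_ts = sum(cl.ts * cl.num for cl in action_set) / n
    return t_now - avg_ts > theta_ga

def deletion_vote(cl, f_avg, theta_del=20, delta=0.1):
    """Deletion vote proportional to the action set size estimate; increased
    by the factor f_avg/f for experienced, clearly below-average rules."""
    vote = cl.as_ * cl.num
    f = cl.fit / cl.num
    if cl.exp > theta_del and f < delta * f_avg:
        vote *= f_avg / f
    return vote

def select_for_deletion(population, rng):
    """Roulette-wheel (proportionate) selection of a deletion candidate."""
    f_avg = sum(cl.fit for cl in population) / sum(cl.num for cl in population)
    votes = [deletion_vote(cl, f_avg) for cl in population]
    return rng.choices(population, weights=votes, k=1)[0]
```

With θdel = 20 and δ = 0.1, an experienced rule with far below-average fitness thus receives a much larger deletion vote than an average rule, freeing space for better offspring.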
Moreover, XCS is better suited for problems in which the regularities of the target function can be well represented in the classifier conditions; that is, subspaces in which the function values are approximately equal should be compactly representable with few classifiers. Overall, XCS thus strives to develop distributed problem solutions in the form of a set of locally, partially overlapping classifier structures, which cover the whole sampled problem space in a generalized way.

As long as a condition representation can be chosen that identifies the expectable regularities in a data set, or also in a reinforcement learning problem, XCS is a good candidate to optimize these local condition structures iteratively online. However, XCS was also applied successfully in offline, data mining-based classification problems, and it was shown that the generalization and accuracy performance XCS yields is comparable to other state-of-the-art machine learning algorithms [47.25, 37], such as decision tree learners, instance-based classifiers, or support vector machines. Thus, XCS may be applied to multi-step Q-learning problems but also to single-step classification problems and general regression problems. Online generalization and optimal condition structuring for accurate predictions are the major features of XCS. From a regression perspective, XCS is a non-parametric regression algorithm that strives to minimize the expected absolute function approximation error, or also the expected squared function approximation error, as put forward elsewhere [47.24].

The two components, (a) gradient-based rule prediction approximation and evaluation and (b) evolutionary rule structure evolution, are the key to successful XCS applications. With respect to rule structure evolution, the XCS system strongly depends on distance representations, which can be compared with the general kernel representations used in support vector machines and elsewhere [47.38, 39]. As long as the represented kernel-based condition structures can be meaningfully modified by genetic operators, evolution, and thus also XCS, can be applied. Meanwhile, sensible value predictions also need to be generated. Gradient-based methods work best to approximate these predictions; whether the prediction is a single value, is computed linearly or polynomially from the input, or is structured otherwise depends on the problem at hand and on the gradient-based approximation approach available. The more the prediction structure fits the regularities of the target function, the faster and more robust learning can be expected. While such structural considerations can improve system performance, the successful applications of XCS to various problem domains show that successful learning is usually not precluded by suboptimal structural choices.

47.2.4 Parameter Tuning in XCS

While XCS does, indeed, specify many parameters, only few parameters are really crucial. All other parameter values can typically be set to standard values. Here we discuss some rules of thumb for tuning the critical parameter settings and also provide standard settings. While the following recommendations have not been published elsewhere so far, they can be derived from observations and other recommendations found in the literature [47.25, 35, 40].

The two most important parameters are the maximal population size N and the strived-for error threshold ε0. The larger the population size N is, the more capacity XCS has for learning and thus the more complex problems XCS can learn. On the other hand, the larger N is, the slower XCS learns, because it reproduces and deletes only two classifiers in a typical learning iteration.

Parameter ε0 specifies the targeted approximation error. In continuous function approximation problems, smaller ε0 values demand finer problem space partitionings and thus larger population sizes to cover the whole problem space and to enable reproductive opportunities (see above). Moreover, ε0 can partially determine the fitness signal available to XCS: if ε0 is chosen very small, (47.4) will yield values very close to zero for all highly inaccurate classifiers. Thus, overly small ε0 values should be avoided. In noisy problems, ε0 should thus also not be chosen much smaller than the standard deviation of the noise expected in the function value signal.

Without much knowledge of a problem, one may start with a rather small population size N – say 1000 – and evaluate the learning progress in this setting with a desired ε0. If the generated approximation error does not decrease over time, then ε0 should be set to about 1/10 of the encountered error. Next, the population size N should be progressively increased, for example, to N = 5000 or more. If still no error decrease is observed, further analysis is necessary. If the population is filled with classifiers but the match set sizes are very small (below 5), better classifiers probably do not receive enough reproductive opportunities. In this case, first the initial condition volume should be increased – for example, in the binary domain the probability P# would need to be increased (up to close to 1). If the match set sizes still decrease below 5, the problem is rather hard, requiring a further population size increase. On the other hand, if the match set sizes are very large (above 100), then over-generalization takes place and XCS apparently does not pick up the fitness signal.
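The influence of ε0 on the fitness signal can be illustrated with the power-law accuracy computation commonly used in XCS (compare (47.4)); α = 0.1 and ν = 5 are assumed default values here, so this is a sketch, not necessarily the chapter's exact parameterization:

```python
def accuracy(error, eps0, alpha=0.1, nu=5):
    """Scaled accuracy: 1 for classifiers below the target error eps0,
    a steeply decaying power law above it."""
    return 1.0 if error < eps0 else alpha * (error / eps0) ** (-nu)

# For a reasonable eps0, inaccurate classifiers still receive accuracy
# values of a usable magnitude ...
k1, k2 = accuracy(5.0, eps0=1.0), accuracy(10.0, eps0=1.0)
# ... whereas an overly small eps0 pushes all accuracies (and hence the
# derived fitness values) to practically zero.
k3, k4 = accuracy(5.0, eps0=0.001), accuracy(10.0, eps0=0.001)
```

This is why the text recommends choosing ε0 no smaller than roughly the noise level: otherwise all classifiers appear equally inaccurate and the GA receives almost no usable fitness signal.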
In this case, the initial condition volume should be decreased. If this does not help, then the GA application rate should be decreased to enforce a more accurate classifier evaluation before evolution applies. This can be accomplished by increasing the threshold θGA to, say, 100, 500, or even higher. An increase in θGA can also be crucial in problems where the problem domain is sampled highly unevenly, as studied in detail elsewhere.

Several other parameter settings may be checked as well. As stated above, the mutation rate should not be set overly high; in the binary problem domain, for example, a mutation rate of μ = 1/l, where l denotes the condition size, is a good rule of thumb. Crossover can mostly be applied without restrictions (χ = 1.0) – especially when tournament selection is chosen for reproduction, because disruption is then often prevented by the selection of two equal classifiers. Other parameters can be safely set to somewhat standardized values; a typical initial parameter setting for XCS is N = 1000, β = 0.2, α = 1, ν = 5, θGA = 25, θdel = 20, and δ = 0.1.

47.3 XCSF

XCSF (XCS for function approximation) essentially enhances and modifies XCS, yielding an iterative, online-learning, non-parametric regression system. The XCSF classifier system for real-valued inputs was introduced by Wilson in 1999, introducing Michigan-style LCSs to the real-valued problem domain [47.41]. It was further enhanced to approximate continuous real-valued function surfaces in 2001/2002 [47.42, 43].

XCSF modifies XCS in three main respects. First, the classifier condition structure is changed to accept real-valued input. Second, the prediction part no longer predicts single values, but computes the prediction from the input using linear approximation techniques, such as recursive least squares (RLS). Third, the action part of the system is removed, applying those parts of the algorithm that were previously applied to the action set [A] to the match set [M].

Fig. 47.4 The XCSF classifier system for function approximation. XCSF learns value predictions and usually does not specify actions. The feedback is the actual function value, which is used to update the linear approximators of the matching classifiers. The consequent error and fitness estimation updates are then considered in the evolutionary component for the further optimization of the condition structures.

XCSF is thus a regression system that solves function approximation problems by developing partially overlapping, locally weighted projections in the form of a population of classifiers. In this form, XCSF develops problem solutions that are similar to those developed by the locally weighted projection regression algorithm (LWPR) [47.44], which is rather well known in the robotics community [47.45]. A comparative study has shown that XCSF can outperform LWPR in various problem domains [47.46], often yielding better problem space partitionings, as well as more accurate function value approximations with a comparable number of individual locally linear approximators (i.e., in XCSF, classifiers). In XCSF, each classifier specifies in its condition the subspace within which it is applicable. Thus, the condition may be compared with the receptive field of the classifier. Moreover, each classifier specifies a linear approximator within its subspace. In effect, the function approximation problem is solved by locally weighted, overlapping linear approximations. While typically the weighting is fitness-dependent, a weighting based on the distance to the center of the classifier condition can also be applied.
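The RLS-based prediction of an XCSF classifier can be sketched as follows. This is a minimal, hypothetical implementation: the class name, the constant offset input, and the initialization constant delta are illustrative choices, not prescribed by XCSF itself:

```python
import numpy as np

class LinearPredictor:
    """Locally linear prediction with a recursive least squares (RLS) update,
    as an XCSF classifier may use within its condition's subspace."""

    def __init__(self, dim, delta=1000.0):
        self.w = np.zeros(dim + 1)        # weights, including a constant offset
        self.P = np.eye(dim + 1) * delta  # estimate of the inverse input covariance

    def _augment(self, x):
        return np.concatenate(([1.0], x))  # constant offset input x0 = 1

    def predict(self, x):
        return float(self.w @ self._augment(x))

    def update(self, x, y, lam=1.0):
        """One RLS step; lam is a forgetting factor (1.0 = no forgetting)."""
        xa = self._augment(x)
        gain = self.P @ xa / (lam + xa @ self.P @ xa)
        self.w = self.w + gain * (y - self.w @ xa)    # correct by the residual
        self.P = (self.P - np.outer(gain, xa @ self.P)) / lam

# Fit y = 2*x1 - x2 + 0.5 from samples, as if drawn inside one receptive field.
rng = np.random.default_rng(0)
pred = LinearPredictor(dim=2)
for _ in range(200):
    x = rng.uniform(-1.0, 1.0, size=2)
    pred.update(x, 2.0 * x[0] - x[1] + 0.5)
# pred.predict(np.array([0.3, 0.2])) is now close to the true value 0.9
```

Unlike a gradient (delta-rule) update, RLS converges in very few samples, which is why the text highlights it as a prediction update for XCSF classifiers.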


As an example, we applied XCSF to the crossed ridge function – a function that has been used as a benchmark in the neural computation and machine learning community for many years [47.29]. The function contains a mix of linear and non-linear subspaces. It is specified in two dimensions as follows:

f(x1, x2) = max{ exp(−10 x1²), exp(−50 x2²), 1.25 exp(−5 (x1² + x2²)) } .   (47.8)

We ran XCSF with a maximum population size of N = 4000 and a target error of ε0 = 0.005 on this function. Figure 47.5 shows that XCSF is able to yield a good approximation: the evolving classifier structures learn very early in the run to suitably partition the problem space into local subspaces. In consequence, a smooth overall approximation surface is generated. Note how the inverse exponential hill in the center is approximated with nearly circular receptive fields, while the fields are selectively elongated in the respective dimension due to the non-linearities caused by the ridges extending to the four sides. Towards the corners of the input space, the function flattens out, so that the receptive fields become increasingly wider.

Fig. 47.5a,b Screenshots of XCSF learning to approximate the crossed ridge function. Current performance and function values are plotted at the top, the current approximation surface at the bottom. On the right-hand side the classifier conditions are shown; for visualization purposes, the receptive field sizes are plotted smaller than their actual size. Darker values correspond to classifier conditions with higher fitness.

In effect, various condition structures have been applied in XCSF, including rectangular structures with and without rotation and with various forms of representation [47.47, 48, 49]. Moreover, the linear approximations may be enhanced to polynomial approximations and others [47.50]. Finally, it is also possible to cluster a contextual space with the conditions, while approximating linear (or other) predictions given totally different inputs. For example, the velocity kinematics of an arm can be predicted locally, dependent on the angular arm constellation, for redundancy resolution [47.51] (see further details below). Thus, the XCSF structure is very well suited for developing any type of kernel structure [47.35].

47.4 Data Mining

Data mining is a rather large field of research that generally addresses the challenge of extracting knowledge from data. In the LCS realm, the addressed data usually consists of a set of data instances, where each instance specifies a set of features and a corresponding class the data instance belongs to. LCSs then typically learn to mine the data by predicting the class likelihoods of unseen data instances, as well as by identifying the most relevant features and the feature interactions for classification.

Both XCS and XCSF have proven highly valuable in data mining applications. The XCS system was also converted to an offline learning system: the sUpervised Classifier System (UCS) [47.53] determines classifier predictions and the resulting fitness values in a supervised manner. Meanwhile, the other learning aspects of the algorithm remain unchanged, so that UCS is a highly flexible system in which other condition and prediction parts of the classifiers may still yield highly vital applications. Both XCS and UCS have shown effective, if not even superior, prediction accuracies in various data mining tasks – most of them taken from the UCI machine learning database repository [47.54]. When always applying the same standard setting and comparing with various other decision making algorithms, such as support vector machines, decision tree learning, naive Bayes classifiers, and others implemented in the WEKA machine learning tool [47.55], XCS outperformed these competing techniques in many cases – often depending on the problem at hand [47.45, 52]. A similar performance was achieved with UCS applying a condensation mechanism late in the run; XCSF was also applied in this domain, outperforming XCS in some cases due to its more accurate classifier prediction estimates. Moreover, XCS was further enhanced to deal with highly unbalanced datasets in data mining domains by automatically adjusting the threshold that controls the frequency of GA application.

Pitt-style LCSs have been evaluated and applied to data mining problems even more extensively. The typical offline-learning scenario faced in data mining particularly suits the Pitt approach, as does the fact that very compact rule sets are often strived for. More than 10 years ago, the GALE architecture [47.56, 57] yielded very good performance results on a collection of datasets from the UCI repository. GALE distributes its evolutionary process, adding additional niching biases due to a grid-based spatial distribution of individuals.

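The crossed ridge function (47.8) used in the XCSF example above can be transcribed directly for experimentation:

```python
import math

def crossed_ridge(x1, x2):
    """Crossed ridge benchmark: two ridges of different widths plus a
    slightly higher inverse exponential hill in the center."""
    return max(math.exp(-10.0 * x1 ** 2),
               math.exp(-50.0 * x2 ** 2),
               1.25 * math.exp(-5.0 * (x1 ** 2 + x2 ** 2)))

print(crossed_ridge(0.0, 0.0))  # 1.25 - the central hill dominates at the origin
print(crossed_ridge(0.0, 1.0))  # 1.0 - on the wider ridge along x1 = 0
```

The mix of a radially symmetric hill, two ridges of different widths, and flat corners is exactly what makes this function a useful test of how XCSF shapes its receptive fields.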
A comparative study of GALE, XCS, and other machine learning algorithms can be found in [47.58]. The GAssist architecture [47.59, 60] develops a priority list of classification rules. The advantage of GAssist is the compactness of the rule lists it develops. A comparative analysis with XCS is provided in [47.61]. Later, the architecture was enhanced with ensemble learning techniques [47.62] and memetic algorithms [47.63], proving high scalability and fast learning of very compact rule sets.

Recently, many efficiency enhancement techniques from the GA literature (cf. [47.64]) and from other fields, including bioinformatics and systems biology [47.65], were applied to various LCSs. These techniques can help tremendously to improve the learning speed of LCSs, particularly in data mining realms. For example, windowing techniques select subsets of data instances to speed up the classifier evaluation process. Fitness surrogates were used to make the fitness estimation even cheaper [47.66]. Hybrid methods were already mentioned above; they combine traditional GA operators with informed ones, as is done when applying memetic algorithms, which locally improve the developing classifier structures when applied to LCSs. In combination, such techniques can yield LCSs that not only produce highly accurate classification performance and good generalizations, but also offer solution interpretability, allowing mining of the knowledge developed in the LCS rules – and they generate these results without requiring much computational time, which is often comparable to the time needed by much simpler machine learning techniques.

47.5 Behavioral Learning

While the application of LCSs to data mining problems will certainly still produce many further impressive results and promises to yield novel, deep insights into data structures, LCSs were originally designed as cognitive systems. Thus, in the following we will focus on LCSs as cognitive systems, their structures, and their potential as neural cognitive models. As sketched out above, the XCS classifier system in particular was compared with Q-learning in RL. We start from this perspective and detail various successful applications of XCS in reinforcement learning problems. Next, ALCSs are surveyed. ALCSs learn generalized cognitive maps that are suitable to apply Sutton's Dyna algorithm and value iteration techniques in general. A strong relation to factored RL techniques was pointed out recently in this respect [47.67]. Finally, robotics applications of such LCSs are discussed and their potential is revealed.

47.5.1 Reward-Based Learning with LCSs

From the beginning [47.2], a big appeal of LCSs lay in the fact that they are designed for reward-based learning. Once the original bucket-brigade algorithm was replaced by Q-learning techniques, the theory developed in the RL community also applied to LCSs to a certain extent.

In XCS, in particular, it was shown that the system approximates the Q-value function by a collection of classifiers. The prediction array calculation (47.1) essentially approximates the current Q-value estimates for the current state in the environment. The fitness weighting based on the relative accuracies, which are normalized to one, assures that these Q-value estimates on average do not over- or underestimate the expected Q-value. Moreover, since Q-learning is an off-policy learning technique, XCS is well suited to be combined with it, because XCS also benefits from exploring all possible state–action combinations in the long run – striving to develop an approximation of the complete Q-value function in the problem space.

As a result, XCS has been successfully applied to learning optimal paths in various maze environments. Starting from the Woods1 and Woods2 environments proposed by Wilson [47.20, 21], XCS's performance and generalization capabilities have been investigated in various mazes [47.68]. For illustrative purposes, two such mazes are shown in Fig. 47.6. These maze environments provide information about the surrounding grid cells, indicating whether they are either free or occupied by an obstacle or by food. Reaching the latter cell usually results in a reward trigger. Movements are typically possible to the eight surrounding cells, yielding a rather large action space. The point of providing sensory state information rather than cell IDs or coordinates is that XCS is then able to exhibit its generalization capabilities. It essentially manages to generalize over the sensory state space, ignoring irrelevant bits and generalizing over the states with respect to state–action combinations that yield the same reward.

Fig. 47.6a,b Two highly typical maze environments, Woods1 (a) and Maze6 (b), used as benchmarks in the LCS literature for generalized reinforcement learning. Woods1 is a toroidal maze; in Maze6 the food location is much harder to find. In both cases, the LCS-controlled agent perceives information about the eight neighboring cells, encoding free, blocked, and food cells by means of two bits. The agent can execute movements to each of these cells. Movements to blocked cells yield no reward; a movement to the food cell triggers reward and a reset of the agent.

To successfully apply XCS in these scenarios, one crucial modification was necessary to stabilize the Q-values and thus the derived fitness values: the classifier reward prediction updates had to be further modified by an error gradient factor, converting (47.3) to

R ← R + β (r + γ max_A P(A) − R) · f_cl / Σ_{cl′∈[A]} f_cl′ .   (47.9)

The exact derivation of this equation can be found in the literature [47.25, 69]. The gradient term essentially results in much more stable performance and successful learning and generalization in problems that require the establishment of long reward chains. It stabilizes the reward learning by down-scaling the reward updates of inaccurate and unreliable classifiers. Consequently, these rules do not tend to over-estimate reward, and thus learning progress is stabilized. As a further consequence, gradient-based reward prediction updating was also successfully applied to blocks world problems, in which even more generalizations are possible.

Performance in many of these environments has revealed extreme generalization capabilities. For example, in the Maze6 environment (Fig. 47.6b), XCS learned the optimal Q-value function in a problem space that contained a very large number of potential sensory state encodings. Also, irrelevant bits were introduced, which changed randomly while interacting with the environment. While learning was slightly delayed and a larger population size was needed for successful learning, the optimal Q-value function was still extracted from the iterative interactions [47.25]. In [47.36], the encoding for each bit was changed to a nested Boolean function, such as the parity function, and rather noisy action outcomes did not preclude learning success. Later, it was shown that highly effective generalizations are even possible when each bit in the sensory encoding is relevant [47.30]: XCS was still able to learn the optimal Q-value function, while Q-learning without generalization failed miserably due to the large state space. Thus, XCS is able to identify those aspects of the available sensory information that are relevant for accurate reward predictions.

The generalization capabilities of XCS reached even as far as simple light-following behavior being successfully applied on a real robot control platform [47.74]. In this case, however, reward learning was maximized and no complete Q-value function approximation was developed. Nonetheless, this work constituted one of the first successful applications in the robotics domain.

47.5.2 Anticipatory Learning Classifier Systems

Anticipatory learning classifier systems (ALCSs) are typical Michigan-style learning systems that learn a generalized predictive model, or cognitive map [47.72], of the encountered environment online. In contrast to the usual classifier structure, classifiers in ALCSs have a state prediction or anticipatory part that predicts the environmental changes caused when executing the specified action in the specified context. As in XCS, ALCSs derive classifier fitness estimates from the accuracy of their predictions; however, the accuracy of the anticipatory state predictions is considered, rather than the accuracy of the reward prediction. Figure 47.7 illustrates the typical structures and learning processes that apply in an ALCS architecture. Rick Riolo originally proposed an ALCS [47.71] that generated its cognitive map mediated by a message list storage system, which was also used in Holland's original classifier system architecture. However, this approach appeared not to be sufficiently elegant to enable any serious learning. Starting with Stolzmann's anticipatory classifier system [47.73], various further Michigan-style ALCS architectures were developed and applied for behavioral learning and also for learning cognitive maps. Such systems are surveyed in the following.

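The gradient-modified update (47.9) from Sect. 47.5.1 can be sketched as follows; the dictionary-based classifier representation and the parameter values are illustrative only:

```python
def update_reward_predictions(action_set, reward, max_next_prediction,
                              beta=0.2, gamma=0.95):
    """Q-learning-style update of the reward predictions R of all classifiers
    in [A], scaled by each classifier's relative fitness F (the gradient term)."""
    target = reward + gamma * max_next_prediction
    f_sum = sum(cl["F"] for cl in action_set)
    for cl in action_set:
        cl["R"] += beta * (target - cl["R"]) * cl["F"] / f_sum

# An accurate (high-fitness) rule moves much faster toward the Q-target than
# an unreliable one, which stabilizes learning along long reward chains.
a = [{"R": 0.0, "F": 0.9}, {"R": 0.0, "F": 0.1}]
update_reward_predictions(a, reward=1.0, max_next_prediction=0.0)
# a[0]["R"] is now about 0.18, a[1]["R"] only about 0.02
```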
974 Part E | 47.5 Learning Classifier Systems 47.5 Behavioral Learning 975

Fig. 47.7 Instead of the condition–action–reward prediction rules in typical LCSs, ALCSs encode and develop condition–action–effect rules. Typically, the structural optimization of these rules is done by a combination of evolutionary and heuristic techniques.

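The condition–action–effect semantics shown in Fig. 47.7 can be illustrated with ternary strings: in the condition part, '#' denotes don't care; in the effect part, it denotes that the attribute remains unchanged. This is a toy illustration of the rule semantics only, not a complete ALCS:

```python
def matches(condition, state):
    """A condition matches if every non-'#' symbol equals the state attribute."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def anticipate(effect, state):
    """Build the anticipated next state: '#' passes the attribute through,
    any other symbol replaces it."""
    return ''.join(s if e == '#' else e for e, s in zip(effect, state))

rule = {"condition": "0#1", "action": "move", "effect": "1#0"}
state = "011"
if matches(rule["condition"], state):
    next_state = anticipate(rule["effect"], state)  # '110'
```

Fitness then derives from how often such anticipations prove correct, rather than from the accuracy of a reward prediction.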
Problem instance Action Reinforcement Next problem (state information) (classification) feedback instance Environment (problem) various ALCSs [47.75–79]. To prevent the development optimal behavior control. Moreover, the reward-based of overgeneral models for concurrent reward learning, generalization mechanism in XACS is directly based the reward learning process was often decoupled, yield- on the XCS classifier system, thus enabling the in- ing a system that learns a cognitive map based on corporation of any tools and representations developed LCS principles, and, additionally, a state value esti- for XCS so far. The generalizations that were de- mation system. In combination, DYNA-based learning veloped confirmed the identification of task-relevant techniques [47.80] were applied to improve the state perceptual attributes. In the XCS components, reward- value estimations also offline. These techniques al- distinguishing attributes were identified. In the ACS2 lowed the simulation of animal-like behavioral patterns, component, on the other hand, state prediction-relevant such as reward adaptations based on knowledge about components were detected. In consequence, general- the behavioral consequences in rats in a T-maze envi- ized detectors for prediction with respect to reward ronment [47.81], as well as in controlled devaluation and state could be distinguished. The implementa- or satiation experiments [47.82]. In these studies it tion of other anticipatory mechanisms in XACS,such was also pointed out that ALCSs do not only allow as task-dependent attentional mechanisms, further in- DYNA-based reward learning updates, but also en- teractions of the learning components, and multiple able the application of search and planning techniques behavioral modules for the representation of multiple for improving behavioral performance of the system. 
Even curiosity mechanisms have been added [47.83] to speed up the learning progress. Most approaches, however, never generalized the list of states with associated rewards. Motivations (or needs) [47.84] are still open issues in the LCS realm. Further research with ALCSs is expected to yield highly promising, cognitive learning architectures.

The combination of the ACS2 system with the XCS system for state-value estimations, terming the resulting system XACS (x-anticipatory classifier system), may be the one with the most current potential for future research [47.84]. XACS essentially applies two LCS learning mechanisms: one being an ALCS architecture in the form of ACS2, which learns a cognitive model of the encountered environment, and the other one being the XCS system, which learns state-value estimations in this case. Figure 47.8 illustrates the components in the XACS architecture and their interactions. XACS has been shown to develop optimal behavior in blocks world problems in which other approaches failed to yield proper generalizations.

Fig. 47.8 The XACS system combines the model learning capabilities of the ALCS ACS2 with the generalizing reinforcement learning capabilities of XCS. The two system components are targeted towards a compact representation for accurate predictions and for reward predictions, respectively. The combined system enables the application of lookahead planning and search techniques for behavioral control as well as of reinforcement learning techniques and combinations thereof

47.5.3 Controlling a Robot Arm with an LCS

We end this section on behavioral learning with the XCSF system. Over the last decade or so it has become increasingly clear that XCS is extremely well suited to partition a contextual space for the generation of accurate predictions. Predictions, however, do not necessarily need to be reward predictions. Behavioral consequences serve just as well as a target for predictions. The forward kinematics mapping in the robotics domain [47.85] offers yet another potential target for learning.

Consequently, XCSF was modified to learn the forward velocity kinematics of a robotic arm in simulation [47.51]. To do so, XCSF projects its condition part into the joint angle space of the robotic arm. However, its locally linear predictions receive as input small joint movements, that is, changes in the joint space, and predict the consequent changes in the task space, that is, changes in the end-effector location. This mapping has the great advantage that it is locally linear, so that, given a current joint angle constellation of the arm, not only can location changes be predicted from joint angle movements, but directional motion of the end-effector can also be invoked by inverting the forward velocity mappings. Seeing that those mappings are locally rather linear, the inversion can be rather easily done using linear algebra techniques. Given a redundant arm system – one that has more degrees of freedom (i.e., joint angles to manipulate) than actual locations to move to – it is possible to add additional constraints to the motion. For example, the arm can be driven to maintain a certain arm posture while pursuing a certain goal [47.86], or it may be forced to prevent moving certain joints into their joint angle limits.

These results confirmed that XCSF was able to learn to control all seven degrees of freedom of a humanoid arm effectively – flexibly adhering to highly different constraints while pursuing certain goal locations. Moreover, different mappings could be learned in different reference frame representations. For example, end-effector locations were either represented in a Cartesian coordinate system or in a distance plus angles encoding. The different encodings yielded different classifier structures due to the differences in the linearities encountered. Nonetheless, XCSF yielded equally good arm control in both cases [47.51]. Recent advancements in the exploration strategy, which can be self-induced by the controller during learning, have further improved the learning progress [47.87]. Thus, XCSF may very well be further developed into a cognitive system architecture for behavioral control. While this type of architecture was probably not the one envisioned by Holland originally, it may still prove highly valuable. Figure 47.9 illustrates the XCSF setup for arm control.
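The inversion of a locally linear forward-velocity mapping can be made concrete with a small sketch. The code below is not from the chapter: it uses an analytic Jacobian of a hypothetical planar three-link arm in place of XCSF's learned piecewise-linear approximations, and the names (`jacobian`, `inverse_velocity`) and geometry are illustrative. It shows how, once a locally linear mapping `x_dot = J @ q_dot` is available, the pseudoinverse plus a null-space projection for the redundant degrees of freedom yields directional end-effector control under an additional posture constraint:

```python
import numpy as np

def forward_kinematics(q, lengths):
    """End-effector (x, y) of a planar arm with joint angles q."""
    a = np.cumsum(q)  # absolute link orientations
    return np.array([np.sum(lengths * np.cos(a)),
                     np.sum(lengths * np.sin(a))])

def jacobian(q, lengths):
    """Locally linear forward-velocity mapping: x_dot = J @ q_dot."""
    a = np.cumsum(q)
    lx, ly = lengths * np.cos(a), lengths * np.sin(a)
    # Joint i moves all links i..n-1, so its column sums their extents.
    return np.array([[-ly[i:].sum() for i in range(len(q))],
                     [ lx[i:].sum() for i in range(len(q))]])

def inverse_velocity(J, x_dot, q_dot_secondary=None):
    """Invert the velocity mapping; optionally add a secondary constraint.

    For a redundant arm (more joints than task dimensions), any joint
    velocity projected into the null space of J (e.g., one that pulls the
    arm toward a preferred posture) does not disturb the end-effector motion.
    """
    J_pinv = np.linalg.pinv(J)
    q_dot = J_pinv @ np.asarray(x_dot, dtype=float)
    if q_dot_secondary is not None:
        null_proj = np.eye(J.shape[1]) - J_pinv @ J
        q_dot = q_dot + null_proj @ q_dot_secondary
    return q_dot
```

For a three-link arm, `inverse_velocity(jacobian(q, lengths), [0.05, 0.0], -0.1 * q)` yields joint velocities that move the end-effector in the x direction while simultaneously nudging the joints toward zero angles – the two tasks do not interfere because the posture term lives in the null space of `J`.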


Fig. 47.9 In the published robot arm control applications, XCSF clusters the contextual configuration state of the arm and learns linear approximations of the average Jacobian in the respective subspaces. In consequence, the system can generate both forward predictions of movement consequences as well as inverse control commands when directional movements of the arm are desired
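Figure 47.9 mentions that each classifier's linear approximation is maintained by a recursive least squares (RLS) update. As a rough sketch of that update rule – the class name and parameters below are illustrative, not XCSF's actual implementation – each sample (x, y) adjusts the weight vector via a gain computed from an inverse input-correlation matrix P, so no batch regression over stored samples is ever needed:

```python
import numpy as np

class RLSPredictor:
    """Recursive least squares for one classifier's locally linear prediction.

    Maintains weights w and the inverse input-correlation matrix P, so each
    incoming sample (x, y) refines the model y ~ w @ x in O(d^2) time.
    """
    def __init__(self, dim, delta=1000.0, lam=1.0):
        self.w = np.zeros(dim)
        self.P = np.eye(dim) * delta   # large initial P = weak prior
        self.lam = lam                 # forgetting factor (1.0 = none)

    def predict(self, x):
        return self.w @ x

    def update(self, x, y):
        Px = self.P @ x
        g = Px / (self.lam + x @ Px)            # gain vector
        self.w = self.w + g * (y - self.w @ x)  # correct toward the sample
        self.P = (self.P - np.outer(g, Px)) / self.lam
```

Appending a constant 1 to the input vector adds an offset term, so the predictor can represent affine rather than purely linear local models – the form needed for local Jacobian approximations around a working point.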

Neuroscientific evidence points out that similar forward-inverse predictive-control structures may be found in the cerebellum [47.88, 89]. Only more detailed knowledge on cortical and cerebellar structures may allow the direct comparison of the shapes and orientations of the receptive fields developed by the XCSF system and potential cortical and neural structures found in the brain. While the brain may not implement actual evolutionary techniques literally, as XCSF does, it appears plausible that local competitions take place [47.90]. Moreover, it is known that neurons populate novel information sources once available – as XCSF does. Further research in neural computation with LCSs may prove highly valuable.

47.6 Conclusions

While LCSs have been applied to a wide variety of problems, still there are many potential developments that have not been further evaluated. In the following, potential future research directions are summarized.

At the moment nearly all LCSs are flat in that they develop one population of classifiers (or competing sets of classifiers in the Pitt-style system). All of the classifiers, however, apply to the same problem granularity. Ever since the introduction of LCSs by Holland, the development of default hierarchies was envisioned. However, so far it was never convincingly or rigorously accomplished [47.19]. Default hierarchies refer to classifier systems in which general rules predict one thing but more specialized rules predict exceptions of the general rule. The emergent development of default hierarchies in LCSs remains an open challenge.

With the most recent understanding of LCSs and the XCS system in particular, it seems that at least the development of a hierarchically-structured LCS architecture is within our grasp. We expect such a hierarchical LCS to progressively refine its predictions in a hierarchical way. Default rules may gain a certain level of accuracy, but more specialized rules may identify exceptions of the default prediction. Alternatively, the more specialized rules may also simply add further accuracy to the default predictions where and when necessary. In the latter case, a hierarchical predictive system may develop that allows the progressive refinement of activated predictions until the finest prediction granularity in the hierarchical representation is reached.

When developing hierarchical LCSs, also network LCSs seem to be of vital importance. For example, when developing classifier structures in spatial domains that are intricately structured, a network structure may provide additional hints on the connectivity of the space. Especially in the case where XCSF learns velocity kinematics, or generally, contextually-dependent sensorimotor contingencies – as sketched out in the section on controlling a robot arm with an LCS above – a network structure can give additional hints on how the sensorimotor space is structured and may be traversed. Networks of classifiers may also allow the application of lookahead planning and goal-oriented control – as was pursued in work on corporate classifier systems and on lookahead learning in simple LCSs [47.91, 101].

A network structure may also enable the speed-up of the matching process. For example, when a problem space is sampled by means of a random walk process, overlapping classifiers may be identified directly within a classifier network structure instead of applying a global matching process in each iteration – which may improve the efficiency of the system tremendously. Furthermore, given a hierarchically structured network LCS, system matching may proceed from coarse-to-fine-grained levels. All these processes may speed up the matching, which is often considered a bottleneck in LCSs and which has been improved by means of numerous approaches over the recent years [47.92, 93].

Besides these additions, also ALCSs and the XACS system may be pursued further, as sketched out above. From a cognitive modeling perspective, ALCSs essentially learn generalized schemata or production rules [47.94–96], that is, classifiers that specify the expected state changes perceived after the execution of the specified action. Such rules may also be applied by the cognitive science community, for example, for learning ACT-R structures [47.97]. The lookahead planning capabilities, the sensorimotor generalization capabilities, as well as the abstraction capabilities of these systems still ask for further development. ALCS-based cognitive map or concept learning combined with XCS-based reward learning promises further research advancements. Also, the combination of ALCSs with factored reinforcement learning [47.67] should be further pursued.

Even without the addition of hierarchies, network structures, or anticipations, however, LCS techniques – such as GALE and GAssist on the Pitt side and XCS, XCSF, and XACS on the Michigan side – can be further exploited and combined with novel representations. Seeing that LCSs are highly flexible, it is possible to substitute the condition with any other form of a structure, as long as this structure can be mutated and recombined in a way that small structural changes also yield small changes in the defined subspace within which the condition matches. Similarly, the prediction structure can be replaced with any other prediction structure that can be quickly and accurately adapted by suitable learning techniques. Thus, the available LCSs can be successfully applied to various domains, including reinforcement learning problems, classification and data mining problems, and regression problems. XCS, in particular, learns iteratively online, striving for the development of a maximally compact, maximally general, maximally accurate problem solution. Pitt-style systems typically learn offline and are thus most promising in large-scale data mining tasks in which rather small, compact sets of rules are searched for. Learning promises to be robust due to the combination of a flexible evolutionary component, which searches for optimal rule structures, and gradient-based estimation techniques, which quickly yield useful predictions and fitness estimations. It seems only a matter of time until LCSs gain even more recognition and are successfully applied to even more diverse problem domains and challenging research tasks.

47.7 Books and Source Code

Further information about learning classifier systems can be found in the biannually published proceedings of the IWLCS (International Workshop on Learning Classifier Systems) and in the yearly workshops on the topic. A book on the foundations of LCSs [47.99] as well as one on successful applications of LCSs [47.100] give broad overviews, and a survey is available in [47.98]. One book in particular covers the XCS classifier system – from a theoretical and application-oriented point of view – and also provides a detailed algorithmic description of the system [47.25]. A more theoretical coverage of the approximation approach in LCSs can be found in [47.23]. Source code can be found online, for example, for the XCS system in C [47.102] as well as for XCSF in Java [47.103].


References

47.1 J.H. Holland: Adaptation in Natural and Artificial Systems (Univ. of Michigan, Ann Arbor 1975)
47.2 J.H. Holland: Adaptation. In: Progress in Theoretical Biology, Vol. 4, ed. by R. Rosen, F.M. Snell (Academic, New York 1976) pp. 263–293
47.3 L.B. Booker, D.E. Goldberg, J.H. Holland: Classifier systems and genetic algorithms, Artif. Intell. 40, 235–282 (1989)
47.4 J.H. Holland, J.S. Reitman: Cognitive systems based on adaptive algorithms. In: Pattern Directed Inference Systems, ed. by D.A. Waterman, F. Hayes-Roth (Academic, New York 1978) pp. 313–329
47.5 L.P. Kaelbling, M.L. Littman, A.W. Moore: Reinforcement learning: A survey, J. Artif. Intell. Res. 4, 237–285 (1996)
47.6 R.S. Sutton, A.G. Barto: Reinforcement Learning: An Introduction (MIT Press, Cambridge 1998)
47.7 J.H. Holland: Properties of the bucket brigade algorithm, Proc. Int. Conf. Genet. Algorithms Appl. (1985) pp. 1–7
47.8 D.E. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, Reading 1989)
47.9 S.F. Smith: A learning system based on genetic adaptive algorithms, Ph.D. Thesis (Univ. of Pittsburgh, Pittsburgh 1980)
47.10 K.A. De Jong: An analysis of the behavior of a class of genetic adaptive systems, Ph.D. Thesis (Univ. of Michigan, Ann Arbor 1975)
47.11 L.B. Booker: Intelligent behavior as an adaptation to the task environment, Ph.D. Thesis (The Univ. of Michigan, Ann Arbor 1982)
47.12 S.W. Wilson: Knowledge growth in an artificial animal, Proc. Int. Conf. Genet. Algorithms Appl. (1985) pp. 16–23
47.13 S.W. Wilson: Classifier systems and the animat problem, Mach. Learn. 2, 199–228 (1987)
47.14 D.E. Goldberg: Computer-aided gas pipeline operation using genetic algorithms and rule learning, Diss. Abstr. Int. 44, 3174B (1983)
47.15 K.A. De Jong: Learning with genetic algorithms: An overview, Mach. Learn. 3, 121–138 (1988)
47.16 K.A. De Jong, W.M. Spears, D.F. Gordon: Using genetic algorithms for concept learning, Mach. Learn. 13, 161–188 (1993)
47.17 R.L. Riolo: Bucket brigade performance: I. Long sequences of classifiers, Proc. 2nd Int. Conf. Genet. Algorithms (ICGA87), ed. by J.J. Grefenstette (Lawrence Erlbaum Associates, Cambridge 1987) pp. 184–195
47.18 R.E. Smith, H. Brown Cribbs: Is a learning classifier system a type of neural network?, Evol. Comput. 2, 19–36 (1994)
47.19 J.H. Holland, L.B. Booker, M. Colombetti, M. Dorigo, D.E. Goldberg, S. Forrest, R.L. Riolo, R.E. Smith, P.L. Lanzi, W. Stolzmann, S.W. Wilson: What is a learning classifier system?, Lect. Notes Comput. Sci. 1813, 3–6 (2000)
47.20 S.W. Wilson: ZCS: A zeroth level classifier system, Evol. Comput. 2, 1–18 (1994)
47.21 S.W. Wilson: Classifier fitness based on accuracy, Evol. Comput. 3, 149–175 (1995)
47.22 C.J.C.H. Watkins: Learning from delayed rewards, Ph.D. Thesis (King's College, Cambridge 1989)
47.23 J. Drugowitsch: Design and Analysis of Learning Classifier Systems: A Probabilistic Approach, Studies in Computational Intelligence (Springer, Berlin, Heidelberg 2008)
47.24 J. Drugowitsch, A. Barry: A formal framework and extensions for function approximation in learning classifier systems, Mach. Learn. 70, 45–88 (2008)
47.25 M.V. Butz: Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design (Springer, Berlin, Heidelberg 2006)
47.26 B. Widrow, M. Hoff: Adaptive switching circuits, West. Electron. Show Conv. 4, 96–104 (1960)
47.27 P.-Y. Oudeyer, F. Kaplan, V.V. Hafner: Intrinsic motivation systems for autonomous mental development, IEEE Trans. Evol. Comput. 11, 265–286 (2007)
47.28 G. Venturini: Adaptation in dynamic environments through a minimal probability of exploration, from animals to animats 3, Proc. 3rd Int. Conf. Simul. Adapt. Behav. (1994) pp. 371–381
47.29 A. Orriols-Puig, E. Bernadó-Mansilla, D.E. Goldberg, K. Sastry, P.L. Lanzi: Facetwise analysis of XCS for problems with class imbalances, IEEE Trans. Evol. Comput. 13, 1093–1119 (2009)
47.30 M.V. Butz, K. Sastry, D.E. Goldberg: Strong, stable, and reliable fitness pressure in XCS due to tournament selection, Genet. Program. Evol. Mach. 6, 53–77 (2005)
47.31 M.V. Butz, M. Pelikan, X. Llorà, D.E. Goldberg: Automated global structure extraction for effective local building block processing in XCS, Evol. Comput. 14, 345–380 (2006)
47.32 S.W. Wilson: Generalization in the XCS classifier system, Genetic Programming 1998, Proc. 3rd Ann. Conf. (1998) pp. 665–674
47.33 M.V. Butz, T. Kovacs, P.L. Lanzi, S.W. Wilson: Toward a theory of generalization and learning in XCS, IEEE Trans. Evol. Comput. 8, 28–46 (2004)
47.34 T. Kovacs: XCS classifier system reliably evolves accurate, complete, and minimal representations for Boolean functions. In: Soft Computing in Engineering Design and Manufacturing, ed. by R. Roy, P.K. Chawdhry, R.K. Pant (Springer, Berlin, Heidelberg 1997) pp. 59–68
47.35 P.O. Stalph, X. Llorà, D.E. Goldberg, M.V. Butz: Resource management and scalability of the XCSF learning classifier system, Theor. Comput. Sci. 425, 126–141 (2012)
47.36 M.V. Butz, P.L. Lanzi: Sequential problems that test generalization in learning classifier systems, Evol. Intel. 2, 141–147 (2009)
47.37 L. Bull, E. Bernadó-Mansilla, J. Holmes (Eds.): Learning Classifier Systems in Data Mining, Studies in Computational Intelligence, Vol. 125 (Springer, Berlin, Heidelberg 2008)
47.38 B. Schölkopf, A.J. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge 2001)
47.39 W. Liu, J.C. Principe, S. Haykin: Kernel Adaptive Filtering: A Comprehensive Introduction (Wiley, Hoboken 2010)
47.40 M.V. Butz, S.W. Wilson: An algorithmic description of XCS, Soft Comput. 6, 144–153 (2002)
47.41 S.W. Wilson: Get real! XCS with continuous-valued inputs. In: Festschrift in Honor of John H. Holland, ed. by L. Booker, S. Forrest, M. Mitchell, R.L. Riolo (Center for the Study of Complex Systems, Ann Arbor 1999) pp. 111–121
47.42 S.W. Wilson: Get real! XCS with continuous-valued inputs, Lect. Notes Comput. Sci. 1813, 209–219 (2000)
47.43 S.W. Wilson: Classifiers that approximate functions, Nat. Comput. 1, 211–234 (2002)
47.44 S. Haykin: Adaptive Filter Theory, 4th edn. (Prentice Hall, Upper Saddle River 2002)
47.45 S. Vijayakumar, A. D'Souza, S. Schaal: Incremental online learning in high dimensions, Neural Comput. 17, 2602–2634 (2005)
47.46 P. Stalph, J. Rubinsztajn, O. Sigaud, M.V. Butz: Function approximation with LWPR and XCSF: A comparative study, Evol. Intel. 5, 103–116 (2012)
47.47 M.V. Butz: Kernel-based, ellipsoidal conditions in the real-valued XCS classifier system, Proc. Genet. Evol. Comput. Conf. (GECCO 2005) (2005) pp. 1835–1842
47.48 C. Stone, L. Bull: For real! XCS with continuous-valued inputs, Evol. Comput. 11, 299–336 (2003)
47.49 M.V. Butz, P.L. Lanzi, S.W. Wilson: Function approximation with XCS: Hyperellipsoidal conditions, recursive least squares, and compaction, IEEE Trans. Evol. Comput. 12, 355–376 (2008)
47.50 D. Loiacono, P.L. Lanzi: Recursive least squares and quadratic prediction in continuous multistep problems, Lect. Notes Comput. Sci. 6471, 70–86 (2010)
47.51 P.O. Stalph, M.V. Butz: Learning local linear Jacobians for flexible and adaptive robot arm control, Genet. Program. Evol. Mach. 13, 137–157 (2012)
47.52 S. Schaal, C.G. Atkeson: Constructive incremental learning from only local information, Neural Comput. 10, 2047–2084 (1998)
47.53 E. Bernadó-Mansilla, J.M. Garrell-Guiu: Accuracy-based learning classifier systems: Models, analysis, and applications to classification tasks, Evol. Comput. 11, 209–238 (2003)
47.54 K. Bache, M. Lichman: UCI Machine Learning Repository, http://archive.ics.uci.edu/ml (Univ. of California, School of Information and Computer Sciences 2013)
47.55 I.H. Witten, E. Frank: Data Mining. Practical Machine Learning Tools and Techniques with Java Implementations, 1st edn. (Morgan Kaufmann, San Francisco 2000)
47.56 X. Llorà, J.M. Garrell: Knowledge independent data mining with fine-grained parallel evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO 2001) (2001) pp. 461–468
47.57 X. Llorà, J.M. Garrell: Inducing partially-defined instances with evolutionary algorithms, Proc. 18th Int. Conf. Mach. Learn. (ICML 2001) (2001)
47.58 E. Bernadó, X. Llorà, J.M. Garrell: XCS and GALE: A comparative study of two learning classifier systems and six other learning algorithms on classification tasks, Lect. Notes Comput. Sci. 2321, 115–132 (2002)
47.59 J. Bacardit, J.M. Garrell: Evolving multiple discretizations with adaptive intervals for a Pittsburgh rule-based learning classifier system, Lect. Notes Comput. Sci. 2724, 1818–1831 (2003)
47.60 J. Bacardit, M.V. Butz: Data mining in learning classifier systems: Comparing XCS with GAssist, Lect. Notes Comput. Sci. 4399, 282–290 (2007)
47.61 J. Bacardit, M.V. Butz: Data Mining in Learning Classifier Systems: Comparing XCS with GAssist (IlliGAL, Univ. of Illinois at Urbana-Champaign 2004)
47.62 J. Bacardit, N. Krasnogor: Empirical evaluation of ensemble techniques for a Pittsburgh learning classifier system, Lect. Notes Comput. Sci. 4998, 255–268 (2008)
47.63 J. Bacardit, N. Krasnogor: Performance and efficiency of memetic Pittsburgh learning classifier systems, Evol. Comput. 17, 307–342 (2009)
47.64 K. Sastry, D.E. Goldberg, X. Llorà: Towards billion-bit optimization via a parallel estimation of distribution algorithm, Proc. Genet. Evol. Comput. Conf. (GECCO 2007) (2007) pp. 577–584
47.65 J. Bacardit, E. Burke, N. Krasnogor: Improving the scalability of rule-based evolutionary learning, Memet. Comput. 1, 55–67 (2009)
47.66 X. Llorà, K. Sastry, T.-L. Yu, D.E. Goldberg: Do not match, inherit: Fitness surrogates for genetics-based machine learning techniques, Proc. Genet. Evol. Comput. Conf. (GECCO 2007) (2007) pp. 1798–1805
47.67 O. Sigaud, M.V. Butz, O. Kozlova, C. Meyer: Anticipatory learning classifier systems and factored reinforcement learning (Springer, Berlin, Heidelberg 2009) pp. 321–333
47.68 P.L. Lanzi: An analysis of generalization in the XCS classifier system, Evol. Comput. 7, 125–149 (1999)
47.69 M.V. Butz, D.E. Goldberg, P.L. Lanzi: Gradient descent methods in learning classifier systems: Improving XCS performance in multistep problems, IEEE Trans. Evol. Comput. 9, 452–473 (2005)



47.70 J. Hurst, L. Bull: Self-adaptation in classifier system controllers, Artif. Life Robot. 5, 109–119 (2001)
47.71 J. Hurst, L. Bull: A neural learning classifier system with self-adaptive constructivism for mobile robot learning, Artif. Life 12, 1–28 (2006)
47.72 E.C. Tolman: Cognitive maps in rats and men, Psychol. Rev. 55, 189–208 (1948)
47.73 R.L. Riolo: Lookahead planning and latent learning in a classifier system, from animals to animats, Proc. 1st Int. Conf. Simul. Adapt. Behav. (1991) pp. 316–326
47.74 W. Stolzmann: Anticipatory classifier systems, Genetic Programming 1998, Proc. 3rd Ann. Conf. (1998) pp. 658–664
47.75 M.V. Butz: Anticipatory Learning Classifier Systems (Kluwer, Boston 2002)
47.76 M.V. Butz, D.E. Goldberg, W. Stolzmann: The anticipatory classifier system and genetic generalization, Nat. Comput. 1, 427–467 (2002)
47.77 P. Gérard, O. Sigaud: YACS: Combining dynamic programming with generalization in classifier systems, Lect. Notes Comput. Sci. 1996, 52–69 (2001)
47.78 P. Gérard, J.-A. Meyer, O. Sigaud: Combining latent learning and dynamic programming in MACS, Eur. J. Oper. Res. 160, 614–637 (2005)
47.79 W. Stolzmann, M.V. Butz: Latent learning and action planning in robots with anticipatory classifier systems, Lect. Notes Comput. Sci. 1813, 301–317 (2000)
47.80 R.S. Sutton: DYNA: An integrated architecture for learning, planning, and reacting, ACM SIGART Bull. 2(4), 160–163 (1991)
47.81 W. Stolzmann, M.V. Butz, J. Hoffmann, D.E. Goldberg: First cognitive capabilities in the anticipatory classifier system, from animals to animats 6, Proc. 6th Int. Conf. Simul. Adapt. Behav. (2000) pp. 287–296
47.82 M.V. Butz, J. Hoffmann: Anticipations control behavior: Animal behavior in an anticipatory learning classifier system, Adapt. Behav. 10, 75–96 (2002)
47.83 M.V. Butz: Biasing exploration in an anticipatory learning classifier system, Lect. Notes Comput. Sci. 2321, 3–22 (2002)
47.84 M.V. Butz, D.E. Goldberg: Generalized state values in an anticipatory learning classifier system. In: Anticipatory Behavior in Adaptive Learning Systems: Foundations, Theories, and Systems, ed. by M.V. Butz, O. Sigaud, P. Gérard (Springer, Berlin, Heidelberg 2003) pp. 282–301
47.85 B. Siciliano, O. Khatib: Springer Handbook of Robotics (Springer, Berlin, Heidelberg 2007)
47.86 M.V. Butz, O. Herbort: Context-dependent predictions and cognitive arm control with XCSF, Proc. Genet. Evol. Comput. Conf. (GECCO 2008) (2008) pp. 1357–1364
47.87 M.V. Butz, G.K.M. Pedersen, P.O. Stalph: Learning sensorimotor control structures with XCSF: Redundancy exploitation and dynamic control, Proc. Genet. Evol. Comput. Conf. (GECCO 2009) (2009) pp. 1171–1178
47.88 D.M. Wolpert, R.C. Miall, M. Kawato: Internal models in the cerebellum, Trends Cogn. Sci. 2, 338–347 (1998)
47.89 J.G. Fleischer: Neural correlates of anticipation in cerebellum, basal ganglia, and hippocampus, Lect. Notes Comput. Sci. 4520, 19–34 (2007)
47.90 C.T. Fernando, E. Szathmary, P. Husbands: Selectionist and evolutionary approaches to brain function: A critical appraisal, Front. Comput. Neurosci. 6, doi: 10.3389/fncom.2012.00024 (2012)
47.91 A. Tomlinson, L. Bull: A corporate XCS, Lect. Notes Comput. Sci. 1813, 195–208 (2000)
47.92 X. Llorà, K. Sastry: Fast rule matching for learning classifier systems via vector instructions, Proc. Genet. Evol. Comput. Conf. (GECCO 2006) (2006) pp. 1513–1520
47.93 M.V. Butz, P.L. Lanzi, X. Llorà, D. Loiacono: An analysis of matching in learning classifier systems, Proc. Genet. Evol. Comput. Conf. (GECCO 2008) (2008) pp. 1349–1356
47.94 J.R. Anderson: Rules of the Mind (Lawrence Erlbaum Associates, Hillsdale 1993)
47.95 G.L. Drescher: Made-Up Minds: A Constructivist Approach to Artificial Intelligence (MIT Press, Cambridge 1991)
47.96 A. Newell: Physical symbol systems, Cogn. Sci. 4, 135–183 (1980)
47.97 J.R. Anderson, D. Bothell, M.D. Byrne, S. Douglass, C. Lebiere, Y. Qin: An integrated theory of the mind, Psychol. Rev. 111, 1036–1060 (2004)
47.98 O. Sigaud, S. Wilson: Learning classifier systems: A survey, Soft Computing – A Fusion of Foundations, Methodologies and Applications 11, 1065–1078 (2007)
47.99 L. Bull, T. Kovacs (Eds.): Foundations of Learning Classifier Systems, Stud. Fuzziness Soft Comput., Vol. 183 (Springer, Berlin, Heidelberg 2005)
47.100 L. Bull (Ed.): Applications of Learning Classifier Systems (Springer, Berlin, Heidelberg 2004)
47.101 L. Bull: On lookahead and latent learning in simple LCS. In: Learning Classifier Systems, Int. Workshops IWLCS 2006-2007, ed. by J. Bacardit, E. Bernadó-Mansilla, M.V. Butz (Springer, Berlin, Heidelberg 2008) pp. 154–168
47.102 P.L. Lanzi: xcslib – The XCS Library, http://xcslib.sourceforge.net/
47.103 P.O. Stalph, M.V. Butz: Documentation of JavaXCSF (COBOSLAB, University of Würzburg, Germany, Y2009N001 2009)