Proceedings of the Twenty-Second International Joint Conference on

Constituent Grammatical Evolution

Loukas Georgiou and William J. Teahan School of Computer Science, Bangor University Bangor, Wales [email protected] and [email protected]

Abstract Grammatical Evolution’s unique features compared to other evolutionary algorithms are the degenerate genetic code We present Constituent Grammatical Evolution which facilitates the occurrence of neutral mutations (vari- (CGE), a new evolutionary automatic program- ous genotypes can represent the same phenotype), and the ming algorithm that extends the standard Gram- wrapping of the genotype during the mapping process which matical Evolution algorithm by incorporating the enables the reuse of the same genotype for the production of concepts of constituent genes and conditional be- different phenotypes. haviour-switching. CGE builds from elementary Grammatical Evolution (GE) takes inspiration from na- and more complex building blocks a control pro- ture. It embraces the developmental approach and draws gram which dictates the behaviour of an agent and upon principles that allow an abstract representation of a it is applicable to the class of problems where the program to be evolved. This abstraction enables the follow- subject of search is the behaviour of an agent in a ing: the separation of the search and solution spaces; the given environment. It takes advantage of the pow- evolution of programs in an arbitrary language; the exis- erful Grammatical Evolution feature of using a tence of degenerate genetic code; and the wrapping that BNF grammar definition as a plug-in component to allows the reuse of the genetic material. describe the output language to be produced by the Before the evaluation of each individual, the following system. The main benchmark problem in which steps take place in Grammatical Evolution: CGE is evaluated is the Santa Fe Trail problem us- i. The genotype (a variable-length binary string) is used ing a BNF grammar definition which defines a to map the start symbol of the BNF grammar definition into search space semantically equivalent with that of terminals. The grammar is used to specify the legal pheno- the original definition of the problem by Koza. types. Furthermore, CGE is evaluated on two additional ii. The GE algorithm reads “codons” of 8 bits and the in- problems, the Loss Altos Hills and the Hampton teger corresponding to the codon bit sequence is used to Court Maze. The experimental results demonstrate determine which form of a rule is chosen each time a non- that Constituent Grammatical Evolution outper- terminal to be translated has alternative forms. forms the standard Grammatical Evolution algo- iii. The form of the production rule is calculated using rithm in these problems, in terms of both efficiency the formula form = codon mod forms where codon is the (percent of solutions found) and effectiveness codon integer value, and forms is the number of alternative (number of required steps of solutions found). forms for the current non-terminal. Grammatical Evolution has been shown to be an effective 1 Background and efficient in a series of both static and dynamic problems [Dempsey et al., 2009]. However, 1.1 Grammatical Evolution there are known issues that still need to be tackled such as Grammatical Evolution [O’Neill and Ryan, 2003] is an evo- destructive crossovers [O’Neill et al., 2003], genotype lutionary algorithm that can evolve complete programs in an bloating [Harper and Blair, 2006], and low locality of geno- arbitrary language using populations of variable-length bi- type-to-phenotype mapping [Rothlauf and Oetzel, 2006]. nary strings. Namely, a chosen evolutionary algorithm 1.2 The Santa Fe Trail Problem (typically a variable-length ) creates and evolves a population of individuals and the binary string The Santa Fe Trail (SFT) is an instance of the Artificial Ant (genome) of each individual determines which production problem and it is standard problem used for benchmarking rules in a Backus Naur Form (BNF) grammar definition are purposes in the areas of [Koza, 1992; used in a genotype-to-phenotype mapping process to gener- Koza, 1994] and Grammatical Evolution [O’Neill and Ryan, ate a program. According to O’Neill and Ryan [2001], 2001; O’Neill & Ryan, 2003]. Its objective is to find a com-

1261 puter program to control an artificial ant, so that it can find advantage in GE when it is compared against Genetic Pro- all 89 pieces of food located on the grid. gramming in the Santa Fe Trail problem. Furthermore, it has The more general Artificial Ant problem was devised by been shown that during an evolutionary run, GE is not capa- Jefferson et al. [Koza, 1992, p.54] with the Genesys-Tracker ble of finding solutions using the BNF grammar definition it System [Jefferson et al., 1992]. The Santa Fe Trail was de- uses in its benchmark against GP with less than 607 steps. signed by Christopher Langton [Koza, 1992, p.54], and is a When biasing the search space with the incorporation of somewhat more difficult trail than the “John Muir Trail” domain knowledge (namely, suiting a grammar to the prob- originally used in the Artificial Ant problem. Although there lem in question), this is a great advantage not only of are other more difficult trails than the Santa Fe Trail, such Grammatical Evolution but of any grammar-based Evolu- as the “Los Altos Hills” [Koza, 1992, p. 156], it is acknowl- tionary Algorithm. But this is irrelevant when comparing edged as a challenging problem for evolutionary algorithms the actual performance of an Evolutionary Algorithm be- to tackle. The Santa Fe Trail has the following irregularities cause in the case of GE experiments mentioned in the litera- [Koza, 1992, p.54]: single gaps, double gaps, single gaps at ture, it is the human designed bias of the search space that corners, double gaps at corners (short knight moves), and improves the success rate of the algorithm. Consequently, it triple gaps at corners (long knight moves). is not fair to compare GE with GP when semantically dif- The artificial ant of the Santa Fe Trail problem operates ferent search spaces are used. However, if the bias could in a square 32x32 toroidal grid in the plane. The ant starts in emerge or be enforced during the evolution through some the upper left cell of the grid (0, 0) facing east. The trail other mechanism (and not by design) this would be a differ- consists of 144 squares with 21 turns, and the 89 units of ent case; for example, through evolution of the grammar or food are distributed non-uniformly along it. The artificial a different mechanism inspired by nature and biology like ant can use three operations: move, turn-right, turn-left. the one proposed in this work. Each of these takes one time unit. In addition, the artificial ant can use one sensing function: food-ahead. It looks into 2 Constituent Grammatical Evolution the square the ant is currently facing and returns true or false depending upon whether that square contains food or is empty respectively. 2.1 Motivation It has been shown by Langdon and Poli [Robilliard et al., The goal of Constituent Grammatical Evolution (CGE) is to 2006] that Genetic Programming does not improve much on improve Grammatical Evolution in terms of effectiveness pure random search in the Santa Fe Trail. This problem has (percent of solutions found in a total of evolutionary runs) become quite popular as a benchmark in the GP field and is and efficiency (solution quality in terms of the characteris- still repeatedly used [Robilliard et al., 2006]. Hugosson et tics of the problem in question) in agent problems like the al. [2010] mention that the Santa Fe Trail has the features Artificial Ant [Koza, 1992]. The main benchmark problem often suggested in real program spaces – it is full of local in which CGE is first evaluated is the Santa Fe Trail prob- optima and has many plateaus. Furthermore, they note that a lem using a BNF grammar definition which defines a search limited GP schema analysis shows that it is deceptive at all space semantically equivalent with this of the original defi- levels and that there are no beneficial building blocks, lead- nition of the problem by Koza [1992]. Consequently, the ing Genetic Algorithms to choose between them randomly. specific goals are: first, to improve the success rate of evolu- These reasons make the Santa Fe Trail a challenging prob- tionary runs in the Santa Fe Trail problem against standard lem for Evolutionary Algorithms. GE; and second, to find solutions which will require fewer steps than the solutions the GE algorithm is usually able to 1.3 Performance of GE on the Santa Fe Trail find. Robilliard et al. [2006] say that in order to evaluate GP al- Constituent Grammatical Evolution takes inspiration gorithms using the Santa Fe Trail problem as a benchmark from two very elementary concepts: genes and conditional “… the search space of programs must be semantically behaviour switching. It augments the standard Grammatical equivalent to the set of programs possible within the origi- Evolution algorithm by incorporating and utilising these nal SFT definition”. However, it has been argued that concepts in order to improve the performance of the later in Koza’s [Koza, 1992] and GE’s [O'Neill and Ryan, 2003] agent problems. search spaces are not semantically equivalent [Robilliard et The proposed evolutionary algorithm constitutes a condi- al., 2006]. tional behaviour switching evolutionary algorithm, based on Georgiou and Teahan [2010] have proved experimentally the Grammatical Evolution algorithm, and which incorpo- the above claim and furthermore show that Grammatical rates the notion of genes in the individual’s genotype. CGE Evolution gives very poor results in the Santa Fe Trail prob- builds from elementary and more complex building blocks a lem when a BNF grammar is used which defines a search control program which dictates the behaviour of an agent. It space semantically equivalent with the search space used in is applicable to the class of problems where the subject of the original problem [Koza, 1992]. Indeed, the same work search is the behaviour of an agent in a given environment, proved experimentally that GE literature [O’Neill and Ryan, like the Santa Fe Trail problem. CGE’s current implementa- 2001; O’Neill and Ryan, 2003] uses a BNF grammar which tion takes advantage of the powerful Grammatical Evolution biases the search space and consequently gives an unfair feature of using a BNF grammar definition as a plug-in component to describe the output language to be produced

1262 by the system (control program). Namely, it utilizes this genes are randomly created which are represented as binary powerful BNF based phenotype-to-genotype feature of GE strings. These genes will be candidates for the pool of the which enables the implementation of the CGE’s unique fea- constituent genes. The minimum length of each string is tures (genes and conditional behaviour switching) in a CMin codons and the maximum length is CMax codons (the straightforward and very flexible way. size in bits of each codon is specified by the GE algorithm; The main GE issues tackled by CGE are the following: usually its value in the GE literature is eight). • decreasing the actual search space without using do- main knowledge of the problem in question and without Set current constituent genes pool P of size S changing the semantics of the search space defined by the Repeat G times BNF grammar compared with the original problem; Create a new empty pool P’ • preventing destructive crossover events of Grammati- Repeat until P’ is full cal Evolution; and Create gene N with genotype size L codons • reducing the bloating of the genotype of the Grammati- Set i = 0 cal Evolution algorithm. While i < E The Constituent Grammatical Evolution algorithm aug- Map genotype of N to its phenotype (wrap W) ments the standard Grammatical Evolution algorithm, in Put N in a random environment location order to tackle the above issues, with the incorporation of Set direction of N randomly three unique features: Set interim fitness value Vi of N to 0 • the conditional behaviour switching approach, which Repeat L times biases the search space toward useful areas; Execute phenotype (program) of N • incorporation of the notion of genes, which tackles the Calculate performance Vi’ of N issue of destructive crossovers and provides useful reusable Set Vi = Vi + Vi’ building blocks; and End Repeat • restriction of the genotype size, which tackles the phe- End While notype bloat phenomenon. Calculate and set fitness value F of gene N Add N to P’ 2.2 CGE Algorithm Description End Repeat The Constituent Grammatical Evolution algorithm uses the Create a pool T of size 2*S following inputs: Add to T the genes of P and P’ 1. Problem Specification (PS): A program which simu- Sort T according the fitness values F of genes lates the problem in question and assigns to each individual Create an empty P’’ and constituent gene a fitness value. Add to P’’ the best S genes of T 2. Language Specification (BNF-LS): A BNF grammar Replace P with P’’ definition which dictates the grammar of the output pro- End Repeat grams of the individuals which are executed by the problem For each gene N in P specification simulator. Add phenotype of N in BNF-BS grammar definition 3. Behaviour Switching Specification (BNF-BS): The End For BNF grammar definition which augments the Language Create population R of individuals Specification by incorporating in the original specification Evolve population R enforcing IMC limit to indi- (BNF-LS) the switching behaviour approach. Namely, the vidual’s genotype BNF-BS specification is a BNF grammar definition which Listing 1: The Constituent Grammatical Evolution algorithm. enforces a conditional check before each agent’s action and which results (in conjunction with the phenotypes of the The fitness function for the evaluation of the candidate constituent genes) to the BNF grammar definition used for constituent genes depends on the problem specification and the evolution of the agents. Using such an approach in the is similar to the fitness function used for the individuals in Artificial Ant problem, for example, the resultant ant control the problem in question. The general concept behind the programs must perform a check (condition) whether there is creation and evaluation of the constituent genes is to find food ahead or not. In this way the created control programs reusable modules from which the genotype of the individual can be expressed as simple finite-state machines and a form will be constructed. For demonstration purposes, the way of memory is incorporated indirectly in the ant’s behaviour. constituent genes are evaluated (described below) is based 4. Grammatical Evolution Algorithm (GE): The genetic on the implementation in the SFT problem. algorithm which is used for the evolution of the population The fitness value F of each gene is calculated as follows. of the individuals. The gene is placed in a random location of the environment The overall description of the Constituent Grammatical with a random heading. Then the gene’s genotype is Evolution algorithm (Listing 1) is as follows. mapped to its phenotype using the language specification Using the CGE specific input parameters (IMC, S, G, L, BNF-LS (BNF grammar definition) and the Grammatical E, CMin, CMax, and W) as defined below, a pool P of con- Evolution mapping formula with W genotype wraps limit. stituent genes is created and filled in the following way. S The gene’s code (the phenotype as for an individual) is exe-

1263 cuted for L times and evaluated to produce an interim fitness evolutionary runs each. Namely, a total of five hundred evo- value. The interim fitness value is calculated E times. Then, lutionary runs were performed for each algorithm. The tab- the final fitness value of the candidate constituent gene is leau in Table 2 shows the GE settings and parameters of calculated with the formula each evolutionary run and the Table 3 shows the CGE spe- cific parameters. The BNF grammars used in these experi-

(1) ments are BNF-BS (Listing 4), BNF-Koza, and BNF- O’Neill [Georgiou and Teahan, 2010]. where: F is the final fitness value of the constituent gene used for comparison with other genes and for deciding Objective Find a computer program in the NetLogo programming whether it will placed in the genes pool or not; N (N 0 E) is language to control an artificial ant so that it can find all 89 pieces of food located on the Santa Fe Trail. the number or interim evaluations with fitness value > 0; Vn 0 Terminal turn-left, turn-right, move, food-ahead, plus constituent is the Interim fitness value of the interim evaluation n (0 n Operators genes when the CGE algorithm is used instead of the 0 N); Vmax is the maximum interim fitness value found; and standard GE algorithm. S is the number of steps performed during this interim Raw Fitness Number of pieces of food picked up before the ant evaluation. times out with 650 operations The genes of each generation are compared with the BNF CGE: BNF-Koza (original) for Genes and BNF-BS Grammar (behaviour switching) for Ants. genes of the previous generation according to their fitness GE using BNF-Koza: BNF-Koza. value and the best S genes of these two generations are GE using BNF-O’Neill: BNF-O’Neill. placed in the pool P replacing the existing genes. This will Evolutionary Steady-State Genetic Algorithm, Generation Gap = 0.9 be repeated for G generations. Finally, the pool P will be Algorithm Selection Mechanism: Roulette-Wheel Selection filled with the best S genes (and their corresponding pheno- Initial Randomly created with the following restrictions: types) created during the G generations. Population Minimum Codons = 15 and Maximum Codons = 25 Parameters Population Size = 500, Maximum Generations = 50 When the genes pool is finally created and filled, the Mutation Prob. = 0.01, Crossover Prob. = 0.9 phenotypes of the constituent genes are added as terminal Codon Size = 8, Wraps Limit = 10 operators in the behaviour switching BNF grammar defini- Table 2: GE Tableau for the Santa Fe Trail. tion (BNF-BS). Then an initial population of individuals is created randomly where each individual consists of a con- Codons Limit, IMC 250 Gene Evaluations, E 50 trol gene which dictates which original terminal operators Gene Pool Size, S 3 Gene Codons Min, CMin 10 and which constituent genes of the BNF-BS grammar defini- Gene Generations, G 850 Gene Codons Max, CMax 20 tion are used and in what order. Finally, the population of Gene Code Iterations, L 10 Gene Max Wraps, W 5 individuals is evolved using the standard Grammatical Evo- Table 3: CGE Settings for the Santa Fe Trail. lution algorithm, enforcing the individual’s genotype size maximum limit IMC after each crossover. N = {behaviour, op} T = {turn-left, turn-right, move, ifelse, 3 Experimental Results food-ahead, [, ]} S = behaviour 3.1 The Santa Fe Trail Problem P: The performance of the Constituent Grammatical Evolution ::= ifelse food-ahead [ ] (CGE) algorithm has been benchmarked against the Gram- [ ] matical Evolution (GE) algorithm using the Santa Fe Trail | ifelse food-ahead [ ][ ] problem. Two distinct benchmarks have been performed for | the following situations: a) when GE uses the standard BNF ::= turn-left | turn-right | move grammar definition mentioned in the Grammatical Evolu- | ... {Constituent Genes Phenotypes}(*) tion literature [O’Neill and Ryan, 2001; O’Neill and Ryan Listing 4: The BNF-BS Grammar Definition for the Santa Fe Trail 2003] for the SFT problem, which defines a search space problem. The phenotype of every constituent gene (*) is added as a biased and not semantically equivalent with that of the production rule in the non-terminal symbol. BNF-BS defines original problem [Koza, 1992], named here BNF-O’Neill; a search space semantically equivalent with that of the original and b) a BNF grammar named BNF-Koza which defines a problem [Koza, 1992]. search space semantically equivalent with the search space of the original problem as it was defined by Koza [1992]. Constituent Grammatical Evolution was successful at When the Grammatical Evolution (GE) algorithm uses the finding a solution in the Santa Fe Trail problem with very BNF-O’Neill grammar, we will refer to it as “GE using high success rates, ranging from 85% to 94% with an aver- BNF-O’Neill”. In the same way, when it uses the BNF- age success rate of 90%. The best solution found by CGE, Koza grammar, we will refer to it as “GE using BNF-Koza”. requires only 337 steps and there is no GP or GE publica- For these benchmarks, three series of experiments were tion in our knowledge presenting a solution requiring less performed to evaluate CGE, GE using BNF-Koza, and GE steps. The NetLogo code for this solution is shown in List- using BNF-O’Neill. In each series of experiments, five dis- ing 5. tinct experiments were conducted consisting of one hundred

1264 ifelse food-ahead The results of the experiments using the standard GE with [move] BNF-Koza and BNF-O’Neill can be seen in Table 7 and in [turn-right Table 8. A cumulative frequency measure of success over ifelse food-ahead 500 runs of CGE and GE can be seen in Figure 9. [ifelse food-ahead The experimental results show that Constituent Gram- [ifelse food-ahead matical Evolution outperforms Grammatical Evolution in [move move] [move] terms of success rate whether the later uses a BNF grammar ifelse food-ahead definition (BNF-Koza) which defines a search space seman- [move move] [move] tically equivalent with that used in the original problem ] [Koza, 1992], or whether it uses a BNF grammar definition [turn-left] (BNF-O’Neill) which defines a biased search space. In con- move trast, Constituent Grammatical Evolution is able to find ] much better solutions in terms of the required steps. [ifelse food-ahead [move] [turn-right] ifelse food-ahead [turn-left] [turn-right ifelse food-ahead [move] [ifelse food-ahead [move] [turn-right] move ] ] ] ] Listing 5: NetLogo code of the best solution (in terms of required steps) found by CGE in the Santa Fe Trail problem.

The Table 6 shows the detailed results of the experiments. The column “Best” shows the best value of all five experi- Figure 9: CGE vs. GE on the Santa Fe Trail problem. ments. “Runs” is the number of evolutionary runs performed in the experiment, “Steps”, the required steps of the best The poor performance of GE using BNF-Koza can be ex- solution found in the particular experiment, “Success”, how plained if we take in account that this grammar definition many evolutionary runs (percentage) found a solution, and defines a search space semantically equivalent with that of “Avg.Suc.”, the average success rate of all five experiments. the original problem, which is very large, making it difficult for GE to find a solution due to its main issues which were Exp #1 Exp #2 Exp #3 Exp #4 Exp #5 Best mentioned in section 1.1. Instead, BNF-O’Neill defines a Runs 100 100 100 100 100 100 smaller search space and biases the search to areas where a Steps 393 375 393 377 337 337 solution can be found more easily (with higher success rate) Success 85% 93% 89% 94% 87% 94% but with the cost of excluding other areas where more effi- Avg.Suc. 90% cient programs could be found (using less steps). This is the Table 6: CGE Experimental Results in SFT. reason why GE using BNF-O’Neill didn’t find solutions with less than 607 steps. In order to support the last argu- Exp #1 Exp #2 Exp #3 Exp #4 Exp #5 Best ment, a series of five new experiments of one hundred evo- lutionary runs each was conducted, setting the steps limit to Runs 100 100 100 100 100 100 606. In these experiments the success rate of GE was 0%, Steps 419 507 415 541 479 415 providing further experimental evidence that GE using Success 8% 11% 10% 6% 13% 13% BNF-O’Neill cannot find solutions with less than 607 steps. Avg.Suc. 10% Furthermore, Georgiou & Teahan [2010] tried to solve Table 7: GE using BNF-Koza Experimental Results in SFT. the Santa Fe Trail problem using random search as the search mechanism of Grammatical Evolution instead of the Exp #1 Exp #2 Exp #3 Exp #4 Exp #5 Best standard steady-state genetic algorithm. Their results show Runs 100 100 100 100 100 100 that using the search space defined by the BNF-O’Neill Steps 609 609 607 609 607 607 grammar definition, GE with random search has a success Success 80% 76% 75% 81% 74% 81% rate of 50% in finding a solution. Instead, the success rate of Avg.Suc. 78% GE with random search using the search space defined by Table 8: GE using BNF-O’Neill Experimental Results in SFT. the BNF-Koza grammar definition is just 1.4%. These re- sults further support the claim that the BNF-O’Neill gram- mar definition defines a different search space than the

1265 BNF-Koza grammar definition, where solutions can be found more easily. Codons Limit, IMC 750 Gene Evaluations, E 100 The promising results of CGE benchmarking on the Santa Gene Pool Size, S 3 Gene Codons Min, CMin 40 Fe Trail problem raises the question whether it can outper- Gene Generations, G 1000 Gene Codons Max, CMax 100 form GE in other problems as well. For this reason, CGE Gene Code Iterations, L 10 Gene Max Wraps, W 10 was applied and benchmarked on two additional problems. Table 11: CGE Settings for the Los Altos Hills problem. The first is a more difficult version of the Santa Fe Trail problem and the second a typical maze search problem. N = {behaviour, op} T = {turn-left, turn-right, move, ifelse, 3.2 The Los Altos Hills Problem food-ahead, [, ]} The Los Altos Hills (LAH) problem is an instance of the S = behaviour Artificial Ant problem and was introduced by Koza [1992, P: p. 156]. The objective of this problem is to find a computer ::= program to control an artificial ant, so that it can find all 157 ifelse food-ahead [][ ] | pieces of food located on a 100x100 toroidal grid. The ant ifelse food-ahead [][ ] | starts in the upper left cell of the grid facing east. The trail ifelse food-ahead [ ][ ] | consists of 221 squares with 29 turns. The artificial ant can ifelse food-ahead [ ][ ] | uses the same operations as with the Santa Fe Trail problem. ifelse food-ahead [ ][ ] | The Los Altos Hills trail is larger and more difficult than ::= turn-left | turn-right | move the Santa Fe Trail. It begins with the same irregularities and | ... {Constituent Genes Phenotypes} in the same order. Then it introduces two new kinds of ir- Listing 12: The BNF-BS Grammar Definition for LAH. regularity that appear toward the end of the trail. In order to evaluate CGE against GE, a series of three ex- periments was conducted (CGE, GE using BNF-Koza, and CGE GE BNF-Koza GE BNF-O’Neill GE using BNF-O’Neill), consisting of 100 evolutionary Runs 100 100 100 runs each using the configuration shown in Table 10 and in Best Solution’s Steps 1093 No solution No solution Table 11. The constituent genes evaluation formula used in Success Rate 9% 0% 0% this problem was the same as that used on the Santa Fe Table 13: Experimental Results of the Los Altos Hills problem. Trail. The BNF-BS grammar definition used in this problem is The experimental results are shown in Table 13. Even shown in Listing 12. Note that a slightly different BNF-BS though the Los Altos Hill is a much more challenging prob- grammar definition is used than that used for the Santa Fe lem than the Santa Fe Trail, CGE managed to find a solution Trail problem. These modifications in the BNF-BS grammar in contrast to GE which wasn’t able to find one. definition were applied because the solution to the Los Altos The main reason why LAH proved to be so difficult for Hills problem (according to its specifications) requires less GE is: first, the two new irregularities introduced which efficient individuals in terms of the fraction of steps and require a more complex behaviour by the ant; and second, food pieces. Namely, the fraction is 19.2 (3000/156) against because these irregularities first appear at the end of the 7.3 (650/89) for the Santa Fe Trail. trail, with the consequence that the evolved population con- verges to programs tackling only the first part of the trail Objective Find a computer program in the NetLogo programming language to control an artificial ant so that it can find which is identical to the Santa Fe Trail. all 157 pieces of food located on the Los Altos Hills trail. 3.3 The Hampton Court Maze Problem Terminal turn-left, turn-right, move, food-ahead, plus constituent The third problem where CGE was benchmarked against Operators genes when the CGE algorithm is used instead of the standard GE algorithm. GE is a typical maze searching problem [Teahan, 2010a]. Raw Fitness Number of pieces of food picked up before the ant The maze used for the experiments is the Hampton Court times out with 3000 operations Maze [Teahan, 2010b; p. 79] which is a simple connected BNF CGE: BNF-Koza (original) for Genes and BNF-BS maze of grid size 39x23 (Figure 14). The goal is to find a Grammar (behaviour switching) for Ants. program guiding the traveller agent from the entry square of GE using BNF-Koza: BNF-Koza. the maze to the centre of the maze. GE using BNF-O’Neill: BNF-O’Neill. Evolutionary Steady-State Genetic Algorithm, Generation Gap = 0.9 As with the LAH, three experiments were conducted of Algorithm Selection Mechanism: Roulette-Wheel Selection one hundred evolutionary runs each in order to compare the Initial Randomly created with the following restrictions: performance of CGE, GE using BNF-Koza, and GE using Population Minimum Codons = 50 and Maximum Codons = 100 BNF-O’Neill. The BNF grammar definitions used in these Parameters Population Size = 2000, Maximum Generations = 50 experiments are variations of those used in the SFT problem Mutation Prob. = 0.01, Crossover Prob. = 0.9 with the following differences: the replacement of the food- Codon Size = 8, Wraps Limit = 10 ahead sensing operator with the wall-ahead? operator, the Table 10: GE Tableau for the Los Altos Hills problem. addition of two sensing operators, wall-left? and wall- right?; and the addition of a non-terminal symbol for choos-

1266 ing any one of these sensing operators when a condition Objective Find a computer program in the NetLogo programming statement is to be selected. The sensing operators were in- language to control an artificial traveller agent so that spired by the work of Sondahl [2005]. it can find the centre of the maze. Terminal turn-left, turn-right, move, wall-ahead?, wall-left?, Operators wall-right?, plus constituent genes when the CGE algorithm is used instead of the standard GE algorithm. Raw Fitness The geometric distance between the agent’s position and the entrance to the centre of the maze before the agent times out with 500 operations, divided by the number of new squares of the path visited. Adjusted 1 / (1 + Raw Fitness) Fitness BNF CGE: BNF-Koza (maze version) for Genes and BNF- Grammar BS (behaviour switching maze version) for travellers (agents). GE using BNF-Koza: BNF-Koza (maze version). GE using BNF-O’Neill: BNF-O’Neill (maze version). Evolutionary Steady-State Genetic Algorithm, Generation Gap = 0.9 Algorithm Selection Mechanism: Roulette-Wheel Selection Initial Randomly created with the following restrictions: Population Minimum Codons = 15 and Maximum Codons = 25 Figure 14: The Hampton Court Maze. Reprinted from [Teahan, Parameters Population Size = 100, Maximum Generations = 25 2010b; p. 79]. Mutation Prob. = 0.01, Crossover Prob. = 0.9 Codon Size = 8, Wraps Limit = 10 The experimental configuration is shown in Tables 15 Table 15: GE Tableau for the Hampton Court Maze problem. and 16. The fitness value of the candidate constituent genes is calculated with the formula Codons Limit, IMC 250 Gene Evaluations, E 50 S CMin Gene Pool Size, 3 Gene Codons Min, 10 (2) Gene Generations, G 500 Gene Codons Max, CMax 20 Gene Code Iterations, L 10 Gene Max Wraps, W 5 where: F is the fitness value of the constituent gene used for Table 16: CGE settings for the Hampton Court Maze problem. comparison with other genes and for deciding whether it will placed in the genes pool or not; E is the number of per- CGE GE BNF-Koza GE BNF-O’Neill formed gene’s interim evaluations; Vn is the Interim fitness Runs 100 100 100 0 0 value of the interim evaluation n (0 n E); Vmax is the Best Solution’s Steps 384 439 494 maximum interim fitness value found; and S is the number Success Rate 82% 1% 1% of operations (time steps) performed during the interim Table 17: Experimental results for the Hampton Court Maze prob- evaluation of the gene with the maximum interim fitness. lem. Each interim fitness value of the candidate constituent genes is calculated with the formula 4 Conclusions

(3) Constituent Grammatical Evolution is a new evolutionary automatic programming algorithm that extends Grammatical where: V is the interim fitness value; DS is the geometric Evolution by incorporating in the original algorithm the distance between the gene and the exit of the maze before concepts of constituent genes and conditional behaviour- the gene executes its code; DP is the geometric distance be- switching. CGE builds from elementary and more complex tween the gene and the exit of the maze after the execution building blocks a control program which dictates the behav- of the gene’s code for L times (Gene Code Iterations); and P iour of an agent and it is applicable to the class of problems is the number of new path squares visited during the gene’s where the subject of search is the conditional behaviour of code execution. an agent in a given environment. It takes advantage of the The results of these experiments are shown in Table 17. powerful Grammatical Evolution feature of using a BNF The Hampton Court Maze problem proved to be difficult for grammar definition as a plug-in component to describe the GE, mainly because the first square the agent visits when it output language to be produced by the system. enters the maze is assigned a high fitness value due its small CGE is able through its conditional behaviour-switching geometric distance from the exit. This leads to convergence feature to focus the search in more useful areas by decreas- of the population to this local optimum. Namely it advances ing the actual search space without using domain knowledge individuals which just execute the move operator. In con- of the problem in question and without changing semanti- trast, CGE proved to be very effective for this problem be- cally the original search space. Indeed, due to the constitu- cause it was able to easily overcome this local optimum due ent genes and genotype size limit features of CGE, the to the constituent genes. Indeed, the solution it found re- Grammatical Evolution issues of destructive crossover quires much fewer steps than those found by GE. events and genotype bloating are tackled respectively.

1267 The experimental results presented in this paper show that [Hugosson et al., 2010] Hugosson, J., Hemberg, E., Braba- CGE outperforms the standard GE algorithm in the Santa Fe zon, A. and O’Neill, M. Genotype Representations in Trail problem whether the later uses a search space semanti- Grammatical Evolution. Applied Soft Computing, 10(1), cally equivalent with that of the original GP problem [Koza, 36-43, 2010 1992], or whether it uses the biased search space of the [Jefferson et al., 1992] Jefferson, D., Collins, R., Cooper, standard Grammatical Evolution system (BNF-O’Neill) C., Dyer, M., Flowers, M., Korf, R., Taylor, C. and used in the Grammatical Evolution literature as was first Wang, A. Evolution as a Theme in Artificial Life: The presented by O’Neill and Ryan [2003]. Namely, CGE Genesys/Tracker System. In Langton, C. G., Taylor, C., achieves better results than the standard Grammatical evolu- Doyne Farmer, J. and Rasmussen, S. (Eds.), Artificial tion algorithm in terms of both efficiency (percent of solu- Life II (pp. 549-578). USA: Addison-Wesley, 1992. tions found) and effectiveness (number of required steps of solutions found). Further experimental results show that [Koza, 1992] Koza, J. R. Genetic Programming: On the CGE outperforms GE in two additional and more difficult Programming of Computers by the Means of Natural Se- problems, the Los Altos Hills and the Hampton Court Maze. lection. Cambridge, MA: MIT Press, 1992. All experiments mentioned in this study have been per- [Koza, 1994] Koza, J. R. Genetic Programming II: Auto- formed using the jGE library [Georgiou and Teahan, 2006a; matic Discovery of Reusable Programs. Cambridge, 2006b; 2008], which is a Java implementation of the MA: MIT Press, 1994. Grammatical Evolution algorithm, and the jGE NetLogo [O’Neill and Ryan, 2001] O’Neill, M. and Ryan, C. Gram- extension, which is a NetLogo extension of the jGE library. matical Evolution. IEEE Transactions on Evolutionary The simulation of the Santa Fe Trail problem is imple- Computation, 5(4), 349–358, 2001. mented in the NetLogo modelling environment. The source code of the jGE library, the jGE NetLogo extension, and the [O’Neill and Ryan, 2003] O’Neill, M. and Ryan, C. Gram- Santa Fe Trail simulation in NetLogo, are freely available matical Evolution: Evolutionary Automatic Program- [Georgiou, 2006; Wilensky, 1999]. ming in an Arbitrary Language. USA: Kluwer, 2003. [O’Neill et al., 2003] O’Neill, M., Ryan, C., Keijzer, M. and References Cattolico, M. Crossover in Grammatical Evolution. Ge- netic Programming and Evolvable Machines 4(1), 67- [Dempsey et al, 2009] Dempsey, I., O’Neill, M. and Braba- zon, A. Foundations in Grammatical Evolution for Dy- 93, 2003. namic Environments. Berlin, Germany: Springer, 2009. [Robilliard et al., 2006] Robilliard, D., Mahler, S., Ver- haghe, D. and Fonlupt, C. Santa Fe Trail Hazards. Lec- [Georgiou, 2006] Georgiou, L. Java GE (jGE) Official Web ture Notes in Computer Science, 3871, 1-12, 2006 Site. School of Computer Science, Bangor University, Available from http://www.bangor.ac.uk/~eep201/jge, [Rothlauf and Oetzel, 2006] Rothlauf, F. and Oetzel, M. On 2006. the Locality of Grammatical Evolution. In Proceedings of the 9th European Conference on Genetic Program- [Georgiou and Teahan, 2006a] Georgiou, L. and Teahan, W. ming, Lecture Notes in Computer Science (vol. 3905), J. jGE – A Java implementation of Grammatical Evolu- tion. 10th WSEAS International Conference on Systems, 320-330. Budapest, Hungary: Springer, 2006. Athens, Greece, July 10-15, 2006. [Sondahl, 2005] Sondahl, F. Genetic Programming Library for NetLogo project. Northwestern University. Available [Georgiou and Teahan, 2006b] Georgiou, L. and Teahan, W. from http://cs.northwestern.edu/~fjs750/netlogo/final, J. Implication of Prior Knowledge and Population Thinking in Grammatical Evolution: Toward a Knowl- 2005. edge Sharing Architecture. WSEAS Transactions on Sys- [Teahan, 2010a] Teahan, W. J. Artificial Intelligence – tems, 5(10), 2338-2345, 2006. Agent Behaviour I. Ventus Publishing ApS, 2010. [Georgiou and Teahan, 2008] Georgiou, L. and Teahan, W. [Teahan, 2010b] Teahan, W. J. Artificial Intelligence – J. Experiments with Grammatical Evolution in Java. Agents and Environments. Ventus Publishing ApS, 2010. Studies in Computational Intelligence, 102, 45-62, 2008. [UCD, 2008] UCD Natural Computing Research & Appli- [Georgiou and Teahan, 2010] Georgiou, L. and Teahan, W. cation Group. Grammatical Evolution in Java (GEVA) J. (2010) Grammatical Evolution and the Santa Fe Trail Official Web Site. Ireland, Dublin. Available from Problem. In Proceedings of the International Conference http://ncra.ucd.ie/geva, 2008. on (ICEC 2010), Valencia, [Wilensky, 1999] Wilensky, U. NetLogo. Evanston, IL: Spain, pages 10-19, October 24-26, 2010. Center for Connected Learning and Computer-Based [Harper and Blair, 2006] Harper, R. and Blair, A. Dynami- Modeling, Northwestern University. Available from cally Defined Functions In Grammatical Evolution. In http://ccl.northwestern.edu/netlogo, 1999. Proceedings of the 2006 IEEE Congress on Evolutionary Computation (IEEE CEC 2006), pp. 2638-2645, 2006

1268