Constituent Grammatical Evolution
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

Loukas Georgiou and William J. Teahan
School of Computer Science, Bangor University
Bangor, Wales
[email protected] and [email protected]

Abstract

We present Constituent Grammatical Evolution (CGE), a new evolutionary automatic programming algorithm that extends the standard Grammatical Evolution algorithm by incorporating the concepts of constituent genes and conditional behaviour-switching. CGE builds a control program from elementary and more complex building blocks. This program dictates the behaviour of an agent, and CGE is applicable to the class of problems where the subject of search is the behaviour of an agent in a given environment. It takes advantage of the powerful Grammatical Evolution feature of using a BNF grammar definition as a plug-in component to describe the output language to be produced by the system. The main benchmark problem in which CGE is evaluated is the Santa Fe Trail problem, using a BNF grammar definition which defines a search space semantically equivalent with that of the original definition of the problem by Koza. Furthermore, CGE is evaluated on two additional problems, the Los Altos Hills and the Hampton Court Maze. The experimental results demonstrate that Constituent Grammatical Evolution outperforms the standard Grammatical Evolution algorithm in these problems, in terms of both efficiency (percent of solutions found) and effectiveness (number of required steps of solutions found).

1 Background

1.1 Grammatical Evolution

Grammatical Evolution [O’Neill and Ryan, 2003] is an evolutionary algorithm that can evolve complete programs in an arbitrary language using populations of variable-length binary strings. Namely, a chosen evolutionary algorithm (typically a variable-length genetic algorithm) creates and evolves a population of individuals, and the binary string (genome) of each individual determines which production rules in a Backus Naur Form (BNF) grammar definition are used in a genotype-to-phenotype mapping process to generate a program. According to O’Neill and Ryan [2001], Grammatical Evolution’s unique features compared to other evolutionary algorithms are the degenerate genetic code, which facilitates the occurrence of neutral mutations (various genotypes can represent the same phenotype), and the wrapping of the genotype during the mapping process, which enables the reuse of the same genotype for the production of different phenotypes.

Grammatical Evolution (GE) takes inspiration from nature. It embraces the developmental approach and draws upon principles that allow an abstract representation of a program to be evolved. This abstraction enables the following: the separation of the search and solution spaces; the evolution of programs in an arbitrary language; the existence of degenerate genetic code; and the wrapping that allows the reuse of the genetic material.

Before the evaluation of each individual, the following steps take place in Grammatical Evolution:

i. The genotype (a variable-length binary string) is used to map the start symbol of the BNF grammar definition into terminals. The grammar is used to specify the legal phenotypes.

ii. The GE algorithm reads “codons” of 8 bits, and the integer corresponding to the codon bit sequence is used to determine which form of a rule is chosen each time a non-terminal to be translated has alternative forms.

iii. The form of the production rule is calculated using the formula form = codon mod forms, where codon is the codon integer value and forms is the number of alternative forms for the current non-terminal.

Grammatical Evolution has been shown to be an effective and efficient evolutionary algorithm in a series of both static and dynamic problems [Dempsey et al., 2009]. However, there are known issues that still need to be tackled, such as destructive crossovers [O’Neill et al., 2003], genotype bloating [Harper and Blair, 2006], and low locality of genotype-to-phenotype mapping [Rothlauf and Oetzel, 2006].

1.2 The Santa Fe Trail Problem

The Santa Fe Trail (SFT) is an instance of the Artificial Ant problem and is a standard problem used for benchmarking purposes in the areas of Genetic Programming [Koza, 1992; Koza, 1994] and Grammatical Evolution [O’Neill and Ryan, 2001; O’Neill and Ryan, 2003]. Its objective is to find a computer program to control an artificial ant, so that it can find all 89 pieces of food located on the grid.

The more general Artificial Ant problem was devised by Jefferson et al. [Koza, 1992, p. 54] with the Genesys-Tracker System [Jefferson et al., 1992]. The Santa Fe Trail was designed by Christopher Langton [Koza, 1992, p. 54], and is a somewhat more difficult trail than the “John Muir Trail” originally used in the Artificial Ant problem. Although there are other more difficult trails than the Santa Fe Trail, such as the “Los Altos Hills” [Koza, 1992, p. 156], it is acknowledged as a challenging problem for evolutionary algorithms to tackle. The Santa Fe Trail has the following irregularities [Koza, 1992, p. 54]: single gaps, double gaps, single gaps at corners, double gaps at corners (short knight moves), and triple gaps at corners (long knight moves).

The artificial ant of the Santa Fe Trail problem operates in a square 32x32 toroidal grid in the plane. The ant starts in the upper left cell of the grid (0, 0) facing east. The trail consists of 144 squares with 21 turns, and the 89 units of food are distributed non-uniformly along it. The artificial ant can use three operations: move, turn-right, turn-left. Each of these takes one time unit. In addition, the artificial ant can use one sensing function: food-ahead. It looks into the square the ant is currently facing and returns true if that square contains food and false if it is empty.

It has been shown by Langdon and Poli [Robilliard et al., 2006] that Genetic Programming does not improve much on pure random search in the Santa Fe Trail. This problem has become quite popular as a benchmark in the GP field and is still repeatedly used [Robilliard et al., 2006]. Hugosson et al. [2010] mention that the Santa Fe Trail has the features often suggested in real program spaces: it is full of local optima and has many plateaus. Furthermore, they note that a limited GP schema analysis shows that it is deceptive at all levels and that there are no beneficial building blocks, leading Genetic Algorithms to choose between them randomly. These reasons make the Santa Fe Trail a challenging problem for Evolutionary Algorithms.

1.3 Performance of GE on the Santa Fe Trail

Robilliard et al. [2006] say that in order to evaluate GP algorithms using the Santa Fe Trail problem as a benchmark “… the search space of programs must be semantically equivalent to the set of programs possible within the original SFT definition”. However, it has been argued that Koza’s [Koza, 1992] and GE’s [O’Neill and Ryan, 2003] search spaces are not semantically equivalent [Robilliard et al., 2006].

Georgiou and Teahan [2010] have proved experimentally the above claim and furthermore show that Grammatical […] advantage in GE when it is compared against Genetic Programming in the Santa Fe Trail problem. Furthermore, it has been shown that during an evolutionary run, GE is not capable of finding solutions with fewer than 607 steps when using the BNF grammar definition it uses in its benchmark against GP.

Biasing the search space by incorporating domain knowledge (namely, suiting a grammar to the problem in question) is a great advantage not only of Grammatical Evolution but of any grammar-based Evolutionary Algorithm. But this is irrelevant when comparing the actual performance of an Evolutionary Algorithm, because in the case of the GE experiments mentioned in the literature, it is the human-designed bias of the search space that improves the success rate of the algorithm. Consequently, it is not fair to compare GE with GP when semantically different search spaces are used. However, if the bias could emerge or be enforced during the evolution through some other mechanism (and not by design), this would be a different case; for example, through evolution of the grammar or a different mechanism inspired by nature and biology like the one proposed in this work.

2 Constituent Grammatical Evolution

2.1 Motivation

The goal of Constituent Grammatical Evolution (CGE) is to improve Grammatical Evolution in terms of effectiveness (percent of solutions found in a total of evolutionary runs) and efficiency (solution quality in terms of the characteristics of the problem in question) in agent problems like the Artificial Ant [Koza, 1992]. The main benchmark problem in which CGE is first evaluated is the Santa Fe Trail problem, using a BNF grammar definition which defines a search space semantically equivalent with that of the original definition of the problem by Koza [1992]. Consequently, the specific goals are: first, to improve the success rate of evolutionary runs in the Santa Fe Trail problem against standard GE; and second, to find solutions which will require fewer steps than the solutions the GE algorithm is usually able to find.

Constituent Grammatical Evolution takes inspiration from two very elementary concepts: genes and conditional behaviour switching. It augments the standard Grammatical Evolution algorithm by incorporating and utilising these concepts in order to improve the performance of the latter in agent problems.

The proposed evolutionary algorithm constitutes a conditional behaviour switching evolutionary algorithm, based on the Grammatical Evolution algorithm, which incorporates the notion of genes in the individual’s genotype.
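To make the genotype-to-phenotype mapping of Section 1.1 concrete, the following is a minimal Python sketch of steps i to iii, including wrapping. It is an illustration under stated assumptions, not the authors' implementation: the toy grammar, the names GRAMMAR, bits_to_codons and map_genotype, and the max_wraps limit are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy BNF grammar (NOT the grammar used in the paper).
# Each non-terminal maps to its list of alternative forms.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<code>": [["<line>"], ["<code>", "<line>"]],
    "<line>": [["<op>"], ["if_food_ahead", "<op>", "else", "<op>"]],
    "<op>": [["move"], ["turn_left"], ["turn_right"]],
}


def bits_to_codons(bits: str, codon_size: int = 8) -> List[int]:
    """Split a binary string into 8-bit codons; trailing bits are dropped."""
    usable = len(bits) - len(bits) % codon_size
    return [int(bits[i:i + codon_size], 2) for i in range(0, usable, codon_size)]


def map_genotype(codons: List[int], start: str = "<code>",
                 max_wraps: int = 2) -> Optional[List[str]]:
    """Expand the start symbol into terminals (leftmost derivation).

    Returns the list of terminals, or None if the codons (read at most
    max_wraps + 1 times; the wrap limit is an assumption of this sketch)
    run out before every non-terminal has been translated.
    """
    symbols = [start]
    output: List[str] = []
    used = 0                                  # codons consumed so far
    limit = len(codons) * (max_wraps + 1)     # total reads allowed with wrapping
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:             # terminal: emit as-is
            output.append(symbol)
            continue
        forms = GRAMMAR[symbol]
        if len(forms) == 1:
            choice = 0                        # no alternatives, no codon read
        else:
            if used >= limit:
                return None                   # mapping failed
            codon = codons[used % len(codons)]  # wrapping: reuse the genotype
            used += 1
            choice = codon % len(forms)       # form = codon mod forms
        symbols = list(forms[choice]) + symbols
    return output
```

With this toy grammar, map_genotype([0, 1, 2, 1]) and map_genotype([2, 3, 5, 4]) both yield the terminals if_food_ahead, turn_right, else, turn_left, which illustrates the degenerate genetic code: different genotypes map to the same phenotype. The single-codon genotype [0] yields move only because wrapping lets the same codon be read three times.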
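The ant environment described in Section 1.2 can likewise be sketched in a few lines of Python. This is a hedged illustration only: the class name Ant, the (row, column) coordinate convention, and the tiny food layout in the usage note are hypothetical, and the actual Santa Fe Trail layout is not reproduced here. The sketch also assumes, as in the usual formulation of the problem, that food is eaten when the ant moves onto it and that the food-ahead sensor costs no time.

```python
from typing import Set, Tuple

GRID_SIZE = 32  # square 32x32 toroidal grid

# (row, col) deltas in clockwise order: east, south, west, north.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


class Ant:
    """Artificial ant starting in the upper left cell (0, 0) facing east."""

    def __init__(self, food: Set[Tuple[int, int]]) -> None:
        self.food = set(food)   # cells that still contain food
        self.row, self.col = 0, 0
        self.heading = 0        # index into HEADINGS; 0 is east
        self.steps = 0          # time units used so far
        self.eaten = 0          # pieces of food collected

    def _ahead(self) -> Tuple[int, int]:
        """Cell the ant is facing, wrapping around the grid edges."""
        d_row, d_col = HEADINGS[self.heading]
        return ((self.row + d_row) % GRID_SIZE, (self.col + d_col) % GRID_SIZE)

    def food_ahead(self) -> bool:
        """Sensing function; assumed here to cost no time."""
        return self._ahead() in self.food

    def move(self) -> None:
        """Step forward (one time unit); eat food on arrival (assumption)."""
        self.row, self.col = self._ahead()
        self.steps += 1
        if (self.row, self.col) in self.food:
            self.food.remove((self.row, self.col))
            self.eaten += 1

    def turn_right(self) -> None:
        self.heading = (self.heading + 1) % 4
        self.steps += 1

    def turn_left(self) -> None:
        self.heading = (self.heading - 1) % 4
        self.steps += 1
```

As a usage example, an ant created with the hypothetical layout Ant({(0, 1), (0, 2), (1, 2)}) reports food_ahead() as true at the start, collects two pieces with two calls to move(), and after turn_right() faces south towards the third piece at (1, 2). A fitness function for the benchmark would run an evolved control program against such an environment and count eaten within a step budget.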