Automatic Generation of Neural Networks with Structured Grammatical Evolution
Filipe Assunção, Nuno Lourenço, Penousal Machado and Bernardete Ribeiro
CISUC, Department of Informatics Engineering
University of Coimbra, Coimbra, Portugal
ffga, naml, machado, [email protected]

Abstract—The effectiveness of Artificial Neural Networks (ANNs) depends on a non-trivial manual crafting of their topology and parameters. Typically, practitioners resort to a time-consuming trial-and-error methodology to find and/or adjust the models to solve specific tasks. To minimise this burden one might resort to algorithms for the automatic selection of the most appropriate properties of a given ANN. A remarkable example of such methodologies is Grammar-based Genetic Programming. This work analyses and compares the use of two grammar-based methods, Grammatical Evolution (GE) and Structured Grammatical Evolution (SGE), to automatically design and configure ANNs. The evolved networks are used to tackle several classification datasets. Experimental results show that SGE is able to automatically build better models than GE, models that are competitive with the state of the art, outperforming hand-designed ANNs in all the used benchmarks.

I. INTRODUCTION

The process of searching for adequate classification models often follows a difficult and time-consuming trial-and-error approach [1]. Moreover, such models tend to behave well on specific problems and fail to generalise to others on which they were not trained, i.e., learning is not incremental or cumulative. Consequently, for each problem one wants to solve there is the need to repeat the entire trial-and-error search. To promote learning in an incremental fashion, there is first the need to develop automatic approaches capable of finding accurate models able to learn specific tasks, so that at a later stage those models can be reused to allow modular and cumulative knowledge acquisition. To address this challenge, we propose Neural Networks Structured Grammatical Evolution (NN-SGE): a novel approach in which we rely on evolution to automatically build Artificial Neural Networks (ANNs). It is a Grammar-based Genetic Programming (GGP) methodology designed not only for the evolution of the topologies of ANNs (number of neurons and their connectivity) but also for the evolution and tuning of the numerical parameters (weights, bias). We rely on grammars to define the problem domain space, in order to allow a plug-and-play approach that is easy to adapt to a wide range of problems and ANN types.

The remainder of the paper is organised as follows. Section II details the background concepts and state-of-the-art methodologies, focusing on grammar-based methodologies for the evolution of ANN topologies and/or weights optimisation. In Section III we present NN-SGE, followed by an experimental analysis (Section IV). To end, in Section V, conclusions are drawn and future work is addressed.

978-1-5090-4601-0/17/$31.00 ©2017 IEEE

II. BACKGROUND AND STATE OF THE ART

In this section we introduce the background concepts related to GGP: Grammatical Evolution (GE) (Section II-A) and Structured Grammatical Evolution (SGE) (Section II-B). In Sections II-C and II-D we review the state of the art in the field of Evolutionary Artificial Neural Networks (EANNs).

A. Grammatical Evolution

Grammatical Evolution (GE) is a Genetic Programming (GP) algorithm introduced by Ryan et al. [2], where the evolved programs are derivations of an a-priori defined Backus-Naur Form (BNF) grammar: a notation for representing Context-Free Grammars (CFGs). CFGs are rewriting systems that are formally defined by a 4-tuple, G = (N, T, P, S), where: (i) N is the set of non-terminal symbols; (ii) T is the set of terminal symbols; (iii) P is the set of production rules of the form x ::= y, with x ∈ N and y ∈ (N ∪ T)*; and (iv) S is the initial symbol.

In GE, individuals are commonly represented as variable-length decimal strings, where each integer represents a grammar derivation step. A decoding procedure is applied to each individual, mapping it into a grammar derivation. The decoding procedure reads each codon from left to right and decides which derivation step is to be applied, according to the following rule:

rule = codon MOD non_terminal_possibilities;

where non_terminal_possibilities is the number of possibilities for the expansion of the current non-terminal symbol. If for a given non-terminal symbol there is only one possibility for its expansion, no integer is consumed. An example of the mapping procedure can be found in Section 4.1 of [2].

When applying the decoding procedure, and due to recursive grammars, it is common for the number of codons (i.e., integers) to be insufficient. To tackle that, it is possible to use wrapping, i.e., the genetic material is reused and codons are reread starting from the first one. However, it has been shown in the literature that wrapping is not effective [3].

B. Structured Grammatical Evolution

Motivated by the redundancy and locality issues in GE, Lourenço et al. [4] proposed Structured Grammatical Evolution (SGE): a new genotypic representation for GE. Redundancy is linked with the modulus operation used in GE's decoding procedure, which allows multiple codons to generate the same output. Consequently, mutation is affected, because it is likely that a mutation in one of the individual's codons generates no change in its phenotype. Additionally, when a mutation does impact the phenotype of the individual, it is probable that the mutation changes too much of the individual's structure (low locality), since a modification in one of the codons may alter the entire derivation procedure from that point onwards.

In SGE the genotype of an individual consists of a list of genes, one for each possible non-terminal symbol. Furthermore, a gene is a list of integers of the size of the maximum number of possible expansions for the non-terminal it encodes; each integer is a value in the interval [0, non_terminal_possibilities − 1], where non_terminal_possibilities is the number of possibilities for the expansion of the considered non-terminal symbol. As a consequence, it is necessary to pre-process the input grammar so that the maximum number of expansions for each non-terminal is discovered (further detailed in Section 3.1 of [4]). To deal with recursion, a set of intermediate symbols is created, which are no more than the same production rule replicated a pre-defined number of times. In a sense, the creation of intermediate symbols works as the definition of a maximum tree depth, as in standard tree-based GP. The direct result of the SGE representation is that, on the one hand, the decoding of individuals no longer depends on the modulus operation and thus its redundancy is removed; on the other hand, locality is enhanced, as codons are associated with specific non-terminals.

The procedure of mapping an individual from the genotype to the phenotype is similar to the one found in GE, but with two main differences: (i) as mentioned above, there is no need to use the modulus operation, and integers are used as they appear in the genotype; (ii) integers are not read sequentially from a single list, but are rather read sequentially from the lists associated with each non-terminal symbol. An example of the aforementioned procedure is detailed in Section 3 of [4].

C. Evolution of Artificial Neural Networks

When applying evolutionary approaches to evolve the weights, the topology of the network is predefined and kept fixed during the evolutionary process. To overcome this limitation, methodologies for automatising the evolution of the structure have also been investigated. When focusing on the evolution of the topology there are two possibilities for finding the weights: (i) use a deterministic learning methodology (e.g., backpropagation) or (ii) simultaneously evolve the topology and weights of the networks.

Regarding the representation used for the evolution of the topology of ANNs, it is possible to divide the methodologies into two main types: those that use direct encodings (e.g., Topology-optimization Evolutionary Neural Network [11], NeuroEvolution of Augmenting Topologies [12]) and those that use indirect encodings (e.g., Cellular Encoding [13]). As suggested, in the first two works the genotype is a direct representation of the network, whereas in the latter there is the need to apply a mapping procedure to transform the genotype into a readable and interpretable network. Focusing on indirect representations, in the next section we detail grammar-based approaches for evolving ANNs.

D. Grammar-based Evolutionary Artificial Neural Networks

Over the last years, several approaches applying GE to EANNs have been proposed. Tsoulos et al. [14] describe a methodology that uses a BNF grammar to encode both the network topology and parameters, i.e., input vectors, weights and bias. The evolved networks are simple Feed-Forward Neural Networks (FFNNs) with just one hidden layer. Later, motivated by the fact that GE is not suited for the evolution and tuning of real values, Soltanian et al. [15] introduced GE-BP: an approach similar to the one proposed by Tsoulos et al., but that uses GE only for the evolution of the topology. The training of the networks relies on the backpropagation algorithm, this way avoiding the optimisation of the weights and bias. The used BNF grammar is a simpler version of the one used in [14], where all the production rules associated with the encoding of real values are removed. In another approach to try to optimise both the topology and weights of FFNNs with just one hidden layer, Ahmadizar et al. combine GE with a Genetic Algorithm (GA) [16].
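To make the GE decoding rule of Section II-A concrete, the following is a minimal sketch of mapping a codon list into a derivation with `codon MOD non_terminal_possibilities`, including wrapping. The toy expression grammar and all names here are illustrative assumptions, not the ANN grammar used in the paper.

```python
# Illustrative toy CFG (not the paper's grammar): keys are non-terminals,
# values are the lists of possible expansions.
GRAMMAR = {
    "<expr>": [["<expr>", "+", "<expr>"], ["<var>"]],
    "<var>": [["x"], ["y"]],
}

def ge_decode(codons, start="<expr>", max_steps=100):
    """Expand the leftmost non-terminal at each step, consuming one codon
    per choice point: rule = codon MOD non_terminal_possibilities.
    Codons wrap around when exhausted (as in standard GE)."""
    symbols = [start]
    i = 0  # next codon to consume
    for _ in range(max_steps):
        nts = [j for j, s in enumerate(symbols) if s in GRAMMAR]
        if not nts:
            return "".join(symbols)  # derivation is complete
        j = nts[0]  # leftmost non-terminal
        options = GRAMMAR[symbols[j]]
        if len(options) == 1:
            choice = options[0]  # single possibility: no codon consumed
        else:
            choice = options[codons[i % len(codons)] % len(options)]
            i += 1
        symbols[j:j + 1] = choice
    return None  # derivation did not terminate within max_steps

print(ge_decode([0, 1, 0, 1, 1]))  # → x+y
```

The `max_steps` cap stands in for the failure case the text describes: with a recursive grammar, a codon string may never terminate even with wrapping.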
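The SGE representation of Section II-B can likewise be sketched: one gene (a list of integers) per non-terminal, each value already in [0, non_terminal_possibilities − 1], so no modulus is needed and each codon is tied to a specific non-terminal. This toy grammar is an assumption for illustration; recursion is taken to have been removed beforehand via the intermediate-symbol expansion described in the text, and gene sizes to match the pre-computed maximum number of expansions.

```python
# Illustrative non-recursive toy grammar (recursion assumed pre-expanded
# into intermediate symbols, as SGE's grammar pre-processing requires).
GRAMMAR = {
    "<expr>": [["<var>", "+", "<var>"], ["<var>"]],
    "<var>": [["x"], ["y"]],
}

def sge_decode(genotype, start="<expr>"):
    """Genotype is a dict: non-terminal -> gene (list of option indices).
    Each expansion of a non-terminal reads the next integer from that
    non-terminal's own gene; values are used as-is, with no modulus."""
    pos = {nt: 0 for nt in GRAMMAR}  # read pointer per gene

    def expand(symbol):
        if symbol not in GRAMMAR:
            return symbol  # terminal symbol
        gene = genotype[symbol]
        choice = GRAMMAR[symbol][gene[pos[symbol]]]
        pos[symbol] += 1
        return "".join(expand(s) for s in choice)

    return expand(start)

# Gene sizes follow the maximum expansions per non-terminal here:
# <expr> is expanded at most once, <var> at most twice.
print(sge_decode({"<expr>": [0], "<var>": [0, 1]}))  # → x+y
```

Because a mutation of one integer only changes the corresponding choice for its own non-terminal, this sketch also illustrates the locality argument: the rest of the derivation is unaffected.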