Towards the Evolution of Multi-Layered Neural Networks: A Dynamic Structured Grammatical Evolution Approach

Filipe Assunção, Nuno Lourenço, Penousal Machado, Bernardete Ribeiro
CISUC, Department of Informatics Engineering, University of Coimbra, Portugal
{fga,naml,machado,bribeiro}@dei.uc.pt

ABSTRACT

Current grammar-based approaches have several shortcomings. On the one hand, they do not allow the generation of Artificial Neural Networks (ANNs) composed of more than one hidden-layer. On the other, there is no way to evolve networks with more than one output neuron. To properly evolve ANNs with more than one hidden-layer and multiple output nodes there is the need to know the number of neurons available in previous layers. In this paper we introduce Dynamic Structured Grammatical Evolution (DSGE): a new genotypic representation that overcomes the aforementioned limitations. By enabling the creation of dynamic rules that specify the connection possibilities of each neuron, the methodology enables the evolution of multi-layered ANNs with more than one output neuron. Results in different classification problems show that DSGE evolves effective single and multi-layered ANNs, with a varying number of output neurons.

CCS CONCEPTS
• Computing methodologies → Neural networks; Genetic programming; Supervised learning by classification; • Theory of computation → Genetic programming;

KEYWORDS
NeuroEvolution, Artificial Neural Networks, Classification, Grammar-based Genetic Programming

ACM Reference format:
Filipe Assunção, Nuno Lourenço, Penousal Machado, Bernardete Ribeiro. 2017. Towards the Evolution of Multi-Layered Neural Networks: A Dynamic Structured Grammatical Evolution Approach. In Proceedings of GECCO '17, Berlin, Germany, July 15-19, 2017, 8 pages. DOI: http://dx.doi.org/10.1145/3071178.3071286

1 INTRODUCTION

Machine Learning (ML) approaches, such as Artificial Neural Networks (ANNs), are often used to learn how to distinguish between multiple classes of a given problem. However, to reach near-optimal classifiers a laborious process of trial-and-error is needed to hand-craft and tune the parameters of ML methodologies. In the specific case of ANNs there are at least two manual steps that need to be considered: (i) the definition of the topology of the network, i.e., the number of hidden-layers, the number of neurons of each hidden-layer, and how the layers should be connected to each other; and (ii) the choice and parameterisation of the learning algorithm that is used to tune the weights and bias of the network connections (e.g., initial weights distribution and learning rate).

Evolutionary Artificial Neural Networks (EANNs), or NeuroEvolution, refers to methodologies that aim at the automatic search and optimisation of the ANNs' parameters using Evolutionary Computation (EC). With the popularisation of Deep Learning (DL) and the need for ANNs with a larger number of hidden-layers, NeuroEvolution has been vastly used in recent works [19].

The goal of the current work is the proposal of a novel Grammar-based Genetic Programming (GGP) methodology for evolving the topologies, weights and bias of ANNs. We rely on a GGP methodology because it allows the direct encoding of different topologies for solving distinct problems in a plug-and-play fashion, requiring the user to set a grammar capable of describing the parameters of the ANN to be optimised. One of the shortcomings of previous GGP methodologies applied to NeuroEvolution is the fact that they are limited to the evolution of network topologies with one hidden-layer. The proposed approach, called Dynamic Structured Grammatical Evolution (DSGE), solves this constraint.

The remainder of the document is organised as follows: in Section 2 we detail Structured Grammatical Evolution (SGE) and survey the state of the art on NeuroEvolution; then, in Sections 3 and 4, DSGE and its adaptation to the evolution of multi-layered ANNs are presented, respectively; the results and the comparison of DSGE with other GGP approaches are presented in Section 5; finally, in Section 6, conclusions are drawn and future work is addressed.

2 BACKGROUND AND STATE OF THE ART

In the following sub-sections we present SGE, which serves as the base for the development of DSGE. Then, we survey EC approaches for the evolution of ANNs.

2.1 Structured Grammatical Evolution

Structured Grammatical Evolution (SGE) was proposed by Lourenço et al. [11] as a new genotypic representation for Grammatical Evolution (GE) [13]. The new representation aims at solving the redundancy and locality issues of GE, and consists of a list of genes, one for each non-terminal symbol. Furthermore, a gene is a list of integers of the size of the maximum number of possible expansions for the non-terminal it encodes; each integer is a value in the interval [0, non_terminal_possibilities - 1], where non_terminal_possibilities is the number of possibilities for the expansion of the considered non-terminal symbol. Consequently, there is the need to pre-process the input grammar to find the maximum number of expansions for each non-terminal (further detailed in Section 3.1 of [11]).

To deal with recursion, a set of intermediate symbols is created, which are no more than the same production rule replicated a pre-defined number of times (recursion level). The direct result of the SGE representation is that, on the one hand, the decoding of individuals no longer depends on the modulus operation and thus its redundancy is removed; on the other hand, locality is enhanced, as codons are associated with specific non-terminals.

The genotype to phenotype mapping procedure is similar to GE, with two main differences: (i) the modulus operation is not used; and (ii) integers are not read sequentially from a single list of integers, but are rather read sequentially from the lists associated to each non-terminal symbol. An example of the mapping procedure is detailed in Section 3 of [11].

<start>  ::= <float>                   (0)
<float>  ::= <first>.<second>          (0)
<first>  ::= 0 (0) | 1 (1) | 2 (2)
<second> ::= <digit><second> (0)
           | <digit>          (1)
<digit>  ::= 0 (0) | ... | 9 (9)

<start>: [0]   <float>: [0]   <first>: [1]   <second>: [0, 0, 1]   <digit>: [2, 5, 9]

Figure 1: Example of a grammar (top) and of a genotype of a candidate solution (bottom).
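To make the difference between the two representations concrete, the following minimal Python sketch (not taken from [11] or [13]; the grammar encoding and function names are ours) decodes the grammar of Figure 1 both in the GE style, where a single flat list of codons is read with the modulus operation, and in the SGE/DSGE style, where each non-terminal owns its own list of integers.

# A minimal sketch: GE-style vs. SGE-style decoding of the grammar of Figure 1.

GRAMMAR = {
    "<start>":  [["<float>"]],
    "<float>":  [["<first>", ".", "<second>"]],
    "<first>":  [["0"], ["1"], ["2"]],
    "<second>": [["<digit>", "<second>"], ["<digit>"]],
    "<digit>":  [[str(d)] for d in range(10)],
}

def ge_decode(codons, symbol="<start>", state=None):
    # GE: consume the next codon of a single list (wrapping when it is exhausted)
    # and map it to an expansion with codon % number_of_choices.
    state = state if state is not None else {"i": 0}
    choices = GRAMMAR[symbol]
    codon = codons[state["i"] % len(codons)]
    state["i"] += 1
    out = ""
    for sym in choices[codon % len(choices)]:
        out += ge_decode(codons, sym, state) if sym in GRAMMAR else sym
    return out

def sge_decode(genotype, symbol="<start>", read=None):
    # SGE/DSGE: read the next integer of the gene tied to this non-terminal;
    # no modulus is needed, and changing a codon always changes the derivation.
    read = read if read is not None else {s: 0 for s in GRAMMAR}
    choice = genotype[symbol][read[symbol]]
    read[symbol] += 1
    out = ""
    for sym in GRAMMAR[symbol][choice]:
        out += sge_decode(genotype, sym, read) if sym in GRAMMAR else sym
    return out

if __name__ == "__main__":
    genotype = {"<start>": [0], "<float>": [0], "<first>": [1],
                "<second>": [0, 0, 1], "<digit>": [2, 5, 9]}
    print(sge_decode(genotype))                    # genotype of Figure 1 -> 1.259
    print(ge_decode([0, 0, 1, 0, 2, 0, 5, 1, 9]))  # one possible GE encoding -> 1.259

In the GE decoder several different codon values map to the same expansion (redundancy), and a codon's meaning depends on where the reading head happens to be; in the SGE-style decoder each integer is bound to one non-terminal, which is the locality property discussed above.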
2.2 NeuroEvolution

Works focusing on the automatic generation of ANNs are often grouped according to the aspects of the ANN they optimise: (i) evolution of the network parameters; (ii) evolution of the topology; and (iii) evolution of both the topology and parameters.

The gradient descent nature of learning algorithms, such as backpropagation, makes them likely to get trapped in local optima. One way to overcome this limitation is by using EC to tune the ANN weights and bias values. Two types of representation are commonly used: binary [21] or real (e.g., CoSyNE [6], Gravitational Swarm and Particle Swarm Optimization applied to OCR [4], training of deep neural networks [3]).

When training ANNs using EC the topology of the network is provided and is kept fixed during evolution. Nevertheless, approaches tackling the automatic evolution of the topology of ANNs have also been proposed. In this type of approach, to find the adequate weights some authors use an off-the-shelf learning methodology (e.g., backpropagation), while others evolve both the weights and the topology simultaneously.

Regarding the representation used for the evolution of the topology of ANNs it is possible to divide the methodologies into two main types: those that use direct encodings (e.g., Topology-optimization Evolutionary Neural Network [12], NeuroEvolution of Augmenting Topologies [17]) and those that use indirect encodings (e.g., Cellular Encoding [8]). In the first two works the genotype is a direct representation of the network, whereas in the latter a mapping procedure has to be applied to transform the genotype into an interpretable network. Focusing on indirect representations, in the next section we further detail grammar-based approaches.

2.3 Grammar-based NeuroEvolution

Over the last years several approaches applying GE to NeuroEvolution have been proposed. However, the works that focus on the evolution of the networks' topology are limited to generating one-hidden-layered ANNs, because of the difficulties in tracking the number of neurons available in previous layers.

Tsoulos et al. [20] and Ahmadizar et al. [1] describe two different approaches based on GE for the evolution of both the topology and parameters of one-hidden-layered ANNs. While the first evolves the topology and weights using GE, the latter combines GE with a Genetic Algorithm (GA): GE is applied to the evolution of the topology and the GA is used for searching the real values (i.e., weights and bias). The use of a GA to optimise the real values is motivated by the fact that GE is not suited for the evolution and tuning of real values. For that reason, Soltanian et al. [16] just use GE to optimise the topology of the network, and rely on the backpropagation algorithm to train the evolved topologies.

Although GE is the most common approach for the evolution of ANNs by means of grammars it is not the only one. Si et al. [14] present Grammatical Swarm Neural Networks (GSNN): an approach that uses Grammatical Swarm for the evolution of the weights and bias of a fixed ANN topology, and thus EC is used just for the generation of real numbers. In [9], Jung and Reggia detail a method for searching adequate topologies of ANNs based on descriptive encoding languages: a formal way of defining the environmental space and how individuals should be formed; the Resilient Backpropagation (RPROP) algorithm is used for training the ANNs. SGE has also been used for the evolution of the topology and weights of ANNs [2].

3 DYNAMIC STRUCTURED GRAMMATICAL EVOLUTION

Dynamic Structured Grammatical Evolution (DSGE) is our novel GGP approach. With the proposed methodology the gain is twofold: (i) all the genotype is used, i.e., while in GE and SGE the genotype encodes the largest allowed sequence, in DSGE the genotype grows as needed; and (ii) there is no need to pre-process the grammar in order to compute the maximum tree-sizes of each non-terminal symbol, and thus no intermediate grammar derivation rules need to be created. In the next sections we describe in detail the components that make these gains possible.
3.1 Representation

Each candidate solution encodes an ordered sequence of the derivation steps of the used grammar that are needed to generate a specific solution for the problem at hand. The representation is similar to the one used in SGE, with one main difference: instead of computing and generating the maximum number of derivations for each of the grammar's non-terminal symbols, a variable length representation is used, where just the number of needed derivations is encoded. Consequently, there is no need to create intermediate symbols to deal with recursive rules.

To limit the genotype size, a maximum depth value is defined for each non-terminal symbol. Allowing different limits for each non-terminal symbol provides an intuitive and flexible way of constraining the search space by limiting, for instance, the maximum number of hidden-layers, neurons or connections.

Figure 1 represents an example grammar for the generation of real numbers in the [0, 3[ interval, along with the representation of a candidate solution encoding the number 1.259. The genotype is encoded as a list of genes, where each gene encodes, as a list of integers, an ordered sequence of choices for expanding a given non-terminal symbol. The genotype to phenotype mapping is detailed in Section 3.3.

Algorithm 1 Random candidate solution creation.
 1: procedure create_individual(grammar, max_depth, genotype, symbol, depth)
 2:   expansion = randint(0, len(grammar[symbol])-1)
 3:   if is_recursive(symbol) then
 4:     if expansion in grammar.recursive(symbol) then
 5:       if depth >= max_depth[symbol] then
 6:         non_rec = grammar.non_recursive(symbol)
 7:         expansion = choice(non_rec)
 8:   if symbol in genotype then
 9:     genotype[symbol].append(expansion)
10:   else
11:     genotype[symbol] = [expansion]
12:   expansion_symbols = grammar[symbol][expansion]
13:   for sym in expansion_symbols do
14:     if not is_terminal(sym) then
15:       if symbol == sym then
16:         create_individual(grammar, max_depth, genotype, symbol, depth+1)
17:       else
18:         create_individual(grammar, max_depth, genotype, sym, 0)

Algorithm 2 Genotype to phenotype mapping procedure.
 1: procedure mapping(genotype, grammar, max_depth, read_integers, symbol, depth)
 2:   phenotype = ""
 3:   if symbol not in read_integers then
 4:     read_integers[symbol] = 0
 5:   if symbol not in genotype then
 6:     genotype[symbol] = []
 7:   if read_integers[symbol] >= len(genotype[symbol]) then
 8:     if depth >= max_depth[symbol] then
 9:       generate_terminal_expansion(genotype, symbol)
10:     else
11:       generate_expansion(genotype, symbol)
12:   gen_int = genotype[symbol][read_integers[symbol]]
13:   expansion = grammar[symbol][gen_int]
14:   read_integers[symbol] += 1
15:   for sym in expansion do
16:     if is_terminal(sym) then
17:       phenotype += sym
18:     else
19:       if symbol == sym then
20:         phenotype += mapping(genotype, grammar, max_depth, read_integers, sym, depth+1)
21:       else
22:         phenotype += mapping(genotype, grammar, max_depth, read_integers, sym, 0)
23:   return phenotype

3.2 Initialisation

Algorithm 1 details the recursive function that is used to generate each candidate solution. The input parameters are: the grammar that describes the domain of the problem; the maximum depth of each non-terminal symbol; the genotype (which is initially empty); the non-terminal symbol that we want to expand (initially the start symbol is used); and the current sub-tree depth (initialised to 0). Then, for the non-terminal symbol given as input, one of the possible derivation rules is selected (lines 2-11) and the non-terminal symbols of the chosen derivation rule are recursively expanded (lines 12-18). However, when selecting the expansion rule there is the need to check whether or not the maximum sub-tree depth has already been reached (lines 3-5). If that happens, only non-recursive derivation rules can be selected for expanding the current non-terminal symbol (lines 6-7). This procedure is repeated until an initial population with the desired size is created.

3.3 Mapping Function

To map the candidate solutions' genotype into the phenotype that will later be interpreted as an ANN we use Algorithm 2. The algorithm is similar to the one used to generate the initial population but, instead of randomly selecting the derivation rule to use in the expansion of the non-terminal symbol, we use the choice that is encoded in the individual's genotype (lines 12-22). During evolution the genetic operators may change the genotype in a way that requires a larger number of integers than the ones available. When this happens, the following genotype repair procedure is applied: new derivation rules are selected at random and added to the genotype of the individual (lines 3-11). In addition to returning the phenotype, the algorithm also keeps track of which genotype integers are used, which will later help in the application of the genetic operators. Table 1 shows an example of the mapping procedure applied to the genotype of Figure 1.
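The following self-contained Python sketch illustrates Algorithms 1 and 2 on the grammar of Figure 1. It is an illustration of the procedure described above rather than the authors' implementation: the helper routines of the pseudocode (is_recursive, generate_expansion, and so on) are folded into the functions, and the per-symbol depth limits are arbitrary example values.

# Sketch of DSGE initialisation (Algorithm 1) and mapping (Algorithm 2).

import random

GRAMMAR = {
    "<start>":  [["<float>"]],
    "<float>":  [["<first>", ".", "<second>"]],
    "<first>":  [["0"], ["1"], ["2"]],
    "<second>": [["<digit>", "<second>"], ["<digit>"]],
    "<digit>":  [[str(d)] for d in range(10)],
}
MAX_DEPTH = {"<start>": 1, "<float>": 1, "<first>": 1, "<second>": 3, "<digit>": 1}  # example limits

def recursive_options(symbol):
    # Indices of the derivation options that reference the symbol itself.
    return [i for i, exp in enumerate(GRAMMAR[symbol]) if symbol in exp]

def choose_expansion(symbol, depth):
    # Random choice, restricted to non-recursive options once the maximum depth is
    # reached (falling back to any option if no non-recursive one exists).
    options = list(range(len(GRAMMAR[symbol])))
    if depth >= MAX_DEPTH[symbol]:
        non_rec = [i for i in options if i not in recursive_options(symbol)]
        options = non_rec or options
    return random.choice(options)

def create_individual(genotype, symbol="<start>", depth=0):
    # Algorithm 1: the genotype grows only with the integers that are actually needed.
    expansion = choose_expansion(symbol, depth)
    genotype.setdefault(symbol, []).append(expansion)
    for sym in GRAMMAR[symbol][expansion]:
        if sym in GRAMMAR:
            create_individual(genotype, sym, depth + 1 if sym == symbol else 0)
    return genotype

def mapping(genotype, read, symbol="<start>", depth=0):
    # Algorithm 2: reuse the encoded choices; if the operators left too few integers,
    # repair the genotype by sampling a new valid expansion.
    read.setdefault(symbol, 0)
    genotype.setdefault(symbol, [])
    if read[symbol] >= len(genotype[symbol]):
        genotype[symbol].append(choose_expansion(symbol, depth))
    expansion = genotype[symbol][read[symbol]]
    read[symbol] += 1
    phenotype = ""
    for sym in GRAMMAR[symbol][expansion]:
        if sym in GRAMMAR:
            phenotype += mapping(genotype, read, sym, depth + 1 if sym == symbol else 0)
        else:
            phenotype += sym
    return phenotype

if __name__ == "__main__":
    random.seed(0)
    individual = create_individual({})
    print(individual, "->", mapping(individual, {}))
    fig1 = {"<start>": [0], "<float>": [0], "<first>": [1],
            "<second>": [0, 0, 1], "<digit>": [2, 5, 9]}
    print(mapping(fig1, {}))   # genotype of Figure 1 -> 1.259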
3.4 Genetic Operators

To explore the problem's domain and therefore promote evolution we rely on mutation and crossover.

Mutation is restricted to integers that are used in the genotype to phenotype mapping and changes a randomly selected expansion option (encoded as an integer) to another valid one, subject to the restrictions on the maximum sub-tree depth. To do so, we first select one gene; the probability of selecting the i-th gene (p_i) is proportional to the number of integers of that non-terminal symbol that are used in the genotype to phenotype mapping (read_integers):

    p_i = read_integers_i / (Σ_{j=1}^{n} read_integers_j),

where n is the total number of genes. Additionally, genes for which there is just one possibility for expansion (e.g., <start> or <float> of the grammar of Figure 1) are not considered for mutation purposes. After selecting the gene to be mutated, we randomly select one of its integers and replace it with another valid possibility. Considering the genotype of Figure 1, a possible result of the application of the mutation operator is [[0], [0], [2], [0, 0, 1], [2, 5, 9]], which represents the number 2.259.

Crossover is used to recombine two parents (selected using tournament selection) to generate two offspring. We use one-point crossover: after selecting the cutting point (at random) the genetic material is exchanged between the two parents. The choice of the cutting point is done at the gene level and not at the integer level, i.e., what is exchanged between parents are genes and not the integers. Given two parents [[0], [0], [2], | [0, 0, 1], [2, 5, 9]] and [[0], [0], [1], | [0, 0, 0, 1], [1, 0, 2, 4]], where | denotes the cutting point, the generated offspring would be [[0], [0], [2], [0, 0, 0, 1], [1, 0, 2, 4]] and [[0], [0], [1], [0, 0, 1], [2, 5, 9]].
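As an illustration of the operators just described, the sketch below (our own code, using the same per-non-terminal genotype as the previous listing) implements gene-level one-point crossover and a mutation that only touches integers counted in read_integers; the parents of the crossover example above are used as input.

# Sketch of the DSGE variation operators.

import random

N_OPTIONS = {"<start>": 1, "<float>": 1, "<first>": 3, "<second>": 2, "<digit>": 10}
SYMBOLS = list(N_OPTIONS)   # fixed gene order

def mutate(genotype, read_integers):
    # Select a gene with probability proportional to its number of used integers;
    # genes with a single expansion possibility (<start>, <float>) are skipped.
    candidates = [s for s in SYMBOLS if N_OPTIONS[s] > 1 and read_integers.get(s, 0) > 0]
    symbol = random.choices(candidates, weights=[read_integers[s] for s in candidates])[0]
    index = random.randrange(read_integers[symbol])        # only used integers mutate
    new = random.choice([v for v in range(N_OPTIONS[symbol]) if v != genotype[symbol][index]])
    genotype[symbol][index] = new
    return genotype

def crossover(parent1, parent2):
    # One-point crossover at the gene level: whole per-symbol lists are exchanged.
    cut = random.randrange(1, len(SYMBOLS))
    child1 = {s: list((parent1 if i < cut else parent2)[s]) for i, s in enumerate(SYMBOLS)}
    child2 = {s: list((parent2 if i < cut else parent1)[s]) for i, s in enumerate(SYMBOLS)}
    return child1, child2

if __name__ == "__main__":
    random.seed(1)
    p1 = {"<start>": [0], "<float>": [0], "<first>": [2], "<second>": [0, 0, 1], "<digit>": [2, 5, 9]}
    p2 = {"<start>": [0], "<float>": [0], "<first>": [1], "<second>": [0, 0, 0, 1], "<digit>": [1, 0, 2, 4]}
    print(crossover(p1, p2))
    used = {"<start>": 1, "<float>": 1, "<first>": 1, "<second>": 3, "<digit>": 3}
    print(mutate({k: list(v) for k, v in p1.items()}, used))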

Table 1: Example of the mapping procedure. Each row represents a derivation step. The list of genes contains the integers needed for expanding <start>, <float>, <first>, <second> and <digit>, respectively.

Derivation step     | Integers left
<start>             | [[0], [0], [1], [0, 0, 1], [2, 5, 9]]
<float>             | [[], [0], [1], [0, 0, 1], [2, 5, 9]]
<first>.<second>    | [[], [], [1], [0, 0, 1], [2, 5, 9]]
1.<second>          | [[], [], [], [0, 0, 1], [2, 5, 9]]
1.<digit><second>   | [[], [], [], [0, 1], [2, 5, 9]]
1.2<second>         | [[], [], [], [0, 1], [5, 9]]
1.2<digit><second>  | [[], [], [], [1], [5, 9]]
1.25<second>        | [[], [], [], [1], [9]]
1.25<digit>         | [[], [], [], [], [9]]
1.259               | [[], [], [], [], []]

Figure 2: Grammar used in [2]. n represents the number of features of the problem.

3.5 Fitness Evaluation

To enable the comparison of the evolved ANNs with the results from previous work [2], the performance is measured as the Root Mean Squared Error (RMSE) obtained while solving a classification task. In addition, to better deal with unbalanced datasets, this metric considers the RMSE per class, and the resulting fitness function is the multiplication of the exponential values of the multiple RMSEs per class, as follows:

    fitness = ∏_{c=1}^{m} exp( sqrt( Σ_{i=1}^{n_c} (o_i - t_i)² / n_c ) ),

where m is the number of classes of the problem, n_c is the number of instances of the problem that belong to class c, o_i is the confidence value predicted by the evolved network, and t_i is the target value. This way, higher errors are penalised more than lower ones, helping the evolved networks to better generalise to unseen data.
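A direct Python transcription of this fitness function (lower is better) is given below; the function name and the toy confidence values are ours, not the paper's.

# Sketch of the per-class exponential RMSE fitness of Section 3.5.

import math

def fitness(confidences, targets):
    # Group the instances by class (target value) and multiply exp(RMSE_c).
    value = 1.0
    for c in sorted(set(targets)):
        idx = [i for i, t in enumerate(targets) if t == c]
        mse = sum((confidences[i] - targets[i]) ** 2 for i in idx) / len(idx)
        value *= math.exp(math.sqrt(mse))
    return value

if __name__ == "__main__":
    # A perfect classifier scores exp(0) * exp(0) = 1; errors are amplified per class.
    print(fitness([0.9, 0.8, 0.2, 0.4], [1, 1, 0, 0]))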
4 EVOLUTION OF MULTI-LAYERED ANNS

Figure 2 shows the grammar that was used in [2], which is similar to those used in other works focusing on the grammar-based evolution of ANNs [1, 16, 20]. The rationale behind the design of this grammar is the evolution of networks composed of one hidden-layer, where only the hidden-neurons, as well as the weights of the connections from the input and to the output neurons, are evolved.

Three major drawbacks can be pointed out in the previous grammatical formulation: (i) it only allows the generation of networks with one hidden-layer; (ii) the output neuron is always connected to all neurons in the hidden-layer; and (iii) there is no way to define multiple output nodes, at least not ones that reuse the hidden-nodes instead of creating new ones.

These drawbacks are related to limitations of the evolutionary engines that are overcome by the approach presented in this paper. Figure 3 represents a grammar capable of representing multi-layered ANNs, with a variable number of neurons in each layer. However, there is no way of knowing how many neurons are in the previous layer, so that the established connections are valid. In the next sections we detail the adaptations we introduce to allow the evolution of multi-layered networks.

4.1 Dynamic Production Rules

To know the neurons available in each hidden-layer we create new production rules at run-time. More precisely, considering the grammar of Figure 3, for each i-th hidden-layer non-terminal symbol we create a features production rule, encoding the features that layer can use as input.

For the first hidden-layer, the corresponding features rule has the available features as expansion possibilities, i.e., the ones that are initially defined in the grammar (x1, ..., xn, where n represents the number of features). Then, for the next hidden-layers there are two possibilities: (i) let the connections be established to all the neurons in previous layers (including input neurons); or (ii) limit the connections to the neurons in the previous layer.

In the current work we have decided for the first option, with the restriction that the neurons in the output layer can only be connected to hidden-nodes. When establishing the connections between neurons, the probability of choosing a neuron in the previous layer is proportional to the number of previous hidden-layers. More specifically:

    P(neuron_{i-1}) = Σ_{j=1}^{i-2} P(neuron_j),

i.e., when establishing the connections of the i-th layer, the probability of linking to a neuron in the previous layer (i - 1) is equal to the probability of linking to a neuron in the remaining layers (1, ..., i - 2). The rationale is to minimise the emergence of deep networks with useless neurons, in the sense that they are not connected (directly or indirectly) to output nodes.

For example, using the grammar of Figure 3, when generating an ANN with two hidden-layers, with 4 and 3 nodes, respectively, the following production rules are created and added to the grammar:

    <features-1> ::= x1 | ... | xn
    <features-2> ::= h1,1 | h1,2 | h1,3 | h1,4
    <features-3> ::= h2,1 | h2,2 | h2,3,

where x represents the input features, n the total number of input features, and hi,j the output of the j-th neuron of the i-th layer.

4.2 Evolutionary Engine Components

To cope with dynamic derivation rules some of the evolutionary components detailed in Section 3 are adapted. The representation of the candidate solutions is kept the same, with additional genes for each of the dynamic rules (e.g., <features-2>) that are created. The number of dynamic rules can be different from individual to individual, making the grammar a property of the individual.

When mapping the genotype to the phenotype, if during the expansion of a non-terminal symbol a dynamic rule is called, it is necessary to know which one of the possibilities should be used. For example, in the example provided at the end of Section 4.1, when expanding the features non-terminal symbol we need to know which dynamic rule it corresponds to: <features-1>, <features-2> or <features-3>. Thus, we add a num_layer parameter to Algorithm 2, which is initialised at zero and incremented by one each time a new hidden-layer non-terminal symbol is expanded. Then, when the features non-terminal symbol is read, we expand it based on the derivation rule <features-num_layer>. However, before applying the expansion possibility it is necessary to check whether it is valid. The genetic operators may have changed the number of neurons in one of the layers, or even the number of existing layers. Consequently, when the mapping procedure is applied, the genotype may be encoding a connection to a neuron in a previous layer that has been erased. In such a scenario, the genotype is fixed by replacing the faulty connection with a valid one. The dynamic rule is also corrected.

Finally, it is important to mention that the number of genes can now vary from individual to individual, as a consequence of the potentially different number of hidden-layers. Therefore, in the application of the crossover operator, we only allow the exchange of the genes that are common to both parents.
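The sketch below shows one possible way to build such run-time rules for a network with a given number of inputs and hidden-layer sizes. The rule names (<features-1>, <features-2>, ...) and the uniform draw among the earlier pools are assumptions on our part; only the contents of the pools and the 50/50 split between the previous layer and all earlier layers follow the description above.

# Sketch of the run-time creation of per-layer feature pools (Section 4.1).

import random

def dynamic_feature_rules(num_features, layer_sizes):
    # <features-1> lists the input features; <features-(i+1)> lists the outputs of
    # hidden-layer i, so connections can be drawn from any earlier pool.
    rules = {"<features-1>": ["x%d" % (i + 1) for i in range(num_features)]}
    for layer, size in enumerate(layer_sizes, start=1):
        rules["<features-%d>" % (layer + 1)] = ["h%d,%d" % (layer, j + 1) for j in range(size)]
    return rules

def choose_source_pool(layer):
    # When wiring layer i, pick the pool of the immediately preceding layer with the
    # same probability as all earlier pools (inputs included) combined; the earlier
    # pools are drawn uniformly here, which is only one possible reading of the rule.
    if layer == 1 or random.random() < 0.5:
        return "<features-%d>" % layer
    return "<features-%d>" % random.randint(1, layer - 1)

if __name__ == "__main__":
    # Four inputs and hidden-layers of 4 and 3 nodes (cf. the example above).
    for head, options in dynamic_feature_rules(4, [4, 3]).items():
        print(head, "::=", " | ".join(options))
    print(choose_source_pool(3))   # pool used for one connection of the third layer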

Figure 3: Grammar used for evolving multi-layered ANNs. n represents the number of features of the problem; - and -- are used as neuron and layer separators, respectively.

Table 2: Properties of the used datasets.

               | Flame   | WDBC    | Ionosphere | Sonar
Num. Features  | 2       | 30      | 34         | 60
Num. Instances | 240     | 569     | 351        | 208
Class 0        | 36.25%  | 62.74%  | 35.90%     | 53.37%
Class 1        | 63.75%  | 37.26%  | 64.10%     | 46.63%

5 EXPERIMENTAL RESULTS

The conducted experiments are divided into two steps. First, we use the same grammar as in [2] to check whether or not DSGE is capable of generating one-hidden-layered ANNs that perform better than those found using GE and SGE. Then, using a new grammatical formulation, we check whether or not DSGE is suitable for the evolution of multi-layered ANNs.

5.1 Datasets

We selected four binary classification problems. The problems have an increasing complexity in terms of the number of features available for the classification task that is to be performed (see Table 2). We focus on binary problems to enable comparison with other grammar-based approaches. In the next paragraphs we present a brief description of the datasets.

Flame [5] – This dataset contains artificially generated data for clustering purposes.

Wisconsin Breast Cancer Detection (WDBC) [10, 18] – The WDBC dataset is comprised of features extracted from digitalised images of breast masses that are to be classified into malign and benign.

Ionosphere [10, 15] – This benchmark is used for the classification of ionosphere radar returns, where the returns are classified into two different classes: good if they show evidence of structure, and bad otherwise.

Sonar [7, 10] – The sonar dataset contains features extracted from sonar signals that allow a classification model to separate between signals that are bounced off a metal cylinder or a rock cylinder.

5.2 Grammar

Two grammars are used in the conducted experiments. For the first set of experiments, targeting the evolution of one-hidden-layered ANNs, the same grammar of [2] is used (see Figure 2). This grammar allows the evolution of the topology and parameters of ANNs where all the hidden-nodes are connected to the output-node; the evolved topologies cannot have more than one hidden-layer and are restricted to just one output neuron.

Then we introduce the grammar detailed in Figure 3, which enables the generation of non fully-connected layers, with more than one hidden-layer and a varying number of neurons in each layer.
Note that it is not required that all neurons are directly or indirectly connected to the output nodes. Also, recall from Section 4.1 that the connections can be established to any input or hidden-node in previous layers, except for the connections of the output nodes, which can only be established to hidden-nodes. In the production rules that expand the hidden-layers, nodes and sums, a higher probability is given to the recursive expansion of the non-terminal symbol, due to the higher difficulty of tuning parameters in deeper and more complex topologies.

5.3 Experimental Setup

Table 3 details the experimental parameters used for the tests conducted with GE, SGE and DSGE. The parameters where 1 hidden-layer and ≥ 1 hidden-layers are mentioned refer to the experiments conducted with the grammars of Figures 2 and 3, respectively.

Table 3: Experimental parameters.

Parameter                              | Value
Num. runs                              | 30
Population size                        | 100
Num. generations (1 hidden-layer)      | 500
Num. generations (≥ 1 hidden-layers)   | 1500 / 3500
Crossover rate                         | 95%
Mutation rate                          | 1 mutation in 95% of the individuals
Tournament size                        | 3
Elite size                             | 1%

GE Parameter                           | Value
Individual size                        | 200
Wrapping                               | 0

SGE Parameter                          | Value
Recursion level                        | 6

DSGE Parameter                         | Value
Max. depth (1 hidden-layer)            | {sigexpr: 6, sum: 3}
Max. depth (≥ 1 hidden-layers)         | {hidden-layers: 3, nodes: 5, sum: 4}

Dataset Parameter                      | Value
Training percentage                    | 70%
Testing percentage                     | 30%

Figure 4: Fitness evolution of the best individuals across generations for the sonar dataset (fitness of GE, SGE and DSGE over 500 generations).

To make the exploration of the search space similar we only allow one mutation in 95% of the population individuals. In the search for multi-layered networks the domain is bigger and, consequently, we perform longer runs. For the experiments focusing on the generation of one-hidden-layered networks we restrict the domains so that similar networks are allowed in terms of the neurons and connections they can have: GE, SGE and one-hidden-layered DSGE are constrained to generating networks with up to 7 neurons and 8 connections per neuron. In the experiments targeting networks that can have more than one hidden-layer these limits are increased: the maximum number of neurons in each layer is 32, the maximum number of connections of each neuron is 16, and it is possible to generate networks with up to 8 hidden-layers.

The used datasets are all randomly partitioned in the same way: 70% of each class' instances are used for training and the remaining 30% for testing. We only measure the individuals' performance using the train data, and thus the test data is kept aside from the evolutionary process and exclusively used for validation purposes, i.e., to evaluate the behaviour of the networks in the classification of instances that have not been seen during the creation of the ANN. The datasets are used as they are obtained, i.e., no pre-processing or data augmentation methodologies are applied.
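For reference, the per-class 70/30 partitioning described above corresponds to a stratified split; a minimal example with scikit-learn (illustrative data shapes, not the authors' pipeline) is shown below.

# Sketch of a stratified 70/30 train/test split.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(208, 60))        # e.g., a sonar-sized feature matrix
y = rng.integers(0, 2, size=208)      # binary class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)    # 70% train, 30% test, preserving class ratios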
5.4 Evolution of One-Hidden-Layered ANNs

To investigate whether or not DSGE performs better than previous grammar-based approaches we focus on experiments for the evolution of one-hidden-layered ANNs. For that, we use the grammar of Figure 2.

Figure 4 shows the evolution of the fitness across generations for GE, SGE and DSGE on the sonar dataset. Because of space constraints we only present the fitness evolution results for the most challenging dataset. Results are averages of 30 independent runs. The figure clearly shows that DSGE outperforms the other methods, indicating that the new representation, where no mutation is silent, promotes locality and the efficient exploration of the search space.

Table 4 reports the results obtained with DSGE, which are compared with other grammar-based approaches: GE and SGE. Results are averages of the best network (in terms of fitness) of each of the evolutionary runs, and are formatted as follows: mean ± standard deviation. For all experiments we record the fitness, RMSE, Area Under the ROC Curve (AUROC) and f-measure. Except for the fitness metric, which is only computed over the train set, all metrics are calculated for the train and test sets. In addition, we also report the average number of neurons and used features.

An analysis of the results shows that DSGE performs better than the other approaches for all the datasets, and for all the metrics considered. Regarding the structure of the generated networks, DSGE is capable of finding solutions which are more complex in terms of the number of used neurons and input features. As the complexity of the networks grows more real values have to be tuned; nonetheless, DSGE is capable of performing such tuning, reaching solutions that, despite being more complex, perform better in the tested classification problems.

To verify whether the differences between the tested approaches are significant we perform a statistical analysis. In [2] we have already demonstrated that SGE is consistently statistically superior to GE. As such, we now focus on analysing whether DSGE is statistically superior to SGE. To check if the samples follow a Normal Distribution we use the Kolmogorov-Smirnov and Shapiro-Wilk tests, with a significance level of α = 0.05. The tests revealed that the data does not follow any distribution and, as such, a non-parametric test (Mann-Whitney U, α = 0.05) is used to perform the pairwise comparison for each recorded metric. Table 4 uses a graphical overview to present the results of the statistical analysis: ∼ indicates no statistical difference between DSGE and SGE, and + signals that DSGE is statistically superior to SGE. The effect size is denoted by the number of + signals, where +, ++ and +++ correspond respectively to low (0.1 ≤ r < 0.3), medium (0.3 ≤ r < 0.5) and large (r ≥ 0.5) effect sizes. The statistical analysis reveals that DSGE is consistently statistically superior to SGE: DSGE is never worse than SGE and is only equivalent in 5 situations. In all the remaining 31 comparisons DSGE is statistically superior, with a medium effect size in 11 occasions and a large effect size in 20 occasions.
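The statistical protocol can be reproduced with scipy as sketched below; the sample values are illustrative, and the r effect size is computed from the normal approximation of the U statistic (ignoring ties), which is one common way to obtain the thresholds used above.

# Sketch of the normality checks, Mann-Whitney U test and r effect size.

import numpy as np
from scipy import stats

def compare(sample_a, sample_b, alpha=0.05):
    for name, s in (("A", sample_a), ("B", sample_b)):
        ks_stat, ks_p = stats.kstest(s, "norm", args=(np.mean(s), np.std(s, ddof=1)))
        sw_stat, sw_p = stats.shapiro(s)
        print("sample %s normal? KS p=%.3f, SW p=%.3f" % (name, ks_p, sw_p))
    u_stat, p = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")
    n1, n2 = len(sample_a), len(sample_b)
    # z from the normal approximation of U; r >= 0.5 corresponds to a large effect.
    z = (u_stat - n1 * n2 / 2.0) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    r = abs(z) / np.sqrt(n1 + n2)
    print("Mann-Whitney U p=%.4f (alpha=%.2f), effect size r=%.2f" % (p, alpha, r))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dsge = rng.normal(1.73, 0.12, 30)   # e.g., fitness of 30 DSGE runs (illustrative)
    sge = rng.normal(1.85, 0.18, 30)    # e.g., fitness of 30 SGE runs (illustrative)
    compare(dsge, sge)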

Focusing on the comparison between train and test set results, in DSGE the differences between train and test performance are, on average, 0.09, 0.06, 0.04 and 0.06 for RMSE, accuracy, AUROC and f-measure, respectively. For SGE the differences are 0.08, 0.06, 0.04 and 0.06, and for GE 0.05, 0.04, 0.04 and 0.04, for the same metrics. Despite the fact that the differences in DSGE are larger than the ones in SGE and GE, it is our perception that this is a result of the better DSGE results and not an indicator of overfitting.

In [2] we showed that the results obtained with SGE are superior to those of other grammar-based approaches (namely, the ones described in [1, 16, 20]). Additionally, we showed that it is beneficial to evolve both the topology and weights of a network, since it yields better results than ANNs obtained by hand-crafting the topology of the networks and training them using backpropagation. As DSGE is statistically superior to SGE, it is then clear that it is also better than previous methods.

Table 4: Evolution of one-hidden-layered ANNs. Comparison between DSGE and other grammar-based approaches. Results are averages of 30 independent runs. + and ∼ represent the result of statistical tests (see text).

                           Flame          WDBC           Ionosphere     Sonar
Train
  Fitness      GE          1.58 ± 0.36    1.55 ± 0.18    1.82 ± 0.28    2.01 ± 0.23
               SGE         1.32 ± 0.25    1.46 ± 0.08    1.48 ± 0.18    1.85 ± 0.18
               DSGE        1.16 ± 0.13+++ 1.36 ± 0.06+++ 1.38 ± 0.13+++ 1.73 ± 0.12+++
  RMSE         GE          0.28 ± 0.15    0.24 ± 0.09    0.33 ± 0.10    0.38 ± 0.08
               SGE         0.16 ± 0.13    0.19 ± 0.03    0.21 ± 0.07    0.34 ± 0.06
               DSGE        0.08 ± 0.06+++ 0.15 ± 0.03+++ 0.17 ± 0.05++  0.30 ± 0.05++
  Accuracy     GE          0.90 ± 0.10    0.92 ± 0.12    0.79 ± 0.20    0.76 ± 0.16
               SGE         0.96 ± 0.08    0.95 ± 0.02    0.93 ± 0.11    0.84 ± 0.12
               DSGE        0.99 ± 0.03+++ 0.97 ± 0.01+++ 0.97 ± 0.03+++ 0.90 ± 0.04++
  AUROC        GE          0.96 ± 0.05    0.98 ± 0.02    0.86 ± 0.18    0.88 ± 0.06
               SGE         0.98 ± 0.04    0.99 ± 0.01    0.94 ± 0.04    0.91 ± 0.04
               DSGE        0.99 ± 0.01+++ 0.99 ± 0.00+++ 0.96 ± 0.03+++ 0.93 ± 0.04∼
  F-measure    GE          0.91 ± 0.09    0.88 ± 0.18    0.78 ± 0.32    0.70 ± 0.29
               SGE         0.96 ± 0.08    0.93 ± 0.03    0.93 ± 0.18    0.78 ± 0.23
               DSGE        0.99 ± 0.02+++ 0.96 ± 0.02+++ 0.98 ± 0.02+++ 0.88 ± 0.05++
Test
  RMSE         GE          0.31 ± 0.15    0.27 ± 0.09    0.38 ± 0.07    0.45 ± 0.06
               SGE         0.22 ± 0.13    0.23 ± 0.04    0.32 ± 0.05    0.44 ± 0.04
               DSGE        0.14 ± 0.08++  0.20 ± 0.03++  0.28 ± 0.04+++ 0.43 ± 0.04∼
  Accuracy     GE          0.88 ± 0.11    0.90 ± 0.12    0.76 ± 0.18    0.68 ± 0.12
               SGE         0.93 ± 0.09    0.93 ± 0.02    0.87 ± 0.10    0.73 ± 0.09
               DSGE        0.97 ± 0.05+++ 0.95 ± 0.02+++ 0.90 ± 0.03++  0.76 ± 0.05∼
  AUROC        GE          0.93 ± 0.08    0.97 ± 0.03    0.83 ± 0.09    0.81 ± 0.05
               SGE         0.96 ± 0.08    0.98 ± 0.02    0.90 ± 0.05    0.82 ± 0.05
               DSGE        0.99 ± 0.03++  0.98 ± 0.01++  0.93 ± 0.04++  0.83 ± 0.04∼
  F-measure    GE          0.89 ± 0.09    0.86 ± 0.18    0.76 ± 0.31    0.61 ± 0.25
               SGE         0.94 ± 0.09    0.91 ± 0.03    0.89 ± 0.17    0.64 ± 0.20
               DSGE        0.98 ± 0.04+++ 0.93 ± 0.03+++ 0.93 ± 0.02++  0.72 ± 0.06∼
Num. Neurons   GE          3.33 ± 1.40    3.13 ± 1.53    2.50 ± 1.41    2.53 ± 1.20
               SGE         4.87 ± 1.83    3.73 ± 1.53    3.53 ± 1.36    3.07 ± 1.39
               DSGE        6.47 ± 1.20    6.23 ± 1.58    5.97 ± 1.78    6.13 ± 1.69
Num. Features  GE          1.97 ± 0.18    8.40 ± 3.81    7.33 ± 5.33    9.40 ± 5.73
               SGE         2.00 ± 0.00    12.0 ± 6.51    12.1 ± 5.79    13.3 ± 6.42
               DSGE        2.00 ± 0.00    14.5 ± 3.52    13.3 ± 4.74    17.6 ± 4.66

Table 5: DSGE evolution of multi-layered ANNs. + and ∼ symbols represent the result of statistical tests, and have the same meaning as in Table 4.

                           Flame          WDBC           Ionosphere     Sonar
Train
  Fitness      1 H-L       1.16 ± 0.13    1.36 ± 0.06    1.38 ± 0.13    1.73 ± 0.12
               ≥ 1 H-Ls    1.15 ± 0.18∼   1.35 ± 0.13∼   1.36 ± 0.19∼   1.57 ± 0.18∼
               > 1 H-Ls    1.08 ± 0.13    1.32 ± 0.11    1.30 ± 0.11    1.53 ± 0.15
  RMSE         1 H-L       0.08 ± 0.06    0.15 ± 0.03    0.17 ± 0.05    0.30 ± 0.05
               ≥ 1 H-Ls    0.07 ± 0.08∼   0.15 ± 0.05∼   0.17 ± 0.08∼   0.26 ± 0.07∼
               > 1 H-Ls    0.07 ± 0.05    0.07 ± 0.04    0.14 ± 0.05    0.25 ± 0.06
  Accuracy     1 H-L       0.99 ± 0.03    0.97 ± 0.01    0.97 ± 0.03    0.90 ± 0.04
               ≥ 1 H-Ls    0.99 ± 0.02∼   0.97 ± 0.02∼   0.97 ± 0.03∼   0.92 ± 0.05∼
               > 1 H-Ls    0.99 ± 0.02    0.97 ± 0.02    0.98 ± 0.02    0.93 ± 0.04
  AUROC        1 H-L       0.99 ± 0.01    0.99 ± 0.00    0.96 ± 0.03    0.93 ± 0.04
               ≥ 1 H-Ls    0.99 ± 0.01∼   0.99 ± 0.01∼   0.96 ± 0.04∼   0.93 ± 0.05∼
               > 1 H-Ls    1.00 ± 0.01    0.99 ± 0.00    0.97 ± 0.02    0.93 ± 0.04
  F-measure    1 H-L       0.99 ± 0.02    0.96 ± 0.02    0.98 ± 0.02    0.88 ± 0.05
               ≥ 1 H-Ls    0.99 ± 0.02∼   0.96 ± 0.03∼   0.97 ± 0.02∼   0.91 ± 0.06∼
               > 1 H-Ls    0.99 ± 0.01    0.97 ± 0.03    0.98 ± 0.01    0.92 ± 0.05
Test
  RMSE         1 H-L       0.14 ± 0.08    0.20 ± 0.03    0.28 ± 0.04    0.43 ± 0.04
               ≥ 1 H-Ls    0.17 ± 0.08∼   0.20 ± 0.04∼   0.31 ± 0.05∼   0.44 ± 0.05∼
               > 1 H-Ls    0.16 ± 0.08    0.20 ± 0.04    0.29 ± 0.05    0.43 ± 0.06
  Accuracy     1 H-L       0.97 ± 0.05    0.95 ± 0.02    0.90 ± 0.03    0.76 ± 0.05
               ≥ 1 H-Ls    0.96 ± 0.04∼   0.95 ± 0.02∼   0.89 ± 0.03∼   0.77 ± 0.05∼
               > 1 H-Ls    0.97 ± 0.04    0.95 ± 0.02    0.90 ± 0.03    0.78 ± 0.06
  AUROC        1 H-L       0.99 ± 0.03    0.98 ± 0.01    0.93 ± 0.04    0.83 ± 0.04
               ≥ 1 H-Ls    0.98 ± 0.03∼   0.99 ± 0.01∼   0.91 ± 0.06∼   0.82 ± 0.07∼
               > 1 H-Ls    0.99 ± 0.02    0.99 ± 0.01    0.93 ± 0.03    0.82 ± 0.07
  F-measure    1 H-L       0.98 ± 0.04    0.93 ± 0.03    0.93 ± 0.02    0.72 ± 0.06
               ≥ 1 H-Ls    0.97 ± 0.03∼   0.93 ± 0.03∼   0.92 ± 0.02∼   0.73 ± 0.07∼
               > 1 H-Ls    0.97 ± 0.03    0.93 ± 0.03    0.92 ± 0.02    0.74 ± 0.08
Num. H-Ls      1 H-L       1.00 ± 0.00    1.00 ± 0.00    1.00 ± 0.00    1.00 ± 0.00
               ≥ 1 H-Ls    2.27 ± 0.98    2.23 ± 1.04    1.90 ± 0.84    2.37 ± 0.96
               > 1 H-Ls    2.81 ± 0.60    2.95 ± 0.52    2.50 ± 0.51    2.95 ± 1.12
Num. Neurons   1 H-L       3.33 ± 1.40    3.13 ± 1.53    2.50 ± 1.41    2.53 ± 0.38
               ≥ 1 H-Ls    10.6 ± 6.49    9.73 ± 7.03    7.17 ± 4.98    9.27 ± 4.55
               > 1 H-Ls    12.5 ± 6.64    12.8 ± 6.70    9.33 ± 5.24    10.9 ± 4.12
Num. Features  1 H-L       2.00 ± 0.00    14.5 ± 3.52    13.1 ± 6.87    19.8 ± 8.39
               ≥ 1 H-Ls    2.00 ± 0.00    16.3 ± 7.03    13.8 ± 6.66    21.8 ± 8.48
               > 1 H-Ls    2.00 ± 0.00    18.2 ± 6.91    15.8 ± 5.48    24.0 ± 7.54

5.5 Evolution of Multi-Layered ANNs

To search for ANNs that may have more than one hidden-layer we use the grammar of Figure 3.
rows (1 H-L and ≥ 1 H-Ls) it is also clear that the results are equiv- Œe experimental results are presented in Table 5. For each alent, which is con€rmed by a statistical analysis (Mann-Whitney, metric, the €rst two rows present the averages of the tests con- α = 0.05) that shows no statistical di‚erences. Focusing on the com- ducted with the grammars of Figures 2 and 3, respectively, i.e., the parison between one-hidden-layered ANNs and those that have grammars for encoding networks with just one hidden-layer (1 more than one hidden-layer (> 1 H-Ls) a small, but perceptive dif- H-L, from Table 4) or that allow the generation of networks with ference exists. Œat di‚erence is statistically signi€cant in some of multiple hidden-layers (≥ 1 H-Ls). Additionally, we analyse the the training metrics, and in none of the testing metrics. When there ANNs that have more than one hidden-layer (last row, > 1 H-Ls), is a statistical di‚erence the e‚ect size is always large. Moreover, GECCO ’17, July 15-19, 2017, Berlin, Germany Filipe Assunc¸ao˜ et al. the di‚erence is larger in the datasets that have more input features the ones reported in the current paper had to be used in order to and a greater margin for improvement. On the contrary, almost no compare DSGE with previous approaches. improvement is observable in the ƒame dataset, which is expected Future work will focus on the performance of experiments using as the problem only has two features, and thus an ANN with one more complex datasets (e.g., MNIST and CIFAR) and on the gener- hidden-layer already performs close to optimal. In a nutshell, al- alisation of the allowed layer types, so that it is possible to evolve though there are no statistical di‚erences between the performance networks that use, for example, convolution and/or pooling layers. of 1 H-L and ≥ 1 H-Ls runs, when the evolutionary process results in multi-layered ANNs the di‚erences begin to emerge. Œis result ACKNOWLEDGMENTS shows that DSGE is able to cope with the higher dimensionality Œis research is partially funded by: Fundac¸ao˜ para a Cienciaˆ e of the search space associated with the evolution of multi-layered Tecnologia (FCT), Portugal, under the grant SFRH/BD/114865/2016. topologies and indicates that in complex datasets, where there is a clear advantage in using deeper topologies, DSGE is likely to REFERENCES outperform other grammar-based approaches in train and test. [1] Fardin Ahmadizar, Khabat Soltanian, Fardin AkhlaghianTab, and Ioannis Tsoulos. Œe complexity of the generated ANNs (in terms of the number 2015. Arti€cial neural network development by means of a novel combination of grammatical evolution and genetic algorithm. Engineering Applications of of used neurons and hidden-layers) is greater when allowing the Arti€cial Intelligence 39 (2015), 1–13. generation of multi-layered ANNs. Œe di‚erence is even larger if [2] Filipe Assunc¸ao,˜ Nuno Lourenc¸o, Penousal Machado, and Bernardete Ribeiro. we consider only those networks that have more than one hidden- 2017. Automatic Generation of Neural Networks with Structured Grammatical Evolution. In 2017 IEEE Congress on (CEC). IEEE. layer. More neurons means more weights and bias values that have [3] Omid E David and Iddo Greental. 2014. Genetic algorithms for evolving deep to be evolved, making the evolutionary task more dicult. ANNs neural networks. In Proceedings of the 2014 Conf. companion on Genetic and evolutionary computation companion. ACM, 1451–1452. 
with fewer layers tend to have less neurons, and consequently are [4] Lucian-Ovidiu Fedorovici, Radu-Emil Precup, Florin Dragan, and Constantin bene€ted from an evolutionary point of view, as less real values Purcaru. 2013. Evolutionary optimization-based training of convolutional neu- need to be tuned. Œus, in future experiments it is our intention to ral networks for OCR applications. In System Šeory, Control and Computing (ICSTCC), 2013 17th International Conf. IEEE, 207–212. use the evolutionary approach to evolve the initial set of weights [5] Limin Fu and Enzo Medico. 2007. FLAME, a novel fuzzy clustering method for and then apply a €ne tuning stage (e.g, backpropagation or resilient the analysis of DNA microarray data. BMC bioinformatics 8, 1 (2007), 1. backpropagation) during a maximum number of epochs propor- [6] Faustino Gomez, Jurgen¨ Schmidhuber, and Risto Miikkulainen. 2008. Accelerated neural evolution through cooperatively coevolved synapses. Journal of Machine tional to the number of neurons and/or hidden-layers. As we allow Learning Research 9, May (2008), 937–965. hidden-nodes to connect to nodes in previous layers (including [7] R. Paul Gorman and Terrence J. Sejnowski. 1988. Analysis of hidden units in a layered network trained to classify sonar targets. Neural networks 1, 1 (1988), input features) the number of used features is also higher. 75–89. Preliminary experiments concerning the evolution of ANNs with [8] Frederic Gruau. 1992. Genetic synthesis of boolean neural networks with a cell more than one output neuron have also been conducted, i.e., in- rewriting developmental process. In International Workshop on Combinations of Genetic Algorithms and Neural Networks, COGANN-92. IEEE, 55–74. stead of using just one output neuron and the sigmoid function as [9] Jae-Yoon Jung and James A Reggia. 2006. Evolutionary design of neural net- activation function we used two output nodes (one per class) with work architectures using a descriptive encoding language. IEEE transactions on the so‰max activation function. Obtained performances are similar. evolutionary computation 10, 6 (2006), 676–688. [10] M. Lichman. 2013. UCI ML Repository. (2013). archive.ics.uci.edu/ml However, results are not presented due to the lack of space. [11] Nuno Lourenc¸o, Francisco B. Pereira, and Ernesto Costa. 2016. Unveiling the properties of structured grammatical evolution. Genetic Programming and Evolv- able Machines (2016), 1–39. [12] Miguel Rocha, Paulo Cortez, and Jose´ Neves. 2007. Evolution of neural networks for classi€cation and regression. Neurocomputing 70, 16 (2007), 2809–2816. 6 CONCLUSION AND FUTURE WORK [13] Conor Ryan, JJ Collins, and Michael O’Neill. 1998. Grammatical evolution: Evolving programs for an arbitrary language. Springer Berlin Heidelberg, Berlin, In this paper we propose DSGE: a new genotypic representation for Heidelberg, 83–96. SGE. Œe gain is two fold: (i) while in previous grammmar-based [14] Tapas Si, Arunava De, and Anup Kumar BhaŠacharjee. 2014. Grammatical representations, the genotype encodes the largest allowed sequence, Swarm for Arti€cial Neural Network Training. In International Conference on Circuit, Power and Computing Technologies (ICCPCT). IEEE, 1657–1661. in DSGE the genotype grows as needed; and (ii) there is no need to [15] Vincent G. Sigillito, Simon P. Wing, Larrie V. HuŠon, and Kile B. Baker. 1989. pre-process the grammar in order to expand recursive production Classi€cation of radar returns from the ionosphere using neural networks. 
Johns rules, as the maximum depth is de€ned for each sub-tree. Most Hopkins APL Technical Digest 10, 3 (1989), 262–266. [16] Khabat Soltanian, Fardin Akhlaghian Tab, Fardin Ahmadi Zar, and Ioannis Tsou- importantly, DSGE solves a limitation of other GGP methodologies los. 2013. Arti€cial neural networks generation using grammatical evolution. In by allowing the evolution of solutions to dynamic domains, such 21st Iranian Conference on Electrical Engineering (ICEE). IEEE, 1–5. [17] Kenneth O Stanley and Risto Miikkulainen. 2002. Evolving neural networks as ANNs, where there is the need to know the number of neurons through augmenting topologies. Evolutionary computation 10, 2 (2002), 99–127. available in previous layers so that valid individuals are generated. [18] W. Nick Street, William H. Wolberg, and Olvi L. Mangasarian. 1993. Nuclear Results show that DSGE is able to evolve the topology and feature extraction for breast tumor diagnosis. In IS&T/SPIE’s Symposium on Electronic Imaging: Science and Technology. International Society for Optics and weights of one-hidden-layered ANNs that perform statistically bet- Photonics, 861–870. ter than those evolved using GE or SGE. Moreover, the results are [19] Sreenivas Sremath Tirumala, S Ali, and C Phani Ramesh. 2016. Evolving deep also beŠer than those provided by hand-cra‰ed ANNs €netuned neural networks: A new prospect. In Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016 12th International Conf. on. IEEE, 69–74. using backpropagation, and than the ones generated using other [20] Ioannis Tsoulos, Dimitris Gavrilis, and Euripidis Glavas. 2008. Neural network GE-based approaches [1, 16, 20]. Results concerning the evolution construction and training using grammatical evolution. Neurocomputing 72, 1 (2008), 269–277. of multi-layered ANNs despite not statistical show that the method- [21] Darrell Whitley, Timothy Starkweather, and Christopher Bogart. 1990. Ge- ology is suitable for the evolution of accurate ANNs. Experiments netic algorithms and neural networks: Optimizing connections and connectivity. on more, and more dicult datasets are needed to beŠer analyse Parallel computing 14, 3 (1990), 347–361. the evolution of ANNs with more than one hidden-layer. However,