Evolution Strategies for Deep Neural Network Models Design
Petra Vidnerová, Roman Neruda
Institute of Computer Science, The Czech Academy of Sciences
[email protected]

Abstract: Deep neural networks have recently become the state-of-the-art methods in many fields of machine learning. Still, there is no easy way to choose a network architecture, even though this choice significantly influences the network performance.

This work is a step towards automatic architecture design. We propose an algorithm for the optimization of a network architecture based on evolution strategies. The algorithm is inspired by, and designed directly for, the Keras library [3], which is one of the most common implementations of deep neural networks.

The proposed algorithm is tested on the MNIST data set and on the prediction of air pollution from sensor measurements, and it is compared to several fixed architectures and to support vector regression.

1 Introduction

Deep neural networks (DNN) have become the state-of-the-art methods in many fields of machine learning in recent years. They have been applied to various problems, including image recognition, speech recognition, and natural language processing [8, 10].

Deep neural networks are feed-forward neural networks with multiple hidden layers between the input and output layer. The layers typically have different units depending on the task at hand. Among the units, there are traditional perceptrons, where each unit (neuron) realizes a nonlinear function, such as the sigmoid function, or the rectified linear unit (ReLU).
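For reference, these two activation functions have the standard definitions (stated here for completeness, not taken from the paper)

σ(z) = 1 / (1 + e^{-z}),    ReLU(z) = max(0, z),

where z is the weighted input of the unit.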
While the learning of the weights of a deep neural network is done by algorithms based on stochastic gradient descent, the choice of architecture, including the number and sizes of layers and the type of activation function, is done manually by the user. However, the choice of architecture has an important impact on the performance of the DNN. Some expertise is needed, and in practice a trial-and-error method is usually used.

In this work we aim at a fully automatic design of deep neural networks. We investigate the use of evolution strategies for the evolution of a DNN architecture. There are not many studies on the evolution of DNNs, since such an approach has very high computational requirements. To keep the search space as small as possible, we simplify our model, focusing on the implementation of DNNs in the Keras library [3], which is a widely used tool for practical applications of DNNs.

The proposed algorithm is evaluated both on benchmark and real-life data sets. As the benchmark data we use the MNIST data set, i.e., the classification of handwritten digits. The real data set comes from the area of sensor networks for air pollution monitoring. The data are due to De Vito et al. [21, 5] and are described in detail in Section 5.1.

The paper is organized as follows. Section 2 gives an overview of related work. Section 3 briefly describes the main ideas of our approach. In Section 4 our algorithm based on evolution strategies is described. Section 5 summarizes the results of our experiments. Finally, Section 6 concludes the paper.

2 Related Work

Neuroevolution techniques have been applied successfully to various machine learning problems [6]. In classical neuroevolution, no gradient descent is involved; both the architecture and the weights undergo the evolutionary process. However, because of the large computational requirements, the applications are limited to small networks.

There have been many attempts at architecture optimization via an evolutionary process in previous decades (e.g. [19, 1]). Successful evolutionary techniques evolving the structure of feed-forward and recurrent neural networks include the NEAT [18], HyperNEAT [17] and CoSyNE [7] algorithms.

On the other hand, studies dealing with the evolution of deep neural networks and convolutional networks started to emerge only very recently. The training of one DNN usually requires hours or days of computing time, quite often utilizing GPU processors for speedup. Naturally, evolutionary techniques requiring thousands of training trials were not considered a feasible choice. Nevertheless, there are several approaches that reduce the overall complexity of neuroevolution for DNNs. Still, due to limited computational resources, these studies usually focus only on parts of the network design.

For example, in [12] CMA-ES is used to optimize hyperparameters of DNNs. In [9] unsupervised convolutional networks for vision-based reinforcement learning are studied; the structure of the CNN is held fixed and only a small recurrent controller is evolved. However, the recent paper [16] presents a simple distributed evolutionary strategy that is used to train relatively large recurrent networks with competitive results on reinforcement learning tasks. In [14] an automated method for optimizing deep learning architectures through evolution is proposed, extending existing neuroevolution methods. The authors of [4] sketch a genetic approach for evolving a deep autoencoder network, enhancing the sparsity of the synapses by means of special operators. Finally, the paper [13] presents two versions of an evolutionary and co-evolutionary algorithm for the design of DNNs with various transfer functions.

3 Our Approach

In our approach we use evolution strategies to search for an optimal architecture of a DNN, while the weights are learned by a gradient-based technique.

The main idea of our approach is to keep the search space as small as possible; therefore the architecture specification is simplified. It directly follows the implementation of DNNs in the Keras library, where networks are defined layer by layer, each layer fully connected with the next one. A layer is specified by the number of neurons, the type of activation function (all neurons in one layer have the same type of activation function), and the type of regularization (such as dropout).

In this paper we work only with fully connected feed-forward neural networks, but the approach can be further modified to include convolutional layers as well. The architecture specification would then also contain the type of layer (dense or convolutional) and, in the case of a convolutional layer, the size of the filter.
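To make this layer-by-layer specification concrete, the following Python sketch shows how such an architecture description could be turned into a Keras Sequential model. This is only a minimal illustration under our own assumptions: the list-of-triples format, the name build_model, and the output layer and loss (chosen here as for MNIST-style classification) are not taken from the paper; Sequential, Dense, and Dropout are the actual Keras classes.

from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model(architecture, input_dim, n_outputs):
    # architecture: list of (layer_size, dropout_rate, activation) triples,
    # one triple per hidden layer, mirroring the specification above
    model = Sequential()
    for i, (size, dropout, activation) in enumerate(architecture):
        if i == 0:
            model.add(Dense(size, activation=activation, input_shape=(input_dim,)))
        else:
            model.add(Dense(size, activation=activation))
        if dropout > 0.0:
            model.add(Dropout(dropout))  # a zero dropout rate means no dropout layer
    model.add(Dense(n_outputs, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model

# Example: two hidden layers, the second one regularized by dropout.
model = build_model([(512, 0.0, 'relu'), (128, 0.2, 'sigmoid')],
                    input_dim=784, n_outputs=10)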
4 Evolution Strategies for DNN Design

Evolution strategies (ES) were proposed for work with real-valued vectors representing parameters of complex optimization problems [2]. In Algorithm 1 below we can see a simple ES working with n individuals in a population and generating m offspring by means of Gaussian mutation. Environmental selection has two traditional forms in evolution strategies. The so-called (n + m)-ES creates the new generation by deterministically choosing the n best individuals from the set of (n + m) parents and offspring. The so-called (n,m)-ES creates the new generation by selecting from the m new offspring only (typically, m > n). The latter approach is considered more robust against premature convergence to local optima.

Currently used evolution strategies may carry more meta-parameters of the problem in the individual than just a vector of mutation variances. A successful version of evolution strategies, the so-called covariance matrix adaptation ES (CMA-ES) [12], uses a clever strategy to approximate the full N × N covariance matrix, thus representing a general N-dimensional normal distribution. A crossover operator is also usually used within evolution strategies.

In our implementation the (n,m)-ES (see Algorithm 1) is used. Offspring are generated using both mutation and crossover operators. Since our individuals describe a network topology, they are not vectors of real numbers, so our operators differ slightly from classical ES. A more detailed description follows.

Algorithm 1: (n,m)-evolution strategy optimizing a real-valued vector and utilizing an adaptive variance for each parameter

procedure (n,m)-ES
    t ← 0
    initialize the population P_t with n randomly generated vectors x^t = (x^t_1, ..., x^t_N, σ^t_1, ..., σ^t_N)
    evaluate the individuals in P_t
    while not terminating criterion do
        for i ← 1, ..., m do
            choose a parent x^t_i at random
            generate an offspring y^t_i by Gaussian mutation:
            for j ← 1, ..., N do
                σ'_j ← σ_j · (1 + α · N(0,1))
                x'_j ← x_j + σ'_j · N(0,1)
            end for
            insert y^t_i into the offspring candidate population P'_t
        end for
        deterministically choose P_{t+1} as the n best individuals from P'_t
        discard P_t and P'_t
        t ← t + 1
    end while
end procedure

4.1 Individuals

Individuals encode feed-forward neural networks implemented as the Keras model Sequential. A Sequential model is built layer by layer; similarly, an individual consists of blocks representing the individual layers:

I = ( [size_1, drop_1, act_1, σ_1^size, σ_1^drop], ..., [size_H, drop_H, act_H, σ_H^size, σ_H^drop] ),

where H is the number of hidden layers, size_i is the number of neurons in the corresponding layer, which is a dense (fully connected) layer, drop_i is the dropout rate (a zero value represents no dropout), act_i ∈ {relu, tanh, sigmoid, hardsigmoid, linear} stands for the activation function, and σ_i^size and σ_i^drop are strategy coefficients corresponding to size and dropout.

So far we work only with dense layers, but the individual can be further generalized to work with convolutional layers as well. Other types of regularization could also be considered; we are limited to dropout for these first experiments.

4.2 Crossover

The crossover operator combines two parent individuals and produces two offspring individuals. It is implemented as a one-point crossover, where the cross-point lies on the border of a block. Let the two parents be

I_p1 = (B_1^p1, B_2^p1, ..., B_k^p1),
I_p2 = (B_1^p2, B_2^p2, ..., B_l^p2).

The two offspring are then obtained by exchanging the blocks of the two parents behind a randomly chosen block border.

4.5 Selection

Tournament selection is used, i.e., in each turn of the tournament k individuals are selected at random and the one with the highest fitness, in our case the one with the lowest cross-validation error, is selected.
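The following Python sketch illustrates one possible realization of these operators on the block-encoded individuals: a one-point crossover with the cross-point on a block border, and tournament selection minimizing the cross-validation error. The Block data class, the default strategy coefficients, and the function names are our illustrative assumptions, not the authors' implementation.

import random
from dataclasses import dataclass

# activation functions considered in the paper (Keras identifiers)
ACTIVATIONS = ["relu", "tanh", "sigmoid", "hard_sigmoid", "linear"]

@dataclass
class Block:
    # one hidden layer: number of neurons, dropout rate, activation,
    # plus the ES strategy coefficients for size and dropout
    size: int
    dropout: float
    activation: str
    sigma_size: float = 10.0
    sigma_drop: float = 0.05

# An individual is a list of Blocks, one per hidden layer.

def crossover(parent1, parent2):
    # one-point crossover; the cross-point always falls on a block border
    c1 = random.randint(1, max(1, len(parent1) - 1))
    c2 = random.randint(1, max(1, len(parent2) - 1))
    child1 = parent1[:c1] + parent2[c2:]
    child2 = parent2[:c2] + parent1[c1:]
    return child1, child2

def tournament_select(population, cv_errors, k=3):
    # pick k individuals at random and return the one with the lowest
    # cross-validation error, i.e. the highest fitness
    contestants = random.sample(range(len(population)), min(k, len(population)))
    best = min(contestants, key=lambda i: cv_errors[i])
    return population[best]

Note that the offspring produced by crossover may have a different number of hidden layers than either parent, which is consistent with the variable-length encoding described above; the tournament size k and the default strategy coefficients are placeholders.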