
arXiv:cs/0603004v1 [cs.NE] 1 Mar 2006

Lamarckian Evolution and the Baldwin Effect in Evolutionary Neural Networks

P.A. Castillo (1), M.G. Arenas (1), J.G. Castellano (1), J.J. Merelo (1), A. Prieto (1), V. Rivas (2) and G. Romero (1)
Grupo Geneura. URL: http://www.geneura.org
(1) Departamento de Arquitectura y Tecnología de Computadores, Universidad de Granada, Campus de Fuentenueva, 18071 Granada (Spain). e-mail: [email protected]
(2) Departamento de Informática, Universidad de Jaén, E.P.S., Avda. Madrid 35, 23071 Jaén (Spain).

Resumen: Hybrid neuro-evolutionary algorithms may be inspired on Darwinian or Lamarckian evolution. In the case of Darwinian evolution, the Baldwin effect, that is, the progressive incorporation of learned characteristics to the genetic code, can be observed and leveraged to improve the search. The purpose of this paper is to carry out an experimental study into how learning can improve G-Prop genetic search. Two ways of combining learning and genetic search are explored: one exploits the Baldwin effect, while the other uses a Lamarckian strategy. Our experiments show that using a Lamarckian operator makes the algorithm find networks with a low error rate and the smallest size, while using the Baldwin effect obtains MLPs with the smallest error rate and a larger size, taking longer to reach a solution. Both approaches obtain a lower average error than other BP-based algorithms like RPROP, other evolutionary methods and fuzzy logic based methods.

Palabras clave: Evolutionary Algorithms, Generalization, Learning, Neural Networks, Baldwin Effect, Lamarckian Search, Optimization.

I. Introduction and State of the Art

Hybrid algorithms often implement non-Darwinian ideas, e.g. the Baldwin effect or Lamarckian evolution. Lamarck's theory states that the characteristics an individual acquires during its life [1] are passed on to its offspring. Thus, the offspring will inherit any acquired or learned characteristic, and this mechanism would be responsible for evolution. According to this approach, learning has a great influence on evolution, since all the learned characteristics are passed on to the following generation.

Nevertheless, Baldwin [2] and Waddington [3] argued that this influence is limited to the fact that individuals with greater learning capacity adapt better to the environment, and the longevity they thus acquire allows them to live longer, have more offspring through time, and propagate their abilities. As the number of offspring who have acquired this ability grows, the characteristic becomes part of the genetic code.

These ideas have previously been used by numerous researchers in different approaches:

• Lamarckian mechanisms in hybrid evolutionary algorithms.
• Comparative studies of Lamarckian mechanisms and the Baldwin effect.
• Studying the Baldwin effect in hybrid algorithms.

Lamarckian theory is today totally discredited from the biological point of view, but it is possible to implement Lamarckian evolution in EAs, so that an individual can modify its genetic code during or after its fitness evaluation (its "lifetime"). These ideas have been used by several researchers with particular success in problems where the application of a local search operator obtains a substantial improvement (travelling salesman problem; Gorges-Schleuter [4], Merz and Freisleben [5], Ross [6]). In general, hybrid algorithms are nowadays acknowledged as the best solution to a wide array of optimization problems.

Some authors have studied the Baldwin effect [7], [8], [9], [10], [11], carrying out a local search on certain individuals to improve their fitness without modifying their genetic code. This is the strategy proposed by Hinton and Nowlan in [7], who found that learning alters the shape of the search space in which evolution operates, and that the Baldwin effect allows learning organisms to evolve much faster than their nonlearning equivalents, even though the acquired characteristics are not communicated back to the genetic code. Ackley and Littman [10] studied the Baldwin effect in an artificial life system, obtaining in their experiments the result that the individuals with learning capabilities achieved the best results. Boers et al. [11] describe a hybrid algorithm to evolve ANN architectures whose effectivity is explained with the Baldwin effect, implemented not as a process of learning in the network, but as changing the network architecture as part of the learning process.
Some studies have investigated whether, in a hybrid algorithm, a strategy based on the Baldwin effect is better or worse than implementing Lamarckian mechanisms to accelerate the search [12]. The results obtained are very different and dependent on the problem. Gruau and Whitley [13] compared the Baldwinian, Lamarckian and Darwinian mechanisms implemented in a genetic algorithm that evolves ANNs, finding that the first and second strategies are equally effective for solving their problem. Nevertheless, the results obtained by Whitley et al. [14] for another problem show that taking advantage of the Lamarckian strategy, although usually faster, converges to a local optimum, while the Baldwin effect can find the global optimum.

On the other hand, results obtained by Ku and Mak [15] with a GA designed to evolve recurrent neural networks show that the use of a Lamarckian strategy implies an improvement of the algorithm, while the Baldwin effect does not. In Houck et al. [16] several algorithms are studied and similar conclusions are drawn, as in [17], where a comparison between the Darwinian, Baldwinian and Lamarckian mechanisms, applied to the 4-cycle problem, is made.

G-Prop (a genetic evolution of BP-trained MLPs), used in this paper to tune the learning parameters and to set the initial weights and hidden layer size of a MLP, searches for the optimal set of weights, the optimal topology and the learning parameters, using an EA and Quick-Propagation [18] (QP). In this method no ANN parameters have to be set by hand; the EA constants obviously need to be set, but the method is robust enough to obtain good results under the default parameter settings (all operators applied with the same probability, 300 generations and 200 individuals in the population).

This paper carries out a study of the Baldwin effect in the G-Prop [19], [20], [21], [22] method to solve pattern classification and function approximation problems. We compare results with those of other authors, and intend to check the result obtained by Gruau and Whitley [13], i.e., that the use of learning that modifies fitness without modifying the genetic code improves the task of finding an ANN to solve the problem at hand.

We compare the results obtained taking advantage of the Baldwin effect with those obtained using a Lamarckian local search mechanism. We also compare them with non-hybrid methods (RPROP [23]), hybrid algorithms, and methods based on fuzzy logic, to prove that both versions of G-Prop obtain better (or at least comparable) results than other methods, although one of these versions is more likely to be trapped at a local optimum due to the fact that it uses a local search genetic operator.

The remainder of this paper is structured as follows: Section II presents the new fitness functions designed to determine if the Baldwin effect takes place in G-Prop. Section III describes the experiments, Section IV presents the results obtained, followed by a brief conclusion in Section V.

II. The G-Prop Algorithm

In this section we only describe the new fitness functions designed to determine if the Baldwin effect takes place in G-Prop. The complete description of the method and results on classification problems have been presented elsewhere [19], [20], [21], [22].

In G-Prop, the Darwinian fitness function is given by the classification / approximation ability obtained when carrying out the validation after training; in the case of two individuals with identical ability, the best is the one that has a hidden layer with fewer neurons, which implies greater speed when training and classifying and facilitates its hardware implementation.

The classification accuracy (number of hits) is obtained by dividing the number of hits by the total number of examples in the validation set. The approximation ability is obtained using the normalized mean squared error (NMSE), given by:

    NMSE = \sqrt{ \sum_{i=1}^{N} (s_i - o_i)^2 / \sum_{i=1}^{N} (s_i - \bar{s})^2 }        (1)

where s_i is the real output for example i, o_i is the obtained output, and \bar{s} is the mean of all the real outputs.
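As an illustration of how these two measures can be computed (a minimal sketch, not the actual G-Prop code; the function and variable names are ours), consider:

    import numpy as np

    def classification_accuracy(predicted_classes, target_classes):
        """Fraction of validation examples whose class is predicted correctly."""
        predicted_classes = np.asarray(predicted_classes)
        target_classes = np.asarray(target_classes)
        return np.mean(predicted_classes == target_classes)

    def nmse(obtained, real):
        """Normalized mean squared error of equation (1): square root of the sum
        of squared errors divided by the total variance of the real outputs."""
        obtained = np.asarray(obtained, dtype=float)
        real = np.asarray(real, dtype=float)
        residual = np.sum((real - obtained) ** 2)
        total = np.sum((real - real.mean()) ** 2)
        return np.sqrt(residual / total)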
The Lamarckian approach uses no special fitness function; instead, a local search genetic operator (a QP application) has been designed to improve the individuals, saving the trained weights (the acquired characteristics) back into the individual's genotype.

On the other hand, the Baldwin effect requires some type of learning to be applied to the individuals, while the changes (trained weights) are not codified back to the population. In order to take advantage of the Baldwin effect, the following fitness function is proposed: firstly, the classification/approximation ability of the individual on the validation set before being trained is calculated; then the individual is trained and its ability after training is calculated. Three criteria are used to decide which is the best individual: the best MLP is the one with the higher classification/approximation ability after training; if both MLPs show the same accuracy, the best is the one whose ability before training is higher (that MLP is more likely to reach a high accuracy when trained); and if both MLPs show the same accuracy before and after training, the best is the smallest one.
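The following sketch illustrates, under our own naming and data structures (not the actual G-Prop implementation), how the two strategies differ in what they do with the trained weights, and how the Baldwinian three-criterion comparison can be expressed:

    import copy
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Individual:
        # Illustrative genome: in G-Prop the genome also carries the hidden
        # layer size and the learning parameters; only weights are sketched.
        weights: List[float]
        ability_before: float = 0.0   # validation ability before QP training
        ability_after: float = 0.0    # validation ability after QP training
        size: int = 0                 # number of weights (network size)

    def evaluate(ind, train_with_qp, validate, lamarckian):
        """Evaluate one individual under either strategy.

        train_with_qp(weights) returns the weights after a short QP run;
        validate(weights) returns the classification/approximation ability.
        """
        ind.ability_before = validate(ind.weights)
        trained = train_with_qp(copy.deepcopy(ind.weights))
        ind.ability_after = validate(trained)
        ind.size = len(ind.weights)
        if lamarckian:
            # Lamarckian operator: the acquired characteristics (trained
            # weights) are written back into the genotype.
            ind.weights = trained
        # Baldwinian variant: the trained weights are discarded; training
        # only affects fitness through ability_before / ability_after.
        return ind

    def better(a, b):
        """Baldwinian comparison: ability after training, then ability before
        training, then the smallest network, as described above."""
        if a.ability_after != b.ability_after:
            return a if a.ability_after > b.ability_after else b
        if a.ability_before != b.ability_before:
            return a if a.ability_before > b.ability_before else b
        return a if a.size <= b.size else b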
III. Experiments

The algorithm was run for a fixed number of generations, and a limit on the number of training epochs was established when training each individual of the population to obtain its fitness. We used 300 generations and 200 individuals in the population in every run, and 200 training epochs, in order to avoid long simulation times and overfitted networks, making the EA carry out the search and the training operator refine the solutions. In addition, the number of epochs chosen was much smaller than that necessary to train a single MLP, so that the time taken to find a suitable network to solve the problem is similar to the time that would be needed to train an MLP (that obtains similar results) using a method based on gradient descent.

After an exhaustive test of genetic operators, we decided to apply all of them with the same priority (see [19], [20], [21], [22]). The learning operator (see [21], [22]) was only used when obtaining the results of the Lamarckian approach.

The tests used to assess the accuracy of a method must be chosen carefully, because some of them (such as the exclusive-or problem) are not suitable for testing certain capacities of the BP algorithm, such as generalization [24]. Our opinion, along with Prechelt [25], is that at least two real-world problems should be used to test an algorithm.

We have used a pattern classification problem and a function approximation problem, in order to demonstrate the capacity of the proposed method to solve different kinds of problems, and also to show that the Baldwin effect takes place whatever the problem at hand is.

In these experiments, the Glass1a pattern classification problem (extracted from the Proben1 data sets), proposed by Prechelt [25] and used by Grönroos [26], is used, as well as the function approximation problem given by equation (2).

• Glass1a is a glass type classification problem, taken from [25]. The results of chemical analysis of glass splinters (percent content of 8 different elements) plus the refractive index are used to classify a sample as float processed or non float processed building windows, vehicle windows, containers, tableware, or head lamps. This task is motivated by forensic needs in criminal investigation. The dataset was created from the glass problem dataset of the UCI repository of machine learning databases (http://www.ics.uci.edu/~mlearn/MLRepository.html). The data set contains 214 instances; each sample has 9 attributes plus the class attribute: refractive index, sodium, magnesium, aluminium, silicon, potassium, calcium, barium, iron, and the class attribute (type of glass).
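As a rough illustration only, the sketch below loads such a data file and splits it in the Proben1 style; the file name, the whitespace-separated format and the 50% / 25% / 25% training / validation / test split are assumptions on our part, not details given in the paper:

    import numpy as np

    def load_and_split(path="glass1a.dat"):
        """Load a whitespace-separated data file (assumed format: 9 input
        attributes followed by the class) and split it Proben1-style."""
        data = np.loadtxt(path)
        n = len(data)
        n_train, n_val = n // 2, n // 4      # assumed 50% / 25% / 25% split
        train = data[:n_train]
        validation = data[n_train:n_train + n_val]
        test = data[n_train + n_val:]
        return train, validation, test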

• The function given by equation (2) is an analytical function gathered by Cherkassky [27] and Sugeno [28], and used by Pomares [29] in his research on fuzzy logic function approximation:

    f_2(x) = e^{-5x} \sin(2\pi x)        (2)

where x ∈ [0, 1].
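For reference, a short snippet that evaluates equation (2); the uniform sampling of [0, 1] is our assumption, since the paper does not state how the training samples were drawn:

    import numpy as np

    def f2(x):
        """Target function of equation (2): exp(-5x) * sin(2*pi*x)."""
        return np.exp(-5.0 * x) * np.sin(2.0 * np.pi * x)

    # Evaluate on a uniform grid over [0, 1] (sampling scheme assumed).
    x = np.linspace(0.0, 1.0, 200)
    y = f2(x)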

[Fig. 1. Average results over all the runs for both Lamarckian and Baldwinian approaches for the Glass problem: error (x10^-3, on a vertical logscale, above) and size of the networks (below) versus generations, for the Lamarckian approach and for the Baldwinian approach before and after training.]

IV. Results

Figure 1 shows average results over all the runs for both the Lamarckian and Baldwinian approaches to the classification and approximation problems. Plotted data correspond to the best individual in the population at each generation. The dotted line corresponds to the classification / approximation ability before training and the dashed line to the ability after training, in the Baldwinian approach; the solid line corresponds to the Lamarckian approach.

Standard deviation values have not been plotted because they remain constant along the generations; in any case, they show that the error achieved is similar.

Using Lamarckian evolution, a suitable MLP is found in the early generations of the simulation. However, that MLP remains the best during the whole simulation (evolution stops) because of the use of an "elitist" algorithm, and it tends to dominate the population due to its high fitness.

On the other hand, using the Baldwin effect, results can be as good as using Lamarckian evolution, although the method needs many more generations and the evolution of the population is much more progressive during the simulation. Results in size show that, using the Lamarckian approach, MLPs are smaller than with the Baldwinian approach.

The method exhibits roughly the same behaviour on the function approximation problem f2.

Although it is not the aim of this paper to compare the G-Prop method with those of other authors, we do so in order to prove the capacity of both versions of G-Prop to solve pattern classification and function approximation problems, and to show how they outperform other methods.

Tables I and II show the average error rate, the average size of the nets as the number of parameters (that is, the number of weights of the net), and the average number of generations until the best individual of each run is found.

The results for the Glass1a pattern classification problem (% of error in test), obtained using the Lamarckian mechanism, are compared with those obtained taking advantage of the Baldwin effect and those obtained by Prechelt [25] (using RPROP [23], [30]) and Grönroos [26] (using a hybrid algorithm) in Table I.
Approach        Error           Size         Generations
Lamarckian      32 ± 2          59 ± 28      52 ± 54
Baldwinian      31 ± 2          112 ± 62     119 ± 55
Prechelt        33 ± 5          350          -
Grönroos        32 ± 5          350          -

TABLA I: Results for the Glass1a problem obtained with G-Prop taking advantage of the Baldwin effect and with the Lamarckian approach, as well as those obtained by Prechelt and Grönroos, which are included for the sake of comparison.

It is evident that G-Prop outperforms the other methods, both in classification accuracy and in the size of the networks obtained: Prechelt [25], using RPROP [23], [30], obtained a classification error of 33 ± 5, and Grönroos [26], using Kitano's network, obtained 32 ± 5, while G-Prop achieves an error of 32 ± 2 using the Lamarckian approach and 31 ± 2 taking advantage of the Baldwin effect.

In the case of the configuration that verifies whether the Baldwin effect takes place in G-Prop, the classification ability obtained is greater, although the size is greater and more generations are needed to reach similar results.

The results for the f2 function approximation problem, obtained using the Lamarckian mechanism, are compared with those obtained taking advantage of the Baldwin effect and those obtained by Pomares [29] in Table II.

Approach        Error           Size          Generations
Lamarckian      0.09 ± 0.01     18 ± 8        34 ± 28
Baldwinian      0.086 ± 0.004   85 ± 27       97 ± 81
Pomares [29]    0.125           6 (4 rules)   -

TABLA II: Results for the f2 problem obtained with G-Prop taking advantage of the Baldwin effect and with the Lamarckian approach, as well as those obtained by Pomares, which are included for the sake of comparison.

The proposed method obtains better results on approximation ability (0.09 ± 0.01 versus 0.125), although the networks obtained are greater in size and number of parameters than those obtained using fuzzy controllers (18 ± 8 versus 6).

The approximation ability obtained is greater using the configuration that verifies whether the Baldwin effect takes place in G-Prop, while the network sizes are slightly larger and more generations are needed to reach similar results.

Each run of the proposed method takes about 4 hours on an AMD-K7(tm) 600 MHz, using the parameters described above.

The Lamarckian strategy achieves good enough results using, on average, fewer generations, although the Baldwinian strategy, given a suitable number of generations, can achieve the same or even better results.

The results obtained show that if the problem does not have many local minima and results must be obtained quickly, the best strategy is the Lamarckian one. Otherwise, the Baldwinian strategy, or a mixture of both, is the best.

V. Conclusions

A study of the Baldwin effect in the G-Prop method [19], [20], [21], [22] (a hybrid algorithm that tunes the learning parameters, initial weights and hidden layer size of a MLP using an EA and QP) has been carried out, and a comparison has been made between the results obtained taking advantage of the Baldwin effect and those obtained using a Lamarckian local search mechanism.

The results obtained agree with those presented by Whitley et al. [14], and show that the use of a Lamarckian strategy makes the method obtain good solutions faster than if the Baldwin effect is used, although it is more likely to be trapped in a local optimum than the approach that takes advantage of the Baldwin effect. However, the errors are not significantly worse.

The figures show how the Lamarckian strategy finds a suitable MLP in the early generations, which remains the best during the simulation (evolution stops); with the Baldwin effect, results can be as good as those of Lamarckian evolution, although the method needs many more generations and the evolution is much more progressive.

It should also be observed that the neural nets obtained using the Lamarckian strategy are smaller, which contributes to learning speed: a small network is fast while training and classifying, and obtaining it in fewer generations means less time is needed to design it.

Another interesting result is that when the Lamarckian training operator is used, learning contributes more to fitness improvement at the beginning of the simulations [31]. This is due to the use of an elitist algorithm: when the operator is applied to a MLP, that individual obtains an advantage over the remaining members of the population, and it will continue to be among the best individuals in the population until the end of the simulation. This can also be shown using visualization techniques [32].
Acknowledgements

This work has been supported in part by CICYT TIC99-0550, INTAS 97-30950, HPRN-CT-2000-00068 and IST-1999-12679.

Referencias

[1] J.B. Lamarck, "Philosophie zoologique," 1809.
[2] J.M. Baldwin, "A new factor in evolution," American Naturalist, 30, pp. 441-451, 1896.
[3] C.H. Waddington, "Canalization of development and the inheritance of acquired characteristics," Nature, 3811, pp. 563-565, 1942.
[4] M. Gorges-Schleuter, "Asparagos96 and the traveling salesman problem," in Proceedings of the 1997 IEEE International Conference on Evolutionary Computation, pp. 171-174, IEEE, 1997.
[5] P. Merz and B. Freisleben, "Genetic local search for the TSP: New results," in Proceedings of the 1997 IEEE International Conference on Evolutionary Computation, pp. 159-163, IEEE, 1997.
[6] B.J. Ross, "A Lamarckian evolution strategy for genetic algorithms," in Lance D. Chambers (ed.), Practical Handbook of Genetic Algorithms: Complex Coding Systems, volume III, pp. 1-16, Boca Raton, FL: CRC Press, 1999.
[7] G.E. Hinton and S.J. Nowlan, "How learning can guide evolution," Complex Systems, 1, pp. 495-502, 1987.
[8] R.K. Belew, "When both individuals and populations search: Adding simple learning to the genetic algorithm," in Proc. 3rd Intern. Conf. on Genetic Algorithms, D. Schaffer (ed.), Morgan Kaufmann, 1989.
[9] I. Harvey, "The puzzle of the persistent question marks: a case study of genetic drift," in Proc. 5th International Conference on Genetic Algorithms, pp. 15-22, S. Forrest (ed.), Morgan Kaufmann, 1993.
[10] D.H. Ackley and M. Littman, "Interactions between learning and evolution," in C.G. Langton, C. Taylor, J.D. Farmer, and S. Rasmussen (eds.), Artificial Life II, pp. 487-507, Addison-Wesley, Reading, MA, 1992.
[11] E.J.W. Boers, M.V. Borst, and I.G. Sprinkhuizen-Kuyper, "Evolving Artificial Neural Networks using the 'Baldwin Effect'," in D.W. Pearson, N.C. Steele and R.F. Albrecht (eds.), Artificial Neural Nets and Genetic Algorithms, Proceedings of the International Conference in Alès, France, pp. 333-336, Springer Verlag Wien New York, 1995.
[12] M. Huesken, J.E. Gayko, and B. Sendhoff, "Optimization for Problem Classes - Neural Networks that Learn to Learn," in Proceedings of the First IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks (ECNN 2000), IEEE Press, 2000.
[13] F. Gruau and D. Whitley, "Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect," Evolutionary Computation, vol. 1, no. 3, pp. 213-233, 1993.
[14] D. Whitley, V.S. Gordon, and K. Mathias, "Lamarckian Evolution, the Baldwin Effect and Function Optimization," in Parallel Problem Solving from Nature - PPSN III, Y. Davidor, H.P. Schwefel and R. Männer (eds.), pp. 6-15, Springer-Verlag, 1994.
[15] K.W.C. Ku and M.W. Mak, "Exploring the effects of Lamarckian and Baldwinian learning in evolving recurrent neural networks," in Proceedings of the 1997 IEEE International Conference on Evolutionary Computation, pp. 159-163, IEEE, 1997.
[16] C. Houck, J.A. Joines, M.G. Kay, and J.R. Wilson, "Empirical investigation of the benefits of partial Lamarckianism," Evolutionary Computation, vol. 5, no. 1, pp. 31-60, 1997.
[17] B.A. Julstrom, "Comparing Darwinian, Baldwinian and Lamarckian Search in a Genetic Algorithm for the 4-Cycle Problem," in Genetic and Evolutionary Computation Conference, Late Breaking Papers, pp. 134-138, Orlando, USA, 1999.
[18] S.E. Fahlman, "Faster-Learning Variations on Back-Propagation: An Empirical Study," in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988.
[19] P.A. Castillo, J. González, J.J. Merelo, V. Rivas, G. Romero, and A. Prieto, "SA-Prop: Optimization of Multilayer Perceptron Parameters using Simulated Annealing," Lecture Notes in Computer Science, vol. 1606, pp. 661-670, Springer-Verlag, ISBN 3-540-66069-0, 1999.
[20] P.A. Castillo, J.J. Merelo, V. Rivas, G. Romero, and A. Prieto, "G-Prop: Global Optimization of Multilayer Perceptrons using GAs," Neurocomputing, vol. 35, no. 1-4, pp. 149-163, 2000.
[21] P.A. Castillo, J. Carpio, J.J. Merelo, V. Rivas, G. Romero, and A. Prieto, "Evolving Multilayer Perceptrons," Neural Processing Letters, vol. 12, no. 2, pp. 115-127, October 2000.
[22] P.A. Castillo, M.G. Arenas, J.G. Castellano, M. Cillero, J.J. Merelo, A. Prieto, V. Rivas, and G. Romero, "Function Approximation with Evolved Multilayer Perceptrons," in Advances in Neural Networks and Applications, Artificial Intelligence Series, Nikos E. Mastorakis (ed.), pp. 195-200, World Scientific and Engineering Society Press, ISBN 960-8052-26-2, 2001.
[23] M. Riedmiller and H. Braun, "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm," in H. Ruspini (ed.), Proc. of the ICNN93, San Francisco, pp. 586-591, 1993.
[24] S. Fahlman, "An empirical study of learning speed in back-propagation networks," Tech. Rep., Carnegie Mellon University, 1988.
[25] L. Prechelt, "PROBEN1 - A set of benchmarks and benchmarking rules for neural network training algorithms," Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, D-76128 Karlsruhe, Germany (also at http://wwwipd.ira.uka.de/~prechelt/), 1994.
[26] M.A. Grönroos, "Evolutionary Design of Neural Networks," Master of Science Thesis in Computer Science, Department of Mathematical Sciences, University of Turku, 1998.
[27] V. Cherkassky, D. Gehring, and F. Mulier, "Comparison of adaptive methods for function estimation from samples," IEEE Trans. Neural Networks, vol. 7, no. 4, pp. 969-984, 1996.
[28] M. Sugeno and T. Yasukawa, "A fuzzy-logic based approach to qualitative modeling," IEEE Transactions on Fuzzy Systems, vol. 1, no. 1, 1993.
[29] H. Pomares Cintas, "Nueva metodología para el diseño automático de sistemas difusos," Tesis Doctoral, Departamento de Arquitectura y Tecnología de Computadores, Universidad de Granada, 1999.
[30] M. Riedmiller, "Description and Implementation Details," Tech. Rep., University of Karlsruhe, 1994.
[31] M. Oliveira, J. Barreiros, E. Costa, and F. Pereira, "LamBaDa: An Artificial Environment to Study the Interaction between Evolution and Learning," in Congress on Evolutionary Computation, vol. I, pp. 145-152, Washington D.C., USA, 1999.
[32] G. Romero, M.G. Arenas, J. Carpio, J.G. Castellano, P.A. Castillo, J.J. Merelo, A. Prieto, and V. Rivas, "Evolutionary Computation Visualization: Application to G-Prop," Lecture Notes in Computer Science, vol. 1917, pp. 902-912, ISBN 3-540-41056-2, September 2000.