Real-time Hebbian Learning from Autoencoder Features for Control Tasks
Justin K. Pugh1, Andrea Soltoggio2, and Kenneth O. Stanley1
1Dept. of EECS (Computer Science Division), University of Central Florida, Orlando, FL 32816 USA
2Computer Science Department, Loughborough University, Loughborough LE11 3TU, UK
[email protected], [email protected], [email protected]

Abstract

Neural plasticity and in particular Hebbian learning play an important role in many research areas related to artificial life. By allowing artificial neural networks (ANNs) to adjust their weights in real time, Hebbian ANNs can adapt over their lifetime. However, even as researchers improve and extend Hebbian learning, a fundamental limitation of such systems is that they learn correlations between preexisting static features and network outputs. A Hebbian ANN could in principle achieve significantly more if it could accumulate new features over its lifetime from which to learn correlations. Interestingly, autoencoders, which have recently gained prominence in deep learning, are themselves in effect a kind of feature accumulator that extracts meaningful features from its inputs. The insight in this paper is that if an autoencoder is connected to a Hebbian learning layer, then the resulting Real-time Autoencoder-Augmented Hebbian Network (RAAHN) can actually learn new features (with the autoencoder) while simultaneously learning control policies from those new features (with the Hebbian layer) in real time as an agent experiences its environment. In this paper, the RAAHN is shown in a simulated robot maze navigation experiment to enable a controller to learn the perfect navigation strategy significantly more often than several Hebbian-based variant approaches that lack the autoencoder. In the long run, this approach opens up the intriguing possibility of real-time deep learning for control.

Introduction

As a medium for adaptation and learning, neural plasticity has long captivated artificial life and related fields (Baxter, 1992; Floreano and Urzelai, 2000; Niv et al., 2002; Soltoggio et al., 2008, 2007; Soltoggio and Jones, 2009; Soltoggio and Stanley, 2012; Risi and Stanley, 2010; Risi et al., 2011; Risi and Stanley, 2012; Stanley et al., 2003; Coleman and Blair, 2012). Much of this body of research focuses on Hebbian-inspired rules that change the weights of connections in proportion to the correlation of source and target neuron activations (Hebb, 1949). The simplicity of such Hebbian-inspired rules makes them easy and straightforward to integrate into larger systems and experiments, such as investigations into the evolution of plastic neural networks (Floreano and Urzelai, 2000; Soltoggio et al., 2008; Risi et al., 2011). Thus they have enabled inquiry into such diverse problems as task switching (Floreano and Urzelai, 2000), neuromodulation (Soltoggio et al., 2008), the evolution of memory (Risi et al., 2011), and reward-mediated learning (Soltoggio and Stanley, 2012).

However, while Hebbian rules naturally facilitate learning correlations between actions and static features of the world, their application in particular to control tasks that require learning new features in real time is more complicated. While some models in neural computation do in fact enable low-level feature learning by placing Hebbian neurons in large topographic maps with lateral inhibition (Bednar and Miikkulainen, 2003), such low-level cortical models generally require prohibitive computational resources to integrate into real-time control tasks, or especially into evolutionary experiments that require evaluating numerous separate individuals. Thus there is a need for a convenient and reliable feature generator that can accumulate features from which Hebbian neurons combined with neuromodulation (Soltoggio et al., 2008) can learn behaviors in real time.

Interestingly, such a feature-generating system already exists and in fact has become quite popular through the rise of deep learning (Bengio et al., 2007; Hinton et al., 2006; Le et al., 2012; Marc'Aurelio et al., 2007): the autoencoder (Hinton and Zemel, 1994; Bourlard and Kamp, 1988). Autoencoders, which can be trained through a variety of algorithms, from Restricted Boltzmann Machines (RBMs) (Hinton et al., 2006) to more conventional stochastic gradient descent (Le et al., 2012) (similar to backpropagation; Rumelhart et al., 1986), aim simply to output the same pattern that they input. By training them to mimic their inputs, they are forced to learn key features in their hidden layer that efficiently encode such inputs. In this way, they accumulate such key features as they are exposed to more inputs.
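For concreteness, the following is a minimal sketch of an autoencoder of this kind, trained online by stochastic gradient descent to reproduce each input it observes. It is an illustration only, not the implementation used in this paper; the tied weights, sigmoid activations, layer sizes, and learning rate are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """One hidden layer with tied weights, trained online to reproduce its input."""

    def __init__(self, n_inputs, n_features, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (n_features, n_inputs))  # encoder; decoder is W.T
        self.lr = lr

    def encode(self, x):
        return sigmoid(self.W @ x)  # hidden features h

    def train_step(self, x):
        h = self.encode(x)                        # h = sigma(W x)
        x_hat = sigmoid(self.W.T @ h)             # reconstruction of the input
        err = x_hat - x                           # reconstruction error
        d_out = err * x_hat * (1.0 - x_hat)       # backprop through output sigmoid
        d_hid = (self.W @ d_out) * h * (1.0 - h)  # backprop into the hidden layer
        # gradient of the squared reconstruction error w.r.t. the tied weights
        self.W -= self.lr * (np.outer(d_hid, x) + np.outer(h, d_out))
        return h                                  # the accumulated features
```

Each call to train_step both returns the current feature encoding of an input and nudges the weights toward a better encoding, which is the sense in which features accumulate with experience.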
However, in deep learning, autoencoders are usually trained for classification tasks through an unsupervised pre-training phase, and in fact recent results have raised doubts about their necessity for such tasks anyway (Cireşan et al., 2012). The idea in this paper is that autoencoders can instead be put to good use as feature accumulators that work synergistically with neuromodulated Hebbian connections that learn from the accumulating features in real time, as an autonomous agent acts in the world. The resulting structure is the Real-time Autoencoder-Augmented Hebbian Network (RAAHN), a novel algorithm for learning control policies through rewards and penalties in real time.

In short, the key idea behind the RAAHN is that a missing ingredient that can reinvigorate the field of Hebbian learning is the ability of an autonomous agent to learn new features as it experiences the world. Those new features are then simultaneously the inputs to a neuromodulated Hebbian layer that learns to control the agent based on the accumulating features. In effect, the RAAHN realizes the philosophy that real-time reward-modulated learning cannot achieve its full potential unless the agent is simultaneously and continually learning to reinterpret and re-categorize its world. Introducing the RAAHN thereby creates the opportunity to build and study such systems concretely.
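As a sketch of how these two adaptive processes could be coupled, the fragment below wires the illustrative Autoencoder above to a single neuromodulated Hebbian output layer. This is a hypothetical reading of the idea rather than the architecture specified in this paper: the class name, the tanh outputs, the single Hebbian layer, and the update schedule are all assumptions.

```python
import numpy as np  # reuses the Autoencoder sketch defined above

class RAAHNSketch:
    """Hypothetical RAAHN-style loop: feature learning and Hebbian control
    learning proceed together, in real time, from the same experience."""

    def __init__(self, n_inputs, n_features, n_outputs, eta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.ae = Autoencoder(n_inputs, n_features, seed=seed)
        self.V = rng.normal(0.0, 0.1, (n_outputs, n_features))  # Hebbian layer
        self.eta = eta

    def step(self, x, modulation):
        h = self.ae.train_step(x)  # refine the features while encoding the input
        y = np.tanh(self.V @ h)    # control outputs computed from those features
        # Neuromodulated Hebbian update: the feedback signal scales plasticity
        # on the control connections, and reverses it when feedback is negative.
        self.V += self.eta * modulation * np.outer(y, h)
        return y
```

Every control step thus improves both what the agent perceives (the features h) and what it does with what it perceives (the weights V), which is the simultaneity from which the RAAHN takes its name.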
Introduc- Hebbian ANNs ing the RAAHN thereby creates the opportunity to build and The plain Hebbian plasticity rule is among the simplest for study such systems concretely. training ANNs: The experiment in this paper is intended as a proof of ∆wi = ηxiy, (1) concept designed to demonstrate that it is indeed possible to where wi is the weight of the connection between two neu- learn autoencoded (and hence unsupervised) features at the rons with activation levels xi and y, and η is the learning same time as Hebbian connections are dynamically adapting rate. As an entirely local learning rule, it is appealing both based both on correlations with the improving feature set and for its simplicity and biological plausibility. For this reason, neuromodulatory reward signals. In the experiment, simu- many researchers have sought to uncover the full extent of lated robots with two kinds of sensors (one to see the walls functionality attainable by ANNs only of Hebbian rules. and the other to see a non-uniform distribution of “crumbs” As researchers have gained insight into such networks, on the floor) are guided through a maze to show them the op- they have also found ways to increase the rule’s sophisti- timal path, after which they are released to navigate on their cation by elaborating on its central theme of strengthening own. However, supervised learning does not take place in the through correlated firing (e.g. Oja 1982; Bienenstock et al. conventional sense during this guided period. Instead, both 1982). Researchers also began to evolve such ANNs in the the autoencoder and the Hebbian connections are adjusting hope of achieving more brain-like functionalities by produc- in real time without supervisory feedback to the autoencoder. ing networks that change over their lifetime (Floreano and Furthermore, to isolate the advantage of the autoencoder, two Urzelai, 2000; Niv et al., 2002; Risi and Stanley, 2010; Risi other variants are attempted – in one the Hebbian connec- et al., 2011; Stanley et al., 2003). tions learn instead from the raw inputs and in the other the One especially important ingredient for Hebbian ANNs Hebbian connections learn from a set of random features is neuromodulation, which in effect allows Hebbian connec- (e.g. somewhat like an extreme learning machine; Huang and tions to respond to rewards and penalties. Neuromodulation Wang 2011). enables the increase, decrease, or reversal of Hebbian plas- The main result is that only the full RAAHN learns the ticity according to feedback from the environment. Models optimal path through the maze in more than a trivial percent- augmented with neuromodulation have been shown