Embodied Neuromorphic Vision with Event-Driven Random Backpropagation
Jacques Kaiser¹, Alexander Friedrich¹, J. Camilo Vasquez Tieck¹, Daniel Reichard¹, Arne Roennau¹, Emre Neftci², Rüdiger Dillmann¹

¹FZI Research Center for Information Technology, Karlsruhe, Germany
²Department of Cognitive Sciences, University of California Irvine, Irvine, USA

Abstract—Spike-based communication between biological neurons is sparse and unreliable. This enables the brain to process visual information from the eyes efficiently. Taking inspiration from biology, artificial spiking neural networks coupled with silicon retinas attempt to model these computations. Recent findings in machine learning allowed the derivation of a family of powerful synaptic plasticity rules approximating backpropagation for spiking networks. Are these rules capable of processing real-world visual sensory data? In this paper, we evaluate the performance of Event-Driven Random Back-Propagation (eRBP) at learning representations from event streams provided by a Dynamic Vision Sensor (DVS). First, we show that eRBP matches state-of-the-art performance on the DvsGesture dataset with the addition of a simple covert attention mechanism. By remapping visual receptive fields relative to the center of motion, this attention mechanism provides translation invariance at low computational cost compared to convolutions. Second, we successfully integrate eRBP in a real robotic setup, where a robotic arm grasps objects according to detected visual affordances. In this setup, visual information is actively sensed by a DVS mounted on a robotic head performing microsaccadic eye movements. We show that our method classifies affordances within 100 ms after microsaccade onset, which is comparable to human performance reported in behavioral studies. Our results suggest that advances in neuromorphic technology and plasticity rules enable the development of autonomous robots operating at high speed and low energy consumption.

Index Terms—Neurorobotics, Spiking Neural Networks, Event-Based Vision, Learning, Synaptic Plasticity

Figure 1: Our robotic setup embodying the synaptic learning rule eRBP [1]. The DVS is mounted on a robotic head performing microsaccadic eye movements. The spiking network is trained (dashed-line connections) in a supervised fashion to classify visual affordances from the event streams online. During training, output neurons and label neurons project to an error neuron population, which conveys random feedback to the hidden layers. The output neurons of the network correspond to the four types of affordances: ball-grasp, bottle-grasp, pen-grasp or do nothing. At test time, a Schunk LWA4P arm equipped with a five-finger Schunk SVH gripper performs the detected reaching and grasping motion.

I. INTRODUCTION

There are important discrepancies between the computational paradigms of the brain and modern computer architectures. Remarkably, evolution discovered a way to learn and perform energy-efficient computations from large networks of neurons. Particularly, the communication of spikes between neurons is asynchronous, sparse and unreliable. Understanding what computations the brain performs and how they are implemented on a neural substrate is a global endeavor of our time. From an engineering perspective, such an understanding would enable the design of autonomous learning robots operating at high speed for a fraction of the energy budget of current solutions.

Recently, a family of synaptic learning rules building on groundbreaking machine learning research has been proposed in [1]–[4]. These rules implement variations of Back-Propagation (BP) adapted to spiking networks by approximating the gradient computations. As was shown in [5], [6], approximations of the gradient can be made without significantly deteriorating the accuracy of the network. These approximations allow weight updates to depend only on local information available at the synapse, enabling online learning with an efficient neuromorphic hardware implementation. Are these rules capable of efficiently processing complex visual information like the brain? Synaptic learning rules are rarely evaluated in an embodiment, and often report below state-of-the-art performance compared to classical methods [7].

In this paper, we evaluate the ability of one of these rules, eRBP [1], to learn from real visual event-based sensor data. eRBP was the first synaptic plasticity rule formally derived from Random Backpropagation [5], its deep learning equivalent. First, we benchmark eRBP on the popular visual event-based dataset DvsGesture [8] from IBM. We introduce a covert attention mechanism which further improves the performance by providing translation invariance without convolutions. Second, we integrate the rule in a real-world closed-loop robotic grasping setup involving a robotic head, an arm and a five-finger hand. The spiking network learns to classify different types of affordances based on visual information obtained with microsaccades, and communicates this information to the arm for grasping. This real-world task has the potential to enhance neuromorphic and neurorobotics research, since many functional components such as reaching, grasping and depth perception can be easily segregated and implemented with brain models. This work paves the way towards the integration of brain-inspired computational paradigms into the field of robotics.

This paper is structured as follows. We give a brief introduction to the synaptic learning rule eRBP in Section II, together with the network architecture. We further present our covert attention mechanism and the microsaccadic eye movements. Our approach is evaluated in Section III, first on the DvsGesture [8] benchmark and second in our real-world grasping setup. We conclude in Section IV.

II. APPROACH

A. Event-Driven Random Backpropagation

eRBP [1] is an interpretation, for spiking neurons, of the feedback alignment rule presented in [5] for analog neurons. Feedback alignment is an approximation of backpropagation where the prediction error is fed back to all neurons directly with fixed, random feedback weights. With a mean-square error loss, the update for a hidden weight from neuron j to neuron i can be formulated as follows:

\Delta w_{ij}(t) = y_j \, \phi'\!\left( \sum_{j'} w_{ij'}(t) \, y_{j'}(t) \right) T_i(t), \qquad T_i(t) = \sum_k e_k(t) \, g_{ik},    (1)

with y_j the output of neuron j, \phi the activation function of neuron i, e_k(t) the prediction error for output neuron k and g_{ik} a fixed random feedback weight. This rule is interpreted for spiking neurons with y_j the spiking output (0 or 1) of neuron j and \phi the function mapping membrane potential to spikes. The weighted sum of input spikes \sum_{j'} w_{ij'}(t) y_{j'}(t) is interpreted as the membrane potential I_i of neuron i. eRBP relies on hard-threshold Leaky Integrate-And-Fire neurons, which makes \phi non-differentiable. Following the approach of surrogate gradients [9], \phi' is approximated as the boxcar function, equal to 1 between b_{min} and b_{max} and 0 otherwise. Note that the temporal dynamics of the Leaky Integrate-And-Fire neuron (post-synaptic potentials and refractory period) are not taken into account in Equation (1). More recent plasticity rules include an eligibility trace [2]–[4] accounting for some of these dynamics.

This rule can be efficiently implemented in a spiking network with weight updates triggered by pre-synaptic spikes y_j as:

\Delta w_{ij}(t) \propto \begin{cases} T_i(t) & \text{if } b_{min} < I_i < b_{max} \\ 0 & \text{otherwise,} \end{cases}    (2)

with b_{min} and b_{max} the window of the boxcar function. The weight update for the last layer is an exception, since there are no random feedback weights. Each time a pre-synaptic neuron j with a connection to an output neuron k spikes, the weight update of this connection is calculated by:

\Delta w_{kj}(t) \propto \begin{cases} -e_k(t) & \text{if } b_{min} < I_k < b_{max} \\ 0 & \text{otherwise.} \end{cases}    (3)

Since eRBP only uses two comparisons and one addition for each pre-synaptic spike to perform the weight update, it allows a real-time, energy-efficient and online learning implementation running on neuromorphic hardware.
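To make the update concrete, the following Python sketch applies the hidden-layer rule of Equation (2) in a time-stepped, dense-matrix form. It is a minimal illustration under our own assumptions: variable names, layer sizes, the boxcar window and the learning rate are placeholders, and the proportionality constant of Equation (2), including its sign, is folded into the learning rate. The actual rule runs event by event on neuromorphic hardware.

import numpy as np

# Minimal sketch of the eRBP hidden-layer update (Equation (2)).
# All sizes and hyper-parameters below are illustrative placeholders.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 128, 64, 4
W = rng.normal(0.0, 0.1, (n_hidden, n_in))   # learned feedforward weights w_ij
G = rng.normal(0.0, 0.1, (n_hidden, n_out))  # fixed random feedback weights g_ik
b_min, b_max = -1.0, 1.0                     # window of the boxcar function
lr = 1e-3                                    # absorbs the proportionality constant

def erbp_hidden_update(W, y_pre, I, e):
    """One time step of Equation (2).

    y_pre : (n_in,) binary pre-synaptic spike vector y_j
    I     : (n_hidden,) membrane potentials I_i
    e     : (n_out,) error signal e_k(t)
    """
    T = G @ e                                # T_i(t) = sum_k e_k(t) g_ik
    gate = (I > b_min) & (I < b_max)         # boxcar standing in for phi'
    # Only synapses whose pre-synaptic neuron spiked (y_pre == 1) and whose
    # post-synaptic potential lies inside the boxcar window are updated.
    W -= lr * np.outer(gate * T, y_pre)      # minus sign mirrors the -e_k(t) of Equation (3)
    return W

Per pre-synaptic spike, the update indeed reduces to the two comparisons of the boxcar gate and one addition to the weight, which is what makes the rule attractive for neuromorphic hardware.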
B. Network Architecture

We propose a network architecture similar to the one proposed in [1]. Specifically, the network consists of spiking neurons organized in feedforward layers, performing classification with one-hot encoding. In other words, the k output neurons of the last layer correspond to the k classes of the dataset. The error signals e_k(t) are encoded in spikes and provided to all hidden neurons through backward connections from error neurons. These error spikes are integrated locally in a dedicated leaking dendritic compartment for every learning neuron. The weights of the backward connections are drawn from a random distribution and fixed; they correspond to the factors g_{ik} in Equations (1) and (2).

For each class, there are two error neurons conveying positive and negative error respectively. This is required since spikes are not signed events. A pair of positive and negative error neurons is connected to the corresponding output neuron and label neuron. There is one label neuron for each class, which spikes repeatedly during training when a sample of the respective class is presented. Formally, the error e_k(t) is approximated as:

e_k(t) = \nu^{+}_{k}(t) - \nu^{-}_{k}(t),
\nu^{+}_{k}(t) \propto \nu^{P}_{k}(t) - \nu^{L}_{k}(t),
\nu^{-}_{k}(t) \propto -\nu^{P}_{k}(t) + \nu^{L}_{k}(t),    (4)

where \nu^{P}_{k}(t), \nu^{L}_{k}(t) are the firing rates of the output neurons and label neurons respectively, and \nu^{+}_{k}(t), \nu^{-}_{k}(t) are the firing rates of the positive and negative error neurons. Within this framework, all computations are performed with spiking neurons and communicated as spikes, including the computation of errors. As in the brain, all synapses are stochastic, with a probability of dropping spikes. The network is depicted in Figure 1.
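Because a firing rate cannot be negative, each error-neuron population carries one sign of the error, which is our reading of the proportionalities in Equation (4). The rate-based Python sketch below makes this splitting explicit; it is a simplification, since the paper computes these quantities with spiking neurons rather than explicit rates.

import numpy as np

def error_neuron_rates(nu_pred, nu_label):
    """Rate-based reading of Equation (4).

    nu_pred  : (k,) firing rates of the output (prediction) neurons
    nu_label : (k,) firing rates of the label neurons
    """
    # A firing rate is non-negative, so each population rectifies one sign
    # of the error; this is why two populations per class are needed.
    nu_pos = np.maximum(nu_pred - nu_label, 0.0)  # positive error neurons
    nu_neg = np.maximum(nu_label - nu_pred, 0.0)  # negative error neurons
    e = nu_pos - nu_neg                           # recovered signed error e_k(t)
    return nu_pos, nu_neg, e

For an under-predicted class (nu_pred < nu_label), only the negative error neuron fires and e_k(t) is negative, so Equation (3) strengthens the incoming weights of the corresponding output neuron.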
In this work, the network learns from event streams provided by a DVS. Since spikes are not signed events, we associate two neurons with each pixel to convey ON- and OFF-events separately. This distinction is important since event polarities carry instantaneous information about the direction of motion (see Figure 4). Since events are generated only upon light changes, two different setups are analyzed: a dataset where changes originate from motion in the scene, and a dataset where changes originate from fixational eye movements.
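As an illustration of this input coding, the sketch below routes each DVS event to one of two input neurons per pixel. The 128x128 resolution matches the DVS128 sensor behind DvsGesture, but the event tuple layout and the even/odd index convention are our own assumptions.

import numpy as np

HEIGHT, WIDTH = 128, 128  # DVS128 resolution; adjust to the sensor at hand

def events_to_spikes(events):
    """Map a packet of DVS events to a flat binary input vector.

    events : iterable of (x, y, polarity) tuples, with polarity 1 for
             ON-events (brightness increase) and 0 for OFF-events.
    """
    spikes = np.zeros(2 * HEIGHT * WIDTH, dtype=np.uint8)
    for x, y, polarity in events:
        # Two neurons per pixel: even index for OFF-events, odd for ON-events.
        spikes[2 * (y * WIDTH + x) + polarity] = 1
    return spikes

Keeping the two polarity channels separate preserves the direction-of-motion cue mentioned above.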