Learning in Large-Scale Spiking Neural Networks
by Trevor Bekolay

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer Science

Waterloo, Ontario, Canada, 2011

© Trevor Bekolay 2011

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract

Learning is central to the exploration of intelligence. Psychology and machine learning provide high-level explanations of how rational agents learn. Neuroscience provides low-level descriptions of how the brain changes as a result of learning. This thesis attempts to bridge the gap between these two levels of description by solving problems using machine learning ideas, implemented in biologically plausible spiking neural networks with experimentally supported learning rules.

We present three novel neural models that contribute to the understanding of how the brain might solve the three main problems posed by machine learning: supervised learning, in which the rational agent has a fine-grained feedback signal; reinforcement learning, in which the agent gets sparse feedback; and unsupervised learning, in which the agent has no explicit environmental feedback.

In supervised learning, we argue that previous models of supervised learning in spiking neural networks solve a problem that is less general than the supervised learning problem posed by machine learning. We use an existing learning rule to solve the general supervised learning problem with a spiking neural network. We show that the learning rule can be mapped onto the well-known backpropagation rule used in artificial neural networks.

In reinforcement learning, we augment an existing model of the basal ganglia to implement a simple actor-critic model that has a direct mapping to brain areas. The model is used to recreate behavioural and neural results from an experimental study of rats performing a simple reinforcement learning task.

In unsupervised learning, we show that the BCM rule, a common learning rule used in unsupervised learning with rate-based neurons, can be adapted to a spiking neural network. We recreate the effects of STDP, a learning rule with strict time dependencies, using BCM, which does not explicitly remember the times of previous spikes. The simulations suggest that BCM is a more general rule than STDP.

Finally, we propose a novel learning rule that can be used in all three of these simulations. The existence of such a rule suggests that the three types of learning examined separately in machine learning may not be implemented with separate processes in the brain.

Acknowledgements

I would like to thank Xuan Choo, Travis DeWolf, Bruce Bobier, Daniel Rasmussen, Terry Stewart, and Charlie Tang for their help getting acquainted with the University of Waterloo, the Center for Theoretical Neuroscience, and the NEF. Extra thanks to Terry and Xuan for continuing help with NEF models and Nengo programming, and for providing some of the figures in this thesis. Some comments on early drafts were provided by Lucy Spardy and Travis DeWolf, to whom I am grateful. Gratitude also goes to my readers, Matthijs van der Meer and Jeff Orchard, for their comments and fruitful discussion.
Most of all, I am truly indebted to my supervisor Chris Eliasmith, whose insights are responsible for any good ideas that may have sneaked into this thesis.

Dedication

For my parents, Cathy and David, who would be proud of me if my only publication was on their refrigerator.

Table of Contents

List of Figures

1 Introduction
  1.1 Thesis organization
  1.2 Thesis goals

2 Synaptic plasticity
  2.1 Hebbian learning
    2.1.1 Long-term potentiation (LTP)
    2.1.2 Long-term depression (LTD)
    2.1.3 Spike-timing dependent plasticity (STDP)
  2.2 Non-Hebbian plasticity
  2.3 Dopamine modulated plasticity
  2.4 Explaining behaviour with synaptic strengths

3 Large-scale neural modelling
  3.1 Single-neuron models
    3.1.1 Leaky integrate-and-fire model
  3.2 The Neural Engineering Framework
    3.2.1 Representation
    3.2.2 Transformation
  3.3 Plasticity in the NEF
    3.3.1 Error minimization rule

4 Supervised learning
  4.1 Supervised learning in traditional artificial neural networks
    4.1.1 Backpropagation
  4.2 Supervised learning in spiking neural networks
    4.2.1 Temporal-coding based models
    4.2.2 Biologically plausible backpropagation
  4.3 Supervised learning with the NEF
    4.3.1 Theoretical argument
    4.3.2 Biological plausibility
    4.3.3 Simulation results
    4.3.4 Conclusion

5 Reinforcement learning
  5.1 Reinforcement learning in traditional artificial neural networks
    5.1.1 The agent-environment interface
    5.1.2 Markov decision processes
    5.1.3 Value functions
    5.1.4 Temporal-difference learning
    5.1.5 Using artificial neural networks
    5.1.6 Comparison to supervised learning
  5.2 Reinforcement learning in spiking neural networks
    5.2.1 Dopamine may encode TD-like reward prediction error
    5.2.2 Previous neural models
  5.3 Reinforcement learning in the NEF
    5.3.1 Action selection
    5.3.2 Critiquing the actor
    5.3.3 Simulation results
    5.3.4 Challenges for the current model
    5.3.5 Conclusions

6 Unsupervised learning
  6.1 Unsupervised learning in traditional artificial neural networks
  6.2 Unsupervised learning in spiking neural networks
    6.2.1 Artola, Bröcher, Singer (ABS) rule
    6.2.2 Bienenstock, Cooper, Munro (BCM) rule
    6.2.3 Spike-timing dependent plasticity rules
    6.2.4 Relationship between BCM and STDP rules
    6.2.5 General unsupervised learning
  6.3 Unsupervised learning in the NEF
    6.3.1 Simulation results
    6.3.2 A unifying learning rule

7 Discussion and conclusions
  7.1 Large-scale unsupervised learning
  7.2 Model-based reinforcement learning
  7.3 Supervised error signals
    7.3.1 Solving the supervised spike-time learning problem
  7.4 Computational complexity
    7.4.1 Decoder-level learning

References

List of Figures

2.1 Illustration of the main parts of a neuron
2.2 Illustration of the main parts of a synapse
2.3 First experimental evidence of LTP
2.4 Illustration depicting homosynaptic and associative LTP
2.5 Illustration depicting heterosynaptic and homosynaptic LTD
2.6 Evidence that temporal order of pre- and postsynaptic stimulation affects LTP/LTD induction
2.7 The classical STDP curve
2.8 Five different STDP curves, showing the diversity of STDP in different synapses. Recreated from [1]
2.9 Frequency dependence of the STDP protocol
3.1 Illustration of the bilipid membrane of a neuron. Labelled items refer to elements of the circuit diagram, figure 3.2. Recreated from [58]
3.2 Circuit diagram that corresponds to the leaky integrate-and-fire (LIF) neuron. Recreated from [58]
3.3 Membrane voltage of a LIF neuron with constant input J, from [42]
3.4 Example tuning curves. (Left) Experimentally determined tuning curves, from [137]. (Right) Similar tuning curves for LIF neurons, from [42]
3.5 Illustration showing how the tuning curves of a population of LIF neurons can be linearly combined to estimate an input signal, x. From [42]
3.6 Decoding a time-varying scalar signal using a filtered spike train
3.7 Network structure for computing a non-linear transformation of values encoded in two separate populations
3.8 Illustration showing how the tuning curves of a population of LIF neurons can be linearly combined to estimate a nonlinear function, in this case sin(2πx). From [42]
4.1 The supervised learning problem
4.2 Multi-layer perceptron architecture
4.3 Sigmoid curves
4.4 Architecture of a model of the cerebellum that posits that the cerebellum is