Simultaneous unsupervised and supervised learning of cognitive functions in biologically plausible spiking neural networks

Trevor Bekolay ([email protected]), Carter Kolbeck ([email protected]), Chris Eliasmith ([email protected])
Center for Theoretical Neuroscience, University of Waterloo
200 University Ave., Waterloo, ON N2L 3G1 Canada

Abstract

We present a novel learning rule for learning transformations of sophisticated neural representations in a biologically plausible manner. We show that the rule, which uses only information available locally to a synapse in a spiking network, can learn to transmit and bind semantic pointers. Semantic pointers have previously been used to build Spaun, which is currently the world's largest functional brain model (Eliasmith et al., 2012). Two operations commonly performed by Spaun are semantic pointer binding and transmission. It has not yet been shown how the binding and transmission operations can be learned. The learning rule combines a previously proposed supervised learning rule and a novel spiking form of the BCM unsupervised learning rule. We show that spiking BCM increases sparsity of connection weights at the cost of increased signal transmission error. We also demonstrate that the combined learning rule can learn transformations as well as the supervised rule and the offline optimization used previously, and that it is more robust to changes in parameters and leads to better outcomes in higher dimensional spaces, which is critical for explaining cognitive performance on diverse tasks.

Keywords: synaptic plasticity; spiking neural networks; unsupervised learning; supervised learning; Semantic Pointer Architecture; Neural Engineering Framework.

In this paper, we demonstrate learning of cognitively relevant transformations of neural representations online and in a biologically plausible manner. We improve upon a technique previously presented in MacNeil and Eliasmith (2011) by combining their error-minimization learning rule with an unsupervised learning rule, making it more biologically plausible and robust.

There are three weaknesses with most previous attempts at combining supervised and unsupervised learning in artificial neural networks (e.g., backpropagation [Rumelhart, Hinton, & Williams, 1986], Self-Organizing Maps [Kohonen, 1982], Deep Belief Networks [Hinton, Osindero, & Teh, 2006]). These approaches 1) have explicit offline training phases that are distinct from functional use of the network, 2) require many layers, with some layers connected with supervised learning and others with unsupervised learning, and 3) use non-spiking neuron models. The approach proposed here overcomes these limitations. Our approach 1) remains functional during online learning, 2) requires only two layers connected with simultaneous supervised and unsupervised learning, and 3) employs spiking neuron models to reproduce central features of biological learning, such as spike-timing dependent plasticity (STDP).

Online learning with spiking neuron models faces significant challenges due to the temporal dynamics of spiking neurons. Spike rates cannot be used directly, and must be estimated with causal filters, producing a noisy estimate. When the signal being estimated changes, there is some time lag before the spiking activity reflects the new signal, resulting in situations during online learning in which the inputs and desired outputs are out of sync. Our approach is robust to these sources of noise, while only depending on quantities that are locally available to a synapse.

Other techniques doing similar types of learning in spiking neural networks (e.g., SpikeProp [Bohte, Kok, & Poutre, 2002], ReSuMe [Ponulak, 2006]) can learn only simple operations, such as learning to spike at a specific time. Others (e.g., SORN [Lazar, Pipa, & Triesch, 2009], reservoir computing approaches [Paugam-Moisy, Martinez, & Bengio, 2008]) can solve complex tasks like classification, but it is not clear how these approaches can be applied to a general cognitive system. The functions learned by our approach are complex and have already been combined into a general cognitive system called the Semantic Pointer Architecture (SPA). Previously, the SPA has been used to create Spaun, a brain model made up of 2.5 million neurons that can do eight diverse tasks (Eliasmith et al., 2012). Spaun accomplishes these tasks by transmitting and manipulating semantic pointers, which are compressed neural representations that carry surface semantic content, and can be decompressed to generate deep semantic content (Eliasmith, in press). Semantic pointers are composed to represent syntactic structure using a "binding" transformation, which compresses the information in two semantic pointers into a single semantic pointer. Such representations can be "collected" using superposition, and collections can participate in further bindings to generate deep structures. Spaun performs these transformations by using the Neural Engineering Framework (NEF; Eliasmith & Anderson, 2003) to directly compute static connection weights between populations. We show that our approach can learn to transmit, bind, and classify semantic pointers.

Abbreviations used in this paper:
BCM: Bienenstock, Cooper, Munro learning rule; Eq. (7)
hPES: homeostatic Prescribed Error Sensitivity; Eq. (9)
NEF: Neural Engineering Framework; see Theory
PES: Prescribed Error Sensitivity; Eq. (6)
SPA: Semantic Pointer Architecture; see Theory
STDP: spike-timing dependent plasticity (Bi & Poo, 2001)

Theory

Cognitive functions with spiking neurons

In order to characterize cognitive functions at the level of spiking neurons, we employ the methods of the Semantic Pointer Architecture (SPA), which was recently used to create the world's largest functional brain model (Spaun; Eliasmith et al., 2012), able to perform perceptual, motor, and cognitive functions. Cognitive functions in Spaun include working memory, reinforcement learning, syntactic generalization, and rule induction.

The SPA is implemented using the principles of the Neural Engineering Framework (NEF; Eliasmith & Anderson, 2003), which defines methods to 1) represent vectors of numbers through the activity of populations of spiking neurons, 2) transform those representations through the synaptic connections between those populations, and 3) incorporate dynamics through connecting populations recurrently. In the case of learning, we exploit NEF representations in order to learn transformations analogous to those that can be found through the NEF's methods.

Representing semantic pointers in spiking neurons

Representation in the NEF is similar to population coding, as proposed by Georgopoulos, Schwartz, and Kettner (1986), but extended to n-dimensional vector spaces. Each population of neurons represents a point in an n-dimensional vector space over time. Each neuron in that population is sensitive to a direction in the n-dimensional space, which we call the neuron's encoder. The activity of a neuron can be expressed as

    a = G[α e · x + J_bias],    (1)

where G[·] is the nonlinear neural activation function, α is a scaling factor (gain) associated with the neuron, e is the neuron's encoder, x is the vector to be encoded, and J_bias is the background current of the cell.

The vector (i.e., semantic pointer) that a population represents can be estimated from the recent activity of the population. The decoded estimate, x̂, is the sum of the activity of each neuron, weighted by an n-dimensional decoder:

    x̂(t) = Σ_i d_i a_i(t),    (2)

where d_i is the decoder and a_i is the activity of neuron i. Neural activity is interpreted as a filtered spike train; i.e.,

    a_i(t) = Σ_s h(t − t_s) = Σ_s e^{−(t − t_s)/τ_PSC},    (3)

where h(·) is the exponential filter applied to each spike, and s indexes the set of all spikes occurring before the current time t. The decoders are found through a least-squares minimization of the difference between the decoded estimate and the actual encoded vector:

    d = Γ^{−1} ϒ,  where  Γ_ij = ∫ a_i a_j dx,  ϒ_j = ∫ a_j x dx.    (4)

Transforming semantic pointers through connection weights

The encoders and decoders used to represent semantic pointers also enable arbitrary transformations (i.e., mathematical functions) of encoded semantic pointers. If population A encodes pointer X, and we want to connect it to population B, encoding pointer Y, a feedforward connection with the following connection weights transmits that semantic pointer, such that Y ≈ X:

    ω_ij = α_j e_j d_i,    (5)

where i indexes the presynaptic population A, and j indexes the postsynaptic population B. Other linear transformations are implemented by multiplying d_i by a linear operator. Nonlinear transformations are implemented by solving for a new set of decoding weights. This is done by minimizing the difference between the decoded estimate of f(x) and the actual f(x), rather than just x, in Equation (4).

Supervised learning: PES rule

MacNeil and Eliasmith (2011) proposed a learning rule that minimizes the error minimized in Equation (4) online:

    Δd_i = κ E a_i
    Δω_ij = κ α_j (e_j · E) a_i,    (6)

where κ is a scalar learning rate, E is a vector representing the error we wish to minimize, and other terms are as before.

When put in terms of connection weights (ω_ij), the rule resembles backpropagation. The quantity α_j e_j · E is analogous to the local error term δ in backpropagation (Rumelhart et al., 1986); they are both a means of translating a global error signal to a local error signal that can be used to change an individual synaptic connection weight. The key difference between this rule and backpropagation is that the global-to-local mapping is done by imposing the portion of the error vector space each neuron is sensitive to via its encoder. This limits flexibility, but removes the dependency on global information, making the rule biologically plausible. We will refer to Equation (6) as the Prescribed Error Sensitivity (PES) rule.

Unsupervised learning: Spiking BCM rule

A Hebbian learning rule widely studied in the context of the vision system is the BCM rule (Bienenstock, Cooper, & Munro, 1982). This rule has been used to explain orientation selectivity and ocular dominance (Bienenstock et al., 1982). Theoretically, it has been asserted that BCM is equivalent to triplet-based STDP learning rules (Pfister & Gerstner, 2006). The general form is

    Δω_ij = a_i a_j (a_j − θ),    (7)

where θ is the modification threshold. When the postsynaptic activity, a_j, is greater than the modification threshold, the synapse is potentiated; when the postsynaptic activity is less than the modification threshold, the synapse is depressed.
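In practice, the decoder computation in Equation (4) reduces to a regularized least-squares problem over sampled activities. The following sketch illustrates Equations (1), (2), and (4) in NumPy, with rate-based rectified-linear neurons standing in for the paper's LIF neurons; the tuning-curve parameters (gains, biases, regularization) are arbitrary illustrations, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_samples = 50, 200

# Encoders: unit vectors in a 1-D space ({-1, +1}); random gains and
# biases give each neuron a distinct tuning curve.
encoders = rng.choice([-1.0, 1.0], size=n_neurons)
gains = rng.uniform(0.5, 2.0, size=n_neurons)
biases = rng.uniform(-1.0, 1.0, size=n_neurons)

# Sample the represented space and compute activities, Eq. (1),
# with a rectified-linear G[.] standing in for the LIF response.
x = np.linspace(-1, 1, n_samples)
J = gains * encoders * x[:, None] + biases  # input current per neuron
A = np.maximum(J, 0.0)                      # G[.] = max(0, .)

# Eq. (4): d = Gamma^-1 Upsilon, solved as ridge-regularized
# least squares over the sampled activities.
Gamma = A.T @ A + 0.01 * n_samples * np.eye(n_neurons)
Upsilon = A.T @ x
d = np.linalg.solve(Gamma, Upsilon)

x_hat = A @ d  # Eq. (2): decoded estimate; close to x across [-1, 1]
```

The same solve, with f(x) in place of x in Upsilon, yields decoders for a nonlinear transformation, as described above.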

The modification threshold reflects the expectation of a cell's activity. It is typically calculated as the temporal average of the cell's activity over a long time window (on the order of hours). The intuition behind BCM is that cells driven above their expectation must be playing an important role in a circuit, so their afferent synapses become potentiated. Cells driven less than normal have their synapses depressed. If either of these effects persists long enough, the modification threshold changes to reflect the new expectation of the cell's activity.

However, BCM as originally formulated is based on non-spiking rate neurons. We implement BCM in biologically plausible spiking networks by interpreting neural activity as spikes that are filtered by a postsynaptic current curve:

    Δω_ij = κ α_j a_i a_j (a_j − θ)
    θ(t) = e^{−t/τ} θ(t − 1) + (1 − e^{−t/τ}) a_j(t),    (8)

where a_j, the activity of a neuron, is interpreted as a filtered spike train, as in Equation (3), and τ is the time constant of the modification threshold's exponential filter.

With our spiking BCM implementation, we aim to test the claim that BCM is equivalent to triplet STDP rules. Functionally, we hypothesize that unsupervised learning will have a small detrimental effect on the function being computed by a weight matrix, but will result in weight sparsification.

Simultaneous supervised and unsupervised learning: hPES rule

The PES rule gives us the ability to minimize some provided error signal, allowing a network to learn to compute a transformation online. However, biological synapses can change when no error signal is present. More practically, transformation learning may be easier in more sparse systems. For these reasons, we propose a new learning rule that combines the error-minimization abilities of the PES rule with the biological plausibility and sparsification of the spiking BCM rule. The rule is a weighted sum of the terms in each rule:

    Δω_ij = κ α_j a_i (S e_j · E + (1 − S) a_j (a_j − θ)),    (9)

where 0 ≤ S ≤ 1 is the relative weighting of the supervised term over the unsupervised term. Note that this rule is a generalization of the previously discussed rules; if we set S = 1, this rule is equivalent to PES, and if we set S = 0, this rule is equivalent to spiking BCM.

We hypothesize that the unsupervised component helps maintain homeostasis while following the error gradient defined by the supervised component. Because of this, we call the rule the homeostatic Prescribed Error Sensitivity (hPES) rule. We hypothesize that this rule will be able to learn the same class of transformations that the PES rule can learn, and that this class includes operations critical to cognitive representation, such as semantic pointer binding.

Note that a similar combination can be done with other supervised learning rules that modify connection weights. A more general form of Equation (9) would replace the local error quantity e_j · E with δ, which could be determined through any method. However, for clarity, we will use hPES to refer to the specific form in Equation (9).

Methods

We performed three experiments in order to test our hypotheses about the hPES rule. Experiments were implemented in the Nengo simulation environment, in which we implemented the PES, spiking BCM, and hPES learning rules. All experiments use leaky integrate-and-fire neurons with default parameters. The scripts used to implement and analyze these experiments are MIT licensed and available at http://github.com/tbekolay/cogsci2013.

Experiment 1: Unsupervised learning. We constructed a network composed of two populations connected in a feedforward manner such that one population provides input to the other. The network can be run in a "control" regime, in which the weights between the two populations are solved for with the NEF's least-squares optimization and do not change, or in a "learning" regime, in which the weights start the same as in the control network, but change over time according to the hPES rule (9) with S = 0 (i.e., according to the spiking BCM rule [8]). This experiment tests the hypothesis that the unsupervised component of the hPES rule increases the sparsity of the connection weight matrix.

Experiment 2: Supervised learning. We constructed a network composed of two populations connected in a feedforward manner, and one population that provides an error signal to the downstream population. The network can be run in a "control" regime, in which the weights between the two populations are computed to transmit a three-dimensional semantic pointer, or to bind two three-dimensional semantic pointers into one three-dimensional pointer. The network can also be run in a "learning" regime, in which the weights between the two populations are initially random and are modified over time by the hPES rule. This experiment tests the hypothesis that the hPES rule can learn to transmit and bind semantic pointers as well as the control network and the supervised learning rule (i.e., hPES with S = 1).

Experiment 3: Digit classification. In order to investigate how simultaneous supervised and unsupervised learning scales in higher dimensional situations, we constructed a network similar to that in Experiment 2, but whose input is handwritten digits.* In order to be computationally tractable, the 28-by-28 pixel images were compressed to a 50-dimensional semantic pointer using a sparse deep belief network that consists of four feedforward Restricted Boltzmann Machines trained with a form of contrastive divergence (full details in Tang & Eliasmith, 2010). Those 50-dimensional pointers were projected to an output population of 10 dimensions, where each dimension represents the confidence that the input representation should be classified as one of the 10 possible digits. The classified digit is the one corresponding to the dimension with the highest activity over the 30 ms during which a 50-dimensional input is presented. 60,000 labeled training examples were shown to the network while the hPES rule was active. The network was then tested with 10,000 examples in which the label was not provided. The results are compared to an analogous control network, in which the 50-dimensional pointers are classified with a cleanup memory whose connection weights are static, as in Eliasmith et al. (2012). This experiment examines how well the hPES rule scales to high-dimensional spaces.

* Digits from MNIST: http://yann.lecun.com/exdb/mnist/

[Figure 1 plot: change in connection weight (%) vs. spike timing (ms), simulation and experiment.]

Figure 1: STDP curve replicated by two neurons connected with the hPES rule, S = 0. Solid and dashed lines are best fits of the curve ae^{−x/τ} for the experimental and simulated data, respectively. Experimental data from Bi and Poo (2001).

Learning parameters. While there are many hundreds of parameters involved in each network simulation, the vast majority are randomly selected within a biologically plausible range without significantly affecting performance. Some parameters, especially those affecting the learning rule, can have a significant effect on performance. These significant parameters and the values used in specific simulations are listed in Tables 1 and 2. These parameters were optimized with a Tree of Parzen Estimators approach, using the hyperopt package (Bergstra, Yamins, & Cox, 2013).
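Putting Equations (3), (8), and (9) together, one discrete-time step of the hPES rule can be sketched as a single function. This is an illustrative NumPy re-implementation, not the Nengo code in the repository above; the per-timestep threshold decay e^{−dt/τ} is our reading of the filter in Equation (8), and the parameter values are placeholders.

```python
import numpy as np

def hpes_step(w, a_pre, a_post, e_post, error, theta,
              kappa=1e-3, alpha=1.0, S=0.5, tau=0.1, dt=0.001):
    """One hPES weight update, Eq. (9).

    w      : (n_post, n_pre) connection weights
    a_pre  : (n_pre,) filtered presynaptic spike trains, Eq. (3)
    a_post : (n_post,) filtered postsynaptic spike trains
    e_post : (n_post, d) postsynaptic encoders
    error  : (d,) error vector E to be minimized
    theta  : (n_post,) modification thresholds, updated as in Eq. (8)
    """
    # Supervised (PES) term: project the global error vector onto
    # each postsynaptic neuron's encoder.
    supervised = e_post @ error                 # (n_post,)
    # Unsupervised (spiking BCM) term: potentiate above the
    # modification threshold, depress below it.
    unsupervised = a_post * (a_post - theta)    # (n_post,)
    local = S * supervised + (1.0 - S) * unsupervised
    w = w + kappa * alpha * np.outer(local, a_pre)
    # Low-pass filter the threshold toward current activity, Eq. (8).
    decay = np.exp(-dt / tau)
    theta = decay * theta + (1.0 - decay) * a_post
    return w, theta
```

With S = 1 the unsupervised term vanishes and the update reduces to PES, Equation (6); with S = 0 it reduces to the spiking BCM rule, Equation (8), as noted in the text.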

Table 1: Parameters used for transmitting semantic pointers

  Parameter    Description            Value
  N/D          Neurons per dimension  25
  κ            Learning rate          3.51 × 10^-3
  S            Supervision ratio      0.798
  κ for S = 1  Learning rate (PES)    2.03 × 10^-3

Table 2: Parameters used for binding and classifying SPs

  Parameter    Description            Value
  N/D          Neurons per dimension  25
  κ            Learning rate          2.38 × 10^-3
  S            Supervision ratio      0.725
  κ for S = 1  Learning rate (PES)    1.46 × 10^-3

Results

hPES replicates STDP results

Previously, Pfister and Gerstner (2006) theorized that BCM and STDP are equivalent. Our experiments support this theory. Varying the amount of time between presynaptic and postsynaptic spikes results in an STDP curve extremely similar to the classical Bi and Poo (2001) STDP curve (Figure 1).

However, these STDP curves do not capture the frequency dependence of STDP. In order to capture those effects, modellers have created STDP rules that take into account triplets and quadruplets of spikes, rather than just pre-post spike pairings (Pfister & Gerstner, 2006). These rules are able to replicate the frequency dependence of the STDP protocol. Figure 2 shows that, despite being a much simpler rule, the hPES rule with S = 0 also exhibits frequency dependence.

hPES encourages sparsity while increasing signal transmission error

When the hPES rule is applied to a network that has been optimized to implement semantic pointer transmission, the hPES rule with no supervision (S = 0) increases signal transmission error at a rate proportional to the learning rate. Figure 3 (top) shows the gradual decrease in accuracy over 200 seconds of simulation with an artificially large learning rate. Figure 3 (bottom) shows that the sparsity of the weight matrix, as measured by the Gini index, increases over time. Therefore, the hPES rule with S = 0 increases network sparsity at the cost of an increase in signal transmission error.

hPES can learn cognitive functions

The error in the learned networks relative to the mean of 10 control networks can be seen decreasing over time in Figure 4. The parameters used for Figure 4 are listed in Tables 1 and 2. The transformations are learned quickly (approximately 25 seconds for transmission, 45 seconds for binding). Therefore, the central binding and transmission operations of the SPA are learnable with the hPES rule.

Critically, while hPES without error sensitivity introduces error while increasing sparsity (see Figure 3), with error sensitivity, this error can be overcome. Interestingly, binding, the intuitively more complex operation, is more reliably learned than transmission. This is because the NEF can very effectively optimize weights to perform linear transformations like transmission in the control networks.

As a proof of concept that the hPES rule scales to high-dimensional spaces, Table 3 shows that the learned handwritten digit classification network classifies digits more accurately than Spaun's cleanup memory (Eliasmith et al., 2012).
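For reference, the binding transformation that the networks learn to approximate is, in the SPA, circular convolution (Eliasmith, in press). A direct, non-neural implementation via the FFT, with arbitrary example vectors:

```python
import numpy as np

def bind(x, y):
    """Circular convolution of two semantic pointers (SPA binding)."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

rng = np.random.default_rng(1)
a = rng.standard_normal(3)  # three-dimensional pointers, as in Experiment 2
b = rng.standard_normal(3)
c = bind(a, b)              # the bound pointer is still three-dimensional
```

Because circular convolution with the identity pointer [1, 0, 0, ...] leaves any pointer unchanged, that case makes a convenient correctness check for a learned binding network.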

[Figure 2 plot: change in connection weight (%) vs. stimulation frequency (Hz), for low and high activity, simulation and experiment.]

Figure 2: STDP frequency dependence replicated by two neurons connected with the hPES rule, S = 0. This also demonstrates the effect of different θ values on the frequency dependence curve. Experimental data from Kirkwood et al. (1996).

[Figure 3 plot: unsupervised learning in the control network; transmission accuracy (top) and weight sparsity (bottom) over 200 seconds of simulation time.]

Figure 3: Data gathered from 50 trials of Experiment 1; filled region represents a bootstrapped 95% confidence interval. (Top) Accuracy of signal transmission. Accuracy is proportional to negative mean squared error, scaled such that an accuracy of 1.0 denotes a signal identical to that transmitted by the NEF optimal weights with no learning, and an accuracy of 0.0 represents no signal transmission (i.e., the error is equal to the signal). (Bottom) Sparsity of the connection weight matrix over time, as measured by the Gini index. This demonstrates the expected tradeoff between sparsity and accuracy.

This supports the suggestion that the hPES rule scales to high-dimensional spaces. While the hPES rule with S < 1 achieved higher classification accuracy than hPES with S = 1, not enough trials were attempted to statistically confirm a benefit to combined unsupervised and supervised learning for classifying handwritten digits.
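The Gini index used as the sparsity measure in Figure 3 can be computed from sorted weight magnitudes. The paper does not spell out its exact estimator; this sketch uses one standard formulation from the sparsity-measure literature, so treat it as an illustration rather than the authors' code.

```python
import numpy as np

def gini(w):
    """Gini index of a weight matrix's magnitudes: 0 when all
    magnitudes are equal, approaching 1 for very sparse matrices."""
    v = np.sort(np.abs(w).ravel())  # magnitudes in ascending order
    n = v.size
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum(v * (n - k + 0.5)) / (n * np.sum(v))
```

A uniform weight matrix scores 0, and a matrix with a single nonzero entry scores 1 − 1/n, consistent with the upward trend in Figure 3 (bottom) as the BCM term prunes small weights.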

Table 3: Classification accuracy of handwritten digits

  Classification technique  Accuracy
  Cleanup memory (Spaun)    94%
  hPES learning, S = 1      96.31%
  hPES learning             98.47%

hPES is less sensitive to parameters

The parameters of the hPES rule were optimized with S fixed at 1 for 50 simulations of binding and transmission. A separate parameter optimization that allowed S to change was done for 50 simulations of binding and transmission. Surprisingly, despite optimizing over an additional dimension, when S was allowed to change, error rates were lower during the optimization process for the binding network but not the transmission network. In both cases, the interquartile range of the hPES rule's performance when S was allowed to change is lower. Figure 5 summarizes the performance of all 200 networks generated for parameter optimization. While in all four cases parameters were found that achieve error rates close to the control networks, hPES was more robust to changes in parameters when S was allowed to change. This suggests that unsupervised learning may be beneficial in high-dimensional nonlinear situations.

Discussion

In this paper, we have presented a novel learning rule for learning cognitively relevant transformations of neural representations in a biologically plausible manner. We have shown that the unsupervised component of the rule increases the sparsity of connection weights at the cost of increased signal transmission error. We have also shown that the combined learning rule, hPES, can learn transformations as well as the supervised rule and the offline optimization done in the Neural Engineering Framework. We have demonstrated that the combined learning rule is more robust to changes in parameters when learning nonlinear transformations.

However, it is still the case that the parameters of the learning rule were optimized for each transformation learned. This is a challenge shared by all learning rules, but in the context of biologically plausible simulations, there is the additional question of the biological correlate of these parameters. It could be the case that these parameters are a result of the structure of the neuron, and therefore act as a fixed property of the neuron. However, it could also be the case that these parameters are related to the activity of the network, and are modified by each neuron's activity, or by the activity of some external performance monitoring signal. Examining these possibilities is the subject of future work.

[Figure 4 plots: error relative to control mean vs. learning time (seconds), for binding (left) and transmission (right), hPES and PES.]

Figure 4: Error of learning networks over time compared to the mean error of 10 control networks. Each type of network is generated and simulated 15 times. For the binding network, every 4 seconds the learning rule is disabled and error is accumulated over 5 seconds. For the transmission network, every 0.5 seconds the learning rule is disabled and error is accumulated over 2 seconds. Filled regions are bootstrapped 95% confidence intervals. Time is simulated time, not computation time.

[Figure 5 plot: error rates for all parameter sets; one box per condition (Bind PES, Bind hPES, Transmit PES, Transmit hPES).]

Figure 5: Summary of error rates for networks used to optimize learning parameters. Each column summarizes 50 experiments of the labeled condition. Boxes indicate the median and interquartile range, and whiskers indicate the inner fence (i.e., Q1 − 1.5 IQR and Q3 + 1.5 IQR). Outliers are not shown.

Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada, Canada Research Chairs, the Canadian Foundation for Innovation and the Ontario Innovation Trust. We also thank James Bergstra for creating the hyperopt tool and assisting with its use.

References

Bergstra, J., Yamins, D., & Cox, D. D. (2013). Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In 30th International Conference on Machine Learning.
Bi, G.-Q., & Poo, M.-M. (2001). Synaptic modification by correlated activity: Hebb's postulate revisited. Annual Review of Neuroscience, 24(1), 139–166.
Bienenstock, E. L., Cooper, L. N., & Munro, P. (1982). Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2(1), 32.
Bohte, S. M., Kok, J. N., & Poutre, H. L. (2002). SpikeProp: Backpropagation for networks of spiking neurons. Neurocomputing, 48(1-4), 17.
Eliasmith, C. (in press). How to build a brain: A neural architecture for biological cognition. New York, NY: Oxford University Press.
Eliasmith, C., & Anderson, C. H. (2003). Neural engineering: Computation, representation, and dynamics in neurobiological systems. Cambridge, MA: MIT Press.
Eliasmith, C., Stewart, T. C., Choo, X., Bekolay, T., DeWolf, T., Tang, Y., Tang, C., & Rasmussen, D. (2012). A large-scale model of the functioning brain. Science, 338(6111), 1202–1205.
Georgopoulos, A. P., Schwartz, A. B., & Kettner, R. E. (1986). Neuronal population coding of movement direction. Science, 233(4771), 1416–1419.
Hinton, G., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Kirkwood, A., Rioult, M. G., & Bear, M. F. (1996). Experience-dependent modification of synaptic plasticity in visual cortex. Nature, 381(6582), 526–528.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1).
Lazar, A., Pipa, G., & Triesch, J. (2009). SORN: A self-organizing recurrent neural network. Frontiers in Computational Neuroscience, 3(23).
MacNeil, D., & Eliasmith, C. (2011). Fine-tuning and the stability of recurrent neural networks. PLoS ONE, 6(9).
Paugam-Moisy, H., Martinez, R., & Bengio, S. (2008). Delay learning and polychronization for reservoir computing. Neurocomputing, 71(7), 1143–1158.
Pfister, J.-P., & Gerstner, W. (2006). Triplets of spikes in a model of spike timing-dependent plasticity. Journal of Neuroscience, 26(38), 9673–9682.
Ponulak, F. (2006). Supervised learning in spiking neural networks with ReSuMe method. PhD thesis, Poznan University of Technology.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Tang, Y., & Eliasmith, C. (2010). Deep networks for robust visual recognition. In 27th International Conference on Machine Learning. Haifa, Israel: ICML.
