
A Tandem Learning Rule for Effective Training and Rapid Inference of Deep Spiking Neural Networks Jibin Wu, Yansong Chua, Malu Zhang, Guoqi Li, Haizhou Li, and Kay Chen Tan

arXiv:1907.01167v3 [cs.NE] 30 Jun 2020

J. Wu, M. Zhang and H. Li are with the Department of Electrical and Computer Engineering, National University of Singapore (e-mail: [email protected], [email protected], [email protected]). Y. Chua is with the Institute for Infocomm Research, A*STAR, Singapore (corresponding author, e-mail: [email protected]). G. Li is with the Center for Brain Inspired Computing Research and Beijing Innovation Center for Future Chip, Department of Precision Instrument, Tsinghua University, P. R. China (e-mail: [email protected]). K. C. Tan is with the Department of Computer Science, City University of Hong Kong, Hong Kong (e-mail: [email protected]).

Abstract—Spiking neural networks (SNNs) represent the most prominent biologically inspired computing model for neuromorphic computing (NC) architectures. However, due to the non-differentiable nature of spiking neuronal functions, the standard error back-propagation algorithm is not directly applicable to SNNs. In this work, we propose a tandem learning framework that consists of an SNN and an Artificial Neural Network (ANN) coupled through weight sharing. The ANN is an auxiliary structure that facilitates the error back-propagation for the training of the SNN at the spike-train level. To this end, we consider the spike count as the discrete neural representation in the SNN, and design ANN neuronal activation functions that can effectively approximate the spike count of the coupled SNN. The proposed tandem learning rule demonstrates competitive pattern recognition and regression capabilities on both conventional frame-based and event-based vision datasets, with at least an order of magnitude reduction in inference time and total synaptic operations over other state-of-the-art SNN implementations. Therefore, the proposed tandem learning rule offers a novel solution to training efficient, low latency, and high accuracy deep SNNs with low computing resources.

Index Terms—Deep Spiking Neural Network, Object Recognition, Event-driven Vision, Efficient Neuromorphic Inference, Neuromorphic Computing

I. INTRODUCTION

Deep learning has improved performance by leaps and bounds in computer vision [1], speech processing [2], language understanding [3] and robotics [4]. However, deep artificial neural networks (ANNs) are computationally intensive and memory inefficient, thereby limiting their deployment in mobile and wearable devices that have limited computational budgets. This prompts us to look into energy-efficient solutions.

The human brain, with millions of years of evolution, is incredibly efficient at performing complex perceptual and cognitive tasks. Although hierarchically organized deep ANNs are brain-inspired, they differ significantly from the biological brain in many ways. Fundamentally, information is represented and communicated through asynchronous action potentials or spikes in the brain. To efficiently and rapidly process the information carried by these spike trains, biological neural systems have evolved an event-driven computation strategy, whereby energy consumption matches the activity level of the sensory stimuli.

Neuromorphic computing (NC), as an emerging non-von Neumann computing paradigm, aims to mimic such asynchronous event-driven information processing with spiking neural networks (SNNs) in silicon [5]. Novel neuromorphic computing architectures, for instance TrueNorth [6] and Loihi [7], leverage low-power, densely-connected parallel computing units to support spike-based computation. Furthermore, the co-located memory and computation can effectively mitigate the problem of low bandwidth between the CPU and memory (i.e., the von Neumann bottleneck) [8]. When implemented on these neuromorphic architectures, deep SNNs benefit from the best of two worlds: superior classification accuracies and compelling energy efficiency [9].

While neuromorphic computing architectures offer attractive energy savings, how to train large-scale SNNs that can operate efficiently and effectively on these NC architectures remains a challenging research topic. Spiking neurons exhibit a rich repertoire of dynamical behaviours [10], such as phasic spiking, bursting, and spike frequency adaptation, which significantly increase the modeling complexity over the simplified ANNs. Moreover, due to the asynchronous and discontinuous nature of synaptic operations within the SNN, the error back-propagation algorithm that is commonly used for ANN training is not directly applicable to the SNN.

Over the years, a growing number of neural plasticity and learning methods, inspired by neuroscience and machine learning studies, have been proposed for SNNs [11], [12]. The biologically plausible Hebbian learning rules [13] and spike-timing-dependent plasticity (STDP) [14] are intriguing local learning rules for computational neuroscience studies and are also attractive for hardware implementation with emerging non-volatile memory devices [15]. Despite their recent successes on small-scale image recognition tasks [16], [17], they are not straightforward to use for large-scale tasks due to ineffective task-specific credit assignment and time-consuming hyperparameter tuning.

Recent studies [18]–[20] show that it is viable to convert a pre-trained ANN to an SNN with little adverse impact on classification accuracy. This indirect training approach assumes that the activation value of analog neurons is equivalent to the average firing rate of spiking neurons, and simply requires parsing and normalizing the weights of the trained ANN. Rueckauer et al. [19] provide a theoretical analysis of the performance deviation of such an approach as well as a systematic study of Convolutional Neural Network (CNN) models for the object recognition task.

This conversion approach achieves the best-reported results for SNNs on many conventional frame-based vision datasets, including the challenging ImageNet-12 dataset [19], [20]. However, this generic conversion approach comes with a trade-off between inference speed and classification accuracy, and requires at least several hundred inference time steps to reach the optimal classification accuracy.

Additional research efforts are also devoted to training constrained ANNs that can approximate the properties of SNNs [21], [22], which allows the trained model to be transferred to the target hardware platform seamlessly. Grounded in the rate-based spiking neuron model, this constrain-then-train approach transforms the steady-state firing rate of spiking neurons into a continuous and hence differentiable form that can be optimized with the conventional error back-propagation algorithm. By explicitly approximating the properties of SNNs during the training process, this approach performs better than the aforementioned generic conversion approach when implemented on the target neuromorphic hardware.

While competitive classification accuracies are shown with both the generic ANN-to-SNN conversion and the constrain-then-train approaches, the underlying assumption of a rate-based spiking neuron model requires a long encoding time window (i.e., how many time steps the image or sample is presented) or a high firing rate to reach the steady neuronal firing state [19], [21], such that the approximation errors between the pre-trained ANN and the SNN can be eliminated. This steady-state requirement limits the computational benefits that can be acquired from the NC architectures and remains a major roadblock for applying these methods to real-time pattern recognition tasks.

To improve the overall energy efficiency as well as the inference speed, an ideal SNN learning rule should support a short encoding time window with sparse synaptic activities. To exploit this desirable property, temporal coding has been investigated, whereby the timing of the first spike is employed as a differentiable proxy to enable the error back-propagation algorithm [23]–[25]. Despite competitive results on the MNIST dataset, it remains elusive how the temporal learning rule maintains the stability of neuronal firing such that the derivatives can be determined, and how it can be scaled up to the size of state-of-the-art deep ANNs. In view of the steady-state requirement of rate-based SNNs and the scalability issues of temporal-based SNNs, it is necessary to develop new learning methods that can effectively and efficiently train deep SNNs to operate under a short encoding time window with sparse synaptic activities.

Surrogate gradient learning [26] has emerged recently as an alternative training method for deep SNNs. With a discrete-time formulation, the spiking neuron can be effectively modeled as a non-spiking recurrent neural network (RNN), wherein the leak term in spiking neuron models is formulated as a fixed-weight self-recurrent connection. By establishing the equivalence with RNNs, the error Back-propagation Through Time (BPTT) algorithm can be applied to train deep SNNs. The non-differentiable spike generation function is replaced with a continuous function during the error back-propagation, whereby a surrogate gradient can be derived based on the instantaneous membrane potential at each time step. In practice, surrogate gradient learning performs exceedingly well for both static and temporal pattern recognition tasks [27]–[30]. By removing the constraint of a steady-state firing rate for rate-based SNNs and the spike-timing dependency of temporal-based SNNs, surrogate gradient learning supports rapid and efficient pattern recognition with SNNs.

While competitive accuracies have been reported on the MNIST and CIFAR-10 [31] datasets with surrogate gradient learning, it is both memory and computationally inefficient to train deep SNNs using BPTT, especially for more complex datasets and network structures. Furthermore, the vanishing gradient problem [32] that is well known for vanilla RNNs may adversely affect the learning performance for spiking patterns with long temporal duration. In this paper, to improve the learning efficiency of surrogate gradient learning, we propose a novel learning rule with a tandem neural network. As illustrated in Fig. 4, the tandem network architecture consists of an SNN and an ANN that are coupled layer-wise with weight sharing. The ANN is an auxiliary structure that facilitates the error back-propagation for the training of the SNN at the spike-train level, while the SNN is used to derive the exact spiking neural representation. This tandem learning rule allows rapid, efficient, and scalable pattern recognition with SNNs, as demonstrated through extensive experimental studies.

The rest of this paper is organized as follows: in Section II, we formulate the proposed tandem learning framework. In Section III, we evaluate the proposed tandem learning framework on both conventional frame-based vision datasets (i.e., MNIST, CIFAR-10, and ImageNet-12) and event-based vision datasets (i.e., N-MNIST and DVS-CIFAR10) by comparing with other SNN implementations. Finally, we conclude with discussions in Section IV.

II. LEARNING THROUGH A TANDEM NETWORK

In this section, we first introduce the spiking neuron models that are used in this work. We then present a discrete neural representation scheme using the spike count as the information carrier across network layers, and we design ANN activation functions to effectively approximate the spike count of the coupled SNN for error back-propagation at the spike-train level. Finally, we introduce the tandem network and its learning rule, which we call the tandem learning rule, for deep SNN training.

A. Neuron Model

Spiking neuron models describe the rich dynamical behaviors of biological neurons in the brain [33]. In general, the computational complexity of spiking neuron models grows with the level of biological plausibility. Therefore, for implementation on efficient neuromorphic hardware, a simple yet effective spiking neuron model that provides a sufficient level of biological detail is preferred.

In this work, we use arguably the simplest spiking neuron models that can effectively describe sensory information with spike counts: the current-based integrate-and-fire (IF) neuron [19] and the leaky integrate-and-fire (LIF) neuron [33]. While the IF and LIF neurons do not emulate the rich spectrum of spiking activities of biological neurons, they are ideal for working with sensory inputs where information is encoded in spike rates or coincident spike patterns.

The subthreshold membrane potential $U_i^l$ of LIF neuron $i$ at layer $l$ can be described by the following linear differential equation

$\tau_m \frac{dU_i^l(t)}{dt} = -[U_i^l(t) - U_{rest}] + R I_i^l(t)$    (1)

where $\tau_m$ is the membrane time constant, and $U_{rest}$ and $R$ are the resting potential and the membrane resistance of the spiking neuron, respectively. $I_i^l(t)$ refers to the time-dependent input current to neuron $i$. By removing the membrane potential leaky effect involved in the LIF neuron, the subthreshold dynamics of the IF neuron can be described as follows

$\frac{dU_i^l(t)}{dt} = R I_i^l(t)$    (2)

Without loss of generality, we set the resting potential $U_{rest}$ to zero and the membrane resistance $R$ to unity in this work. An output spike is generated whenever $U_i^l$ crosses the firing threshold $\vartheta$

$s_i^l(t) = \Theta\big(U_i^l(t) - \vartheta\big)$ with $\Theta(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{otherwise} \end{cases}$    (3)

where $s_i^l(t)$ indicates the occurrence of an output spike from neuron $i$ at time step $t$.

In practice, given a small simulation time step $dt$, the linear differential equation of the LIF neuron can be well approximated by the following discrete-time formulation

$U_i^l[t] = \alpha U_i^l[t-1] + I_i^l[t] - \vartheta s_i^l[t-1]$    (4)

with

$I_i^l[t] = \sum_j w_{ij}^{l-1} s_j^{l-1}[t-1] + b_i^l$    (5)

where $\alpha \equiv \exp(-dt/\tau_m)$. The square brackets are used in the above formulations to reflect the discrete-time modeling. $I_i^l[t]$ summarizes the synaptic current contributions from the presynaptic neurons of the preceding layer, where $w_{ij}^{l-1}$ denotes the strength of the synaptic connection from afferent neuron $j$ of layer $l-1$ and $b_i^l$ is the constant injecting current to neuron $i$. As denoted by the last term of Eq. 4, instead of resetting the membrane potential to zero after each spike generation, the firing threshold $\vartheta$ is subtracted from the membrane potential. This effectively preserves the surplus membrane potential that has accumulated above the firing threshold and reduces the information loss across layers [19]. Similarly, the discrete-time formulation of the IF neuron can be expressed as follows

$U_i^l[t] = U_i^l[t-1] + I_i^l[t] - \vartheta s_i^l[t-1]$    (6)

In our experiments, for both IF and LIF neurons, $U_i^l[0]$ is reset and initialized to zero before processing each new input example. We consider the total number of spikes (i.e., the spike count) generated by spiking neurons as the main information carrier. For neuron $i$ at layer $l$, the spike count $c_i^l$ is determined by summing all output spikes over the encoding time window $T$

$c_i^l = \sum_{t=1}^{T} s_i^l[t]$    (7)

In this work, we use the activation value of non-spiking analog neurons to approximate the spike count of spiking neurons. The transformation performed by analog neuron $i$ can be described as

$a_i^l = f\Big(\sum_j w_{ij}^{l-1} x_j^{l-1} + b_i^l\Big)$    (8)

where $w_{ij}^{l-1}$ and $b_i^l$ are the weight and bias terms of the analog neuron, respectively. $x_j^{l-1}$ and $a_i^l$ correspond to the analog input and output activation values, and $f(\cdot)$ denotes the activation function of the analog neurons. Details of the spike count approximation using analog neurons will be explained in Section II-C.

B. Encoding and Decoding Schemes

SNNs process inputs that are represented as spike trains, which ideally should be generated by event-based sensors, for instance the silicon retina event camera [34] and the silicon cochlea audio sensor [35]. However, the datasets collected from these event-driven sensors are not abundantly available in comparison to their frame-based counterparts. To take frame-based sensor data as inputs, SNNs require an additional neural encoding mechanism to transform the real-valued samples into spike trains.

In general, two neural encoding schemes are commonly considered: rate code and temporal code. Rate code [18], [19] converts real-valued inputs into spike trains at each sampling time step following a Poisson or Bernoulli distribution. However, it suffers from sampling errors, thereby requiring a long encoding time window to compensate for such errors. Hence, the rate code is not well suited to encoding information into the short time window that we desire. On the other hand, temporal coding uses the timing of a single spike to encode information. It therefore enjoys superior coding efficiency and computational advantages, but it is complex to decode and sensitive to noise [33]. Moreover, it is also challenging to achieve the high temporal resolution that is essential for temporal coding on neuromorphic chips.

Alternatively, we take the real-valued inputs as the time-dependent input currents and directly apply them in Eqs. 4 and 6 at every time step. This neural encoding scheme overcomes the sampling error of the rate code and therefore supports accurate and rapid inference, as demonstrated in earlier works [29], [36]. As shown in Fig. 4, beginning from this neural encoding layer, spike trains and spike counts are taken as inputs to the SNN and ANN layers, respectively.

To facilitate pattern classification, an SNN back-end is required to decode the output spike trains into pattern classes. For decoding, it is feasible to decode from the SNN output layer using either the discrete spike counts or the continuous free aggregate membrane potentials (no spiking) $U_i^{l,f}$ accumulated over the encoding time window $T$

$U_i^{l,f} = R\Big(\sum_j w_{ij}^{l-1} c_j^{l-1} + b_i^l T\Big)$    (9)

In our preliminary study, as shown in Fig. 6, we observe that the free aggregate membrane potential provides a much smoother learning curve, as it allows continuous error gradients to be derived at the output layer. Furthermore, the free aggregate membrane potential can be directly used as the output for regression tasks. Therefore, we use the free aggregate membrane potential for neural decoding in this work unless otherwise stated.
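To make the discrete-time formulation concrete, the update of Eqs. 4 to 7 for a single layer can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' released code; the layer shapes are arbitrary, and the one-time-step synaptic delay of Eq. 5 is folded into how the input spike train is indexed.

    import numpy as np

    def simulate_layer(spikes_in, W, b, T, theta=1.0, tau_m=None):
        """Run one layer of IF (tau_m=None) or LIF neurons for T time steps.

        spikes_in: (T, N_in) binary spike trains from the preceding layer.
        Returns the output spike trains (T, N_out) and the spike counts.
        """
        alpha = 1.0 if tau_m is None else np.exp(-1.0 / tau_m)   # leak factor, dt = 1
        u = np.zeros(W.shape[0])                                  # membrane potentials, U[0] = 0
        spikes_out = np.zeros((T, W.shape[0]))
        for t in range(T):
            i_t = W @ spikes_in[t] + b                 # Eq. 5: synaptic current plus constant injection
            reset = theta * (spikes_out[t - 1] if t > 0 else 0.0)
            u = alpha * u + i_t - reset                # Eqs. 4/6: reset by subtraction of the threshold
            spikes_out[t] = (u >= theta).astype(float) # Eq. 3: spike generation
        return spikes_out, spikes_out.sum(axis=0)      # Eq. 7: spike counts

With tau_m=None the leak factor is 1 and the loop reduces to the IF neuron of Eq. 6; the free aggregate membrane potential used for decoding (Eq. 9) is obtained by running the same loop with the spike generation and reset disabled.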

C. Spike Count as a Discrete Neural Representation

Deep ANNs learn to describe the input data with compact latent representations. A typical latent representation is in the form of a continuous or discrete-valued vector. While most studies have focused on continuous latent representations, discrete representations have unique advantages in solving real-world problems [37]–[41]. For example, they are potentially a more natural fit for representing natural language, which is inherently discrete, and are also native for logical reasoning and predictive learning. Moreover, the idea of a discrete neural representation has also been exploited in network quantization [42], [43], where network weights, activation values, and gradients are quantized for efficient neural network training and inference.

In this work, we consider the spike count as a discrete latent representation in deep SNNs and design ANN activation functions to approximate the spike count of the coupled SNN, such that spike-train level surrogate gradients can be effectively derived from the ANN layer. With such a discrete latent representation, the effective non-linear transformation at the SNN layer can be expressed as

$c_i^l = g(s^{l-1}; w_i^{l-1}, b_i^l)$    (10)

where $g(\cdot)$ denotes the effective neural transformation performed by the spiking neurons. Given the state-dependent nature of spike generation, it is not feasible to directly determine an analytical expression from $s^{l-1}$ to $c_i^l$. To circumvent this problem, we simplify the spike generation process by assuming that the resulting synaptic currents from $s^{l-1}$ are evenly distributed over time. This yields a constant synaptic current $I_i^{l,c}$ at every time step

$I_i^{l,c} \equiv \Big(\sum_j w_{ij}^{l-1} c_j^{l-1} + b_i^l T\Big) / T$    (11)

Taking the constant synaptic current $I_i^{l,c}$ into Eq. 2, we thus obtain the following expression for the interspike interval of the IF neuron

$ISI_i^l = \rho\Big(\frac{\vartheta}{R I_i^{l,c}}\Big)$    (12)

where $\rho(\cdot)$ denotes the non-linear transformation of the Rectified Linear Unit (ReLU). As mentioned in the earlier section, the membrane resistance $R$ is assumed to be unity in this work and is hence dropped. The output spike count can then be approximated as follows

$c_i^l = \frac{T}{ISI_i^l} = \frac{1}{\vartheta}\, \rho\Big(\sum_j w_{ij}^{l-1} c_j^{l-1} + b_i^l T\Big)$    (13)

By setting $\vartheta$ to 1, Eq. 13 takes the same form as the activation function of the analog neurons described in Eq. 8.

Fig. 1: Illustration of spike counts as the discrete neural representation in the tandem network (IF neurons). The intermediate activations of a randomly selected sample from the CIFAR-10 dataset are provided. The top and bottom rows of each convolution layer refer to the exact and approximated spike count activations, derived from the SNN and the coupled ANN layer, respectively. Note that only the first 8 feature maps are given and plotted in separate blocks.

Fig. 2: Illustration of the neuronal activation functions that are designed to approximate the spike count of the IF and LIF neurons. In this example, a firing threshold of 1 and 0.1 is used for the IF and LIF neurons, respectively, and an encoding time window of 10 is considered. The membrane time constant $\tau_m$ is set to 20 time steps for the LIF neuron.
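As a concrete reading of Eq. 13, the ANN branch of a coupled fully-connected layer can be written as a standard linear-plus-ReLU transform whose bias is scaled by the time window. This is a sketch rather than the authors' implementation; the function name and the assumption that the weights come from an nn.Linear-style layer are ours, and the same form applies to convolution layers via F.conv2d.

    import torch
    import torch.nn.functional as F

    def if_count_activation(count_in, weight, bias, T, theta=1.0):
        """ANN approximation of an IF layer's output spike count (Eq. 13).

        count_in: spike counts c^{l-1} of the preceding layer, shape (batch, N_in).
        The bias is multiplied by T because the constant injecting current b_i
        accumulates over the whole encoding time window.
        """
        drive = F.linear(count_in, weight, bias * T)   # sum_j w_ij c_j + b_i T
        return F.relu(drive) / theta                   # rho(.) / theta, Eq. 13

Setting theta to 1 recovers the ordinary ReLU activation of Eq. 8, which is what allows the coupled ANN layer to be trained with standard back-propagation.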

Fig. 3: (A) Summary of the neural representation error incurred by the constrain-then-train approach, i.e., the mean spike count difference between the actual SNN layer outputs and those approximated from a constrained ANN with the same weights. The experiment is performed with the CifarNet structure (IF neurons) and an encoding time window of 10. (B) A hand-crafted example illustrating the spike count approximation error between the SNN and the approximating ANN, which typically arises when the encoding time window is short and neuronal activities are sparse. In this example, although the aggregate membrane potential of the postsynaptic IF neuron stays below the firing threshold in the end (a useful intermediate quantity that is used to approximate the output spike count), an output spike is generated due to the early arrival of spikes from the excitatory synapses.
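The failure mode sketched in Fig. 3B can be reproduced numerically. The input currents below are invented for illustration (early excitation followed by late inhibition) and are not taken from the figure.

    import numpy as np

    theta, T = 1.0, 4
    # Per-step input current to a single IF neuron: excitation arrives early,
    # inhibition arrives late (illustrative values only).
    I = np.array([0.75, 0.75, -0.5, -0.5])

    u, count, prev_spike = 0.0, 0, 0.0
    for t in range(T):
        u = u + I[t] - theta * prev_spike         # Eq. 6 with reset by subtraction
        prev_spike = float(u >= theta)
        count += int(prev_spike)

    free_potential = I.sum()                          # aggregate potential with spiking disabled
    approx_count = max(free_potential, 0.0) / theta   # Eq. 13 estimate
    print(count, approx_count)                        # 1 actual spike vs. an estimate of 0.5

The neuron fires once because the excitatory spikes arrive before the inhibitory ones, even though the free aggregate membrane potential ends up below the threshold. A constrained ANN sharing the same weights would therefore under-count the spikes, and such errors compound across layers.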

Furthermore, by using $\rho(\cdot)$ as the activation function for the analog neurons, the spike count $c_j^{l-1}$ as the input, and the aggregated constant injecting current $b_i^l T$ as the bias term for the corresponding analog neurons, this configuration allows the spike count, and hence the spike-train level error gradients, to be approximated from the coupled weight-sharing ANN layer. As shown in Fig. 1, the proposed ANN activation function can effectively approximate the exact spike count of the coupled SNN layers in an image classification task. The approximation errors can be considered as stochastic noise, which has been shown to improve the generalizability of trained neural networks [44].

Following the same approximation mechanism used for the IF neuron, one can also consider injecting a constant current into the LIF neuron; the interspike interval can then be determined from Eq. 1 by calculating the charging duration for the neuron to rise from the resting potential to the firing threshold. Thus, we obtain

$ISI_i^l = \tau_m \log\Big[1 + \frac{\vartheta}{\rho(I_i^{l,c} - \vartheta)}\Big]$    (14)

Hence, the approximated spike count can be evaluated as

$c_i^l = \frac{T}{\tau_m} \Big[\log\Big(1 + \frac{\vartheta}{\rho(I_i^{l,c} - \vartheta)}\Big)\Big]^{-1}$    (15)

However, the above equation is undefined when $I_i^{l,c} \leq \vartheta$, and it is also numerically unstable when $I_i^{l,c} - \vartheta$ is marginally greater than zero. To address this, we replace the ReLU activation function $\rho(\cdot)$ with a smoothed surrogate $\rho_s(\cdot)$ defined as follows

$\rho_s(x) = \log(1 + e^x)$    (16)

As with the IF neurons, by taking the spike count $c_j^{l-1}$ and the aggregated constant injecting current $b_i^l T$ as inputs for the analog neurons, with Eq. 15 as the activation function, the spike count of the LIF neurons can be well approximated by the coupled analog neurons. The two activation functions for IF and LIF neurons are shown in Fig. 2. The discrete neural representation with spike counts and its approximation with weight-sharing analog neurons allow the surrogate gradients to be approximated at the spike-train level and applied during error back-propagation. Hence, the learning efficiency is superior to that of other surrogate gradient methods [26], [28], [29], in which gradient approximation takes place at each time step.

D. Credit Assignment in the Tandem Network

As the customized ANN activation functions can effectively approximate the discrete neural representation of spiking neurons, it prompts us to ask whether it is feasible to directly train a constrained ANN and then transfer its weights to an equivalent SNN, i.e., the constrain-then-train approach [21], [22]. In this way, the training of a deep SNN can be achieved by that of a deep ANN, and a large number of tools and methods developed for ANNs can be leveraged.

We take Eq. 13 as the activation function of a constrained ANN and subsequently transfer the trained weights to an SNN with IF neurons. The resulting network reports a competitive classification accuracy on the MNIST dataset [22]. However, when applying this approach to the more complex CIFAR-10 dataset with a time window of 10, a large classification accuracy drop (around 21%) occurs in the SNN relative to the pre-trained ANN. By carefully comparing the ANN-approximated 'spike counts' with the actual SNN spike counts, we observe an increasing spike count discrepancy between the same layers of the ANN and the SNN, as shown in Fig. 3A. The discrepancy grows as the spike counts travel through the layers. This is due to the fact that the activation function of the ANN ignores the temporal dynamics of the IF neuron. While such spike count discrepancies may be negligible for a shallow network used for classifying the MNIST dataset [22] or with a very long encoding time window [21], they have huge impacts in the face of sparse synaptic activities and a short time window.

To demonstrate how such an approximation error, namely the neural representation error, may occur during information forward-propagation, we hand-craft an example as shown in Fig. 3B. Although the free aggregate membrane potential of an IF neuron (a useful intermediate quantity that is used to approximate the output spike count) stays below the firing threshold at the end of the encoding time window, an output spike could have been generated due to the early arrival of spikes from the excitatory synapses. We understand that the spike count, approximated by the surrogate activation function in the analog neuron, only provides a mean estimation that ignores the temporal jitter of spike trains. It is worth mentioning that the spike count discrepancy can be well controlled within a single layer, as shown in Fig. 1 and Fig. 3A (the discrepancy in the first layer is insignificant). However, such a neural representation error accumulates across layers and significantly affects the classification accuracy of an SNN whose weights are transferred from a trained ANN. Therefore, to effectively train a deep SNN with a short encoding time window and sparse synaptic activities, it is necessary to derive an exact neural representation with the SNN in the training loop.

To solve this problem, we propose a tandem learning framework. As shown in Fig. 4, an ANN with the activation function defined in Eq. 13 or 15 is employed to enable error back-propagation through the ANN layers, while the SNN, sharing weights with the coupled ANN, is employed to determine the exact neural representation (i.e., spike counts and spike trains). The synchronized spike counts and spike trains, determined from the SNN layer, are transmitted to the subsequent ANN and SNN layers, respectively. It is worth mentioning that, in the forward pass, the ANN layer takes the output of the previous SNN layer as its input. This aims at synchronizing the inputs of the SNN and ANN via the interlaced layers, rather than trying to optimize the classification performance of the ANN.

By incorporating the dynamics of spiking neurons during the training of the tandem network, the exact output spike counts, instead of the ANN-predicted spike counts, are propagated forward to the subsequent ANN layer. The proposed tandem learning framework can thus effectively prevent the neural representation error from accumulating across layers. While the coupled ANN is harnessed for error back-propagation, the forward inference is executed entirely on the SNN after training. The pseudo code of the proposed tandem learning rule is given in Algorithm 1.

Fig. 4: Illustration of the proposed tandem learning framework that consists of an SNN and an ANN with shared weights. The spike counts are considered as the main information carrier in this framework. The ANN activation function is designed to approximate the spike counts of the coupled SNN, so as to approximate the gradients of the coupled SNN layers at the spike-train level. During training, in the forward pass, the synchronized spike counts and spike trains derived from an SNN layer are taken as the inputs to the subsequent ANN and SNN layers, respectively; the error gradients are passed backward through the ANN layers during error back-propagation, to update the weights so as to minimize the objective function. (Legend: $c^l$: spike count; $s^l$: spike train; $a^l$: approximated spike count.)

Algorithm 1: Pseudo Codes of the Tandem Learning Rule

Input: input sample $X_{in}$, target label $Y$, parameters $w$ of an $L$-layer network, encoding time window size $T$
Output: updated network parameters $w$

Forward Pass:
    $c^0, s^0$ = Encoding($X_{in}$)
    for layer $l$ = 1 to $L-1$ do
        // State update of the ANN layer *
        $a^l$ = ANN.layer[$l$].forward($c^{l-1}$, $w^{l-1}$)
        for $t$ = 1 to $T$ do
            // State update of the SNN layer
            $s^l[t]$ = SNN.layer[$l$].forward($s^{l-1}[t]$, $w^{l-1}$)
        // Update the spike count
        $c^l = \sum_{t=1}^{T} s^l[t]$
    /* Output with different decoding schemes */
    if decode with 'Aggregate Membrane Potential' then
        output = ANN.layer[$L$].forward($c^{L-1}$, $w^{L-1}$)
    else if decode with 'Spike Count' then
        for $t$ = 1 to $T$ do
            $s^L[t]$ = SNN.layer[$L$].forward($s^{L-1}[t]$, $w^{L-1}$)
        output = $\sum_{t=1}^{T} s^L[t]$

Loss: $E$ = LossFunction($Y$, output)

Backward Pass:
    $\partial E / \partial a^L$ = LossGradient($Y$, output)
    for layer $l$ = $L-1$ to 1 do
        // Gradient update through the ANN layer
        $\partial E / \partial a^{l-1}$, $\partial E / \partial w^{l-1}$ = ANN.layer[$l$].backward($\partial E / \partial a^l$, $c^{l-1}$, $w^{l-1}$)
    Update the parameters of the ANN layers based on the calculated gradients.
    Share the updated parameters with the coupled SNN layers.

Note: * For inference, state updates are performed entirely on the SNN layers.
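The interlaced forward and backward flow of Algorithm 1 can be realized with automatic differentiation by composing the two paths in a straight-through fashion: the exact SNN spike count is forwarded, while its gradient is routed to the ANN approximation. The sketch below, for a fully-connected layer with IF neurons, is one plausible way to implement this in PyTorch; it is not claimed to match the authors' released modules, and the class name and the (spike train, spike count) interface are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TandemLinear(nn.Module):
        """A coupled ANN/SNN fully-connected layer (a sketch of Algorithm 1, IF neurons)."""

        def __init__(self, n_in, n_out, T=8, theta=1.0):
            super().__init__()
            self.fc = nn.Linear(n_in, n_out)   # shared weights for both paths
            self.T, self.theta = T, theta

        def forward(self, spike_in, count_in):
            # ANN path: differentiable spike-count approximation (Eq. 13).
            ann_count = F.relu(F.linear(count_in, self.fc.weight,
                                        self.fc.bias * self.T)) / self.theta

            # SNN path: exact IF dynamics (Eqs. 3, 5, 6); no gradient is needed here.
            with torch.no_grad():
                u = torch.zeros_like(ann_count)
                prev_spike = torch.zeros_like(ann_count)
                spike_out = []
                for t in range(self.T):
                    u = u + self.fc(spike_in[t]) - self.theta * prev_spike
                    prev_spike = (u >= self.theta).float()
                    spike_out.append(prev_spike)
                spike_out = torch.stack(spike_out)       # (T, batch, n_out)
                snn_count = spike_out.sum(dim=0)

            # Forward the exact count; route its gradient to the ANN approximation.
            count_out = ann_count + (snn_count - ann_count).detach()
            return spike_out, count_out

A convolutional tandem layer follows the same pattern with nn.Conv2d, and batch normalization can be applied to the forwarded counts before the next coupled layer, mirroring the training setup described in Section III-A.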


Fig. 5: Illustration of the tandem network implementation in PyTorch and Tensorpack, where the coupled ANN and SNN layers (i.e., convolution and fully-connected layers) are encapsulated into a customized module that can be imported conveniently to construct tandem networks. Within each module, the ANN and SNN layers are connected in tandem to allow the appropriate information to be propagated through the network. During the forward pass, the SNN layer takes the input spike train from the preceding layer to determine the exact output spike counts and spike trains of the layer. During the backward pass, the SNN layer works effectively as a bridge that transmits the received gradient information to the coupled ANN layer, such that the spike-train level gradients can be approximated by the coupled ANN layer.

III. EXPERIMENTAL EVALUATION AND DISCUSSION

In this section, we first evaluate the learning capability of the proposed tandem learning framework on frame-based object recognition and image reconstruction tasks. We further discuss why effective learning can be performed within the tandem network. Then, we evaluate the applicability of the tandem learning rule to the asynchronous inputs generated by event-driven camera sensors. Finally, we discuss the high learning efficiency and scalability, rapid inference, and synaptic operation reductions that can be achieved with the proposed tandem learning rule.

A. Experimental Setups

a) Datasets: To conduct a comprehensive evaluation of the proposed tandem learning rule, we use three conventional frame-based image datasets: MNIST, CIFAR-10 [31], and ImageNet-12 [45]. The MNIST dataset consists of gray-scale handwritten digits of 28×28 pixels, including 60,000 training and 10,000 testing samples. CIFAR-10 consists of 60,000 color images of size 32×32×3 from 10 classes, with a standard split of 50,000 and 10,000 for training and testing, respectively. The ImageNet-12 dataset consists of over 1.2 million images from 1,000 object categories, and it remains a challenging benchmark for deep SNN models.

In addition, we also investigate the applicability of tandem learning to event-driven vision datasets: N-MNIST [46] and DVS-CIFAR10 [47]. The N-MNIST dataset is collected by moving an event-driven camera in front of an LCD monitor that displays samples from the frame-based MNIST dataset. The event-driven camera, mounted on an actuated pan-tilt platform, follows three microsaccades for each sample, and each microsaccade takes 100 ms. In contrast, the DVS-CIFAR10 dataset is collected by fixing the event-driven camera while moving the image on the monitor over four paths, which takes a total of 200 ms. The total number of samples is reduced to one-sixth for the collected DVS-CIFAR10 dataset due to the huge storage size required. Following the data pre-processing procedures adopted in [29], we reduce the temporal resolution by accumulating the spikes that occur within every 10 ms interval for the N-MNIST and DVS-CIFAR10 datasets.

b) Network and Training Configurations: As shown in Table I, we use a convolutional neural network (CNN) with 7 learnable layers, namely CifarNet, for object recognition on both the frame-based CIFAR-10 and event-based DVS-CIFAR10 datasets. To handle the higher input dimensionality of DVS-CIFAR10, we increase the stride of each convolution layer to 2 and the kernel size of the first layer to 7. Due to the high computational cost and large memory requirements of the discrete-time modeling of SNNs, we use the AlexNet [1] for object recognition on the large-scale ImageNet-12 dataset. For object recognition on the N-MNIST dataset, we design a 7-layer CNN that we call DigitNet. For the image reconstruction task on the MNIST dataset, we evaluate a spiking autoencoder with an architecture of 784-256-128-64-128-256-784, wherein the numbers refer to the number of neurons at each layer.

To reduce the dependency on weight initialization and to accelerate the training process, we add a batch normalization layer after each convolution and fully-connected layer. Given that the batch normalization layer only performs an affine transformation, we follow the approach introduced in [19] and integrate its parameters into the preceding layer's weights before applying them in the coupled SNN layer. We replace the average pooling operations, which are commonly used in ANN-to-SNN conversion works, with convolution operations of stride 2, which not only perform dimensionality reduction in a learnable fashion but also reduce the computation cost and latency [48].

For SNNs with IF neurons, we set the firing threshold to 1. For SNNs with LIF neurons, we set the firing threshold to 0.1 and the membrane time constant $\tau_m$ to 20 time steps. The corresponding ANN activation functions are provided in Fig. 2.

TABLE I: Network architectures used for the (A) CIFAR-10 and DVS-CIFAR10, (B) ImageNet-2012, and (C) N-MNIST experiments. For Conv layers, the values in the brackets refer to the height, width, stride, and number of filters, respectively. For FC layers, the value in the brackets refers to the number of neurons.

(A) CifarNet            (B) AlexNet              (C) DigitNet
Conv1 (3, 3, 1, 128)    Conv1 (12, 12, 4, 96)    Conv1 (3, 3, 1, 32)
Conv2 (3, 3, 2, 256)    Conv2 (5, 5, 2, 256)     Conv2 (3, 3, 2, 64)
Conv3 (3, 3, 2, 512)    Conv3 (3, 3, 2, 384)     Conv3 (3, 3, 2, 64)
Conv4 (3, 3, 1, 1024)   Conv4 (3, 3, 1, 384)     Conv4 (3, 3, 2, 128)
Conv5 (3, 3, 1, 512)    Conv5 (3, 3, 2, 256)     Conv5 (3, 3, 1, 256)
FC1 (1024)              FC1 (4096)               FC1 (1024)
Dropout Prob=0.2        FC2 (4096)               Dropout Prob=0.2
FC2 (10)                FC3 (1000)               FC2 (10)

c) Implementation Details: We perform all experiments with PyTorch [49], except for the experiment on the ImageNet-12 dataset, where we use the Tensorpack toolbox [50]. Tensorpack is a high-level neural network training interface based on TensorFlow, which optimizes the whole training pipeline for the ImageNet-12 object recognition task and provides accelerated and memory-efficient training on multi-GPU machines. We follow the same data pre-processing procedures (crop, flip, mean normalization, etc.), optimizer, and learning rate decay schedule that are adopted in Tensorpack for the ImageNet-12 object recognition task. For the object recognition task on CIFAR-10, we follow the same data pre-processing and training configurations as in [29]. For ease of tandem network implementation, as shown in Fig. 5, we encapsulate the coupled ANN and SNN layers (convolution and fully-connected layers) into customized modules that can be imported conveniently to construct tandem networks in PyTorch and Tensorpack.

d) Evaluation Metrics: For the object recognition tasks on both the frame-based and event-based vision datasets, we report the classification accuracy on the test sets. For the image reconstruction task on the MNIST dataset, we report the mean square error of the reconstructed handwritten digits. We perform 3 independent runs for all tasks and report the best result across all runs, except for the object recognition task on the ImageNet-12 dataset, where the performance of only a single run is reported.

To study the computational efficiency of SNN models over their ANN counterparts, we follow the convention of the neuromorphic computing community by counting the total synaptic operations [6], [19]. For an SNN, as defined below, the total synaptic operations (SynOps) correlate with the neurons' firing rates, fan-out $f_{out}$ (number of outgoing connections to the subsequent layer), and time window size $T$

$SynOps = \sum_{t=1}^{T} \sum_{l=1}^{L-1} \sum_{j=1}^{N^l} f_{out,j}^{l}\, s_j^l[t]$    (17)

where $L$ is the total number of layers and $N^l$ denotes the total number of neurons in layer $l$. In contrast, the total synaptic operations required to classify one image with the ANN is given as follows

$SynOps = \sum_{l=1}^{L} f_{in}^{l} N^{l}$    (18)

where $f_{in}^{l}$ denotes the number of incoming connections to each neuron in layer $l$. Given a particular network structure, the total SynOps is fixed for the ANN. In our experiment, we calculate the average synaptic operations on a randomly chosen mini-batch (256 samples) from the test set and report the SynOps ratio between the SNN and the ANN.

B. Frame-based Object Recognition Results

For CIFAR-10, as provided in Table II, the SNN using IF neurons, denoted as SNN-IF hereafter, achieves a test accuracy of 87.41% and 90.98% for spike count and aggregate membrane potential decoding, respectively. The result is slightly worse for the SNN implementation with LIF neurons (SNN-LIF), which achieves a classification accuracy of 89.04%; this may be due to the approximation error of the smoothed surrogate activation function. Nevertheless, with a similar activation function designed to approximate the firing rate of LIF neurons [21], the constrain-then-train approach only achieves a classification accuracy of 83.54% on the CIFAR-10 dataset. This result confirms the necessity of keeping the SNN in the training loop, as proposed in the tandem learning network. Moreover, the results achieved by our spiking CifarNet are also competitive with the state-of-the-art ANN-to-SNN conversion [19], [20], [52] and spike-based learning [29], [52] methods.

As shown in Fig. 6A, we note that the learning dynamics with spike count decoding are unstable, which is attributed to the discrete error gradients derived at the output layer. Therefore, we use aggregate membrane potential decoding for the rest of the experiments. Although the learning converges more slowly than for the plain CNN with the ReLU activation function and the quantized CNN (with activation values quantized to 3 bits following the quantization-aware training scheme [43]), the classification accuracy of the SNN-IF eventually surpasses that of the quantized CNN.

Fig. 6: (A) Classification accuracy on the CIFAR-10 test set with different training schemes (ANN, quantized ANN, and SNN-IF/SNN-LIF with spike count or aggregate membrane potential decoding). (B) Classification accuracy on the CIFAR-10 test set as a function of the encoding window size T (T = 1, 2, 4, 8). IF neurons are used in this experiment.

TABLE II: Comparison of classification accuracy and inference speed of different SNN implementations on the CIFAR-10 and ImageNet-12 test sets.

CIFAR-10:
Model | Network Architecture | Method | Accuracy (%) | Inference Time Steps
Panda and Roy (2016) [51] | Convolutional Autoencoder | Layer-wise Spike-based Learning | 75.42 | -
Hunsberger and Eliasmith (2016) [21] | AlexNet | Constrain-then-Train | 83.54 | 200
Rueckauer et al. (2017) [19] | 8-layer CNN | ANN-to-SNN conversion | 90.85 | -
Sengupta et al. (2019) [20] | VGG-16 | ANN-to-SNN conversion | 91.46 | 2,500
Lee et al. (2019) [52] | ResNet-11 | ANN-to-SNN conversion | 91.59 | 3,000
Lee et al. (2019) [52] | ResNet-11 | Spike-based Learning | 90.95 | 100
Wu et al. (2019) [29] | 8-layer CNN | Surrogate Gradient Learning | 90.53 | -
Wu et al. (2019) [29] | AlexNet | Surrogate Gradient Learning | 85.24 | -
This work (ANN with full-precision activation) | CifarNet | Error Backpropagation | 91.77 | -
This work (ANN with quantized activation) | CifarNet | Error Backpropagation | 89.99 | -
This work (SNN-IF with Spike Count) | CifarNet | Tandem Learning | 87.41 | 8
This work (SNN-IF with Agg. Mem. Potential) | CifarNet | Tandem Learning | 90.98 | 8
This work (SNN-LIF with Spike Count) | CifarNet | Tandem Learning | 82.78 | 8
This work (SNN-LIF with Agg. Mem. Potential) | CifarNet | Tandem Learning | 89.04 | 8

ImageNet-12, top-1 (top-5) accuracy:
Model | Network Architecture | Method | Accuracy (%) | Inference Time Steps
Rueckauer et al. (2017) [19] | VGG-16 | ANN-to-SNN conversion | 49.61 (81.63) | 400
Sengupta et al. (2019) [20] | VGG-16 | ANN-to-SNN conversion | 69.96 (89.01) | 2,500
Hunsberger and Eliasmith (2016) [21] | AlexNet | Constrain-then-Train | 51.80 (76.20) | 200
This work (ANN with full-precision activation) | AlexNet | Error Backpropagation | 57.55 (80.44) | -
This work (ANN with quantized activation) | AlexNet | Error Backpropagation | 50.27 (73.92) | -
This work (SNN-IF with Agg. Mem. Potential) | AlexNet | Tandem Learning | 46.63 (70.80) | 5
This work (SNN-IF with Agg. Mem. Potential) | AlexNet | Tandem Learning | 50.22 (73.60) | 10

To study the effect of the encoding time window size T on the classification accuracy, we repeat the CIFAR-10 experiments using IF neurons with T ranging from 1 to 8. As shown in Fig. 6B, the classification accuracy improves consistently with a larger time window size. This suggests the effectiveness of tandem learning in making use of the encoding time window, which determines the upper bound of the spike count, to represent information. Notably, 89.83% accuracy can be achieved with a time window size of only 1, suggesting that accurate and rapid inference can be achieved simultaneously.

Training a model on the large-scale ImageNet-12 dataset with a spike-based learning rule requires a huge amount of computer memory to store the intermediate states of the spiking neurons, as well as a huge computational cost for the discrete-time simulation. Hence, only a few SNN implementations, which do not take the dynamics of spiking neurons into consideration during training, have made successful attempts on this challenging task, including the ANN-to-SNN conversion [19], [20] and constrain-then-train [21] approaches.

As shown in Table II, with an encoding time window of 10 time steps, the AlexNet trained with the tandem learning rule achieves top-1 and top-5 accuracies of 50.22% and 73.60%, respectively. This result is comparable to that of the constrain-then-train approach with the same AlexNet architecture [21], which uses a total of 200 time steps. Notably, the proposed learning rule only takes 10 inference time steps, which is at least an order of magnitude faster than the other reported methods. While the ANN-to-SNN conversion approaches [19], [20] achieve better classification accuracies on ImageNet-12, their successes are largely credited to the more advanced network models used.

Furthermore, we note that tandem learning suffers an accuracy drop of around 7% from the baseline ANN implementation with full-precision activation (revised from the original AlexNet model [1] by replacing the pooling layers with convolution operations of stride 2 to match the AlexNet used in this work, and by adding batch normalization layers). To investigate how much of this drop is due to activation quantization and how much is due to the dynamics of the IF neuron, we modify the full-precision ANN by quantizing the activation values to only 10 levels. In a single trial, the resulting quantized ANN achieves top-1 and top-5 accuracies of 50.27% and 73.92%, respectively. This result is very close to that of our SNN implementation, which suggests that the quantization of the activation function alone accounts for most of the accuracy drop.
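The 10-level activation quantization used for this ablation can be sketched as a uniform quantizer with a straight-through gradient, in the spirit of the quantization-aware training of [43]; the clipping range a_max below is an assumption, as the paper does not state how the range was chosen.

    import torch

    def quantize_activation(a, levels=10, a_max=1.0):
        """Uniformly quantize ReLU activations to `levels` values in [0, a_max].

        The straight-through estimator passes the gradient through the rounding
        step, as in standard quantization-aware training.
        """
        a_clipped = torch.clamp(a, 0.0, a_max)
        step = a_max / levels
        q = torch.round(a_clipped / step) * step
        return a_clipped + (q - a_clipped).detach()   # forward: q, backward: identity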


Fig. 7: Illustration of samples collected from the event-based camera. The samples are constructed by aggregating spiking events that occurred within each 10 ms interval. The ‘On’ and ‘Off’ events at each pixel are color-coded with ‘purple’ and ‘green’, respectively.
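The event pre-processing described in Section III-A (accumulating events into 10 ms frames, with one channel per polarity, as visualized in Fig. 7) can be sketched as follows. The event-array layout (x, y, timestamp in microseconds, polarity) is an assumption about the recordings rather than a documented dataset API.

    import numpy as np

    def bin_events(x, y, ts, polarity, height, width, bin_ms=10, n_bins=None):
        """Accumulate DVS events into frames of `bin_ms` milliseconds (2 polarity channels)."""
        bin_us = bin_ms * 1000
        if n_bins is None:
            n_bins = int(ts.max() // bin_us) + 1
        frames = np.zeros((n_bins, 2, height, width), dtype=np.float32)
        t_idx = np.minimum(ts // bin_us, n_bins - 1).astype(int)
        # Unbuffered scatter-add: each event increments its (time bin, polarity, y, x) cell.
        np.add.at(frames, (t_idx, polarity.astype(int), y.astype(int), x.astype(int)), 1.0)
        return frames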

TABLE III: Comparison of the object recognition results on the N-MNIST and DVS-CIFAR10 datasets.

Model | Method | Accuracy (N-MNIST) | Accuracy (DVS-CIFAR10)
Neil, Pfeiffer and Liu [53] | Phased LSTM | 97.30% | -
Lee, Delbruck and Pfeiffer [27] | Fully-connected SNN | 98.78% | -
Jin et al. [54] | Fully-connected SNN | 98.88% | -
Li et al. [55] | BOE-Random Forest | - | 31.01%
Orchard et al. [56] | H-First | 71.20% | 7.7%
Lagorce et al. [57] | HOTS | 80.80% | 27.1%
Sironi et al. [58] | Gabor-SNN | 83.70% | 24.50%
Sironi et al. [58] | HATS | 99.10% | 52.4%
Wu et al. [29] | Spiking CNN | 99.35% | 58.10%
Shrestha and Orchard [28] | Spiking CNN | 99.20% | -
This work (IF) | Spiking CNN | 99.31% | 58.65%
This work (LIF) | Spiking CNN | 99.22% | 57.18%
This work (IF) + Fine-tuning | Spiking CNN | - | 65.59%
This work (LIF) + Fine-tuning | Spiking CNN | - | 63.73%

C. Event-based Object Recognition Results

Bio-inspired event cameras capture per-pixel intensity changes asynchronously, which provides the compelling properties of high dynamic range, high temporal resolution, and no motion blur. Event-driven vision [59] therefore attracts growing attention in the computer vision community as a complement to conventional frame-based vision. Early research on event-driven vision focused on constructing frame-based feature representations from the captured event streams, such that they can be effectively processed by machine learning models [57], [58] or artificial neural networks [60]. Despite the promising results achieved by these works, the post-processing of frame-based features increases latency and incurs a high computational cost even during low event rates. In contrast, asynchronous SNNs naturally process event-based sensory inputs and hence hold great potential for building fully event-driven systems.

In this work, we investigate the applicability of tandem learning to training SNNs that handle event camera inputs. To this end, we perform object recognition tasks on the N-MNIST and DVS-CIFAR10 datasets (examples from these datasets are provided in Fig. 7). As shown in Table III, for the N-MNIST dataset, our spiking CNNs achieve an accuracy of 99.31% and 99.22% for the SNN-IF and SNN-LIF, respectively. These results outperform many existing SNN implementations [27], [28], [54], [55] and machine learning models [53], [57], [58], while being on par with the best reported result achieved by a recently introduced spike-based learning method [29].

Similarly, our SNN models also report state-of-the-art performance on the DVS-CIFAR10 dataset. This demonstrates the effectiveness of the proposed tandem learning rule in handling event-driven camera data. To address the data scarcity of the DVS-CIFAR10 dataset, we further explored transfer learning by fine-tuning SNN models, pre-trained on the frame-based CIFAR-10 dataset, on the DVS-CIFAR10 dataset. Notably, the SNN models trained in this way achieve approximately 7% accuracy improvements. It is worth noting that, despite neglecting the temporal structure of spike trains and only considering spike counts as the information carrier, the tandem learning rule performs exceedingly well on these datasets, which can be explained by the fact that negligible temporal information is added during the collection of these datasets [61].

D. Superior Regression Capability

To explore the capability of tandem learning for regression tasks, we perform an image reconstruction task on the MNIST dataset with a fully-connected spiking autoencoder. As shown in Fig. 8, the autoencoder trained with the proposed tandem learning rule can effectively reconstruct images with high quality. With a time window size of 32, the SNN-IF and SNN-LIF achieve mean square errors (MSE) of 0.0038 and 0.0072, a slight drop from the 0.0025 of an equivalent ANN. It is worth mentioning, however, that by leveraging the sparsity of spike trains, the SNNs can provide a much higher data compression rate than the high-precision floating-point representation of the ANN. As shown in Fig. 9, with a larger time window size the network performance approaches that of the baseline ANN, which aligns with the observation in the object recognition tasks.

Fig. 8: Illustration of the reconstructed images from a spiking autoencoder (IF neurons with T = 32) on the MNIST dataset. For each digit, the left column is the original image and the right column is the reconstructed image.

Fig. 9: Illustration of the image reconstruction performance (mean square error of the ANN, SNN-IF, and SNN-LIF) as a function of the encoding time window size.

E. Activation Direction Preservation and Weight-Activation Dot Product Proportionality within the Interlaced Layers

After showing how effectively the proposed tandem learning rule performs on object recognition and image reconstruction tasks, we hope to explain why learning can be performed effectively via the interlaced network layers. To answer this question, we borrow ideas from a recent theoretical study of binary neural networks [62], wherein learning is also performed across interlaced network layers (binarized activations are forward-propagated to subsequent layers). In the proposed tandem network, the ANN layer activation value $a^{l-1}$ at layer $l-1$ is replaced with the spike count $c^{l-1}$ derived from the coupled SNN layer. We further analyze the degree of mismatch between these two quantities and its effect on the activation forward propagation and error back-propagation.

In our numerical experiments on CIFAR-10 with a randomly drawn mini-batch of 256 test samples, we calculate the cosine angle between the vectorized $c^{l-1}$ and $a^{l-1}$ for all the convolution layers. As shown in Fig. 10, their cosine angles are below 24 degrees on average, and this relationship is maintained consistently throughout learning. While these angles may seem large in low dimensions, they are exceedingly small in a high-dimensional space. According to hyperdimensional computing theory [63] and the theoretical study of binary neural networks [62], any two high-dimensional random vectors are approximately orthogonal. It is also worth noting that the distortion from replacing $a^{l-1}$ with $c^{l-1}$ is less severe than that from binarizing a random high-dimensional vector, which changes the cosine angle by 37 degrees in theory. Given that the activation function and the error gradients back-propagated from the subsequent ANN layer remain the same, the distortions to the error back-propagation are bounded locally by the discrepancy between $a^{l-1}$ and $c^{l-1}$.

Furthermore, we calculate the Pearson Correlation Coefficient (PCC) between the weight-activation dot products $c^l \cdot W$ and $a^l \cdot W$, an important intermediate quantity (the input to the batch normalization layer) in our network configurations. The PCC, ranging from -1 to 1, measures the linear correlation between two variables; a value of 1 implies a perfect positive linear relationship. As shown in Fig. 10, the PCC remains consistently above 0.9 throughout learning for most of the samples, suggesting that the linear relationship of the weight-activation dot products is approximately preserved.
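The two diagnostics of this subsection reduce to a few lines; this sketch assumes the vectorized ANN activation a, the SNN spike count c, and the next layer's weight matrix w are available as arrays.

    import numpy as np

    def cosine_angle_deg(a, c):
        """Angle in degrees between the vectorized ANN activation a and SNN spike count c."""
        a, c = a.ravel(), c.ravel()
        cos = np.dot(a, c) / (np.linalg.norm(a) * np.linalg.norm(c) + 1e-12)
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    def dot_product_pcc(a, c, w):
        """Pearson correlation between the weight-activation dot products c.W and a.W.

        w: (n_out, n_in) weight matrix of the subsequent layer; a, c: (n_in,) vectors.
        """
        da, dc = w @ a.ravel(), w @ c.ravel()
        return np.corrcoef(da, dc)[0, 1]

Aggregating these two quantities over a mini-batch and over the convolution layers reproduces the kind of distributions shown in Fig. 10.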

Fig. 10: Analysis of the mismatch errors between the output spike counts of the ANN and SNN layers. (A), (B): distribution of cosine angles between the vectorized $c$ and $a$ for all convolution layers at Epoch 30 and Epoch 200, respectively. While these angles may seem large in low dimensions, they are exceedingly small in a high-dimensional space. (C), (D): distribution of Pearson Correlation Coefficients between the weight-activation dot products $c \cdot W$ and $a \cdot W$ at Epoch 30 and Epoch 200, respectively. The Pearson Correlation Coefficients remain consistently above 0.9 throughout learning, suggesting that the linear relationship of the weight-activation dot products is approximately preserved.

F. Efficient Learning through Spike-Train Level Surrogate Gradient

In this section, we compare the learning efficiency of the proposed tandem learning rule with the popular family of surrogate gradient learning methods [26]. The surrogate gradient learning methods describe the time-dependent dynamics of spiking neurons with a recurrent neural network, whereby a BPTT-based training method is used to optimize the network parameters. The non-differentiable spike generation function is replaced by a continuous surrogate function during the error back-propagation phase, such that a surrogate gradient can be determined for each time step. In contrast, tandem learning determines the error gradient at the spike-train level and can therefore significantly improve the learning efficiency.

Here, we compare the learning efficiency of tandem learning with the surrogate gradient learning method presented in [29]. As shown in Fig. 11, for the experiment on the CIFAR-10 dataset with LIF neurons, the computation time and GPU memory usage grow linearly with the time window size T for the BPTT-based method, since it requires storing and computing with the intermediate neuronal states of every time step. It is worth noting that, with this BPTT-based method, the SNN is unable to fit onto a single Nvidia GeForce GTX 1080Ti GPU card with 11 GB of memory when T = 8. This prevents large-scale deployment of the approach for more challenging tasks, as also noted in other works [26]. In contrast, tandem learning does not require storing the intermediate neuronal states of each time step; it therefore shows a speed-up of 2.45 times over the BPTT-based approach with 2.37 times less GPU memory usage at T = 8. The improvements in computational efficiency are expected to grow further for larger time window sizes. Furthermore, the SNNs trained with the tandem learning approach achieve higher test accuracies than the BPTT-based approach consistently for all T. Therefore, the proposed tandem learning approach demonstrates much better learning effectiveness, efficiency, and scalability.
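For contrast with the spike-train level gradient of tandem learning, the per-time-step surrogate gradient used by BPTT-based methods can be sketched as a custom autograd function. The boxcar surrogate window and the constants below are one common choice used for illustration, not necessarily the surrogate adopted in [29].

    import torch

    THETA, WIDTH = 1.0, 0.5   # firing threshold and surrogate window (illustrative values)

    class SpikeSurrogate(torch.autograd.Function):
        """Heaviside spike generation with a per-time-step surrogate gradient."""

        @staticmethod
        def forward(ctx, membrane):
            ctx.save_for_backward(membrane)
            return (membrane >= THETA).float()

        @staticmethod
        def backward(ctx, grad_out):
            membrane, = ctx.saved_tensors
            # Boxcar surrogate: let gradients pass only near the firing threshold.
            return grad_out * ((membrane - THETA).abs() < WIDTH).float()

A BPTT-based method applies this function at every one of the T time steps and must keep all T membrane-potential tensors alive for the backward pass, which is the memory cost that Fig. 11 quantifies; tandem learning instead back-propagates once per layer through the spike-count approximation.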

Fig. 11: Comparison of the computation time per epoch (seconds), GPU memory usage (MB), and test accuracy (%) of the proposed tandem learning approach ("Ours") against the BPTT-based surrogate gradient method ("BPTT"). The results are provided as a function of the encoding time window size T.

G. Rapid Inference with Reduced Synaptic Operations

As shown in Table II, the SNN trained with the proposed learning rule can perform inference at least one order of magnitude quicker than other learning rules without compromising on the classification accuracy. Moreover, as demonstrated in Figs. 6 and 9, the proposed tandem learning rule can deal with and utilize different encoding time window sizes T. In the most challenging case, when only one spike is allowed to be transmitted (i.e., T = 1), we are able to achieve satisfactory results for both object recognition and image reconstruction tasks. This may be partially credited to the encoding scheme that we have employed, whereby the full input information can be encoded in the first time step. Besides, the Batch Normalization layers, which are added after each convolution and fully-connected layer, ensure effective information transmission to the top layers. The results can be improved further by increasing the time window size; therefore, a trade-off between inference speed and accuracy can be achieved according to different application requirements.

To study the energy efficiency of the trained SNNs, we follow the convention of calculating the ratio of total SNN SynOps to ANN SynOps on the CIFAR-10 dataset. To achieve a state-of-the-art classification accuracy on the CIFAR-10 dataset (T = 8), the SNN-IF and SNN-LIF require SynOps ratios of only 0.32 and 0.22 relative to the ANN counterpart, respectively. This can be explained by the short inference time required and the sparse synaptic activities as summarized in Fig. 12B. Notably, the total SynOps required for the ANN is a fixed number that is independent of the time window size, while the total SynOps required for the SNN grows almost linearly with the time window size, as shown in Fig. 12A. It is worth noting that the SNN is more energy-efficient than its ANN counterpart when such a ratio is below 1. The saving is even more significant compared to ANNs if we consider the fact that, for SNNs, only an accumulate (AC) operation is performed for each synaptic operation, while for ANNs a more costly multiply-and-accumulate (MAC) operation is performed. This results in an order of magnitude saving in chip area as well as energy per synaptic operation [19], [20]. In contrast, the state-of-the-art SNN implementations with the ANN-to-SNN conversion and spike-based learning methods require SynOps ratios of 25.60 and 3.61, respectively, on a similar VGGNet-9 network [52]. This suggests that our SNN implementation is at least an order of magnitude more efficient.
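The SynOps accounting used above can be made explicit with a small helper. One common convention is to count one MAC per synaptic connection per inference for the ANN, and one AC per emitted spike per outgoing connection, accumulated over the T time steps, for the SNN. The layer sizes and firing rates below are placeholders for illustration, not the measured values of our VGGNet-9 model.

    def ann_synops(layers):
        """Total MAC operations: one per connection per inference.
        layers: list of (num_neurons, fan_out) tuples, one per layer."""
        return sum(n * fan_out for n, fan_out in layers)

    def snn_synops(layers, spike_rates, T):
        """Total AC operations: one per spike per outgoing connection over T steps.
        spike_rates: average spikes per neuron per time step, one value per layer."""
        return sum(n * fan_out * r * T
                   for (n, fan_out), r in zip(layers, spike_rates))

    # Illustrative numbers only: a ratio below 1 means the SNN performs fewer
    # synaptic operations than its ANN counterpart.
    layers = [(1024, 512), (512, 256)]   # hypothetical (neurons, fan-out) per layer
    rates = [0.05, 0.03]                 # hypothetical spikes per neuron per time step
    ratio = snn_synops(layers, rates, T=8) / ann_synops(layers)
    print(f"SNN/ANN SynOps ratio: {ratio:.2f}")

Since each AC is substantially cheaper than a MAC in both energy and chip area [19], [20], a SynOps ratio below 1 understates the actual energy advantage of the SNN.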
IV. DISCUSSION AND CONCLUSION

In this work, we introduce a novel tandem neural network and its learning rule to effectively train SNNs for classification and regression tasks. Within the tandem neural network, an SNN is employed to determine the exact spike counts and spike trains for the activation forward propagation, while an ANN, sharing the weights with the coupled SNN, is used to approximate the spike counts and hence the gradients of the coupled SNN at the spike-train level. Given that error back-propagation is performed on the simplified ANN, the proposed learning rule is more efficient in both memory and computation than the popular surrogate gradient learning methods that perform gradient approximation at each time step [26], [28]–[30]. It is noted that a similar strategy has been explored for the hardware in-the-loop training on BrainScaleS systems [64] to counteract the noise induced by the analog substrate, while the tandem learning approach introduced here addresses the training effectiveness and efficiency issues of surrogate gradient learning methods.

To understand why the learning can be effectively performed within the tandem learning framework, we study the learning dynamics of the tandem network and compare it with an intact ANN. The empirical study on CIFAR-10 reveals that the cosine distances between the vectorized ANN output a^l and the coupled SNN output spike count c^l are exceedingly small in a high dimensional space, and such a relationship is maintained throughout the training. Furthermore, strongly positive Pearson Correlation Coefficients are exhibited between the weight-activation dot products c^l · W and a^l · W, an important intermediate quantity in the activation forward propagation, suggesting that the linear relationship of weight-activation dot products is well preserved.

The SNNs trained with the proposed tandem learning rule have demonstrated competitive classification accuracies on both the frame-based and event-based object recognition tasks.
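To make the coupling described at the beginning of this section concrete, the following sketch shows one way such a tandem layer could be wired in PyTorch: the forward pass runs an integrate-and-fire neuron for T steps and returns the exact spike count, while the backward pass substitutes the gradient of a clipped-linear ANN approximation of that count. This is a simplified illustration of the idea rather than our exact implementation; the class name, the constant-current input, and the clipped-linear approximation are assumptions made for brevity.

    import torch

    class TandemSpikeCount(torch.autograd.Function):
        """Forward: exact IF spike count over T steps (SNN path).
        Backward: gradient of an ANN-style approximation of that count."""
        @staticmethod
        def forward(ctx, z, T, v_th):
            # z: weighted input of shape (batch, features), injected at every
            # time step here for simplicity; event-driven inputs would instead
            # accumulate a different current at each step.
            v = torch.zeros_like(z)
            count = torch.zeros_like(z)
            for _ in range(T):
                v = v + z
                s = (v >= v_th).float()
                v = v - s * v_th
                count = count + s
            ctx.save_for_backward(z)
            ctx.T, ctx.v_th = T, v_th
            return count

        @staticmethod
        def backward(ctx, grad_output):
            (z,) = ctx.saved_tensors
            # ANN approximation of the spike count: clamp(T * z / v_th, 0, T).
            # Its derivative is T / v_th inside the linear region and 0 outside.
            inside = ((z > 0) & (z < ctx.v_th)).float()
            grad_z = grad_output * inside * (ctx.T / ctx.v_th)
            return grad_z, None, None

In a full network, the spike trains produced by the SNN path would also be propagated to the next layer, while the shared weights receive their updates through this ANN-style backward pass.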

Fig. 12: (A) The ratio of total SNN SynOps to ANN SynOps as a function of the encoding time window size on the CIFAR-10 dataset. (B) Average spike count per neuron per time step of the trained SNN model (T = 8). Sparse neuronal activities can be observed across all network layers, leading to low power consumption when implemented on neuromorphic hardware.
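The sparsity statistics in panel (B) of the figure above can be collected with standard forward hooks. The helper below is a hedged sketch: it assumes that every monitored layer emits a binary spike tensor with the time dimension first, and the function and variable names are illustrative.

    import torch

    def attach_spike_counters(model, spiking_layer_types):
        """Record the average spikes per neuron per time step of each spiking layer.
        Assumes each monitored layer outputs a binary spike tensor of shape
        (T, batch, ...); the statistics reflect the most recent forward pass."""
        stats = {}

        def make_hook(name):
            def hook(module, inputs, output):
                T, batch = output.shape[0], output.shape[1]
                n_neurons = output[0, 0].numel()
                stats[name] = output.sum().item() / (T * batch * n_neurons)
            return hook

        for name, module in model.named_modules():
            if isinstance(module, spiking_layer_types):
                module.register_forward_hook(make_hook(name))
        return stats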

By making efficient use of the time window size, which determines the upper bound of the spike count, to represent information, and by adding batch normalization layers to ensure effective information flow, rapid inference, with at least an order of magnitude time saving compared to other SNN implementations, is demonstrated in our experiments. Furthermore, by leveraging the sparse neuronal activities and the short encoding time window, the total synaptic operations are also reduced by at least an order of magnitude over the baseline ANNs and other state-of-the-art SNN implementations. By integrating the algorithmic power of the proposed tandem learning rule with the unprecedented energy efficiency of emerging neuromorphic computing architectures, we expect to enable low-power on-chip computing on pervasive mobile and embedded devices.

For future work, we will explore strategies to close the accuracy gap between the baseline ANN and the SNN with LIF neurons by designing a more effective approximating function for the LIF neuron, as well as evaluate more advanced network architectures. In addition, we would like to acknowledge that the tandem learning rule, which neglects the temporal structure of spike trains, is not applicable to temporal sequence learning, because the error function would be required to be determined for each time step or spike rather than at the spike-count level. To solve tasks where the temporal structure is important, such as gesture recognition, we are interested in studying whether a hybrid network structure, which includes a feedforward network for feature extraction and a recurrent network for sequence modeling, could be useful. Specifically, the feedforward SNN trained with tandem learning can work as a powerful rate-based feature extractor on the short time scale, while a subsequent spiking recurrent neural network [36] can be used to explicitly handle the temporal structure of the underlying patterns on the longer time scale.

REFERENCES

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[2] W. Xiong, J. Droppo, X. Huang, F. Seide, M. L. Seltzer, A. Stolcke, D. Yu, and G. Zweig, "Toward human parity in conversational speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 12, pp. 2410–2423, Dec 2017.
[3] J. Hirschberg and C. D. Manning, "Advances in natural language processing," Science, vol. 349, no. 6245, pp. 261–266, 2015.
[4] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al., "Mastering the game of go without human knowledge," Nature, vol. 550, no. 7676, pp. 354, 2017.
[5] C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, "A survey of neuromorphic computing and neural networks in hardware," arXiv preprint arXiv:1705.06963, 2017.
[6] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, et al., "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science, vol. 345, no. 6197, pp. 668–673, 2014.
[7] M. Davies, N. Srinivasa, T. H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, et al., "Loihi: A neuromorphic manycore processor with on-chip learning," IEEE Micro, vol. 38, no. 1, pp. 82–99, 2018.
[8] D. Monroe, "Neuromorphic computing gets ready for the (really) big time," Communications of the ACM, vol. 57, no. 6, pp. 13–15, 2014.
[9] S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, C. di Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner, and D. S. Modha, "Convolutional networks for fast, energy-efficient neuromorphic computing," Proceedings of the National Academy of Sciences, vol. 113, no. 41, pp. 11441–11446, 2016.
[10] Eugene M. Izhikevich, "Which model to use for cortical spiking neurons?," IEEE Transactions on Neural Networks, vol. 15, no. 5, pp. 1063–1070, 2004.
[11] M. Pfeiffer and T. Pfeil, "Deep learning with spiking neurons: Opportunities & challenges," Frontiers in Neuroscience, vol. 12, pp. 774, 2018.
[12] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. Maida, "Deep learning in spiking neural networks," Neural Networks, 2018.
[13] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory, Psychology Press, 2005.
[14] G. Q. Bi and M. M. Poo, "Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type," Journal of Neuroscience, vol. 18, no. 24, pp. 10464–10472, 1998.
[15] G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola, et al., "Neuromorphic computing using non-volatile memory," Advances in Physics: X, vol. 2, no. 1, pp. 89–124, 2017.
[16] Milad Mozafari, Saeed Reza Kheradpisheh, Timothée Masquelier, Abbas Nowzari-Dalini, and Mohammad Ganjtabesh, "First-spike-based visual categorization using reward-modulated STDP," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 12, pp. 6178–6190, 2018.
[17] Saeed Reza Kheradpisheh, Mohammad Ganjtabesh, Simon J. Thorpe, and Timothée Masquelier, "STDP-based spiking deep convolutional neural networks for object recognition," Neural Networks, vol. 99, pp. 56–67, 2018.

[18] P. U. Diehl, D. Neil, J. Binas, M. Cook, S. C. Liu, and M. Pfeiffer, "Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing," in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1–8.
[19] B. Rueckauer, I. A. Lungu, Y. Hu, M. Pfeiffer, and S. C. Liu, "Conversion of continuous-valued deep networks to efficient event-driven networks for image classification," Frontiers in Neuroscience, vol. 11, pp. 682, 2017.
[20] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy, "Going deeper in spiking neural networks: VGG and residual architectures," Frontiers in Neuroscience, vol. 13, 2019.
[21] E. Hunsberger and C. Eliasmith, "Training spiking deep networks for neuromorphic hardware," arXiv preprint arXiv:1611.05141, 2016.
[22] Jibin Wu, Yansong Chua, Malu Zhang, Qu Yang, Guoqi Li, and Haizhou Li, "Deep spiking neural network with spike count based learning rule," in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–6.
[23] H. Mostafa, "Supervised learning based on temporal coding in spiking neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 7, pp. 3227–3235, 2018.
[24] C. Hong, X. Wei, J. Wang, B. Deng, H. Yu, and Y. Che, "Training spiking neural networks for cognitive tasks: A versatile framework compatible with various temporal codes," IEEE Transactions on Neural Networks and Learning Systems, 2019.
[25] Malu Zhang, Jiadong Wang, Zhixuan Zhang, Ammar Belatreche, Jibin Wu, Yansong Chua, Hong Qu, and Haizhou Li, "Spike-timing-dependent back propagation in deep spiking neural networks," arXiv preprint arXiv:2003.11837, 2020.
[26] Emre O. Neftci, Hesham Mostafa, and Friedemann Zenke, "Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks," IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 51–63, 2019.
[27] J. H. Lee, T. Delbruck, and M. Pfeiffer, "Training deep spiking neural networks using backpropagation," Frontiers in Neuroscience, vol. 10, pp. 508, 2016.
[28] S. B. Shrestha and G. Orchard, "SLAYER: Spike layer error reassignment in time," in Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., pp. 1412–1421. Curran Associates, Inc., 2018.
[29] Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Yuan Xie, and Luping Shi, "Direct training for spiking neural networks: Faster, larger, better," in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 1311–1318.
[30] Friedemann Zenke and Surya Ganguli, "SuperSpike: Supervised learning in multilayer spiking neural networks," Neural Computation, vol. 30, no. 6, pp. 1514–1541, 2018.
[31] A. Krizhevsky and G. E. Hinton, "Learning multiple layers of features from tiny images," Tech. Rep., Citeseer, 2009.
[32] S. Hochreiter, "The vanishing gradient problem during learning recurrent neural nets and problem solutions," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 02, pp. 107–116, 1998.
[33] W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity, Cambridge University Press, 2002.
[34] C. Posch, T. Serrano-Gotarredona, B. Linares-Barranco, and T. Delbruck, "Retinomorphic event-based vision sensors: Bioinspired cameras with spiking output," Proceedings of the IEEE, vol. 102, no. 10, pp. 1470–1484, 2014.
[35] S. Liu, A. van Schaik, B. A. Minch, and T. Delbruck, "Asynchronous binaural spatial audition sensor with 2 × 64 × 4 channel output," IEEE Transactions on Biomedical Circuits and Systems, vol. 8, no. 4, pp. 453–464, 2014.
[36] Guillaume Bellec, Darjan Salaj, Anand Subramoney, Robert Legenstein, and Wolfgang Maass, "Long short-term memory and learning-to-learn in networks of spiking neurons," in Advances in Neural Information Processing Systems, 2018, pp. 787–797.
[37] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, "Neural discrete representation learning," in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., pp. 6306–6315. Curran Associates, Inc., 2017.
[38] A. Mnih and K. Gregor, "Neural variational inference and learning in belief networks," arXiv preprint arXiv:1402.0030, 2014.
[39] R. Salakhutdinov and G. Hinton, "Deep Boltzmann machines," in Artificial Intelligence and Statistics, 2009, pp. 448–455.
[40] A. Mnih and D. J. Rezende, "Variational inference for Monte Carlo objectives," arXiv preprint arXiv:1602.06725, 2016.
[41] A. Courville, J. Bergstra, and Y. Bengio, "A spike and slab restricted Boltzmann machine," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 233–241.
[42] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1," arXiv preprint arXiv:1602.02830, 2016.
[43] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko, "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704–2713.
[44] Hyeonwoo Noh, Tackgeun You, Jonghwan Mun, and Bohyung Han, "Regularizing deep neural networks by noise: Its interpretation and optimization," in Advances in Neural Information Processing Systems, 2017, pp. 5109–5118.
[45] J. Deng, W. Dong, R. Socher, L. Li, Kai Li, and Li Fei-Fei, "Imagenet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, June 2009, pp. 248–255.
[46] Garrick Orchard, Ajinkya Jayawant, Gregory K. Cohen, and Nitish Thakor, "Converting static image datasets to spiking neuromorphic datasets using saccades," Frontiers in Neuroscience, vol. 9, pp. 437, 2015.
[47] Hongmin Li, Hanchao Liu, Xiangyang Ji, Guoqi Li, and Luping Shi, "CIFAR10-DVS: An event-stream dataset for object classification," Frontiers in Neuroscience, vol. 11, pp. 309, 2017.
[48] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856.
[49] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al., "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, 2019, pp. 8024–8035.
[50] Y. Wu et al., "Tensorpack," https://github.com/tensorpack/, 2016.
[51] P. Panda and K. Roy, "Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition," in 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016, pp. 299–306.
[52] C. Lee, S. S. Sarwar, and K. Roy, "Enabling spike-based backpropagation in state-of-the-art deep neural network architectures," arXiv preprint arXiv:1903.06379, 2019.
[53] Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu, "Phased LSTM: Accelerating recurrent network training for long or event-based sequences," in Advances in Neural Information Processing Systems, 2016, pp. 3882–3890.
[54] Yingyezhe Jin, Wenrui Zhang, and Peng Li, "Hybrid macro/micro level backpropagation for training deep spiking neural networks," in Advances in Neural Information Processing Systems, 2018, pp. 7005–7015.
[55] Hongmin Li, Hanchao Liu, Xiangyang Ji, Guoqi Li, and Luping Shi, "CIFAR10-DVS: An event-stream dataset for object classification," Frontiers in Neuroscience, vol. 11, pp. 309, 2017.
[56] Garrick Orchard, Cedric Meyer, Ralph Etienne-Cummings, Christoph Posch, Nitish Thakor, and Ryad Benosman, "HFirst: A temporal approach to object recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 10, pp. 2028–2040, 2015.
[57] Xavier Lagorce, Garrick Orchard, Francesco Galluppi, Bertram E. Shi, and Ryad B. Benosman, "HOTS: A hierarchy of event-based time-surfaces for pattern recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 7, pp. 1346–1359, 2016.
[58] Amos Sironi, Manuele Brambilla, Nicolas Bourdis, Xavier Lagorce, and Ryad Benosman, "HATS: Histograms of averaged time surfaces for robust event-based object classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1731–1740.
[59] Guillermo Gallego, Tobi Delbruck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew Davison, Joerg Conradt, Kostas Daniilidis, et al., "Event-based vision: A survey," arXiv preprint arXiv:1904.08405, 2019.
[60] Daniel Gehrig, Antonio Loquercio, Konstantinos G. Derpanis, and Davide Scaramuzza, "End-to-end learning of representations for asynchronous event-based data," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 5633–5643.
[61] Laxmi R. Iyer, Yansong Chua, and Haizhou Li, "Is neuromorphic MNIST neuromorphic? Analyzing the discriminative power of neuromorphic datasets in the time domain," arXiv preprint arXiv:1807.01013, 2018.

[62] A. G. Anderson and C. P. Berg, "The high-dimensional geometry of binary neural networks," arXiv preprint arXiv:1705.07199, 2017.
[63] P. Kanerva, "Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[64] S. Schmitt, J. Klähn, G. Bellec, A. Grübl, M. Güttler, A. Hartel, S. Hartmann, D. Husmann, K. Husmann, S. Jeltsch, V. Karasenko, M. Kleider, C. Koke, A. Kononov, C. Mauch, E. Müller, P. Müller, J. Partzsch, M. A. Petrovici, S. Schiefer, S. Scholze, V. Thanasoulis, B. Vogginger, R. Legenstein, W. Maass, C. Mayr, R. Schüffny, J. Schemmel, and K. Meier, "Neuromorphic hardware in the loop: Training a deep spiking network on the BrainScaleS wafer-scale system," in 2017 International Joint Conference on Neural Networks (IJCNN), 2017, pp. 2227–2234.