Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)

Exploiting Neuron and Synapse Filter Dynamics in Spatial Temporal Learning of Deep Spiking Neural Network

Haowen Fang, Amar Shrestha, Ziyi Zhao and Qinru Qiu
Syracuse University
{hfang02, amshrest, zzhao37}@syr.edu, [email protected]

Abstract

The recently discovered spatial-temporal information processing capability of bio-inspired spiking neural networks (SNN) has enabled some interesting models and applications. However, designing large-scale and high-performance models is still a challenge due to the lack of robust training algorithms. A bio-plausible SNN model with spatial-temporal property is a complex dynamic system. Synapses and neurons behave as filters capable of preserving temporal information. As such neuron dynamics and filter effects are ignored in existing training algorithms, the SNN downgrades into a memoryless system and loses the ability of temporal signal processing. Furthermore, spike timing plays an important role in information representation, but conventional rate-based spike coding models only consider spike trains statistically and discard the information carried by their temporal structures. To address the above issues and exploit the temporal dynamics of SNNs, we formulate the SNN as a network of infinite impulse response (IIR) filters with neuron nonlinearity. We propose a training algorithm that learns spatial-temporal patterns by searching for the optimal synapse filter kernels and weights. The proposed model and training algorithm are applied to construct associative memories and classifiers for synthetic and public datasets including MNIST, NMNIST, DVS 128, etc. Their accuracy outperforms state-of-the-art approaches.

1 Introduction

Spiking neural networks have demonstrated their capability in signal processing and pattern detection by mimicking the behavior of biological neural systems. In SNNs, information is represented by sparse and discrete spike events. The sparsity of spike activities can be exploited by event-driven implementation for energy efficiency. In a more bio-realistic neuron and synapse model, each neuron is a dynamic system, which is capable of spatial temporal information processing. A network made of such neurons can memorize and detect spatial temporal patterns with an ability superior to conventional artificial neural networks (ANN) [Wu et al., 2018b].

The potential of SNNs has not been fully explored. First of all, due to the lack of unified and robust training algorithms, the performance of SNNs is still not comparable with that of deep neural networks (DNN). Directly adapting backpropagation is not feasible because their output is a sequence of Dirac delta functions, hence non-differentiable. Secondly, most SNN models and training algorithms use rate coding, representing a numerical value in a DNN by spike counts in a time window, and consider only the statistics of spike activities. The temporal structure of a spike train and spike timing also convey information [Mohemmed et al., 2012]. Spike trains with similar rates may have distinct temporal patterns representing different information. To detect the temporal pattern in a spike train, novel synapse and neuron models with temporal dynamics are needed. However, synapse dynamics are often ignored in the computational models of SNNs.

To address the problem of the non-differentiable neuron output, one approach is to train an ANN such as a multi-layer perceptron (MLP) and convert the model to an SNN. This method is straightforward, but it requires additional fine-tuning of weights and thresholds [Diehl et al., 2015]. There are also works that directly apply backpropagation to SNN training by approximating the gradient of the spiking function [Lee et al., 2016; Esser et al., 2015; Shrestha et al., 2019], or utilizing gradient surrogates [Wu et al., 2018b; Shrestha and Orchard, 2018]. Other approaches include using derivatives of a soft spike [Neftci et al., 2019] or of the membrane potential [Zenke and Ganguli, 2018].

The ability to capture temporal patterns relies on neuron and synapse dynamics [Gütig and Sompolinsky, 2006]. Synapse function can be modeled as filters, whose states preserve rich temporal information. The challenge is how to capture the dependencies between the current SNN states and previous input spikes. This challenge has been addressed by some existing works. [Gütig and Sompolinsky, 2006] and [Gütig, 2016] train individual neurons to classify different temporal spike patterns. [Mohemmed et al., 2012] is capable of training neurons to associate an input spatial temporal pattern with a specific output spike pattern. However, the aforementioned works cannot be extended to multiple layers and therefore are not scalable.

Some recent works utilize backpropagation through time (BPTT) to address the temporal dependency problem. [Wu et al., 2018b] proposed a simplified iterative leaky integrate and fire (LIF) neuron model. [Gu et al., 2019] derived an iterative model from a current-based LIF neuron. Based on the iterative model, the network can be unrolled, hence BPTT is possible. However, these works only consider the temporal dynamics of the membrane potential; the synapse dynamics and the filter effect of the SNN are ignored. There are also works that introduced the concept of IIR and FIR filters into the Multi Layer Perceptron (MLP) [Back and Tsoi, 1991; Campolucci et al., 1999], which enabled the MLP to model time series.

In this work, our contributions are summarized as follows:

1. The dynamic behavior of the LIF neuron is formulated by infinite impulse response (IIR) filters. We exploit the synapse and neuron filter effect and derive a general representation of the SNN as a network of IIR filters with neuron non-linearity.

2. A general algorithm is proposed to train such an SNN to learn both rate-based and spatial temporal patterns. The algorithm does not only learn the synaptic weights, but is also capable of optimizing the impulse response kernels of the synapse filters to improve convergence. Similar learning behavior has been discovered in biological systems [Hennig, 2013]. Our training algorithm can be applied to train simple LIF neurons, as well as neurons with more complex synapses such as the alpha synapse, dual-exponential synapse, etc.

3. Our algorithm is tested on various datasets including MNIST, neuromorphic MNIST, DVS128 gesture, TIDIGITS and the Australian Sign Language dataset, and outperforms state-of-the-art approaches.

2 Neuron Model

Without loss of generality, we consider a LIF neuron with a dual-exponential synapse for its biological plausibility. The neuron can be described as a hybrid system, i.e. the membrane potential and synapse status evolve continuously over time, depicted by ordinary differential equations (ODE), while a spike event triggers the update of the state variables, as follows [Brette et al., 2007]:

\tau_m \frac{dv(t)}{dt} = -(v(t) - v_{rest}) + \eta^{\eta/(\eta-1)} \sum_{i}^{M} w_i x_i(t)   (1a)
\tau_s \frac{dx_i(t)}{dt} = -x_i(t)   (1b)
x_i(t) \leftarrow x_i(t) + 1, \quad \text{upon receiving a spike}   (1c)
v(t) \leftarrow v_{rest}, \quad \text{if } v(t) = V_{th}   (1d)

where x_i is the state variable of the ith synapse, w_i is the associated weight, and M is the total number of synapses. τ_m and τ_s are time constants, and η = τ_m/τ_s. v and v_rest are the neuron membrane potential and rest potential. For simplicity, we set v_rest = 0. Every synapse has its own potential, which is called the postsynaptic potential (PSP). The neuron accumulates the PSPs of all input synapses. The membrane potential resets when an output spike is generated.

The ODE system is linear time invariant (LTI). It can also be interpreted as the convolution of a filter's impulse response with the input spike train, which leads to the spike response model [Gerstner et al., 2014]. The relation between v(t), O(t) and the historical spike input can clearly be seen in the spike response model. We denote the input spike trains as sequences of time-shifted Dirac delta functions, S_i(t) = Σ_j δ(t − t_i^j), where t_i^j denotes the jth spike arrival time at the ith input synapse. Similarly, the output spike train can be defined as O(t) = Σ_f δ(t − t^f), t^f ∈ {t^f : v(t^f) = V_th}. To simplify the discussion, we consider only one synapse. The impulse response kernel k(t) of a neuron described by the above ODE system is obtained by passing a single spike at time 0 to the input, such that the initial conditions are x(0) = 1 and v(0) = 0. By solving equations 1a and 1b, we have k(t) = η^{η/(η−1)} (e^{−t/τ_m} − e^{−t/τ_s}). Given a general input S(t), the PSP is the convolution of k(t) and S(t). For a neuron with M synapses, without reset, the sub-threshold membrane potential is the summation of all PSPs, such that v(t) = Σ_{i}^{M} w_i ∫_0^∞ k(s) S_i(t − s) ds.

In a hybrid model, the reset is modeled by simply setting v to v_rest, regarding the reset as the start of the next evaluation and discarding the neuron's history information. A more biological way is to treat the reset as a negative current impulse applied to the neuron itself [Gerstner et al., 2014]. The reset impulse response is h(t) = −V_th e^{−t/τ_r}, where τ_r controls the decay speed of the reset impulse. The membrane potential is then the summation of all PSPs and the reset voltage:

v(t) = \int_0^{\infty} h(s) O(t-s)\,ds + \sum_{i}^{M} w_i \int_0^{\infty} k(s) S_i(t-s)\,ds   (2)

Treating the reset as a negative impulse enables an adaptive threshold, which is observed in biological neurons. A neuron's threshold depends on its prior spike activity. With adaptation, frequent spike activity increases the reset voltage, which inhibits the neuron activity and prevents SNNs from over-activation, such that additional tuning methods such as weight-threshold balancing [Diehl et al., 2015] are not necessary.

The above equations reveal the filter nature of the biologically realistic neuron model. Each synapse acts like a low-pass filter. The synapse filter is causal, and the kernel is defined to decay over time, hence the current state of the PSP is determined by all previous input spikes up to the current time. This temporal dependency calls for temporal error propagation in the training.

3 Neuron and Synapse as IIR Filters

In practice, for computational efficiency, spiking neural networks are usually simulated in the discrete time domain and the network states are evaluated every unit time. The discrete time version of equation 2 can be written as:

v[t] = \sum_{s} h[s] O[t-s] + \sum_{i}^{M} w_i \sum_{s} k[s] S_i[t-s]   (3)

where t ∈ Z≥0. It is clear that v[t] is a combination of a reset filter and multiple synapse filters.
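To make the convolution view of equations 2 and 3 concrete, the following sketch simulates a single neuron by directly convolving the kernels with the spike trains. This is an illustration, not the authors' released code; the time constants, weight, reset time constant and spike times are placeholders (τ_m = 4, τ_s = 1 and V_th = 1 merely echo the settings later used in Section 5).

```python
import numpy as np

T = 100
tau_m, tau_s, tau_r = 4.0, 1.0, 2.0   # placeholders; tau_r is an assumption
V_th, w = 1.0, 0.8
eta = tau_m / tau_s

t = np.arange(T)
# Dual-exponential PSP kernel: k(t) = eta^(eta/(eta-1)) * (exp(-t/tau_m) - exp(-t/tau_s))
k = eta ** (eta / (eta - 1.0)) * (np.exp(-t / tau_m) - np.exp(-t / tau_s))
# Reset kernel (negative current impulse): h(t) = -V_th * exp(-t/tau_r)
h = -V_th * np.exp(-t / tau_r)

S = np.zeros(T)                      # one input spike train (Dirac deltas become 0/1 samples)
S[[5, 20, 21, 60]] = 1.0

# Equation 3: v[t] = sum_s h[s]*O[t-s] + w * sum_s k[s]*S[t-s].
# Because O feeds back through the reset kernel, spikes must be generated step by step.
v = np.zeros(T)
O = np.zeros(T)
for n in range(T):
    psp = w * np.dot(k[:n + 1], S[n::-1])     # sum_s k[s] * S[n-s]
    reset = np.dot(h[:n + 1], O[n::-1])       # sum_s h[s] * O[n-s]
    v[n] = psp + reset
    O[n] = 1.0 if v[n] >= V_th else 0.0
```

Note that each time step sums over the entire spike history, which is exactly the cost the IIR reformulation below avoids.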


However, this convolution form is not practical for implementation because of the infinite number of convolution coefficients. We express the above system using Linear Constant-Coefficient Difference Equations (LCCDE):

v[t] = -V_{th}\, r[t] + \sum_{i}^{M} w_i f_i[t]   (4a)
r[t] = e^{-1/\tau_r}\, r[t-1] + O[t-1]   (4b)
f_i[t] = \alpha_1 f_i[t-1] + \alpha_2 f_i[t-2] + \beta x_i[t-1]   (4c)

where f_i[t] denotes the ith synapse filter, which is a second order IIR filter, r[t] is the reset filter, α_1 = e^{−1/τ_m} + e^{−1/τ_s}, α_2 = −e^{−(τ_m+τ_s)/(τ_m τ_s)}, and β = e^{−1/τ_m} − e^{−1/τ_s}.

Introducing synapse dynamics could cause significantly large computation overhead, because the number of synapses is quadratic in the number of neurons. Maintaining such a large number of synaptic states is infeasible. In a biological system, spikes are transmitted through axons, and an axon connects to multiple destination neurons through synapses. Therefore, the synapses that connect to the same axon have identical spike history, hence the same states. Based on this observation, tracking the states of synapses that have the same fan-in neuron is unnecessary, as these synapses can share the same state and computation.

The neuron itself can also be a filter, and v[t] may also rely on its previous states. We can extend equations 4a - 4c to a more general form, such that the SNN can be interpreted as a network of IIR filters with non-linear neurons:

V_i^l[t] = \lambda V_i^l[t-1] + I_i^l[t] - V_{th} R_i^l[t]   (5a)
I_i^l[t] = \sum_{j}^{N_{l-1}} w_{i,j}^l F_j^l[t]   (5b)
R_i^l[t] = \theta R_i^l[t-1] + O_i^l[t-1]   (5c)
F_j^l[t] = \sum_{p=1}^{P} \alpha_{j,p}^l F_j^l[t-p] + \sum_{q=0}^{Q} \beta_{j,q}^l O_j^{l-1}[t-q]   (5d)
O_i^l[t] = U(V_i^l[t] - V_{th})   (5e)
U(x) = 0 \text{ for } x < 0, \text{ otherwise } 1   (5f)

where l and i denote the index of the layer and neuron respectively, j denotes the input index, t is the time, and N_l is the number of neurons in the lth layer. V^l_i[t] is the neuron membrane potential, I^l_i[t] is the weighted input, R^l_i[t] is the reset voltage, F^l_j[t] is the PSP, O^l_i[t] is the spike output, and U(x) is a Heaviside step function. P and Q denote the feedback and feed-forward orders. λ, θ, α^l_{j,p} and β^l_{j,q} are the coefficients of the neuron filter, reset filter and synapse filter respectively. Equation 5d is a general form of IIR filter; it allows the PSP to take arbitrary shapes. The above formulation is not specific to particular neuron models; it provides a flexible and universal representation and is capable of describing more complex spiking neuron models than the LIF neuron. For example, by setting α_1 = 2e^{−1/τ}, α_2 = −e^{−2/τ}, α_p = 0 for p ∈ {3, ..., P}, β_1 = (1/τ)e^{−1/τ} and β_q = 0 for q ∈ {0, 2, 3, ..., Q}, it models a neuron with an alpha synapse. By setting α_p = 0 for p ∈ {1, 2, ..., P}, β_0 = 1, and β_q = 0 for q ∈ {1, 2, ..., Q}, the synapse filter is removed and the model becomes the simple LIF neuron as in [Diehl et al., 2015; Gu et al., 2019]. Based on equations 5a - 5f, a general model of a spiking neuron can be represented as a network of IIR filters, as shown in Figure 1. Axonal delay is explicitly modeled in equation 5d by the delayed input β^l_{j,q} O^{l−1}_j[t − q], hence it enables more complex and biologically plausible temporal behavior. Neurons can also have heterogeneous synapses, i.e. the synapses' feed-forward and feedback orders can vary across layers. To avoid notation clutter, we assume that all neurons in this paper have homogeneous synapse types.

Figure 1: General neuron model as IIR filters

Equations 5a to 5f provide an explicitly iterative way to model synapse and neuron dynamics, hence it is possible to unfold the network over time and apply BPTT. The spatial and temporal data flow and the unfolded network with a second order synapse filter are shown in Figure 2. Similar formulations can be found in [Wu et al., 2018b; Gu et al., 2019], however they are aimed at specific neuron models.

Figure 2: Spatial temporal data flow (layers n−1, n, n+1 unrolled over time steps t−1, t, t+1; spatial path and temporal path)
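Read this way, one layer of the SNN is just a bank of IIR filters followed by a threshold. The following is a minimal PyTorch sketch of equations 5a-5f for one fully connected layer; it is an illustrative reading, not the authors' released code. The coefficient initialization uses the dual-exponential values implied by τ_m = 4, τ_s = 1 from Section 5, the filter coefficients are shared across input channels for brevity, and the raw hard threshold would in practice be paired with the surrogate gradient of Section 4.1.

```python
import torch
import torch.nn as nn

class IIRSpikingLayer(nn.Module):
    """Sketch of equations 5a-5f for one fully connected spiking layer."""
    def __init__(self, n_in, n_out, P=2, Q=1, v_th=1.0, lam=0.0, theta=0.7788):
        super().__init__()
        self.w = nn.Parameter(0.1 * torch.randn(n_out, n_in))
        # Synapse filter coefficients: alpha (feedback, order P), beta (feedforward, order Q+1).
        # Registering them as Parameters lets the rule of equation 15 adapt the PSP kernel.
        # Initial values correspond to the dual-exponential synapse with tau_m=4, tau_s=1.
        self.alpha = nn.Parameter(torch.tensor([1.147, -0.287] + [0.0] * (P - 2)))
        self.beta = nn.Parameter(torch.tensor([0.0, 0.411] + [0.0] * (Q - 1)))
        self.v_th, self.lam, self.theta = v_th, lam, theta

    def forward(self, spikes_in):                   # spikes_in: [T, batch, n_in]
        T, B, n_in = spikes_in.shape
        P, Q1 = self.alpha.numel(), self.beta.numel()
        F = spikes_in.new_zeros(P, B, n_in)         # past PSPs F[t-1], ..., F[t-P]
        Oin = spikes_in.new_zeros(Q1, B, n_in)      # past inputs O^{l-1}[t], ..., O^{l-1}[t-Q]
        V = spikes_in.new_zeros(B, self.w.shape[0])
        R = torch.zeros_like(V)
        O = torch.zeros_like(V)
        out = []
        for t in range(T):
            Oin = torch.cat([spikes_in[t:t + 1], Oin[:-1]], dim=0)
            # Equation 5d: F[t] = sum_p alpha_p F[t-p] + sum_q beta_q O^{l-1}[t-q]
            F_t = (self.alpha.view(P, 1, 1) * F).sum(0) + (self.beta.view(Q1, 1, 1) * Oin).sum(0)
            F = torch.cat([F_t.unsqueeze(0), F[:-1]], dim=0)
            I_t = F_t @ self.w.t()                  # equation 5b
            R = self.theta * R + O                  # equation 5c
            V = self.lam * V + I_t - self.v_th * R  # equation 5a
            O = (V >= self.v_th).float()            # equations 5e/5f (hard threshold)
            out.append(O)
        return torch.stack(out)                     # [T, batch, n_out]
```

Stacking such layers and unrolling the time loop over all T steps is what makes the BPTT-based training of the next section applicable.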

4 Spatial Temporal Error Propagation

We discuss spatial temporal backpropagation in the context of two learning tasks. In the first task, the neuron that fires the most represents the correct result. Since this is a classification task, we use the cross-entropy loss, and the spike count of the output neuron represents the probability. The loss is defined as:

E_{rate} = -\sum_{i}^{N_L} y_i \log(p_i)   (6)

where p_i is given by:

p_i = \frac{\exp\big(\sum_{t}^{T} O_i^L[t]\big)}{\sum_{j=1}^{N_L} \exp\big(\sum_{t}^{T} O_j^L[t]\big)}   (7)

where y_i is the label, L is the number of layers, and O^L_i[t] denotes the output of the last layer.

In the second learning task, the goal is to train the SNN to generate spikes at specified times, such that the output spike pattern O[t] is spatially and temporally similar to the target spike pattern S_target[t]. We refer to it as temporal learning. The loss function of this learning is the distance between the actual output spike trains and the target spike trains. Inspired by the Van Rossum distance, we pass the actual and target spike trains through a synapse filter k[t] to convert them to continuous traces. The loss is defined as:

E_{dist} = \frac{1}{2T} \sum_{i=1}^{N_L} \sum_{t=1}^{T} \big(k[t] * O_i^L[t] - k[t] * S_{target}^i[t]\big)^2   (8)

where S^i_target[t] is the ith spike train of the target spike pattern.

For both tasks, we define: δ^l_i[t] = ∂E/∂O^l_i[t], ε^l_i[t] = ∂U(V^l_i[t] − V_th)/∂V^l_i[t], and κ^l_i[t] = ∂V^l_i[t+1]/∂V^l_i[t]. Please note that the spike activation function U(x) is not differentiable; its approximation will be discussed in Section 4.1. By unfolding the model into a spatial path and a temporal path as shown in Figure 2, BPTT can be applied to train the network. κ^l_i[t] can be computed as:

\kappa_i^l[t] = \partial V_i^l[t+1] / \partial V_i^l[t] = \lambda - V_{th}\,\epsilon_i^l[t]   (9)

δ^l_i[t] can be computed recursively as follows:

\delta_i^l[t] = \sum_{q=0}^{Q} \sum_{j}^{N_{l+1}} \frac{\partial E}{\partial O_j^{l+1}[t+q]} \frac{\partial O_j^{l+1}[t+q]}{\partial O_i^l[t]} + \frac{\partial E}{\partial O_i^l[t+1]} \frac{\partial O_i^l[t+1]}{\partial O_i^l[t]}   (10)

where the temporal term expands as

\frac{\partial E}{\partial O_i^l[t+1]} \frac{\partial O_i^l[t+1]}{\partial O_i^l[t]} = \frac{\partial E}{\partial O_i^l[t+1]} \frac{\partial O_i^l[t+1]}{\partial V_i^l[t+1]} \frac{\partial V_i^l[t+1]}{\partial R_i^l[t+1]} \frac{\partial R_i^l[t+1]}{\partial O_i^l[t]}   (11)
= -V_{th}\, \delta_i^l[t+1]\, \epsilon_i^l[t+1]   (12)

and the spatial term expands as

\frac{\partial E}{\partial O_j^{l+1}[t+q]} \frac{\partial O_j^{l+1}[t+q]}{\partial O_i^l[t]} = \frac{\partial E}{\partial O_j^{l+1}[t+q]} \frac{\partial O_j^{l+1}[t+q]}{\partial V_j^{l+1}[t+q]} \frac{\partial V_j^{l+1}[t+q]}{\partial I_j^{l+1}[t+q]} \frac{\partial I_j^{l+1}[t+q]}{\partial O_i^l[t]} = \beta_{j,q}^{l+1}\, \delta_j^{l+1}[t+q]\, \epsilon_j^{l+1}[t+q]\, w_{j,i}^{l+1}   (13)

where δ^l_i[t+q] = 0 for t + q > T. Unlike an LSTM/RNN, or SNNs such as [Wu et al., 2018b; Gu et al., 2019], there may be dependencies from layer l+1 to layer l at multiple time steps due to the axonal delay. Based on the above equations, the error can be propagated recursively. By applying the chain rule, we can obtain the gradient with respect to the weights:

\frac{\partial E}{\partial w^l} = \sum_{t=1}^{T} \delta^l[t]\,\epsilon^l[t] \Big( F^l[t] + \sum_{i=1}^{t-1} F^l[i] \prod_{j=i}^{t-1} \kappa^l[j] \Big)   (14)

In a real biological system, synapses may respond to spikes differently. The PSP kernels can be modulated by the input as part of the synaptic plasticity [Hennig, 2013]. It is possible to employ gradient descent to optimize the filter kernels in equation 5d [Campolucci et al., 1999]. The gradients of the loss with respect to α^l_{j,p} and β^l_{j,q} are:

\frac{\partial E}{\partial \alpha_{j,p}^l\,(\partial \beta_{j,q}^l)} = \sum_{t=1}^{T} \sum_{i}^{N_l} \delta_i^l[t]\,\epsilon_i^l[t]\, \frac{\partial I_i^l[t]}{\partial \alpha_{j,p}^l\,(\partial \beta_{j,q}^l)}   (15)

where ∂I^l_i[t]/∂α^l_{j,p} and ∂I^l_i[t]/∂β^l_{j,q} are:

\frac{\partial I_i^l[t]}{\partial \alpha_{j,p}^l} = w_{i,j}^l \Big( F_j^l[t-p] + \sum_{r=1}^{P} \alpha_{j,r}^l F_j^l[t-p-r] \Big)   (16)
\frac{\partial I_i^l[t]}{\partial \beta_{j,q}^l} = w_{i,j}^l \Big( O_j^{l-1}[t-q] + \sum_{r=1}^{P} \alpha_{j,r}^l O_j^{l-1}[t-q-r] \Big)   (17)

The above learning rule assumes the SNN to be an LTI system. The loss calculation, error propagation, and the filter coefficient and synaptic weight updates are performed at the end of each training iteration. Therefore, within one iteration, the SNN is still linear time-invariant.

4.1 Spike Function Gradient Approximation

The non-differentiable spike activation is a major road-block for applying backpropagation. One solution is to use a gradient surrogate [Neftci et al., 2019]. In the forward path, a spike is still generated by a hard threshold function, while in the backward path, the gradient of the hard threshold function is replaced by a smooth function. One such surrogate can be the spike probability [Esser et al., 2015; Neftci et al., 2019]. Although the LIF neuron is deterministic, stochasticity can be obtained from noise [Stevens and Zador, 1996]. Under Gaussian noise of mean 0 and standard deviation σ, in a short interval, the LIF neuron can behave like a Poisson neuron, such that the spike probability is a function of the membrane potential v as follows:

P(v) = \frac{1}{2}\,\mathrm{erfc}\Big( \frac{V_{th} - v}{\sqrt{2}\,\sigma} \Big)   (18)

where erfc(x) is the complementary error function. With this replacement, the gradient of U(x) can be approximated as:

\frac{\partial U(v)}{\partial v} \approx \frac{\partial P(v)}{\partial v} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(V_{th}-v)^2 / (2\sigma^2)}   (19)
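The erfc-based surrogate of equations 18-19 drops naturally into an autograd framework: the forward pass keeps the hard threshold, and only the backward pass is replaced. Below is a minimal PyTorch sketch, not the authors' code; the value of σ is an illustrative hyperparameter, since the excerpt above does not fix it.

```python
import math
import torch

class SpikeFn(torch.autograd.Function):
    """Hard threshold forward, erfc-derived surrogate backward (equations 18-19)."""
    sigma = 0.5   # smoothing width; placeholder value

    @staticmethod
    def forward(ctx, v, v_th):
        ctx.save_for_backward(v)
        ctx.v_th = v_th
        return (v >= v_th).float()                 # U(v - V_th)

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        sigma = SpikeFn.sigma
        # dP/dv = exp(-(V_th - v)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)   (equation 19)
        surrogate = torch.exp(-(ctx.v_th - v) ** 2 / (2 * sigma ** 2)) \
                    / (math.sqrt(2 * math.pi) * sigma)
        return grad_out * surrogate, None          # no gradient for the threshold

# usage inside the time loop of the layer sketched in Section 3:
#   O = SpikeFn.apply(V, 1.0)
```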

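For the temporal learning task, the Van Rossum-style distance of equation 8 can likewise be written directly against the filtered spike traces. The sketch below is an illustration under assumptions: the exponential kernel shape, its length K and time constant are placeholders, as the excerpt above does not specify them, and the result is additionally averaged over the batch.

```python
import torch
import torch.nn.functional as F

def van_rossum_loss(out_spikes, target_spikes, tau=4.0, K=32):
    """Filter output and target spike trains with a causal kernel k[t] and take the
    squared difference of the traces (equation 8). Shapes: [T, batch, N_L]."""
    T = out_spikes.shape[0]
    k = torch.exp(-torch.arange(K, dtype=out_spikes.dtype, device=out_spikes.device) / tau)
    k = k.flip(0).view(1, 1, K)                    # conv1d computes correlation, so flip

    def trace(s):
        x = s.permute(1, 2, 0).reshape(-1, 1, T)   # [batch*N_L, 1, T]
        x = F.pad(x, (K - 1, 0))                   # left-pad so the filter stays causal
        return F.conv1d(x, k)                      # filtered trace, [batch*N_L, 1, T]

    diff = trace(out_spikes) - trace(target_spikes)
    return 0.5 / T * diff.pow(2).sum(dim=-1).mean()
```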

5 Experiments

The proposed model and algorithm are implemented in PyTorch (code is available at: https://github.com/Snow-Crash/snn-iir). We demonstrate the effectiveness using three experiments: the first experiment is a non-trivial generative task using associative memory; the second is vision classification; and the third is to classify temporal patterns. In the following experiments, we use the Adam optimizer, the learning rate is set to 0.0001, and the batch size is 64. We employ the synapse model depicted by equation 4c, in which τ_m = 4, τ_s = 1, λ = 0, θ = e^{−1/τ_m}, and V_th = 1.

5.1 Associative Memory

An associative memory network retrieves the stored pattern that most closely resembles the one presented to it. To demonstrate the capability of our approach to learn complex spatial temporal spike patterns, we train a network of structure 300x500x200x500x300. We generate 10 spatial temporal spike train patterns, each containing 300 spike trains of length 300; samples of these patterns are shown in Figure 3a. Each dot corresponds to a spike event, the x-axis represents the time, and the y-axis represents the spike train index. The SNN is trained to reconstruct the pattern. The first column of Figure 3b shows two noisy sample inputs. Noisy samples are formed by adding random noise, which includes obfuscation and deletion of some parts of the patterns, jitter in the input spikes' timing following a Gaussian distribution, and random background spikes. After 50 epochs of training, the network is able to reconstruct the original patterns and remove the background noise. The corresponding outputs at epoch 5 and 50 are shown in Figure 3b. Such a task is difficult for rate-based training methods, as they are not capable of capturing temporal dependencies. It is noteworthy that the intermediate layer has 200 neurons, which is smaller than the input layer, and that the intermediate layer is learning the spatial and temporal representation of the input patterns. Thus, this network also acts like a spatial temporal auto-encoder.

We drove the input of the network with 64 different testing samples and recorded the output of the 200 neurons in the intermediate layer. Figure 4a color codes the spiking rate of those neurons. The x-axis gives the index of the neurons, and the y-axis gives the index of the different testing samples. Those samples belong to 10 different classes, and are sorted so that data of the same class are placed close to each other vertically. The 10 different colors on the left side bar indicate each of the 10 classes. The pixel (x, y) represents the spiking rate of neuron x given testing sample y. The spiking rate of any neuron is almost constant regardless of which class the testing sample belongs to. Figure 4b shows the Van Rossum distances between the 200 neurons' output spike trains. The x-axis and y-axis give the input sample index. The color intensity of pixel (x, y) is proportional to the Van Rossum distance between the 200 neurons' outputs when given input samples x and y respectively. Similar to Figure 4a, the color bar on the left side indicates the class of each sample. It can clearly be seen from Figure 4b that the temporal structures of these 200 neurons' outputs are significantly different. The fact that our model is able to take those 64 sets of spike trains with almost the same firing rate and generate 10 different classes indicates that it is capable of utilizing features in the temporal distribution of the spikes in addition to the spike rates.

5.2 Vision Tasks

We evaluated our method on three vision datasets. Results and comparisons with state-of-the-art works in the SNN domain are shown in Table 1. For MNIST, we utilize rate-based encoding to convert the input image into 784 spike trains, where the number of spikes in each spike train is proportional to the pixel value. With a convolutional SNN with the structure 32C3-32C3-64C3-P2-64C3-P2-512-10, our model achieves state-of-the-art accuracy in the SNN domain. The work next in terms of accuracy (99.42%) [Jin et al., 2018] employs an ensemble of 64 spiking CNNs. Compared to conversion-based approaches that require hyper-parameter search and fine tuning [Diehl et al., 2015], our approach does not require post-training processing. It directly trains the SNN using BPTT and obtains models with quality comparable to DNNs.

Unlike MNIST, which consists of static images, Neuromorphic MNIST (N-MNIST) is a dynamic dataset which consists of spike events captured by a DVS camera and is a popular dataset for SNN evaluation. An N-MNIST sample is obtained by mounting the DVS camera on a moving platform to record an MNIST image on a screen. The pixel changes trigger spike events. Thus, this dataset contains more temporal information. With a convolutional network of size 32C3-32C3-64C3-P2-64C3-P2-256-10, our model outperforms the current state-of-the-art. The results are shown in Table 1. [Lee et al., 2016] introduced an additional winner-take-all (WTA) circuit to improve performance. [Wu et al., 2019] gets 99.35% accuracy with a very large network, whose structure is 128C3-256C3-AP2-512C3-AP2-1024C3-512C3-1024FC-512FC-Voting; there is also an additional voting circuit at the output layer. We use a significantly smaller network to achieve the same accuracy, and no additional voting layer or WTA circuits are required.

The DVS128 Gesture Dataset contains 10 hand gestures such as hand clapping, arm rolling, etc., collected from 29 individuals under 3 illumination conditions using a DVS camera. The network is trained to classify these hand gestures. This dataset contains rich temporal information. For this task, we utilize a network with a 64C7-32C3-32C3-256-10 structure. The advantage of our work is clearly seen in the third column of Table 1. We achieved 96.09% accuracy, which is state-of-the-art in the spiking domain, while other works, such as [Amir et al., 2017], require additional filters for data preprocessing and a WTA circuit at the output layer. Our model and learning algorithm do not need specialized neuron circuits or any data preprocessing techniques, as the spike streams are directly fed into the network.
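The rate-based encoding used for the MNIST experiments above can be realized in several ways; the paper only states that the spike count is proportional to the pixel value. A common choice, shown as a sketch below, is Bernoulli sampling per time step with the pixel intensity as the firing probability (the sampling scheme and T are assumptions).

```python
import torch

def rate_encode(images, T=100):
    """Bernoulli rate coding: each pixel intensity in [0, 1] is used as the per-step
    firing probability, so the expected spike count over T steps is proportional to
    the pixel value. images: [batch, 28, 28] -> returns [T, batch, 784]."""
    p = images.clamp(0, 1).reshape(images.shape[0], -1)
    return (torch.rand(T, *p.shape, device=p.device) < p).float()
```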


We also studied the effect of training the synapse response kernels. The learned synapse kernels are shown in Figure 5. The solid red line represents the original kernel. The decay speed of the synapse response of the learned kernels diverges from the original kernel. A slower decay speed indicates that the synapses are capable of remembering information for a longer time. Such behavior is similar to the gates in an LSTM. The accuracies with and without training the synapse filter kernels are shown in Figure 6. No improvement is observed for the MNIST dataset; the accuracies with and without a trained kernel are 99.46% and 99.43% respectively. This is because MNIST is a static dataset, hence has no temporal information. There is a slight improvement on N-MNIST from training the synapse filter kernel: the accuracy increases from 99.24% to 99.39%. On the DVS128 dataset, the advantage of training the synapse filter kernel is clearly seen; the model not only converges faster, the accuracy also increases from 94.14% to 96.09%.

Figure 3: Spatial temporal input and output spike patterns of the associative memory network. (a) Original patterns; (b) input (jitter + noise + deletion) and output spike trains at epoch 5 and epoch 50 (spike train index vs. time).

Figure 4: Intermediate layer output spike rate and Van Rossum distance. (a) Intermediate layer spike rate (neuron index vs. input sample index); (b) van Rossum distance (input sample index vs. input sample index).

Figure 5: Learned synapse impulse response (PSP vs. time).

Figure 6: Training performance comparison with and without training the synapse filter. (a) NMNIST accuracy; (b) DVS128 accuracy (accuracy vs. epoch).

5.3 Time Series Classification

Our work also shows advantages in time series classification. We evaluated our work on the TIDIGITS and Australian Sign Language [Kadous and others, 2002] datasets. TIDIGITS is a speech dataset that consists of more than 25,000 digit sequences spoken by 326 individuals. For training and testing, we extracted MFCC features from each sample, resulting in 20 time series of length 90. The Australian Sign Language dataset [Kadous and others, 2002] is a multivariate time series dataset, collected from 22 data glove sensors that track acceleration and hand movements such as roll, pitch, etc. Each recorded hand sign is a sequence of sensor readings. The average duration of a hand sign is 45 samples. The dataset has 95 classes of hand signs. To convert the time series into spike trains, we use a current-based LIF neuron as an encoder. It accumulates the input data as current and converts the time-varying continuous values into time-varying spike patterns.
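A minimal reading of this current-based LIF encoder is sketched below; the leak, gain, threshold and the subtraction-style reset are illustrative assumptions, since the paper does not report these values.

```python
import torch

def lif_encode(series, v_th=1.0, leak=0.9, gain=1.0):
    """Current-based LIF encoder sketch: each input channel is integrated as a
    current, and a spike is emitted (and v_th subtracted) whenever the accumulated
    value crosses the threshold. series: [T, batch, channels] -> [T, batch, channels];
    leak, gain and v_th are placeholders."""
    v = torch.zeros_like(series[0])
    spikes = []
    for x_t in series:                 # iterate over time steps
        v = leak * v + gain * x_t      # integrate the input as current
        s = (v >= v_th).float()
        v = v - s * v_th               # one possible reset: subtract the threshold
        spikes.append(s)
    return torch.stack(spikes)
```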


The networks used to classify TIDIGITS and Australian Sign Language have structures 300-300-11 and 300-300-95 respectively. We trained a two-layer stacked LSTM of unit size 300 as a baseline. Results are shown in Table 2. The best accuracy on TIDIGITS is achieved by [Abdollahi and Liu, 2011], however, it is a non-spiking approach. On the Australian Sign Language dataset, we outperformed the vanilla LSTM and DNN-based approaches. [Shrestha et al., 2019] uses EMSTDP to train an SNN to classify 50 classes of the hand signs; the network size is 990-150-150-50. It buffers the entire sequence and flattens the time series into a vector. Our work, in contrast, is trained to classify all 95 classes, and it processes the time series in a more efficient and natural way: the input data is converted into spikes on the fly. Since flattening is no longer necessary, the input dimension is also reduced.

Method                          MNIST    NMNIST   IBM-DVS128
[Wu et al., 2018b]              99.42    98.78    -
[Jin et al., 2018]              99.42    98.84    -
[Wu et al., 2019]               -        99.35    -
[Lee et al., 2016]              99.31    98.66    -
[Gu et al., 2019]               98.60    -        -
[Tavanaei and Maida, 2019]      97.20    -        -
[Shrestha and Orchard, 2018]    99.36    99.2     93.64
[Kaiser et al., 2018]           98.77    -        94.18
[Kaiser et al., 2019]           -        -        92.7
[Amir et al., 2017]             -        -        91.77
This work                       99.46    99.39    96.09

Table 1: Results on vision datasets (accuracy, %)

Method                       Architecture   TIDIGITS   Sign language
[Wu et al., 2018a]           SNN            97.6       -
[Pan et al., 2019]           SNN-SVM        94.9       -
[Abdollahi and Liu, 2011]    MFCC-HMM       99.7       -
[Shrestha et al., 2019]      SNN-STDP       -          97.5
[Karim et al., 2019]         LSTM-CNN       -          97.00
Vanilla LSTM                 LSTM           97.9       96.7
This work                    SNN            99.13      98.21

Table 2: Results on temporal datasets (accuracy, %)

6 Conclusion

In this work, we proposed a general model that formulates an SNN as a network of IIR filters with neuron non-linearity. The model is independent of neuron types and is capable of modeling complex neuron and synapse dynamics. Based on this model, we derived a learning rule to efficiently train both the synapse weights and the synapse filter impulse response kernels. The proposed model and method are evaluated on various tasks, including associative memory, MNIST, NMNIST, DVS 128 gesture, TIDIGITS, etc., and achieve state-of-the-art accuracy.

References

[Abdollahi and Liu, 2011] Mohammad Abdollahi and Shih-Chii Liu. Speaker-independent isolated digit recognition using an AER silicon cochlea. In 2011 IEEE Biomedical Circuits and Systems Conference (BioCAS), pages 269–272. IEEE, 2011.

[Amir et al., 2017] Arnon Amir, Brian Taba, David Berg, Timothy Melano, Jeffrey McKinstry, Carmelo Di Nolfo, Tapan Nayak, Alexander Andreopoulos, Guillaume Garreau, Marcela Mendoza, et al. A low power, fully event-based gesture recognition system. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7243–7252, 2017.

[Back and Tsoi, 1991] Andrew D Back and Ah Chung Tsoi. FIR and IIR synapses, a new neural network architecture for time series modeling. Neural Computation, 3(3):375–385, 1991.

[Brette et al., 2007] Romain Brette, Michelle Rudolph, Ted Carnevale, Michael Hines, David Beeman, James M Bower, Markus Diesmann, Abigail Morrison, Philip H Goodman, Frederick C Harris, et al. Simulation of networks of spiking neurons: a review of tools and strategies. Journal of Computational Neuroscience, 23(3):349–398, 2007.

[Campolucci et al., 1999] Paolo Campolucci, Aurelio Uncini, Francesco Piazza, and Bhaskar D Rao. On-line learning algorithms for locally recurrent neural networks. IEEE Transactions on Neural Networks, 10(2):253–271, 1999.

[Diehl et al., 2015] Peter U Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2015.

[Esser et al., 2015] Steve K Esser, Rathinakumar Appuswamy, Paul Merolla, John V Arthur, and Dharmendra S Modha. Backpropagation for energy-efficient neuromorphic computing. In Advances in Neural Information Processing Systems, pages 1117–1125, 2015.

[Gerstner et al., 2014] Wulfram Gerstner, Werner M Kistler, Richard Naud, and Liam Paninski. Neuronal dynamics: From single neurons to networks and models of cognition. Cambridge University Press, 2014.

[Gu et al., 2019] Pengjie Gu, Rong Xiao, Gang Pan, and Huajin Tang. STCA: spatio-temporal credit assignment with delayed feedback in deep spiking neural networks. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 1366–1372. AAAI Press, 2019.

[Gütig and Sompolinsky, 2006] Robert Gütig and Haim Sompolinsky. The tempotron: a neuron that learns spike timing-based decisions. Nature Neuroscience, 9(3):420, 2006.

[Gütig, 2016] Robert Gütig. Spiking neurons can discover predictive features by aggregate-label learning. Science, 351(6277):aab4113, 2016.

[Hennig, 2013] Matthias H Hennig. Theoretical models of synaptic short term plasticity. Frontiers in Computational Neuroscience, 7:45, 2013.

[Jin et al., 2018] Yingyezhe Jin, Wenrui Zhang, and Peng Li. Hybrid macro/micro level backpropagation for training deep spiking neural networks. In Advances in Neural Information Processing Systems, pages 7005–7015, 2018.


[Kadous and others, 2002] Mohammed Waleed Kadous et al. Temporal classification: Extending the classification paradigm to multivariate time series. University of New South Wales Kensington, 2002.

[Kaiser et al., 2018] Jacques Kaiser, Hesham Mostafa, and Emre Neftci. Synaptic plasticity dynamics for deep continuous local learning. arXiv preprint arXiv:1811.10766, 2018.

[Kaiser et al., 2019] Jacques Kaiser, Alexander Friedrich, J Tieck, Daniel Reichard, Arne Roennau, Emre Neftci, and Rüdiger Dillmann. Embodied event-driven random backpropagation. arXiv preprint arXiv:1904.04805, 2019.

[Karim et al., 2019] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. Multivariate LSTM-FCNs for time series classification. Neural Networks, 116:237–245, 2019.

[Lee et al., 2016] Jun Haeng Lee, Tobi Delbruck, and Michael Pfeiffer. Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience, 10:508, 2016.

[Mohemmed et al., 2012] Ammar Mohemmed, Stefan Schliebs, Satoshi Matsuda, and Nikola Kasabov. SPAN: Spike pattern association neuron for learning spatio-temporal spike patterns. International Journal of Neural Systems, 22(04):1250012, 2012.

[Neftci et al., 2019] Emre O Neftci, Hesham Mostafa, and Friedemann Zenke. Surrogate gradient learning in spiking neural networks. arXiv preprint arXiv:1901.09948, 2019.

[Pan et al., 2019] Zihan Pan, Jibin Wu, Malu Zhang, Haizhou Li, and Yansong Chua. Neural population coding for effective temporal classification. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2019.

[Shrestha and Orchard, 2018] Sumit Bam Shrestha and Garrick Orchard. SLAYER: Spike layer error reassignment in time. In Advances in Neural Information Processing Systems, pages 1412–1421, 2018.

[Shrestha et al., 2019] Amar Shrestha, Haowen Fang, Qing Wu, and Qinru Qiu. Approximating back-propagation for a biologically plausible local learning rule in spiking neural networks. In Proceedings of the International Conference on Neuromorphic Systems, page 10. ACM, 2019.

[Stevens and Zador, 1996] Charles F Stevens and Anthony M Zador. When is an integrate-and-fire neuron like a Poisson neuron? In Advances in Neural Information Processing Systems, pages 103–109, 1996.

[Tavanaei and Maida, 2019] Amirhossein Tavanaei and Anthony Maida. BP-STDP: Approximating backpropagation using spike timing dependent plasticity. Neurocomputing, 330:39–47, 2019.

[Wu et al., 2018a] Jibin Wu, Yansong Chua, and Haizhou Li. A biologically plausible speech recognition framework based on spiking neural networks. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2018.

[Wu et al., 2018b] Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, and Luping Shi. Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience, 12, 2018.

[Wu et al., 2019] Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Yuan Xie, and Luping Shi. Direct training for spiking neural networks: Faster, larger, better. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1311–1318, 2019.

[Zenke and Ganguli, 2018] Friedemann Zenke and Surya Ganguli. SuperSpike: Supervised learning in multilayer spiking neural networks. Neural Computation, 30(6):1514–1541, 2018.
